Zenoss | Zenoss 3.1.0 |
OS | Linux (x86_64) 2.6.18 (Linux hz1.uapps.net 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64) |
Zope | Zope 2.12.1 |
Python | Python 2.6.2 |
Database | MySQL 5.0.77 (Ver 5.0.77) |
RRD | RRDtool 1.3.9 |
Twisted | Twisted 8.1.0 |
NetSnmp | NetSnmp 5.3.2 |
PyNetSnmp | PyNetSnmp 0.28.14 |
WMI | Wmi 1.3.13 |
Every 10 minutes, during our polling for Windows services, a few of our servers give us the error "Could not read Windows services". The next time they are polled, the errors clear.
I can run wmic from the Zenoss box and query the devices successfully.
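Since a single manual wmic query succeeds, repeating it in a loop may show whether it ever fails intermittently the way zenwin does. The following is only a sketch: it assumes the `wmic` client is on the PATH, that it accepts the `-U user%password //host "query"` form, and that it exits non-zero on failure. The function names `build_wmic_command` and `probe` and the chosen WQL query are hypothetical, and the code is written for a current Python 3 rather than the Python 2.6.2 bundled with Zenoss.

```python
import subprocess
import time


def build_wmic_command(host, user, password,
                       query="SELECT Name, State FROM Win32_Service"):
    """Build the argument list for one wmic call (assumed CLI form)."""
    # Assumption: credentials are passed as user%password after -U.
    return ["wmic", "-U", "%s%%%s" % (user, password), "//%s" % host, query]


def probe(host, user, password, attempts=5, delay=1.0, run=subprocess.call):
    """Run the same query several times and record (exit code, seconds)."""
    results = []
    for _ in range(attempts):
        started = time.time()
        # Assumption: a non-zero exit code means the query failed.
        exit_code = run(build_wmic_command(host, user, password))
        results.append((exit_code, time.time() - started))
        time.sleep(delay)
    return results
```

Calling `probe("172.36.114.2", "DOMAIN/user", "secret", attempts=20)` around the time the events appear would show whether any of the attempts return a non-zero code or take unusually long. Note that the password is visible in the process list while the command runs.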
The devices are a random mix of Windows servers: 2003, 2003 64-bit, 2008 64-bit, and 2008 R2.
Even when a device has the error, I can still go to "Windows Services" and monitor a service, so it appears that communication is still working.
It is random boxes across several domains.
We did performance tuning on the Zenoss box, increased the scan interval, and reduced the number of WMI services that we monitor.
Here is the error message from the zenwin log:
=================================================================================================
2011-07-08 07:36:08,398 DEBUG zen.collector.scheduler: Task 172.36.114.2 changing state from RUNNING to WATCHER_QUERY
2011-07-08 07:36:08,398 DEBUG zen.Watcher: Fetching events for 172.36.114.2
2011-07-08 07:36:08,401 ERROR zen.zenwin: Unable to scan device 172.36.114.2: NT code 0xc002001b
2011-07-08 07:36:08,401 DEBUG zen.Watcher: closing WMI Query for 172.36.114.2
2011-07-08 07:36:08,401 DEBUG zen.Watcher: Watcher.__del__ called for 172.36.114.2, busy=False closeRequested=False
2011-07-08 07:36:08,401 DEBUG zen.zenwin: Queueing event {'severity': 4, 'component': 'zenwin', 'agent': 'zenwin', 'summary': '\n Could not read Windows services (NT code 0xc002001b). Check your\n username/password settings and verify network connectivity.\n ', 'manager': 'hz1.uapps.net', 'device': '172.36.114.2', 'eventClass': '/Status/Wmi', 'monitor': 'localhost'}
2011-07-08 07:36:08,402 DEBUG zen.zenwin: Total of 1 queued events
2011-07-08 07:36:08,402 DEBUG zen.zenwin: Device 172.36.114.2 [172.36.114.2] scanned failed, NT code 0xc002001b
2011-07-08 07:36:08,402 DEBUG zen.collector.scheduler: Task 172.36.114.2 finished, result: <twisted.python.failure.Failure <class 'pysamba.twisted.callback.WMIFailure'>>
2011-07-08 07:36:08,402 DEBUG zen.collector.scheduler: Task 172.36.114.2 changing state from WATCHER_QUERY to IDLE
===================================================================================================
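To see whether the failures cluster on particular devices or at particular times, the ERROR lines in the zenwin log can be tallied. This is a minimal sketch, assuming the log lines keep the format shown in the excerpt above; the name `tally_failures` is hypothetical, and the code targets a current Python 3 rather than the Python 2.6.2 bundled with Zenoss.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2011-07-08 07:36:08,401 ERROR zen.zenwin: Unable to scan device 172.36.114.2: NT code 0xc002001b
FAILURE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} ERROR zen\.zenwin: "
    r"Unable to scan device (?P<device>\S+): NT code (?P<code>0x[0-9a-fA-F]+)"
)


def tally_failures(lines):
    """Count scan failures per (device, NT code) and per minute."""
    by_device = defaultdict(int)
    by_minute = defaultdict(int)
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            by_device[(match.group("device"), match.group("code"))] += 1
            by_minute[match.group("minute")] += 1
    return dict(by_device), dict(by_minute)


# Two lines taken from the excerpt above; only the ERROR line is counted.
sample = [
    "2011-07-08 07:36:08,398 DEBUG zen.Watcher: Fetching events for 172.36.114.2",
    "2011-07-08 07:36:08,401 ERROR zen.zenwin: Unable to scan device 172.36.114.2: NT code 0xc002001b",
]
by_device, by_minute = tally_failures(sample)
print(by_device)
print(by_minute)
```

Feeding the whole log file to `tally_failures` (for example via `open(path)`) gives one count per device and NT code and one per minute, which would show whether many devices fail in the same poll cycle or the same few devices fail repeatedly.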
We have restarted the server and we have removed ZenPacks.
ZenPacks in use:
ZenPacks.community.DellMon | community | Egor Puzanov | 2.4 | Yes
ZenPacks.community.VMwareDataSource | community | Eric Enns | 1.1.2 | Yes
ZenPacks.community.VMwareESXiMonitor | community | Eric Enns | 1.2 | Yes
ZenPacks.community.WMIDataSource | community | Egor Puzanov | 2.11 | Yes
ZenPacks.community.WMIPerf_Windows | community | Egor Puzanov | 2.5.80 | Yes
ZenPacks.community.deviceAdvDetail | community | Egor Puzanov | 2.7 | Yes
ZenPacks.community.mib_browser | community | Kells Kearney & Jane Curry | 2.0 | Yes
ZenPacks.zenoss.ApacheMonitor | zenoss | Zenoss | 2.1.2 | Yes
ZenPacks.zenoss.DellMonitor | zenoss | Zenoss | 2.1.0 | Yes
ZenPacks.zenoss.DigMonitor | zenoss | Zenoss | 1.0.2 | Yes
ZenPacks.zenoss.DnsMonitor | zenoss | Zenoss | 2.0.2 | Yes
ZenPacks.zenoss.EsxTop | zenoss | Zenoss | 1.0.2 | Yes
ZenPacks.zenoss.FtpMonitor | zenoss | Zenoss | 1.0.2 | Yes
ZenPacks.zenoss.HPMonitor | zenoss | Zenoss | 2.1.0 | Yes
ZenPacks.zenoss.HttpMonitor | zenoss | Zenoss | 2.0.3 | Yes
ZenPacks.zenoss.IRCDMonitor | zenoss | Zenoss | 1.0.2 | Yes
ZenPacks.zenoss.JabberMonitor | zenoss | Zenoss | 1.0.2 | Yes
ZenPacks.zenoss.LDAPMonitor | zenoss | zenoss | 1.2.3 | Yes
ZenPacks.zenoss.LinuxMonitor | zenoss | Zenoss | 1.1.5 | Yes
ZenPacks.zenoss.MySqlMonitor | zenoss | Zenoss | 2.1.2 | Yes
ZenPacks.zenoss.NNTPMonitor | zenoss | zenoss | 1.0.2 | Yes
ZenPacks.zenoss.NtpMonitor | zenoss | Zenoss Team | 2.0.3 | Yes
ZenPacks.zenoss.RPCMonitor | zenoss | zenoss | 1.0.2 | Yes
ZenPacks.zenoss.XenMonitor | zenoss | Zenoss | 1.0.3 | Yes
ZenPacks.zenoss.ZenAWS | zenoss | Zenoss | 1.0.3 | Yes
ZenPacks.zenoss.ZenJMX | zenoss | Zenoss | 3.5.2 | Yes
ZenPacks.zenoss.ZenossVirtualHostMonitor | zenoss | Zenoss | 2.3.6 | Yes
We are potentially looking at the Enterprise version of Zenoss, but if we can't fix this issue we will have to move on to another monitoring solution.
Please let me know if there is any other troubleshooting I can do.
It could possibly be something in the polling that is causing the issue, but I am kinda lost on where to look next.
Thanks