2009/06/05

Grid Control Agent Crash with 'too many open files' Error

Grid Control Agent 10.2.0.4
Database: 10.2.0.2
OS Platform: RHEL Release 4 64bit

The agent crashes intermittently with a 'too many open files' error in Grid Control. The trace file 'emagent.trc' shows 'health check' errors like this:

2008-03-24 15:24:52 Thread-4124650400 ERROR fetchlets.healthCheck: GIM-00105: file not found
2008-03-24 15:24:52 Thread-4124650400 ERROR engine: [oracle_database,,health_check] : nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.
2008-03-24 15:24:52 Thread-4124650400 WARN collector: Error exit. Error message:Instance Health Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above
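
To see how often these errors occur, the trace file can be scanned for health check entries. The following is a minimal sketch, not an Oracle tool: the regular expression is inferred only from the three lines quoted above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above: timestamp, thread, level, component, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Thread-(?P<thread>\d+) "
    r"(?P<level>[A-Z]+) "
    r"(?P<component>[^:]+): "
    r"(?P<message>.*)$"
)

# Spellings of "health check" that appear in the quoted trace lines.
KEYWORDS = ("healthcheck", "health check", "health_check")


def find_health_check_errors(lines):
    """Return the parsed entries whose component or message mentions the health check."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        entry = match.groupdict()
        text = (entry["component"] + " " + entry["message"]).lower()
        if any(keyword in text for keyword in KEYWORDS):
            hits.append(entry)
    return hits


def count_by_level(entries):
    """Count the matching entries per log level (ERROR, WARN, ...)."""
    return Counter(entry["level"] for entry in entries)
```

Passing an open file handle for emagent.trc to find_health_check_errors() yields one dictionary per matching line. Lines in other layouts are skipped silently, so the result is a lower bound on the number of occurrences.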

Cause: Bug 5872000 - Healthcheck Error Occurs for 32Bit Database on 64Bit OS Due to Bug 4526916 Fix.
The Healthcheck file, $ORACLE_HOME/dbs/hc_.dat, differs in size from the memory structure the agent uses to read it. The database creates this file at startup if it is not present.

This happens when the database is, e.g., 10.2.0.4 and the agent is 10.2.0.3, or vice versa.
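
A quick way to flag such a pairing is to compare the two version strings. This is only an illustrative sketch with made-up helper names. It assumes you already know both versions and that they are plain dotted numbers. A difference is a hint that this bug may apply, not proof.

```python
def parse_version(text, depth=4):
    """Turn '10.2.0.4' into (10, 2, 0, 4), padding or truncating to `depth` parts."""
    parts = [int(part) for part in text.strip().split(".")]
    parts = parts[:depth]
    parts += [0] * (depth - len(parts))  # '10.2' is treated as '10.2.0.0'
    return tuple(parts)


def versions_differ(agent_version, database_version, depth=4):
    """Return True when the agent and database versions differ in the first `depth` parts."""
    return parse_version(agent_version, depth) != parse_version(database_version, depth)
```

For the environment described at the top of this post, versions_differ("10.2.0.4", "10.2.0.2") returns True.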

Possible solutions:

1. Apply Patch 5872000 to databases on 64-bit machines.
This needs to be applied on top of 10.1 through 10.2.0.3 and 11.1.0.6 databases. The file $ORACLE_HOME/dbs/hc_.dat may need to be removed before starting up the database after applying the patch. The database recreates this file at startup if it is not present, and the agent uses it for the Healthcheck metric. Recreating the file after the patch is applied ensures it has the layout the agent expects.
2. Disable the Healthcheck metric per database in Grid Control.
See Metalink note 379423.1, 'How to edit or disable the Health Check Metric Collection in Grid Control 10.2'.

Note: If the second workaround is applied, you have to redo it every time you add a new database target to Grid Control.
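
For the first option, the Healthcheck file can be renamed rather than deleted, so the old copy is kept. The sketch below is a hedged illustration, not an Oracle-supplied procedure. The function name, the glob pattern hc_*.dat and the .bak suffix are assumptions made here. Run it only while the database is shut down, after the patch has been applied.

```python
import time
from pathlib import Path


def set_aside_health_check_files(oracle_home, pattern="hc_*.dat", dry_run=True):
    """Rename Healthcheck files under <oracle_home>/dbs so the database recreates them.

    `pattern` is an assumed glob, adjust it to the file name in your environment.
    With dry_run=True nothing is renamed; the planned renames are only returned.
    """
    dbs_dir = Path(oracle_home) / "dbs"
    stamp = time.strftime("%Y%m%d%H%M%S")
    planned = []
    for path in sorted(dbs_dir.glob(pattern)):
        target = path.with_name(path.name + ".bak." + stamp)
        if not dry_run:
            path.rename(target)  # keep the old file instead of deleting it
        planned.append((path, target))
    return planned
```

Call it first with the default dry_run=True and the value of $ORACLE_HOME to review the list, then again with dry_run=False to perform the renames before starting the database.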

Reference documents in Metalink:
564617.1
566607.1
379423.1
469227.1

1 comment:

Unknown said...

Does this mean that if I shut down the database/reboot the server and remove the $ORACLE_HOME/dbs/hc_.dat file, I would be able to start the dbconsole?