2009/06/25

ORA-04031 and SGA settings

Oracle 10.2.0.2
OS RHEL 4

Recently, I got error message from Grid Control for one of our production databases.

Failed to connect to database instance: ORA-04031: unable to allocate 4120 bytes of shared memory ("large pool","unknown object","session heap","kzctxhugi1") (DBD ERROR: OCISessionBegin).

The message is pretty obvious: the large pool is not large enough for new allocations. Since it's 10g and we are using ASMM, the SGA is self-adjusted. We put 1.5G to sga_target and didn't specify a value for large_pool_size. Oracle suggests give a minimum value to large_pool_size so that Oracle won't squeeze large pool too small.

Another question I have during the research is that what's point of having both sga_target and sga_max_size.

Well, since sga_max_size is static and sga_target is dynamic, having set a larger value for sga_max_size than sga_target gives you a tuning margin for sga_target later without restarting the database. One thing we should pay attention is that on some OS, such as Windows and Linux, memory is allocated the same size as sga_max_size when the instance is started, but on some other OS, such as Sun Solaris, memory is allocated the same size as sga_target, and the other free memory (sga_max_size - sga_target) is ready for other use.

2009/06/05

Grid Control Agent Crash with 'too many open files' Error

Grid Control Agent 10.2.0.4
Database: 10.2.0.2
OS Platform: RHEL Release 4 64bit

Agent crashes a lot, intermittently with 'too many open files' error inside Grid Control. Trace file 'emagent.trc' gives 'health check' error lik this:

2008-03-24 15:24:52 Thread-4124650400 ERROR fetchlets.healthCheck: GIM-00105: file not found
2008-03-24 15:24:52 Thread-4124650400 ERROR engine: [oracle_database,,health_check] : nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.
2008-03-24 15:24:52 Thread-4124650400 WARN collector: Error exit. Error message:Instance Health Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above

Cause: Bug 5872000 - Healthcheck Error Occurs fror 32Bit Database on 64Bit OS Due to Bug4526916 Fix.
The Healthcheck file, namely $ORACLE_HOME/dbs/hc_.cat file differs in size from the memory structure used by the Agent to read it. This file is created by the database on startup time, if not present.

This happens when the database is e.g. 10.2.0.4 and the agent is 10.2.0.3 and vice versa.

Possible solutions:

1. Apply Patch 5872000 to databases on 64-bit machine.
This needs to be applied on top of 10.1 -> 10.2.0.3, and 11.1.0.6 databases. THe file $ORACLE_HOME/dbs/hc_.dat may need to be removed before starting up the database after patch application. This file is created on database start up if not present. The agent uses this file for the Healthcheck metric. By recreating the file on start up after the patch application, the file is the correct one needed by the agent.
2.Disable the healthcheck metric per database in Grid Control.
Check 379423.1 in Metalink on 'How to edit or disable the Health Check Metric Collection in Grid Control 10.2'.

Note: If the second workaround is applied, then you have to redo it everytime you add new database target into Grid Control.

Reference Doc in Metalink:
564617.1
566607.1
379423.1
469227.1

2009/06/02

Open File Limit on Linux

Oracle 10.2.0.2
3-node RAC
RHEL AS release 4

Our application experienced performance downgrade today and user can not get response from application for around an hour. Not long after that, when we tried to log into the system, a message 'Too many open files' showed up and we were unable to login.

Searching a little bit, I found that Linux limits the number of open file handles. I recently installed Oracle Grid Control Agent on our RAC and created new database. The open file handles was increased up to the limit, and the system was unstable.

The file /proc/sys/fs/file-nr lists the number of current allocated file handles, available file handles in the allocated file handles and maximum file handles that can be opend for the whole system.
Oracle recommends that the file handles for the entire system be set to at lease 65536.
You can alter the default setting for the maximum number of file handles without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/fs/file-max) using the following:

# sysctl -w fs.file-max=65536

You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:

# echo "fs.file-max=65536" >> /etc/sysctl.conf

The soft limit and hard limit of file handles for oracle can be configued inside /etc/security/limits.conf file.

cat >> /etc/security/limits.conf <<EOF
oracle soft mnproc 2047
oracle hard mnproc 16384
oracle soft nofile 1024
oracle hard nofile 65536

nofile is the number of open file handles.
mnproc is hte number of processes available to a single user