Monday, June 16, 2008

Agent uses many files: Oracle 10.2.0.3.0, but Agent 10.2.0.4.0

Oracle 10.2.0.3.0 RAC on Linux
Agent: Oracle Grid Control 10.2.0.4.0

The Oracle RAC node rebooted. After that I checked the system log:

Jun 15 16:02:58 db01 last message repeated 6 times
Jun 15 16:02:58 db01 kernel: VFS: file-max limit 65536 reached
Jun 15 16:02:58 db01 last message repeated 6 times
Jun 15 16:02:58 db01 kernel: VFS: file-max limit 65536 reached
Jun 15 16:02:58 db01 last message repeated 3 times
Jun 15 16:02:58 db01 kernel: VFS: file-max limit 65536 reached
Jun 15 16:02:58 db01 last message repeated 4 times
Jun 15 16:02:58 db01 kernel: VFS: file-max limit 65536 reached
Jun 15 16:02:58 db01 last message repeated 3 times
Jun 15 16:02:58 db01 logger: Oracle CSSD shell script failure. Duplicate CSSD.
Jun 15 16:02:58 db01 kernel: md: stopping all md devices.
Jun 15 16:02:58 db01 kernel: md: md0 switched to read-only mode.
Jun 15 16:02:58 db01 kernel: VFS: file-max limit 65536 reached
-----------------------------------------------------------------

I didn't know what was wrong [which process had opened so many files], so
I had to check on the other nodes.

Check open files on the system as the root user (count of lsof lines per PID; the biggest user is at the bottom):

# lsof | awk '{print $2}' | sort -n | uniq -c | sort -n
.
.
.
64 28513
67 25226
45225 17956

# ps -aef | grep 17956

oracle 17956 17938 0 16:12 pts/2 00:00:03 /oracle/.../bin/emagent
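
Note that lsof lists more than open descriptors (memory-mapped libraries, the current directory, and so on), so its per-PID line count is only a rough ranking. A minimal sketch of another way to rank processes, assuming a Linux /proc filesystem and root access, is to count the entries in /proc/<pid>/fd. The function name top_fd_users and its optional argument are my own, not part of any Oracle or Linux tool:

```shell
# Hypothetical helper: print "count pid" for the 5 processes with the most
# open file descriptors, largest last. Run as root to see every process.
top_fd_users() {
    proc_root="${1:-/proc}"   # overridable only so the function can be tested
    for d in "$proc_root"/[0-9]*; do
        [ -d "$d/fd" ] || continue
        # Each entry in <pid>/fd is one open descriptor
        n=$(ls "$d/fd" 2>/dev/null | wc -l | tr -d ' ')
        echo "$n ${d##*/}"
    done | sort -n | tail -n 5
}
```

Calling top_fd_users with no argument reads the live /proc. The PID on the last line is the one to look up with ps.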


Actually, you can also check file handle usage with:

$ cat /proc/sys/fs/file-nr

63345 0 65536
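
The three numbers in file-nr are the allocated file handles, the allocated-but-unused handles, and the maximum (the same file-max value as in the kernel messages above). A small sketch to turn that line into a percentage, assuming awk is available; the function name file_nr_usage and its optional path argument are my own:

```shell
# Hypothetical helper: summarize a file-nr line ("allocated unused max").
file_nr_usage() {
    # $1: file to read; defaults to the live kernel counter
    src="${1:-/proc/sys/fs/file-nr}"
    awk '{ printf "allocated=%d unused=%d max=%d used_pct=%.1f\n",
                  $1, $2, $3, ($1 - $2) * 100 / $3 }' "$src"
}
```

With the values above (63345 0 65536), the system was at about 96.7% of file-max.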


So, I restarted the [Oracle Enterprise Manager 10g Release 4 Grid Control] Agent:

$ /oracle/.../bin/emctl stop agent
Oracle Enterprise Manager 10g Release 4 Grid Control 10.2.0.4.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Stopping agent .... stopped.

$ cat /proc/sys/fs/file-nr

18120 0 65536

$ /oracle/.../bin/emctl start agent
Oracle Enterprise Manager 10g Release 4 Grid Control 10.2.0.4.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Starting agent .......... started.

$ cat /proc/sys/fs/file-nr

18120 0 65536

>>>

This problem seems to come from the Agent version being different from the Oracle Database software version.
>>>

I don't know whether it's a bug or not.

I think I should reinstall the Agent [because the Agent cannot be rolled back],
or upgrade the Oracle software [Database + Cluster].


Enjoy///

1 comment:

Anonymous said...

Common bug: see Metalink for healthcheck monitor, or check the forums; it's in there too. There is a patch for the db.
An easy workaround is to set the "Instance Status" metric to disabled. Bounce the agent to release those files. It won't creep past ~18 after that.