Tuesday, January 18, 2011

11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting Requirement

This is an issue to be aware of if you install Oracle Grid Infrastructure 11.2.0.2 for a cluster: "root.sh" can fail on the second node (node02 here) when multicast is not enabled on the private network.
Oracle Grid Infrastructure 11.2.0.2 introduces a new feature called Redundant Interconnect, which provides Oracle-supplied redundancy for the cluster interconnect. With this feature, multicast communication on the private interconnect network is used during bootstrap to establish communication with the peer nodes in the cluster. Once communication is established, traffic switches to unicast. The multicast communication uses the 230.0.1.0 address (port 42424) on the private interconnect network. Multicast must therefore be enabled and working on the private interconnect network of all cluster nodes for the multicast address 230.0.1.0 (port 42424).
In my case, "root.sh" ran fine on the first node, but on node02 it failed with:
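To make the requirement concrete, here is a minimal Python sketch of what "joining" the 230.0.1.0 group on port 42424 looks like at the socket level. It is a generic UDP multicast example, not Oracle's implementation; the function names are made up for illustration, and the interface address 192.168.2.2 is just the private IP that appears in the ocssd.log output in this post.

```python
import ipaddress
import socket

GROUP = "230.0.1.0"   # multicast group named in the post
PORT = 42424          # port named in the post


def is_multicast(addr: str) -> bool:
    """Return True if addr is in the IPv4 multicast range (224.0.0.0/4)."""
    return ipaddress.ip_address(addr).is_multicast


def build_mreq(group: str, iface_ip: str) -> bytes:
    """Build the 8-byte membership request: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_receiver(group: str, port: int, iface_ip: str) -> socket.socket:
    """Open a UDP socket and join the multicast group on the given interface.

    Hypothetical helper: call it on one node while another node sends a
    datagram to (group, port); if nothing arrives, multicast is likely
    blocked somewhere on the private network.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock
```

For a real check, use the mcasttest.pl script from Oracle Support shown later in this post; the sketch only illustrates the mechanism.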
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node01, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1016.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
I then checked the /u01/app/11.2.0/grid/log/node02/cssd/ocssd.log file:
2011-01-18 00:29:36.966: [ CSSD][1114298688]clssnmconnect: connecting to addr gipcha://node01:nm2_node
2011-01-18 00:29:36.967: [ CSSD][1114298688]clssscConnect: endp 0x45c - cookie 0x2aaaac080de0 - addr gipcha://node01:nm2_node
2011-01-18 00:29:36.967: [GIPCHGEN][1091209536] gipchaNodeCreate: adding new node 0xfe40810 { host 'node01', haName 'CSS_node', srcLuid 30bf5d1f-5573cf1c, dstLuid 00000000-00000000 numInf0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 220472644, flags 0x0 }
2011-01-18 00:29:36.967: [ CSSD][1114298688]clssnmconnect: connecting to node(1), endp(0x45c), flags 0x10002
2011-01-18 00:29:36.967: [GIPCHGEN][1083623744] gipchaNodeAddInterface: adding interface information for inf 0x2aaab0084990 { host 'node01', haName 'CSS_node', local (nil), ip '192.168.2.2', subnet '192.168.2.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x42 }
2011-01-18 00:29:36.967: [GIPCHTHR][1083623744] gipchaWorkerUpdateInterface: created remote interface for node 'node01', haName 'CSS_node', inf 'mcast://230.0.1.0:42424/192.168.2.2'
2011-01-18 00:29:36.967: [GIPCHGEN][1083623744] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab0084990 { host 'node01', haName 'CSS_node', local 0x2aaaac3fada0, ip '192.168.2.2', subnet '192.168.2.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x46 }
2011-01-18 00:29:36.967: [GIPCHALO][1083623744] gipchaLowerSend: deffering startup of hdr 0x2aaab0090378 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0xfe40810 { host 'node01', haName 'CSS_node', srcLuid 30bf5d1f-5573cf1c, dstLuid 00000000-00000000 numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 220472644, flags 0x0 }
2011-01-18 00:29:37.966: [ CSSD][1103636800]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2011-01-18 00:29:37.970: [ CSSD][1100482880]clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 190770055, wrtcnt, 2754, LATS 220473644, lastSeqNo 2753,uniqueness 1295282578, timestamp 1295285372/19031824
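The telling line in the log above is the last one: node01 "has a disk HB, but no network HB". A small Python sketch for pulling such lines out of a large ocssd.log could look like this. The function name and the regular expression are my own, written against the message format shown above; other versions may word the message differently.

```python
import re

# Matches e.g. "node 1, node01, has a disk HB, but no network HB"
NO_NETWORK_HB = re.compile(r"node (\d+), (\S+), has a disk HB, but no network HB")


def nodes_missing_network_hb(lines):
    """Return a sorted list of (node_number, node_name) reported without a network heartbeat."""
    found = set()
    for line in lines:
        match = NO_NETWORK_HB.search(line)
        if match:
            found.add((int(match.group(1)), match.group(2)))
    return sorted(found)
```

Usage would be along the lines of `nodes_missing_network_hb(open("ocssd.log"))`, with the path adjusted to your Grid home.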
I found the Oracle Support note "11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting Requirement" [ID 1212703.1], then downloaded the perl script from it and ran the test:
$ perl mcasttest.pl -n node01,node02 -i bond0,bond1
########### Setup for node node01 ##########
Checking node access 'node01'
Checking node login 'node01'
Checking/Creating Directory /tmp/mcasttest for binary on node 'node01'
Distributing mcast2 binary to node 'node01'
########### Setup for node node02 ##########
Checking node access 'node02'
Checking node login 'node02'
Checking/Creating Directory /tmp/mcasttest for binary on node 'node02'
Distributing mcast2 binary to node 'node02'
########### testing Multicast on all nodes ##########
Test for Multicast address 230.0.1.0

Jan 18 02:27:21 | Multicast Failed for bond0 using address 230.0.1.0:42000
Jan 18 02:27:51 | Multicast Failed for bond1 using address 230.0.1.0:42001

Test for Multicast address 224.0.0.251

Jan 18 02:27:53 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
Jan 18 02:27:54 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
230.0.1.0 failed but 224.0.0.251 succeeded. For this case Oracle Support suggests Patch: 9974223, which enables Grid Infrastructure to use the functional 224.0.0.251 multicast address.
I downloaded patch 9974223, applied it, and installed 11.2.0.2 Grid Infrastructure for RAC.
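If you run the test on many interfaces, the result lines can be summarized per address. This is a small Python sketch that parses the "Multicast Failed/Succeeded for ..." lines in the format shown above; the function name and the regular expression are my own and assume the script's output stays in that format.

```python
import re

# Matches e.g. "Multicast Failed for bond0 using address 230.0.1.0:42000"
RESULT_LINE = re.compile(
    r"Multicast (Failed|Succeeded) for (\S+) using address (\d+\.\d+\.\d+\.\d+):(\d+)"
)


def parse_mcasttest(output: str) -> dict:
    """Map each multicast address to {interface: True if the test succeeded}."""
    results = {}
    for match in RESULT_LINE.finditer(output):
        status, iface, addr, _port = match.groups()
        results.setdefault(addr, {})[iface] = (status == "Succeeded")
    return results
```

An address is usable for the interconnect only if every private interface succeeded for it, e.g. `all(parse_mcasttest(text)["224.0.0.251"].values())`.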
$ opatch lsinventory -detail -oh /u01/app/11.2.0/grid
Patch 9974223 : applied on Tue Jan 18 02:12:27 ICT 2011
Unique Patch ID: 13084111
Created on 27 Oct 2010, 02:44:07 hrs PST8PDT
Bugs fixed:
9974223, 10073372
The installation succeeded!!!
$ /u01/app/11.2.0/grid/bin/crsctl check cluster -all
**************************************************************
node01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
Note:
If the mcasttest.pl test program fails for both the 230.0.1.0 and the 224.0.0.251 addresses, you must work with your System and Network Administrators to enable multicast on one of the addresses for the private interconnect network. If multicast is enabled only for the 224.0.0.251 address, Patch: 9974223 is required. If multicast is enabled for the 230.0.1.0 address, or for both the 230.0.1.0 and 224.0.0.251 addresses, it is not necessary to apply Patch: 9974223.
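The three cases in the note can be written down as a tiny Python decision function. The function name and the returned strings are only illustrative; the logic is exactly the note above.

```python
def recommended_action(ok_230: bool, ok_224: bool) -> str:
    """Turn the two multicast test results into the action described in the note.

    ok_230: multicast test passed for 230.0.1.0
    ok_224: multicast test passed for 224.0.0.251
    """
    if ok_230:
        # 230.0.1.0 works (alone or together with 224.0.0.251): no patch needed.
        return "no patch needed"
    if ok_224:
        # Only 224.0.0.251 works: Grid Infrastructure needs the patch to use it.
        return "apply Patch 9974223"
    # Neither address works: this is a network problem, not a patch problem.
    return "ask network admins to enable multicast on the private interconnect"
```

My environment was the second case: `recommended_action(False, True)`.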

Reference: Oracle Support

^ ^

7 comments:

Mr.Wittawat said...

Surachart, Thank you so much
I installed the patch and ran "crsctl check cluster -all".
Should I run the "root.sh" command on node02 again?

Surachart Opun said...

Yes... you should run "root.sh" on node02 again.

Mr.Wittawat said...

I ran the "root.sh" command on node02 again and it completed.

Configure Oracle Grid Infrastructure for a Cluster ... succeeded

I have a question.
After the Grid Infrastructure configuration completed, I ran mcasttest.pl again and got the same "Multicast Failed" result as before applying Patch: 9974223. Do you think this problem affects Grid Infrastructure?

Thank You so much.

Surachart Opun said...

Multicast:
There are two addresses, 230.0.1.0 and 224.0.0.251.
Before the patch: grid uses 230.0.1.0 (only).
After the patch: grid can also use 224.0.0.251.

The patch does not fix the 230.0.1.0 failure.
It just makes grid able to use 224.0.0.251.

However, please read the "Note" in the post.

Unknown said...

hello surachart,

you are right that the patch will change the multicast address.

but i think it is more important to make the connection over the original multicast address possible, so changes in the network are required. they are simple in most cases. (we had the problem on aix, and there the whole patch installation needs nearly 18gb of additional disk space. on the other hand, the network guys just had to change one parameter in this network segment)

have fun,

sven....

Surachart Opun said...

Thank You @morlings for sharing your experience.

Rauf said...

Hi Surachart,
I am having this issue while installing Grid Infrastructure for a 2-node RAC on 11.2.0.4. The root.sh script completed successfully on the first node but fails on the second node. The following is from the alert log of the second node. I ran the multicast checks and they were successful. Your suggestions, please.

[crsd(15158)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/usolpld004/crsd/crsd.log.
2014-02-28 11:43:43.704:
[crsd(15158)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-01017: invalid username/password; logon denied
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/usolpld004/crsd/crsd.log.