Showing posts with label infiniband. Show all posts

Monday, May 30, 2011

InfiniBand switch - relocated Subnet Manager Master to another switch

This is a follow-up to my earlier post about the exachk warning "sm_priority is not set to recommended value of 5 on infiniband switch". Thank you, everyone, for the comments, documents, and ideas.

I decided to ignore this warning. Why?
Exadata Document:
Exadata Database Machine Full Racks and Oracle Exadata Database Machine X2-2 Half Racks have three Sun Datacenter InfiniBand Switch 36 switches. The switch at rack unit 1 (U1) is referred to as the spine switch. The switches at rack unit 20 (U20) and rack unit 24 (U24) in Oracle Exadata Database Machine X2-2 racks, or unit 21(U21) and rack unit 23 (U23) in Oracle Exadata Database Machine X2-8 Full Racks are referred to as leaf switches. The spine switch is the Subnet Manager Master for the InfiniBand subnet. It has priority 8.
Sun Datacenter InfiniBand Switch 36 Topic Set
By setting a Subnet Manager to a higher priority than another Subnet Manager, it becomes the primary (or Master) Subnet Manager.
InfiniBand switch 01 (exasw-ib1) is the spine switch, and the spine switch is the Subnet Manager Master, so it should have the higher priority. This matches the Exadata document: "The spine switch is the Subnet Manager Master for the InfiniBand subnet. It has priority 8."
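The priority rule quoted above can be sketched as follows. This is an illustration only, assuming the usual InfiniBand convention that the highest priority wins and a tie goes to the lowest GUID. The function name elect_master and the sample names, priorities, and GUIDs are hypothetical, and a real handover also depends on other settings (for example controlled_handover), which this sketch ignores.

```shell
#!/bin/sh
# Sketch: choose a master from "name priority guid" records on stdin.
# Assumption: highest priority wins; equal priorities fall back to the lowest GUID.
elect_master() {
    # -k2,2nr sorts priority numerically, highest first.
    # -k3,3 sorts the GUID ascending (same-width lowercase hex sorts lexically).
    LC_ALL=C sort -t ' ' -k2,2nr -k3,3 | head -n 1 | awk '{ print $1 }'
}

# Hypothetical sample data, not taken from a real switch.
printf '%s\n' \
    'spine-sw 8 0x0000000000000002' \
    'leaf-sw1 5 0x0000000000000001' \
    'leaf-sw2 5 0x0000000000000003' | elect_master
```

With the sample data the spine switch is chosen because its priority (8) is higher than the leaf switches' priority (5).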

However, I found something wrong, perhaps because I had tested many things on this Exadata: the Subnet Manager Master was not running on the spine switch.
# getmaster
20110530 11:06:00 OpenSM Master on Switch : 0x0021286ccca9a0a0 ports 36 Sun DCS 36 QDR switch exasw-ib2 enhanced port 0 lid 4 lmc 0
So I relocated the Subnet Manager Master to another switch: I logged in to the leaf switch (exasw-ib2), then disabled and re-enabled the SM.
# ssh exasw-ib2

# disablesm
Stopping IB Subnet Manager.. [ OK ]

# enablesm
Starting IB Subnet Manager. [ OK ]

# getmaster
20110530 11:08:31 OpenSM Master on Switch : 0x0021286cd635a0a0 ports 36 Sun DCS 36 QDR switch exasw-ib1 enhanced port 0 lid 1 lmc 0.
The master has moved to InfiniBand switch 01 (exasw-ib1).
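To check the master from a script instead of reading the line by eye, the switch name can be pulled out of the getmaster output. A minimal sketch, assuming the output keeps the format shown above, where the host name follows the lowercase word "switch". The function name master_switch_name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the switch name from one line of getmaster output on stdin.
# Assumption: the name is the word right after the lowercase word "switch".
master_switch_name() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "switch") { print $(i + 1); exit }
    }'
}

# Sample line copied from the getmaster output in this post.
line='20110530 11:08:31 OpenSM Master on Switch : 0x0021286cd635a0a0 ports 36 Sun DCS 36 QDR switch exasw-ib1 enhanced port 0 lid 1 lmc 0'

master=$(printf '%s\n' "$line" | master_switch_name)
if [ "$master" = "exasw-ib1" ]; then
    echo "master is on the spine switch ($master)"
else
    echo "master is on $master, not on the spine switch"
fi
```

On the switch itself the sample line would be replaced by the real getmaster output.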

Thursday, May 26, 2011

exachk - WARNING!!! sm_priority is not set to recommended value of 5 on infiniband switch

I ran exachk and it reported: WARNING => sm_priority is not set to recommended value of 5 on infiniband switch exasw-ib1
So I checked the switches with the CheckSWProfile.sh script.
# ./CheckSWProfile.sh -I exasw-ib1,exasw-ib2,exasw-ib3
Checking if switch exasw-ib1 is pingable...
Checking if switch exasw-ib2 is pingable...
Checking if switch exasw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: y
[ERROR] OpenSM configurations mismatch for switch exasw-ib1
Found: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8
Required: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
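The Found and Required lines differ in a single setting. A minimal sketch of how two such lines could be compared key by key follows. The function name diff_settings is hypothetical and is not part of CheckSWProfile.sh.

```shell
#!/bin/sh
# Sketch: report settings whose value differs between two key=value lists.
# $1 = settings found on the switch, $2 = required settings.
diff_settings() {
    found=$1 required=$2
    for pair in $found; do
        case " $required " in
            *" $pair "*) ;;   # identical key=value is required: nothing to report
            *)
                key=${pair%%=*}
                # Look up the required value for the same key, if any.
                want=$(printf '%s\n' $required | awk -F= -v k="$key" '$1 == k { print $2 }')
                printf '%s: found %s, required %s\n' "$key" "${pair#*=}" "${want:-unset}"
                ;;
        esac
    done
}

# Values copied from the CheckSWProfile.sh output above.
diff_settings \
    'controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8' \
    'controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5'
```

For the values above, the only line reported is the one for sm_priority (found 8, required 5).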
Then I changed sm_priority from 8 to 5 in /etc/opensm/opensm.conf on the InfiniBand switch (exasw-ib1).
# vi /etc/opensm/opensm.conf
#Begin /etc/opensm/opensm.conf
.
.
# SM priority used for deciding who is the master
# Range goes from 0 (lowest priority) to 15 (highest).
sm_priority 5
.
.
.
# End /etc/opensm/opensm.conf
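The same edit could be made without opening vi. This is only a sketch: it assumes the file contains a line that starts with "sm_priority" followed by the value, as in the excerpt above. The function name set_sm_priority is hypothetical, and the demo runs against a scratch file, not the real /etc/opensm/opensm.conf.

```shell
#!/bin/sh
# Sketch: set sm_priority in an opensm.conf-style file.
# $1 = path to the file, $2 = new priority (0 to 15).
set_sm_priority() {
    conf=$1 prio=$2
    case $prio in
        ''|*[!0-9]*) echo "priority must be a number" >&2; return 1 ;;
    esac
    [ "$prio" -le 15 ] || { echo "priority must be between 0 and 15" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Rewrite only the line that starts with "sm_priority"; comment lines are untouched.
    sed "s/^sm_priority[[:space:]].*/sm_priority $prio/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"    # cat keeps the original file's owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch copy with the two relevant lines from the excerpt.
demo=$(mktemp)
printf '%s\n' \
    '# SM priority used for deciding who is the master' \
    'sm_priority 8' > "$demo"
set_sm_priority "$demo" 5
grep '^sm_priority' "$demo"
rm -f "$demo"
```

As with the manual edit, the Subnet Manager still has to be restarted before the new value takes effect.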

After the change, restart the Subnet Manager on the InfiniBand switch.
# /etc/init.d/opensmd

Usage: opensmd {start|stop|restart|status}

# /etc/init.d/opensmd restart
Stopping IB Subnet Manager.. [ OK ]
Starting IB Subnet Manager. [ OK ]
Back on the database server, I ran the CheckSWProfile.sh script again.
# ./CheckSWProfile.sh -I exasw-ib1,exasw-ib2,exasw-ib3
Checking if switch exasw-ib1 is pingable...
Checking if switch exasw-ib2 is pingable...
Checking if switch exasw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: y
[INFO] SUCCESS All switches have correct software and firmware version:
SWVer: 1.1.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5