Saturday, February 26, 2011

Reliable Datagram Sockets (RDS)

Reliable Datagram Sockets (RDS) is a reliable-socket off-load driver and inter-processor communication (IPC) protocol with low overhead, low latency, and high bandwidth. RDS is intended to improve application performance and cluster scalability.
***Contributor: www.openfabrics.org (Particularly, Oracle)***
References:
http://www.openfabrics.org
http://oss.oracle.com/projects/rds/

The RDS protocol provides reliable datagram services by multiplexing UDP packets over an InfiniBand connection. For Oracle RAC, this gives a high-performance cluster interconnect.

How do you know whether RDS is used on RAC? Run oradebug ipc in SQL*Plus and read the trace file it writes:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
On Oracle RAC without RDS, the trace file shows UDP:
SSKGXPT 0x6700190 flags SSKGXPT_READPENDING socket no 7 IP 192.168.99.1 UDP 48798
context timestamp 0
On Oracle RAC with RDS, the trace file shows RDS:
SKGXP:[2b4d5065d400.97]{ctx}: SSKGXPT 0x2b4d506b1d98 flags 0x0 sockno 10 IP 192.168.99.1 RDS 26387 lerr 0
SKGXP:[47611061326848.97]{ctx}: SKGXPGPID Internet address 192.168.99.1 RDS port number 26387
Alternatively, check the alert log file:
Cluster communication is configured to use the following interface(s) for this instance
192.168.99.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
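
The two checks above can be scripted. This is only a minimal sketch: it assumes the trace and alert log lines look exactly like the samples in this post, and the function names (find_transports, alert_log_uses_rds) are made up for illustration, not part of any Oracle tool.

```python
import re

# Matches the "IP <address> <PROTO> <port>" fragment seen in the sample trace lines.
# Assumption: other Oracle versions may format these lines differently.
TRANSPORT_RE = re.compile(r"\bIP\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(UDP|RDS)\s+(\d+)")


def find_transports(trace_text):
    """Return (ip, protocol, port) tuples for each SSKGXPT line in the trace text."""
    results = []
    for line in trace_text.splitlines():
        if "SSKGXPT" not in line:
            continue
        match = TRANSPORT_RE.search(line)
        if match:
            results.append((match.group(1), match.group(2), int(match.group(3))))
    return results


def alert_log_uses_rds(alert_text):
    """Return True if an 'IPC version' line in the alert log mentions RDS."""
    for line in alert_text.splitlines():
        if "cluster interconnect IPC version" in line and "RDS" in line:
            return True
    return False
```

Feed find_transports the contents of the trace file written by oradebug ipc. If the protocol field comes back as UDP, the instance is not using RDS.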
If the database cluster (RAC) is not using RDS, rebuild the RAC IPC library for RDS. Run this as the "oracle" user, and first stop every instance that uses this ORACLE_HOME:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ipc_rds ioracle
To revert RAC to UDP instead of RDS:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ipc_g ioracle
Note: some command-line tools on Linux (from the rds-tools git repo):
rds-info - display information from the RDS kernel module
rds-ping - test reachability of remote node over RDS
rds-stress - send messages between processes over RDS sockets
Example:
# rds-info -n

RDS Connections:
LocalAddr     RemoteAddr       NextTX    NextRX  Flg
192.168.99.1  192.168.99.1     117066         6  --C
192.168.99.1  192.168.99.3    1167415   1060920  --C
192.168.99.1  192.168.99.2    1104151   1018883  --C
192.168.99.1  192.168.99.5     106692     62660  --C
192.168.99.1  192.168.99.4    1129073    990390  --C
192.168.99.1  192.168.99.7     341345    413031  --C
192.168.99.1  192.168.99.6     530757    603414  --C
192.168.99.1  192.168.99.9     108975    120114  --C
192.168.99.1  192.168.99.8     491816    560933  --C
192.168.99.1  192.168.99.11    270190    283060  --C
192.168.99.1  192.168.99.10    578831    588658  --C
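
If you want to check this table from a script, something like the following could work. It is a rough sketch that assumes the five-column layout shown above. It also assumes that a "C" in the Flg column marks an established connection, which is how I read the rds-info man page. The function names are hypothetical.

```python
def parse_rds_connections(output):
    """Parse the 'RDS Connections:' table from rds-info -n into a list of dicts."""
    rows = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if line.strip() == "RDS Connections:":
            in_table = True
            continue
        if not in_table or not fields:
            continue  # skip everything before the table, and blank lines
        if fields[0] == "LocalAddr":
            continue  # header row
        if len(fields) != 5:
            break  # a line with a different shape ends the table
        local, remote, next_tx, next_rx, flags = fields
        rows.append({
            "local": local,
            "remote": remote,
            "next_tx": int(next_tx),
            "next_rx": int(next_rx),
            "flags": flags,
        })
    return rows


def not_connected(rows):
    """Return remote addresses whose flags lack 'C' (assumed to mean connected)."""
    return [row["remote"] for row in rows if "C" not in row["flags"]]
```

For the output above, not_connected would return an empty list, since every row shows --C.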

# rds-ping 192.168.99.2
1: 29 usec
2: 28 usec
3: 31 usec
4: 33 usec
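
To turn the rds-ping lines into numbers, a small parser is enough. This sketch assumes every sample line has the "N: T usec" shape shown above. The name ping_latencies is just for illustration.

```python
import re

# One rds-ping sample line, e.g. "1: 29 usec" (format assumed from the output above).
PING_RE = re.compile(r"^\s*(\d+):\s+(\d+)\s+usec\s*$")


def ping_latencies(output):
    """Return the round-trip times in microseconds, in the order they were printed."""
    latencies = []
    for line in output.splitlines():
        match = PING_RE.match(line)
        if match:
            latencies.append(int(match.group(2)))
    return latencies


def average(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None
```

For the four samples above (29, 28, 31 and 33 usec) the average is 30.25 usec.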

On server01:
# rds-stress
waiting for incoming connection on 0.0.0.0:4000

Then on server02:
# rds-stress -s 192.168.99.1 -p 4000 -t 1 -d 1 -D 1024000
connecting to 192.168.99.1:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks    tx/s    rx/s   tx+rx K/s      mbi K/s      mbo K/s   tx us/c   rtt us   cpu %
   1    1252    1252     2647.90   1250670.54   1250670.54     28.47   764.87   -1.00
   1    1251    1251     2643.31   1247501.25   1249497.26     29.39   764.85   -1.00
   1    1247    1247     2637.50   1246756.98   1244758.98     29.56   767.66   -1.00
   1    1248    1248     2639.52   1246709.66   1246709.66     30.07   766.60   -1.00
   1    1247    1248     2638.61   1245780.38   1246779.40     29.93   767.10   -1.00
^C

Back on server01, the output now shows:
# rds-stress
waiting for incoming connection on 0.0.0.0:4000
accepted connection from 192.168.99.2:46507 on 192.168.99.1:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks    tx/s    rx/s   tx+rx K/s      mbi K/s      mbo K/s   tx us/c   rtt us   cpu %
   1    1252    1252     2648.12   1250774.24   1250774.24     31.86   761.95   -1.00
   1    1250    1250     2643.83   1248746.26   1248746.26     31.46   763.10   -1.00
   1    1245    1245     2635.89   1244996.27   1244996.27     32.84   766.34   -1.00
   1    1247    1247     2640.09   1246980.05   1246980.05     33.06   765.58   -1.00
   1    1247    1246     2639.09   1247006.24   1246006.23     32.69   765.56   -1.00
---------------------------------------------
   1    1247    1247     2641.41   1247701.42   1247501.50     32.39   764.81   -1.00  (average)
On Exadata, Oracle uses RDS (Reliable Datagram Sockets) V3. Oracle developed it, and it is also known by the name Zero Data loss UDP (ZDP) protocol.

1 comment:

Miline said...

I don't see "rw+rr K/s" column in your RDMA rds-stress output.

as per "man rds-stress"
rw+rr K/s: The total number of bytes that are being transferred via RDMA READs and WRITEs for all children.

What does that mean?