QNET loadbalance over IP (QNX 6.4.0)

Evgeniy52 · June 23, 2009, 1:58pm

Hello all!

Is QNET over IP with loadbalancing supported in QNX 6.4.0?

I have 2 computers, each with 2 Ethernet adapters. IP is established, IP addresses are set manually, netmasks are 255.255.255.0. Crossover cables are used:

hostname = host31 --------------------- hostname = host32

| en0 10.0.3.1 | << ========== >> | 10.0.3.2 en0 |

| en1 10.0.4.1 | << ========== >> | 10.0.4.2 en1 |

The aim is to set up a QNET connection over IP with load balancing, so that if one cable is disconnected (for example, 10.0.3.1(2)), QNET uses the other connection (10.0.4.1(2)).
I start QNET like this:

mount -Tio-pkt -o bind=ip,resolve=dns,qos_verbose=20 /lib/dll/lsm-qnet.so

io-pkt-v4-hc starts automatically at system startup, using devn-el900.so with the shim.

ping 10.0.3.1(2), ping 10.0.4.1(2) work OK.

_CS_RESOLVE is set to lookup_file_bind

/etc/hosts (the same on both nodes):

# Host Database

127.1 localhost.localdomain localhost
#::1 localhost.localdomain localhost

10.0.3.1 host31.lic.ru host31
10.0.3.2 host32.lic.ru host32

10.0.4.1 host31.lic.ru host31
10.0.4.2 host32.lic.ru host32

Question 1. Is it possible to set two IP addresses for the same host in /etc/hosts in QNX 6.4.0? Are there other ways to do it?
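One way to see what the resolver actually hands back for a name that is listed twice in /etc/hosts is to ask it directly. The following is only a diagnostic sketch: it assumes getaddrinfo() is available in the libc in use, and it does not claim that qnet's resolve=dns lookup goes through the same path. The helper name list_ipv4_addrs is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Hypothetical helper: collect up to `max` IPv4 addresses that the
 * resolver returns for `name`, as dotted-quad strings.
 * Returns the number of addresses stored, or -1 if the lookup fails.
 */
int list_ipv4_addrs(const char *name, char addrs[][INET_ADDRSTRLEN], int max)
{
    struct addrinfo hints, *res = NULL, *p;
    int n = 0;

    assert(name != NULL && addrs != NULL && max > 0);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, as in the setup above */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL && n < max; p = p->ai_next) {
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)p->ai_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr,
                      addrs[n], INET_ADDRSTRLEN) != NULL)
            n++;
    }

    freeaddrinfo(res);
    return n;
}
```

Calling list_ipv4_addrs("host32", addrs, 4) on host31 and printing the returned strings would show whether one or both of the 10.0.3.2 and 10.0.4.2 entries come back, and in which order.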

Next, here is the /proc/qnetstats output:

…
**** Qnet compiled on Oct 20 2008 at 22:52:13 running on host31
**** Tx Connections:
**** Rx Connections:
**** L4 Status:
slot 0 mtu 1476 ack 1 crc 1
txd ok 0
txd bad 0
txd descr 0
txd still 0
tx timeouts 0
tx slow 0
rxd ok 0
rxd bad dr 0
rxd bad L4 0
rxd dropped 0
rxd duplic 0
rxd nacks 0
slot 1 is unused
**** Last 8192 bytes of circular qnet_error() log:
02035100: l4_init(): starting
02035100: l4_driver_init_en_iopkt(): starting
02035100: try_ifp(): ignoring interface en0 as per bind option
02035100: try_ifp(): ignoring interface en1 as per bind option
02035100: l4_driver_init_en_iopkt(): not ethernet bind, en io-pkt driver not running
02035100: l4_driver_init_ip_iopkt(): starting
02035100: l4_init(): ending
02035100: qnet_birth(): qnet_init() - complete
02035100: l4_resolve_node_up_ip_ionet(): nd 0 L4 0 has IP address 103000A in ndb
02035100: l4_resolve_node_up_ip_ionet(): saved our IP address of 103000A to L4 struct
02035100): nd_change_notify(): Node Up: nd 0 host31.lic.ru

Question 2. Why «slot 1 is unused» and «saved our IP address of 103000A to L4 struct»? What about the other interface/IP address? How do I tell QNET to use all the available interfaces? Is it possible (over IP)?

Then, if we execute on host31:
ls /net/host32

That works. Both node names appear in /net on both nodes and are ready to use.
BUT!!!
From then on QNET uses ONLY ONE interface, the one with IP address 10.0.4.1(2), NOT the one which was saved in the L4 struct!
Whether the other cable (10.0.3.1(2)) is connected or not makes no difference. But when the 10.0.4.1(2) cable is disconnected, QNET fails and recovers only when that cable is reconnected.

Here is the sloginfo output:

Jan 02 08:45:42 7 15 0 qnet(L4): l4_resolve_node_up_ip_ionet(): nd 1 L4 0 has IP address 203000A in ndb
Jan 02 08:45:42 7 15 0 qnet(QOS): nd_change_notify(): Node Up: nd 1 host32.lic.ru
Jan 02 08:45:42 7 15 0 qnet(kif): server_lookup(): invalid scoid 35, 0
Jan 02 08:45:42 7 15 0 qnet(QOS): tx_conn_create(): nd:1 conn id:1
Jan 02 08:45:43 7 15 0 qnet(QOS): tx_xmit_init_conn_pkt(): to nd 1 on L4 0
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(QOS): tx_xmit_init_conn_pkt(): to nd 1 on L4 0 retry 1
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(L4): qnet_ip_input(): RX: SRC: 204000A DST: 104000A
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(L4): qnet_ip_input(): RX: SRC: 204000A DST: 104000A
Jan 02 08:45:43 7 15 0 qnet(L4): qnet_ip_input(): RX: SRC: 204000A DST: 104000A
Jan 02 08:45:43 7 15 0 qnet(QOS): rx_qos_init_pkt(): nd 1 new rx_conn 1
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(L4): qnet_ip_input(): RX: SRC: 204000A DST: 104000A
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(L4): qnet_ip_input(): RX: SRC: 204000A DST: 104000A
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
Jan 02 08:45:43 7 15 0 qnet(L4): ip_iopkt_tx_pkt(): TX: SRC: 0 DST: 204000A
…

And so on, RX and TX for 10.0.4.1(2).
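The hexadecimal addresses in these log lines can be turned back into dotted form. This is a small sketch under one assumption: the log prints the 32-bit address, held in network byte order, as a native integer on a little-endian (x86) host, so the lowest byte is the first octet. The function name qnet_log_addr_to_octets is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decode an address as printed in the qnet log
 * (for example "103000A") into four octets.
 * Assumption: the lowest byte of the printed value is the first octet.
 * Returns 0 on success, -1 if the text is not a valid 32-bit hex value.
 */
int qnet_log_addr_to_octets(const char *hex, unsigned char octets[4])
{
    char *end = NULL;
    unsigned long v;
    int i;

    assert(hex != NULL && octets != NULL);

    errno = 0;
    v = strtoul(hex, &end, 16);
    if (errno != 0 || end == hex || *end != '\0' || v > 0xFFFFFFFFUL)
        return -1;

    /* Peel the bytes off from least significant to most significant. */
    for (i = 0; i < 4; i++)
        octets[i] = (unsigned char)((v >> (8 * i)) & 0xFF);

    return 0;
}
```

Under that assumption, 103000A decodes to 10.0.3.1 and 203000A to 10.0.3.2 (the addresses stored in the ndb), while 204000A and 104000A decode to 10.0.4.2 and 10.0.4.1 (the addresses seen in the TX/RX lines).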

Question 3. Why does the server connection ID error occur (qnet(kif): server_lookup(): invalid scoid 35, 0)? How can I fix it?

Question 4. Should something like /dev/io-net/qnet_ip and an io-pkt directory appear, as in previous versions? I don’t have them.

Sorry for such a long post. I hope for your help; this is a big problem for me now. If someone has experience with QNET over IP (particularly with PPP) with load balancing on several interfaces in 6.4.0, I would be happy to hear about it :)))

Best regards,
Evgeniy.

P.S. Sorry for my English skill.

rgallen · June 28, 2009, 5:06am

I think qnet supports it, the question is whether tcpip does.

The reason you see “slot 1 is unused” is that “bind=ip” only binds to one interface. Qnet doesn’t know that there are two interfaces; it only knows to send IP packets out “/dev/io-pkt/ip” (it does not know that the IP interface will route packets on net .4 out of a different adapter).

Conceptually, what you would need to make it work is:

mount -Tio-pkt -o bind=ip0,bind=ip1,resolve=dns /lib/dll/lsm-qnet.so

The issue, I think, is how to get an ip0 and an ip1 within the same stack that are bound to the respective en0 and en1 interfaces (there might be an ip_en0 and ip_en1, but I don’t have a system handy to look at this at the moment). If there are, then you could try “bind=ip_en0,bind=ip_en1”.

This might be possible, though I don’t know how (and I don’t have time to investigate the source). I think that if you check out the tcpip module source in io-pkt, you should be able to determine whether this can be done.

Hope this helps.

Rennie

maschoen · June 28, 2009, 5:32pm

I haven’t played with this with QNX 6 yet, but I recall testing with QNX 4 and finding some very unexpected results, a reason to be cautious with QNX 6. I had two QNX 4 nodes connected with two sets of ethernet cards and separate cables. But they were also connected via QNET. What I found was that you had to be careful of whether your TCP/IP connections were really node to node. QNX 4 would allow you to connect to the TCP/IP administrator on the other node via QNET which caused a direct connection.

This was very confusing at first because I was trying to test some TCP/IP load balancing software that was built into the application. So after I made the connection, which I thought was over one wire, I would disconnect it. But the applications continued to communicate because the communication, now routed via QNET, was moving over the other wire.

I don’t know if QNX 6 has this uh problem, feature?

rgallen · June 28, 2009, 5:36pm

If you are referring to “SOCK=/alt” it sure does (and yes it is a feature, although I always love the look on people’s faces when I explain that the socket() API isn’t actually resident on the node from which they are successfully making the call).

maschoen · June 28, 2009, 5:39pm

The reason I called it a problem was that it was actually quite challenging to turn it off in QNX 4 so that we could test our load balancing. But yes I agree, it is quite amazing. A little like NAT without the NAT.

Evgeniy52 · June 29, 2009, 5:53am

rgallen,
thanks for reply!

Maybe you’re right, but I don’t have “/dev/io-pkt/”, nor ip0, ip1, ip_en0 or ip_en1.
I have “/dev/io-net” with en0 and en1 inside.

I think that “bind=en0” is not “QNET over IP”. To my mind it is more like a “QNET over MAC” case.

Do you agree with me?
And could you tell me, please, why I have “/dev/io-net” but not “/dev/io-pkt” with ipX inside when I’m using QNX 6.4.0? Is that right?
Thanks.

rgallen · June 29, 2009, 6:15am

Yeah, that’s what I suspected.

Yes, of course, bind=en0 is a binding of qnet frames directly to ethernet frames.

Whatever the prefix name is isn’t really important. What would be interesting would be another IP interface mountpoint.

Perhaps the thinking at QSSL is that if one is riding over IP, that loadbalancing would be achieved via SCTP?

I don’t have a system to play with at the moment, but you know what is required, so just sniff around the source a bit, and I am sure you’ll find out if there is something like you want.

Why is it that you need ip anyway?

Evgeniy52 · June 29, 2009, 7:02am

I’m sorry, but I don’t understand one thing: must the IP interface mountpoints “ip_en0” and “ip_en1” be present in “/dev/io-net/” or not?

Do you mean QNET source code?

It’s a mandatory requirement, I can’t change it ))

rgallen · June 29, 2009, 4:41pm

Not sure I understand your question.

Yes; and the tcpip source code (io-pkt) as well.

Evgeniy52 · June 30, 2009, 5:14am

Is it necessary that “ip_en0” and “ip_en1” are present in “/dev/io-net/” directory?

“/dev/io-net” contents in QNX 6.4.0:
en0 en1

In QNX 6.3.2:
en0 en1 ip0 ip_en0

These are the results of “ls /dev/io-net” on the same machine with two Ethernet adapters, booted into the two different operating systems.

I’d like to have “ip0” and “ip1” (or “ip_en0” and “ip_en1”). In that case I would start QNET with the “bind=ip_en0,bind=ip_en1” options.

Why don’t I have them?

Evgeniy52 · June 30, 2009, 12:13pm

Here is the code:

“/qnet/kif/kif_server.c”

static struct net_server *
server_lookup(int index)
{
    struct net_server *ptr = NULL;
    int scoid = index & 0xffff;

    if (scoid >= server_total) {
        qnet_error(&kif_module, EOK, "%s(): invalid scoid %d, %d",
            __FUNCTION__, scoid, server_total);
        return NULL;
    }

    ptr = server_array[scoid];
    return ptr;
}

I can see this function’s error message in sloginfo.

In my case (see my first post):
scoid = 35,
server_total = 0.

Why?
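Reading the condition literally, with server_total equal to 0 the test scoid >= server_total is true for every non-negative scoid, so the message only says that the server table was empty at the time of the lookup; the value 35 is not special. A minimal model of that bounds check follows. It is a stand-in, not the qnet code: the contents of struct net_server and the name lookup_model are hypothetical, and the table is passed in as arguments instead of living in globals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real structure, whose fields are not shown. */
struct net_server {
    int id;
};

/*
 * Model of the check in server_lookup(): mask the index down to 16 bits
 * and reject it when it falls outside the table of server_total entries.
 */
struct net_server *lookup_model(struct net_server **server_array,
                                int server_total, int index)
{
    int scoid = index & 0xffff;

    assert(server_total >= 0);

    if (scoid >= server_total)
        return NULL;  /* this is the path that logs "invalid scoid" */

    return server_array[scoid];
}
```

In this model lookup_model(NULL, 0, 35) returns NULL, which corresponds to the logged "invalid scoid 35, 0"; why the table holds no entries at that moment is a separate question that the snippet does not answer.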

rgallen · June 30, 2009, 6:06pm

Because they don’t exist. Sorry, it must be the language barrier. I was asking if you had these mount points (as I didn’t know whether they existed or not).

rgallen · June 30, 2009, 6:13pm

This error is not a concern at the moment. The fact that you want load balancing but don’t have two interfaces is, though. You need to check (in the source) whether:

qnet builds two “virtual” interfaces by probing the ip mountpoint (unlikely)
the tcpip module of io-pkt can be made to export an ip mountpoint specific to an interface (more likely, but still not probable)

If neither of these facilities currently exists, then you need to implement one of them (I’d think getting the tcpip module to export interface-specific ip mountpoints would be easier, but it still could be significant work).

Another possibility would be to write a virtual ethernet tunnel driver (that tunnels ethernet frames over IP), but obviously that would have a performance impact (depending on your app it may not be an issue).

Evgeniy52 · July 1, 2009, 12:04pm

Yes:) I’m sorry for my English.

In 6.3.2 I can do this:

io-net -p tcpip -p tcpip ....

and ip0, ip1, ip2 … will appear in /dev/io-net.

How can I attach ip0 to en0 and ip1 to en1?

rgallen · July 1, 2009, 2:53pm

Cool!

So what does “ifconfig” show, when you start io-pkt like that?

(why do you show “io-net” in your example above?)

Evgeniy52 · July 2, 2009, 11:54am

I execute, for example

slay io-net

io-net -d el900 -p tcpip -d el900 -p tcpip

“ifconfig” shows lo0 only :(

ls /dev/io-net

en0 en1 ip0 ip1 ip_en

I show “io-net” because this capability exists only in 6.3.2.
In 6.4.0 with io-pkt I can’t make ip0 and ip1 appear.

(QNX is for patient people only ))) )