Patrick McHardy: memory mapped netlink and nfnetlink_queue

Patrick McHardy presents his work on modifying netlink and nfnetlink_queue to use memory mapping.

One of the problems with netlink is that it uses regular socket I/O, so data needs to be copied to the socket buffer data areas before being sent. This hurts performance.

The basic concept of memory mapped netlink is to use a shared memory area that is accessible to both the kernel and userspace. A ring buffer is set up and, instead of copying the data, we just move a pointer to the correct memory area and userspace reads the data directly from it.
The kernel and userspace need to be synchronized so that neither side reads an area that does not yet hold valid data. This is done by tracking the ownership of each area.
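
To make the ownership idea concrete, here is a minimal single-process sketch of a ring whose frames carry an owner flag. It is only an illustration of the concept described above, not Patrick's implementation: the names (`struct frame`, `ring_produce`, `ring_peek`, `ring_release`), the sizes and the layout are all hypothetical, and a real shared mapping would also need atomic accesses and memory barriers around the owner flag.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_COUNT   4   /* hypothetical number of frames in the ring */
#define FRAME_PAYLOAD 64  /* hypothetical payload capacity of one frame */

/* Who may currently touch a frame (illustrative names, not the kernel's). */
enum frame_owner { OWNER_KERNEL = 0, OWNER_USER = 1 };

struct frame {
    enum frame_owner owner;  /* real shared memory needs atomics/barriers here */
    uint32_t len;
    unsigned char data[FRAME_PAYLOAD];
};

struct ring {
    struct frame frames[FRAME_COUNT];
    unsigned int head;  /* next frame the producer ("kernel") fills */
    unsigned int tail;  /* next frame the consumer ("userspace") reads */
};

/* Zeroing the ring makes every frame kernel-owned (OWNER_KERNEL == 0). */
void ring_init(struct ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Producer side: fill the next frame if the producer owns it, then hand it
 * over. The memcpy stands in for the kernel building the message in place.
 * Returns 0 on success, -1 if the ring is full or the message is too big. */
int ring_produce(struct ring *r, const void *msg, uint32_t len)
{
    struct frame *f = &r->frames[r->head];

    if (len > FRAME_PAYLOAD || f->owner != OWNER_KERNEL)
        return -1;
    memcpy(f->data, msg, len);
    f->len = len;
    f->owner = OWNER_USER;  /* ownership passes to the reader */
    r->head = (r->head + 1) % FRAME_COUNT;
    return 0;
}

/* Consumer side: return a pointer into the ring (no copy), or NULL if the
 * next frame does not belong to the consumer yet. */
const unsigned char *ring_peek(struct ring *r, uint32_t *len)
{
    struct frame *f = &r->frames[r->tail];

    if (f->owner != OWNER_USER)
        return NULL;
    *len = f->len;
    return f->data;
}

/* Consumer side: give the frame back to the producer once it has been read. */
void ring_release(struct ring *r)
{
    struct frame *f = &r->frames[r->tail];

    if (f->owner != OWNER_USER)
        return;
    f->owner = OWNER_KERNEL;
    r->tail = (r->tail + 1) % FRAME_COUNT;
}
```

The reader never touches a frame it does not own, which is what prevents it from reading an area that does not hold valid data yet; once it calls `ring_release()`, the producer may reuse the frame.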

There is an RX ring and a TX ring, so it is also possible to send packets (or issue verdicts) via the TX ring. The TX side brings few advantages, apart from the possibility of batching verdicts by issuing several of them in one send message.
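
The batching idea can be sketched as follows: userspace fills several TX frames with verdicts and then triggers a single send, during which every ready frame is consumed. This is a simulation under stated assumptions, not the real interface: `struct verdict`, `tx_queue_verdict` and `tx_flush` are hypothetical names, and `tx_flush` merely stands in for the one send call that would wake up the kernel.

```c
#include <stdint.h>
#include <string.h>

#define TX_FRAMES 8  /* hypothetical TX ring size */

enum tx_status { TX_FREE = 0, TX_READY = 1 };

/* Hypothetical verdict layout: which packet, and what to do with it. */
struct verdict {
    uint32_t packet_id;
    uint32_t action;
};

#define VERDICT_DROP   0u
#define VERDICT_ACCEPT 1u

struct tx_ring {
    enum tx_status status[TX_FRAMES];
    struct verdict slot[TX_FRAMES];
    unsigned int next;        /* next slot userspace fills */
    unsigned int flush_from;  /* first slot not yet consumed */
    unsigned int send_calls;  /* number of simulated send calls */
};

void tx_init(struct tx_ring *t)
{
    memset(t, 0, sizeof(*t));
}

/* Userspace side: write a verdict into the next free TX frame.
 * No send happens here. Returns 0 on success, -1 if the ring is full. */
int tx_queue_verdict(struct tx_ring *t, uint32_t packet_id, uint32_t action)
{
    if (t->status[t->next] != TX_FREE)
        return -1;
    t->slot[t->next].packet_id = packet_id;
    t->slot[t->next].action = action;
    t->status[t->next] = TX_READY;
    t->next = (t->next + 1) % TX_FRAMES;
    return 0;
}

/* Stand-in for a single send call: the "kernel" consumes every ready
 * frame in one go. Returns the number of verdicts processed. */
unsigned int tx_flush(struct tx_ring *t)
{
    unsigned int n = 0;

    t->send_calls++;
    while (n < TX_FRAMES && t->status[t->flush_from] == TX_READY) {
        /* A real kernel would apply slot[flush_from] to the packet here. */
        t->status[t->flush_from] = TX_FREE;
        t->flush_from = (t->flush_from + 1) % TX_FRAMES;
        n++;
    }
    return n;
}
```

Queuing three verdicts and calling `tx_flush()` once processes all three with a single simulated send, instead of one send per verdict.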

Backward compatibility with subsystems that do not support this new system is provided by falling back to standard copying and regular message sending and receiving.

Message ordering was a difficult problem to solve: the order in which messages are read from the ring depends on when their space was allocated in the ring, not on when the packets arrived. It is thus possible to have out-of-order packets in the ring. To fix this, userspace can specify that it cares about ordering, and the kernel will then perform packet reception and copy atomically.
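
The ordering problem can be reproduced with a small simulation. This is my reading of the issue, not code from the talk: a slot's position is fixed when it is allocated, but the message is stamped with its arrival number only when it is delivered, so two messages can end up in the ring in the opposite order. All names (`struct oring`, `slot_alloc`, `slot_deliver`, `deliver_ordered`) are hypothetical, and the sketch is single-threaded, whereas real code would need a lock to make the combined step atomic.

```c
#include <stdint.h>

#define SLOTS 4  /* hypothetical ring size */

struct oring {
    uint32_t seq[SLOTS];  /* arrival number stored in each slot, 0 = empty */
    unsigned int alloc;   /* next slot to hand out */
    uint32_t arrivals;    /* how many messages have arrived so far */
};

/* Step 1: reserve a slot. Its position in the ring is fixed right now. */
unsigned int slot_alloc(struct oring *r)
{
    unsigned int s = r->alloc;

    r->alloc = (r->alloc + 1) % SLOTS;
    return s;
}

/* Step 2: the message actually arrives and is stamped with its arrival
 * number. This may happen later than, and in a different order from,
 * the allocation. */
void slot_deliver(struct oring *r, unsigned int slot)
{
    r->seq[slot] = ++r->arrivals;
}

/* Ordered mode: allocation and delivery happen as one step, so the ring
 * position always matches the arrival order. */
unsigned int deliver_ordered(struct oring *r)
{
    unsigned int s = slot_alloc(r);

    slot_deliver(r, s);
    return s;
}

/* Returns 1 if the first n slots hold strictly increasing arrival numbers. */
int ring_in_order(const struct oring *r, unsigned int n)
{
    for (unsigned int i = 1; i < n; i++)
        if (r->seq[i] <= r->seq[i - 1])
            return 0;
    return 1;
}
```

If slots 0 and 1 are allocated first and slot 1 is delivered before slot 0, a reader walking the ring sees arrival 2 before arrival 1. With `deliver_ordered()` this cannot happen, at the price of serialising reception and copy.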

Multicast is currently not supported. Synchronising data across clients is a big issue and most of the solutions would perform badly.

Userspace support has been done in libmnl. As usual with Patrick, the API looks clean and adding support for it in

Testing has only been done on a loopback interface because Patrick did not have access to a 10Gbit test bed. This is a poor test case because copying on loopback is less expensive, so performance measurements on real NICs should give better results.
Anyway, the performance impact is significant: a bandwidth increase of between 200% and 300%, depending on the packet size.

There are currently no known bugs and the submission to netdev should occur soon.

4 thoughts on “Patrick McHardy: memory mapped netlink and nfnetlink_queue”

  1. I noticed that libmnl has a shared memory implementation, but how is it possible that this is LGPL? From my understanding, shared memory violates the GPL, or in the case of libmnl's shared memory it would be infected with the ol’ regular GPL 2.0?

    Thoughts on said matter?

  2. I’ve certainly been watching this issue intently, because according to the GPL FAQ this would be an infection. That would be a grey area, but because it's not a defined interface that can have logical separation, nor an API (Google vs Oracle), what is the effect of it here? There is a reason most people use netlink sockets to get data to userspace from the kernel (besides it being safer and easier, of course): it could be GPL uncertainty, and I exercise caution due to this.

    I’m sure Harald Welte would have an interesting position on this…
