Mar 11 2013

Open vSwitch is a multi-layer switch. It is designed to enable network automation through programmatic extension, while still supporting standard management interfaces and protocols.

OpenFlow is a management protocol supported by Open vSwitch. OpenFlow has basic support for MPLS: it features a minimal set of operations, just enough to configure MPLS correctly. OpenFlow MPLS support is partially implemented in Open vSwitch, but there are some difficulties.

Some of the operations involve updating L3+ parameters such as the TTL. These must be updated in the same manner in the MPLS header and in the header of the encapsulated packet. This is quite complicated, as it requires decoding the packet below MPLS. But the MPLS header does not include the type of the encapsulated payload, so it is almost impossible to access the packet structure reliably.

A possible solution is to reinject the packet after each modification, so that it is modified layer by layer, one step at a time. This is currently a work in progress.
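To make the problem concrete, here is a minimal Python sketch of an MPLS label stack entry decoder. The field layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) is the standard one; the function names and the first-nibble guess are illustrative assumptions, not Open vSwitch code.

```python
import struct


def parse_mpls_entry(data: bytes) -> dict:
    """Decode one 4-byte MPLS label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,          # 20 bits
        "tc": (word >> 9) & 0x7,      # 3 bits, traffic class
        "bottom": (word >> 8) & 0x1,  # 1 bit, bottom-of-stack flag
        "ttl": word & 0xFF,           # 8 bits
    }
    # Note: there is no field describing the payload type.


def guess_payload(first_byte: int) -> str:
    """Heuristic only: inspect the first nibble after the label stack."""
    version = first_byte >> 4
    if version == 4:
        return "ipv4"
    if version == 6:
        return "ipv6"
    return "unknown"


# Hypothetical packet: label=100, tc=0, bottom-of-stack=1, ttl=64,
# followed by 0x45, the usual first byte of an IPv4 header.
entry = struct.pack("!I", (100 << 12) | (0 << 9) | (1 << 8) | 64)
packet = entry + bytes([0x45])
fields = parse_mpls_entry(packet)
payload_kind = guess_payload(packet[4])
```

The guess works for plain IPv4 or IPv6 payloads but is only a heuristic: nothing in the label stack entry itself says what follows.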

Mar 11 2013

Suricata and Netfilter could be better friends, as they do some common work such as decoding packets and maintaining a flow table.

In IPS mode, Suricata receives raw packets from libnetfilter_queue. It has to parse these packets, but this work has already been done by the kernel. So it should be possible to avoid duplicating it.

In fact, Netfilter's parsing work is limited, as it relies on the IP header structures. Patrick McHardy proposed that Netfilter provide the offsets, but this is not the most costly part.

The flow discussion was more interesting, because conntrack does work very similar to what Suricata does. Using the CT extension of libnetfilter_queue, Suricata will be able to access all the CT information. At first glance, it seems to contain almost all the information needed. So it should be possible to remove the flow engine from Suricata. The garbage collection would no longer be necessary, as Suricata would be notified via the NFCT destroy event.

Jozsef Kadlecsik proposed using Tproxy to redirect flows and provide a “socket” stream to Suricata instead of individual packets. This would change Suricata a lot but could provide an interesting alternative mode.
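As a rough illustration of the idea, here is a toy Python flow table driven by conntrack-like events. The event names `"NEW"` and `"DESTROY"`, the class and the flow key are hypothetical and do not reflect Suricata's or libnetfilter_conntrack's actual API; the point is only that entries are removed on a destroy event rather than by a timeout-based garbage collector.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy flow table whose lifecycle is driven by external events."""
    flows: dict = field(default_factory=dict)

    def on_event(self, kind: str, key: tuple) -> None:
        # "NEW" and "DESTROY" are illustrative event names.
        if kind == "NEW":
            self.flows[key] = {"packets": 0}
        elif kind == "DESTROY":
            # No timer or garbage collector: the event removes the entry.
            self.flows.pop(key, None)

    def on_packet(self, key: tuple) -> None:
        flow = self.flows.setdefault(key, {"packets": 0})
        flow["packets"] += 1


table = FlowTable()
key = ("192.0.2.1", 40000, "198.51.100.2", 80, "tcp")
table.on_event("NEW", key)
table.on_packet(key)
table.on_packet(key)
seen = table.flows[key]["packets"]
table.on_event("DESTROY", key)
remaining = len(table.flows)
```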

Mar 11 2013

Pablo Neira Ayuso gave an overview of the Netfilter changes since the last workshop.

On the user side, the first main change published after the last workshop is libnetfilter_cttimeout. It allows you to define different timeout policies and to apply them to connections by using the CT target.

Another important new “feature” is the possibility to disable automatic helper assignment. More information in Secure use of iptables and connection tracking helpers.

The ‘fail-open’ option of Netfilter allows you to change the default behavior of the kernel when packets are queued. When the maximum length of a queue is reached, the kernel by default drops all incoming packets. For some software, this is not the desired behavior. The ‘fail-open’ socket option lets you change this and have the kernel accept the packets if they cannot be queued.

Another interesting feature is the userspace cthelper. It is now possible to implement a connection tracking helper in userspace.

nfnetlink_queue now has CT support. This means it is possible to queue a packet with the conntrack information attached to it.
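The difference between the two behaviors can be modelled in a few lines of Python. This is a toy simulation of a bounded queue, not the kernel or libnetfilter_queue API; the function names and verdict strings are made up for illustration.

```python
from collections import deque


def enqueue(queue: deque, packet, max_len: int, fail_open: bool) -> str:
    """Return what happens to a packet sent to a bounded queue."""
    if len(queue) < max_len:
        queue.append(packet)
        return "queued"
    # Queue is full: the default is to drop, fail-open accepts instead.
    return "accepted" if fail_open else "dropped"


def run(n_packets: int, max_len: int, fail_open: bool) -> list:
    """Send n_packets to a queue that userspace never drains."""
    queue = deque()
    return [enqueue(queue, i, max_len, fail_open) for i in range(n_packets)]


default_result = run(3, max_len=2, fail_open=False)
fail_open_result = run(3, max_len=2, fail_open=True)
```

With a queue of length 2 that is never drained, the third packet is dropped in the default case and accepted when fail-open is set.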

IPv6 NAT has been added by Patrick McHardy.

In October 2012, the old QUEUE target was removed. A switch to NFQUEUE is now required.

connlabel has been added to the kernel to provide more categories than was possible with connmark.

A new bpf match has been committed, but the iptables part is still missing.

Logging has been improved in helpers. Helpers can reject packets, but until now the user had no access to the packet data and could not understand the reason for the drop. The nf_log_packet infrastructure can now be used to log the reason for the drop together with the packet, which should help users understand why packets are dropped.

Mar 11 2013

Martin works for a local ISP and is facing some DDoS attacks. SYN cookies were deployed, but performance was too low: below 300 kpps, which is not what was expected. In fact, SYN handling is on a slow path, with a single spin lock protecting the SYN backlog queue. So the system behaves like a single-core system when it comes to SYN attacks.

Jesper Dangaard Brouer has proposed a patch to move the SYN cookie handling out of the lock, but it has some downsides and could not be accepted. In particular, the SYN cookie system needs to check every type of packet to see whether it belongs to a previous SYN cookie response, and thus a central point is needed.

Alternative protection methods include filtering in Netfilter. Regarding performance, connection tracking is very costly, as it cuts the packet rate by more than half: with conntrack activated, the rate was 757 kpps, and without conntrack it was 1738 kpps.

A Netfilter module implementing offloading of SYN cookies is proposed. The idea is to fake the SYN ACK part of the TCP handshake in the module, which acts as a proxy for the initiation of the connection. This would allow the SYN cookie algorithm to be handled via a small dedicated table and would provide better performance.
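The two rates quoted above can be turned into a per-packet cost with a bit of arithmetic. The variable names are only for illustration, and the result assumes the whole difference comes from connection tracking:

```python
rate_with_ct = 757_000       # packets per second, conntrack active
rate_without_ct = 1_738_000  # packets per second, conntrack disabled

# How many times faster the system is without conntrack.
slowdown = rate_without_ct / rate_with_ct

# Average time budget per packet in nanoseconds for each case.
ns_with = 1e9 / rate_with_ct
ns_without = 1e9 / rate_without_ct

# Extra time spent per packet when conntrack is active.
extra_ns = ns_with - ns_without

print(f"ratio: {slowdown:.2f}x, extra cost: {extra_ns:.0f} ns per packet")
```

This gives a ratio of about 2.3 and an extra cost of roughly 746 ns per packet.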
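To illustrate why a SYN cookie needs no per-connection state, here is a simplified Python sketch. It is not the Linux algorithm and not the proposed module: the secret, the HMAC-based construction and the function names are assumptions chosen for readability. The cookie is derived from the connection 4-tuple and a coarse time counter, so it can be recomputed when the final ACK arrives.

```python
import hashlib
import hmac
import struct

SECRET = b"hypothetical-secret-key"  # illustrative only


def make_cookie(src: str, sport: int, dst: str, dport: int, counter: int) -> int:
    """Derive a 32-bit value usable as the initial sequence number."""
    msg = f"{src}:{sport}>{dst}:{dport}|{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_cookie(cookie: int, src: str, sport: int, dst: str, dport: int,
                 counter: int) -> bool:
    """Validate an ACK without stored state: recompute and compare.

    The current and the previous time slot are both accepted.
    """
    return any(
        make_cookie(src, sport, dst, dport, c) == cookie
        for c in (counter, counter - 1)
    )


cookie = make_cookie("192.0.2.1", 12345, "198.51.100.2", 80, counter=10)
valid_now = check_cookie(cookie, "192.0.2.1", 12345, "198.51.100.2", 80, 10)
valid_next_slot = check_cookie(cookie, "192.0.2.1", 12345, "198.51.100.2", 80, 11)
```

A proxy built on this principle only has to create real connection state once the cookie in the returning ACK has been verified.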

Mar 11 2013

The routing cache maintained a list of routing decisions. It was a hash table that was highly dynamic and changed with the traffic. One of the major problems was the garbage collector. Another severe issue was the possibility of DoS using the increase

The routing cache was removed in Linux 3.6 after a two-year effort by David and the other Linux kernel developers. The global cache is gone, and some of the stored information has been moved to more specific resources such as sockets.

There were a lot of side effects following this big transformation. On the user side, there is no more “neighbour cache overflow”, thanks to the synchronized sizes of the routing and neighbour tables.

Metrics were stored in the routing cache entry, which has disappeared. So it was necessary to introduce a separate TCP metrics cache. A netlink interface is available to add, update and delete entries in the cache.

Another side effect of these modifications is that, for TCP sockets, xt_owner could now be used on input sockets, but the code needs to be updated.

On the security side, reverse path filtering has been updated. When activated, it causes up to two extra FIB lookups, but when deactivated there is now no overhead at all.
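For readers unfamiliar with reverse path filtering, the strict variant can be sketched as follows in Python. The routing table, interface names and function names are hypothetical, and the real kernel code is far more involved; the sketch only shows why the check costs an extra lookup: the source address is looked up in the FIB and the packet is accepted only if the route back points to the interface it arrived on.

```python
import ipaddress

# Hypothetical FIB: (prefix, outgoing interface)
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
]


def fib_lookup(addr: str) -> str:
    """Longest-prefix match; the default route guarantees a result."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, dev) for net, dev in ROUTES if ip in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_ok(src: str, in_dev: str) -> bool:
    """Strict check: the route back to the source must use in_dev."""
    return fib_lookup(src) == in_dev


internal_on_lan = rp_filter_ok("10.1.2.3", "eth1")   # legitimate
internal_on_wan = rp_filter_ok("10.1.2.3", "eth0")   # spoofed source
external_on_wan = rp_filter_ok("8.8.8.8", "eth0")    # legitimate
```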

Mar 11 2013

Kronosnet is an “I conceived it when drunk but it works well” VPN implementation. It uses an Ethernet TAP device to provide a layer 2 VPN. To avoid reinventing the wheel, it delegates most of the work to the kernel. It supports multilink and server redundancy. On the multilink side, up to 8 links can be set up per host to help redundancy.

One use of this project is the creation of private networks in the cloud: it can easily be set up to provide redundancy and connectivity for a lot of clients (64k simultaneous clients), and a layer 2 VPN is really useful for this type of usage.

Configuration is done via a CLI comparable to that of classical routers.

Fabio ran a demonstration on 4 servers and showed that cutting a link has no impact on a ping flood, thanks to the multilink system.

Mar 11 2013


Yesterday I gave a presentation of ulogd2 at Open Source Days in Copenhagen. After a brief history of Netfilter logging, I described the key features of ulogd2 and demonstrated two interfaces, nf3d and djedi.

The slides are available: Ulogd2, Netfilter logging reloaded.


This video demonstrates some features of nf3d:

This screencast shows some of the capabilities of djedi:

Thanks a lot to the organizers for this cool event.