It is sometimes necessary to flush UNREPLIED connection tracking entries for connectionless protocols when NAT rules are involved. This is the case, for example, when an IPsec or PPP connection comes up. Without the flush, the connections are not NATed correctly because the topology change has not been taken into account.
Doing this in userspace with conntrack-tools took a long time, up to several minutes on some setups. They therefore decided to move the flush into kernel space, where it now takes milliseconds instead of minutes.
Holger wants to know whether somebody has another solution to this problem (or whether someone sees a generic use for their feature).
The discussion showed that the slowness came from the fact that conntrack-tools forces you to delete connections one by one. Other points were raised, such as the idea that connection tracking could in some way react to the topology change by itself. The discussion is planned to continue on the way back to the hotel.
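To make the selection step concrete, here is a minimal sketch of how the entries to flush could be picked out of a conntrack listing. It assumes lines shaped like those in /proc/net/nf_conntrack; the sample lines, the addresses and the function name unreplied_connectionless are made up for illustration, and this is neither the kernel patch discussed above nor the conntrack-tools code.

```python
def unreplied_connectionless(lines, protocols=("udp",)):
    """Return the entries that are UNREPLIED and use a connectionless protocol."""
    selected = []
    for line in lines:
        fields = line.split()
        # Entries that never saw a reply carry the [UNREPLIED] flag.
        if "[UNREPLIED]" not in fields:
            continue
        # Keep only the protocols named by the caller (UDP by default).
        if not any(proto in fields for proto in protocols):
            continue
        selected.append(line.strip())
    return selected


# Hypothetical listing: one stale UDP entry, one answered UDP entry, one TCP entry.
SAMPLE = """\
ipv4     2 udp      17 29 src=192.168.1.2 dst=10.0.0.1 sport=5000 dport=53 [UNREPLIED] src=10.0.0.1 dst=192.168.1.2 sport=53 dport=5000 mark=0 use=2
ipv4     2 udp      17 170 src=192.168.1.2 dst=10.0.0.2 sport=5001 dport=123 src=10.0.0.2 dst=192.168.1.2 sport=123 dport=5001 [ASSURED] mark=0 use=2
ipv4     2 tcp      6 100 SYN_SENT src=192.168.1.2 dst=10.0.0.3 sport=40000 dport=80 [UNREPLIED] src=10.0.0.3 dst=192.168.1.2 sport=80 dport=40000 mark=0 use=2
"""

stale = unreplied_connectionless(SAMPLE.splitlines())
print(len(stale))  # 1: only the unanswered UDP flow is selected
```

Deleting each selected entry with a separate request is the pattern that turned out to be slow; a single bulk flush avoids that per-entry round trip.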
There is a fixed number of connection tracking entries. When the maximum is reached, new connections are simply dropped. The default maximum is ridiculously low, for example using only 20 MB on a computer with 12 GB of memory.
The kernel syslog message "nf_conntrack: table full, dropping packet" is not accurate, because the packets simply have no state as far as conntrack is concerned. They usually get blocked by rules that drop invalid packets, but an adapted ruleset could let them through.
Another problem is that adjusting the connection tracking table size does not change the hash size. This results in longer lookups, because conntrack often has to walk through a list.
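The cost of raising the maximum without touching the hash size can be estimated with simple arithmetic: a full table holds, on average, maximum / buckets entries per hash bucket. The sketch below assumes the two values are exposed as /proc/sys/net/netfilter/nf_conntrack_max and /proc/sys/net/netfilter/nf_conntrack_buckets, which may differ on older kernels; the helper names and the numbers are illustrative only.

```python
def read_int(path):
    """Read a single integer from a sysctl-style file."""
    with open(path) as handle:
        return int(handle.read().strip())


def average_chain_length(max_entries, buckets):
    """Average number of entries per hash bucket when the table is full."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    return max_entries / buckets


# Illustrative numbers: the maximum was raised, the hash size was left alone.
print(average_chain_length(1048576, 16384))  # 64.0 entries to walk per lookup, on average

# On a live system (paths assumed, see above):
# maximum = read_int("/proc/sys/net/netfilter/nf_conntrack_max")
# buckets = read_int("/proc/sys/net/netfilter/nf_conntrack_buckets")
# print(average_chain_length(maximum, buckets))
```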
Most of the time, running out of entries is due to connections at the end of their life. Because the timeouts are long, the number of such entries can be large. Lowering the timeouts when the connection tracking table is almost full can help relieve the pressure. Changing the parameters automatically is something that could be considered, but finding the right logic is not easy.
Destroying unimportant connection tracking entries is something that could really help, but a suitable logic has to be found. Adjusting timeouts dynamically requires a full scan of the list, which is really costly. The algorithm also has to be resistant to DoS attacks. Finding a generic strategy is difficult. Pablo proposes trying a userspace solution. It could be used to experiment with different policies, and it could also use information taken from other subsystems and/or from a configuration file.
Samir suggests lowering nf_conntrack_tcp_timeout_syn_sent when under attack. This could make the bad entries disappear faster.
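A userspace policy of this kind could look like the sketch below. It is only one possible policy: the function name, the 90% threshold and the reduced value of 30 seconds are assumptions chosen for illustration, and treating a nearly full table as the sign of an attack is a simplification. The normal value of 120 seconds matches the usual kernel default for nf_conntrack_tcp_timeout_syn_sent.

```python
def choose_syn_sent_timeout(count, maximum, normal=120, reduced=30, threshold=0.9):
    """Pick a SYN_SENT timeout in seconds from how full the table is."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    fill_ratio = count / maximum
    # Under pressure, let half-open connections expire sooner.
    return reduced if fill_ratio >= threshold else normal


print(choose_syn_sent_timeout(100, 1000))  # 120: plenty of room, keep the normal timeout
print(choose_syn_sent_timeout(950, 1000))  # 30: table almost full, shorten the timeout
```

A daemon built on this would write the chosen value to the nf_conntrack_tcp_timeout_syn_sent sysctl. A real policy would also need some hysteresis so that the value does not flap around the threshold.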
Jan starts his presentation by talking about his "Distro Availability Matrix of Netfilter tech" page. It lists the software and their versions in a large number of distributions.
The next subject is maintaining translations of the iptables man page. The team is international and could translate the man pages into a few languages, but the question is how to find volunteers for the long term. Jan is willing to take charge of synchronizing the translations. Any volunteer for translation is welcome.
Then Jan starts a discussion about his work on Xtables2. The point under discussion is switching iptables to netlink. The issue is that iptables commands are huge and the size of a netlink packet is limited, so there is a problem to solve. One possibility is to use continuation messages, which netlink supports, but cutting the message in the right place is not easy. During the discussion, some clarifications emerge on how to build huge netlink messages.
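The framing side of the continuation idea can be sketched as follows. This is not the Xtables2 protocol: the message type RULE_MSG_TYPE is made up, alignment padding and attribute encoding are left out, and only the generic netlink header layout and the NLM_F_MULTI / NLMSG_DONE convention are taken from netlink itself. The sketch cuts only at rule boundaries, which is the simple case; the difficulty mentioned above is precisely that real rulesets do not always split so cleanly.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsghdr: length, type, flags, sequence, port id
NLM_F_REQUEST = 0x01
NLM_F_MULTI = 0x02
NLMSG_DONE = 0x03
RULE_MSG_TYPE = 0x100  # hypothetical message type, for illustration only


def build_message(msg_type, payload, seq):
    """Prefix a payload with a netlink header flagged as part of a multipart series."""
    flags = NLM_F_REQUEST | NLM_F_MULTI
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), msg_type, flags, seq, 0)
    return header + payload


def pack_multipart(rules, max_size, seq=1):
    """Group rule blobs into messages of at most max_size bytes, cutting between rules."""
    messages = []
    batch = b""
    for rule in rules:
        if NLMSG_HDR.size + len(rule) > max_size:
            raise ValueError("a single rule does not fit in one message")
        # Flush the current batch if adding this rule would exceed the limit.
        if batch and NLMSG_HDR.size + len(batch) + len(rule) > max_size:
            messages.append(build_message(RULE_MSG_TYPE, batch, seq))
            batch = b""
        batch += rule
    if batch:
        messages.append(build_message(RULE_MSG_TYPE, batch, seq))
    # A final NLMSG_DONE message tells the receiver that the series is complete.
    messages.append(build_message(NLMSG_DONE, b"", seq))
    return messages


rules = [b"a" * 40, b"b" * 40, b"c" * 40]
messages = pack_multipart(rules, max_size=100)
print(len(messages))  # 3: two data messages followed by NLMSG_DONE
```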
The last subject is the maintenance of Netfilter. David Miller posted a message on netdev complaining about the Netfilter maintainers. Patrick and Pablo are currently working on a git tree that they could share, which should help speed up the maintainers' reaction time. Since he does a lot of work on iptables, Jan will soon have an account on Netfilter so that he can push patches to the official iptables git tree.
Reverse path filtering is currently only implemented for IPv4. Eric Leblond sent a patch to add IPv6 support, but it was refused by David Miller who, among other points, wanted to get rid of rp_filter and would like to see it in the Netfilter code.
The reverse path filter implementation is a single function called fib_validate_source. At first sight it seems relatively simple to implement: just swap source and destination, then look up the output interface. If it matches the incoming interface, the packet is accepted.
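The check itself can be illustrated with a toy routing table. This is a sketch of the idea only, not of fib_validate_source: the routes, interface names and function names are invented for the example, and multipath routing and the special cases are ignored.

```python
import ipaddress


def lookup_interface(routes, address):
    """Longest-prefix match: return the output interface for address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        # Skip routes of the other address family and routes that do not match.
        if network.version != addr.version or addr not in network:
            continue
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, interface)
    return best[1] if best else None


def passes_reverse_path(routes, source, incoming_interface):
    """Accept only if a reply to the source would leave through the incoming interface."""
    return lookup_interface(routes, source) == incoming_interface


# Hypothetical routing table covering both address families.
ROUTES = [
    ("0.0.0.0/0", "eth0"),
    ("192.168.1.0/24", "eth1"),
    ("2001:db8::/32", "eth1"),
]

print(passes_reverse_path(ROUTES, "192.168.1.10", "eth1"))  # True
print(passes_reverse_path(ROUTES, "192.168.1.10", "eth0"))  # False: spoofed or misrouted
print(passes_reverse_path(ROUTES, "2001:db8::1", "eth1"))   # True
```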
But the API is not that simple, and implementing it in Netfilter is not easy. For example, in PREROUTING we don’t have the output interface, so it cannot easily be obtained with simple Netfilter functions. An implementation using standard functions from the routing code is thus necessary, but there is still an issue with multipath routing in IPv4. Florian then tried a second implementation which mimics the behaviour of fib_validate_source.
Some implementation questions are discussed, mainly about how to handle special cases. Patrick proposes modifying the code in PREROUTING to give access to all interfaces. This would then allow a more Netfilter-based implementation.
Regarding userspace syntax, this is a match, so a specific iptables rule will have to be added to benefit from the functionality.
I just gave a presentation explaining that reverse path filtering has to be implemented carefully in both IPv4 and IPv6.
More to come later.