nftables – To Linux and beyond !

Using DOM with nftables

DOM and SSH honeypot

DOM is a solution comparable to fail2ban but it uses Suricata SSH log instead of SSH server logs. The goal of DOM is to redirect the attacker based on its SSH client version. This allows to send attacker to a honeypot like pshitt directly after the first attempt. And this can be done for a whole network as Suricata does not need to be on the targeted box.

Using DOM with nftables

I’ve pushed a basic nftables support to DOM. Instead of adding element via ipset it uses a nftables set.

It is simple to use it as you just need to add a -n flag to specify which table the set has been defined in:

./dom -f /usr/local/var/log/suricata/eve.json -n nat -s libssh -vvv -i -m OpenSSH

To activate the network address translation based on the set, you can use something like:

table ip nat {
        set libssh { 
                type ipv4_addr
        }

        chain prerouting {
                 type nat hook prerouting priority -150;
                 ip saddr @libssh ip protocol tcp counter dnat 192.168.1.1:2200
        }
}

A complete basic ruleset

Here’s the ruleset running on the box implementing pshitt and DOM:

table inet filter {
        chain input {
                 type filter hook input priority 0;
                 ct state established,related accept
                 iif lo accept
                 ct state new iif != lo tcp dport {ssh, 2200} tcp flags == syn counter log prefix "SSH attempt" group 1 accept
                 iif br0 ip daddr 224.2.2.4 accept
                 ip saddr 192.168.0.0/24 tcp dport {9300, 3142} counter accept
                 ip saddr 192.168.1.0/24 counter accept
                 counter log prefix "Input Default DROP" group 2 drop
        }
}

table ip nat {
        set libssh { 
                type ipv4_addr
        }

        chain prerouting {
                 type nat hook prerouting priority -150;
                 ip saddr @libssh ip protocol tcp counter dnat 192.168.1.1:2200
        }

        chain postrouting {
                 type nat hook postrouting priority -150;
                 ip saddr 192.168.0.0/24 snat 192.168.1.1
        }
}

There is a interesting rule in this ruleset. The first is:

ct state new iif != lo tcp dport {ssh, 2200} tcp flags == syn counter log prefix "SSH attempt" group 1 accept

It uses a negative construction to match on the interface iif != lo which means interface is not lo. Note that it also uses an unamed set to define the port list via tcp dport {ssh, 2200}. That way we have one single rule for normal and honeypot ssh. At least, this rule is logging and accepting and the logging is done via nfnetlink_log because of the group parameter. This allows to have ulogd to capture log message triggered by this rule.

Suricata and Nftables

Iptables and suricata as IPS

Building a Suricata ruleset with iptables has always been a complicated task when trying to combined the rules that are necessary for the IPS with the firewall rules. Suricata has always used Netfilter advanced features allowing some more or less tricky methods to be used.

For the one not familiar with IPS using Netfilter, here’s a few starting points:

IPS receives the packet coming from kernel via rules using the NFQUEUE target
The IPS must received all packets of a given flow to be able to handle detection cleanly
The NFQUEUE target is a terminal target: when the IPS verdicts a packet, it is or accepted (and leave current chain)

So the ruleset needs to send all packets to the IPS. A basic ruleset for an IPS could thus looks like:

iptables -A FORWARD -j NFQUEUE

With such a ruleset, all packets going through the box are sent to the IPS.

If now you want to combine this with your ruleset, then usually your first try is to add rules to the filter chain:

iptables -A FORWARD -j NFQUEUE
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED -j ACCEPT
# your firewall rules here

But this will not work because of point 2: All packets sent via NFQUEUE to the IPS are or blocked or if accepted, leave the FORWARD chain directly and are going for evaluation to the next chain (mangle POSTROUTING in our case).
With such a ruleset, the result is that there is no firewall but an IPS in place.

As mentioned before there is some existing solutions (see Building a Suricata ruleset for extensive information). The simplest one is to dedicated one another chain such as mangle to IPS:

iptables -A FORWARD -t mangle -j NFQUEUE
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED -j ACCEPT
# your firewall rules here

No conflict here but you have to be sure nothing in your system will use the the mangle table or you will have the same problem as the one seen previously in the filter chain. So there was no universal and simple solution to implement an IPS and a firewall ruleset with iptables.

IPS the easy way with Nftables

In Nftables, chains are defined by the user using nft command line. The user can specify:

The hook: the place in packet life where the chain will be set. See this diagram for more info.
- prerouting: chain will be placed before packet are routed
- input: chain will receive packets going to the box
- forward: chain will receive packets routed by the box
- postrouting: chain will receive packets after routing and before sending packets
- output: chain will receive packet sent by the host
The chain type: define the objective of the chain
- filter: chain will filter packet
- nat: chain will only contains NAT rules
- route: chain is containing rule that may change the route (previously now as mangle)
The priority: define the evaluation order of the different chains of a given hook. It is an integer that can be freely specified. But it also permits to place chain before or after some internal operation such as connection tracking.

In our case, we want to act on forwarded packets. And we want to have a chain for filtering followed by a chain for IPS. So the setup is simple of chain is simple

nft -i
nft> add table filter
nft> add chain filter firewall { type filter hook forward priority 0;}
nft> add chain filter IPS { type filter hook forward priority 10;}

With this setup, a packet will reach the firewall chain first where it will be filtered. If the packet is blocked, it will be destroy inside of the kernel. It the packet is accepted it will then jump to the next chain following order of increasing priority. In our case, the packet reaches the IPS chain.

Now, that we’ve got our chains we can add filtering rules, for example:

nft add rule filter firewall ct state established accept
nft add rule filter firewall tcp dport ssh counter accept
nft add rule filter firewall tcp dport 443 accept
nft add rule filter firewall counter log drop

And for our Suricata IPS, that’s just trivial:

nft add rule filter IPS queue

A bit more details

The queue target in nftables

The complete support for the queue target will be available in Linux 3.14. The syntax looks as follow:

nft add rule filter output queue num 3 total 2 options fanout

This rule sends matching packets to 2 load-balanced queues (total 2) starting at 3 (num 3).
fanout is one of the two queue options:

fanout: When used together with total load balancing, this will use the CPU ID as an index to map packets to the queues. The idea is that you can improve perfor mance if there’s a queue per CPU. This requires total with a value superior to 1 to be specified.
bypass: By default, if no userspace program is listening on an Netfilter queue,then all packets that are to be queued are dropped. When this option is used, the queue rule behaves like ACCEPT instead, and the packet will move on to the next table.

For a complete description of queueing mechanism in Netfilter see Using NFQUEUE and libnetfilter_queue.

If you want to test this before Linux 3.14 release, you can get nft sources from nftables git and use next-3.14 branch.

Chain priority

For reference, here are the priority values of some important internal operations and of iptables static chains:

NF_IP_PRI_CONNTRACK_DEFRAG (-400): priority of defragmentation
NF_IP_PRI_RAW (-300): traditional priority of the raw table placed before connection tracking operation
NF_IP_PRI_SELINUX_FIRST (-225): SELinux operations
NF_IP_PRI_CONNTRACK (-200): Connection tracking operations
NF_IP_PRI_MANGLE (-150): mangle operation
NF_IP_PRI_NAT_DST (-100): destination NAT
NF_IP_PRI_FILTER (0): filtering operation, the filter table
NF_IP_PRI_SECURITY (50): Place of security table where secmark can be set for example
NF_IP_PRI_NAT_SRC (100): source NAT
NF_IP_PRI_SELINUX_LAST (225): SELInux at packet exit
NF_IP_PRI_CONNTRACK_HELPER (300): connection tracking at exit

For example, one can create in nftables an equivalent of the raw PREROUTING chain of iptables by doing:

# nft -i
nft> add chain filter pre_raw { type filter hook prerouting priority -300;}

Why you will love nftables

Linux 3.13 is out

Linux 3.13 is out bringing among other thing the first official release of nftables. nftables is the project that aims to replace the existing {ip,ip6,arp,eb}tables framework aka iptables.
nftables version in Linux 3.13 is not yet complete. Some important features are missing and will be introduced in the following Linux versions.
It is already usable in most cases but a complete support (read nftables at a better level than iptables) should be available in Linux 3.15.

nftables comes with a new command line tool named nft. nft is the successor of iptables and derivatives (ip6tables, arptables). And it has a completely different syntax.
Yes, if you are used to iptables, that’s a shock. But there is a compatibility layer that allow you to use iptables even if filtering is done with nftables in kernel.

There is only really few documentation available for now. You can find my nftables quick howto and there is some other initiatives that should be made public soon.

Some command line examples

Multiple targets on one line

Suppose you want to log and drop a packet with iptables, you had to write two rules. One for drop and one for logging:

iptables -A FORWARD -p tcp --dport 22 -j LOG
iptables -A FORWARD -p tcp --dport 22 -j DROP

With nft, you can combined both targets:

nft add rule filter forward tcp dport 22 log drop

Easy set creation

Suppose you want to allow packets for different ports and allow different icmpv6 types. With iptables, you need to use something like:

ip6tables -A INPUT -p tcp -m multiport --dports 23,80,443 -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT

With nft, sets can be use on any element in a rule:

nft add rule ip6 filter input tcp dport {telnet, http, https} accept
nft add rule ip6 filter input icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept

It is easier to write and it is more efficient on filtering side as there is only one rule added for each protocol.

You can also use named set to be able to make them evolve other time:

# nft -i # use interactive mode
nft> add set global ipv4_ad { type ipv4_address;}
nft> add element global ipv4_ad { 192.168.1.4, 192.168.1.5 }
nft> add rule ip global filter ip saddr @ipv4_ad drop

And later when a new bad boy is detected:

# nft -i
nft> add element global ipv4_ad { 192.168.3.4 }

Mapping

One advanced feature of nftables is mapping. It is possible to use to different type of data and to link them.
For example, we can associate iface and a dedicated rule set (stored in a chain and created before). In the example, the chains are named low_sec and high_sec:

# nft -i
nft> add map filter jump_map { type ifindex : verdict; }
nft> add element filter jump_map { eth0 : jump low_sec; }
nft> add element filter jump_map { eth1 : jump high_sec; }
nft> add rule filter input iif vmap @jump_map

Now, let’s say you have a new dynamic interface ppp1, it is easy to setup filtering for it. Simply add it in the jump_map mapping:

nft> add element filter jump_map { ppp1 : jump low_sec; }

On administration and kernel side

More speed at update

Adding a rule in iptables was getting dramatically slower with the number of rules and that’s explained why script using iptables call are taking a long time to complete. This is not anymore with nftables which is using atomic and fast operation to update rule sets.

Less kernel update

With iptables, each match or target was requiring a kernel module. So, you had to recompile kernel in case you forgot something or want to use something new.
this is not anymore the case with nftables. In nftables, most work is done in userspace and kernel only knows some basic instruction (filtering is implemented in a pseudo-state machine).
For example, icmpv6 support has been achieved via a simple patch of the nft tool.
This type of modification in iptables would have required kernel and iptables upgrade.

Pablo Neira Ayuso, nftables strikes back

Introduction

This is a new kernel packet filtering framework. The only change is on iptables. Netfilter hooks, connection tracking system, NAT are unchanged.
It provides a backward compatibility. nftables was released in March 2009 by Patrick Mchardy. It has been revived in the precedent months by Pablo Neira Ayuso and other hackers.

Architecture

It uses a pseudo-state machine in kernel-space which is similar to BPF:

4 registers: 4 general purpose (128 bits long each) + 1 verdict
provides instruction set (which can be extended)

Here’s a example of existing instructions:

reg = pkt.payload[offset, len]
reg = cmp(reg1, reg2, EQ)
reg = byteorder(reg1, NTOH)
reg = pkt.meta(mark)
reg = lookup(set, reg1)
reg = ct(reg1, state)
reg = lookup(set, data)

Assuming this complex match can be developed in userspace by using this instruction set. For example, you can check if source and destination port are equal by doing two pkt.payload operation and then do a comparison:

reg1 = pkt.payload[offset_src_port, len]
reg2 = pkt.payload[offset_dst_port, len]
reg = cmp(reg1, reg2, EQ)

This was strictly impossible to do without coding a C kernel module with iptables.
C module still needs to be implemented if they can not be expressed with the existing instructions (for example the hashlimit module can not be expressed with current instructions). In this case, the way to do will be to provide a new instruction that will allow more combination in the futur.

nftables shows a great flexibility. The rules counters are now optional. And they come in two flavors, one system-wide counter with a lock (and so with performance impact) and a per-cpu counter. You can choose which type you want when writing the rule.

Nftables operation

All communications are made through Netlink messages and this is the main change from a user perspective. This change is here to be able to do incremental changes. iptables was using a different approach where it was getting the whole ruleset from kernel in binary format making update to this binary blob and sending it back to the kernel. So the size of the ruleset has a big impact on update performance.

Rules have a generation masks attached coding if the rules is currently active, will be active in next generation or destroyed in next generation. Adding rules is starting a transaction, sending netlink messages adding or deleting rules. Then the commit operation will ask the kernel to switch the ruleset.
This allow to have the active ruleset to change at once without any transition state where only a part of the rules are deployed.

Impact of the changes for users

The iptables backward comptability will be provided and so there is no impact on users. xtables is a binary which is using the same syntax as iptables and that is provided by nftables. Linking the xtables binary to iptables and ip6tables will be enough to assure compatibility of existing scripts.

User will have more and more interest in switching due to new features that will be possible in nftables but not possible with iptables.

A high level library will be provided to be able to handle rules. It will allow easy rule writing and will have an official public API.

The main issue in the migration process will be for frontends/tools which are using the IPTC library which is not officially public but is used by some. As this library is an internal library for iptables using binary exchange, it will not work on a kernel using nftables. So, application using IPTC will need to switch to the upcoming nftables high level library.

The discussion between the attendees also came to the conclusion that providing a easy to use search operation in the library will be a good way to permit improvement of different firewall frontends. By updating their logic by doing a search and then adding a rule if needed, they will avoid conflict between them as this is for example currently the case with network manager which is destroying any masquerading policy in place by blindly inserting a masquerading rule on top when he thinks he needs one.

Release date

A lot of works is still needed but there is good chance the project will be proposed upstream before the end of 2013. The API needs to be carefully review to be sure it will be correct on a long time. The atomic rule swapping systems needs also a good review.

More and more work power is coming to the project and with luck it may be ready in around 6 months.