Jan 20, 2014
 

Linux 3.13 is out

Linux 3.13 is out, bringing among other things the first official release of nftables. nftables is the project that aims to replace the existing {ip,ip6,arp,eb}tables framework, aka iptables. The nftables version in Linux 3.13 is not yet complete: some important features are missing and will be introduced in the following Linux versions. It is already usable in most cases, but complete support (read: nftables at a better level than iptables) should be available in Linux 3.15.

nftables comes with a new command line tool named nft. nft is the successor of iptables and its derivatives (ip6tables, arptables), and it has a completely different syntax. Yes, if you are used to iptables, that’s a shock. But there is a compatibility layer that allows you to use iptables even if filtering is done with nftables in the kernel.

There is very little documentation available for now. You can find my nftables quick howto, and there are some other initiatives that should be made public soon.

Some command line examples

Multiple targets on one line

Suppose you want to log and drop a packet. With iptables, you have to write two rules, one for logging and one for dropping:

iptables -A FORWARD -p tcp --dport 22 -j LOG
iptables -A FORWARD -p tcp --dport 22 -j DROP

With nft, you can combine both targets in a single rule:

nft add rule filter forward tcp dport 22 log drop

Easy set creation

Suppose you want to allow packets for different ports and allow different icmpv6 types. With iptables, you need to use something like:

ip6tables -A INPUT -p tcp -m multiport --dports 23,80,443 -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT

With nft, sets can be used on any element in a rule:

nft add rule ip6 filter input tcp dport {telnet, http, https} accept
nft add rule ip6 filter input icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept

It is easier to write, and it is more efficient on the filtering side, as only one rule is added for each protocol.
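
To make the "one rule per protocol" point concrete, here is a toy Python model (not kernel code, and not how nftables stores sets internally). It only counts how many rules are evaluated for a packet: a chain with one rule per ICMPv6 type versus a chain with a single rule that tests membership in a set. The names `Rule` and `evaluate` are made up for this sketch.

```python
from typing import Callable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    """One rule in a chain: a match predicate and a verdict."""
    matches: Callable[[str], bool]
    verdict: str


def evaluate(chain: List[Rule], icmpv6_type: str) -> Tuple[str, int]:
    """Walk the chain in order; return (verdict, number of rules evaluated)."""
    evaluated = 0
    for rule in chain:
        evaluated += 1
        if rule.matches(icmpv6_type):
            return rule.verdict, evaluated
    return "continue", evaluated  # no rule matched, fall through


ALLOWED = ["nd-neighbor-solicit", "echo-request",
           "nd-router-advert", "nd-neighbor-advert"]

# One rule per allowed type, as in the ip6tables example.
per_type_chain = [Rule(lambda t, want=want: t == want, "accept")
                  for want in ALLOWED]

# A single rule whose match is a set lookup, as in the nft example.
allowed_set = frozenset(ALLOWED)
set_chain = [Rule(lambda t: t in allowed_set, "accept")]

print(evaluate(per_type_chain, "nd-neighbor-advert"))  # walks all four rules
print(evaluate(set_chain, "nd-neighbor-advert"))       # one rule, one lookup
```

In this model the gap grows with the number of allowed values, while the set-based chain always evaluates a single rule.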

You can also use named sets, which can evolve over time:

# nft -i # use interactive mode
nft> add set global ipv4_ad { type ipv4_address;}
nft> add element global ipv4_ad { 192.168.1.4, 192.168.1.5 }
nft> add rule ip global filter ip saddr @ipv4_ad drop

And later, when a new bad boy is detected:

# nft -i
nft> add element global ipv4_ad { 192.168.3.4 }
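
If the bad addresses come from a script rather than from someone typing in interactive mode, the `add element` command can be generated. The following is a minimal sketch in Python using only the standard library: it validates each address and builds the same command text as above. The helper name `add_element_command` is hypothetical, and the sketch does not talk to nft itself.

```python
import ipaddress
from typing import Iterable


def add_element_command(table: str, set_name: str,
                        addresses: Iterable[str]) -> str:
    """Build an 'add element' command in the syntax used in this post.

    Raises ValueError if an entry is not a valid IPv4 address.
    """
    checked = [str(ipaddress.IPv4Address(addr)) for addr in addresses]
    if not checked:
        raise ValueError("at least one address is required")
    return f"add element {table} {set_name} {{ {', '.join(checked)} }}"


print(add_element_command("global", "ipv4_ad", ["192.168.3.4"]))
```

Validating before building the command keeps a malformed entry from ever reaching the firewall tool.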

Mapping

One advanced feature of nftables is mapping. It is possible to take two different types of data and link them. For example, we can associate an interface with a dedicated rule set (stored in a chain created beforehand). In the example, the chains are named low_sec and high_sec:

# nft -i
nft> add map filter jump_map { type ifindex : verdict; }
nft> add element filter jump_map { eth0 : jump low_sec; }
nft> add element filter jump_map { eth1 : jump high_sec; }
nft> add rule filter input iif vmap @jump_map

Now, let’s say you have a new dynamic interface ppp1. It is easy to set up filtering for it: simply add it to the jump_map mapping:

nft> add element filter jump_map { ppp1 : jump low_sec; }
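
Conceptually, the verdict map behaves like a dictionary lookup keyed on the input interface. The toy Python model below illustrates only that idea (interface names are used as keys for readability, and `dispatch` is a made-up name): an interface missing from the map means the rule does not match and evaluation continues with the next rule.

```python
from typing import Dict

# Toy equivalent of jump_map: input interface -> chain to jump to.
jump_map: Dict[str, str] = {"eth0": "low_sec", "eth1": "high_sec"}


def dispatch(iif: str) -> str:
    """Return the chain a packet arriving on `iif` jumps to.

    Interfaces missing from the map do not match the vmap rule, so
    evaluation continues with the next rule of the input chain.
    """
    return jump_map.get(iif, "continue")


print(dispatch("eth1"))
print(dispatch("ppp1"))        # not in the map yet
jump_map["ppp1"] = "low_sec"   # like: add element filter jump_map { ppp1 : jump low_sec; }
print(dispatch("ppp1"))
```

The rule that performs the lookup never changes; only the map content does.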

On administration and kernel side

More speed at update

Adding a rule with iptables got dramatically slower as the number of rules grew, which explains why scripts made of iptables calls take a long time to complete. This is no longer the case with nftables, which uses atomic and fast operations to update rule sets.
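
A back-of-the-envelope model shows why the difference matters for long scripts. It assumes (this is my simplification, not a measurement) that each iptables call rewrites the whole table while an incremental update only sends the new rule. The function names are invented for the sketch, and the unit is "rules transferred", not time.

```python
def full_replace_cost(n_rules: int) -> int:
    """Rules copied when every added rule rewrites the whole table.

    Adding rule k means writing back a table of k rules, so the total
    is 1 + 2 + ... + n.
    """
    return sum(range(1, n_rules + 1))


def incremental_cost(n_rules: int) -> int:
    """Rules sent when each addition transmits only the new rule."""
    return n_rules


for n in (100, 1000, 10000):
    print(n, full_replace_cost(n), incremental_cost(n))
```

Under this assumption the first cost grows quadratically with the size of the script and the second one linearly.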

Less kernel update

With iptables, each match or target required a kernel module. So you had to recompile the kernel in case you forgot something or wanted to use something new. This is no longer the case with nftables. In nftables, most of the work is done in userspace and the kernel only knows some basic instructions (filtering is implemented in a pseudo-state machine). For example, icmpv6 support has been achieved via a simple patch of the nft tool. This type of modification in iptables would have required a kernel and iptables upgrade.
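
To give an idea of what "basic instructions" means, here is a toy interpreter in Python. It is only an illustration: the opcode names (`payload`, `cmp`, `verdict`) and the tuple layout are invented for this sketch and are not the kernel's actual instruction set. The point is that a protocol-specific rule such as "tcp dport 22 drop" can be expressed with generic "load bytes" and "compare" steps, so the translation can live in the userspace tool.

```python
from typing import List, Tuple

# ("payload", header, offset, length) | ("cmp", value) | ("verdict", name)
Instruction = Tuple


def run(program: List[Instruction], network: bytes, transport: bytes) -> str:
    """Execute a rule's instructions; return its verdict or 'continue'."""
    headers = {"network": network, "transport": transport}
    register = 0
    for instr in program:
        op = instr[0]
        if op == "payload":      # load bytes from a header into the register
            _, header, offset, length = instr
            chunk = headers[header][offset:offset + length]
            register = int.from_bytes(chunk, "big")
        elif op == "cmp":        # stop evaluating the rule on mismatch
            if register != instr[1]:
                return "continue"
        elif op == "verdict":
            return instr[1]
    return "continue"


# "tcp dport 22 drop" expressed with generic instructions only:
TCP_DPORT_22_DROP = [
    ("payload", "network", 9, 1),    # IPv4 protocol field
    ("cmp", 6),                      # 6 = TCP
    ("payload", "transport", 2, 2),  # TCP destination port
    ("cmp", 22),
    ("verdict", "drop"),
]


def ipv4_header(protocol: int) -> bytes:
    """Minimal 20-byte IPv4 header with only version/IHL and protocol set."""
    header = bytearray(20)
    header[0] = 0x45
    header[9] = protocol
    return bytes(header)


def l4_header(sport: int, dport: int) -> bytes:
    """First four bytes shared by TCP and UDP: source and destination port."""
    return sport.to_bytes(2, "big") + dport.to_bytes(2, "big")


print(run(TCP_DPORT_22_DROP, ipv4_header(6), l4_header(40000, 22)))
print(run(TCP_DPORT_22_DROP, ipv4_header(6), l4_header(40000, 80)))
print(run(TCP_DPORT_22_DROP, ipv4_header(17), l4_header(40000, 22)))
```

Supporting a new protocol field in this model means emitting a different offset and length, with no change to the interpreter.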

  35 Responses to “Why you will love nftables”

  1. Please add destination-unreachable and packet-too-big to your icmpv6 accept list as it will save people copy/pasting your ruleset a LOT of time.
    Not having them breaks path MTU discovery and causes long timeouts of all sorts that are often hard to troubleshoot.

    I would recommend

    nft add rule filter input icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert, nd-router-advert, destination-unreachable, packet-too-big, param-problem, mld-listener-query, mld-listener-report, mld-listener-reduction, echo-request, echo-reply } accept

    as a default rule for any ipv6 host.

  2. Can you comment on these “other initiatives that should be made public soon”? One of the things that people are criticizing about nftables in forums today is the lack of documentation.

  3. The Netfilter core team and the community are currently making a really big development effort to get the code into the kernel and to have the userspace tool ready. We will try to make a wiki with the official documentation available ASAP. But with our limited resources, documentation sadly has to come after code.

  4. “With iptables, each match or target was requiring a kernel module. So, you had to recompile kernel in case you forgot something or want to use something new.”

    Well you would usually be compiling matches or targets as loadable kernel modules so you’d only have to compile one or two files in case you forgot any module.

  5. You are right, Peteris. What you gain with nftables is that you no longer have to think about the kernel: you get new features by rebuilding only the userspace parts.

  6. Nftables have gotten me very excited as I’m a huge iptables fan. I’m compiling 3.13 right now to try them out. I think I’ll write some tutorials on getting started with nftables as well.

    Regit, I wonder what the overall transition process to nftables is going to be? Are iptables going to be deprecated and removed from the kernel in favor of nftables in the future?

    Another question I have is how you queue packets to user space with nftables, like with `-j QUEUE` in iptables. Or is that something that still needs to be written for nftables?

  7. Hi again Peteris!

    Really good to hear that you wanna write some tutorials! Don’t hesitate if you have any questions.

    A compatibility layer has been developed so we should be able to use iptables syntax in userspace and nftables in kernel space for some time (read years).

    I’ve ported the NFQUEUE target to nftables; it will be available in Linux 3.14. In Linux 3.13, I think you can only use the queue target, which sends packets to queue 0.

  8. How will nftables cope with the more complex iptables solutions (recent, ipset, helpers, cluster, tracking/marking, cpu, limit, owner, policy, quota, string, statistic, tcpmss, time, tos, ttl; notrack, redirect, snat/dnat, to name some of the more interesting ones)?

  9. For “nft add rule ip6 filter tcp dport {telnet, http, https} accept”, how does nft know to add that rule to the “input chain”? Is the default “chain” the “input chain”?

  10. Hello question. Sorry, that was a mistake I made: you need to specify the chain. I’ve updated the blog post with complete commands.

  11. Maps look pretty broken.
    Firstly, the following will not parse as input (presumably because of a cyclic reference), although nft list table filter produces this as apparently valid output. It’s possible to declare this in interactive mode:


    table ip filter {
        map iface_verdict {
            type ifindex : verdict
            elements = { virbr0 : jump physical, wlp3s0 : jump physical, p4p1 : jump physical}
        }

        chain input {
            type filter hook input priority 0;
            ct state { related, established} accept
            ip protocol icmp icmp type { echo-request} accept
            ip daddr { 192.168.50.0/24, 192.168.100.0/24, 192.168.122.0/24} iif vmap @iface_verdict
            reject
        }

        chain virtual {
        }

        chain physical {
        }
    }

    Worse, if you define a verdict map with jumps in it, such as above, and then delete a chain being used as a verdict in the map, the kernel will panic.

    Finally (this is more of an annoyance than anything), it would seem there is no way to ‘reset’ tables in nft. That is, if I have an nft declaration in a file and want to re-run it, I first have to flush the table, delete the map, delete all the chains and then delete the table. There should be a simple way to just clear the table definitions being used so you can reapply a saved config.

  12. Hello Matthew,

    Thanks for the feedback. I will try to have a close look in the following days.

  13. The kernel bug I reported for the BUG message is now fixed. It would be great if you updated your post to show the key/value delimiter as : instead of =>, as the grammar was changed.

    It’s good to see iptables die. This grammar is much more concise; iptables rulesets can become horrendous to follow.

  14. Hello Matthew. That is what I was thinking for the kernel bug. I’ve updated the syntax in the articles. Thanks for your feedback.

  15. […] If you like Packet Filter, you’ll be happy. If you’re not sure of the advantages of it, simply read that short comparison and you’ll be convinced : https://home.regit.org/2014/01/why-you-will-love-nftables/ […]

  16. What would a full configuration look like in order to block any type of communication, except on the loopback interface, which should be unrestricted? As for eth0: allow output for http and https only, allow input/output for all established connections and the icmp needed to manage those connections, and allow the minimum needed for handling dynamic dhcp. Basically, the typical laptop/desktop configuration that wants to be fully locked down, except for browsing the web. Can you please post a full configuration for this scenario? Thanks

  17. Hello Sorin. You can have a look at ‘Building a basic ruleset’ on https://home.regit.org/netfilter-en/nftables-quick-howto/. It is almost what you want.

  18. What about l7-filter and the like?
    Can I match on SIP packets, for instance, with nftables?

  19. My choice of a strict firewall:


    #! nft -f

    delete table ip firewall
    delete table ip6 firewall

    table ip firewall {
        chain incoming {
            type filter hook input priority 0;

            #bad tcp
            tcp flags & (fin|syn) == (fin|syn) drop
            tcp flags & (syn|rst) == (syn|rst) drop
            tcp flags & (fin|syn|rst|psh|ack|urg) < (fin) drop # == 0 would be better, not supported yet.
            tcp flags & (fin|syn|rst|psh|ack|urg) == (fin|psh|urg) drop

            ct state {established, related} accept
            ct state invalid drop
            iifname lo accept
            icmp type {echo-request} drop
            icmp accept
            udp sport bootps dport bootpc accept
            ip saddr 127.0.0.1 tcp dport {http, postgresql, ipp} accept
            reject
        }

        chain outgoing {
            type filter hook output priority 0;
            ct state {established, related} accept
            ct state invalid drop
            oifname lo accept
            icmp type {echo-reply} drop
            icmp accept
            udp sport bootpc dport bootps accept
            ip daddr 127.0.0.1 tcp dport {http, postgresql, ipp} accept
            udp dport dns accept
            tcp dport {dns, http, ntp, https, 9418} accept
            reject
        }

        chain forwarding {
            type filter hook forward priority 0;
            reject
        }
    }

    table ip6 firewall {
        chain incoming {
            type filter hook input priority 0;

            #bad tcp
            tcp flags & (fin|syn) == (fin|syn) drop
            tcp flags & (syn|rst) == (syn|rst) drop
            tcp flags & (fin|syn|rst|psh|ack|urg) < (fin) drop # == 0 would be better, not supported yet.
            tcp flags & (fin|syn|rst|psh|ack|urg) == (fin|psh|urg) drop

            ct state {established, related} accept
            ct state invalid drop
            iifname lo accept
            icmpv6 type {echo-request} drop
            icmpv6 accept
            udp sport bootps dport bootpc accept
            ipv6 saddr ::1 tcp dport {http, postgresql, ipp} accept
            reject
        }

        chain outgoing {
            type filter hook output priority 0;
            ct state {established, related} accept
            ct state invalid drop
            oifname lo accept
            icmpv6 type {echo-reply} drop
            icmpv6 accept
            udp sport bootpc dport bootps accept
            ip daddr ::1 tcp dport {http, postgresql, ipp} accept
            udp dport dns accept
            tcp dport {dns, http, ntp, https, ipp, 9418} accept
            reject
        }

        chain forwarding {
            type filter hook forward priority 0;
            reject
        }
    }

  20. Hello Simon,

    Thanks, really interesting proposal!
    I would only change a few things and make a few comments:
    Regarding your rules on tcp flags: you can put them in a dedicated chain hooked at prerouting, before connection tracking:
    type filter hook prerouting priority -300;
    This way you filter these packets before connection tracking does its work.

    Another minor point (open to discussion): we don’t need the forward chain. But if you want to avoid routing packets in case something activates routing, that’s a good move.

    Rules like ‘ip daddr ::1 tcp dport {http, postgresql, ipp} accept’ are useless, as you have an ‘iifname lo accept’ before them.

  21. The idea I’m following is to block everything except what I need to keep open. In my case, I run a PostgreSQL server (port postgresql) and an Nginx web server (port http) for my own use, so they should only be accessible as localhost:*localhost:80. Neither the database nor the web server runs on the loopback interface. They’re on eth0. That’s why I added the ‘ip daddr ::1 tcp dport {http, postgresql, ipp} accept’ rule. The printer, on the other hand (port ipp), is also accessible by localhost only (not on the local network either), and it’s on both interfaces (loopback for printing, eth0 for settings). You can see that even the outgoing ports are blocked, except the ones I need: http, https, ntp, dns, git. If somehow a program like Pidgin (or a virus) manages to run without my knowledge, it won’t be able to communicate because its port is blocked, unless it chooses one of the open ports. Another thing I’m thinking of adding would be a per-minute rate limit for different protocols, especially icmp. Your first point, regarding prerouting, is great indeed.

  22. A better structured firewall, organized per protocol so it’s easier to keep track. This one keeps the PostgreSQL, web and printer servers accessible locally only, not over the LAN or Internet, and allows basic things like browsing (http, https), time servers, DHCP and DNS.


    #! nft -f

    delete table ip firewall
    delete table ip6 firewall

    table ip filter {
        chain prerouting {
            type filter hook prerouting priority -300;

            #bad tcp
            tcp flags & (fin|syn) == (fin|syn) drop
            tcp flags & (syn|rst) == (syn|rst) drop
            tcp flags & (fin|syn|rst|psh|ack|urg) < (fin) drop # == 0 would be better, not supported yet.
            tcp flags & (fin|syn|rst|psh|ack|urg) == (fin|psh|urg) drop
        }

        chain input {
            type filter hook input priority 0;

            ct state {established, related} accept
            ct state invalid drop
            iifname lo accept # LOOPBACK INTERFACE

            icmp type {echo-request} drop # ICMP
            icmp limit rate 6/minute accept # ICMP

            udp sport bootps dport bootpc limit rate 6/minute accept # DHCP *:bootpc(68) - *:bootps(67)
            {udp, tcp} sport domain ip daddr 127.0.0.1 accept # DNS localhost:* - *:domain(53)
            {udp, tcp} sport ntp ip daddr 127.0.0.1 dport ntp accept # NTP localhost:ntp(123) - *:ntp(123)
            tcp ip saddr 127.0.0.1 sport ipp ip daddr 127.0.0.1 accept # PRINT localhost:* - localhost:ipp(631)
            tcp sport http ip daddr 127.0.0.1 accept # HTTP localhost:* - *:http(80)
            tcp ip saddr 127.0.0.1 sport postgresql ip daddr 127.0.0.1 accept # POSTGRESQL localhost:* - localhost:postgresql(5432)
            tcp sport git ip daddr 127.0.0.1 accept # GIT localhost:* - *:git(9418)

            reject
        }

        chain output {
            type filter hook output priority 0;

            ct state {established, related} accept
            ct state invalid drop
            oifname lo accept # LOOPBACK INTERFACE

            icmp type {echo-reply} drop # ICMP
            icmp limit rate 6/minute accept # ICMP

            udp sport bootpc dport bootps limit rate 6/minute accept # DHCP
            {udp, tcp} ip saddr 127.0.0.1 dport domain accept # DNS
            {udp, tcp} ip saddr 127.0.0.1 sport ntp dport ntp accept # NTP
            tcp ip saddr 127.0.0.1 ip daddr 127.0.0.1 dport ipp accept # PRINT
            tcp ip saddr 127.0.0.1 dport http accept # HTTP
            tcp ip saddr 127.0.0.1 ip daddr 127.0.0.1 dport postgresql accept # POSTGRESQL
            tcp ip saddr 127.0.0.1 dport git accept # GIT

            reject
        }

        chain forward {
            type filter hook forward priority 0;
            reject
        }
    }

    table ip6 filter {
        chain prerouting {
            type filter hook prerouting priority -300;

            #bad tcp
            tcp flags & (fin|syn) == (fin|syn) drop
            tcp flags & (syn|rst) == (syn|rst) drop
            tcp flags & (fin|syn|rst|psh|ack|urg) < (fin) drop # == 0 would be better, not supported yet.
            tcp flags & (fin|syn|rst|psh|ack|urg) == (fin|psh|urg) drop
        }

        chain input {
            type filter hook input priority 0;

            ct state {established, related} accept
            ct state invalid drop
            iifname lo accept # LOOPBACK INTERFACE

            icmpv6 type {echo-request} drop # ICMP
            icmpv6 limit rate 6/minute accept # ICMP

            udp sport bootps dport bootpc limit rate 6/minute accept # DHCP
            {udp, tcp} sport domain ipv6 daddr ::1 accept # DNS
            {udp, tcp} sport ntp ipv6 daddr ::1 dport ntp accept # NTP
            tcp ipv6 saddr ::1 sport ipp ipv6 daddr ::1 accept # PRINT
            tcp sport http ipv6 daddr ::1 accept # HTTP
            tcp ipv6 saddr ::1 sport postgresql ipv6 daddr ::1 accept # POSTGRESQL
            tcp sport git ipv6 daddr ::1 accept # GIT

            reject
        }

        chain output {
            type filter hook output priority 0;

            ct state {established, related} accept
            ct state invalid drop
            oifname lo accept # LOOPBACK INTERFACE

            icmpv6 type {echo-reply} drop # ICMP
            icmpv6 limit rate 6/minute accept # ICMP

            udp sport bootpc dport bootps limit rate 6/minute accept # DHCP
            {udp, tcp} ipv6 saddr ::1 dport domain accept # DNS
            {udp, tcp} ipv6 saddr ::1 sport ntp dport ntp accept # NTP
            tcp ipv6 saddr ::1 ipv6 daddr ::1 dport ipp accept # PRINT
            tcp ipv6 saddr ::1 dport http accept # HTTP
            tcp ipv6 saddr ::1 ipv6 daddr ::1 dport postgresql accept # POSTGRESQL
            tcp ipv6 saddr ::1 dport git accept # GIT

            reject
        }

        chain forward {
            type filter hook forward priority 0;
            reject
        }
    }

  23. Yes Simon, I understand your point, but one of your first rules accepts all traffic for the lo interface. So all the rules using ::1 or 127.0.0.1 further down in the filter are useless, because that traffic goes through lo. Add a counter to these rules and you will see that they match no packets.

    By the way, for lo filtering it is more efficient to use iif or oif, as the interface-name-to-index mapping is constant for this interface, which is not recreated during an uptime.

  24. Hmm… looks neat but how do I make port-forwarding on my NAT box?

    Say I’m running NAT with nftables with “ip saddr 10.1.2.0/24 meta oif eth0 snat 192.168.7.8” in the postrouting chain, and it works nicely. Now I want all the connections to my external 192.168.7.8:8081 to be forwarded to one of the boxes behind the NAT, 10.1.2.3:8081, and logged/counted.

    Which chain should I use? Which rule?

  25. Hi Lork,

    To do destination NAT, use something like:

    nft add chain nat pre { type nat hook prerouting priority 0 \; }
    nft add rule nat pre tcp dport 8081 ip daddr 192.168.7.8 ip dnat 10.1.2.3:8081

  26. Erm… I’m doing it wrong probably: “syntax error – unexpected dnat”

    Here is my table:

    #! nft -f

    table nat {
        chain prerouting {
            type nat hook prerouting priority -150;
            udp dport 5060 ip daddr 192.168.10.148 ip dnat 192.168.56.10:5060;
        }
        chain postrouting {
            type nat hook postrouting priority -150;
            ip saddr 192.168.56.0/24 meta oif eth0 snat 192.168.10.148;
        }
    }

  27. hello Lork,

    Yes, you have an ‘ip’ before ‘dnat’ which should be removed.

  28. Thanks, works nicely now :)

    Related question – how do I log that in both directions? I mean I can just add “log” to dnat rule and get the data on packets going into this tunnel from outside. What about returning packets?

  29. Hi Lork,

    Nothing changes on the NAT side with nftables: the nat table only sees the first packet of a connection. The rest is handled transparently by conntrack. So a log in the dnat rule will only generate one message per connection.

  30. I haven’t had too much of a chance to explore nf-tables, but I have a few questions – what is the performance of nf-tables compared to iptables on a similar system? Has the latency improved, especially the latency that’s added with a bridge?

    I saw NFQUEUE was ported – can you have multiple queues? Is NFLOG ported as well?

  31. So how does one mutate a set during an actual rule execution?

    Is there a target such that e.g. “blah blah match blah +setname{srcip} drop” — or less perl-ish “add element setname { srcip }” target — so that e.g. the source ip can be added to @setname?

    More realistically, suppose you see some TCP shenanigans from someone and you want to ban their future actions right at the start of your chain:
    srcip @bad_actors drop; #don’t even bother with the bad actors
    # complex rules here
    tcp flags & (fin|syn) == (fin|syn) +bad_actors{srcip} drop;

    I suppose canonically long form would be
    tcp flags & (fin|syn) == (fin|syn) add element bad_actors { srcip } drop;

    Sets without in-rule mutability seem pretty limited.

  32. I like what I see so far, but it could really use some punctuation. Of course I’m the kind of freak whose iptables scripts use long forms like --jump and --append instead of the single-letter equivalents.

    In particular I’d like to see (optional?) commas between the various targets. Since not all targets are single words and some can be quite wordy, “counter drop” and “counter, drop” are worlds apart for readability when you get into territory like

    “ip dnat hostname jump nospoof counter”

    The human readability seems like it’s headed into the land of madness…

  33. Simon:

    Don’t “reject”, use “drop” at the ends of your chains. If you use “reject” you make yourself a reflection host. That is, bad actors can send you well-crafted packets so that your computer then spams a third party with rejection packets. While that doesn’t make you an attack amplifier it does make your box an attack anonymizer.

    Reject is for friendly rejections, such as busy services you offer or will offer later. Dropping is for bad actors and nonsense. In your script you clearly think of people getting to the end of your chain as bad actors, or at least nonsense.

  34. Anybody know why the type stanza needs a semicolon and nothing else does? Aren’t each of “type filter”, “hook whatever” and “priority number” unique and contextually unambiguous to parse?

    Actually, I’d prefer that all directives have semicolons at the end in the dump/load format, e.g. in the presence of enclosing braces. The use of whitespace as a directive separator is very python, but not very fun as an element of a parseable source code file.
