Nftables quick howto

 

Introduction

This document is between a dirty howto and a cheat sheet. For a short description of some interesting nftables features, you can read Why you will love nftables.

For a description of the architecture and ideas behind nftables, please read the announcement of the first release of nftables. For a broader overview, you can also watch the talk I gave at Kernel Recipes: Eric Leblond, OISF – Nftables.

Building nftables

Libraries
Two libraries are needed. Your distribution may already include libmnl, but both libraries are easy to build because they use the standard sequence:
./autogen.sh
./configure
make
make install
ldconfig
nftables
First install dependencies:
aptitude install libgmp-dev libreadline-dev
If you want to build the documentation:
aptitude install docbook2x docbook-utils
Then get the sources and build:
git clone git://git.netfilter.org/nftables
cd nftables
./autogen.sh
./configure
make
make install
kernel
If you do not already have a Linux git tree, run:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
If you already have a Linux git tree, you can just update to the latest sources:
cd linux
git pull --rebase
Now that you have the sources, you can select the nftables options:
$ make oldconfig

Netfilter Xtables support (required for ip_tables) (NETFILTER_XTABLES) [M/y/?] m
Netfilter nf_tables support (NF_TABLES) [N/m] (NEW) m
  Netfilter nf_tables payload module (NFT_PAYLOAD) [N/m] (NEW) m
  Netfilter nf_tables IPv6 exthdr module (NFT_EXTHDR) [N/m] (NEW) m
  Netfilter nf_tables meta module (NFT_META) [N/m] (NEW) m
  Netfilter nf_tables conntrack module (NFT_CT) [N/m] (NEW) m
  Netfilter nf_tables rbtree set module (NFT_RBTREE) [N/m] (NEW) m
  Netfilter nf_tables hash set module (NFT_HASH) [N/m] (NEW) m
  Netfilter nf_tables counter module (NFT_COUNTER) [N/m] (NEW) m
  Netfilter nf_tables log module (NFT_LOG) [N/m] (NEW) m
  Netfilter nf_tables limit module (NFT_LIMIT) [N/m] (NEW) m
  Netfilter nf_tables nat module (NFT_NAT) [N/m] (NEW) m
  Netfilter x_tables over nf_tables module (NFT_COMPAT) [N/m/?] (NEW) m

IPv4 nf_tables support (NF_TABLES_IPV4) [N/m] (NEW) m
  nf_tables IPv4 reject support (NFT_REJECT_IPV4) [N/m] (NEW) m
  IPv4 nf_tables route chain support (NFT_CHAIN_ROUTE_IPV4) [N/m] (NEW) m
  IPv4 nf_tables nat chain support (NFT_CHAIN_NAT_IPV4) [N/m] (NEW) m

IPv6 nf_tables support (NF_TABLES_IPV6) [M/n] m
  IPv6 nf_tables route chain support (NFT_CHAIN_ROUTE_IPV6) [M/n] m
  IPv6 nf_tables nat chain support (NFT_CHAIN_NAT_IPV6) [M/n] m

Ethernet Bridge nf_tables support (NF_TABLES_BRIDGE) [N/m/y] (NEW) m
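Before building, it can be handy to check that the options above really ended up in the configuration. The following is a minimal sketch, assuming the configuration lives in a .config file where an enabled option appears as CONFIG_<NAME>=m or CONFIG_<NAME>=y. The function name missing_nft_options is made up for this example, and only the main options from the list above are checked.

```shell
# Print every nf_tables option that is not enabled (=m or =y) in a kernel
# configuration file. Hypothetical helper, not part of the kernel tree.
missing_nft_options() {
    config_file="${1:-.config}"
    for opt in NF_TABLES NFT_PAYLOAD NFT_EXTHDR NFT_META NFT_CT NFT_RBTREE \
               NFT_HASH NFT_COUNTER NFT_LOG NFT_LIMIT NFT_NAT NFT_COMPAT \
               NF_TABLES_IPV4 NF_TABLES_IPV6 NF_TABLES_BRIDGE; do
        # An enabled option appears as CONFIG_<NAME>=m or CONFIG_<NAME>=y.
        if ! grep -Eq "^CONFIG_${opt}=[my]\$" "$config_file"; then
            echo "$opt"
        fi
    done
}

# Usage, from the top of the kernel tree:
#   missing_nft_options .config
```

An empty output means that all the checked options are enabled.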
Now you can build your kernel with the usual commands.

On Debian, on a dual-core machine, you can run:

make -j 2 deb-pkg
Alternatively, you can use the old method:
CONCURRENCY_LEVEL=2 make-kpkg --revision 0.1 --rootcmd fakeroot --initrd --append-to-version nftables kernel_image kernel_headers

Debian users can also get a kernel built from the git sources. Other related packages are available in this directory.

Running it

Initial setup
To get an iptables-like chain setup, use the ipv4-filter file provided in the source:
nft -f files/nftables/ipv4-filter
You can then list the resulting chain:
nft list table filter
Note that filter, output and input are only the names chosen here for the table and the chains. Any other string could have been used.
Basic rule handling
To drop output to a destination:
nft add rule ip filter output ip daddr 1.2.3.4 drop
Rule counters are optional with nftables, and the counter keyword needs to be used to activate them:
nft add rule ip filter output ip daddr 1.2.3.4 counter drop
To match a whole network, you can directly use:
nft add rule ip filter output ip daddr 192.168.1.0/24 counter
To drop packets to port 80, the syntax is the following:
nft add rule ip filter input tcp dport 80 drop
To accept ICMP echo requests:
nft add rule filter input icmp type echo-request accept
To combine several criteria, simply chain the matches in the same rule:
nft add rule ip filter output ip protocol icmp ip daddr 1.2.3.4 counter drop
To delete all rules in a chain:
nft delete rule filter output
To delete one specific rule, you need to use the -a flag on nft to get the handle number:
# nft list table filter -a
table filter {
        chain output {
                 ip protocol icmp ip daddr 1.2.3.4 counter packets 5 bytes 420 drop # handle 10
...
You can then delete rule 10 with:
nft delete rule filter output handle 10
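Looking up the handle by eye gets tedious in scripts. As a rough sketch, the handle can be extracted from the listing with standard text tools. The helper name rule_handle is an assumption of this example, and it relies on the "# handle N" suffix shown in the listing above.

```shell
# Read a "nft list table ... -a" listing on stdin and print the handle of the
# first rule containing the given fixed string. Hypothetical helper.
rule_handle() {
    grep -F -- "$1" \
        | sed -n 's/.*# handle \([0-9][0-9]*\)[[:space:]]*$/\1/p' \
        | head -n 1
}

# Usage (needs root and an existing filter table):
#   handle=$(nft list table filter -a | rule_handle 'ip daddr 1.2.3.4')
#   nft delete rule filter output handle "$handle"
```

The match is a plain substring match, so 'ip daddr 1.2.3.4' would also select a rule on 1.2.3.40. Use a longer string if the ruleset contains similar rules.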
You can also flush the filter table:
nft flush table filter
It is possible to insert a rule at the beginning of a chain:
nft insert rule filter input tcp dport 80 counter accept
It is possible to insert or add a rule at a specific position. To do so you need to get the handle of the rule where you want to insert or add a new one. This is done by using the -a flag in the list operation:
# nft list table filter -n  -a
table filter {
        chain output {
                 type filter hook output priority 0;
                 ip protocol tcp counter packets 82 bytes 9680 # handle 8
                 ip saddr 127.0.0.1 ip daddr 127.0.0.6 drop # handle 7
        }
}
# nft  add rule filter output position 8 ip daddr 127.0.0.8 drop 
# nft list table filter -n -a
table filter {
        chain output {
                 type filter hook output priority 0;
                 ip protocol tcp counter packets 190 bytes 21908 # handle 8
                 ip daddr 127.0.0.8 drop # handle 10
                 ip saddr 127.0.0.1 ip daddr 127.0.0.6 drop # handle 7
        }
}
Here, we’ve added a rule after the rule with handle 8. To add before the rule with a given handle, you can use:
nft insert rule filter output position 8 ip daddr 127.0.0.12 drop
If you only want to match on a protocol, you can use something like:
nft insert rule filter output ip  protocol tcp counter
IPv6
As for IPv4, you need to create some chains. For that you can use:
nft -f files/nftables/ipv6-filter
You can then add a rule:
nft add rule ip6 filter output ip6 daddr home.regit.org counter
The rules can be listed with:
nft list table ip6 filter
To accept dynamic IPv6 configuration and neighbor discovery, one can use:
nft add rule ip6 filter input icmpv6 type nd-neighbor-solicit accept
nft add rule ip6 filter input icmpv6 type nd-router-advert accept
Connection tracking
To accept all incoming packets of an established connection:
nft insert rule filter input ct state established accept
Filter on interface
To accept all packets going out on loopback interface:
nft insert rule filter output oif lo accept
And for packets coming in on eth2:
nft insert rule filter input iif eth2 accept
Please note that oif is in reality a match on an integer: the index of the interface inside the kernel. Userspace converts the given name to the interface index when the nft rule is evaluated (before it is sent to the kernel). One consequence is that the rule cannot be added if the interface does not exist. Another consequence is that if the interface is removed and created again, the rule will no longer match, because the index of newly added interfaces in the kernel is monotonically increasing. Thus, oif is a fast filter, but it can lead to issues when dynamic interfaces are used. It is possible to filter on the interface name instead, but this has a performance cost because a string match is done instead of an integer match. To filter on the interface name, use oifname:
nft insert rule filter output oifname ppp0 accept
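To see the index that an oif rule would be bound to, the kernel exposes it in sysfs under /sys/class/net/<interface>/ifindex. Here is a small sketch. The helper name ifindex_of and the SYSFS_NET override variable are assumptions made for this example only.

```shell
# Print the kernel index of an interface, or fail if it does not exist.
# SYSFS_NET is a hypothetical override, only here to ease testing.
ifindex_of() {
    index_file="${SYSFS_NET:-/sys/class/net}/$1/ifindex"
    [ -r "$index_file" ] && cat "$index_file"
}

# Usage: run it before and after recreating a dynamic interface.
#   ifindex_of ppp0
```

If the value printed changes after the interface has been recreated, a rule added with oif still refers to the old index, which is the situation described above.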
Logging

Logging is done with the log keyword. A typical log and accept rule looks like:

nft add rule filter input tcp dport 22 ct state new log prefix \"SSH for ever\" group 2 accept
With nftables, it is possible to do in one rule what was split in two with iptables (NFLOG and ACCEPT). The prefix option is the standard log prefix, and the group option contains the nfnetlink_log group, which applies when nfnetlink_log is used as the logging framework.

Logging in nftables uses the Netfilter logging framework, so what actually happens depends on the loaded kernel module. The available kernel modules are:

  • xt_LOG: printk-based logging, outputting everything to syslog (same module as the one used for the iptables LOG target)
  • nfnetlink_log: netlink-based logging, which requires setting up ulogd2 to get the events (same module as the one used for the iptables NFLOG target)
To use one of the two modules, load it with modprobe.

You can then set up logging on a per-protocol basis. The configuration is available in /proc:

# cat /proc/net/netfilter/nf_log 
 0 NONE (nfnetlink_log)
 1 NONE (nfnetlink_log)
 2 nfnetlink_log (nfnetlink_log,ipt_LOG)
 3 NONE (nfnetlink_log)
 4 NONE (nfnetlink_log)
 5 NONE (nfnetlink_log)
 6 NONE (nfnetlink_log)
 7 nfnetlink_log (nfnetlink_log)
 8 NONE (nfnetlink_log)
 9 NONE (nfnetlink_log)
10 nfnetlink_log (nfnetlink_log,ip6t_LOG)
11 NONE (nfnetlink_log)
12 NONE (nfnetlink_log)
Here nfnetlink_log was loaded first and ulogd was started. For example, if you want to use ipt_LOG for IPv4 (2 in the list), you can do:
echo "ipt_LOG" >/proc/sys/net/netfilter/nf_log/2 
This will activate ipt_LOG for IPv4 logging:
# cat /proc/net/netfilter/nf_log 
 0 NONE (nfnetlink_log)
 1 NONE (nfnetlink_log)
 2 ipt_LOG (nfnetlink_log,ipt_LOG)
 3 NONE (nfnetlink_log)
 4 NONE (nfnetlink_log)
 5 NONE (nfnetlink_log)
 6 NONE (nfnetlink_log)
 7 nfnetlink_log (nfnetlink_log)
 8 NONE (nfnetlink_log)
 9 NONE (nfnetlink_log)
10 nfnetlink_log (nfnetlink_log,ip6t_LOG)
11 NONE (nfnetlink_log)
12 NONE (nfnetlink_log)

If you want to do some easy testing, simply load the xt_LOG module before nfnetlink_log. It will bind to the IPv4 and IPv6 protocols and provide logging.
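The nf_log listing is easier to read once the families without a logger are filtered out. The following sketch assumes the three-column format shown above (family number, active logger, available loggers). The helper name active_loggers is made up for this example.

```shell
# Read the content of /proc/net/netfilter/nf_log on stdin and print
# "<family number> <active logger>" for every family that has one.
active_loggers() {
    awk 'NF >= 2 && $2 != "NONE" { print $1, $2 }'
}

# Usage:
#   active_loggers < /proc/net/netfilter/nf_log
```

On the second listing above, this would keep only the lines for families 2, 7 and 10, with 2 being IPv4 as noted earlier.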

Using one single chain
Chains are defined by the user and can be arranged in any way. On a single box, for example, it is possible to use one single chain for input. To do so, create a file named onechain with:
#! nft -f

table global {
        chain one { 
                type filter hook input priority   0;
        }
}
and run:
nft -f onechain
You can then add rules like:
nft add rule ip global one ip daddr 192.168.0.0/24

The advantage of this setup is that Netfilter filtering will only be active for packets coming to the box.

Set
You can use anonymous (unnamed) sets with the following syntax:
nft add rule ip filter output ip daddr {192.168.1.1, 192.168.1.4} drop
A set can also be defined as a variable in a file. For example, you can create a file named simple containing:
define ip_set = {192.168.1.2, 192.168.2.3}
add rule filter output ip daddr $ip_set counter
and load it with:
nft -f simple
It is also possible to use named sets. To declare a set containing IPv4 addresses:
nft add set filter ipv4_ad { type ipv4_address\;}
To add elements to the set:
nft add element filter ipv4_ad { 192.168.3.4 }
nft add element filter ipv4_ad { 192.168.1.4, 192.168.1.5 }
Listing the set is done via:
nft list set filter ipv4_ad
The set can then be used in rule:
nft add rule ip filter input ip saddr @ipv4_ad drop
It is possible to remove an element from an existing set:
nft delete element filter ipv4_ad { 192.168.1.5 }
and to delete a set:
nft delete set filter ipv4_ad
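When the addresses come from a text file, the add element command can be generated instead of typed. This is only a sketch: the helper name set_elements and the file names in the usage comment are assumptions, and the generated line simply reuses the add element syntax shown above.

```shell
# Read one address per line on stdin and print a single "add element"
# command for the given table and set. Hypothetical helper.
set_elements() {
    table="$1"
    set_name="$2"
    # Skip blank lines and join the remaining ones with ", ".
    addresses=$(grep -v '^[[:space:]]*$' | paste -sd, - | sed 's/,/, /g')
    [ -n "$addresses" ] && echo "add element $table $set_name { $addresses }"
}

# Usage (blocklist.txt and elements are assumed file names):
#   set_elements filter ipv4_ad < blocklist.txt > elements
#   nft -f elements
```

The function prints nothing and returns a non-zero status when the input contains no address, so an empty file does not produce an invalid command.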
Mapping
Mappings are a specific type of set that behaves like a dictionary. For example, it is possible to map an ipv4_address to a verdict:
# nft -i
nft> add map filter verdict_map { type ipv4_address : verdict; }
nft> add element filter verdict_map { 1.2.3.5 : drop}
nft> add element filter verdict_map { 1.2.3.4 : accept}

nft> add rule filter output ip daddr vmap @verdict_map

To delete one element of a mapping, you can use the same syntax as for the set operation:

nft> delete element filter verdict_map { 1.2.3.5 }
To delete the mapping itself, you can use:
nft delete set filter verdict_map
Mappings can also be used anonymously:
nft add rule filter output ip daddr vmap {192.168.0.0/24 : drop, 192.168.0.1 : accept}

To list a specific mapping:

nft list set filter nat_map -n
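For a mapping with many entries, the elements can be generated from a two-column list in the same spirit. Below is a minimal sketch, assuming one "address verdict" pair per line. The helper name vmap_elements and the file names are invented for this example, and the output follows the add element syntax used for verdict_map above.

```shell
# Read "address verdict" pairs on stdin (one per line) and print a single
# "add element" command for the given table and mapping. Hypothetical helper.
vmap_elements() {
    awk -v table="$1" -v map="$2" '
        # Only lines with exactly two fields are taken into account.
        NF == 2 { items = items (items ? ", " : "") $1 " : " $2 }
        END { if (items) printf "add element %s %s { %s }\n", table, map, items }'
}

# Usage (verdicts.txt and map-elements are assumed file names):
#   vmap_elements filter verdict_map < verdicts.txt > map-elements
#   nft -f map-elements
```

Lines that do not have exactly two fields are silently skipped, which keeps blank lines harmless but also hides typos in the input, so a stricter check may be preferable for real use.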

NAT

First of all, the nat module is needed:
modprobe nft_nat
Next, you need to make the kernel aware of NAT for the protocol (here IPv4):
modprobe nft_chain_nat_ipv4
Now, we can create the chains dedicated to NAT:
nft add table nat
nft add chain nat post { type nat hook postrouting priority 0 \; }
nft add chain nat pre { type nat hook prerouting priority 0 \; }
We can now add NAT rules:
nft add rule nat post ip saddr 192.168.56.0/24 oif wlan0 snat 192.168.1.137
nft add rule nat pre udp dport 53 ip saddr 192.168.56.0/24 dnat 8.8.8.8:53
The first rule NATs all traffic from 192.168.56.0/24 going out on the wlan0 interface to the IP 192.168.1.137. The second one redirects all DNS traffic from 192.168.56.0/24 to the 8.8.8.8 server. It is possible to NAT to a range of addresses:
nft add rule nat post ip saddr 192.168.56.0/24 oif wlan0 snat 192.168.1.137-192.168.1.140
IPv6 NAT is possible too. First, you need to load the module to declare the NAT capability for IPv6:
modprobe nft_chain_nat_ipv6
Once done, you can add rules like:
table ip6 nat {
    chain postrouting {
        type nat hook postrouting priority -150; 
        ip6 saddr 2::/64 snat 1::3;
    }
}
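The IPv4 NAT commands shown earlier can also be kept in a file, in the same table and chain layout as the IPv6 example. The sketch below writes such a file. The function name write_nat_ruleset and the file name nat-rules are assumptions, and the rule syntax is copied from the commands above, so it may need adjusting for other nft versions.

```shell
# Write the IPv4 NAT chains and rules shown above as a single ruleset file.
# The function name and the target file are illustrative assumptions.
write_nat_ruleset() {
    cat > "$1" <<'EOF'
table ip nat {
        chain post {
                type nat hook postrouting priority 0;
                ip saddr 192.168.56.0/24 oif wlan0 snat 192.168.1.137
        }
        chain pre {
                type nat hook prerouting priority 0;
                udp dport 53 ip saddr 192.168.56.0/24 dnat 8.8.8.8:53
        }
}
EOF
}

# Usage (as root, after loading the NAT modules):
#   write_nat_ruleset nat-rules
#   nft -f nat-rules
```

Keeping the rules in a file makes the setup reproducible at boot, but loading the file a second time would likely add the rules again, so flush the table first if needed.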

Building a basic ruleset

The following is a typical ruleset to protect a laptop for both IPv4 and IPv6:
# IPv4 filtering
table Filter {
        chain Input {
                 type filter hook input priority 0;
                 ct state established accept
                 ct state related accept
                 iif lo accept
                 tcp dport ssh counter accept
                 counter log drop
        }

        chain Output {
                 type filter hook output priority 0;
                 ct state established accept
                 ct state related accept
                 oif lo accept
                 ct state new counter accept
        }
}
#IPv6 filtering
table ip6 Filter {
        chain Input {
                 type filter hook input priority 0;
                 ct state established accept
                 ct state related accept
                 iif lo accept
                 tcp dport ssh counter accept
                 icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept
                 counter log drop
        }

        chain Output {
                 type filter hook output priority 0;
                 ct state established accept
                 ct state related accept
                 oif lo accept
                 ct state new counter accept
        }

}

  71 Responses to “Nftables quick howto”

  1. I get this error:

    nft list table filter
    :1:1-17: Error: Could not receive sets from kernel: Address family not supported by protocol
    list table filter
    ^^^^^^^^^^^^^^^^^

    is this related to netlink?

  2. Hello Matteo,

    You need to create the table first (with ‘nft add table filter’). Command to handle ruleset (and avoid this kind of weird messages) should be available in upcoming release.

  3. […] So far, very little documentation is available. You can find my nftables quick howto, and some other preliminary documentation will be made public soon. […]

  4. Some logging clarification needed:
    Is there a way to configure system to use nfnetlink_log by default for all the protocols? I want ulogd2 to handle all my logging.
    Is there a way to use sysctl for that? – seems more convenient to use at boot time than echo “” to proc.
    How does protocol correspond to magic number in proc? I mean ” for IPv4 (2 in the list)” – what corresponds to 1? 3? 0? etc. – where is it documented?

  5. Hi Lork,

    Yes load nfnetlink_log, it should install itself as logging method for all protocols.

    Regarding protocol value, this is in fact the one defined in socket.h: http://students.mimuw.edu.pl/SO/Linux/Kod/include/linux/socket.h.html

  6. Sadly just loading nfnetlink_log is not sufficient – I see it listed by lsmod but all the /proc entries are still giving me NONE until I manually echo proper value in there.

    Is there some sysctl option or something like that to control nfnetlink_log module behavior?

    Btw, are there plans to make use of more human-readable names in addition to numbers from socket.h? Symlinks in /proc perhaps? And what about something like /proc/net/netfilter/nf_log/all – file writing to which would be equivalent to writing to every single number-file?

  7. https://github.com/devkid/nftables-systemd – this is yet another variant how to start nftables automatically. Works like a charm on Debian.

  8. How do I accomplish this in nft?
    # MAKE SURE NEW INCOMING TCP CONNECTIONS ARE SYN PACKETS; OTHERWISE WE NEED TO DROP THEM
    iptables -A INPUT -p tcp ! --syn -m state --state NEW -j LOGNDROP

  9. Hello Dave,

    Excellent question. The answer is:

    nft add rule filter input tcp flags != syn ct state new log prefix "Not a full new" drop
    

    This is using the binary operation on the tcp flags which is seen as a bit field.

  10. Question:
    When will all this be part of the running kernel on CentOS and Ubuntu? This is awesome.

  11. Hi

    If the functionality is there yet, can you make a little post on how one can configure a router with nftables?

    I cant seem to get the forwarding correctly.

    Best regards
    Kristoffer

  12. […] for configuring it is missing in Ubuntu 14.04; it can be built or taken from […]

  13. Hi,

    I’ve setup NAT like this:

    table ip nat {
    chain postrouting {
    type nat hook postrouting priority 0;
    ip saddr 192.168.20.0/24 oif eth0 snat 192.168.1.7
    }
    }

    but it only works if I also load the module iptable_nat

    is it normal?

  14. Hi,

    I would like to know if nftables handle target like target CLASSIFY. Sample iptables rules as below.

    iptables -t mangle -A POSTROUTING -p tcp -m multiport --sports 80,443,8080 -m physdev --physdev-out eth0 -j CLASSIFY --set-class 1:2

    Thanks.

  15. […] entirely from a file on disk. The rule syntax does not resemble iptables and differs in its use of hierarchical […]

  16. […] entirely from a file on disk. The rule syntax does not resemble iptables and differs in its use of hierarchical […]

  17. I am trying to configure a map/dict for use in the nat table. The idea is to have a map of address translations for fast lookup. I can do translations with individual lines like this:

    nft add table nat
    nft add chain nat output { type nat hook output priority 0 \; }
    nft add chain nat input { type nat hook input priority 0 \; }
    nft add rule nat output ip daddr 1.1.1.1 dnat 192.168.0.1
    nft add rule nat output ip daddr 2.2.2.2 dnat 8.8.8.8

    This works, verified with tcpdump. But I want to use a map/dict because my intended use will have thousands of translation entries, using pre/postrouting instead of input/output.
    I tried making a map like this:

    nft add maps nat fakes {type ipv4_addr: ipv4_addr \; }
    nft add an element fakes nat {1.1.1.1: 192.168.0.1}
    nft add an element fakes nat {2.2.2.2: 8.8.8.8}

    but in neither case can I work out the syntax to use the map in a nat rule: nft add rule nat output ???.
    Is it even possible to use a map for daddr -> dnat address like this?

    Yours LinuxBox

  18. hi all,

    I am trying to write a packet sniffing program in C, but the code I am trying to run from various public sources does not return traffic from my Ethernet port 'em1' and always reads the 'nflog' interface.

    Code is as follows :-

    /***************************************************
     * file: testpcap1.c
     * Date: Thu Mar 08 17:14:36 MST 2001
     * Author: Martin Casado
     * Location: LAX Airport (hehe)
     *
     * Simple single packet capture program
     *****************************************************/
    #include <stdio.h>
    #include <stdlib.h>
    #include <pcap.h> /* if this gives you an error try pcap/pcap.h */
    #include <errno.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netinet/if_ether.h> /* includes net/ethernet.h */
    #include <time.h> /* ctime() */

    int main(int argc, char **argv)
    {
        int i;
        char *dev;
        //char dev[] = "em1"; /* Device to sniff on */
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *descr;
        const u_char *packet;
        struct pcap_pkthdr hdr; /* pcap.h */
        struct ether_header *eptr; /* net/ethernet.h */

        u_char *ptr; /* printing out hardware header info */

        /* grab a device to peek into... */
        dev = pcap_lookupdev(errbuf);

        if (dev == NULL)
        {
            printf("%s\n", errbuf);
            exit(1);
        }

        printf("DEV: %s\n", dev);

        /* open the device for sniffing.

        pcap_t *pcap_open_live(char *device, int snaplen, int prmisc, int to_ms,
        char *ebuf)

        snaplen - maximum size of packets to capture in bytes
        promisc - set card in promiscuous mode?
        to_ms - time to wait for packets in milliseconds before read
        times out
        errbuf - if something happens, place error string here

        Note if you change "prmisc" param to anything other than zero, you will
        get all packets your device sees, whether they are intended for you or
        not!! Be sure you know the rules of the network you are running on
        before you set your card in promiscuous mode!! */

        descr = pcap_open_live(dev, BUFSIZ, 0, -3, errbuf);

        if (descr == NULL)
        {
            printf("pcap_open_live(): %s\n", errbuf);
            exit(1);
        }

        /*
        grab a packet from descr (yay!)
        u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h)
        so just pass in the descriptor we got from
        our call to pcap_open_live and an allocated
        struct pcap_pkthdr */

        packet = pcap_next(descr, &hdr);

        if (packet == NULL)
        { /* dinna work *sob* */
            printf("Didn't grab packet\n");
            exit(1);
        }

        /* struct pcap_pkthdr {
        struct timeval ts; time stamp
        bpf_u_int32 caplen; length of portion present
        bpf_u_int32 len; length of this packet (off wire)
        }
        */

        printf("Grabbed packet of length %u\n", hdr.len);
        printf("Received at ..... %s\n", ctime((const time_t *)&hdr.ts.tv_sec));
        printf("Ethernet address length is %d\n", ETHER_HDR_LEN);

        /* lets start with the ether header... */
        eptr = (struct ether_header *) packet;

        /* Do a couple of checks to see what packet type we have.. */
        if (ntohs(eptr->ether_type) == ETHERTYPE_IP)
        {
            printf("Ethernet type hex:%x dec:%d is an IP packet\n",
                   ntohs(eptr->ether_type),
                   ntohs(eptr->ether_type));
        } else if (ntohs(eptr->ether_type) == ETHERTYPE_ARP)
        {
            printf("Ethernet type hex:%x dec:%d is an ARP packet\n",
                   ntohs(eptr->ether_type),
                   ntohs(eptr->ether_type));
        } else {
            printf("Ethernet type %x not IP", ntohs(eptr->ether_type));
            exit(1);
        }

        /* copied from Stevens' UNP */
        ptr = eptr->ether_dhost;
        i = ETHER_ADDR_LEN;
        printf(" Destination Address: ");
        do {
            printf("%s%x", (i == ETHER_ADDR_LEN) ? " " : ":", *ptr++);
        } while (--i > 0);
        printf("\n");

        ptr = eptr->ether_shost;
        i = ETHER_ADDR_LEN;
        printf(" Source Address: ");
        do {
            printf("%s%x", (i == ETHER_ADDR_LEN) ? " " : ":", *ptr++);
        } while (--i > 0);
        printf("\n");

        return 0;
    }

    and it returns the output as :-

    DEV : nflog

    and then it keeps blinking.

    What can be the issue here?

    I also tried setting the device to em1, but the output was as follows:

    DEV: em1
    Didn't grab packet

    tcpdump -D gives the following output:

    1.nflog (Linux netfilter log (NFLOG) interface)
    2.nfqueue (Linux netfilter queue (NFQUEUE) interface)
    3.em1
    4.usbmon1 (USB bus number 1)
    5.usbmon2 (USB bus number 2)
    6.usbmon3 (USB bus number 3)
    7.usbmon4 (USB bus number 4)
    8.any (Pseudo-device that captures on all interfaces)
    9.lo

  19. […] iptables syntax while gradually introducing the new nft syntax. The layer working on top of nftables includes […]

  20. […] that under Linux, with iptables (soon to be deprecated, by the way, in favor of nftables (wiki, howto)), you can play paranoid in a few […]
