Using NFQUEUE and libnetfilter_queue

Introduction

NFQUEUE is an iptables and ip6tables target which delegates the decision on packets to a userspace program. For example, the following rule will ask a listening userspace program for a decision on every packet going to the box:

iptables -A INPUT -j NFQUEUE --queue-num 0

In userspace, a program must use libnetfilter_queue to connect to queue 0 (the default one) and get the messages from the kernel. It must then issue a verdict on each packet.

Inner workings

To understand NFQUEUE, the easiest way is to look at the architecture inside the Linux kernel. When a packet reaches an NFQUEUE target, it is enqueued to the queue corresponding to the number given by the --queue-num option. The packet queue is implemented as a chained list whose elements are the packets and their metadata (a Linux kernel skb):

  • It is a fixed-length queue implemented as a linked list of packets
  • Packets are indexed by an integer
  • A packet is released only when userspace issues a verdict for the corresponding index
  • When the queue is full, no new packet can be enqueued

This has some implications on the userspace side:

  • Userspace can read multiple packets and wait before issuing verdicts. As long as the queue is not full, this behavior has no impact.
  • Packets can be verdicted out of order: userspace can read packets 1, 2, 3, 4 and issue verdicts for 4, 2, 3, 1 in that order.
  • Issuing verdicts too slowly will result in a full queue. The kernel will then drop incoming packets instead of enqueuing them.

A few words about the protocol between kernel and userspace

The protocol used between kernel and userspace is nfnetlink. This is a message-based protocol which does not involve any shared memory. When a packet is enqueued, the kernel sends a nfnetlink-formatted message containing the packet data and related information to a socket, and userspace reads this message. To issue a verdict, userspace formats a nfnetlink message containing the index number of the packet and sends it to the communication socket.

Using libnetfilter_queue in C

The main source of information about libnetfilter_queue is the Doxygen-generated documentation.

There are three steps in the usage of the library: connecting to a queue, reading the packets coming from the kernel, and issuing verdicts on them.

If you want to look at production code, you can have a look at source-nfq.c in Suricata, which is a multithreaded implementation based on libnetfilter_queue.
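
The setup phase is left out of the examples below for brevity. A minimal sketch of it, with error handling omitted, could look like the following; the names h, qh, fd and cb are reused in the snippets that follow:
[C]
#include <sys/socket.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

/* Setup phase: open the library, bind to queue 0 and register the callback cb() */
struct nfq_handle *h = nfq_open();
nfq_unbind_pf(h, AF_INET);                    /* remove a possible stale binding */
nfq_bind_pf(h, AF_INET);                      /* handle IPv4 packets */
struct nfq_q_handle *qh = nfq_create_queue(h, 0, &cb, NULL);
nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);  /* copy whole packets to userspace */
int fd = nfq_fd(h);                           /* netlink socket used by the read loop */
[/C]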

Example software architecture

The simplest architecture is made of one thread reading the packets and issuing the verdicts. The following code is not complete but shows the logic of the implementation.
[C]
/* Definition of callback function */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    int verdict;
    u_int32_t id = treat_pkt(nfa, &verdict); /* Treat packet */
    return nfq_set_verdict(qh, id, verdict, 0, NULL); /* Verdict packet */
}

/* Set callback function */
qh = nfq_create_queue(h, 0, &cb, NULL);
for (;;) {
    if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0) {
        nfq_handle_packet(h, buf, rv); /* send packet to callback */
        continue;
    }
}
[/C]
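
The treat_pkt() function above is just a placeholder for your own processing. A hypothetical minimal version, using the library accessors to get the packet index and the IP payload, could look like this (older library versions declare the payload argument of nfq_get_payload() as char ** instead of unsigned char **):
[C]
#include <sys/types.h>
#include <arpa/inet.h>                         /* ntohl */
#include <linux/netfilter.h>                   /* NF_ACCEPT, NF_DROP */
#include <libnetfilter_queue/libnetfilter_queue.h>

/* Hypothetical treat_pkt(): return the packet index and choose a verdict */
static u_int32_t treat_pkt(struct nfq_data *nfa, int *verdict)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    u_int32_t id = ph ? ntohl(ph->packet_id) : 0;  /* index used for the verdict */

    unsigned char *payload;
    int len = nfq_get_payload(nfa, &payload);      /* IP packet, if copied by the kernel */

    /* ... inspect payload[0 .. len-1] here ... */
    *verdict = (len >= 0) ? NF_ACCEPT : NF_DROP;
    return id;
}
[/C]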

It is also possible to have a reading thread and a verdict thread:
[C]

PacketPool *ppool;

/* Definition of callback function */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    /* Simply copy the packet data and send it to the packet pool */
    return push_packet_to_pool(ppool, nfa);
}

int main() {
    /* Set callback function */
    qh = nfq_create_queue(h, 0, &cb, NULL);
    /* create reading thread */
    pthread_create(&read_thread_id, NULL, read_thread, qh);
    /* create verdict thread */
    pthread_create(&write_thread_id, NULL, verdict_thread, qh);
    /* … */
}

static void *read_thread(void *fd)
{
    for (;;) {
        if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0) {
            nfq_handle_packet(h, buf, rv); /* send packet to callback */
            continue;
        }
    }
}

static void *verdict_thread(void *fd)
{
    for (;;) {
        int verdict;
        Packet p = fetch_packet_from_pool(ppool);
        u_int32_t id = treat_pkt(&p, &verdict); /* Treat packet */
        nfq_set_verdict(qh, id, verdict, 0, NULL); /* Verdict packet */
    }
}
[/C]
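
PacketPool, push_packet_to_pool() and fetch_packet_from_pool() are not part of libnetfilter_queue; they stand for any thread-safe structure of your own. A minimal sketch, assuming only the packet index is needed for the verdict (copy the payload as well if your processing requires it), could be a mutex-protected ring buffer:
[C]
#include <pthread.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

#define POOL_SIZE 1024

typedef struct { u_int32_t id; } Packet;

typedef struct {
    Packet pkts[POOL_SIZE];
    int head, tail, count;
    pthread_mutex_t lock;       /* initialise with pthread_mutex_init() */
    pthread_cond_t not_empty;   /* initialise with pthread_cond_init() */
} PacketPool;

/* Called from the callback: copy what we need (here only the packet id),
 * because nfa is only valid for the duration of the callback. */
static int push_packet_to_pool(PacketPool *pool, struct nfq_data *nfa)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    pthread_mutex_lock(&pool->lock);
    if (pool->count < POOL_SIZE) {
        pool->pkts[pool->head].id = ph ? ntohl(ph->packet_id) : 0;
        pool->head = (pool->head + 1) % POOL_SIZE;
        pool->count++;
        pthread_cond_signal(&pool->not_empty);
    }
    /* else: pool full, the packet stays unverdicted; handle this case in real code */
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Called from the verdict thread: block until a packet is available */
static Packet fetch_packet_from_pool(PacketPool *pool)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->count == 0)
        pthread_cond_wait(&pool->not_empty, &pool->lock);
    Packet p = pool->pkts[pool->tail];
    pool->tail = (pool->tail + 1) % POOL_SIZE;
    pool->count--;
    pthread_mutex_unlock(&pool->lock);
    return p;
}
[/C]
Whatever structure you use, keep in mind that the data pointed to by nfa is only valid while the callback runs, so at least the packet id must be copied out before returning.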

Other languages

Pierre Chifflier (aka pollux) has developed bindings for libnetfilter_queue which can be used from most high-level languages (Python, Perl, Ruby, …): nfqueue-bindings.

Advanced features

Multiqueue

--queue-balance is an NFQUEUE option added by Florian Westphal to load balance packets queued by the same iptables rule across multiple queues. The usage is fairly simple. For example, to load balance INPUT traffic across queues 0 to 3, the following rule can be used.

iptables -A INPUT -j NFQUEUE --queue-balance 0:3

One point worth mentioning is that the load balancing is done per flow: all packets of a given flow are sent to the same queue.

The extension is available since Linux kernel 2.6.31 and iptables v1.4.5.
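
On the userspace side, a common way to consume the balanced queues is one socket and one thread per queue. Here is a sketch assuming queues 0 to 3 as in the rule above, a callback cb() defined as in the previous examples, and a hypothetical worker function queue_worker(); the nfq_unbind_pf()/nfq_bind_pf() calls from the setup sketch are left out:
[C]
#include <pthread.h>
#include <sys/socket.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

#define QUEUE_COUNT 4   /* queues 0 to 3, matching --queue-balance 0:3 */

/* One worker per queue: each has its own handle, queue binding and socket */
static void *queue_worker(void *arg)
{
    int queue_num = *(int *)arg;
    char buf[65536] __attribute__ ((aligned));
    int rv;

    struct nfq_handle *h = nfq_open();
    struct nfq_q_handle *qh = nfq_create_queue(h, queue_num, &cb, NULL);
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);
    int fd = nfq_fd(h);

    for (;;) {
        if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
            nfq_handle_packet(h, buf, rv);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[QUEUE_COUNT];
    int nums[QUEUE_COUNT];
    for (int i = 0; i < QUEUE_COUNT; i++) {
        nums[i] = i;
        pthread_create(&tid[i], NULL, queue_worker, &nums[i]);
    }
    for (int i = 0; i < QUEUE_COUNT; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
[/C]
Since the balancing is per flow, each worker sees complete flows, so no cross-thread state is needed for simple use cases.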

--queue-bypass

--queue-bypass is another NFQUEUE option by Florian Westphal. It changes the behavior of an iptables rule when no userspace software is connected to the queue: instead of being dropped, packets are accepted if nothing is listening to the queue.
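
For example, to get this behavior with the rule used in the introduction:

iptables -A INPUT -j NFQUEUE --queue-num 0 --queue-bypass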

The extension is available since Linux kernel 2.6.39 and iptables v1.4.11.

This feature is broken from kernel 3.10 to 3.12: when using a recent iptables, passing the option --queue-bypass has no effect on these kernels.

fail-open

This option is available since Linux 3.6 and allows accepting packets instead of dropping them when the queue is full. An example usage can be found in Suricata.
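
There is no iptables flag for fail-open: it is requested from userspace when configuring the queue. A minimal sketch, assuming a libnetfilter_queue recent enough to provide nfq_set_queue_flags() and a queue handle qh obtained from nfq_create_queue():
[C]
#include <libnetfilter_queue/libnetfilter_queue.h>

/* Ask the kernel to ACCEPT packets instead of dropping them when the queue
 * is full. Requires Linux >= 3.6. */
uint32_t flags = NFQA_CFG_F_FAIL_OPEN;
uint32_t mask  = NFQA_CFG_F_FAIL_OPEN;
if (nfq_set_queue_flags(qh, mask, flags) < 0) {
    /* kernel or library too old: the queue keeps the default fail-close behavior */
}
[/C]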

batching verdict

Since Linux 3.1, it is possible to use batch verdicts. Instead of sending a verdict for a single packet, a verdict can be sent for all packets with an id up to a given id. To do so, one must use the nfq_set_verdict_batch or nfq_set_verdict_batch2 functions.

This system has a performance advantage, as limiting the number of messages increases the achievable packet rate. But it can introduce latency, since packets are verdicted all at once. It is thus the responsibility of the userspace software to use adaptive techniques that limit latency by issuing verdicts sooner, notably when there are few packets.
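
A minimal sketch of a batched accept, assuming a callback that simply tracks the highest packet id seen and flushes every BATCH_SIZE packets (a real implementation would also flush on a timer to bound latency, as discussed above):
[C]
#include <sys/types.h>
#include <arpa/inet.h>
#include <linux/netfilter.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

#define BATCH_SIZE 20

static u_int32_t batch_id;   /* highest packet id seen since the last flush */
static int batch_count;

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    if (ph)
        batch_id = ntohl(ph->packet_id);
    if (++batch_count >= BATCH_SIZE) {
        /* one verdict for every queued packet with an id up to batch_id */
        nfq_set_verdict_batch(qh, batch_id, NF_ACCEPT);
        batch_count = 0;
    }
    return 0;
}
[/C]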

Misc

nfnetlink_queue entry in /proc

nfnetlink_queue has a dedicated entry in /proc: /proc/net/netfilter/nfnetlink_queue

cat /proc/net/netfilter/nfnetlink_queue 
   40  23948     0 2 65531     0     0      106  1

The fields are the following:

  • queue number
  • peer portid: usually the process ID of the software listening to the queue
  • queue total: current number of packets waiting in the queue
  • copy mode: 0 and 1 mean the messages only provide metadata; 2 means the messages carry packet data, up to copy range bytes
  • copy range: maximum length of packet data to put in the message
  • queue dropped: number of packets dropped because the queue was full
  • user dropped: number of packets dropped because the netlink message could not be sent to userspace. If this counter is not zero, try to increase the netlink buffer size (see the sketch after this list). On the application side, you will see gaps in the packet ids if netlink messages are lost.
  • id sequence: packet id of the last packet
  • the last field is a constant 1 and can be ignored
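
Both dropped counters can usually be brought back to zero by tuning from the application side. A minimal sketch, using nfq_set_queue_maxlen() for the in-kernel queue length and nfnl_rcvbufsiz() for the netlink socket buffer; the values are placeholders that depend on your traffic, and qh and h come from the setup shown earlier:
[C]
#include <libnetfilter_queue/libnetfilter_queue.h>
#include <libnfnetlink/libnfnetlink.h>

/* Enlarge the in-kernel queue: helps against "queue dropped" */
nfq_set_queue_maxlen(qh, 4096);

/* Enlarge the netlink receive buffer (in bytes): helps against "user dropped" */
nfnl_rcvbufsiz(nfq_nfnlh(h), 4096 * 4096);
[/C]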

Frequently Asked Questions

libnetfilter_queue and multithread

libnetfilter_queue relies on messages sent to a socket. The send/recv operations need to be protected by a lock to avoid concurrent writes. This means that the nfq_set_verdict2 and nfq_handle_packet functions need to be protected by a locking mechanism.

Receiving a message and sending a message are completely separate operations which do not share any memory. In particular, a verdict only uses the packet index as information. So as long as the locking is in place, different threads can issue verdicts for any packet in the queue.
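
A minimal sketch of such locking, assuming one reading thread and one or more verdict threads sharing the same handle; the wrapper names locked_verdict() and locked_handle() and the mutex queue_mutex are hypothetical:
[C]
#include <pthread.h>
#include <sys/types.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Used by the verdict thread(s) */
static int locked_verdict(struct nfq_q_handle *qh, u_int32_t id, u_int32_t verdict)
{
    pthread_mutex_lock(&queue_mutex);
    int ret = nfq_set_verdict(qh, id, verdict, 0, NULL);
    pthread_mutex_unlock(&queue_mutex);
    return ret;
}

/* Used by the reading thread for each message received on the socket.
 * Note: the callback invoked by nfq_handle_packet() must not call
 * locked_verdict() itself, or it would deadlock on queue_mutex. */
static void locked_handle(struct nfq_handle *h, char *buf, int rv)
{
    pthread_mutex_lock(&queue_mutex);
    nfq_handle_packet(h, buf, rv);
    pthread_mutex_unlock(&queue_mutex);
}
[/C]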

Packet reordering

Packet reordering can easily be done with NFQUEUE, as the verdict operation can be issued for any enqueued packet. One thing to consider, though, is that the kernel stores queued packets in a linked list, so issuing a verdict for a packet that is not at the start of the list (the oldest packet comes first) is costly.

libnetfilter_queue and zero copy

As communication between kernel and userspace is based upon messages sent to a netlink socket, there is no such thing as zero copy. Patrick McHardy has started a memory mapped implementation of netlink and, thus, zero copy may be possible in the future.

66 thoughts on “Using NFQUEUE and libnetfilter_queue”

  1. Excellent information.

    Do you know what the performance impacts of iptables is?
    Is there an overhead for ip_filter in the absence of an rules? (an engineer in our company claimed 10-15% performance impact)
    What is the cost for iptables to select packets (is there a per rule evaluation cost? )
    What is cost to enqueue and send to user space.
    Verdict cost ? (best / worst case?)

    I have been trying to determine the issues you documented, as well as these performance issues.

    Thanks for your analysis,and documentation.

  2. The performance cost of iptables is totally dependent on how many rules – including, how many matches and targets – you execute. There is also a static cost for entering the filter hook and then not do any work due to an empty ruleset, which is why iptables successors allow to disconnect the hook.

  3. very good information

    You say : “it is possible to use batch verdict. Instead of sending a verdict for one packet, it is possible to send a verdict to all packets with an id inferior to a given id. To do so, one must used the nfq_set_verdict_batch or nfq_set_verdict_batch2 functions.”

    ok ,by this defination anyway we can’t buffer the packets and process them in userspace and then set verdict for every packet.
    we can just process packets one by one and set the verdict for a packet or set the verdict for a packet and inferior packets to this packet ,is that right ?

    we can’t have parallel processing on packets and set the verdict ?

    best kaiwan

  4. Hello kaiwan,

    You can have the following architecture: [READING thread] (Packet queue) [VERDICT thread]
    This architecture is, for example, used in some Suricata running modes. So you can have parallel processing of packets and verdicts.

  5. thanks to answer Regit

    can u explain some more.
    i wanna use GPU to create a lote of thread for example 20 thread for 20 packets, and to do this i need to buffer the packets.
    and i used libnetfilter_queue to accept or drop these packets.
    my processing code doesn’t run on cpu (i mean the processing of comparison packets with ruleset), it run on GPU

    best

  6. Hello again kaiwan!

    Did you see the two code samples I’ve just added? They should show you how this can be done.

    You can buffer the packets by copying data, packet id, and needed metadata to a custom structure (used as buffer). Then to issue the verdict on one packet, just issue the verdict on the packet id.

  7. hello again,sorry to bother u
    this is my code, it’s not parallel ,first i want to buffer the packets in callback function ( cb function), about 10 packet (icmp packet), then set verdict by calling the function filter (struct nfq_data *tb, struct nfq_q_handle * qh) , but i just give the reply of one packet in this code ,
    i read ur code maybe my code is same ,if it’s not plz tell me what’s the problem, can i work by this library (libnetfilter_queue) or for this application i have to change the library to get the packets and buffer them and finaly reinject them .

    best

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    #include
    #include

    static void filter (struct nfq_data *tb, struct nfq_q_handle * qh);
    static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
    struct nfq_data *nfa, void *data);

    struct nfq_handle
    {
    struct nfnl_handle *nfnlh;
    struct nfnl_subsys_handle *nfnlssh;
    struct nfq_q_handle *qh_list;
    };
    struct nfq_data {
    struct nfattr **data;
    };
    struct nfq_q_handle * queuehandle;
    struct nfq_data nfqdata[10];

    #define buffer 10
    int counter;

    static void filter (struct nfq_data *tb, struct nfq_q_handle * qh)
    {
    int packet_id = 0;
    unsigned char *full_packet;
    struct nfqnl_msg_packet_hdr *ph;
    struct iphdr *ip;
    struct in_addr ipa;
    //struct nfq_data *packet = (nfq_data *)malloc(sizeof(nfq_data));
    int i;
    for (i = 0; i saddr;
    strcpy(src_ip_str, inet_ntoa(ipa));
    ipa.s_addr = ip->daddr;
    strcpy(dst_ip_str, inet_ntoa(ipa));

    ph = nfq_get_msg_packet_hdr(tb + i);
    if (ph)
    packet_id = ntohl(ph->packet_id);
    nfq_set_verdict(qh, packet_id, NF_ACCEPT, 0, NULL);
    }

    }

    static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
    struct nfq_data *nfa, void *data)
    {

    memcpy(nfqdata + counter, nfa,sizeof(nfq_data));
    counter++;
    if(counter == buffer){
    printf(“inside the callback\n”);
    filter(nfqdata, qh);
    counter = 0;
    }
    return 1;
    }

    int main(int argc, char **argv)
    {
    struct nfq_handle *h;
    struct nfq_q_handle *qh;
    int fd;
    int rv;
    char buf[4096] __attribute__ ((aligned));

    printf(“opening library handle\n”);
    h = nfq_open();
    if (!h) {
    fprintf(stderr, “error during nfq_open()\n”);
    exit(1);
    }

    printf(“unbinding existing nf_queue handler for AF_INET (if any)\n”);
    if (nfq_unbind_pf(h, AF_INET) < 0) {
    fprintf(stderr, "error during nfq_unbind_pf()\n");
    exit(1);
    }

    printf("binding nfnetlink_queue as nf_queue handler for AF_INET\n");
    if (nfq_bind_pf(h, AF_INET) < 0) {
    fprintf(stderr, "error during nfq_bind_pf()\n");
    exit(1);
    }

    printf("binding this socket to queue '0'\n");
    qh = nfq_create_queue(h, 0, &cb, NULL);
    if (!qh) {
    fprintf(stderr, "error during nfq_create_queue()\n");
    exit(1);
    }

    printf("setting copy_packet mode\n");
    if (nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff) = 0)
    {
    nfq_handle_packet(h, buf, rv);
    }
    ////////////////////////////////////////////////////////////////////////////////////////////////////
    //filter(nfqdata, qh);

    printf(“unbinding from queue 0\n”);
    nfq_destroy_queue(qh);

    #ifdef INSANE
    /* normally, applications SHOULD NOT issue this command, since
    * it detaches other programs/sockets from AF_INET, too ! */
    printf(“unbinding from AF_INET\n”);
    nfq_unbind_pf(h, AF_INET);
    #endif

    printf(“closing library handle\n”);
    nfq_close(h);

    exit(0);
    }

  8. Regit,
    Very useful information indeed…
    I have a query here.. what about the performance issues in case NFQUEUE is being used along with a proxy ?

  9. yes, useful iformation
    if u ask me, i didn’t work on performance until now but i want
    at the first time i do the performance testing i’ll take the result here

    so thanks
    best

  11. Hi Eric,

    I was exploring the NFQUEUEs with tcp-proxy and came accross with an open source implementation “pepsal” .
    It uses libetfilter_queues and implements the split-proxy approach for TCP.
    I have few observations which i would like to share with you

    1. It is queuing the tcp syn packets from PREROUTING chain of Mangle Table.
    2. It is also redirecting all tcp segments to a random port(say 5000) on the prerouting chain of nat table.
    3. On the Application part, there are three major blocks/threads.
    a) it is running a tcp server on INADDR_ANY with port 5000
    b) In the nfqueue callback function, it is extracting the dst ip and dst port which is being requested from a client machine and creating another tcp session with the end server which was being requested from client.
    c) It have worker threads for passing the data to client via tcp server running on 5000.

    Now what is confusing here is, How the TCP Server can connect to client ? as the client has requested for server ip of endserver with a different port.

    Trying below to show the network topology

    =Client Machine : IP 1 = = IP1 : Proxy Machine : IP2 = = IP2 : End Server =

    Now the tcp client server break-up

    Client Machine = TCP Client Requesting for IP2:Port1
    Proxy Machine = [TCP Server listening on ANDDR_ANY:5000] [ TCP Client Requesting for IP2:PORT1 ] =
    Server Machine = TCP Server listening on IP2:Port1 =

    My doubt is how the session can be established between Proxy machine tcp server and client machine TCP client. Is it something to do with the nfqueues as we queuing the tcp syn packets coming from client and redirecting the tcp segments to port 5000. Does NF_ACCEPT verdict for syn packtes can change something that can make it acceptable by tcp server of INADDR_ANY:5000?

    Please help.
    I can post my sample source code also if required. However the open source link of pepsal is http://sourceforge.net/projects/pepsal/

    Thanks,
    Rakesh

  12. hello!
    But I have a question.What kind of packet is queued in nfqueue?
    mac,ip,or tcp/udp packet?
    many thanks.

  13. I am curious regarding the modification of packets. If the payload provided is an IP packet, does it mean the modification done cannot increase the packet size to more than 64K ? (limit of an IPv4 packet)

    Also, does libnetfilter_queue provide the packets in order received by the machine, or in order according to sequence number of the tcp header ?

    If we modify the payload of the IP packet, do we have to keep the headers consistent (fix the IP/TCP checksum, length fields, ack/seq numbers, etc…) ?

    Do we get the packets before IPSEC decryption or after ?

    I’m mostly interested in modifying TCP streams. Windows (Vista & higher) allows doing this through the Windows Filtering Platform APIs and libnetfilter_queue seems to be similar, but from the APIs I’m not clear that I can get data at the TCP layer directly. It seems rather that it’s the IP layer only, is that correct ?

  14. Hello Hassan,

    You can’t have a packet size > 64k. nfnetlink_queue acts at the IP level, so it would make little sense anyway. And the kernel does not split data: a check on this maximum size is done when the packet is reinjected.

    Packets are received in the order the machine received them.

    For checksums, you can set them to 0 and they will be computed inside the kernel. For the other fields, you have to do it yourself.

    Regarding IPsec, it all depends on the iptables rules you are using to send packets to userspace.

    Yes, it is IP level packet. No direct stream access.

  15. Dear Regit,

    Thanks for all your helpful explanations about libnetfilter_queue.

    I’m working on reordering software for TCP packets using an agregate connexion that will pass through two simultaneous links.

    I have 2 questions :

    1) sometimes, there is a little gap in my ID numbers :

    – New packet (socket:111616) id=80807, dev=3, payload=1500, seq=645648501, awaited_seq=645648501, diff=0
    In order delivery, id=80807, seq=645648501, checking for further deliveries.
    Buffered delivery, id=80779, seq=645649949, buffer_gauge=0/100, index=55.

    – New packet (socket:111616) id=80819, dev=2, payload=1380, seq=645664429, awaited_seq=645651397, diff=13032
    missing 11 ids : 80808 to 80818
    Out-of-order pkt, id=80819, seq=645664429, buffering : 1/100, index=55.

    –> is that because I have my queue is full and I’m dropping packets ?

    2) if I ask the socket where I’m reading packets :

    while ((rv = recv(fd, buf, sizeof(buf), 0)) && rv >= 0) {

    getsockopt(fd,SOL_SOCKET,SO_RCVBUF,(void *)&n, &m);
    printf(“\n- New packet (socket:%d) “,n);

    for it size, the socket responds with the default size of the TCP windows receiver :

    cat /proc/sys/net/core/rmem_default
    gives me : 111616

    It sounds strange because I feel like I’m filtering packets BEFORE they arrive at the TCP level.

    Thank you in advance if you can answer any of these questions.

    Best regards,
    Cedric.

  16. Hello Cedric,

    1) It is not a queue-full problem. When the queue is full, the id counter is not incremented (but you get a kernel message). A gap in packet ids is only possible if a netlink message failed to be sent. In that case the queue_user_dropped counter is incremented; it is the 7th number in /proc/net/netfilter/nfnetlink_queue (the first being the queue number).

    2) I’m not sure I understand this question very well. From my point of view, it is more likely a coincidence.

  17. Hello Regit,
    I am new to libnetfilter_queue library, How can I see the payload of packet. For example If I type “hello world” in google, then the packet will be captured and How can I see the payload part “Hello world” ?

  18. Hello Regit,

    I forgot to mention that when I give command “sudo iptables -A INPUT -i eth0 -j NFQUEUE --queue-num 0”, my internet does not work. I cannot open any website, Will you please explain this, why is it so ?

  19. hello thanks for your artical.
    when i set iptables -A OUTPUT -j NFQUEUE --queue-num 0, and return nfq_set_verdict(qh, id, accept, 0, NULL);
    i can not access internet.
    and when i set iptables -A INPUT -j NFQUEUE --queue-num 0, and return nfq_set_verdict(qh, id, accept, 0, NULL);
    it works well.
    i need control the traffic like http (based its payload in userspace),and i must detect both out and input traffic, how i can manage it. any idea suggest ? thank a lot .

  20. Hi Regit,

    thanks a lot for sharing this. I’m just running against a wall, trying to figure out what I’m missing. I’m pretty sure my code is (somewhat) correct, also the packets nfq_process_packet sends to the callback are absolutely fine – but after sending any verdict I’m getting a netlink reply like this:

    0000:34 00 00 00 02 00 00 00 00 00 00 00 59 10 00 00
    0010:ff ff ff ff 20 00 00 00 01 03 01 00 00 00 00 00
    0020:00 00 00 00 00 00 00 00 0c 00 02 00 00 00 00 01
    0030:00 00 00 0b

    the ff ff ff ff is -1, marking an error, followed by the original verdict message.
    Length 20 (matches with nfq_set_verdict return value), and then type 01 03, When browsing netfilter sources, NFQNL_MSG_VERDICT is 1, looking at netlink sources the subsystem ID apparently is 3, which I verified by debugging the nfq handles. The following 01 00 is Netlink REQ Flag, so that’s probably correct, too.

    A) The Netlink Sequence _always_ is zero (on both, error message and quoted message) – this does not seem correct to me. From what I see, it should preincrement when filling the header using nfnl_fill_hdr in __set_verdict, so zero should not appear – Is this correct?

    B) The Verdict message appears too long to me; it seems to me there are 4 (zero-)bytes in excess before the nfgenmsg struct? Am I missing something?

    I’m totally confused, tbh, unable to find out what’s going wrong. (of course, the queue fills up immediately because no verdict works – I’m on Debian 8)

    Maybe I’m just missing something and my mistake is pretty simple. I just can’t find out, so I’d appreciate _any_ clue why I can’t issue verdict messages.

    remerciant par avance

    Reseau-Rizzo 🙂

  21. Nevermind… sometimes it just helps to write things down and click “submit”.

    To everone facing this, too:

    Don’t drop superuser privileges after setting up the queue. If you do you’ll get screwed like described above *sigh*

  22. Hi all:

    I met a issue when I try to use fail-open feature.

    1) suricata 2.0.1 work in IPS mode and binding to queue 20;
    2) iptables 1.4.21 with following:
    iptables -A FORWARD -p tcp -d 1.11.1.1 --dport 80 -j QUEUE --queue-num 20

    I configured suriticata to turn on fail-open and run it. the log message show us the fail-open is ok.
    But when I saw NF_QUEUE full message in /var/log/messages, I saw that the packet is still sent to suricita.

    Can you help give me some remider where and how to debug it?

    thanks

  23. I don’t understand your problem. Suricata will still receive packets. To check if fail-open is working, you need to check that no packets are dropped when the queue is full.

  24. Hello Regit,

    I have few queries
    How are you maintaining the index of the packet of the packet,as i can see it is linklist mentioned above can i see the code for that ,please direct me to the code that copy the incoming packet to the linklist.
    2>If i want to store the index no in the header how can we do that ?
    Please guide.

  25. Hello Peterson, regarding the index, you need to extract it from the kernel message and store it using your own code. The kernel internally maintains a per-queue list of waiting packets with their index attached. To issue a verdict you just need that index; you can discard all the rest. Regarding the second question, I don’t understand what you want to do

  26. Hello Regit,

    Thanks for the reply,actually i am looking for the source code where you are getting the packet and putting it into the link list.

  27. Hey, really good article! thanks for the information.
    I’m trying to implement an steganographic program with llibnetfilter_queue, so far I’m able to hook the packets, but it seems like they are not being send after the modification. My proc output gives me ” 0 17668 104 2 2048 0 1180 1285 1″ and the user drop column can keep growing. In the article you mention “increase netlink buffer size” for this problem, but im not sure… is this the buffer sent into the nfq_set_mode() function, or in the recv() function?
    Is there any other solution? It might be because the aplication do quite a job changing the packet
    I’m not really into C programming (so don’t know a lot about sockets and stuff, but i’m learning!) any help will be relly good.
    Thanks in advance

  28. Hello Regit,
    I hope you are well.

    I am working on bandwidth profile, bandwidth Controller. In my application, Packets come from NFQUEUE through iptables.
    I process the packet & extract various information. I also detect application layer data & extract them. I have seen my application run well if user numbers are small. If users are increase then there are packet loss. I have tested for 60 users.
    I also use queue specific calBack function, but no proper solution. Would you please suggest me, how can I increase the application performance?

  29. I have an issue that is perplexing me to no end. I want to use suricata (via nfqueue) on my linux firewall box and want all traffic to go through it first. This requires me to put the iptables rule for NFQUEUE in the raw table as shown below –

    iptables -traw -I PREROUTING ! -i lo -j NFQUEUE --queue-num 0

    The problem I am running into is that without this rule in the iptables the firewall throughput is about 85Mbps/85Mbps upload/download speed (tested using speedtest.net from a LAN client). Once this rule is introduced the upload speed drops drastically down to 5Mbps. The download speed stays at 85Mbps.

    I even stripped down all my firewall rules completely and just kept the MASQUERADE rule to just be able to forward traffic and still the same. Not sure what is going on here? Am I missing something?

  30. Hello Carvaka Guru,

    Did you take a look at the Suricata IPS mode documentation regarding possible NFQUEUE usage with Suricata? Filtering in raw PREROUTING should work, but that could also capture traffic that is not for the FORWARD chain.

    On Suricata side, do you have any drop rules ?

  31. Hello Regit,

    Thanks for your response.

    Don’t have any drop rules enabled on suricata except one for blocking facebook just to verify that suricata is intercepting the traffic.

    Per your recommendation I tried using the “repeat” mode but the behavior was the same. I also moved the rule from the raw PREROUTING chain to the mangle PREROUTING chain with the same results. The upload speed gets throttled down to 5Mbps.

    Only when I put the rule in the filter FORWARD chain, the upload speed gets back to the normal 85Mbps. Now why the same rule in the PREROUTING chains reduces the upload speed to 5Mbps, is something that is really perplexing.

  32. Actually let me explain what I am trying to do –

    I am building a debian linux router which has basic iptables based stateful firewall, suricata for IPS and Squid for transparent proxying of all HTTP traffic. Now squid requires that traffic be redirected to it via a port (3128 in my case) which means the iptables rule for it will have to be in the “nat” PREROUTING chain.

    Also since I want all traffic going from LAN to WAN and viceversa to go through suricata first, the only place I can put the associated NFQUEUE iptables directive is in the raw or mangle PREROUTING chains.

    To put it simply I want all forwarded traffic to go through iptables firewall and suricata, and only HTTP traffic to also go through squid (apart from iptables and suricata).

    On my debian wheezy linux router the interface br0 is the bridge interface for all the LAN ports and eth0 is the WAN interface. The script I run to setup the iptables is as follows –

    service iptables-persistent flush

    iptables -P INPUT ACCEPT
    iptables -P FORWARD DROP
    iptables -P OUTPUT ACCEPT

    iptables -t raw -I PREROUTING ! -i lo -j NFQUEUE --queue-num 0

    iptables -t nat -A PREROUTING -i br0 -p tcp --match multiport --dport http,http-alt -j REDIRECT --to-ports 3128

    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

    iptables -A FORWARD -i eth0 -o br0 -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A FORWARD -i br0 -o eth0 -j ACCEPT

    iptables -A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A OUTPUT -o eth0 -j ACCEPT

    service iptables-persistent save

  33. Hi Sanju,

    Simply use ‘suricata -q 0’ if you are queuing packets with NFQUEUE on queue 0.

  34. Hi,
    Is there a way to have statistics on the /proc/net/netfilter/nfnetlink_queue, may be MAX, MIN and AVG value in order to know if we have to adjust or not the size of the buffer. How to know which value we need to tune ? I’ve in some cases counter queue dropped and user dropped which increase:
    – queue dropped: means queue is full ? I have to increase NFQLENGTH ? How to know the right value ?
    – user dropped: means buffer is too small and I’ve receive a message > 4096 ??? I have to increase BUFSIZE ? How to know the right value ?

    Thanks in advance for you advises,

    #define NFQLENGTH 5000
    #define BUFSIZE 4096

    gtp_nfqh[i] = nfq_create_queue(gtp_nfh[i], i, &traffic_callback, NULL); //bind to queue
    (…)
    nfq_set_mode(gtp_nfqh[i], NFQNL_COPY_PACKET, 0xffff)
    (…)
    /* Set kernel queue maximum length */
    nfq_set_queue_maxlen(gtp_nfqh[i], NFQLENGTH);
    (…)
    /* Increase the nfnetlink buffer space size */
    nfnl_rcvbufsiz(nfq_nfnlh(gtp_nfh[i]), NFQLENGTH * BUFSIZE);

  35. Hi Regit,

    Do you have any example how to implement the PacketPool (from your code), so I could later access all the previosuly stored packets and issue the verdict?

    Cheers,
    Kunik

  36. Hi Regit,

    Thank you for your good article.

    I am writing an application to profile users data usages. aim to handle 300 users, using single queue to process but problem is with above 60 mbps speed my iptables queue size getting full even if just verdict packet without any processing from callback.

    seeking suggestion from you –

    I am thinking some solution –

    1. queue balance —

    – how this work? if one queue is full then packet shifted to another queue?

    2. parallel processing – batch verdict
    – i am receiving from single descriptor, how it could be parallel ?
    – is it possible to receive with multiple descriptor?

    3. maintaining queue in my application side

    – If i maintain queue from my side, i think it causing packet delay right?

    aiming 500 user with 600 Mbps speed, how could i? Please suggest me a solution.

    another thing –

    is it possible to control bandwidth by just sitting behind iptables?

    looking forward from you.

    Regards,
    Nazmul

  37. Great tutorial man, this library is what my need for my project: a basic QoS service.
    Anyway, threads confuse me a bit: are you sure there is no need for synchronization mechanisms like mutex or condition?
    In my code I have three queues where I store the packets (with –queue-number 0, 1 and 2), I copy each packet in a custom struct I made and then I NF_DROP every packet.
    Since nfq_set_verdict can be called only once per packet, I need something else to re-inject my packets: hence, raw sockets.
    Your PacketPool *ppool is basically my linked list of custom structs, each of one is a copied packet.
    Are you sure there is no need for mutex or condition, working with threads which put and get from that queues?

  38. Hi,

    Thanks for the nicr article! But i would like to know if you have idea about “how” i can change/inject the received packet.

    E.g: I need to intercept all bootp/dhcp packets and change a specific field…

    But I cant figure out how can do that.

  39. Question about multithreading. The FAQ says:

    “The send/recv operation need to be protected by lock to avoid concurrent writing. This means that the nfq_set_verdict2 and nfq_handle_packet function needs to be protected by lock mechanism.”

    Doesn’t that mean that there can be at most 2 threads? What if there are multiple queues? Can each thread recv() and nfq_handle_packet() separate queues?

    BTW, I found some information that says that recv() / send() are atomic:
    http://stackoverflow.com/questions/1981372/are-parallel-calls-to-send-recv-on-the-same-socket-valid
    http://cboard.cprogramming.com/c-programming/150774-parallel-threads-socket-send-recv.html

    But others are saying that it’s not.

    Stepping back for a second, I want to make sure that my NFQUEUE processor is properly parallelized. Which of the following solutions would be most effective:

    – single queue, multiple threads (one process)
    – multiple queues, multiple threads (one process)
    – multiple queues, separate process per queue

    thanks,

    Eugene

  40. Another question: is it valid to call nfq_open() multiple times? Would that create separate sockets?

  41. Hi Ragit,

    Thanks for the article and the examples.

    I’m using libnetfilter_queue for firewall purpose

    iptables -A INPUT -p tcp -m state --state NEW -j NFQUEUE --queue-num 0
    iptables -A OUTPUT -p tcp -m state --state NEW -j NFQUEUE --queue-num 0

    these are iptable rules to filter out incoming and outgoing connection packets..

    But I dont know how to filter/tap TCP socket close packets..

    Thanks,
    Nilesh

  42. Hello Regit,
    I have some additional questions.

    1.—-
    My process creates 2 queues with no 0 & 1.
    The output of ‘cat /proc/net/netfilter/nfnetlink_queue’ is following:

    0 31989 7 2 65531 0 0 9337053 1
    1 -4267 66 2 65531 0 0 12303961 1

    The number ‘31989’ (line 1 column 2) is pid of the process (I’ve checked it)

    Q1.: what is ‘-4267’ value?

    According to your description is should be also 31989 as the same process creates both queues

    2.—
    The meaning of 7th column is unclear: ‘user dropped: number of packets dropped because netlink message could not be sent to userspace’

    Q2.: what could be the reason the “netlink message could not be sent to userspace'”?

    3.—
    As you can see in my case current number of packets waiting in the queue contains value ’66’ (line 2 col 3).
    It means that the queue keeps non-handled packets.
    Observing this value I notice that it is not constant but very slowly grows up (around 1-3 per hour) causing finally buffer overflow with kernel message like this: ‘nf_queue: full at 10240 entries, dropping packets(s)’

    Q3a.: what way to discard this non-handled packets to avoid buffer overflow?

    The loss of 1 -3 packet per hour is not a problem whereas buffer overflow after a several weeks creates a big problem for me.

    Q3b.: how to get this value(s) form my program without accessing /proc ?

    Such access is time consuming, what is more /proc can be not mapped in file system.

    4. —-
    Doubt concerned with ‘nfq_set_mode()’ function

    I understand that if I call it as
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xFFFF)
    the next
    nfq_get_payload()
    function returns me the packet length

    Q4a.: what is the relation between returned value and IP packet field ‘Total length’ ? It is the same? (IMHO should be)

    I understand also that if I use it as
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 40)
    only 40 bytes will be copied with function
    nfq_get_payload()
    and the function returns value 40 (assuming the packet is so long)

    Q4b.: it will be still possible to get the information about (whole) packet length from ‘Total length’ field?
    Please confirm.

    I am asking about because I want to speed up packet copying process (less data) and I am not interested in packet data itself – I need to know only which proto/port and how much data is used.

    Rgds,
    Dorian

  43. i am new in nftables.i am not getting proper information about arp request packet filtering how to filter arp packet for particular dest ip using nft table if it is possible please provide me some example

    i have try with this but i have not received any packet

    table bridge bridgetable {
    chain forward {
    type filter hook forward priority 0;
    arp operation 256 counter packets 0 bytes 0 accept
    }
    }

  44. I’m using this command:

    iptables -A INPUT -j NFQUEUE --queue-balance 0:3

    But I only see the packet goes to queue#0, not other 3 queues. looks like iptables is not distributing the packets.

    my iptables is v1.4.21

    Anyone have any idea what could cause this problem ?

    Thanks
