Introduction
NFQUEUE is an iptables and ip6tables target which delegates the decision on packets to a userspace program. For example, the following rule asks a listening userspace program for a decision on every packet going to the box:
iptables -A INPUT -j NFQUEUE --queue-num 0
In userspace, a program must use libnetfilter_queue to connect to queue 0 (the default one) and get the messages from the kernel. It must then issue a verdict on each packet.
Inner working
To understand NFQUEUE, the easiest way is to understand the architecture inside the Linux kernel. When a packet reaches an NFQUEUE target, it is enqueued to the queue corresponding to the number given by the --queue-num option. The packet queue is implemented as a linked list whose elements are the packet and its metadata (a Linux kernel skb):
- It is a fixed-length queue implemented as a linked list of packets.
- It stores packets which are indexed by an integer.
- A packet is released when userspace issues a verdict for the corresponding index.
- When the queue is full, no packet can be enqueued to it.
This has some implications on the userspace side:
- Userspace can read multiple packets and wait before giving a verdict. As long as the queue is not full, this behavior has no impact.
- Verdicts can be issued in any order. Userspace can read packets 1, 2, 3, 4 and issue the verdicts for 4, 2, 3, 1 in that order.
- Verdicts that come too slowly will result in a full queue. The kernel will then drop incoming packets instead of enqueuing them.
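The points above can be made concrete with a toy model. The following sketch is not the kernel data structure: `toy_queue`, `toy_enqueue`, `toy_verdict` and the tiny `QUEUE_MAX_LEN` are names invented for this illustration. It only shows the three behaviors described: packets indexed by an integer, verdicts in any order, and drops once the queue is full.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX_LEN 4 /* hypothetical limit, tiny on purpose */

/* Toy model of one NFQUEUE queue (illustration only). */
struct toy_queue {
    uint32_t ids[QUEUE_MAX_LEN]; /* ids awaiting a verdict, oldest first */
    size_t len;                  /* number of packets currently queued */
    uint32_t next_id;            /* last id handed out */
    unsigned dropped;            /* packets refused because the queue was full */
};

/* Enqueue a packet. Returns its id (ids start at 1 in this model),
 * or 0 if the queue was full and the packet was dropped. */
static uint32_t toy_enqueue(struct toy_queue *q)
{
    if (q->len == QUEUE_MAX_LEN) {
        q->dropped++;
        return 0;
    }
    q->ids[q->len++] = ++q->next_id;
    return q->next_id;
}

/* Issue a verdict for packet `id`, searching from the oldest entry.
 * Returns the number of entries inspected, or 0 if `id` is not queued. */
static size_t toy_verdict(struct toy_queue *q, uint32_t id)
{
    for (size_t i = 0; i < q->len; i++) {
        if (q->ids[i] != id)
            continue;
        /* Release the slot by shifting the younger packets down. */
        for (size_t j = i + 1; j < q->len; j++)
            q->ids[j - 1] = q->ids[j];
        q->len--;
        return i + 1;
    }
    return 0;
}
```

In this model, a verdict for the oldest packet inspects a single entry, while a verdict for the newest one walks the whole queue. A fifth `toy_enqueue` call without any verdict in between is refused and counted in `dropped`.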
A few words about the protocol between kernel and userspace
The protocol used between kernel and userspace is nfnetlink. This is a message-based protocol which does not involve any shared memory. When a packet is enqueued, the kernel sends an nfnetlink-formatted message containing packet data and related information to a socket, and userspace reads this message. To issue a verdict, userspace formats an nfnetlink message containing the index number of the packet and sends it to the communication socket.
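To give an idea of how little information a verdict carries, here is a minimal sketch of the verdict payload. The struct below mirrors the layout of `struct nfqnl_msg_verdict_hdr` from `<linux/netfilter/nfnetlink_queue.h>` (two 32-bit big-endian fields); the `sketch_` names and `make_verdict` are made up for this example. A real message also needs the netlink and nfnetlink headers and the attribute wrapper around this payload, which libnetfilter_queue builds for you.

```c
#include <arpa/inet.h> /* htonl, ntohl */
#include <stdint.h>

/* Verdict values, same numbers as NF_DROP and NF_ACCEPT in <linux/netfilter.h>. */
#define SKETCH_NF_DROP   0
#define SKETCH_NF_ACCEPT 1

/* Same layout as struct nfqnl_msg_verdict_hdr: both fields are big-endian. */
struct sketch_verdict_hdr {
    uint32_t verdict; /* network byte order */
    uint32_t id;      /* network byte order, the packet index */
};

/* Build the verdict payload for one packet (payload only, not a full message). */
static struct sketch_verdict_hdr make_verdict(uint32_t id, uint32_t verdict)
{
    struct sketch_verdict_hdr hdr;

    hdr.verdict = htonl(verdict);
    hdr.id = htonl(id);
    return hdr;
}
```

The packet index is the only link between the verdict and the queued packet, which is why a verdict can be sent for any packet still in the queue.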
Using libnetfilter_queue in C
The main source of information for libnetfilter_queue is the Doxygen-generated documentation.
There are three steps in the usage of the library:
- Library setup, where the software connects to a given queue and sets up some options.
- The message receiving phase, where a callback is called for each packet received.
- The exit phase, where nfq_close is called.
If you want to look at production code, you can have a look at source-nfq.c in suricata, which is a multithreaded implementation using libnetfilter_queue.
Example software architecture
The simplest architecture is made of one thread reading the packets and issuing the verdicts. The following code is not complete but shows the logic of the implementation.
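[C]
#include <stdint.h>
#include <sys/socket.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

/* Application-defined: inspects the packet, stores the verdict
 * (NF_ACCEPT, NF_DROP, ...) and returns the packet id */
uint32_t treat_pkt(struct nfq_data *nfa, int *verdict);

/* Definition of callback function */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    int verdict;
    uint32_t id = treat_pkt(nfa, &verdict); /* Treat packet */
    return nfq_set_verdict(qh, id, verdict, 0, NULL); /* Verdict packet */
}

int main(void)
{
    char buf[65536];
    int rv;
    struct nfq_handle *h = nfq_open(); /* error checks omitted */
    /* Set callback function */
    struct nfq_q_handle *qh = nfq_create_queue(h, 0, &cb, NULL);
    int fd = nfq_fd(h);

    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff); /* copy packet data */
    for (;;) {
        if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
            nfq_handle_packet(h, buf, rv); /* send packet to callback */
    }
}
[/C]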
It is also possible to have a reading thread and a verdict thread:
[C]
/* PacketPool, Packet, push_packet_to_pool, fetch_packet_from_pool and
 * treat_pkt are application-defined */
PacketPool *ppool;
struct nfq_handle *h;
struct nfq_q_handle *qh;

/* Definition of callback function */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    /* Simply copy packet data and send it to a packet pool */
    return push_packet_to_pool(ppool, nfa);
}

static void *read_thread(void *arg)
{
    char buf[65536];
    int fd = nfq_fd(h);
    int rv;

    for (;;) {
        /* see the FAQ: this call must be protected by a lock */
        if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
            nfq_handle_packet(h, buf, rv); /* send packet to callback */
    }
    return NULL;
}

static void *verdict_thread(void *arg)
{
    int verdict;

    for (;;) {
        Packet p = fetch_packet_from_pool(ppool);
        uint32_t id = treat_pkt(&p, &verdict); /* Treat packet */
        /* see the FAQ: this call must be protected by a lock */
        nfq_set_verdict(qh, id, verdict, 0, NULL); /* Verdict packet */
    }
    return NULL;
}

int main(void)
{
    pthread_t read_thread_id, write_thread_id;

    h = nfq_open(); /* error checks omitted */
    /* Set callback function */
    qh = nfq_create_queue(h, 0, &cb, NULL);
    /* create reading thread */
    pthread_create(&read_thread_id, NULL, read_thread, NULL);
    /* create verdict thread */
    pthread_create(&write_thread_id, NULL, verdict_thread, NULL);
    /* … */
}
[/C]
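The packet pool in the example above is left to the application. As a rough idea of what it could look like, here is a minimal sketch of a thread-safe pool built on a mutex and a condition variable. All names (`packet_pool`, `pool_init`, `pool_push`, `pool_pop`, `POOL_CAPACITY`) are hypothetical, and to stay short the pool only stores packet ids, whereas a real one would also store the copied packet data.

```c
#include <pthread.h>
#include <stdint.h>

#define POOL_CAPACITY 64 /* hypothetical size */

/* Ring buffer of packet ids shared by the reading and verdict threads. */
struct packet_pool {
    uint32_t ids[POOL_CAPACITY];
    unsigned head;  /* index of the oldest entry */
    unsigned count; /* number of stored entries */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

static void pool_init(struct packet_pool *p)
{
    p->head = 0;
    p->count = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
}

/* Reading thread side. Returns 0 on success, -1 if the pool is full. */
static int pool_push(struct packet_pool *p, uint32_t id)
{
    int rc = -1;

    pthread_mutex_lock(&p->lock);
    if (p->count < POOL_CAPACITY) {
        p->ids[(p->head + p->count) % POOL_CAPACITY] = id;
        p->count++;
        rc = 0;
        pthread_cond_signal(&p->not_empty); /* wake the verdict thread */
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Verdict thread side. Blocks until a packet id is available. */
static uint32_t pool_pop(struct packet_pool *p)
{
    uint32_t id;

    pthread_mutex_lock(&p->lock);
    while (p->count == 0)
        pthread_cond_wait(&p->not_empty, &p->lock);
    id = p->ids[p->head];
    p->head = (p->head + 1) % POOL_CAPACITY;
    p->count--;
    pthread_mutex_unlock(&p->lock);
    return id;
}
```

When `pool_push` fails, the reading thread has to decide what to do with the packet, for example issuing a verdict immediately, otherwise the packet stays in the kernel queue until it fills up.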
Other languages
Pierre Chifflier (aka pollux) has developed bindings for libnetfilter_queue which can be used in most high-level languages (Python, Perl, Ruby, …): nfqueue-bindings.
Advanced features
Multiqueue
--queue-balance is an NFQUEUE option added by Florian Westphal to load balance the packets queued by a single iptables rule across multiple queues. The usage is fairly simple. For example, to load balance INPUT traffic across queues 0 to 3, the following rule can be used:
iptables -A INPUT -j NFQUEUE --queue-balance 0:3
One point worth mentioning is that the load balancing is done per flow: all packets of a flow are sent to the same queue.
The extension is available since Linux kernel 2.6.31 and iptables v1.4.5.
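To illustrate what per-flow balancing means, here is a small sketch that maps a flow to a queue number. It is not the kernel's hash function: `flow_key`, `mix32` and `pick_queue` are invented for this example, and making the result independent of the packet direction is a choice of the sketch. The only property it demonstrates is that the same flow always lands in the same queue of the range.

```c
#include <stdint.h>

/* Hypothetical flow key: IPv4 addresses and ports in host byte order. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Illustrative bit mixer, unrelated to the hash used by the kernel. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/* Pick a queue in [first, last] for a flow. XOR is commutative, so
 * swapping source and destination gives the same queue. */
static unsigned pick_queue(const struct flow_key *k,
                           unsigned first, unsigned last)
{
    uint32_t addrs = k->src_ip ^ k->dst_ip;
    uint32_t ports = (uint32_t)k->src_port ^ (uint32_t)k->dst_port;
    uint32_t h = mix32(addrs ^ mix32(ports));

    return first + h % (last - first + 1);
}
```

With `--queue-balance 0:3`, the equivalent call would be `pick_queue(&key, 0, 3)`. A userspace program listening on one of the queues therefore sees complete flows, which matters for software doing stream reassembly.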
--queue-bypass
--queue-bypass is another NFQUEUE option by Florian Westphal. It changes the behavior of an iptables rule when no userspace software is connected to the queue: instead of being dropped, the packets are accepted.
The extension is available since Linux kernel 2.6.39 and iptables v1.4.11.
This feature is broken from kernel 3.10 to 3.12: when using a recent iptables, passing the option --queue-bypass has no effect on these kernels.
fail-open
This option is available since Linux 3.6 and allows packets to be accepted instead of dropped when the queue is full. An example usage can be found in suricata.
batching verdict
Since Linux 3.1, it is possible to use batch verdicts. Instead of sending a verdict for one packet, it is possible to send a single verdict for all packets with an id lower than or equal to a given id. To do so, one must use the nfq_set_verdict_batch or nfq_set_verdict_batch2 functions.
This system has a performance advantage, as reducing the number of messages increases the achievable packet rate. But it can introduce latency, as packets wait until the whole batch receives its verdict. It is thus the responsibility of the userspace software to find adaptive techniques to limit latency by issuing verdicts faster, notably when there are few packets.
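One possible adaptive technique is to flush the batch either when enough packets are pending or when the oldest pending packet has waited too long. The sketch below only contains the decision logic: `batch_state`, `batch_add`, `batch_should_flush_idle`, `batch_reset` and both thresholds are hypothetical, and it assumes all pending packets get the same verdict, since a batch verdict carries a single one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs. */
#define BATCH_MAX_PACKETS  20  /* flush after this many packets */
#define BATCH_MAX_DELAY_US 500 /* or when the oldest one waited this long */

struct batch_state {
    uint32_t last_id;  /* highest packet id put in the batch */
    unsigned pending;  /* packets waiting for the batch verdict */
    uint64_t first_us; /* arrival time of the oldest pending packet */
};

/* Record one packet. Returns true when the caller should flush, that is
 * call nfq_set_verdict_batch(qh, st->last_id, verdict), then batch_reset(). */
static bool batch_add(struct batch_state *st, uint32_t id, uint64_t now_us)
{
    if (st->pending == 0)
        st->first_us = now_us;
    st->last_id = id;
    st->pending++;
    return st->pending >= BATCH_MAX_PACKETS ||
           now_us - st->first_us >= BATCH_MAX_DELAY_US;
}

/* To be called when no packet arrives for a while (e.g. on a receive
 * timeout), so that a small batch does not wait forever. */
static bool batch_should_flush_idle(const struct batch_state *st,
                                    uint64_t now_us)
{
    return st->pending > 0 &&
           now_us - st->first_us >= BATCH_MAX_DELAY_US;
}

/* Forget the pending packets once the batch verdict has been sent. */
static void batch_reset(struct batch_state *st)
{
    st->pending = 0;
}
```

Under high load the packet count triggers the flush and the message rate is divided by the batch size. Under low load the delay bound triggers it, which caps the extra latency at roughly `BATCH_MAX_DELAY_US`.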
Misc
nfnetlink_queue entry in /proc
nfnetlink_queue has a dedicated entry in /proc: /proc/net/netfilter/nfnetlink_queue
cat /proc/net/netfilter/nfnetlink_queue
40 23948 0 2 65531 0 0 106 1
The content is the following:
- queue number
- peer portid: there is a good chance it is the process ID of the software listening to the queue
- queue total: current number of packets waiting in the queue
- copy mode: with 0 and 1, messages only provide metadata. With 2, messages provide a part of the packet, of size copy range.
- copy range: length of packet data to put in the message
- queue dropped: number of packets dropped because the queue was full
- user dropped: number of packets dropped because the netlink message could not be sent to userspace. If this counter is not zero, try to increase the netlink buffer size. On the application side, you will see gaps in the packet ids if netlink messages are lost.
- id sequence: packet id of the last packet
- 1: a constant, always printed as 1
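For monitoring purposes, a line of this file can be parsed with a single `sscanf` call. This is a minimal sketch: the struct and function names are invented for the example, and it assumes the nine-column layout described above.

```c
#include <stdio.h>

/* One line of /proc/net/netfilter/nfnetlink_queue (hypothetical struct). */
struct nfq_proc_entry {
    unsigned queue_num;
    unsigned peer_portid;
    unsigned queue_total;
    unsigned copy_mode;
    unsigned copy_range;
    unsigned queue_dropped;
    unsigned user_dropped;
    unsigned id_sequence;
};

/* Parse one line of the file. Returns 0 on success, -1 if the line
 * does not contain the nine expected columns. */
static int parse_nfq_proc_line(const char *line, struct nfq_proc_entry *e)
{
    int last; /* ninth column, not stored */
    int n = sscanf(line, "%u %u %u %u %u %u %u %u %d",
                   &e->queue_num, &e->peer_portid, &e->queue_total,
                   &e->copy_mode, &e->copy_range, &e->queue_dropped,
                   &e->user_dropped, &e->id_sequence, &last);

    return n == 9 ? 0 : -1;
}
```

Watching `queue_total`, `queue_dropped` and `user_dropped` over time is a simple way to check whether the userspace software keeps up with the traffic.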
Frequently Asked Questions
libnetfilter_queue and multithread
libnetfilter_queue depends on messages sent to a socket. The send/recv operations need to be protected by a lock to avoid concurrent writing. This means that the nfq_set_verdict2 and nfq_handle_packet functions need to be protected by a locking mechanism.
Receiving a message and sending a message are completely separate operations which don’t share any memory. In particular, the verdict only uses the packet index as information. So as long as locking is in place, different threads can issue a verdict for any packet in the queue.
Packet reordering
Packet reordering can easily be done with NFQUEUE, as a verdict can be issued for any enqueued packet. One thing to consider, though, is that the kernel implements the packet queue as a linked list. So it is costly to issue a verdict for packets which are not at the start of the list (the oldest packet is first).
libnetfilter_queue and zero copy
As communication between kernel and userspace is based upon messages sent to a netlink socket, there is no such thing as zero copy. Patrick McHardy has started a memory mapped implementation of netlink and, thus, zero copy may be possible in the future.
Typo in:
“In userspace, a software must used libnetfilter_queue to connect to…”
“must use” is probably what you intended.
Best,
Hi,
I have a doubt regarding the packet flow after a verdict is set on the packet. In my case the packet enters the next chain instead of entering the next table .
E,G if NFQUEUE is catching packets in MANGLE table of PREROUTING chain , if then throws the packet in FORWARD chain instead if NAT table of PREROUTING chain.
Hi,
I have a problem as:
suricata version is 4.1.4.
run commond: suricata -c suricata.yaml -q 0
In suricata.yaml,the nfq config:
mode:accept
fail-open:yes
But when I set the nf_queue size is 1,and send packets to test “fail-open” function,there is no effect.Packets were dropped when suricata couldn’t keep pace.
I had saw the libnetfilter_queue source code and source_nfq.c.There is no problem in those code.
So I want to know some ways to solve the problem.
Thank you very much!
Hi when I enabled queue internet is not working.
What to do