Nftables port knocking

One of the main advantages of nftables over iptables is its native handling of sets. Sets can be used for multiple purposes and, thanks to the timeout capability, it is easy to do some fun things like implementing port knocking without any external daemon.

The idea of this technique is fairly simple: a closed port is dynamically opened if the user sends packets, in order, to a predetermined series of ports.

To implement that on a system, let's start from a ruleset where the Input chain is closed except for an SSH rule using a set of IPv4 addresses as parameter:

# nft list ruleset
table inet Filter {
	set Client_SSH_IPv4 {
		type ipv4_addr
		flags timeout
		elements = { 192.168.1.1, 192.168.1.254 }
	}

	chain Input {
		type filter hook input priority 0; policy drop;
		ct state established counter accept
		tcp dport ssh ip saddr @Client_SSH_IPv4 counter accept
		iif "lo" accept
		counter log prefix "Input Dft DROP " drop
	}
}

This gives control over which IPs get access to the SSH server. The set contains two fixed IPs, but the timeout flag is set, allowing the definition of entries with a timeout.
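
For instance, a temporary entry valid for one hour could be added by hand like this (the address is just an example):

# nft add element inet Filter Client_SSH_IPv4 { 192.168.1.3 timeout 1h }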

To implement a 3-step port knocking scheme, we define two sets with a short timeout. Hitting the first port makes you enter the first set; once you are in the first set, you can enter the second set by hitting the second port.

To get the first step done, we use the fact that nftables can add elements to a set from the packet path:

# nft add inet Filter Raw tcp dport 36 set add ip saddr @Step1 drop

Once this is done, the second step is a simple derivative of the first one:

# nft add inet Filter Raw ip saddr @Step1 tcp dport 15 set add ip saddr @Step2 drop

We add these rules to a chain hooked at prerouting with a raw-level priority, so the kernel won't do any connection tracking for these packets.
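
The third and last step follows the same pattern, this time adding the source address to the Client_SSH_IPv4 set itself with a dedicated 10 second timeout:

# nft add inet Filter Raw ip saddr @Step2 tcp dport 42 set add ip saddr timeout 10s @Client_SSH_IPv4 drop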

This gets us the following ruleset:

	set Step1 {
		type ipv4_addr
		timeout 5s
	}

	set Step2 {
		type ipv4_addr
		timeout 5s
	}

	chain Raw {
		type filter hook prerouting priority -300; policy accept;
		tcp dport 36 set add ip saddr @Step1 drop
		ip saddr @Step1 tcp dport 15 set add ip saddr @Step2 drop
		ip saddr @Step2 tcp dport 42 set add ip saddr timeout 10s @Client_SSH_IPv4 drop
	}

A poor man's test can be:

nmap -p 36 IPKNOCK & sleep 1;  nmap -p 15 IPKNOCK & sleep 1;  nmap -p 42 IPKNOCK

After running this, our IP is in the Client_SSH_IPv4 set:

# nft list set inet Filter Client_SSH_IPv4
table inet Filter {
	set Client_SSH_IPv4 {
		type ipv4_addr
		flags timeout
		elements = { 192.168.2.1 expires 8s, 192.168.1.1,
			     192.168.1.254 }
	}
}

and we can SSH to IPKNOCK 😉
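
While testing, the content of the intermediate sets can be checked the same way:

# nft list set inet Filter Step1
# nft list set inet Filter Step2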

And here is the complete ruleset:

# nft list ruleset
table inet Filter {
	set Client_SSH_IPv4 {
		type ipv4_addr
		flags timeout
		elements = { 192.168.1.1, 192.168.1.254 }
	}

	set Step1 {
		type ipv4_addr
		timeout 5s
	}

	set Step2 {
		type ipv4_addr
		timeout 5s
	}

	chain Input {
		type filter hook input priority 0; policy drop;
		ct state established counter accept
		tcp dport ssh ip saddr @Client_SSH_IPv4 counter packets 0 bytes 0 accept
		iif "lo" accept
		counter log prefix "Input Dft DROP " drop
	}

	chain Raw {
		type filter hook prerouting priority -300; policy accept;
		tcp dport 36 set add ip saddr @Step1 drop
		ip saddr @Step1 tcp dport netstat set add ip saddr @Step2 drop
		ip saddr @Step2 tcp dport nameserver set add ip saddr timeout 10s @Client_SSH_IPv4 drop
	}
}

For more information on nftables feel free to visit the Nftables wiki.

Slides of my talks at Lecce

I’ve been invited by SaLUG to Lecce to give some talks during their Geek Evening. I’ve done a talk on nftables and one on Suricata.

Lecce by night

The nftables talk was about the motivation behind the change from iptables.

Here are the slides: Nftables

The talk on Suricata explained the different features of Suricata and showed how I’ve used it to make a study of SSH bruteforce.

Here are the slides:
Suricata, Netfilter and the PRC.

Thanks a lot to Giuseppe Longo, Luca Greco and all the SaLUG team, you have been wonderful hosts!

Slides of my nftables talk at Kernel Recipes

I’ve been lucky enough to do a talk during the third edition of Kernel Recipes. I’ve presented the evolution of nftables during the previous year.

You can get the slides from here: 2014_kernel_recipes_nftables.

Thanks to Hupstream for uploading the video of the talk:



Not much material but these slides and a video of the work done during the previous year on nftables and its components:



Using DOM with nftables

DOM and SSH honeypot

DOM is a solution comparable to fail2ban, but it uses Suricata SSH logs instead of SSH server logs. The goal of DOM is to redirect attackers based on their SSH client version. This allows sending attackers to a honeypot like pshitt directly after the first attempt. And this can be done for a whole network, as Suricata does not need to be on the targeted box.

Using DOM with nftables

I’ve pushed basic nftables support to DOM. Instead of adding elements via ipset, it uses an nftables set.

Using it is simple: you just need to add the -n flag to specify which table the set has been defined in:

./dom -f /usr/local/var/log/suricata/eve.json -n nat -s libssh -vvv -i -m OpenSSH

To activate the network address translation based on the set, you can use something like:

table ip nat {
        set libssh { 
                type ipv4_addr
        }

        chain prerouting {
                 type nat hook prerouting priority -150;
                 ip saddr @libssh ip protocol tcp counter dnat 192.168.1.1:2200
        }
}
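
For testing, elements can be added to and listed from the set by hand (the address below is just an example):

# nft add element ip nat libssh { 192.0.2.1 }
# nft list set ip nat libssh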

A complete basic ruleset

Here’s the ruleset running on the box implementing pshitt and DOM:

table inet filter {
        chain input {
                 type filter hook input priority 0;
                 ct state established,related accept
                 iif lo accept
                 ct state new iif != lo tcp dport {ssh, 2200} tcp flags == syn counter log prefix "SSH attempt" group 1 accept
                 iif br0 ip daddr 224.2.2.4 accept
                 ip saddr 192.168.0.0/24 tcp dport {9300, 3142} counter accept
                 ip saddr 192.168.1.0/24 counter accept
                 counter log prefix "Input Default DROP" group 2 drop
        }
}

table ip nat {
        set libssh { 
                type ipv4_addr
        }

        chain prerouting {
                 type nat hook prerouting priority -150;
                 ip saddr @libssh ip protocol tcp counter dnat 192.168.1.1:2200
        }

        chain postrouting {
                 type nat hook postrouting priority -150;
                 ip saddr 192.168.0.0/24 snat 192.168.1.1
        }
}

There is an interesting rule in this ruleset:

ct state new iif != lo tcp dport {ssh, 2200} tcp flags == syn counter log prefix "SSH attempt" group 1 accept

It uses a negative construction to match on the interface: iif != lo means the interface is not lo. Note that it also uses an anonymous set to define the port list via tcp dport {ssh, 2200}. That way we have one single rule for normal and honeypot SSH. Last, this rule is logging and accepting, and the logging is done via nfnetlink_log because of the group parameter. This allows ulogd to capture the log messages triggered by this rule.

pshitt: collect passwords used in SSH bruteforce

Introduction

I’ve been playing lately with the characterization of SSH bruteforce attacks. I was a bit frustrated to only get partial information:

  • ulogd can give information about scanner settings
  • Suricata can give information about the software version
  • sshd server logs show the username

But having the username without the password is really frustrating.

So I decided to try to get them. Looking for an SSH server honeypot, I found kippo, but it was going too far for me by providing fake shell access. So I decided to build my own based on paramiko.

pshitt, Passwords of SSH Intruders Transferred to Text, was born. It is a lightweight fake SSH server that collects authentication data sent by intruders. It basically collects username and password and writes the extracted data to a file in JSON format. For each authentication attempt, pshitt dumps a JSON formatted entry:

{"username": "admin", "src_ip": "116.10.191.236", "password": "passw0rd", "src_port": 36221, "timestamp": "2014-06-26T10:48:05.799316"}

The data can then be easily imported into Logstash (see the pshitt README) or Splunk.
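
As a minimal sketch of the Logstash side, the input section can look like the following (the path is an assumption, adjust it to wherever pshitt writes its JSON file):

input {
   file {
      path => [ "/var/log/pshitt.log" ]
      codec => json
   }
}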

The setup

As I want to really connect to the box running SSH with a regular client, I needed a setup to automatically redirect the offenders, and only them, to the pshitt server. A simple solution was to use DOM. DOM parses the Suricata EVE JSON log file in which Suricata gives us the software version of clients connecting to the SSH server. If DOM sees a software version containing libssh, it adds the originating IP to an ipset set.

So, the idea of our honeypot setup is simple:

  • Suricata outputs SSH software version to EVE
  • DOM adds IPs using libssh to the ipset set
  • Netfilter NAT redirects all IPs of the set to pshitt when they try to connect to our SSH server

Getting the setup in place is really easy. We first create the set:

ipset create libssh hash:ip

then we start DOM so it adds all offenders to the set named libssh:

cd DOM
./dom -f /usr/local/var/log/suricata/eve.json -s libssh

A more accurate setup for dom can be the following. If you know that your legitimate clients are only based on OpenSSH, then you can run dom to put in the list all IPs that do not (-i) use an OpenSSH client (-m OpenSSH):

./dom -f /usr/local/var/log/suricata/eve.json -s libssh -vvv -i -m OpenSSH

If we want to list the elements of the set, we can use:

ipset list libssh

Now, we can start pshitt:

cd pshitt
./pshitt

And finally we redirect connections coming from IPs of the libssh set to port 2200:

iptables -A PREROUTING -m set --match-set libssh src -t nat -i eth0 -p tcp -m tcp --dport 22 -j REDIRECT --to-ports 2200

Some results

Here’s an extract of the most used passwords when trying to get access to the root account:

real root passwords

And here’s the same thing for the admin account attempt:

Root passwords

Both data show around 24 hours of attempts on an anonymous box.

Conclusion

Thanks to paramiko, it was really fast to code pshitt. I’m now collecting data and I think it will help improve the categorization of SSH bruteforce tools.

Suricata and Ulogd meet Logstash and Splunk

Some progress on the JSON side

Suricata 2.0-rc2 is out and it brings some progress on the JSON side. The logging of the SSH protocol has been added:

Screenshot from 2014-03-07 18:50:21

and the format of the timestamp has been updated to be ISO 8601 compliant; the field is now named timestamp instead of time.

Ulogd, the Netfilter logging daemon, has seen a similar change, as it is now also using an ISO 8601 compliant timestamp. This feature is available in git and will be part of ulogd 2.0.4.

Thanks to this format change, the integration with Logstash or Splunk is easier and more accurate.
This permits fixing a problem regarding the timestamp of an event inside the event and logging manager. At least in Logstash, the date used was the one of the parsing, which was not really accurate. It could even be a problem when Logstash was parsing a file with old entries, because the difference in timestamps could be huge.

It is now possible to update the Logstash configuration to have a correct parsing of the timestamp. After doing this, the internal @timestamp and the timestamp of the event are synchronized, as shown on the following screenshot:

timestamp

Logstash configuration

Screenshot from 2014-02-02 13:22:34

To configure Logstash, you simply need to tell it that the timestamp field in the JSON message is a date. To do so, you need to add a filter:

      date {
        match => [ "timestamp", "ISO8601" ]
      }

A complete logstash.conf would then look like:

input {
   file {
      path => [ "/usr/local/var/log/suricata/eve.json", "/var/log/ulogd.json" ]
      codec =>   json
      type => "json-log"
   }
}

filter {
   if [type] == "json-log" {
      date {
        match => [ "timestamp", "ISO8601" ]
      }
   }
}

output {
  stdout { codec => rubydebug }
  elasticsearch { embedded => true }
}

Splunk configuration

Screenshot from 2014-03-07 23:30:40

In Splunk, auto detection of the file format fails, and it seems you need to define a type to parse JSON in
$SPLUNK_DIR/etc/system/local/props.conf:

[suricata]
KV_MODE = json
NO_BINARY_CHECK = 1
TRUNCATE = 0

Then you can simply declare the log file in $SPLUNK_DIR/etc/system/local/inputs.conf:

[monitor:///usr/local/var/log/suricata/eve.json]
sourcetype = suricata

[monitor:///var/log/ulogd.json]
sourcetype = suricata

You can now search events and build dashboards based on Suricata or Netfilter packet logging:
Screenshot from 2014-03-05 23:17:12
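
As an illustration, a simple search to get the most active source addresses (src_ip being the field name used in the JSON events) could be:

sourcetype=suricata src_ip=* | top limit=10 src_ip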

Nftables and the Netfilter logging framework

Nftables logging

If nftables is bringing a lot of changes on the user side, this is also true in the logging area.
There is now only one keyword for logging, log, and this target is using the Netfilter logging framework.
A corollary of this is that you may not see any log message even if a rule with log is matching, because the Netfilter logging framework has to be configured.

Netfilter logging framework

The Netfilter logging framework is a generic way of logging used in Netfilter components. This framework is implemented in two different
kernel modules:

  • xt_LOG: printk based logging, outputting everything to syslog (same module as the one used for the iptables LOG target). It can only log packets for IPv4 and IPv6.
  • nfnetlink_log: netlink based logging requiring ulogd2 to be set up to get the events (same module as the one used for the iptables NFLOG target). It can log packets for any family.

To use one of the two modules, you need to load them with modprobe. It is possible to have both modules loaded and in this case, you can then setup logging on a per-protocol basis. The active configuration is available for reading in /proc:
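
For example, to have both backends available:

modprobe nfnetlink_log
modprobe xt_LOG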

# cat /proc/net/netfilter/nf_log 
 0 NONE (nfnetlink_log)
 1 NONE (nfnetlink_log)
 2 nfnetlink_log (nfnetlink_log,ipt_LOG)
 3 NONE (nfnetlink_log)
 4 NONE (nfnetlink_log)
 5 NONE (nfnetlink_log)
 6 NONE (nfnetlink_log)
 7 nfnetlink_log (nfnetlink_log)
 8 NONE (nfnetlink_log)
 9 NONE (nfnetlink_log)
10 nfnetlink_log (nfnetlink_log,ip6t_LOG)
11 NONE (nfnetlink_log)
12 NONE (nfnetlink_log)

The syntax is the following: FAMILY ACTIVE_MODULE (AVAILABLE_MODULES). Here nfnetlink_log was loaded first and xt_LOG was loaded afterward (xt_LOG is aliased to ipt_LOG and ip6t_LOG).

Protocol family numbers can look a bit strange. They are in fact mapped to the socket family definitions used in the underlying code.
The list is the following:

#define AF_UNSPEC	0
#define AF_UNIX		1	/* Unix domain sockets 		*/
#define AF_INET		2	/* Internet IP Protocol 	*/
#define AF_AX25		3	/* Amateur Radio AX.25 		*/
#define AF_IPX		4	/* Novell IPX 			*/
#define AF_APPLETALK	5	/* Appletalk DDP 		*/
#define	AF_NETROM	6	/* Amateur radio NetROM 	*/
#define AF_BRIDGE	7	/* Multiprotocol bridge 	*/
#define AF_AAL5		8	/* Reserved for Werner's ATM 	*/
#define AF_X25		9	/* Reserved for X.25 project 	*/
#define AF_INET6	10	/* IP version 6			*/
#define AF_MAX		12	/* For now.. */

To update the configuration, you need to write to the file corresponding to the family in the /proc/sys/net/netfilter/nf_log/ directory.
For example, if you want to use ipt_LOG for IPv4 (2 in the list), you can do:

echo "ipt_LOG" >/proc/sys/net/netfilter/nf_log/2 

This will activate ipt_LOG for IPv4 logging:

# cat /proc/net/netfilter/nf_log 
 0 NONE (nfnetlink_log)
 1 NONE (nfnetlink_log)
 2 ipt_LOG (nfnetlink_log,ipt_LOG)
 3 NONE (nfnetlink_log)
 4 NONE (nfnetlink_log)
 5 NONE (nfnetlink_log)
 6 NONE (nfnetlink_log)
 7 nfnetlink_log (nfnetlink_log)
 8 NONE (nfnetlink_log)
 9 NONE (nfnetlink_log)
10 nfnetlink_log (nfnetlink_log,ip6t_LOG)
11 NONE (nfnetlink_log)
12 NONE (nfnetlink_log)

The Netfilter logging framework is also used internally by Netfilter. For example, the connection tracking uses it to send messages when invalid packets are seen. These messages are useful because they contain the reason for the reject. For example, one of the messages is “nf_ct_tcp: ACK is under the lower bound (possible overly delayed ACK)”.
These log messages are only sent if the logging of invalid packets is requested. This is done by doing:

echo "255"> /proc/sys/net/netfilter/nf_conntrack_log_invalid

More information on the magical 255 value is available in the kernel documentation of the nf_conntrack sysctls.
If the nfnetlink_log module is used for the protocol, then the group used is 0. So if you want to activate these messages, it could be a good idea to use a non-zero nfnetlink group in your log rules.
This way you will be able to differentiate the log sources in software like ulogd.

Logging with Nftables

As mentioned before, logging is done via the log keyword. A typical log and accept rule will look like:

nft add rule filter input tcp dport 22 ct state new log prefix \"SSH for ever\" group 2 accept

This rule accepts packets to port 22 in the NEW state and logs them with prefix SSH for ever on group 2.
The group is only used when the active logging kernel module is nfnetlink_log. The option has no effect if xt_LOG is used. In fact, when used with xt_LOG, the only available option is prefix (at least for nftables 0.099).

The available options when using the nfnetlink_log module are the following (at least for nftables 0.099):

  • prefix: A prefix string to include in the log message, up to 64 characters long, useful for distinguishing messages in the logs.
  • group: The netlink group (0 - 2^16-1) to which packets are sent (only applicable for nfnetlink_log). The default value is 0.
  • snaplen: The number of bytes to be copied to userspace (only applicable for nfnetlink_log). nfnetlink_log instances may specify their own copy range; this option overrides it.
  • queue-threshold: Number of packets to queue inside the kernel before sending them to userspace (only applicable for nfnetlink_log). Higher values result in less overhead per packet, but increase delay until the packets reach userspace. The default value is 1.

Note: the descriptions are extracted from the iptables man pages.
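
As an illustrative combination of these options (the port and values are chosen arbitrarily):

nft add rule filter input tcp dport 80 log prefix \"HTTP \" group 2 snaplen 96 queue-threshold 10 accept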

If you want to do some easy testing with nftables, simply load the xt_LOG module before nfnetlink_log. It will bind to the IPv4 and IPv6 protocols and provide you with logging. For more fancy stuff involving nfnetlink_log, you can have a look at Using ulogd and JSON output.

Happy logging to all!

Logging connection tracking event with ulogd

Motivation

I’ve recently met @aurelsec and we discussed the interest of logging connection tracking entries. This is indeed an undervalued information source in a network.

Quoting Wikipedia: “Connection tracking allows the kernel to keep track of all logical network connections or sessions, and thereby relate all of the packets which may make up that connection. NAT relies on this information to translate all related packets in the same way, and iptables can use this information to act as a stateful firewall.”

Connection tracking being linked with Network Address Translation has a direct impact: it stores both sides of each connection. If we use the conntrack tool from conntrack-tools to list connections:

# conntrack  -L
tcp      6 431999 ESTABLISHED src=192.168.1.129 dst=19.1.16.7 sport=53400 dport=443 src=19.1.16.7 dst=1.2.3.4 sport=443 dport=53500 [ASSURED] mark=0 use=1
...

We have the two sides of a connection:

  • Orig: here 192.168.1.129:53400 to 19.1.16.7:443. This is the packet information as seen by the firewall when the packet reaches it. There is no translation at all.
  • Reply: here 19.1.16.7:443 to 1.2.3.4:53500. This is how an answer coming from the server will look. The destination has been changed to the public IP of the firewall (here 1.2.3.4). There is also a change of the destination port to the one used by the firewall when doing the initial mapping. In fact, as multiple clients could use the same port at the same time, the firewall may have to rewrite the initial source port.

So the connection tracking stores all NAT transformations. This information is important because it is the only way to know which IP in a private network is responsible for something in the outside world. For example, let’s suppose that 19.1.16.7 has been attacked by our internal client (here 192.168.1.129). If the admin of this server sees the attack, he will only see the 1.2.3.4 IP address and source port 53500. If an authority asks you for the IP address responsible in your internal network, you have no instrument but the conntrack to know that this was in fact 192.168.1.129.

That’s why logging connection tracking events is one of the only effective ways to store the information necessary to get back to the internal IP address in case of an external query. Let’s now do this with ulogd 2.

Ulogd setup

Ulogd installation

Ulogd 2 is able to get information from the connection tracking and to log it in files or a database.
If your distribution is not providing ulogd and if you don’t know how to install it, you can check the post Using ulogd and JSON output.
To be sure that you will be able to log connection tracking events, you need to have the NFCT plugin set to yes at the end of the configure output.

Ulogd configuration:
  Input plugins:
    NFLOG plugin:			yes
    NFCT plugin:			yes

Kernel setup

All functionalities are standard since kernel 2.6.14. You only need to load the following module:

modprobe nf_conntrack_netlink

It is the one in charge of kernel and userspace information exchange regarding connection tracking.
It provides features to dump the conntrack table or modify entries in the conntrack. For example, the conntrack tool mentioned before uses that communication method to get the listing of connection tracking entries. But the feature that interests us for ulogd is the event mode. For each event in the life of a connection, a message is sent to userspace. Ulogd is able to listen to these messages, and this gives it the ability to store all the information that connection tracking has on the life of a connection.

Depending on the protocols you have on your network, you may need to run one of the following:

modprobe nf_conntrack_ipv4
modprobe nf_conntrack_ipv6

Ulogd setup

Our first objective will simply be to log all NAT decisions to a syslog-like file on disk. In terms of connection tracking, this means we will log all connections in the NEW state. This way we will get information about any connection going through the firewall with the associated NAT transformation.

If you install from sources, copy the ulogd.conf found at the root of the ulogd sources to your config directory (usually /usr/local/etc/), and start your favorite editor on it.
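
With the default configure options, this boils down to something like:

cp ulogd.conf /usr/local/etc/ulogd.conf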

Ulogd does logging based on stack definitions. A stack is a chain of plugins starting with an input plugin, finishing with an output one, and with filters in the middle. In our case, we want to get packets from the Netfilter conntrack, and the corresponding plugin is NFCT. The first example of a stack containing NFCT in the ulogd.conf file is the one we are interested in, so we uncomment it:

stack=ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,emu1:LOGEMU

The default setup of the input and output plugins may not fit our needs. For now, let’s just check the output:

[emu1]
file="/var/log/ulogd_syslogemu.log"
sync=1

As you may have seen, emu1 is also used by packet logging. So it may be a good idea to have our own output file for connection tracking events.
To do that, we update the stack:

stack=ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,emunfct1:LOGEMU

and create a new config below emu1:

[emunfct1]
file="/var/log/ulogd_nfct.log"
sync=1

We have changed the file name and kept the sync option, which avoids a delay in writes due to buffering; such a delay can be very annoying when debugging a setup.

Now, we can test:

ulogd -v

In /var/log/ulogd_nfct.log, we see things like:

Feb 22 10:50:36 ice-age2 [DESTROY] ORIG: SRC=61.174.51.209 DST=192.168.1.129 PROTO=TCP SPT=6000 DPT=22 PKTS=0 BYTES=0 , REPLY: SRC=192.168.1.129 DST=61.174.51.209 PROTO=TCP SPT=22 DPT=6000 PKTS=0 BYTES=0

So we only have destruction messages. This is not exactly what we wanted to have. We are interested in NEW messages, which will allow us to have the correct timing of the events. Reading the ulogd.conf file, it seems there is no information about choosing the event types. But let’s ask the NFCT input plugin for its capabilities. To do that we use the -i option of ulogd:

# ulogd -v -i /usr/local/lib/ulogd/ulogd_inpflow_NFCT.so 
Name: NFCT
Config options:
        Var: pollinterval (Integer, Default: 0)
        Var: hash_enable (Integer, Default: 1)
        Var: hash_buckets (Integer, Default: 8192)
        Var: hash_max_entries (Integer, Default: 32768)
        Var: event_mask (Integer, Default: 5)
        Var: netlink_socket_buffer_size (Integer, Default: 0)
        Var: netlink_socket_buffer_maxsize (Integer, Default: 0)
        Var: netlink_resync_timeout (Integer, Default: 60)
        Var: reliable (Integer, Default: 0)
        Var: accept_src_filter (String, Default: )
        Var: accept_dst_filter (String, Default: )
        Var: accept_proto_filter (String, Default: )
...

The listing starts with the configuration keys. One of them is event_mask. This is the one controlling which events are sent from the kernel to userspace.
The value is a mask combining some of the following values:

  • NF_NETLINK_CONNTRACK_NEW: 0x00000001
  • NF_NETLINK_CONNTRACK_UPDATE: 0x00000002
  • NF_NETLINK_CONNTRACK_DESTROY: 0x00000004

So the default value of 5 means listening to NEW and DESTROY events.
The clever reader will then ask: why did we only see DESTROY messages in that case? This is because the ulogd NFCT plugin runs by default in hash_enable mode. In this mode, one single message is output for each connection (at its end), and a hash table is maintained by ulogd to store the info (here the initial timestamp of the connection).
Our setup doesn’t need this feature because we only want to get the NAT transformation, so we switch the hash feature off and limit the events to NEW:

[ct1]
event_mask=0x00000001
hash_enable=0

We can now restart ulogd and check the log file:

Feb 22 11:59:34 ice-age2 [NEW] ORIG: SRC=2a01:e35:1394:5bd0:da50:b6ff:fe3c:4250 DST=2001:41d0:1:9598::1 PROTO=TCP SPT=51162 DPT=22 PKTS=0 BYTES=0 , REPLY: SRC=2001:41d0:1:9598::1 DST=2a01:e35:1394:5bd0:da50:b6ff:fe3c:4250 PROTO=TCP SPT=22 DPT=51162 PKTS=0 BYTES=0
Feb 22 11:59:43 ice-age2 [NEW] ORIG: SRC=192.168.1.129 DST=68.232.35.139 PROTO=TCP SPT=60846 DPT=443 PKTS=0 BYTES=0 , REPLY: SRC=68.232.35.139 DST=1.2.3.4 PROTO=TCP SPT=443 DPT=60946 PKTS=0 BYTES=0

This is exactly what we wanted: we have a trace of all NAT transformations.

Maintaining a history of connection tracking

Objective

We want to log all the information describing a connection so we have a trace of what is going through the firewall. This means we need at least:

  • IP information for orig and reply way
  • Timestamp of start and end of connection
  • Bandwidth used by the connection

Kernel setup

By default, recent kernels have limited handling of connection tracking: some useful fields are not stored for performance reasons. This is the case for accounting (number of packets and bytes) and for the timestamp of the connection creation. The advantage of getting accounting information is obvious, as you get information on bandwidth usage. Regarding the timestamp, the interest is on the implementation side: it allows ulogd to get all the information needed to describe a connection in one single message (the DESTROY one), and ulogd no longer needs to maintain a hash table to get the info and propagate it at exit.

To activate both features, you have to do:

 echo "1"> /proc/sys/net/netfilter/nf_conntrack_acct
 echo "1"> /proc/sys/net/netfilter/nf_conntrack_timestamp

Ulogd setup

For the following setup, you will need ulogd built from git or a ulogd version greater than or equal to 2.0.4.

Let’s first use JSON output to get the information in a readable format. We need to define a stack:

stack=ct2:NFCT,ip2str1:IP2STR,jsonnfct1:JSON

On the ct2 side, we don’t want to use the hash table and we only want to get DESTROY messages, so our configuration looks like:

[ct2]
hash_enable=0
event_mask=0x00000004

Regarding jsonnfct1, we could have reused the default JSON configuration, but for ease of testing we will dedicate a file to the NFCT logging:

[jsonnfct1]
sync=1
file="/var/log/ulogd_nfct.json"

After a ulogd restart, we get this type of entry:

{"reply.ip.daddr.str": "2a01:e35:1394:5ad0:da50:e6ff:fe3c:1250", "oob.protocol": 0, "dvc": "Netfilter", "timestamp": "Sat Feb 22 12:27:04 2014", "orig.ip.protocol": 6, "reply.raw.pktcount": 20, "flow.end.sec": 1393068424, "orig.l4.sport": 51384, "orig.l4.dport": 22, "orig.raw.pktlen": 5600, "ct.id": 1384991512, "orig.raw.pktcount": 23, "reply.raw.pktlen": 4328, "reply.ip.protocol": 6, "reply.l4.sport": 22, "reply.l4.dport": 51384, "ct.mark": 0, "ct.event": 4, "flow.start.sec": 1393068302, "flow.start.usec": 637516, "flow.end.usec": 403240, "reply.ip.saddr.str": "2001:41d0:1:9598::1", "oob.family": 10, "src_ip": "2a01:e35:1394:5ad0:da50:e6ff:fe3c:1250", "dest_ip": "2001:41d0:1:9598::1"}

The fields we wanted are here:

  • flow.start.* keys store the timestamp of flow start
  • flow.end.* keys store the end of the connection
  • *.raw.pkt* keys store the accounting information

You can then add this file to the files parsed by Logstash. For that, you can use the information from Using ulogd and JSON output and modify the input section:

input {
   file { 
      path => [ "/var/log/ulogd.json", "/var/log/ulogd_nfct.json"]
      codec =>   json 
   }
}

One interesting piece of information in a connection tracking entry is the duration. But this field is not available in the ulogd JSON output, and it is not possible to do mathematical operations in Kibana. A solution to get the information is to add a filter in logstash.conf to compute the duration:

filter {
  if [type] == "json-log" {
    ruby {
      code => "if event['ct.id']; event['flow.duration.sec']=(event['flow.end.sec'].to_i - event['flow.start.sec'].to_i); end"
    }
  }
}

Screenshot from 2014-02-23 18:00:23
A thing to notice in order to understand the obtained duration is that a connection dies following a contextual timeout. For example, in the case of a TCP connection, even after a FIN packet a timeout is still applied. So a short connection will last at least the duration of that timeout.

Another logging method is PostgreSQL. The stack to use is almost the same as the JSON one but uses, as you may have guessed, the PGSQL plugin:

stack=ct2:NFCT,ip2str1:IP2STR,pgsql2:PGSQL

The configuration of the PostgreSQL plugin is easy, based on the example setup available in the configuration file:

[pgsql2]
db="nulog"
host="localhost"
user="nupik"
table="ulog2_ct"
#schema="public"
pass="changeme"
procedure="INSERT_CT"

I’m not the one who will explain how to connect to a PostgreSQL database and create a ulogd2 database. See Pollux’s post for that: ulogd2: the new userspace logging daemon for netfilter/iptables (part 2).

Other setups are possible. For example, you can maintain a copy of the connection tracking table in the database and also keep the history.
To do that, you need to use the INSERT_OR_REPLACE_CT procedure and a connection tracking input plugin not using the hash table but getting NEW and DESTROY events:

stack=ct2:NFCT,ip2str1:IP2STR,pgsql2:PGSQL

[ct2]
hash_enable=0

[pgsql2]
db="nulog"
host="localhost"
user="nupik"
table="ulog2_ct"
#schema="public"
pass="changeme"
procedure="INSERT_OR_REPLACE_CT"

Connections will be inserted into the table when the NEW event is received, and the connection entry in the database will be updated when the DESTROY message is received.

Suricata and Nftables

Iptables and Suricata as IPS

Building a Suricata ruleset with iptables has always been a complicated task when trying to combine the rules that are necessary for the IPS with the firewall rules. Suricata has always used advanced Netfilter features, allowing some more or less tricky methods to be used.

For those not familiar with an IPS using Netfilter, here are a few starting points:

  1. The IPS receives packets coming from the kernel via rules using the NFQUEUE target
  2. The IPS must receive all packets of a given flow to be able to handle detection cleanly
  3. The NFQUEUE target is a terminal target: when the IPS verdicts a packet, it is either dropped or accepted (and leaves the current chain)

So the ruleset needs to send all packets to the IPS. A basic ruleset for an IPS could thus look like:

iptables -A FORWARD -j NFQUEUE

With such a ruleset, all packets going through the box are sent to the IPS.

If now you want to combine this with your ruleset, then usually your first try is to add rules to the filter chain:

iptables -A FORWARD -j NFQUEUE
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED -j ACCEPT
# your firewall rules here

But this will not work because NFQUEUE is a terminal target (point 3): all packets sent via NFQUEUE to the IPS are either blocked or, if accepted, leave the FORWARD chain directly and go for evaluation to the next chain (mangle POSTROUTING in our case).
With such a ruleset, the result is that there is no firewall, only an IPS, in place.

As mentioned before, there are some existing solutions (see Building a Suricata ruleset for extensive information). The simplest one is to dedicate another chain, such as mangle, to the IPS:

iptables -A FORWARD -t mangle -j NFQUEUE
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED -j ACCEPT
# your firewall rules here

No conflict here, but you have to be sure nothing in your system will use the mangle table, or you will have the same problem as the one seen previously in the filter chain. So there was no universal and simple solution to implement an IPS and a firewall ruleset with iptables.

IPS the easy way with Nftables

In Nftables, chains are defined by the user using the nft command line. The user can specify:

  • The hook: the place in the packet's life where the chain will be set. See this diagram for more info.
    • prerouting: the chain will be placed before packets are routed
    • input: the chain will receive packets going to the box
    • forward: the chain will receive packets routed by the box
    • postrouting: the chain will receive packets after routing and before they are sent
    • output: the chain will receive packets sent by the host
  • The chain type: defines the objective of the chain
    • filter: the chain will filter packets
    • nat: the chain will only contain NAT rules
    • route: the chain contains rules that may change the route (previously known as mangle)
  • The priority: defines the evaluation order of the different chains of a given hook. It is an integer that can be freely specified. But it also permits placing a chain before or after some internal operations such as connection tracking.

In our case, we want to act on forwarded packets, and we want to have a chain for filtering followed by a chain for the IPS. So the chain setup is simple:

nft -i
nft> add table filter
nft> add chain filter firewall { type filter hook forward priority 0;}
nft> add chain filter IPS { type filter hook forward priority 10;}

With this setup, a packet will reach the firewall chain first, where it will be filtered. If the packet is blocked, it will be destroyed inside the kernel. If the packet is accepted, it will then jump to the next chain following the order of increasing priority. In our case, the packet reaches the IPS chain.

Now that we’ve got our chains, we can add filtering rules, for example:

nft add rule filter firewall ct state established accept
nft add rule filter firewall tcp dport ssh counter accept
nft add rule filter firewall tcp dport 443 accept
nft add rule filter firewall counter log drop

And for our Suricata IPS, that’s just trivial:

nft add rule filter IPS queue
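
Suricata can then be run in NFQUEUE mode on queue 0, which is the queue number used by the queue statement when no num is given (the config path is shown as an example):

suricata -c /etc/suricata/suricata.yaml -q 0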

A bit more details

The queue target in nftables

The complete support for the queue target will be available in Linux 3.14. The syntax looks as follows:

nft add rule filter output queue num 3 total 2 options fanout

This rule sends matching packets to 2 load-balanced queues (total 2) starting at 3 (num 3).
fanout is one of the two queue options:

  • fanout: When used together with total load balancing, this will use the CPU ID as an index to map packets to the queues. The idea is that you can improve performance if there's a queue per CPU. This requires total to be specified with a value greater than 1.
  • bypass: By default, if no userspace program is listening on a Netfilter queue, then all packets that are to be queued are dropped. When this option is used, the queue rule behaves like ACCEPT instead, and the packet will move on to the next table.

For a complete description of queueing mechanism in Netfilter see Using NFQUEUE and libnetfilter_queue.

If you want to test this before the Linux 3.14 release, you can get the nft sources from the nftables git and use the next-3.14 branch.

Chain priority

For reference, here are the priority values of some important internal operations and of iptables static chains:

  • NF_IP_PRI_CONNTRACK_DEFRAG (-400): priority of defragmentation
  • NF_IP_PRI_RAW (-300): traditional priority of the raw table placed before connection tracking operation
  • NF_IP_PRI_SELINUX_FIRST (-225): SELinux operations
  • NF_IP_PRI_CONNTRACK (-200): Connection tracking operations
  • NF_IP_PRI_MANGLE (-150): mangle operation
  • NF_IP_PRI_NAT_DST (-100): destination NAT
  • NF_IP_PRI_FILTER (0): filtering operation, the filter table
  • NF_IP_PRI_SECURITY (50): Place of security table where secmark can be set for example
  • NF_IP_PRI_NAT_SRC (100): source NAT
  • NF_IP_PRI_SELINUX_LAST (225): SELinux at packet exit
  • NF_IP_PRI_CONNTRACK_HELPER (300): connection tracking at exit

For example, one can create in nftables an equivalent of the raw PREROUTING chain of iptables by doing:

# nft -i
nft> add chain filter pre_raw { type filter hook prerouting priority -300;}

Using ulogd and JSON output

Ulogd and JSON output

In February 2014, I committed a new output plugin to ulogd, the userspace logging daemon for Netfilter. This is a JSON output plugin which outputs logs into a file in JSON format. The interest of the JSON format is that it is easily parsed by software such as Logstash. And once the data are understood by Logstash, you can get some nice and useful dashboards in Kibana:

Screenshot from 2014-02-02 13:22:34

This post explains how to configure ulogd and iptables to do packet logging and differentiate accepted and blocked packets. If you want to see how cool the result is, just check my post: Investigation on an attack tool used in China.

Installation

At the time of this writing, the JSON output plugin for ulogd is only available in the git tree. Ulogd 2.0.4 will contain the feature.

If you need to get the source, you can do:

git clone git://git.netfilter.org/ulogd2

Then the build is standard:

./autogen.sh
./configure
make
sudo make install

Please note that at the end of the configure, you must see:

Ulogd configuration:
  Input plugins:
    NFLOG plugin:			yes
...
    NFACCT plugin:			yes
  Output plugins:
    PCAP plugin:			yes
...
    JSON plugin:			yes

If the JSON plugin is not built, you need to install the libjansson devel files on your system and rerun configure.
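
On a Debian-based system, for example, this is:

apt-get install libjansson-dev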

Configuration

Ulogd configuration

All the edits are made in the ulogd.conf file. With the default configure options, the file is in /usr/local/etc/.

First, you need to activate the JSON plugin:

plugin="/home/eric/builds/ulogd/lib/ulogd/ulogd_output_JSON.so"

Then we define two stacks for logging. They will be used to differentiate accepted packets from dropped packets:

stack=log2:NFLOG,base1:BASE,ifi1:IFINDEX,ip2str1:IP2STR,mac2str1:HWHDR,json1:JSON
stack=log3:NFLOG,base1:BASE,ifi1:IFINDEX,ip2str1:IP2STR,mac2str1:HWHDR,json1:JSON

The first stack will be used to log accepted packets, so we set numeric_label to 1 in [log2].
In [log3], we use a numeric_label of 0.

[log2]
group=1 # Group has to be different from the one use in log1
numeric_label=1

[log3]
group=2 # Group has to be different from the one use in log1/log2
numeric_label=0 # you can label the log info based on the packet verdict

The last thing to edit is the configuration of the JSON instance:

[json1]
sync=1
device="My awesome FW"
boolean_label=1

Here we say we want the log to be written to disk immediately (via sync), and we name our device My awesome FW.
The last value, boolean_label, is the most tricky. If this configuration variable is set to 1, the numeric_label is used to decide whether a packet has been accepted or blocked: if numeric_label is non null, the packet is seen as allowed; if not, it is seen as blocked.

Sample Iptables rules

In this example, packets to port 22 are logged and accepted, and thus are logged in nflog-group 1. Packets hitting the default drop rule are sent to group 2 because they are dropped.

iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT ! -i lo -p tcp -m tcp --dport 22 --tcp-flags FIN,SYN,RST,ACK SYN -m state --state NEW -j NFLOG --nflog-prefix  "SSH Attempt" --nflog-group 1
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
iptables -A INPUT -j NFLOG --nflog-prefix  "Input IPv4 Default DROP" --nflog-group 2

There is no difference for IPv6; we just use nflog-groups 1 and 2 with the same purpose:

ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -A INPUT ! -i lo -p tcp -m tcp --dport 22 --tcp-flags FIN,SYN,RST,ACK SYN -m state --state NEW -j NFLOG --nflog-prefix  "SSH Attempt" --nflog-group 1
ip6tables -A INPUT ! -i lo -p ipv6-icmp -m icmp6 --icmpv6-type 128 -m state --state NEW -j NFLOG --nflog-prefix  "Input ICMPv6" --nflog-group 1
ip6tables -A INPUT -p ipv6-icmp -j ACCEPT
ip6tables -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A INPUT -j NFLOG --nflog-prefix  "Input IPv6 Default DROP" --nflog-group 2

Logstash configuration

The Logstash configuration is simple. You must simply declare the ulogd.json file as input, and optionally you can activate geoip on the src_ip key:

input {
   file { 
      path => [ "/var/log/ulogd.json"]
      codec =>   json 
   }
}

filter {
  if [src_ip]  {
    geoip {
      source => "src_ip"
      target => "geoip"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
}

output { 
  stdout { codec => rubydebug }
  elasticsearch { embedded => true }
}

Usage

To start ulogd in daemon mode, simply run:

ulogd -d

You can download logstash from their website and start it with the following command line:

java -jar logstash-1.3.3-flatjar.jar agent -f etc/logstash.conf --log log/logstash-indexer.out -- web

Once done, just point your browser to localhost:9292 and enjoy nice and interesting graphs.

Screenshot from 2014-02-02 13:57:19