Using Linux perf tools for Suricata performance analysis

Introduction

Perf is a great tool for analysing performance on Linux boxes. For example, perf top gives this kind of output on a box running Suricata on a high-speed network:

Events: 32K cycles                                                                                                                                                                                                                            
 28.41%  suricata            [.] SCACSearch
 19.86%  libc-2.15.so        [.] tolower
 17.83%  suricata            [.] SigMatchSignaturesBuildMatchArray
  6.11%  suricata            [.] SigMatchSignaturesBuildMatchArrayAddSignature
  2.06%  suricata            [.] tolower@plt
  1.70%  libpthread-2.15.so  [.] pthread_mutex_trylock
  1.17%  suricata            [.] StreamTcpGetFlowState
  1.10%  libc-2.15.so        [.] __memcpy_ssse3_back
  0.90%  libpthread-2.15.so  [.] pthread_mutex_lock

The functions are sorted by CPU consumption. Using the arrow keys, it is possible to jump into the annotated code to see where most CPU cycles are spent.

This is really useful, but for a function like pthread_mutex_trylock, the interesting part is finding out where the function is called from.

Getting function call graph in perf

This Stack Overflow question led me to the solution.

I started by building Suricata with the -fno-omit-frame-pointer option:

./configure --enable-pfring --enable-luajit CFLAGS="-fno-omit-frame-pointer"
make
make install

Once Suricata was restarted (with PID 9366), I was able to record the data:

sudo perf record -a --call-graph -p 9366

Extracting the call graph was then possible by running:

sudo perf report --call-graph --stdio

The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:

     0.94%  Suricata-Main  libpthread-2.15.so     [.] pthread_mutex_lock
            |
            --- pthread_mutex_lock
               |
               |--48.69%-- FlowHandlePacket
               |          |
               |          |--53.04%-- DecodeUDP
               |          |          |
               |          |          |--95.84%-- DecodeIPV4
               |          |          |          |
               |          |          |          |--99.97%-- DecodeVLAN
               |          |          |          |          DecodeEthernet
               |          |          |          |          DecodePfring
               |          |          |          |          TmThreadsSlotVarRun
               |          |          |          |          TmThreadsSlotProcessPkt
               |          |          |          |          ReceivePfringLoop
               |          |          |          |          TmThreadsSlotPktAcqLoop
               |          |          |          |          start_thread
               |          |          |           --0.03%-- [...]
               |          |          |
               |          |           --4.16%-- DecodeIPV6
               |          |                     |
               |          |                     |--97.59%-- DecodeTunnel
               |          |                     |          |
               |          |                     |          |--99.18%-- DecodeTeredo
               |          |                     |          |          DecodeUDP
               |          |                     |          |          DecodeIPV4
               |          |                     |          |          DecodeVLAN
               |          |                     |          |          DecodeEthernet
               |          |                     |          |          DecodePfring
               |          |                     |          |          TmThreadsSlotVarRun
               |          |                     |          |          TmThreadsSlotProcessPkt
               |          |                     |          |          ReceivePfringLoop
               |          |                     |          |          TmThreadsSlotPktAcqLoop
               |          |                     |          |          start_thread
               |          |                     |          |
               |          |                     |           --0.82%-- DecodeIPV4
               |          |                     |                     DecodeVLAN
               |          |                     |                     DecodeEthernet
               |          |                     |                     DecodePfring
               |          |                     |                     TmThreadsSlotVarRun
               |          |                     |                     TmThreadsSlotProcessPkt
               |          |                     |                     ReceivePfringLoop
               |          |                     |                     TmThreadsSlotPktAcqLoop
               |          |                     |                     start_thread
               |          |                     |
               |          |                      --2.41%-- DecodeIPV6
               |          |                                DecodeTunnel
               |          |                                DecodeTeredo
               |          |                                DecodeUDP
               |          |                                DecodeIPV4
               |          |                                DecodeVLAN
               |          |                                DecodeEthernet
               |          |                                DecodePfring
               |          |                                TmThreadsSlotVarRun
               |          |                                TmThreadsSlotProcessPkt
               |          |                                ReceivePfringLoop
               |          |                                TmThreadsSlotPktAcqLoop
               |          |                                start_thread
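
When reading such a report, note that the percentages under each node appear to be relative to the parent node (siblings add up to 100%), not to the total number of samples. Assuming that reading is right, the following minimal Python sketch converts one call path into a share of all samples. The function name is hypothetical and the figures are copied from the report above.

```python
def absolute_share(path_percentages):
    """Multiply nested relative percentages into a share of all samples."""
    share = 100.0
    for percentage in path_percentages:
        share *= percentage / 100.0
    return share

# pthread_mutex_lock <- FlowHandlePacket <- DecodeUDP <- DecodeIPV4
path = [0.94, 48.69, 53.04, 95.84]
print("%.3f%% of all samples" % absolute_share(path))
```

Under that assumption, the UDP over IPv4 path into pthread_mutex_lock accounts for roughly 0.23% of all samples.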

A bit of fun with IPv6 setup

While running some tests on Suricata, I needed to set up a small IPv6 network. The setup is simple: one laptop is connected over Ethernet to a desktop, and the desktop hosts a VirtualBox guest.
This way, the desktop can act as a router, with the laptop on eth0 and the guest on vboxnet0.

To set up the desktop/router, I used:

ip a a 4::1/64 dev eth0
ip a a 2::1/64 dev vboxnet0
echo "1">/proc/sys/net/ipv6/conf/all/forwarding

To set up the laptop, which already has a public IPv6 address on eth0, I ran:

ip a a 4::4/64 dev wlan0
ip -6 r a 2::2/128 via 4::1 src 4::2 metric 128

Almost the same thing on the VirtualBox guest:

ip a a 2::2/64 dev eth0
ip -6 r a default via 2::1

This setup should have been enough, but when I tried the following from the laptop:

ping6 2::2

I got a failure.

I then checked the routing on the laptop:

# ip r g 2::2
2::2 via 4::1 dev wlan0  src 2a01:e35:1394:5bd0:f8b3:5a98:2715:6c8d  metric 128

A public IPv6 address is used as the source address, and this is confirmed by a tcpdump on the desktop:

# tcpdump -i eth0 icmp6 -nv
10:54:48.841761 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 64) 2a01:e35:1394:5bd0:f8b3:5a98:2715:6c8d > 4::1: [icmp6 sum ok] ICMP6, echo request, seq 11

The desktop does not know how to reach this IP address because it has no public IPv6 address of its own.

On the laptop, I dumped the wlan0 configuration to check the address:
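
The routing problem can be sketched with Python's ipaddress module. This is only a simplified model: it assumes the desktop knows nothing but the two connected /64 networks configured above and has no default route, and the function name is hypothetical.

```python
import ipaddress

# Connected networks on the desktop, taken from the addresses configured above.
DESKTOP_NETWORKS = {
    "eth0": ipaddress.ip_network("4::/64"),
    "vboxnet0": ipaddress.ip_network("2::/64"),
}

def route_towards(address):
    """Return the interface the desktop would use to reach `address`, or None."""
    target = ipaddress.ip_address(address)
    for interface, network in DESKTOP_NETWORKS.items():
        if target in network:
            return interface
    return None

print(route_towards("4::4"))
print(route_towards("2a01:e35:1394:5bd0:f8b3:5a98:2715:6c8d"))
```

The first lookup finds eth0, while the public source address matches no network, so replies to it cannot be routed back.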

# ip a l dev wlan0
3: wlan0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether c4:85:08:33:c4:c8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.137/24 brd 192.168.1.255 scope global wlan0
       valid_lft forever preferred_lft forever
    inet6 4::4/64 scope global
       valid_lft forever preferred_lft forever
    inet6 2a01:e35:1434:5bd0:f8b3:5a98:2715:6c8d/64 scope global temporary dynamic
       valid_lft 86251sec preferred_lft 84589sec
    inet6 2a01:e35:1434:5bd0:c685:8ff:fe33:c4c8/64 scope global dynamic
       valid_lft 86251sec preferred_lft 86251sec
    inet6 fe80::c685:8ff:fe33:c4c8/64 scope link
       valid_lft forever preferred_lft forever

And yes, 2a01:e35:1394:5bd0:f8b3:5a98:2715:6c8d is a temporary dynamic IPv6 address, which is used by default for outgoing traffic (and brings a little privacy).

Deleting the address fixed the ping issue:

# ip a d 2a01:e35:1394:5bd0:f8b3:5a98:2715:6c8d/64 dev wlan0
# ping6 2::2
PING 2::2(2::2) 56 data bytes
64 bytes from 2::2: icmp_seq=1 ttl=63 time=5.47 ms

And getting the route confirmed that the fix was working:

# ip r g 2::2
2::2 via 4::1 dev wlan0  src 4::4  metric 128 

All that to say that it can be useful to deactivate temporary IPv6 addresses before setting up a test network:

echo "0" > /proc/sys/net/ipv6/conf/wlan0/use_tempaddr

Talk about nftables at Kernel Recipes 2013

I’ve just given a talk about nftables, the iptables successor, at Kernel Recipes 2013. You can find the slides here:
2013_kernel_recipes_nftables

A description of the talk, as well as the slides and video, is available on the Kernel Recipes website.

Here’s the video of my talk:

I’ve presented a video of nftables source code evolution:

The video was generated with gource. The Git histories of the various components were merged and the file paths prefixed with the project name.

A month in the life of Debian in 2000 and 2012

Visualizing Debian packages upload

The Ultimate Debian Database provides a way to get information about all package uploads to the Debian repositories across time. After a discussion at Distro Recipes, Lucas Nussbaum made available a web page giving access to the package uploads in a gource-compatible file format.

Using this, I was able to create videos of Debian's evolution over time. I generated two videos, one showing a month of package uploads in 2000 and, for comparison, one showing a month in 2012.

The first video is really peaceful, even if the lack of activity causes gource to make some jumps in time:

The second video is made with exactly the same time scale and the rhythm is completely crazy:

More info about video generation

The raw data are available here: udd.gource.log.bz2. I transformed them to add section information to the package name, using the following script:

[python]
#!/usr/bin/python
# Prefix each package name of a gource custom log with its Debian section.

import fileinput
import apt

cache = apt.Cache()

for line in fileinput.input():
    # Each input line is "date|user|mode|package"
    [date, user, mode, package] = line.split("|")
    pack = package.rstrip()

    # Skip uploads without a user name
    if len(user) == 0:
        continue

    try:
        pkg = cache[pack]  # Access the Package object for python-apt
        package = pkg.section + "/" + pack
    except KeyError:
        # Package unknown to the local APT cache
        package = "undef" + "/" + pack

    print(date + "|" + user + "|" + mode + "|" + package)
[/python]

The result is the following file: udd.gource-section.log.bz2. Once extracted, it can be visualized in gource:

gource --log-format custom  udd.gource-section.log
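
The per-line transformation done by the script above can be tried without python-apt. The sketch below replaces the APT cache with a plain dictionary, which is an assumption made for illustration only; the function name, the uploader and the section mapping are all made up.

```python
def add_section(line, sections):
    """Prefix the package of one 'date|user|mode|package' line with its section."""
    date, user, mode, package = line.rstrip("\n").split("|")
    if not user:
        return None  # entries without an uploader are skipped
    section = sections.get(package, "undef")
    return "|".join([date, user, mode, section + "/" + package])

# Hypothetical sample data standing in for the APT cache and the UDD log.
sections = {"suricata": "net"}
print(add_section("946684800|alice|A|suricata", sections))
print(add_section("946684800|alice|A|unknown-pkg", sections))
```

Grouping packages under their section is what lets gource draw one branch per section instead of a single flat list of packages.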

The next step was to extract the uploads at the start (in 2000) and the latest uploads (in 2012). I simply used tail and head to do so. The videos were generated following the instructions given on the gource website:

gource -1920x1080 -o - udd.gource-end.log | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libvpx -b 13000K debian-2012.webm
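
The extraction itself was done with head and tail. As an alternative that is not the method used here, a month could also be selected by timestamp. The sketch below assumes that the first field of each log line is a Unix timestamp, as in the gource custom log format; the function names are hypothetical.

```python
import calendar

def month_bounds(year, month):
    """Return the [start, end) range of a month as UTC Unix timestamps."""
    start = calendar.timegm((year, month, 1, 0, 0, 0))
    if month == 12:
        end = calendar.timegm((year + 1, 1, 1, 0, 0, 0))
    else:
        end = calendar.timegm((year, month + 1, 1, 0, 0, 0))
    return start, end

def select_month(lines, year, month):
    """Yield only the log lines whose timestamp falls within the given month."""
    start, end = month_bounds(year, month)
    for line in lines:
        timestamp = int(line.split("|", 1)[0])
        if start <= timestamp < end:
            yield line

# Made-up log lines: one upload in January 2000, one on 1 February 2000.
log = ["946684800|alice|A|net/suricata", "949363200|bob|A|admin/sudo"]
print(list(select_month(log, 2000, 1)))
```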

David Miller: routing cache is dead, now what?

The routing cache maintained a list of routing decisions. It was a hash table that was highly dynamic and changed with the traffic. One of the major problems was the garbage collector. Another severe issue was the possibility of DoS using the increase

The routing cache was removed in Linux 3.6 after a two-year effort by David and the other Linux kernel developers. The global cache is gone, and some of the information it stored has been moved to more specific objects such as sockets.
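
To see why a traffic-driven cache is a problem, here is a toy model in Python. It is not kernel code: the forwarding table, the class and the function names are invented for the illustration, and the real cache was keyed on more than the destination address.

```python
import ipaddress

# Hypothetical forwarding table: two prefixes, each mapped to an output interface.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("0.0.0.0/0"): "eth1",
}

def fib_lookup(destination):
    """Longest-prefix match of `destination` against the FIB."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in FIB if address in network]
    return FIB[max(matches, key=lambda network: network.prefixlen)]

class CachedRouter:
    """Toy router that remembers one routing decision per destination."""

    def __init__(self):
        self.cache = {}

    def route(self, destination):
        if destination not in self.cache:
            self.cache[destination] = fib_lookup(destination)
        return self.cache[destination]

router = CachedRouter()
for host in range(1000):
    # 1000 distinct destinations, as a scan or spoofed traffic could produce
    router.route(str(ipaddress.ip_address("192.0.2.0") + host))

print(len(FIB), len(router.cache))
```

The table of prefixes keeps its two entries whatever the traffic, while the cache gains one entry per destination seen, so its size is controlled by whoever generates the traffic.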

There were a lot of side effects following this big transformation. On the user side, there is no more “neighbour cache overflow” thanks to the synchronized sizes of the routing and neighbour tables.

Metrics were stored in the routing cache entry, which has disappeared, so it was necessary to introduce a separate TCP metrics cache. A netlink interface is available to add, update and delete entries in the cache.

Another side effect of these modifications is that, on TCP sockets, xt_owner could be used on input sockets, but the code needs to be updated.

On the security side, reverse path filtering has been updated. When activated, it causes up to two extra FIB lookups, but when deactivated there is now no overhead at all.

Minimal Linux kernel config for VirtualBox

I was looking for a minimal Linux kernel configuration for a VirtualBox guest and only found some old ones. I thus decided to build some and to publish them.
They are available on GitHub: regit-config

For now, the only published configurations are for Linux kernel 3.5.

Using AF_PACKET zero copy mode in Suricata

Victor Julien has just pushed a new feature to Suricata’s git tree. It brings improvements to the AF_PACKET capture mode.

This capture mode can be used on Linux, where it is the native way to capture packets. Suricata is able to use the interesting new multithreading feature provided by AF_PACKET on recent kernels: it is possible to have multiple capture threads receiving the packets of a single interface.

The commits add mmapped ring buffer support to AF_PACKET capture and also provide a zero copy mode. The mmapped ring buffer is a mechanism similar to the one used by PF_RING. The kernel allocates some memory to store the packets and shares this memory with the capture process. Instead of sending messages, the kernel just writes to the shared memory and the capture process reads it. This consumes fewer CPU resources and helps to increase the capture rate. But the main advantage of this technique is that the capture process can handle the packets without making a copy, which saves a lot of time.
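
The idea of a shared ring read without copying can be illustrated in Python. This is only an analogy and not the AF_PACKET API: an anonymous mmap stands in for the memory shared with the kernel, and the slot layout (a 2-byte length followed by the payload) is invented for the example.

```python
import mmap

SLOT_SIZE = 2048   # bytes reserved per packet slot (arbitrary for this sketch)
SLOT_COUNT = 4

# Anonymous mapping standing in for the memory shared by kernel and process.
ring = mmap.mmap(-1, SLOT_SIZE * SLOT_COUNT)

def write_packet(slot, payload):
    """Producer side ("kernel"): store a 2-byte length followed by the payload."""
    offset = slot * SLOT_SIZE
    ring[offset:offset + 2] = len(payload).to_bytes(2, "big")
    ring[offset + 2:offset + 2 + len(payload)] = payload

def read_packet(view, slot):
    """Consumer side: return a memoryview slice, which does not copy the data."""
    offset = slot * SLOT_SIZE
    length = int.from_bytes(view[offset:offset + 2], "big")
    return view[offset + 2:offset + 2 + length]

write_packet(0, b"first packet")
view = memoryview(ring)
packet = read_packet(view, 0)
print(bytes(packet))
```

The consumer only receives a window onto the shared buffer. The bytes are copied solely by the final bytes() call, which is there for display.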

To activate this feature, you need a Suricata built from the latest git and you need to modify some entries in your suricata.yaml file. You have to tell Suricata that you want to activate the mmap feature. For example, to activate it on eth0, add ‘use-mmap’ to your configuration:
[code]
af-packet:
  - interface: eth0
    use-mmap: yes
[/code]

You can then run Suricata with the command:

suricata -c suricata.yaml --af-packet=eth0

This setup will not activate the zero copy feature, which currently depends on the running mode. You need to use the workers mode to enable zero copy. To do so, run Suricata with a command similar to this one:

suricata -c suricata.yaml --af-packet=eth0 --runmode=workers

This code should provide an interesting performance boost to the AF_PACKET capture system. I have no numbers to provide yet, but I will be happy to hear some if you run tests.

Upgrading Galaxy S from Android 2.1 to 2.3.3 under Linux

After losing some time trying in vain to get Kies (of Death) from Samsung or Odin working under VirtualBox, I found out about the existence of Heimdall. This software was developed to flash firmware onto Samsung Galaxy S devices.

It worked quite easily. The upgrade procedure only requires downloading some files and, in my case, some use of the tar command.

The command line was long but simple:
[bash]heimdall flash -pit s1_odin_20100512.pit --factoryfs factoryfs.rfs \
--cache cache.rfs --dbdata dbdata.rfs --param param.lfs \
--kernel zImage --modem modem.bin \
--primary-boot boot.bin --secondary-boot Sbl.bin \
--verbose[/bash]

A GUI named heimdall-frontend is available for people who do not like the command line.

Here’s a list of the problems I encountered during this update:

  • Going to download mode was not possible on the phone: I had to use adb
  • adb is 32 bit and I had to install 32 bit libs: aptitude install ia32-libs
  • heimdall uses a device /dev/ttyACM0 which is read/write for dialout (and I was not in the group)
  • I had to chain the adb reboot download command with the heimdall command to get the upgrade to start
  • The first restart was blocked at the “Galaxy S” display; I ran adb reboot recover to return to normal behaviour
  • I had no data connection (3G) after upgrade: Restoring default APN configuration fixed the issue

Apart from these little points, the upgrade procedure was fine. Heimdall was very efficient compared to all the crap I tried to use on Windows. Thanks a lot to its developers for their work!

IPv6 privacy extensions on Linux

IPv6 global address

The global address is used in IPv6 to communicate with the outside world. It is thus the one used as the source of any communication and, in a way, it identifies you on the Internet.

Below is a dump of an interface configuration:

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:22:15:64:42:bd brd ff:ff:ff:ff:ff:ff
    inet6 2a01:f123:1234:5bd0:222:15ff:fe64:42bd/64 scope global dynamic 
       valid_lft 86314sec preferred_lft 86314sec
    inet6 fe80::222:15ff:fe64:42bd/64 scope link 
       valid_lft forever preferred_lft forever

The global address here is 2a01:f123:1234:5bd0:222:15ff:fe64:42bd/64. It is built by taking the prefix and adding an identifier derived from the hardware address. For example, here the hardware address is 00:22:15:64:42:bd and the global IPv6 address ends with 222:15ff:fe64:42bd.

It is thus easy to go from the IPv6 global address to the hardware address. To fix this issue and increase the privacy of network users, privacy extensions have been developed.
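
The mapping between the two can be sketched in a few lines of Python, following the modified EUI-64 construction: ff:fe is inserted in the middle of the hardware address and the universal/local bit of the first byte is flipped. The function names are hypothetical.

```python
def interface_id_from_mac(mac):
    """Build the modified EUI-64 interface identifier from a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = [(eui64[i] << 8) | eui64[i + 1] for i in range(0, 8, 2)]
    return ":".join("%x" % group for group in groups)

def mac_from_interface_id(interface_id):
    """Recover the MAC address from the last four groups of the address."""
    octets = []
    for part in interface_id.split(":"):
        group = int(part, 16)
        octets += [group >> 8, group & 0xFF]
    if octets[3:5] != [0xFF, 0xFE]:
        raise ValueError("not an EUI-64 derived identifier")
    octets[0] ^= 0x02  # undo the bit flip
    return ":".join("%02x" % octet for octet in octets[:3] + octets[5:])

print(interface_id_from_mac("00:22:15:64:42:bd"))
print(mac_from_interface_id("222:15ff:fe64:42bd"))
```

Anyone who sees the address can run the second function, which is exactly the privacy issue described above.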

Privacy extensions

RFC 3041 describes how to build and use temporary addresses that are used as source addresses for connections to the outside world.

To activate this feature, you simply have to modify an entry in /proc. For example, to activate the feature on eth0, you can run:

echo "2">/proc/sys/net/ipv6/conf/eth0/use_tempaddr

The usage of the option is detailed in the must-read ip-sysctl.txt file:

use_tempaddr - INTEGER
        Preference for Privacy Extensions (RFC3041).
          <= 0 : disable Privacy Extensions
          == 1 : enable Privacy Extensions, but prefer public
                 addresses over temporary addresses.
          >  1 : enable Privacy Extensions and prefer temporary
                 addresses over public addresses.
        Default:  0 (for most devices)
                 -1 (for point-to-point devices and loopback devices)
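
The three cases listed in this excerpt can be restated as a small Python helper. It is only a reading aid for the documentation above: the function names are hypothetical, and read_use_tempaddr simply reads the /proc entry mentioned earlier.

```python
def describe_use_tempaddr(value):
    """Translate a use_tempaddr value into the behaviour documented above."""
    if value <= 0:
        return "privacy extensions disabled"
    if value == 1:
        return "privacy extensions enabled, public addresses preferred"
    return "privacy extensions enabled, temporary addresses preferred"

def read_use_tempaddr(interface, proc_root="/proc"):
    """Read the current setting of an interface (Linux only)."""
    path = "%s/sys/net/ipv6/conf/%s/use_tempaddr" % (proc_root, interface)
    with open(path) as handle:
        return int(handle.read().strip())

for value in (-1, 0, 1, 2):
    print(value, describe_use_tempaddr(value))
```

On a Linux box, describe_use_tempaddr(read_use_tempaddr("eth0")) gives the current behaviour for eth0.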

After a network restart (a simple ifdown and ifup of the interface is enough), the output of the ip a command looks like this:

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:22:15:64:42:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.129/24 brd 192.168.1.255 scope global eth0
    inet6 2a01:f123:1234:5bd0:21f1:f624:d2b8:3702/64 scope global temporary dynamic 
       valid_lft 86314sec preferred_lft 2914sec
    inet6 2a01:f123:1234:5bd0:222:15ff:fe64:42bd/64 scope global dynamic 
       valid_lft 86314sec preferred_lft 86314sec
    inet6 fe80::222:15ff:fe64:42bd/64 scope link 
       valid_lft forever preferred_lft forever

A new temporary address has been added. After preferred_lft seconds, it becomes deprecated and a new address is added:

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:22:15:64:42:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.129/24 brd 192.168.1.255 scope global eth0
    inet6 2a01:f123:1234:5bd0:55c3:7efd:93d1:5057/64 scope global temporary dynamic 
       valid_lft 85009sec preferred_lft 1672sec
    inet6 2a01:f123:1234:5bd0:21f1:f624:d2b8:3702/64 scope global temporary deprecated dynamic 
       valid_lft 82077sec preferred_lft 0sec
    inet6 2a01:f123:1234:5bd0:222:15ff:fe64:42bd/64 scope global dynamic 
       valid_lft 86398sec preferred_lft 86398sec
    inet6 fe80::222:15ff:fe64:42bd/64 scope link 
       valid_lft forever preferred_lft forever

The deprecated address is removed when the valid_lft counter reaches zero.

Some more tuning

The default duration for a preferred address is one day. This can be changed by modifying the temp_prefered_lft variable.

For example, you can add to sysctl.conf:

net.ipv6.conf.eth0.temp_prefered_lft = 7200

The default validity length of the addresses can be changed via the temp_valid_lft variable.

The max_desync_factor variable sets the maximum random time to wait before requesting a new address. This avoids having all the computers on a network request an address at the same time.
One side effect is that if you set the preferred or valid time to a low value, max_desync_factor must also be decreased. If not, there will be long periods without a temporary address.

If temp_prefered_lft is several times lower than temp_valid_lft, the deprecated addresses will accumulate. To avoid overloading the kernel, a maximum number of addresses is set.
Equal to 16 by default, it can be changed by setting the max_addresses sysctl variable.
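
A rough estimate of this accumulation, as a Python sketch: if a new temporary address is created every preferred lifetime and each one lives for the valid lifetime, about valid/preferred addresses exist at once. This ignores the desync factor and any regeneration advance, the function name is hypothetical, and the values are only illustrative (a valid lifetime of one day, close to the dumps above, and a preferred lifetime of one hour).

```python
import math

def expected_temporary_addresses(valid_lft, preferred_lft):
    """Rough count of temporary addresses alive at once (preferred + deprecated)."""
    return math.ceil(valid_lft / preferred_lft)

MAX_ADDRESSES = 16  # default value of max_addresses quoted above

count = expected_temporary_addresses(valid_lft=86400, preferred_lft=3600)
print(count, count > MAX_ADDRESSES)
```

With these illustrative values the estimate is above the default limit, so the limit would be reached well before the oldest address expires.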

Known issues and problems

As the temporary address is used for connections to the outside and has a limited lifetime, some long-lived connections (think ssh) will be cut when the temporary address is removed.

I’ve also observed a problem when the maximum number of addresses is reached:

ipv6_create_tempaddr(): retry temporary address regeneration.
ipv6_create_tempaddr(): regeneration time exceeded. disabled temporary address support.

The result was that temporary address support was disabled and the standard global address was used again. When setting temp_prefered_lft to 3600 and keeping temp_valid_lft at its default value, the problem is easily reproduced.

Conclusion

The support of IPv6 privacy extensions is correct, but the lack of a link with existing connections can cause some services to be disrupted. An easy-to-use per-software selection of the address could be really useful to avoid these problems.