Slides of my talks at Lecce

I was invited by SaLUG to Lecce to give some talks during their Geek Evening. I gave one talk on nftables and one on Suricata.

Lecce by night

The nftables talk was about the motivation behind the change from iptables.

Here are the slides: Nftables

The talk on Suricata explained the different features of Suricata and showed how I used it to study SSH brute-force attacks.

Here are the slides:
Suricata, Netfilter and the PRC.

Thanks a lot to Giuseppe Longo, Luca Greco and the whole SaLUG team: you were wonderful hosts!

Speeding up scapy packets sending

Sending packets with scapy

I’m currently writing some code based on scapy. This code reads data from a potentially huge file and sends one packet for each line of the file, built from the information that line contains.
So the code contains a simple loop and uses sendp() because the frame must be sent at layer 2.

     def run(self):
         filedesc = open(self.filename, 'r')
         # loop on read line
         for line in filedesc:
             # Build and send packet
             sendp(pkt, iface = self.iface, verbose = verbose)
             # Inter packet treatment

With this approach, performance is disappointing. For 18 packets, we get:

    real    0m2.437s
    user    0m0.056s
    sys     0m0.012s

If we strace the code, the explanation is quite obvious:

socket(PF_PACKET, SOCK_RAW, 768)        = 4
setsockopt(4, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
select(5, [4], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, SIOCGIFINDEX, {ifr_name="lo", ifr_index=1}) = 0
bind(4, {sa_family=AF_PACKET, proto=0x03, if1, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
setsockopt(4, SOL_SOCKET, SO_RCVBUF, [1073741824], 4) = 0
setsockopt(4, SOL_SOCKET, SO_SNDBUF, [1073741824], 4) = 0
getsockname(4, {sa_family=AF_PACKET, proto=0x03, if1, pkttype=PACKET_HOST, addr(6)={772, 000000000000}, [18]) = 0
ioctl(4, SIOCGIFNAME, {ifr_index=1, ifr_name="lo"}) = 0
sendto(4, "\377\377\377\377\377\377\0\0\0\0\0\0\10\0E\0\0S}0@\0*\6\265\373\307;\224\24\300\250"..., 97, 0, NULL, 0) = 97
select(0, NULL, NULL, NULL, {0, 0})     = 0 (Timeout)
close(4)                                = 0
socket(PF_PACKET, SOCK_RAW, 768)        = 4
setsockopt(4, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
select(5, [4], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, SIOCGIFINDEX, {ifr_name="lo", ifr_index=1}) = 0
bind(4, {sa_family=AF_PACKET, proto=0x03, if1, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
setsockopt(4, SOL_SOCKET, SO_RCVBUF, [1073741824], 4) = 0
setsockopt(4, SOL_SOCKET, SO_SNDBUF, [1073741824], 4) = 0
getsockname(4, {sa_family=AF_PACKET, proto=0x03, if1, pkttype=PACKET_HOST, addr(6)={772, 000000000000}, [18]) = 0
ioctl(4, SIOCGIFNAME, {ifr_index=1, ifr_name="lo"}) = 0
sendto(4, "\377\377\377\377\377\377\0\0\0\0\0\0\10\0E\0\0004}1@\0*\6\266\31\307;\224\24\300\250"..., 66, 0, NULL, 0) = 66
select(0, NULL, NULL, NULL, {0, 0})     = 0 (Timeout)
close(4)                                = 0

For each packet, a new socket is opened, configured and then closed, and this takes ages.

Speeding up the sending

To speed up the sending, one solution is to build a list of packets and to send the whole list with a single sendp() call.

     def run(self):
         filedesc = open(self.filename, 'r')
         pkt_list = []
         # loop on read line
         for line in filedesc:
             # Build packet and add it to the list
             pkt_list.append(pkt)
         sendp(pkt_list, iface = self.iface, verbose = verbose)

This is not possible in our case because of the treatment we have to do between packets.
So the best way is to reuse the socket. This can be done easily when you’ve read the documentation^W code:

@@ -27,6 +27,7 @@ class replay:
     def run(self):
         # open filename
         filedesc = open(self.filename, 'r')
+        s = conf.L2socket(iface=self.iface)
         # loop on read line
         for line in filedesc:
             # Build and send packet
-            sendp(pkt, iface = self.iface, verbose = verbose)
+            s.send(pkt)

The idea is to create the socket once, with the same function sendp() uses internally (conf.L2socket), and then to call the send() method of that socket object for each packet.

With that modification, performance is far better:

    real    0m0.108s
    user    0m0.064s
    sys     0m0.004s

I’m not a scapy expert so ping me if there is a better way to do this.
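The pattern behind the diff can be packaged as a small helper that also makes sure the socket is closed at the end, which the diff above does not do. This is only a sketch: replay_lines() and build_packet are hypothetical names, and the socket factory is passed in as a parameter so that the helper itself does not depend on scapy.

```python
from contextlib import closing


def replay_lines(lines, open_socket, build_packet, between=None):
    """Send one packet per line, reusing a single socket for all of them.

    open_socket: zero-argument callable returning an object with send() and
        close() methods (with scapy, something like a conf.L2socket).
    build_packet: hypothetical callable turning one line into a packet.
    between: optional callable run after each send (inter packet treatment).
    """
    sent = 0
    # The socket is opened once here and closed when the loop is done,
    # even if build_packet() or between() raises an exception.
    with closing(open_socket()) as sock:
        for line in lines:
            sock.send(build_packet(line))
            sent += 1
            if between is not None:
                between(line)
    return sent
```

With scapy, open_socket would be something like lambda: conf.L2socket(iface=self.iface), and between would hold the inter packet treatment that prevented the list-based approach.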

Using linux perf tools for Suricata performance analysis

Introduction

Perf is a great tool to analyse performance on Linux boxes. For example, perf top will give you this type of output on a box running Suricata on a high speed network:

Events: 32K cycles
 28.41%  suricata            [.] SCACSearch
 19.86%  libc-2.15.so        [.] tolower
 17.83%  suricata            [.] SigMatchSignaturesBuildMatchArray
  6.11%  suricata            [.] SigMatchSignaturesBuildMatchArrayAddSignature
  2.06%  suricata            [.] tolower@plt
  1.70%  libpthread-2.15.so  [.] pthread_mutex_trylock
  1.17%  suricata            [.] StreamTcpGetFlowState
  1.10%  libc-2.15.so        [.] __memcpy_ssse3_back
  0.90%  libpthread-2.15.so  [.] pthread_mutex_lock

The functions are sorted by CPU consumption. Using the arrow keys, it is possible to jump into the annotated code to see where most CPU cycles are spent.

This is really useful, but for a function like pthread_mutex_trylock, the interesting part is finding where this function is called from.

Getting function call graph in perf

This Stack Overflow question led me to the solution.

I started by building Suricata with the -fno-omit-frame-pointer option, so that perf can walk the call stack using frame pointers:

./configure --enable-pfring --enable-luajit CFLAGS="-fno-omit-frame-pointer"
make
make install

Once Suricata was restarted (with PID 9366), I was able to record the data:

sudo perf record -g -p 9366

Extracting the call graph was then possible by running:

sudo perf report --call-graph --stdio

The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:

     0.94%  Suricata-Main  libpthread-2.15.so     [.] pthread_mutex_lock
            |
            --- pthread_mutex_lock
               |
               |--48.69%-- FlowHandlePacket
               |          |
               |          |--53.04%-- DecodeUDP
               |          |          |
               |          |          |--95.84%-- DecodeIPV4
               |          |          |          |
               |          |          |          |--99.97%-- DecodeVLAN
               |          |          |          |          DecodeEthernet
               |          |          |          |          DecodePfring
               |          |          |          |          TmThreadsSlotVarRun
               |          |          |          |          TmThreadsSlotProcessPkt
               |          |          |          |          ReceivePfringLoop
               |          |          |          |          TmThreadsSlotPktAcqLoop
               |          |          |          |          start_thread
               |          |          |           --0.03%-- [...]
               |          |          |
               |          |           --4.16%-- DecodeIPV6
               |          |                     |
               |          |                     |--97.59%-- DecodeTunnel
               |          |                     |          |
               |          |                     |          |--99.18%-- DecodeTeredo
               |          |                     |          |          DecodeUDP
               |          |                     |          |          DecodeIPV4
               |          |                     |          |          DecodeVLAN
               |          |                     |          |          DecodeEthernet
               |          |                     |          |          DecodePfring
               |          |                     |          |          TmThreadsSlotVarRun
               |          |                     |          |          TmThreadsSlotProcessPkt
               |          |                     |          |          ReceivePfringLoop
               |          |                     |          |          TmThreadsSlotPktAcqLoop
               |          |                     |          |          start_thread
               |          |                     |          |
               |          |                     |           --0.82%-- DecodeIPV4
               |          |                     |                     DecodeVLAN
               |          |                     |                     DecodeEthernet
               |          |                     |                     DecodePfring
               |          |                     |                     TmThreadsSlotVarRun
               |          |                     |                     TmThreadsSlotProcessPkt
               |          |                     |                     ReceivePfringLoop
               |          |                     |                     TmThreadsSlotPktAcqLoop
               |          |                     |                     start_thread
               |          |                     |
               |          |                      --2.41%-- DecodeIPV6
               |          |                                DecodeTunnel
               |          |                                DecodeTeredo
               |          |                                DecodeUDP
               |          |                                DecodeIPV4
               |          |                                DecodeVLAN
               |          |                                DecodeEthernet
               |          |                                DecodePfring
               |          |                                TmThreadsSlotVarRun
               |          |                                TmThreadsSlotProcessPkt
               |          |                                ReceivePfringLoop
               |          |                                TmThreadsSlotPktAcqLoop
               |          |                                start_thread

Adding a force build to all builders

Recent versions of buildbot, the continuous integration framework, don’t enable the force build feature by default.
This feature can be used to start a build on demand. It is really useful when you’ve updated the build procedure or when you want to test new branches.

It was a little tricky to add it, so I decided to share it. If c is the name of the configuration dictionary you build in your master.cfg, you can add the following after all the builder declarations:

from buildbot.schedulers.forcesched import ForceScheduler
c['schedulers'].append(ForceScheduler(
    name="force",
    builderNames=[builder.getConfigDict()['name'] for builder in c['builders']]))

As one of my physics teachers used to say: “easy when you’ve done it once”.

Ulogd 2.0.2, my first release as maintainer

Objectives of this release

This is my first ulogd2 release as maintainer. I have been in charge of the project since October 30th, 2012, and this release was an opportunity for me to step up my development work on it. The roadmap was almost empty, so I decided to work on issues that were bothering me as a user of the project. I also included two new features: connection tracking event filtering and a Graphite output module. Ulogd is available on the Netfilter web site.

Conntrack event filtering

When logging connection entries, there are potentially a lot of events, so filtering them on network parameters is a good idea. This can now be done via a series of options:

  • accept_src_filter: log a connection only if its source IP belongs to one of the specified networks. This can be a list of networks, for example 192.168.1.0/24,1:2::/64
  • accept_dst_filter: log a connection only if its destination IP belongs to one of the specified networks. This can be a list of networks too.
  • accept_proto_filter: log only connections using the specified layer 4 protocol. It can be a list, for example tcp,sctp
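The semantics of these options can be illustrated with a short Python sketch. This is not ulogd’s C implementation: the conn dictionary and the accept() helper are hypothetical, and I assume that the three filters are combined with a logical AND, an unset filter accepting everything.

```python
import ipaddress


def parse_networks(spec):
    """Parse a comma-separated list such as '192.168.1.0/24,1:2::/64'."""
    return [ipaddress.ip_network(item.strip()) for item in spec.split(",") if item.strip()]


def in_networks(address, spec):
    """True if address belongs to one of the networks listed in spec."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is simply never "in" an IPv6 network (and vice versa).
    return any(ip in net for net in parse_networks(spec))


def accept(conn, src_filter=None, dst_filter=None, proto_filter=None):
    """Keep conn only if it passes every configured filter."""
    if src_filter and not in_networks(conn["src"], src_filter):
        return False
    if dst_filter and not in_networks(conn["dst"], dst_filter):
        return False
    if proto_filter:
        protos = {p.strip().lower() for p in proto_filter.split(",")}
        if conn["proto"].lower() not in protos:
            return False
    return True
```

For example, with src_filter="192.168.1.0/24,1:2::/64", a connection from 192.168.1.10 is kept while one from 10.0.0.1 is dropped.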

A GRAPHITE output module

This is the sexiest part of this release. Seth Hall from the Graphite, a scalable realtime graphing solution. At that time I was playing with the new Netfilter accounting plugin of ulogd2, and my first thought was that it would be a good idea to add a new ulogd2 output plugin to export data to a Graphite server.

You can read more about Graphite output plugin on this dedicated post.

The result was really cool, as the following dashboard shows:

A better command line

In case of error, ulogd used to just die and tell you to read a log file. It is now possible to add the -v flag, which redirects the output to stdout and lets you see what’s going on.

If it is too verbose for you, you can also set the log level from the command line via the -l option.

Improved build system

I’ve made some small improvements to the build system. First of all, a configuration summary is printed at the end of configure. It displays which input and output plugins will be compiled:

Ulogd configuration:
  Input plugins:
    NFLOG plugin:			yes
    NFCT plugin:			yes
    NFACCT plugin:			yes
  Output plugins:
    PCAP plugin:			yes
    PGSQL plugin:			yes
    MySQL plugin:			yes
    SQLITE3 plugin:			no
    DBI plugin:				no

I’ve also added configure options to disable the building of some input plugins:

  --enable-nflog          Enable nflog module [default=yes]
  --enable-nfct           Enable nfct module [default=yes]
  --enable-nfacct         Enable nfacct module [default=yes]

For example, to disable Netfilter conntrack logging, you can use:

./configure --disable-nfct


Coccigrep improved func operation

Coccigrep 1.11 is now available and mainly features improvements to the func search. The func operation can be used to find where a structure is used as an argument of a function. For example, to find where Packet structures are freed in the Suricata project, one can run:

$ coccigrep -t Packet -a "SCFree" -o func src/
src/alert-unified2-alert.c:1156 (Packet *p):         SCFree(p);
src/alert-unified2-alert.c:1161 (Packet *p):         SCFree(p);
...
src/alert-unified2-alert.c:1368 (Packet *pkt):         SCFree(pkt);

With coccigrep 1.11, it is now possible to look for a function with a regular expression. For example, to see how a time_t is used in Suricata’s logging functions, which all start with SCLog (SCLogDebug, SCLogWarning, …), you can simply run:

$ coccigrep -t time_t -a "SCLog.*" -o func src/ 
src/defrag.c:480 (time_t *dc->timeout):     SCLogDebug("\tTimeout: %"PRIuMAX, (uintmax_t)dc->timeout);

With version 1.11, the func operation is more useful. It is also more accurate, as cast parameters and direct use of a structure (instead of use through a pointer) are now supported.

Run a build on all commits in a git branch

Sometimes you need to check that every commit in a branch builds correctly. For example, after a rebase, it is possible that you, or diff, made a mistake during the operation. The build can be run against every commit of the current branch with the following one-liner (split here for readability):

for COMMIT in $(git log --reverse --format=format:%H origin/master..HEAD); do
    git checkout "${COMMIT}" ;
    make -j8 1>/dev/null || { echo "Commit $COMMIT does not build";  break; }
done

The idea is simple: we build the list of commits with git log, using a format string that outputs only the hash, and the --reverse option to start from the oldest commit.
For each commit, we check it out and run the build command. If the build fails, we exit the loop.

When the loop ends, the working tree is left on the last commit checked out, which is the non-building one if a build failed. So don’t forget to go back to the original branch ORIG_BRANCH by running git checkout ORIG_BRANCH.
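The same loop can be sketched in Python with subprocess. The function names are hypothetical and, like the one-liner, it assumes origin/master as the base and make -j8 as the build command. Unlike the shell version, this sketch goes back to the original branch in a finally block and returns the hash of the first broken commit instead of leaving you on it.

```python
import subprocess


def git(*args, cwd=None):
    """Run a git command and return its stripped standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def commits_to_check(base="origin/master", cwd=None):
    """Hashes of base..HEAD, oldest first (like git log --reverse)."""
    out = git("rev-list", "--reverse", f"{base}..HEAD", cwd=cwd)
    return out.split() if out else []


def first_broken(commits, builds):
    """Return the first commit for which builds(commit) is False, else None."""
    for commit in commits:
        if not builds(commit):
            return commit
    return None


def check_branch(base="origin/master", build_cmd=("make", "-j8"), cwd=None):
    """Build every commit of base..HEAD, then return to the original branch."""
    # Note: this prints "HEAD" if you start from a detached HEAD.
    orig = git("rev-parse", "--abbrev-ref", "HEAD", cwd=cwd)

    def builds(commit):
        git("checkout", "--quiet", commit, cwd=cwd)
        # Build output is discarded, like 1>/dev/null in the shell version.
        return subprocess.run(list(build_cmd), cwd=cwd,
                              stdout=subprocess.DEVNULL).returncode == 0

    try:
        return first_broken(commits_to_check(base, cwd), builds)
    finally:
        git("checkout", "--quiet", orig, cwd=cwd)
```

The trade-off is that you then have to check out the reported commit yourself to look at the failure.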

Set or unset define variables in Coccigrep

After I discussed it with the great Julia Lawall, she added a new feature to coccinelle: it is now possible to mark some variables as defined or undefined. This option was added in coccigrep 1.9 and requires coccinelle 1.0-rc14.

For example, take a code base like Suricata, where a lot of unit tests are implemented. The code is structured as follows:

REGULAR CODE

#ifdef UNITTESTS
 TEST CODE
#endif

When searching the regular code, you don’t want to be bothered by results found in the test code. To achieve this, you can pass the -U UNITTESTS option to tell coccigrep to consider the UNITTESTS variable as undefined. To define a variable, use the -D flag.

If you are using coccigrep inside vim, you can include this option in the coccigrep_path variable. The basic vim syntax is:

let g:coccigrep_path="coccigrep -U UNITTESTS"

As I wanted this for all queries in my Suricata source directory, I added the following at the end of my ~/.vim/after/syntax/suricata.vim file:

autocmd BufEnter,BufNewFile,BufRead */git/oisf/* let g:coccigrep_path="coccigrep -U UNITTESTS"

What’s new in coccigrep 1.6?

I have not written any article on coccigrep since the 1.0 release. Here is an update on what has been added to the software since then.

C++ support

Coccinelle has basic C++ support, which can be activated by using the --cpp flag in coccigrep.

Patches information

The -L -v command line options display a description of the match operations available on the system.

$ coccigrep -L -v
set: Search where a given attribute of structure 'type' is set
 * Confidence: 80%
 * Author: Eric Leblond 
 * Arguments: type, attribute
 * Revision: 2

For developers, this information comes from structured comments placed at the start of the cocci file:

$ head src/data/set.cocci 
// Author: Eric Leblond 
// Desc: Search where a given attribute of structure 'type' is set
// Confidence: 80%
// Arguments: type, attribute
// Revision: 2
@init@

This is an easy way to document a search operation. Note that this also works for operations placed in the user or system custom directories.
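To make the comment format concrete, here is a small Python sketch that reads such a header and formats it like the coccigrep -L -v listing shown above. It is not coccigrep’s own parser: parse_cocci_header() and describe() are hypothetical, and the sketch simply assumes the header ends at the first line that does not start with //.

```python
import re

# "// Key: value", trailing whitespace stripped from the value.
HEADER_RE = re.compile(r"^//\s*([A-Za-z]+):\s*(.*?)\s*$")


def parse_cocci_header(text):
    """Collect 'Key: value' pairs from the leading // comment block."""
    info = {}
    for line in text.splitlines():
        if not line.startswith("//"):
            break  # first non-comment line (e.g. "@init@") ends the header
        match = HEADER_RE.match(line)
        if match:
            info[match.group(1).lower()] = match.group(2)
    return info


def describe(name, info):
    """Format the metadata in the style of the 'coccigrep -L -v' output."""
    lines = [f"{name}: {info.get('desc', '')}"]
    for key in ("confidence", "author", "arguments", "revision"):
        if key in info:
            lines.append(f" * {key.capitalize()}: {info[key]}")
    return "\n".join(lines)
```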

Context line display improvement

Guillaume Nault has contributed a series of patches that greatly improved the display of context lines:

$ coccigrep -C 3 -t Packet -a flags -o set decode*c 
decode.c-90            -     }
decode.c-91            - 
decode.c-92            -     PACKET_INITIALIZE(p);
decode.c:93 (Packet *p):     p->flags |= PKT_ALLOC;
decode.c-94            - 
decode.c-95            -     SCLogDebug("allocated a new packet only using alloc...");
decode.c-96            -

Documentation

A man page is now available.

Python 2.5 support

The 1.6 release came with a code change that allows coccigrep to run with Python 2.5. Some users still seem to use this old version of Python, and supporting it did not require degrading the coccigrep code. It even improved it.

Option to read file lists from a file

Thomas Graf has contributed the -l option which provides a way to specify a file containing the list of the files to search in.

Operation improvement

The set operation has been improved and is now more accurate thanks to support for all the related operators.
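Coccinelle matches C code semantically, so the following is only a much cruder illustration of which operators count as "setting" an attribute: a regex-based Python sketch. set_pattern() and lines_setting() are hypothetical helpers, not coccigrep code, and this sketch only knows about assignment operators (=, |=, <<=, …), not ++ or --.

```python
import re

# Optional compound operator followed by a single "=" (so "==" is excluded).
ASSIGN_OPS = r"(?:<<|>>|[-+*/%&|^])?="


def set_pattern(attribute):
    """Regex matching '->attr OP=' or '.attr OP=' for any assignment operator."""
    return re.compile(r"(?:->|\.)\s*" + re.escape(attribute) + r"\s*" + ASSIGN_OPS + r"(?!=)")


def lines_setting(source, attribute):
    """1-based numbers of the lines where attribute appears to be set."""
    pattern = set_pattern(attribute)
    return [number for number, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]
```

On a line such as p->flags |= PKT_ALLOC; the pattern matches, while a comparison like p->flags == 0 or a read like x = p->flags; is ignored.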

Conclusion

Coccigrep is becoming more and more mature over time. The existing code base is being kept and polishing work is currently in progress. One last point: some Linux and *BSD distributions seem to provide packages. This is the case for AUR, Gentoo, NetBSD, OpenBSD and Mandriva, and soon Debian if the intent to package is confirmed.

Acquisition systems and running modes evolution of Suricata

Some new features have recently reached Suricata’s git tree and will be available in the next development release. I worked on some of them, and I describe them here.

Multi interfaces support and new running modes

Configuration update

IDS live modes in Suricata (pcap, pf_ring, af_packet) now support capture on multiple interfaces. The syntax of the YAML configuration file has evolved, and it is now possible to set per-interface variables.

For example, it is possible to define pfring configuration with the following syntax:

pfring:
  - interface: eth4
    threads: 8
    cluster-id: 99
    cluster-type: cluster_flow
  - interface: eth1
    threads: 2
    cluster-id: 98
    cluster-type: cluster_round_robin

This sets different parameters for the eth4 and eth1 interfaces. With that configuration, if the user launches suricata with

suricata -c suricata.yaml --pfring

it will listen on eth4 and eth1, with 8 threads receiving packets on eth4 using flow-based load balancing and 2 threads receiving packets on eth1 using round-robin load balancing.

If you want to run suricata on a single interface, simply do:

suricata -c suricata.yaml --pfring=eth4

The same syntax can be used with the new AF_PACKET acquisition module described below.
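To make the per-interface structure explicit, here is a small Python sketch that walks the pfring list once the YAML has been loaded as Python data (for instance with PyYAML’s yaml.safe_load). capture_plan() is hypothetical, and the only parameter assumes, without checking Suricata’s code, that --pfring=eth4 picks the eth4 entry of the list.

```python
def capture_plan(config, only=None):
    """Map each configured pfring interface to (threads, cluster-type).

    config: the YAML file already loaded as Python data.
    only: restrict the plan to one interface, in the spirit of --pfring=eth4.
    """
    plan = {}
    for entry in config.get("pfring", []):
        iface = entry["interface"]
        if only is not None and iface != only:
            continue
        plan[iface] = (entry["threads"], entry["cluster-type"])
    return plan


# Same content as the YAML snippet above, written as Python data.
EXAMPLE = {
    "pfring": [
        {"interface": "eth4", "threads": 8, "cluster-id": 99,
         "cluster-type": "cluster_flow"},
        {"interface": "eth1", "threads": 2, "cluster-id": 98,
         "cluster-type": "cluster_round_robin"},
    ]
}
```

With EXAMPLE, capture_plan() gives 8 flow-balanced threads on eth4 and 2 round-robin threads on eth1, 10 receive threads in total.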

New running modes

The running modes have been extended with a new mode called workers, available for pfring and af_packet. This mode starts a configurable number of threads, each of which does all the processing, from packet acquisition to logging.
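The structure of the workers mode can be illustrated with a toy Python example in which each thread runs the whole chain (acquisition, decoding, detection, output) for its own packet source. Everything here is invented for the illustration: decode(), detect(), run_workers() and the "src dst payload" string format are hypothetical, and Python threads show the shape of the design, not Suricata’s performance.

```python
import threading


def decode(raw):
    """Toy decoder: split 'src dst payload' into a dict (hypothetical format)."""
    src, dst, payload = raw.split(" ", 2)
    return {"src": src, "dst": dst, "payload": payload}


def detect(pkt):
    """Toy signature: alert when the payload mentions 'attack'."""
    return "attack" in pkt["payload"]


def run_workers(streams):
    """Start one thread per packet stream; each thread runs the full chain."""
    alerts, lock = [], threading.Lock()

    def worker(worker_id, stream):
        for raw in stream:                # acquisition
            pkt = decode(raw)             # decoding
            if detect(pkt):               # detection
                with lock:                # output, shared between workers
                    alerts.append((worker_id, pkt["src"]))

    threads = [threading.Thread(target=worker, args=(i, s))
               for i, s in enumerate(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(alerts)
```

The point of this layout is that a packet never leaves the thread that captured it, whereas in autofp, capture threads hand packets over to separate detect threads.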

List of running modes

Here is the list of current running modes:

  • auto: Multi threaded mode (available for all packet acquisition modules)
  • single: Single threaded mode (available in pcap, pcap file, pfring, af_packet)
  • workers: Workers mode (available in AF_PACKET and pfring)
  • autofp: Multi threaded mode. Packets from each flow are assigned to a single detect thread.
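One common way to send all packets of a flow to the same detect thread, as autofp does, is to hash a direction-independent flow key. The Python sketch below shows that idea only: flow_key() and detect_thread_for() are hypothetical, and crc32 and the key layout are my choices, not Suricata’s actual flow hash.

```python
import zlib


def flow_key(src, sport, dst, dport, proto):
    """Direction-independent flow key: both directions give the same key."""
    a, b = (src, sport), (dst, dport)
    low, high = min(a, b), max(a, b)
    return f"{proto}|{low[0]}|{low[1]}|{high[0]}|{high[1]}"


def detect_thread_for(pkt, n_threads):
    """Pick a detect thread from a stable (non-randomized) hash of the flow."""
    return zlib.crc32(flow_key(*pkt).encode()) % n_threads
```

crc32 is used instead of Python’s built-in hash() because string hashing is randomized between runs, while the assignment must be stable.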

af_packet support

Suricata now supports acquisition via AF_PACKET. This Linux packet acquisition socket has recently evolved and now supports load balancing the capture of an interface between userspace sockets. This module can be configured as shown at the start of this post. It will run on almost any Linux, but you will need a 3.0 kernel to use the load balancing features.

suricata -c suricata.yaml --af-packet=eth4