What’s new in ulogd 2.0.3

New features in ulogd 2.0.3 release

Database framework update

ulogd 2.0.3 implements two new optional modes for database connections:

  • backlog system to avoid event loss in case of database downtime
  • running mode where acquisition is made in one thread and queries to databases are made in separate threads to reduce latency in the treatment of kernel messages

These two modes are described below.

Postgresql update

Postgresql output plugin was only offering a small subset of Postgresql connection-related options.
It is now possible to use the connstring to use all possible parameters of libpq param keywords. If set, this variable has precedence on other variables.

One interest of connstring is to be able to use a SSL-encrypted connection to the database by using the sslmode keyword:

connstring="host=localhost port=4321 dbname=nulog user=nupik password=changeme sslmode=verify-full sslcert=/etc/ssl/pgsql-cert.pem sslkey=/etc/ssl/pgsql-key.pem sslrootcert==/etc/ssl/pgsql-rootcert.pem"

Event loss prevention

ulogd 2.0.3 implements a backlog system for all database output plugins using the abstraction framework for database connection. At the writing of this article, this is MySQL, PostgreSQL and DBI. Memory will be dedicated to store the queries that can not be run because of an unavailability of the database. Once the database is back, the queries are played in order.

To activate this mode, you need to set the backlog_memcap value in the database definition.

[mysql1]
db="nulog"
...
procedure="INSERT_PACKET_FULL"
backlog_memcap=1000000
backlog_oneshot_requests=10

Set backlog_memcap to the size of memory that will be allocated to store events in memory if data is temporary down. The variable backlog_oneshot_requests is storing the number of queries to process at once before reading a kernel message.

Multithreaded database output

If the ring buffer mode is active, a thread will be created for each stack involving the configured database. It will connect to the database and execute the queries.
The idea is to try to avoid buffer overrun by minimizing the time requested to treat kernel message. Doing synchronous SQL request, as it was made before was causing a delay which could cause some messages to be lost in case of burst from kernel side. With this new mode, the time to process kernel message is equal to the time of the formatting of the query.

To activate this mode, you need to set ring_buffer_size to a value superior to 1.
The value stores the number of SQL requests to keep in the ring buffer.

[pgsql1]
db="nulog"
...
procedure="INSERT_PACKET_FULL"
ring_buffer_size=1000

The ring_buffer_size has precedence on the backlog_memcap value. And backlog will be disabled if the ring buffer is active as ring buffer also provide packet loss prevention. ring_buffer_size is the maximum number of queries to keep in memory.

Using linux perf tools for Suricata performance analysis

Introduction

Perf is a great tool to analyse performances on Linux boxes. For example, perf top will give you this type of output on a box running Suricata on a high speed network:

Events: 32K cycles                                                                                                                                                                                                                            
 28.41%  suricata            [.] SCACSearch
 19.86%  libc-2.15.so        [.] tolower
 17.83%  suricata            [.] SigMatchSignaturesBuildMatchArray
  6.11%  suricata            [.] SigMatchSignaturesBuildMatchArrayAddSignature
  2.06%  suricata            [.] tolower@plt
  1.70%  libpthread-2.15.so  [.] pthread_mutex_trylock
  1.17%  suricata            [.] StreamTcpGetFlowState
  1.10%  libc-2.15.so        [.] __memcpy_ssse3_back
  0.90%  libpthread-2.15.so  [.] pthread_mutex_lock

The functions are sorted by CPU consumption. Using arrow key it is possible to jump into the annotated code to see where most CPU cycles are used.

This is really useful but in the case of a function like pthread_mutex_trylock, the interesting part is to be able to find where this function is called.

Getting function call graph in perf

This stack overflow question lead me to the solution.

I’ve started to build suricata with the -fno-omit-frame-pointer option:

./configure --enable-pfring --enable-luajit CFLAGS="-fno-omit-frame-pointer"
make
make install

Once suricata was restarted (with pid being 9366), I was then able to record the data:

sudo perf record -a --call-graph -p 9366

Extracting the call graph was then possible by running:

sudo perf report --call-graph --stdio

The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:

     0.94%  Suricata-Main  libpthread-2.15.so     [.] pthread_mutex_lock
            |
            --- pthread_mutex_lock
               |
               |--48.69%-- FlowHandlePacket
               |          |
               |          |--53.04%-- DecodeUDP
               |          |          |
               |          |          |--95.84%-- DecodeIPV4
               |          |          |          |
               |          |          |          |--99.97%-- DecodeVLAN
               |          |          |          |          DecodeEthernet
               |          |          |          |          DecodePfring
               |          |          |          |          TmThreadsSlotVarRun
               |          |          |          |          TmThreadsSlotProcessPkt
               |          |          |          |          ReceivePfringLoop
               |          |          |          |          TmThreadsSlotPktAcqLoop
               |          |          |          |          start_thread
               |          |          |           --0.03%-- [...]
               |          |          |
               |          |           --4.16%-- DecodeIPV6
               |          |                     |
               |          |                     |--97.59%-- DecodeTunnel
               |          |                     |          |
               |          |                     |          |--99.18%-- DecodeTeredo
               |          |                     |          |          DecodeUDP
               |          |                     |          |          DecodeIPV4
               |          |                     |          |          DecodeVLAN
               |          |                     |          |          DecodeEthernet
               |          |                     |          |          DecodePfring
               |          |                     |          |          TmThreadsSlotVarRun
               |          |                     |          |          TmThreadsSlotProcessPkt
               |          |                     |          |          ReceivePfringLoop
               |          |                     |          |          TmThreadsSlotPktAcqLoop
               |          |                     |          |          start_thread
               |          |                     |          |
               |          |                     |           --0.82%-- DecodeIPV4
               |          |                     |                     DecodeVLAN
               |          |                     |                     DecodeEthernet
               |          |                     |                     DecodePfring
               |          |                     |                     TmThreadsSlotVarRun
               |          |                     |                     TmThreadsSlotProcessPkt
               |          |                     |                     ReceivePfringLoop
               |          |                     |                     TmThreadsSlotPktAcqLoop
               |          |                     |                     start_thread
               |          |                     |
               |          |                      --2.41%-- DecodeIPV6
               |          |                                DecodeTunnel
               |          |                                DecodeTeredo
               |          |                                DecodeUDP
               |          |                                DecodeIPV4
               |          |                                DecodeVLAN
               |          |                                DecodeEthernet
               |          |                                DecodePfring
               |          |                                TmThreadsSlotVarRun
               |          |                                TmThreadsSlotProcessPkt
               |          |                                ReceivePfringLoop
               |          |                                TmThreadsSlotPktAcqLoop
               |          |                                start_thread