Using linux perf tools for Suricata performance analysis

Introduction

Perf is a great tool to analyse performances on Linux boxes. For example, perf top will give you this type of output on a box running Suricata on a high speed network:

Events: 32K cycles                                                                                                                                                                                                                            
 28.41%  suricata            [.] SCACSearch
 19.86%  libc-2.15.so        [.] tolower
 17.83%  suricata            [.] SigMatchSignaturesBuildMatchArray
  6.11%  suricata            [.] SigMatchSignaturesBuildMatchArrayAddSignature
  2.06%  suricata            [.] tolower@plt
  1.70%  libpthread-2.15.so  [.] pthread_mutex_trylock
  1.17%  suricata            [.] StreamTcpGetFlowState
  1.10%  libc-2.15.so        [.] __memcpy_ssse3_back
  0.90%  libpthread-2.15.so  [.] pthread_mutex_lock

The functions are sorted by CPU consumption. Using arrow key it is possible to jump into the annotated code to see where most CPU cycles are used.

This is really useful but in the case of a function like pthread_mutex_trylock, the interesting part is to be able to find where this function is called.

Getting function call graph in perf

This stack overflow question lead me to the solution.

I’ve started to build suricata with the -fno-omit-frame-pointer option:

./configure --enable-pfring --enable-luajit CFLAGS="-fno-omit-frame-pointer"
make
make install

Once suricata was restarted (with pid being 9366), I was then able to record the data:

sudo perf record -a --call-graph -p 9366

Extracting the call graph was then possible by running:

sudo perf report --call-graph --stdio

The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:

     0.94%  Suricata-Main  libpthread-2.15.so     [.] pthread_mutex_lock
            |
            --- pthread_mutex_lock
               |
               |--48.69%-- FlowHandlePacket
               |          |
               |          |--53.04%-- DecodeUDP
               |          |          |
               |          |          |--95.84%-- DecodeIPV4
               |          |          |          |
               |          |          |          |--99.97%-- DecodeVLAN
               |          |          |          |          DecodeEthernet
               |          |          |          |          DecodePfring
               |          |          |          |          TmThreadsSlotVarRun
               |          |          |          |          TmThreadsSlotProcessPkt
               |          |          |          |          ReceivePfringLoop
               |          |          |          |          TmThreadsSlotPktAcqLoop
               |          |          |          |          start_thread
               |          |          |           --0.03%-- [...]
               |          |          |
               |          |           --4.16%-- DecodeIPV6
               |          |                     |
               |          |                     |--97.59%-- DecodeTunnel
               |          |                     |          |
               |          |                     |          |--99.18%-- DecodeTeredo
               |          |                     |          |          DecodeUDP
               |          |                     |          |          DecodeIPV4
               |          |                     |          |          DecodeVLAN
               |          |                     |          |          DecodeEthernet
               |          |                     |          |          DecodePfring
               |          |                     |          |          TmThreadsSlotVarRun
               |          |                     |          |          TmThreadsSlotProcessPkt
               |          |                     |          |          ReceivePfringLoop
               |          |                     |          |          TmThreadsSlotPktAcqLoop
               |          |                     |          |          start_thread
               |          |                     |          |
               |          |                     |           --0.82%-- DecodeIPV4
               |          |                     |                     DecodeVLAN
               |          |                     |                     DecodeEthernet
               |          |                     |                     DecodePfring
               |          |                     |                     TmThreadsSlotVarRun
               |          |                     |                     TmThreadsSlotProcessPkt
               |          |                     |                     ReceivePfringLoop
               |          |                     |                     TmThreadsSlotPktAcqLoop
               |          |                     |                     start_thread
               |          |                     |
               |          |                      --2.41%-- DecodeIPV6
               |          |                                DecodeTunnel
               |          |                                DecodeTeredo
               |          |                                DecodeUDP
               |          |                                DecodeIPV4
               |          |                                DecodeVLAN
               |          |                                DecodeEthernet
               |          |                                DecodePfring
               |          |                                TmThreadsSlotVarRun
               |          |                                TmThreadsSlotProcessPkt
               |          |                                ReceivePfringLoop
               |          |                                TmThreadsSlotPktAcqLoop
               |          |                                start_thread