Nov 182013
 

Introduction

Perf is a great tool to analyse performances on Linux boxes. For example, perf top will give you this type of output on a box running Suricata on a high speed network:

Events: 32K cycles                                                                                                                                                                                                                            
 28.41%  suricata            [.] SCACSearch
 19.86%  libc-2.15.so        [.] tolower
 17.83%  suricata            [.] SigMatchSignaturesBuildMatchArray
  6.11%  suricata            [.] SigMatchSignaturesBuildMatchArrayAddSignature
  2.06%  suricata            [.] tolower@plt
  1.70%  libpthread-2.15.so  [.] pthread_mutex_trylock
  1.17%  suricata            [.] StreamTcpGetFlowState
  1.10%  libc-2.15.so        [.] __memcpy_ssse3_back
  0.90%  libpthread-2.15.so  [.] pthread_mutex_lock

The functions are sorted by CPU consumption. Using arrow key it is possible to jump into the annotated code to see where most CPU cycles are used.

This is really useful but in the case of a function like pthread_mutex_trylock, the interesting part is to be able to find where this function is called.

Getting function call graph in perf

This stack overflow question lead me to the solution.

I’ve started to build suricata with the -fno-omit-frame-pointer option:

./configure --enable-pfring --enable-luajit CFLAGS="-fno-omit-frame-pointer"
make
make install

Once suricata was restarted (with pid being 9366), I was then able to record the data:

sudo perf record -a --call-graph -p 9366

Extracting the call graph was then possible by running:

sudo perf report --call-graph --stdio
The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:
     0.94%  Suricata-Main  libpthread-2.15.so     [.] pthread_mutex_lock
            |
            --- pthread_mutex_lock
               |
               |--48.69%-- FlowHandlePacket
               |          |
               |          |--53.04%-- DecodeUDP
               |          |          |
               |          |          |--95.84%-- DecodeIPV4
               |          |          |          |
               |          |          |          |--99.97%-- DecodeVLAN
               |          |          |          |          DecodeEthernet
               |          |          |          |          DecodePfring
               |          |          |          |          TmThreadsSlotVarRun
               |          |          |          |          TmThreadsSlotProcessPkt
               |          |          |          |          ReceivePfringLoop
               |          |          |          |          TmThreadsSlotPktAcqLoop
               |          |          |          |          start_thread
               |          |          |           --0.03%-- [...]
               |          |          |
               |          |           --4.16%-- DecodeIPV6
               |          |                     |
               |          |                     |--97.59%-- DecodeTunnel
               |          |                     |          |
               |          |                     |          |--99.18%-- DecodeTeredo
               |          |                     |          |          DecodeUDP
               |          |                     |          |          DecodeIPV4
               |          |                     |          |          DecodeVLAN
               |          |                     |          |          DecodeEthernet
               |          |                     |          |          DecodePfring
               |          |                     |          |          TmThreadsSlotVarRun
               |          |                     |          |          TmThreadsSlotProcessPkt
               |          |                     |          |          ReceivePfringLoop
               |          |                     |          |          TmThreadsSlotPktAcqLoop
               |          |                     |          |          start_thread
               |          |                     |          |
               |          |                     |           --0.82%-- DecodeIPV4
               |          |                     |                     DecodeVLAN
               |          |                     |                     DecodeEthernet
               |          |                     |                     DecodePfring
               |          |                     |                     TmThreadsSlotVarRun
               |          |                     |                     TmThreadsSlotProcessPkt
               |          |                     |                     ReceivePfringLoop
               |          |                     |                     TmThreadsSlotPktAcqLoop
               |          |                     |                     start_thread
               |          |                     |
               |          |                      --2.41%-- DecodeIPV6
               |          |                                DecodeTunnel
               |          |                                DecodeTeredo
               |          |                                DecodeUDP
               |          |                                DecodeIPV4
               |          |                                DecodeVLAN
               |          |                                DecodeEthernet
               |          |                                DecodePfring
               |          |                                TmThreadsSlotVarRun
               |          |                                TmThreadsSlotProcessPkt
               |          |                                ReceivePfringLoop
               |          |                                TmThreadsSlotPktAcqLoop
               |          |                                start_thread

  2 Responses to “Using linux perf tools for Suricata performance analysis”

  1. This is very good and helpful

  2. […] Some more info found HERE and thanks to Regit HERE. […]

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>