Massive and semantic patching with Coccinelle

I’m currently working on suricata, and one of the features I’m working on changes the way the main structure, Packet, is accessed.

One of the consequences is that almost all unit tests need to be rewritten, because they use a Packet p construction which has to be replaced by a dynamically allocated Packet *. Given the number of tests in suricata, doing this by hand is very dangerous:

  • It is error prone
  • It takes too long to be done correctly
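
To make the change concrete, here is a minimal sketch of what the dynamically allocated form can look like. The Packet layout, PAYLOAD_SIZE, the value given to SIZE_OF_PACKET and the helper PacketAllocSketch() are all assumptions made up for this illustration; they are not suricata's real definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for suricata's Packet: only what the sketch needs. */
typedef struct Packet_ {
    uint8_t *pkt;      /* points to the payload stored right after the struct */
    uint32_t pktlen;   /* illustrative field, not taken from suricata */
} Packet;

/* Assumed values, chosen for this example only. */
#define PAYLOAD_SIZE   1514
#define SIZE_OF_PACKET (sizeof(Packet) + PAYLOAD_SIZE)

/* Hypothetical helper: allocate a zeroed Packet followed by its payload area. */
static Packet *PacketAllocSketch(void)
{
    Packet *p = malloc(SIZE_OF_PACKET);
    if (p == NULL)
        return NULL;

    /* Clear the struct and the payload area in one go. */
    memset(p, 0, SIZE_OF_PACKET);

    /* The payload starts at the first byte after the struct. */
    p->pkt = (uint8_t *)(p + 1);
    assert(p->pkt == (uint8_t *)p + sizeof(Packet));
    return p;
}
```

A test would then call PacketAllocSketch(), check the result against NULL, use p->pkt as the packet buffer, and free(p) at the end. The point is that the memset has to cover SIZE_OF_PACKET rather than sizeof(Packet), and the pkt pointer has to be set again after it.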

I thus decided to give coccinelle a try. It is a "program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code". Well, from a user's point of view, it is a mega over-boosted sed for C.

One of the transformations I had to do was to find every memset() done on a Packet structure and replace it with a memset of the correct length followed by the setting of a pointer. In terms of code, with "..." meaning some code, I had to find all code like
[C]func(...)
{
Packet p;
...
memset(&p, 0, ...);
}[/C]
and replace it with
[C]func(...)
{
Packet p;
...
memset(&p, 0, SIZE_OF_PACKET);
p.pkt = (uint8_t *)(p + 1);
}[/C]
To do so, I wrote the following semantic patch, which defines the objects and the transformation I want to apply:
[diff]@rule0@
identifier p;
identifier func;
typedef uint8_t;
typedef Packet;
@@
func(...) {
<...
Packet p;
...
- memset(&p, 0, ...);
+ memset(&p, 0, SIZE_OF_PACKET);
+ p.pkt = (uint8_t *)(p + 1);
...>
}
[/diff]
If this semantic patch is saved in the file memset.cocci, you just have to run

spatch -sp_file memset.cocci -in_place detect.c

to modify detect.c in place. Here's an extract of the resulting diff:
[diff]
@@ -9043,6 +9100,7 @@ static int SigTest...m_type() {
Packet p2;
memset(&p2, 0, SIZE_OF_PACKET);
+ p2.pkt = (uint8_t *)(p2 + 1);
DecodeEthernet(&th_v, &dtv, &p2, rawpkt2, sizeof(rawpkt2), NULL);
[/diff]
As you can see, spatch does not care that the variable is named p2. It is a Packet structure which is defined inside a function and which is memset() afterwards, and that is enough to match. Spatch does the transformation knowing C, and thus you need to think in C when writing the semantic patch.

Now let's go through some explanations. The semantic patch starts with the declaration of the parameters:
[diff]@rule0@ // name of the rule
identifier p; // this will be replaced by the name of a variable
identifier func; // func will be the name of something
typedef uint8_t; // this is a C type we will use
typedef Packet; // same remark
@@
[/diff]
The main point is that, since coccinelle uses variables of its own (metavariables), you must tell it which names are variables for you (with identifier), and you also need to specify which words are specific to the code (with typedef in this example).
The rest is straightforward, apart from one trick that I will detail:
[diff]func(...) { // the modification occurs in any function
<... // there is some code (...) which can occur more than once (<)
Packet p; // a variable is a Packet, we name it p
... // some code
- memset(&p, 0, ...); // a memset is done on p, we remove it (-)
+ memset(&p, 0, SIZE_OF_PACKET); // and replace it
+ p.pkt = (uint8_t *)(p + 1); // by these two lines (+)
...> // the part of the code occurring more than once ends here
}
[/diff]

My complete semantic patch for the suricata modification is around 55 lines, and the resulting patch on suricata has the following diffstat:

30 files changed, 3868 insertions(+), 2745 deletions(-)

and a size of 407 KB. This gives an idea of the power of coccinelle.

This is only a small example of what coccinelle is able to do. If you want to read further, go to the coccinelle website or read my "Coccinelle for the newbie" page.

I would like to thank Holger Eitzenberger for telling me about the existence of Coccinelle, and I give great thanks to Julia Lawall for her expertise and her patience. She helped me a lot during my discovery of Coccinelle.