Weekly Report -- 21/10/2016




Spent most of my week working on turning the system call patterns extracted by my suffix tree into workable FSMs. So far the focus has been on recognising which patterns are variations on previously seen patterns and creating "branches" in my internal representations of those patterns that incorporate the allowed variations (ideally, without creating any invalid transitions). I also have to account for situations where the pattern has been "shifted", so naively looking at the edit distance between two patterns doesn't work too well: a shifted copy of the same pattern can still be a large edit distance away from the original.
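To illustrate the shifting problem, here is a minimal sketch (not the actual algorithm) that assumes a "shifted" pattern is a cyclic rotation of a repeating pattern. The function name `rotation_offset` and the example system call sequences are made up for illustration.

```python
def rotation_offset(pattern, candidate):
    """Return k such that candidate == pattern[k:] + pattern[:k], else None.

    Assumption: a "shifted" variant is a pure cyclic rotation of the
    same repeating pattern, with no inserted or deleted calls.
    """
    if len(pattern) != len(candidate):
        return None
    if not pattern:
        return 0
    # Try every possible rotation of the pattern.
    for k in range(len(pattern)):
        if pattern[k:] + pattern[:k] == candidate:
            return k
    return None


seen = ["open", "read", "write", "close"]
shifted = ["write", "close", "open", "read"]
print(rotation_offset(seen, shifted))
```

Position by position, `seen` and `shifted` differ everywhere, yet they describe the same loop entered at a different point. Checking for a rotation first is one way to avoid treating them as unrelated patterns.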

The other challenge that I've run into, especially with shifted variants where the pattern repeats, is trying to determine the correct start state. In some cases, there are other variants of the pattern that indicate where the start could be -- i.e. the first system call that is common to all variants is probably the starting point -- but this extra information will not always be available.
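One possible reading of the "first common system call" heuristic is sketched below. It assumes each variant is a list of system call names and that the first variant is used as the reference ordering. Both are assumptions for illustration, as is the name `guess_start_call`.

```python
def guess_start_call(variants):
    """Guess a start state: the earliest call in the first variant
    that also appears in every other variant.

    Returns None when there are no variants or no call is shared,
    i.e. when the extra information is not available.
    """
    if not variants:
        return None
    # Calls that occur in every variant.
    common = set(variants[0]).intersection(*variants[1:])
    # Walk the reference variant in order and take the first shared call.
    for call in variants[0]:
        if call in common:
            return call
    return None


variants = [
    ["open", "fstat", "read", "close"],
    ["open", "read", "close"],
    ["stat", "open", "read", "close"],
]
print(guess_start_call(variants))
```

When the variants are pure shifts of one another, every call is common to all of them, so this heuristic simply returns the first call of the reference variant and does not resolve the ambiguity.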

End result: I've got an algorithm that seems to work as expected on the first couple of examples I've looked at. It'll need more testing on a wider variety of cases and there are still some outstanding situations that I know are not dealt with as well as I would like, e.g. loops that contain multiple distinct system calls.

Changed direction a bit to help Harris and Alan with an experiment they are running that tries to map application log items to system call patterns. Again using my suffix tree code, I am pulling out common system call patterns and reporting the pids and start times of all instances where those patterns appear in the system call logs. Alan will then see how well those correlate with the entries in the application logs.
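The reporting step can be sketched roughly as follows. This uses a naive per-process scan rather than the suffix tree. The `(pid, timestamp, syscall)` record layout is an assumption, since the real log format is not described here, and `find_pattern_instances` is a hypothetical name.

```python
from collections import defaultdict


def find_pattern_instances(events, pattern):
    """Return sorted (pid, start_time) pairs for each contiguous
    occurrence of `pattern` within a single process's call sequence.

    `events` is an iterable of (pid, timestamp, syscall) tuples
    (an assumed format, not the real log layout).
    """
    if not pattern:
        return []
    # Group calls by process so patterns are matched per pid.
    by_pid = defaultdict(list)
    for pid, ts, call in events:
        by_pid[pid].append((ts, call))

    hits = []
    n = len(pattern)
    for pid, calls in by_pid.items():
        calls.sort(key=lambda entry: entry[0])  # order by timestamp
        names = [call for _, call in calls]
        # Naive sliding-window comparison against the pattern.
        for i in range(len(names) - n + 1):
            if names[i:i + n] == pattern:
                hits.append((pid, calls[i][0]))
    return sorted(hits)


events = [
    (101, 1.0, "open"), (102, 1.1, "open"), (101, 1.2, "read"),
    (102, 1.3, "write"), (101, 1.4, "close"), (101, 2.0, "open"),
    (101, 2.1, "read"), (101, 2.2, "close"),
]
print(find_pattern_instances(events, ["open", "read", "close"]))
```

The resulting (pid, start time) pairs are the sort of output that could then be lined up against timestamps in the application logs.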

Spent some time reading over Dimeji's paper and offering feedback.