User login

Shane Alcock's Blog

02

Jun

2017

Libflowmanager 3.0.0 has been released today.

The libflowmanager API has been re-written to be thread-safe (and therefore compatible with parallel libtrace), hence the major version number change.

The old libflowmanager API has been removed entirely; there is no backwards compatibility with previous versions of libflowmanager. If you choose to install libflowmanager 3 then you will need to update your existing code to use the new API. This should not be too onerous in most cases, as most of the old global API functions have simply been replaced with method calls to a FlowManager class instance. The README and example programs demonstrate and explain the new API in detail.

Note that much of our other software that relies on libflowmanager, such as the libprotoident tools and lpicollector, have NOT yet been officially released with libflowmanager 3 support. If you are currently using any of this software, you should continue to use libflowmanager 2.0.5 until we are able to test and release new libflowmanager 3 compatible versions.

You can download both libflowmanager 3 and libflowmanager 2.0.5 from our website.

08

May

2017

Added another 5 protocols to libprotoident -- having a slightly more powerful PC for installing and running various candidate applications has helped quite a bit. Updated the rules for several more protocols as well.

Made some more progress on my protocol taxonomy -- I'm up to 'P' for the TCP protocols so I'm probably about 1/4 of the way through now.

Continued re-factoring the FSM generation code. Getting close to done, although I suspect the amount of changes and variable renaming will require a fair bit of testing to make sure I've transferred everything across correctly.

Added the ability to choose between TCP and HTTP throughput data on the AMP matrix. To do this, I had to bring the amp-web/nntsc install on prophet back up to date after a few months of being untouched. As always, there were a few issues with dependencies and versioning which slowed everything down, but eventually Brendon and I got it all working correctly.

01

May

2017

Another disrupted week, this time due to being ill. Spent most of my available time looking over the output of my new multi-process state machine generation algorithm. The extra sequence fragments that become apparent when considering multiple processes managed to reveal a few new situations where my code wasn't quite doing the right thing. I've fixed those and am reasonably happy again with the machines produced for my test dataset.

Moved on to some code re-factoring, as the existing code-base had become a bit of a mess from hacking in fixes to all of the edge cases I had been dealing with. In particular, I'm aiming to separate code that deals with the machine itself, i.e. the states and their transitions, from the code that compares sequences and determines what needs to be added to (or removed from) the machine to accommodate the variation.

19

Apr

2017

Slightly disrupted week with Easter and cyclones having an impact on the productivity. Most of my time ended up being spent hunting down more previously unknown protocols. Just three new protocols this week, along with fixes for three more.

On the STRATUS side, I worked on creating a way to "combine" the suffix trees for each individual process so that we can account for sequences that appear frequently in the whole dataset but never more than once or twice within a given process. The original implementation would not recognise those sequences as frequent, because it considered each process individually. I think I've got this working now -- but I'm yet to look at the results too closely.

10

Apr

2017

Continued delving into the unknown traffic on the campus network. Had a mix of frustrating days and successful days -- one protocol (N2Ping) took nearly two days to track down but I got there in the end. 8 new protocols added to libprotoident this week again, so we're starting to get close to 400 supported protocols in libprotoident.

Another week of refinement on the FSM code. Most of the effort has been focused on loop recognition, particularly in terms of making sure we don't ignore candidates that can be used to identify loops.

03

Apr

2017

Have been using my new daily libprotoident email to make some good progress in terms of adding new protocols to libprotoident. Another 8 protocols added this week, with 5 existing protocols improved as well.

Found a few new bugs in my FSM tandem-repeat code after running it against my full test dataset and doing an initial validation of the resulting machines. Finished up a set of slides describing (broadly) what I'm doing overall with the FSM project and how I'm going about it, i.e. suffix trees, pattern extraction, variant detection and machine building.

Started looking into a parallel RT implementation for libtrace / wdcap, with an eye towards removing the combiner bottleneck from wdcap.

27

Mar

2017

Finished implementing tandem repeat detection within my existing pattern extraction code. The initial results look promising, i.e. the code has been able to identify "write,read" as a repeat in the FTP system call log with no obvious false positives. Next job will be to repeat the machine validation and make sure that I have improved the results overall.

Wrote a libprotoident program to perform daily monitoring of unknown payload patterns on the Waikato capture point and send me an email every morning with the 25 "biggest" patterns by payload, as well as a few example flows matching each pattern. Using this data, I've already been able to add a few new patterns to libprotoident and look forward to being able to be more proactive at keeping libprotoident up to date.

20

Mar

2017

Finished porting the remaining libprotoident tools to be parallel-compatible. Spent a couple of days looking at unknown payload patterns in some recent Uni traffic -- unfortunately I wasn't able to make much tangible progress on identifying much of the unknown traffic.

Worked on implementing an algorithm for finding tandem repeats in strings, with the eventual aim of porting it over to work with my system call sequences. The published algorithm consists of three phases, but each of those phases has either involved looking up and implementing several other string processing algorithms (LZ-decomposition, longest common extension) or has required modifications to my existing suffix tree code (extracting a suffix array, bottom-up traversal, storing the longest child suffix in each node). Therefore, I'm about half-way through implementing the algorithm.

Moved libtrace into its own github organization to reflect that libtrace is now going to be more of a community project than a WAND project. I'll still be helping out with maintaining it for now, but now the workload can be shared amongst a group of trusted libtrace users (including people outside of WAND). This will hopefully keep libtrace well looked-after, even as my available time gets more and more restricted.

13

Mar

2017

Went back and finished making libflowmanager work with parallel libtrace. The remaining problem had been that the expiry modules were not thread-safe, so I've rewritten them to be classes so that the expiry lists are local to each module. Testing with lpi_protoident has proven these changes to work (at least when reading from a trace file), so I can continue updating the rest of the libprotoident tools to be parallel-libtrace compatible soon.

Spent the remainder of my week validating some of the FSMs produced by my model generation algorithm. Overall, the results are starting to look fairly good -- most of the machines being generated by my code are close matches to the ground truth machines, and there are very few duplicate or redundant machines. The most obvious outstanding problem is related to "tandem repeats", i.e. sequences of multiple system calls that can be repeated any number of times (such as "read,write,read,write,read,write", where "read,write" simply repeats until the action is over. Started looking into methods where I could detect tandem repeats so that I can try to encode them as a single self-repeating state.

06

Mar

2017

Finished testing my packet ordering fix for libtrace. Managed to come up with a more efficient method of determining an appropriate order value for int: and ring: so hopefully performance shouldn't be impacted too much by this change. Also fixed a couple of other libtrace bugs that I had noticed, particularly the horrid performance of tracertstats on some live formats. Released a new version of libtrace (4.0.1) that includes these fixes as well as a few others that have come in since the first parallel release.

More tweaks to the FSM generation code. I've found some errors in the way by which I was determining whether one machine was effectively superceded by another, which was causing me to produce extra redundant machines. I've also come up with the new method for creating the match maps when comparing two sequences -- the old method simply focused on picking the longest match and then finding any other matches that cover unmatched territory, which doesn't work so well for some looping sequences which have repetitive sub-sequences within them. My new method tries to find an optimal set of matches that gets the best possible coverage while minimising the amount of overlap, so we avoid matches that only end up covering one token because the rest of the sequence has already been covered by a larger match.