
Shane Alcock's Blog




Managed to create a new model for use with the Bernaille traffic classification technique, based on an hour of ISP traffic and using PACE to determine ground truth. The model does not perform much better than the default one I tested last week, despite including a few extra protocols.

Developed a new technique for comparing the various traffic classification schemes. My main problem is that even the commercial tools are not reliable enough to act as a genuine ground truth, so it becomes difficult to evaluate the accuracy of any given approach. My new approach evaluates each tool by comparing the classifications against the results produced by each other tool in turn, treating all flows that are unknown or classified differently as failure cases. The average failure rate is then calculated across all the tools compared against to produce an estimated accuracy rating for the evaluated tool.
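A rough sketch of the comparison described above, in Python. The input layout (a dict mapping each tool name to its per-flow labels, with None meaning "unknown") and the function names are made up for illustration and are not taken from the actual evaluation tool. It also assumes that a flow counts as a failure if either tool reports it as unknown, which is only one possible reading of the description.

```python
from typing import Dict, Optional

# Hypothetical layout: tool name -> {flow id -> protocol label, or None if unknown}
FlowLabels = Dict[str, Optional[str]]


def pairwise_failure_rate(evaluated: FlowLabels, reference: FlowLabels) -> float:
    """Fraction of shared flows where the tools disagree or a label is unknown."""
    shared = evaluated.keys() & reference.keys()
    if not shared:
        return 0.0
    failures = 0
    for flow in shared:
        ours, theirs = evaluated[flow], reference[flow]
        # Assumption: an unknown result from either tool is a failure case.
        if ours is None or theirs is None or ours != theirs:
            failures += 1
    return failures / len(shared)


def estimated_failure_rates(labels: Dict[str, FlowLabels]) -> Dict[str, float]:
    """Average each tool's failure rate across all of the other tools."""
    rates = {}
    for tool, own in labels.items():
        per_reference = [
            pairwise_failure_rate(own, other)
            for name, other in labels.items()
            if name != tool
        ]
        rates[tool] = sum(per_reference) / len(per_reference) if per_reference else 0.0
    return rates
```

A lower average failure rate suggests that a tool agrees more often with the rest of the field, which is used here as a stand-in for accuracy when no trustworthy ground truth is available.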

So far, the results produced by this comparison approach have matched my expectations (libprotoident and PACE have lower failure rates, nmap has the highest) and have also highlighted the high quality of libprotoident's classifications. Hopefully, we will continue to have good results when NAVL is added to the mix.

On that note, still waiting on Vineyard to provide me with a binary that fixes the bug I reported last week - they have acknowledged it as a bug and are in the process of testing the fix now.




Started collating the results of my analysis of dark and sleeper traffic in the ISP traces. It's not finished yet, but the results I have so far can be viewed at

CCR rejected my libprotoident paper, primarily due to a reviewer stating that we had not compared against the "state of the art" described in a paper from 2006 ( This particular technique requires no packet payload, but is only able to identify 10 different TCP application protocols (although I can supposedly create new models for other TCP applications).

I tested the default models against some ISP traffic and found that they performed much better than I had expected, but were still less accurate than the weakest of the OSS DPI techniques. Their failure rate (in terms of misclassified bytes) was 24%, compared with 4.5% for libprotoident.
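For reference, a failure rate "in terms of misclassified bytes" can be read as the bytes in wrongly labelled flows divided by the total bytes. A minimal Python sketch of that reading follows. The tuple layout and function name are hypothetical, and it assumes a reference label is available for each flow.

```python
from typing import List, Tuple

# Hypothetical record: (bytes in flow, label from the tool, reference label)
FlowRecord = Tuple[int, str, str]


def byte_failure_rate(flows: List[FlowRecord]) -> float:
    """Share of all bytes carried by flows whose label differs from the reference."""
    total = sum(size for size, _, _ in flows)
    if total == 0:
        return 0.0
    wrong = sum(size for size, predicted, truth in flows if predicted != truth)
    return wrong / total
```

Weighting by bytes rather than by flow count means that a single misclassified bulk transfer hurts the score far more than many misclassified short flows.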

Started integrating Vineyard's NAVL library into my traffic classification evaluation tool. Started out OK, but ran into a few problems with not being able to force NAVL to expire internal entries for UDP flows when I have decided the flow has ended. This creates a problem if the 5-tuple reappears later, as NAVL returns an error when I try to create a new NAVL connection for that flow because NAVL believes the flow already exists. I've filed a support request, so hopefully I'll get some sort of solution in the next day or two.

Continued integrating Simon's OSPF code into libtrace.




Started looking into the traffic sent to "sleeper" hosts, i.e. IP addresses that have been active but are now inactive. Still putting together the initial results, but there is definitely a difference between the traffic observed heading to "dark" hosts vs the traffic observed heading to sleepers.

During the sleeper analysis, I've been able to improve a few of the libprotoident rules to correctly match more of the traffic I've been looking at.

Began integrating Simon's OSPF parsing code into libtrace. Been slightly trickier than I had anticipated due to major differences between OSPFv2 (which Simon's code parses) and OSPFv3 (which we may want to parse in future).

Had a brief phone meeting with Vineyard Networks. They've agreed to give us access to their NAVL library for evaluation.




Implemented a libprotoident module for my longitudinal analysis and started running it over the trace sets that it is appropriate for. During this process, I noticed that one of the Waikato 3 traces had managed to get corrupted during a sector reallocation. Fortunately, Brad was able to fix it by removing the problematic disk from the RAID before the change became permanent.

First review came back for the libprotoident paper so I decided to start addressing their comments - mainly they wanted more detail about the algorithms used and how the rule matching actually worked. Hopefully, adding some pseudocode describing various components will be satisfactory. Then I had to go back and get the paper back under the page limit :(




Determined that adding threading to libprotoident was not beneficial at all - in fact, it ended up running much slower than before. This seems to be mainly because the rules are so simple: there was no performance gain to compensate for the overhead of locking mutexes and synchronising threads that was introduced.

Finished re-processing various trace sets to get updated longitudinal data. Changed some of the graphing scripts to plot the new data correctly. This included messing around with various chart plotting libraries to find something that produces nice pie charts (gnuplot is hopeless at this). Eventually settled on matplotlib for python.

Managed to edit the libtrace paper down so that it fits inside the six pages required for CCR. Hopefully, I haven't cut anything too important...




Made a few tweaks to some of the measurements used in the longitudinal study - now re-processing the trace sets to generate the new results. Also found and fixed a bug in the IP counting measurement, which was over-inflating the IP address counts.

Attempted to add threading to libprotoident, so that multiple protocol rules could be evaluated at the same time. Not totally successful so far - seems to run a lot slower than the single-threaded version.

Read over a draft version of Joel's 520. Not looking too bad so far, but was still able to give him plenty of feedback to work on.

Libtrace paper was rejected by Computer Communications because "it does not directly contribute to the scientific literature in the field". They suggested submitting to CCR instead, so now trying to condense the paper down to fit into the CCR format.




Submitted the libtrace paper on Monday.

Finished my little study of the protocol mixes in the latest ISP data. Definitely looks like P2P usage has dropped compared with earlier this year. Some of the graphs can be found at

Released a new version of libprotoident and libflowmanager.

Got back to working on libwandbgp. Finally managed to get it working without segfaulting, but still having problems with keeping the route table "up to date".




Libprotoident 2.0.3 has been released today.

This release adds support for 13 new protocols (including RADIUS, Akamai and Youku) and 3 new categories (Logging, Printing and Translation). It also improves the rules for some existing protocols and fixes a few bugs.

The included tools have all been updated to support analysis of IPv6 traffic and also provide more options for determining the direction of analysed packets.

The full list of changes is described in the libprotoident ChangeLog.

Download libprotoident 2.0.3 here!




Made a few minor changes to the libtrace paper based on Richard's feedback. All ready to submit now.

Continued playing around with the new ISP trace set - hopefully will be able to put together some slightly more comprehensive results soon re: P2P usage in Jan vs Sep.

Also looked at traffic in the new traces that libprotoident could not identify. Managed to add quite a few new protocols to libprotoident - I now have rules for over 200 protocols!

On leave on Thursday and Friday.




Finally finished all my libtrace performance tests and now have a completed draft of the paper done. If anyone feels like reading over it and offering feedback, let me know.

Started looking at some traces I took last week at our ISP capture point. The initial results suggest that the proportion of P2P traffic has dropped compared with earlier in the year, possibly due to the introduction of the Copyright Amendment Act. These results are the subject of a blog post on the WAND website. Now starting to investigate this further.

Spent some time profiling the pcapint: and int: live capture formats for libtrace. Found a couple of bugs that I was able to fix as well as some inefficiencies in the way that we apply BPF filters.