
Shane Alcock's Blog

29 Jan 2013

Spent a day messing around with the event detection software, mainly seeing how Brendon's detectors work with the existing AMP data. The new "is it constant" calculation seems to be working reasonably well, but there are still a lot of issues with some of the detectors. Need to spend a bit of uninterrupted time with it to really see how it all works.

Had a quick look at the latest ISP traces with libprotoident to see if there are any obvious missing protocols I can add to the library. Added one new protocol (Minecraft) and tweaked a few existing protocols.

Spent the rest of the week at NZNOG, catching up on the state of the Internets. Most of the talks were pretty interesting and it was good to meet up with a few familiar faces.

21 Jan 2013

Decided to replace the PACE comparison in my L7 Filter paper with Tstat, a somewhat well-known open-source program that does traffic classification (along with a whole lot of other statistic collection). Tstat's results were disappointing - I was hoping they would be a lot better so that the ineptitude of L7 Filter would be more obvious, but I guess this does make libprotoident look even better.

Fixed a major bug in the lpicollector that was causing us to insert duplicate entries in our IP and User maps. Memory usage is way down now and our active IP counts are much more in line with expectations. Also added a special PUSH message to the protocol so that any clients will know when the collector is done sending messages for the current reporting period.

Spent a fair chunk of time refining Nathan to a) just work as intended, b) be more efficient and c) be more user-friendly / deployable. I've got it reading data properly from LPI, RRDs and AMP and exporting data in an appropriate format for our event detection code to be able to read.

Started toying with using the event detection code on our various inputs. Have run into some problems with the math used to determine whether a time series is relatively constant or not - this is used to determine which of our detectors should be run against the data.

Got the bad news that the libprotoident paper was rejected by TMA over the weekend. A bit disappointed with the reviews - felt like they were too busy trying to find flaws with the 4-byte approach rather than recognising the results I presented that showed it to be more accurate, faster and less memory-intensive than existing OSS DPI classifiers. Regardless, it is back to the drawing board on this one - looks like it might be the libtrace paper all over again.

14 Jan 2013

Spent most of my week working with Meenakshee's LPI collector. The first step was to move it out of libprotoident and into its own project, complete with its own Trac - this means that future libprotoident releases are not dependent on the collector being in a usable state. Added support to the collector to track the number of local IP addresses "actively" using a given protocol. This is in addition to the current counter, which simply counts the local IP addresses involved in flows using a given protocol. For example, an IP receiving BitTorrent UDP traffic but not responding would not count as actively using the protocol (the new counter), but would count as having been involved in a flow for that protocol (the old counter).

After meeting with Lightwire, it was suggested that an LPI collector that could give a protocol breakdown per customer would be very useful, so I added support for this to the collector. The collection process itself seems to cope with the increased workload, but exporting this data over the network to the Nathan database client doesn't work so well.

Added some basic transaction support to Nathan's code, so that all of the insertions from the same LPI report are now inserted using a single transaction. Ideally, though, we need to be able to create transactions that cover multiple LPI reports - perhaps by extending the LPI protocol to be able to send some sort of "PUSH" message to the client to indicate that a batch of reports is complete.

Went over the collector with callgrind to find bottlenecks and suboptimal code. Found a number of optimisations that I could make in the collector, such as caching the name strings and lengths for the supported protocols rather than asking libprotoident for them each time we want to use them. I also had a frustrating battle with converting my byteswap64 function into a macro - got there in the end thankfully.

Finished up the draft of my L7 Filter paper.

07 Jan 2013

Just a lonely two-day week while everyone else was still on holiday.

Released a new version of libtrace (3.0.16) - now Richard's ring buffer code is out amongst the wide world and hopefully our users won't find too many bugs in it.

Got back into writing my paper on L7 Filter. Most of the content is there now, although I'm not entirely convinced that the way I have structured the paper is quite right. It's much more readable the way I have it now, but it looks more like a bulleted list than a technical paper.

Meenakshee's LPI collector worked pretty well running on some trace files over the break, which was pleasing. Next step is to get it working on our newly functional ISP capture point. Tested the capture point out by running some captures over the weekend - aside from a bug in the direction tagging, everything looks good, so we have at least one working capture point.

03 Jan 2013

Libtrace 3.0.16 has been released.

This release includes the new ring: format which is a much more efficient version of the existing int: format. More details on how ring: works and how much better it is than int: can be found here.

People currently using int: are encouraged to give ring: a try - at best, there should be no obvious difference between the two aside from your program using a lot less CPU time. If there are problems, bugs or strange behaviour, please let us know (email contact at wand.net.nz) so we can fix it in the next release.

This release also fixes two bugs: the problems that occurred when capturing packets using 'pcapint:any' as input and writing them to disk in a different (i.e. non-pcap) format, and the double free that would occur when calling trace_destroy() after using trace_event() to read packets from a trace file.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

17 Dec 2012

Started writing a paper on my L7 Filter results - managed to get through an introduction and background before running out of steam.

Developed a module for Nathan's data collector that connects to Meena's LPI collector, receives data records, parses them and writes appropriate entries into a PostgreSQL database. Ran into a bit of a design flaw in Nathan's collector - streams (i.e. the identifying characteristics for a measurement) have to be pre-defined before starting the collector. This doesn't work too well with LPI, where there are 250 protocols x 10 metrics x however many monitors one is running. Even worse, the number of protocols will grow with new LPI releases and we don't want to have to stop the collector to add code describing the resulting new streams.

Managed to hack my way around Nathan's code enough to add support for adding new streams whenever a new protocol / metric / monitor combination is observed by my module. Seems to work fairly well (at the second attempt - the first one ran into horrible concurrency problems due to a shared database connection).

Tried deploying the LPI collector at our ISP box, only to find that they've been playing with their core network a lot recently and now we don't see any useful traffic :(

10 Dec 2012

Libtrace:
Managed to get native BPF socket capture exporting correctly over the RT protocol. Changed the build system to make it possible to export captures taken using a native socket interface over RT to a machine running a different OS to the capture host, e.g. capture using Linux Native, export to a FreeBSD box.

WDCap:
WDCap now builds and runs on both Mac OS X and FreeBSD. Also changed the way the disk output module names files, based on some code submitted by Alistair King. You now specify your output filename format using strftime-style conversion modifiers, which offers a bit more flexibility to users rather than them being stuck with our particular file naming convention.

lpi_collector:
Continued working closely with Meenakshee on the new collector. Designed a binary format for exporting our collector messages called the libprotoident collector protocol (or LPICP for short).

L7 Filter:
Finished collecting traces for most of the protocols I wanted to test with L7 Filter and collated the initial results. Wrote a blog post about it (https://secure.wand.net.nz/content/case-against-l7-filter) and started working on a paper.

07 Dec 2012

L7 Filter is used as a source of ground truth in the traffic classification field because it has been around for a long time and is widely known. However, my experiences with L7 Filter had raised a few questions in my mind with regard to its accuracy. After looking online, I did not find any evidence that L7 Filter is actually an accurate or reliable traffic classifier. In this blog post, I present some preliminary results from my own investigation into the correctness (or lack thereof) of L7 Filter's classifications using packet traces containing traffic for only a single known application.

03 Dec 2012

Back into the swing of things this week. Continued collecting traces of various popular Internet applications to use for validating L7 Filter. So far, L7 Filter is very disappointing - it cannot even correctly classify some basic HTTP flows and often misclassifies SSL traffic as Skype.

Worked with Meenakshee to develop a proper LPI collector that we can run on passive monitors and write live application stats to a database (ideally using Nathan's code). The new collector will use libwandevent and export its results over the network rather than via stdout. To help with this, I extracted the counter / statistic management code from the old lpi_live tool and tidied it up for more general purpose use. Updated lpi_live to use the extracted code.

Spent my spare moments looking over Richard's new ring buffer code for Linux native interfaces in libtrace. In particular, my aim has been to test it in situations outside of the standard libtrace paradigm, e.g. using trace_event(), trace_copy_packet() and exporting over the RT protocol.

Alistair from CAIDA has updated libtrace and WDCap for capturing using the BSD native interface (something we never did, so the code was missing or half-assed). I've started integrating his changes back into both code-bases and will also look at the problem of decoding RT packets that were captured using a native interface that is not supported by the recipient machine, e.g. BPF packets exported to a Linux host.

26 Nov 2012

The week before I left for IMC:
* Finished my draft of the libprotoident paper for TMA. Because of the broken Auckland box, I wasn't able to re-run my analysis using the more up-to-date classification software. Instead, I've just submitted a draft based on the old results, with an eye to possibly updating them should we get accepted.
* Released a new version of libprotoident including all the new protocol rules that I'd added over the past couple of weeks.
* Started working on a little project to measure exactly how hopeless L7 Filter is for traffic classification. So many papers and tools use L7 Filter as either the basis for their rules or as ground truth for validation, which I think is a very bad idea. Hoping to get a paper out of it all. The initial phase of my evaluation involves capturing traffic from a number of common Internet applications and testing whether L7 Filter can correctly identify them. So far, it has managed to get 1/3 right :)

Spent the week before last in Boston for IMC. Managed to successfully present my paper on the Copyright Amendment Act and got a fairly good reception. Also got a chance to meet a few folks and put some faces to names. Some of the presentations were interesting, but there was also a lot of stuff that I found to be less useful (social networks lol).