Project Members

Shane Alcock admin

Libprotoident

Libprotoident is a library that performs application layer protocol identification for flows. Unlike many techniques that require capturing the entire packet payload, only the first four bytes of payload sent in each direction, the size of the first payload-bearing packet in each direction and the TCP or UDP port numbers for the flow are used by libprotoident. Libprotoident features a very simple API that is easy to use, enabling developers to quickly write code that can make use of the protocol identification rules present in the library without needing to know anything about the applications they are trying to identify.
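To make the inputs described above concrete, here is a minimal sketch of a per-flow record that keeps only those values. It is not the libprotoident API: `FlowSummary`, `observe` and `looks_like_http` are hypothetical names, and the HTTP rule is a toy example rather than one of the library's rules.

```python
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """Hypothetical per-flow record; not a libprotoident type."""
    transport: str              # "tcp" or "udp"
    server_port: int
    client_port: int
    payload_out: bytes = b""    # first four payload bytes, outbound
    payload_in: bytes = b""     # first four payload bytes, inbound
    size_out: int = 0           # size of first payload-bearing packet, outbound
    size_in: int = 0            # size of first payload-bearing packet, inbound

    def observe(self, direction: str, payload: bytes) -> None:
        """Record only the first payload-bearing packet in each direction."""
        if not payload:
            return  # e.g. a bare TCP ACK carries nothing of interest
        if direction == "out" and self.size_out == 0:
            self.payload_out, self.size_out = payload[:4], len(payload)
        elif direction == "in" and self.size_in == 0:
            self.payload_in, self.size_in = payload[:4], len(payload)


def looks_like_http(flow: FlowSummary) -> bool:
    """Toy rule for illustration only, not one of the library's rules."""
    request_ok = flow.payload_out in (b"GET ", b"POST", b"HEAD")
    response_ok = flow.payload_in in (b"", b"HTTP")
    return flow.transport == "tcp" and request_ok and response_ok
```

Everything after the first payload-bearing packet in a direction is ignored, which is why so little state is needed per flow.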

14 Jan 2013

Spent most of my week working with Meenakshee's LPI collector. The first step was to move it out of libprotoident and into its own project, complete with trac - this means that future libprotoident releases no longer depend on the collector being in a usable state. Added support to the collector to track the number of local IP addresses "actively" using a given protocol. This is in addition to the existing counter that simply looks at the number of local IP addresses involved in flows using a given protocol - an IP receiving BitTorrent UDP traffic but not responding would not count as actively using the protocol (i.e. the new counter), but would count as having been involved in a flow for that protocol (i.e. the old counter).
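The difference between the two counters can be sketched roughly as follows. This is an illustration rather than the collector's code: `LocalIPCounters` is a hypothetical name, and treating "active" as "the local IP sent payload in the flow" is an assumption based on the BitTorrent example above.

```python
from collections import defaultdict


class LocalIPCounters:
    """Hypothetical per-protocol counters of distinct local IP addresses."""

    def __init__(self):
        self.involved = defaultdict(set)  # old counter: IP appeared in a flow
        self.active = defaultdict(set)    # new counter: IP actually sent payload

    def record(self, protocol, local_ip, local_sent_payload):
        """Update both counters for one observed flow."""
        self.involved[protocol].add(local_ip)
        if local_sent_payload:
            # Assumption: "active" means the local host responded with payload
            self.active[protocol].add(local_ip)

    def counts(self, protocol):
        """Return (involved, active) distinct local IP counts."""
        return len(self.involved[protocol]), len(self.active[protocol])
```

A host that only receives unsolicited BitTorrent UDP packets would raise the first number but not the second.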

After meeting with Lightwire, it was suggested that an LPI collector that could give a protocol breakdown per customer would be very useful. As a result, I added support for this to the collector. In terms of the increased workload, the actual collection process seems to cope OK, but exporting this data over the network to Nathan's database client doesn't work so well.

Added some basic transaction support to Nathan's code, so that all of the insertions from the same LPI report are now inserted using a single transaction. Ideally, though, we need to be able to create transactions that cover multiple LPI reports - perhaps by extending the LPI protocol to be able to send some sort of "PUSH" message to the client to indicate that a batch of reports is complete.
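The single-transaction-per-report idea looks roughly like the sketch below. It uses Python's built-in sqlite3 module instead of postgresql so that it is self-contained, and the `lpi_stats` table, its columns and the `insert_report` helper are all hypothetical; Nathan's actual schema and code are not shown here.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical stats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lpi_stats (ts INTEGER, protocol TEXT, metric TEXT, value INTEGER)"
    )
    conn.commit()
    return conn


def insert_report(conn, ts, rows):
    """Insert every row of one report inside a single transaction."""
    # The connection context manager commits if the block succeeds and
    # rolls back if any insert (or row unpacking) raises an exception.
    with conn:
        for protocol, metric, value in rows:
            conn.execute(
                "INSERT INTO lpi_stats VALUES (?, ?, ?, ?)",
                (ts, protocol, metric, value),
            )


def row_count(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM lpi_stats").fetchone()[0]
```

A report is then stored either completely or not at all. Covering several reports with one transaction would mean deferring the commit until some end-of-batch marker arrives, which is the role the "PUSH" message would play.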

Went over the collector with callgrind to find bottlenecks and suboptimal code. Found a number of optimisations that I could make in the collector, such as caching the name strings and lengths for the supported protocols rather than asking libprotoident for them each time we want to use them. I also had a frustrating battle with converting my byteswap64 function into a macro - got there in the end thankfully.
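For reference, the operation a 64-bit byteswap performs can be written out as below. The collector's version is a C macro that is not shown here; this is only a Python sketch of the same shift-and-mask logic, with a standard-library cross-check.

```python
def byteswap64(value):
    """Reverse the byte order of a 64-bit unsigned integer."""
    value &= 0xFFFFFFFFFFFFFFFF  # keep only the low 64 bits
    result = 0
    for i in range(8):
        byte = (value >> (8 * i)) & 0xFF   # extract byte i (0 = least significant)
        result |= byte << (8 * (7 - i))    # place it at the mirrored position
    return result


def byteswap64_reference(value):
    """Same result using the standard library, as a cross-check."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")
```

Each byte at position i moves to position 7 - i, so applying the swap twice returns the original value.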

Finished up the draft of my L7 Filter paper.

07 Jan 2013

Just a lonely two day week while everyone else was still on holiday.

Released a new version of libtrace (3.0.16) - now Richard's ring buffer code is out amongst the wide world and hopefully our users won't find too many bugs in it.

Got back into writing my paper on L7 Filter. Most of the content is there now, although I'm not entirely convinced that the way I have structured the paper is quite right. It's much more readable the way I have it now, but it looks more like a bulleted list than a technical paper.

Meenakshee's LPI collector worked pretty well running on some trace files over the break, which was pleasing. Next step is to get it working on our newly functional ISP capture point. Tested the capture point out by running some captures over the weekend - aside from a bug in the direction tagging, everything looks good, so we have at least one working capture point.

17 Dec 2012

Started writing a paper on my L7 Filter results - managed to get through an introduction and background before running out of steam.

Developed a module for Nathan's data collector that connects to Meena's LPI collector, receives data records, parses them and writes appropriate entries into a postgresql database. Ran into a bit of a design flaw in Nathan's collector - streams (i.e. the identifying characteristics for a measurement) have to be pre-defined before starting the collector. This doesn't work too well with LPI, where there are 250 protocols x 10 metrics x however many monitors one is running. Even worse, the number of protocols will grow with new LPI releases and we don't want to have to stop the collector to add code describing the resulting new streams.

Managed to hack my way around Nathan's code enough to add support for adding new streams whenever a new protocol / metric / monitor combination is observed by my module. Seems to work fairly well (at the second attempt - the first one ran into horrible concurrency problems due to a shared database connection).
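The create-on-first-sight approach can be sketched as a small registry keyed by the monitor / protocol / metric combination. This is not Nathan's collector code: `StreamRegistry` and `get_or_create` are hypothetical names, and the lock stands in for whatever was done to avoid the concurrency problems mentioned above.

```python
import threading


class StreamRegistry:
    """Hypothetical registry that creates stream IDs on demand."""

    def __init__(self):
        self._lock = threading.Lock()
        self._streams = {}   # (monitor, protocol, metric) -> stream id
        self._next_id = 1

    def get_or_create(self, monitor, protocol, metric):
        """Return the ID for this combination, allocating one if it is new."""
        key = (monitor, protocol, metric)
        with self._lock:  # serialise lookups and inserts across threads
            stream_id = self._streams.get(key)
            if stream_id is None:
                stream_id = self._next_id
                self._next_id += 1
                self._streams[key] = stream_id
            return stream_id

    def __len__(self):
        return len(self._streams)
```

With this shape, a new protocol in a later libprotoident release simply shows up as new keys at runtime, without needing to stop the collector to define its streams.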

Tried deploying the LPI collector at our ISP box, only to find that they've been playing with their core network a lot recently and now we don't see any useful traffic :(

10 Dec 2012

Libtrace:
Managed to get native BPF socket capture exporting correctly over the RT protocol. Changed the build system to make it possible to export captures taken using a native socket interface over RT to a machine running a different OS to the capture host, e.g. capture using Linux Native, export to a FreeBSD box.

WDCap:
WDCap now builds and runs on both Mac OS X and FreeBSD. Also changed the way the disk output module names files, based on some code submitted by Alistair King. You now specify your output filename format using strftime-style conversion modifiers, which offers a bit more flexibility to users rather than them being stuck with our particular file naming convention.
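As an illustration of strftime-style naming (not WDCap's own code or configuration syntax), the sketch below expands a format string into a filename. The `output_filename` helper and the example format are hypothetical.

```python
import time


def output_filename(name_format, timestamp):
    """Expand strftime-style conversion modifiers using a UTC timestamp."""
    return time.strftime(name_format, time.gmtime(timestamp))


# Hypothetical naming convention: one file per capture start time
example = output_filename("trace-%Y%m%d-%H%M%S.erf", 0)
# example == "trace-19700101-000000.erf"
```

Because the format is just a string, users can switch to per-day directories or a different timestamp layout without any code changes.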

lpi_collector:
Continued working closely with Meenakshee on the new collector. Designed a binary format for exporting our collector messages called the libprotoident collector protocol (or LPICP for short).

L7 Filter:
Finished collecting traces for most of the protocols I wanted to test with L7 Filter and collated the initial results. Wrote a blog post about it (https://secure.wand.net.nz/content/case-against-l7-filter) and started working on a paper.

07 Dec 2012

L7 Filter is used as a source of ground truth in the traffic classification field because it has been around for a long time and is widely known. However, my experiences with L7 Filter had raised a few questions in my mind with regard to its accuracy. After looking online, I did not find any evidence that L7 Filter is actually an accurate or reliable traffic classifier. In this blog post, I present some preliminary results from my own investigation into the correctness (or lack thereof) of L7 Filter's classifications using packet traces containing traffic for only a single known application.

03 Dec 2012

Back into the swing of things this week. Continued collecting traces of various popular Internet applications to use for validating L7 Filter. So far, L7 Filter is very disappointing - it cannot even correctly classify some basic HTTP flows and often misclassifies SSL traffic as Skype.

Worked with Meenakshee to develop a proper LPI collector that we can run on passive monitors and write live application stats to a database (ideally using Nathan's code). The new collector will use libwandevent and export its results over the network rather than via stdout. To help with this, I extracted the counter / statistic management code from the old lpi_live tool and tidied it up for more general purpose use. Updated lpi_live to use the extracted code.

Spent my spare moments looking over Richard's new ring buffer code for Linux native interfaces in libtrace. In particular, my aim has been to test it in situations outside of the standard libtrace paradigm, e.g. using trace_event(), trace_copy_packet() and exporting over the RT protocol.

Alistair from CAIDA has updated libtrace and wdcap for capturing using the BSD native interface (something we never did, so the code was missing or half-assed). I've started integrating his changes back into both code-bases and will also look at the problem of decoding RT packets that were captured using a native interface that is not supported by the recipient machine, e.g. BPF packets exported to a Linux host.

26 Nov 2012

The week before I left for IMC:
* Finished my draft of the libprotoident paper for TMA. Because of the broken Auckland box, I wasn't able to re-run my analysis using the more up-to-date classification software. Instead, I've just submitted a draft based on the old results, with an eye to possibly updating them should we get accepted.
* Released a new version of libprotoident including all the new protocol rules that I'd added over the past couple of weeks.
* Started working on a little project to measure exactly how hopeless L7 Filter is for traffic classification. So many papers and tools use L7 Filter as either the basis for their rules or as ground truth for validation, which I think is a very bad idea. Hoping to get a paper out of it all. The initial phase of my evaluation involves capturing traffic from a number of common Internet applications and testing whether L7 Filter can correctly identify them. So far, it has managed to get 1/3 right :)

Spent the week before last in Boston for IMC. Managed to successfully present my paper on the Copyright Amendment Act and got a fairly good reception. Also got a chance to meet a few folks and put some faces to names. Some of the presentations were interesting, but there was also a lot of stuff that I found to be less useful (social networks lol).

07 Nov 2012

Libprotoident 2.0.6 has been released today.

This release adds support for 17 new protocols including Spotify, Runescape, Cryptic and Apple Facetime. The rules for a further 7 protocols have been improved.

This release also fixes a couple of bugs - in particular one where lpi_live would report erroneously high packet or byte counts.

We've also deprecated the P2P_Structure category as it was no longer serving the intended purpose due to the rise in BitTorrent file transfers over UDP that are indistinguishable from DHT traffic. All protocols that used to be P2P_Structure are now placed in the P2P category.

The full list of changes can be found in the libprotoident ChangeLog.

Download libprotoident 2.0.6 here!

05 Nov 2012

Short week this week.

Managed to add a couple more protocols to libprotoident: SUPL and Cryptic (an MMO game company). Spent a lot of time still trying to hunt down the particular Korean P2P application that I'm seeing a lot of in my data, but no success. Nonetheless, I've written a rule for it and added it to our set of "mystery" protocols.

Started looking over our old libprotoident technical report with an eye to submitting it for publication again. There are a few problems with this approach though: 1) OpenDPI doesn't exist anymore. A fork called nDPI lives on, but I'll need to re-run all the validation/comparison tests using nDPI. 2) nDPI uses all the same function and variable names as PACE, so these all had to be renamed to prevent horrible linking errors when building / running my comparison program, which links against both libraries. 3) The Auckland monitor that has the only copy of the full-payload traces I had used for part of the original validation is no longer responsive.

29 Oct 2012

Finished up my basic analysis of the libprotoident data from last month. Wrote a blog post (that's on the front page of the website) presenting and discussing the latest results. Some pretty interesting trends are becoming apparent - the surge in HTTPS traffic and the movement towards UDP BitTorrent being the two main ones - which are begging for further investigation.

Continued looking at unknown traffic in libprotoident -- spent much of Friday investigating Korean P2P apps to try and resolve a mystery application that has a very obvious payload pattern, but had little success. Did get to watch a few Starcraft championship games though :)

Wrote and presented a practice version of my IMC talk. Got a few refinements to make but mostly I need to streamline the whole thing so I can deliver it in around 10 minutes without sounding like I'm hyped up on amphetamines.

23 Oct 2012

Spent a fair chunk of my week reading over various chapters from Brad and Joe's Honours reports, as well as Meenakshee's interim report.

In between times, continued poking at my recent libprotoident analysis looking at the "unknown" traffic. Managed to add quite a few new protocols to libprotoident as a result, including Runescape, Spotify, Fring, Roblox and FASP. Starting to think about a new release with all the protocols I've added over the past couple of weeks.

Also continued my analysis of the September LPI statistics - getting closer to producing some graphs and a blog post discussing the changes over the past year :)

15 Oct 2012

Short week this week - took leave on Thursday and Friday.

Released a new version of libtrace (3.0.15) on Monday. Mostly just a few little bug and build fixes, but it had been a while since the last release. Also submitted a patch for the FreeBSD libtrace port which had been broken for a very long time.

Did a bit more refinement on my Plunge and ArimaShewhart event detectors. They're at a stage now where the number of false positives is close to none. False negatives are a bit harder to identify, of course. The next sensible step is probably to think about testing against real-time data and manually validate the events as they roll in.
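For readers unfamiliar with the term, a generic Shewhart-style control check flags a value that falls too many standard deviations away from the mean of what has been seen so far. The sketch below shows only that textbook idea; the Plunge and ArimaShewhart detectors themselves are not shown here, and the `warmup` and `k` parameters are hypothetical choices.

```python
from statistics import mean, pstdev


def shewhart_alarms(values, warmup=10, k=3.0):
    """Return indices whose value lies more than k sigma from the prior mean."""
    alarms = []
    for i in range(warmup, len(values)):
        history = values[:i]          # everything observed before point i
        mu = mean(history)
        sigma = pstdev(history)
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            alarms.append(i)
    return alarms
```

Note that in this naive version a flagged value still enters the history and inflates the later mean and deviation, which is one reason false negatives are harder to reason about than false positives.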

Spent a day looking at the latest LPI data from a live analysis I have running on our ISP monitor. Managed to get some up-to-date stats on application usage for last September but haven't had a chance to look over it in detail yet.

I did note a bit of an increase in the amount of unknown UDP traffic, so chased up a few of the more common patterns. Have added 3 new protocols to libprotoident as a result: ZeroAccess (a trojan), VXWorks Exploit and Apple's Facetime / iMessage setup protocol.

16 Aug 2012

At present, accurate traffic classification requires the use of deep
packet inspection to analyse packet payload. This requires significant
CPU and memory resources and is invasive of network user privacy. In this
paper, we propose an alternative traffic classification approach that is
lightweight and only examines the first four bytes of packet payload observed
in each direction. We have implemented our approach as an open-source library
called libprotoident, which we evaluate by comparing its performance against
existing traffic classifiers that use deep packet inspection. Our results show
that our approach offers accuracy comparable to (if not better than) tools that
have access to full packet payload and requires fewer processing resources.

This is simply a technical report, not a published conference or journal paper. We're hoping to publish an improved version of this paper soon, but mainly need to improve the validation process to be more convincing to external reviewers.

Author(s): 
Shane Alcock
Richard Nelson

30 Apr 2012

Managed to master the art of wavelet transforms - the problems I was having were due to mismatching the scale and wavelet values when inverting the transformation. After a lot of debugging, I was able to ensure that I could reliably transform my data and then invert it back to the same original values for any given number of nested transformations. Once that was working, I was able to get sensible results when denoising my time series.
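The round-trip property described above can be demonstrated with a small nested Haar transform. This is a sketch under assumptions: the wavelet family actually used is not stated, so Haar is chosen here for simplicity, and all function names are hypothetical. The key point is that each level's wavelet (detail) coefficients must be paired with the scale coefficients of the same level when inverting.

```python
import math

SQRT2 = math.sqrt(2)


def haar_step(data):
    """One level: split into scale (average) and wavelet (detail) coefficients."""
    scale = [(data[i] + data[i + 1]) / SQRT2 for i in range(0, len(data), 2)]
    detail = [(data[i] - data[i + 1]) / SQRT2 for i in range(0, len(data), 2)]
    return scale, detail


def haar_inverse_step(scale, detail):
    """Undo one level; scale and detail must come from the same level."""
    out = []
    for s, d in zip(scale, detail):
        out.append((s + d) / SQRT2)
        out.append((s - d) / SQRT2)
    return out


def haar_forward(data, levels):
    """Nested transform; len(data) must be divisible by 2 ** levels."""
    scale, details = list(data), []
    for _ in range(levels):
        scale, detail = haar_step(scale)
        details.append(detail)      # finest level first, coarsest last
    return scale, details


def haar_inverse(scale, details):
    """Invert by consuming detail levels coarsest-first."""
    for detail in reversed(details):
        scale = haar_inverse_step(scale, detail)
    return scale
```

Feeding the detail levels back in the wrong order pairs coefficients from different levels, which is the kind of mismatch that prevents a clean inversion.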

Now that I had a denoised time series, I turned back to looking at forecasting techniques. Holt-Winters still wasn't a good fit for the denoised data, so I started learning about ARIMA models. Unfortunately, the test data I have doesn't really fit the basic ARIMA models, which made it difficult to get the right fit. Anyway, I now have a decent understanding of how ARIMA works in general, but need to come up with a way to use the ARIMA model in an on-line, self-updating context.

Released libprotoident 2.0.5 on Friday, mainly as something to do so I could have a break from mathematics for a bit :)

27 Apr 2012

Libprotoident 2.0.5 has been released today.

This release adds support for 19 new protocols, including Omegle, Apple Push Notifications and DCC. It improves the rules that are used for matching a further 17 protocols.

This release also adds a new tool, lpi_arff, which produces protocol usage stats in a format that can be used by the WEKA machine learning software.
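For context, ARFF is a plain-text format with a header of attribute declarations followed by comma-separated data rows. The sketch below writes such a file; it is not lpi_arff's code, and the relation name, attributes and values are hypothetical examples rather than the tool's actual output.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes is a list of (name, kind) pairs where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
        else:
            lines.append("@attribute %s %s" % (name, kind))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical per-flow attributes, chosen only for illustration
arff_text = to_arff(
    "flows",
    [("server_port", "numeric"), ("bytes_out", "numeric"),
     ("protocol", ["HTTP", "DNS"])],
    [(80, 1200, "HTTP"), (53, 64, "DNS")],
)
```

The nominal `protocol` attribute at the end plays the role of the class label that a WEKA classifier would be trained to predict.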

The full list of changes can be found in the libprotoident ChangeLog.

Download libprotoident 2.0.5 here!

11 Apr 2012

Another rather fragmented week. Continued helping out where I could with the funding proposals, particularly finding references and tidying up some of the wording. Now we just have to wait and see if we actually get any of the funding we're asking for.

Taught 513 this week - we covered the recently published libtrace paper. I think I did a reasonable job of selling the students on libtrace. Wrote a possible libtrace programming assignment for the class which will be set if Richard gives it the go-ahead.

Prepared a 1.0.3 release for libtcpcsm. I've sent the release candidate off to a user who has been using the library quite a bit for testing prior to an actual release.

Started preparing for a new libprotoident release as well.

On the time series front, decomposing the time series seems to produce a trend line that can highlight genuine events in the data but there are still some caveats. In particular, none of our existing detectors work that well with the resulting data and it isn't clear that we can do the decomposition reliably when running live.

05 Mar 2012

Released a new version of BSOD client on Tuesday.

Did some planning with Brendon, thinking about how we're going to bring all the components of the MSI project together into something usable.

Played around with a live libprotoident application, getting it to write results into a postgresql database and an RRD. Postgresql required a fair bit of revision of SQL and database theory. The RRD was much easier to get up and running.

Continued improvements to libprotoident - trying to get that accuracy rate up even further!

20 Feb 2012

Spent most of my week working on the draft version of the paper on the effect of the CAA on DSL users. Finished the draft on Friday, having included plenty of (hopefully) interesting results. Anyone interested in reading over the paper should get in touch with me and I will give you a copy.

Patched libtrace to support --with-foo configure options for all the optional dependencies. Apparently this is a bit of an issue with some Linux distros, e.g. Gentoo.

Released a new version of BSOD server on Friday to fix a crash issue that was occurring with recent libprotoident releases.

Spent some time looking at traffic that was being classed as SSL by libprotoident. Turns out that, with a bit of port and payload size analysis, I can sub-classify the SSL as Google talk, Apple push notifications, Facebook chat, PSN store, POP3S and NNTPS.
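The port half of that analysis can be sketched as a simple lookup applied to flows already identified as SSL. This is only an illustration: the mapping below uses just the IANA-registered ports for POP3S (995) and NNTPS (563), `subclassify_ssl` is a hypothetical name, and the payload size checks and the rules for the other applications listed above are not reproduced.

```python
# Hypothetical hint table: server port -> more specific label.
# Only the registered ports for POP3S and NNTPS are included.
SSL_PORT_HINTS = {
    995: "POP3S",
    563: "NNTPS",
}


def subclassify_ssl(server_port):
    """Refine a generic SSL classification using the server port, if known."""
    return SSL_PORT_HINTS.get(server_port, "SSL")
```

Flows on any other port simply keep the generic SSL label.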

07 Feb 2012

Worked on collecting some more numbers measuring the impact of the CAA, with an eye towards writing a paper on the topic. The number of users doing P2P has also dropped dramatically, with rises in the expected categories too (such as tunneling).

Looking at the results more closely, I decided that the HTTP_P2P classification was proving to be incorrect more often than not, so traffic matching that is now treated as web rather than P2P. This change should have only a minor effect on the numbers I had presented at NZNOG.

The libtrace paper was accepted for publication in CCR. This was my fifth attempt to publish that particular paper, so pretty pleased to finally get that one done.

02 Feb 2012

Donald Clark discussed the Copyright Amendment Act study that I presented at NZNOG 2012 on Radio New Zealand: National's Nine to Noon program this morning. He did an excellent job of summarising our results and the conclusions that can be drawn from them.

Anyone who would like to listen to Donald's segment can find it here. The discussion of our work begins around 9:30 but I would recommend listening to the whole segment if you have the time.