Libprotoident is a library that performs application layer protocol identification for flows. Unlike many techniques that require capturing the entire packet payload, libprotoident uses only the first four bytes of payload sent in each direction, the size of the first payload-bearing packet in each direction and the TCP or UDP port numbers for the flow. Libprotoident features a very simple API that is easy to use, enabling developers to quickly write code that makes use of the protocol identification rules present in the library without needing to know anything about the applications they are trying to identify.
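To make those inputs concrete, here is a minimal Python sketch of what a rule over them could look like. Libprotoident itself is a C/C++ library and none of the names below (`FlowSummary`, `looks_like_http`) come from its API; they are hypothetical, and the HTTP check is only an illustration of matching on the first four bytes, not one of the library's actual rules.

```python
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """The per-flow inputs described above (hypothetical structure)."""
    payload_out: bytes   # first four payload bytes, client -> server
    payload_in: bytes    # first four payload bytes, server -> client
    size_out: int        # size of first payload-bearing packet, client -> server
    size_in: int         # size of first payload-bearing packet, server -> client
    server_port: int
    client_port: int
    transport: str       # "tcp" or "udp"


def looks_like_http(flow: FlowSummary) -> bool:
    """Illustrative rule: plain HTTP starts with a method name on the
    request side and "HTTP" on the response side."""
    if flow.transport != "tcp":
        return False
    request_ok = flow.payload_out[:4] in (b"GET ", b"POST", b"HEAD")
    # An empty response is tolerated: the server may not have replied yet.
    response_ok = flow.payload_in[:4] in (b"HTTP", b"")
    return request_ok and response_ok


flow = FlowSummary(b"GET ", b"HTTP", 120, 300, 80, 51234, "tcp")
print(looks_like_http(flow))
```

Note that the rule never looks past the fourth byte, which is why full payload capture is not needed.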
Finished up the implementation chapter of the libtrace paper. Added a couple of diagrams to augment some of the textual explanations. Got Richard S. to read over what I've got so far and made a few tweaks based on his feedback.
Spent a decent chunk of time looking at Unknown UDP port 80 traffic in libprotoident. Found a clear pattern that was contributing most of the traffic, which I traced back to Tencent. Unfortunately Tencent publishes a lot of applications so that knowledge wasn't conclusive on its own.
My initial suspicion was that it might have been game traffic so I downloaded and played a few popular multiplayer games via the Tencent games client, capturing the network traffic and comparing it against my current unknown traffic. No luck, but then I had the bright idea to look a bit more closely at video call traffic in WeChat (a messaging app). Sure enough, once I was able to successfully create two WeChat accounts and get a video call going between them, I started seeing the traffic I wanted.
Also added rules for Acer Cloud and OpenTracker over UDP.
Started writing some content for the parallel libtrace paper. Managed to churn out an introduction, a background and a little bit of the implementation section.
Fixed a couple of bugs in netevmon prior to the deployment: crashing when trying to reconnect to a restarted NNTSC and some confusing event descriptions for changepoint events.
Finished setting up a mobile app test environment for JP. I've configured my old iPhone to act as an extra client for 2-way communication apps (messaging etc.). So far the environment has already been helpful, as we've managed to identify one of the major outstanding patterns as being used by the Taobao mobile shopping app.
Finished up the demo for the STRATUS forum and helped Harris put together both a video and a live website.
Spent a bit of time trying to fix some unintuitive traceroute events that we were seeing on lamp. The problem was arising when a normally unresponsive hop was responding to traceroute, which was inserting an extra AS transition into our "path".
Rebuilt DPDK and Ostinato on 10g-dev2 after Richard upgraded it to Jessie so that I can resume my parallel libtrace development and testing once he's done with his experiments.
Installed and tested a variety of Android emulators to try and set up an environment where JP and I can more easily capture mobile app traffic. Bluestacks on my iMac turned out to be the most useful, as the others I tried either lacked the Google Play Store (so finding and installing the "official" apps would be hard) or needed more computing power than I had available.
Played around with getting netevmon to produce some useful events from the Ceilometer data and updated amp-web to be able to show those events on the dashboard. Some of our existing detection algorithms (Plateau, BinSegChangepoint, Changepoint) worked surprisingly well so we should have something useful to demo at the STRATUS forum on Friday.
Helped Brendon get netevmon up and running on lamp. Unfortunately there were a few problems, mostly due to permissions and R being terrible, but we managed to get things running eventually. Spent a bit of time fixing some redundant event groups that we observed in the lamp data, which were a side-effect of the fact that a group of traceroute events can be combined with both latency increase and decrease events. We also worked together to track down some bad IP traceroute paths that were being inserted into the database -- new amplets were not including a 'None' entry for non-responsive hops, which NNTSC was expecting, so an 11 hop path with 6 non-responsive hops was being recorded as a 5 hop contiguous path. Updated NNTSC to recognise a missing address field as a non-responsive hop.
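The traceroute path problem is easy to show with a small Python sketch. The hop format here (a list of dicts with an optional "address" key) and the function names are assumptions for illustration only; this is not the actual amplet message format or NNTSC code.

```python
def contiguous_path_buggy(hops):
    """Reproduces the bad behaviour: hops without an address vanish,
    so the stored path is shorter than the measured one."""
    return [hop["address"] for hop in hops if "address" in hop]


def normalise_path(hops):
    """Treats a missing address field as a non-responsive hop (None),
    so the path keeps its real length."""
    return [hop.get("address") for hop in hops]


# An 11 hop path where 6 hops did not respond (addresses are made up).
responsive = {0, 1, 4, 7, 10}
hops = [
    {"ttl": i + 1, "address": f"192.0.2.{i + 1}"} if i in responsive
    else {"ttl": i + 1}
    for i in range(11)
]

print(len(contiguous_path_buggy(hops)), len(normalise_path(hops)))
```

Keeping the `None` entries matters because the position of each hop in the path is part of the data: without them, two hops that are several TTLs apart look adjacent.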
Gave JP a crash course in libprotoident development so he can get started on his summer project.
Spent the early part of my week reading over Dan's and Darren's revised Honours reports and offering a final batch of suggestions.
Continued poking at libprotoident and the unknown traffic on various Web ports. Finally managed to get Blade and Soul (a Chinese MMO) installed and running and was able to confirm that it was responsible for some of my unknown flows.
Started turning my attention towards our STRATUS research this week. Initially, we are going to look at general metrics that we can extract from cloud infrastructure and see if any of our existing event detection techniques are useful for finding anomalous behaviour. For a start, we are using data collected by the Ceilometer module on the Waikato OpenStack instance. Spent some time bringing Harris up to speed on NNTSC and netevmon so that he can experiment with the data within our system. In the meantime, I'm going to take a closer look at the data that we've collected to see which series will be most suitable to focus on in the short term.
Gave more details about our STRATUS work / goals to the designers who will be producing a poster about our research for the upcoming STRATUS forum.
Also played with a service called ThisData which claimed to offer something similar to what we have envisioned from STRATUS. ThisData is certainly pretty, but doesn't really seem to offer much more than daily revision control for your cloud data.
Spent a fair chunk of my week proof-reading, first a document responding to questions about the BTM project, then Dan and Darren's Honours reports.
Tracked down and fixed a bug in parallel libtrace where ticks were messing with the ordered combiner, causing some packets to be sent to the reporter out of order. Also managed to replicate and fix the memory leak bug that was causing Yindong's wdcap on wraith to invoke the OOM killer.
Continued poking at unknown port 443 and port 80 traffic in libprotoident. Most of my time was spent trying to install and capture traffic from various Chinese applications that I had reason to suspect were causing most of my remaining unknown traffic, with mixed success.
Finally released the libtrace4 beta on Tuesday, after doing some final testing with the DAG cards in the 10G dev machines.
Managed to find a few more protocols to add to libprotoident, but am now trying to move towards releasing a new version. Started having a closer look at TCP port 80 and TCP port 443 traffic in my Waikato traces, with the aim of trying to get as much traffic correctly classified as I can prior to doing an in-depth analysis of what is actually using those ports.
Spent Friday afternoon reading over Darren's honours report and providing some hopefully useful feedback.
Fixed the issues with BSD interfaces in parallel libtrace. Ended up implementing a "bucket" data structure for keeping track of buffers that contain packets read from a file descriptor. Each bucket effectively maintains a reference counter that is used to determine when libtrace has finished with all the packets stored in a buffer. When the buffer is no longer needed, it can be freed. This allows us to ensure packets are not freed or overwritten without needing to memcpy the packet out of the buffer it was read into.
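The bucket idea can be sketched in a few lines of Python. This is only a model of the reference counting described above, not the libtrace implementation (which is in C); the class name, the `seal` step and the `on_free` callback are all my own assumptions for the sake of the example.

```python
class Bucket:
    """A read buffer plus a count of packets still referring to it."""

    def __init__(self, buffer, on_free):
        self.buffer = buffer
        self.refs = 0
        self.sealed = False      # True once no more packets will be added
        self.freed = False
        self._on_free = on_free  # hypothetical hook that releases the buffer

    def add_packet(self, offset, length):
        """Hand out a zero-copy view of one packet inside the buffer."""
        self.refs += 1
        return memoryview(self.buffer)[offset:offset + length]

    def seal(self):
        """The reader has moved on to a new buffer."""
        self.sealed = True
        self._maybe_free()

    def release_packet(self):
        """Called when the user has finished with one packet."""
        if self.refs <= 0:
            raise RuntimeError("release without a matching add")
        self.refs -= 1
        self._maybe_free()

    def _maybe_free(self):
        # Only free when the buffer is full/retired AND nothing points into it.
        if self.sealed and self.refs == 0 and not self.freed:
            self.freed = True
            self._on_free(self.buffer)


freed = []
bucket = Bucket(bytearray(b"aaaabbbb"), freed.append)
first = bucket.add_packet(0, 4)
second = bucket.add_packet(4, 4)
bucket.seal()
bucket.release_packet()   # one packet still in use, buffer must survive
bucket.release_packet()   # last reference gone, buffer is freed
print(len(freed))
```

The packets are views into the buffer rather than copies, which is the whole point: the only per-packet cost is bumping a counter.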
Added bucket functionality to both RT and BSD interfaces. After a few initial hiccups, it seems to be working well now.
Continued testing libtrace with various operating systems / configurations. Replaced our old DAG configuration code, which used a deprecated API call, with code that uses the CSAPI. Just need to get some traffic on our DAG development box so I can make sure the multiple-stream code works as expected.
Managed to add another two protocols to libprotoident: Google Hangouts and Warthunder.
Finished the parallel libtrace HOWTO guide. Pretty happy with it and hopefully it should ease the learning curve for users who want to move over to the parallel API once released.
Continued working towards the beta release of libtrace4. Started testing on my usual variety of operating systems, fixing any bugs or warnings that cropped up along the way. It looks like there are definitely some issues with using the parallel API with BSD interfaces, so that will need to be resolved before I can do the release.
Now that I've got a full week of Waikato trace, I've been occasionally looking at the output from running lpi_protoident against the whole week and seeing if there are any missing protocols I can identify and add to libprotoident. Managed to add another 6 new protocols this week, including Diablo 3 and Hearthstone.
Met with Rob and Stephen from Endace on Thursday morning and had a good discussion about how we are using the Endace probe and what we can do to get more out of it.
Continued working on wdcap4. The overall structure is in place and I'm now adding and testing features one at a time. So far, I've got snapping, direction tagging, VLAN stripping and BPF filtering all working. Checksum validation is working for IP and TCP; just need to test it for other protocols.
Still adding and updating protocols in libprotoident. The biggest win this week was being able to identify Shuijing (Crystal): a protocol for operating a CDN using P2P.
Helped Brendon roll out the latest develop code for ampsave, NNTSC, ampy and amp-web to skeptic. This brings skeptic in line with what is running on prophet and will allow us to upgrade the public amplets without their measurement data being rejected.
Noticed a bug in my Plateau parameter evaluation which meant that Time Series Variability changes were being included in the set of Plateau events. Removing those meant that my results were a lot saner. The best set of parameters now gives an 83% precision rating and the average delay is now below 5 minutes. Started on a similar analysis for the next detector -- the Changepoint detector.
Continued updating libprotoident. I've managed to capture a few days of traffic from the University now, so that is introducing some new patterns that weren't present in my previous dataset. Added new rules for MongoDB, DOTA2, Line and BMDP.
Still having problems with long duration captures being interrupted, either by the DAG dropping packets or by the RT protocol FIFO filling up. This prompted me to start working on WDCap4: the parallel libtrace edition. It's a complete re-write from scratch so I am taking the time to carefully consider every feature that currently exists in WDCap and deciding whether we actually need it or whether we can do it better.
Made a video demonstrating BSOD with the current University capture point. The final cut can be seen at https://www.youtube.com/watch?v=kJlDY0XvbA4
Alistair King got in touch and requested that libwandio be separated from libtrace so that he can release projects that use libwandio without having libtrace as a dependency as well. With his help, this was pretty straightforward so now libwandio has a separate download page on the WAND website.
Continued my investigation into optimal Plateau detector parameters. Used my web-app to classify ~230 new events in a morning (fewer than 5 of which qualified as significant) and merged those results back into my original ground truth. Re-ran the analysis comparing the results for each parameter configuration against the updated ground truth. I've now got an "optimal" set of parameters, although the optimal parameters still only achieve 55% precision and 60% recall.
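For reference, precision and recall against a ground truth can be computed as in the Python sketch below. The event identifiers and the function name are made up for illustration; this is not the analysis script itself, just the standard definitions applied to sets of events.

```python
def precision_recall(detected, significant):
    """detected: events reported by the detector for one configuration.
    significant: events marked as significant in the ground truth."""
    true_positives = len(detected & significant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(significant) if significant else 0.0
    return precision, recall


# Made-up event identifiers: (stream, timestamp).
detected = {("s1", 100), ("s1", 460), ("s2", 90), ("s3", 300)}
significant = {("s1", 460), ("s2", 90), ("s4", 720)}

precision, recall = precision_recall(detected, significant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision falls when the detector reports events that nobody considers significant; recall falls when significant events are missed. Tuning a detector usually trades one against the other.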
Poked around at some more unknown flows while waiting for the Plateau analysis to run. Managed to identify some new BitTorrent and eMule clients and also added two new protocols: BDMP and Trion games.
Continued digging into the unknown traffic in the day-long Waikato trace I captured last week. Diminishing returns are starting to really kick in now, but I've still managed to add another 9 new protocols (including SPDY) and improved the rules for a further 8.
Worked on a series of scripts to process the results of running the Plateau detector using a variety of different possible configurations (e.g. history and trigger buffer sizes, sensitivity thresholds etc). The aim is to find the optimal set of parameters based on the ground truth we already have. Of course, some parameter combinations are going to produce events that we have never seen before so I've also had to write code to find these events and generate suitable graphs so I can use my web-app to quickly manually classify them appropriately.
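In outline, the two pieces described above (enumerating configurations and finding events that have no classification yet) could look like this Python sketch. The parameter names and values are placeholders rather than the real Plateau settings, and both functions are hypothetical stand-ins for the actual scripts.

```python
from itertools import product


def parameter_grid(options):
    """Yield one dict per combination of the candidate parameter values."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def unclassified_events(events, ground_truth):
    """Events produced by a configuration that the ground truth has
    never seen, i.e. the ones that still need manual classification."""
    return [event for event in events if event not in ground_truth]


# Placeholder values only.
options = {
    "history": [20, 40],
    "trigger": [5, 10],
    "sensitivity": [2.0, 3.0, 4.0],
}
configs = list(parameter_grid(options))
print(len(configs), configs[0])

ground_truth = {("s1", 100): "significant"}
print(unclassified_events([("s1", 100), ("s2", 200)], ground_truth))
```

The number of configurations grows multiplicatively with each extra parameter value, which is why the new, unseen events pile up quickly and a fast way to classify them by hand is needed.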
Spent a fair bit of time helping Yindong with his experiments.
Continued working on adding new rules to libprotoident based on unknown flows seen with the new Waikato capture. Since getting access to fresh traffic, I've added 12 new protocols and improved the rules for another 13 existing ones.
Some of the more notable protocols that I've added are QUIC, SPDY, WeChat, Git and Speedtest. Also added a rule for the AMP throughput test, as this is one of the biggest contributors of "Unknown" traffic.
Captured a full weekday of traffic to use as a basis for working out how regularly we can take permanent captures and what sort of duration we can reasonably expect to capture for. A single day is around 116 GB (snapped and compressed). To put this in context, ~100 days of similar capture from 2007 was 491 GB -- a little over 4 days' worth of traffic now.
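The comparison works out as follows, using only the figures quoted above (and treating "~100 days" as exactly 100, so the results are approximate):

```python
GB_PER_DAY_NOW = 116   # one weekday, snapped and compressed
GB_TOTAL_2007 = 491    # the whole 2007 capture
DAYS_2007 = 100        # "~100 days", so an approximation

# How many of today's days the entire 2007 capture corresponds to.
days_equivalent = GB_TOTAL_2007 / GB_PER_DAY_NOW

# Average daily volume in 2007 and the growth since then.
gb_per_day_2007 = GB_TOTAL_2007 / DAYS_2007
growth_factor = GB_PER_DAY_NOW / gb_per_day_2007

print(f"{days_equivalent:.2f} days, {growth_factor:.1f}x growth")
```

So the 2007 capture is about 4.2 of today's days, and the daily volume has grown by a factor of roughly 24.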
More work on the dashboard this week:
* added the ability to remove "common" events from the recent event list and made the graphs collapsible.
* added a table that shows the most frequently occurring events in the past day, e.g. "increased latency from A to B (ipv4)".
* polished up some of the styling on the dashboard and moved the dashboard-specific CSS (of which there is now quite a lot) into its own separate file.
Started thinking about how to include loss-related events in the event groups, as these are ignored at the moment.
The new capture point came online on Wednesday, so the rest of my week was spent playing with the packet captures. This involved:
* learning to operate EndaceVision.
* installing wdcap on the vDAG VM.
* adding the ability to anonymise only the local network in wdcap.
* performing a short test capture.
* getting BSOD working again, which required the application of a little "in-flow" packet sampling to run smoothly.
* running libprotoident against the test capture to see what new rules I can add.
Brad managed to track down a newer video card for quarterpounder, so now BSOD is up and running again.
Added Meena's lpicollector to our github so now I can finally deprecate the lpi_live tool that comes with libprotoident. Spent a bit of time updating some documentation and reworking the example client scripts so that everything is a bit easier to use. Also fixed a couple of memory bugs that I may have introduced last time I worked on the collector.
Continued working with the new event groups. Found a problem where I was incorrectly preferring shorter AS path segments over longer ones when determining whether I could remove a group for being redundant. Having fixed that, many event groups now cover several ASNs so I've redesigned the event list on the dashboard to be better at displaying multiple AS names.
The source code for both BSOD and Meenakshee Mungro's reliable libprotoident collector has been added to the WAND github page. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
This is the first time we have released the libprotoident collector under the GPLv3 license. This project is a replacement for the lpi_live tool included with libprotoident, which should now be considered deprecated.
We're also more than happy to consider pull requests for code that adds useful features to either project.
WAND on GitHub
Finished updating NNTSC to deal with traceroute data. The new QueryBuilder code should make query construction a bit less convoluted within the NNTSC dbselect module. Everything seems to work OK in basic testing, so it's now just a matter of migrating over one of our production setups and seeing what breaks.
Continued working through the events on amp.wand.net.nz, looking at events for streams that fall in the 25-100ms and the 300+ms ranges. Results still look very promising overall. Tried to fix another common source of insignificant events (namely a single very large spike that moves our mean so much that subsequent "normal" measurements are treated as slightly abnormal due to their distance from the new mean) but without any tangible success.
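The spike problem is easy to reproduce with made-up numbers. The Python sketch below only illustrates the effect; the median is included purely for contrast and is not what netevmon does, nor the fix I attempted.

```python
import statistics

baseline = [30, 31, 29, 30, 32]          # made-up "normal" latencies in ms
with_spike = baseline + [5000, 30, 31]   # one very large spike, then normal again


def deviation(value, centre):
    """Distance of a measurement from the chosen centre of the series."""
    return abs(value - centre)


mean_before = statistics.mean(baseline)
mean_after = statistics.mean(with_spike)
median_after = statistics.median(with_spike)

# A perfectly ordinary 30 ms measurement, judged against each centre.
print(deviation(30, mean_before))
print(deviation(30, mean_after))
print(deviation(30, median_after))
```

After the spike, the mean sits hundreds of milliseconds away from every normal measurement, so those measurements start to look abnormal even though nothing about them has changed.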
Moved libtrace and libprotoident from svn to git and put the repositories up on github. This should make the projects more accessible, particularly to the increasing number of people who want to add support for various formats and protocols. It should also make life easier for me when it comes to pushing out bug fixes to people having specific problems and merging in code contributed by our users.
The source code for both our libtrace and libprotoident libraries is now available on GitHub. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
We're also more than happy to consider pull requests for code that adds useful features or support for new protocols / trace formats to our libraries.
Look out for more of our open-source projects to make their way onto GitHub soon!
Started going through all the NNTSC exporting code and replacing any instances of blocking sends with non-blocking alternatives. This should ultimately make both NNTSC and netevmon more stable when processing large amounts of historical data. It is also proving a good opportunity to tidy up some of this code, which had gotten a little ropey with all the hacking done on it leading up to NZNOG.
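The general pattern is sketched below in Python using only the standard `socket` module. It is not the NNTSC exporter code: the class name and the idea of keeping unsent bytes in a `pending` buffer are my own illustration of how a non-blocking send typically replaces a blocking one.

```python
import socket


class NonBlockingSender:
    """Queues outgoing data and sends as much as the socket will take
    without ever blocking the caller."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()

    def send(self, data):
        """Queue data and try to send it; returns the bytes still queued."""
        self.pending += data
        return self.flush()

    def flush(self):
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                # Socket buffer is full: give up for now and retry later,
                # e.g. when select/poll reports the socket as writable.
                break
            del self.pending[:sent]
        return len(self.pending)


left, right = socket.socketpair()
sender = NonBlockingSender(left)
remaining = sender.send(b"hello")
print(remaining, right.recv(5))
```

With a blocking send, one slow client can stall the whole exporter once its socket buffer fills up. Here the caller gets control back immediately and can carry on serving everyone else, at the cost of having to track the unsent data.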
Spent a decent chunk of my week catching up on various support requests. Had two separate people email about issues with BSOD on Friday.
Wrote a draft version of this year's libtrace assignment for 513. I've changed it quite a bit from last year's, based on what the students managed to achieve last year. The assignment itself should require a bit more work this time around, but should be easily doable in just C rather than requiring the additional learning curve of the STL. It should also be much harder to just rip off the examples :)
Read through the full report on a study into traffic classifier accuracy that evaluated libprotoident along with a bunch of other classifiers ( http://vbn.aau.dk/files/179043085/TBU_Extended_dpi_report.pdf ). Pleased to see that libprotoident did extremely well in the cases where it would be expected to do well, i.e. non-web applications.