Shane Alcock's Blog
Started writing some content for the parallel libtrace paper. Managed to churn out an introduction, a background and a little bit of the implementation section.
Fixed a couple of bugs in netevmon prior to the deployment: crashing when trying to reconnect to a restarted NNTSC and some confusing event descriptions for changepoint events.
Finished setting up a mobile app test environment for JP. I've configured my old iPhone to act as an extra client for 2-way communication apps (messaging etc.). So far the environment has already been helpful, as we've managed to identify one of the major outstanding patterns as being used by the Taobao mobile shopping app.
Finished up the demo for STRATUS forum and helped Harris put together both a video and a live website.
Spent a bit of time trying to fix some unintuitive traceroute events that we were seeing on lamp. The problem was arising when a normally unresponsive hop was responding to traceroute, which was inserting an extra AS transition into our "path".
Rebuilt DPDK and Ostinato on 10g-dev2 after Richard upgraded it to Jessie so that I can resume my parallel libtrace development and testing once he's done with his experiments.
Installed and tested a variety of Android emulators to try and setup an environment where JP and I can more easily capture mobile app traffic. Turned out Bluestacks on my iMac ended up being the most useful, as the others I tried either lacked the Google Play Store (so finding and installing the "official" apps would be hard) or needed more computing power than I had available.
Played around with getting netevmon to produce some useful events from the Ceilometer data and updated amp-web to be able to show those events on the dashboard. Some of our existing detection algorithms (Plateau, BinSegChangepoint, Changepoint) worked surprisingly well so we should have something useful to demo at the STRATUS forum on Friday.
Helped Brendon get netevmon up and running on lamp. There were a few issues unfortunately, mostly due to permission issues and R being terrible, but managed to get things running eventually. Spent a bit of time fixing some redundant event groups that we observed from the lamp data which were a side-effect from the fact that a group of traceroute events can be combined with both latency increase and decrease events. We also worked together to track down some bad IP traceroute paths that were being inserted into the database -- new amplets were not including a 'None' entry for non-responsive hops which NNTSC was expecting so an 11 hop path with 6 non-responsive hops was being recorded as a 5 hop contiguous path. Updated NNTSC to recognise a missing address field as a non-responsive hop.
Gave JP a crash course in libprotoident development so he can get started on his summer project.
Added support for the ceilometer data to ampy and amp-web, so now we can look at graphs of the data. Noticed that our initial labelling of the network streams was a bit unhelpful, but with a couple of extra lookups I was able to match the streams back to a VM and a MAC address. Updated NNTSC to use the new stream identifiers, as well as fix a few other issues with Harris' original implementation (mostly fixing the spelling of "ceilo" in a lot of places!).
Added a new "Choose Time Period" button to the amp-web graphs to make it easier to jump to time periods that were more than a month ago. This is particularly useful for the ceilometer data as the data spans back to May last year and it was a real pain to navigate to with the current web-site. Managed to find and fix a few other small bugs in the amp-web code while I was looking at it.
Also helped Harris get netevmon up and running on his VM.
Tested and fixed my vanilla PF_RING libtrace code. I've been able to get comparable performance with the pfcount tool included with the PF_RING release so I'm fairly happy with that. Started working on adding support for the ZC version of the PF_RING driver, which uses an entirely different API.
Helped Harris get his head around how NNTSC works so that he could add support for the Ceilometer data. Set myself up with an OpenStack VM so that I can start working on the web graphs to display the data now that it is in a NNTSC database. Also spent a bit of time writing up an explanation of how netevmon works so that Harris can start looking into running our detectors against the Ceilometer data.
Worked with Brendon on Friday to get NNTSC and netevmon installed and running on the lamp machine.
Wrote some scripts to do some basic exploration of the Ceilometer data to check which collections and series are most suitable for using with netevmon. I've found that around 30-35% of the series for CPU utilisation, network byte rates and disk read/write rates are at least long enough to be worth using -- this works out to ~600 series for each metric, so we'll have a reasonable sample size. The spacing between measurements is more of a concern, as it is very inconsistent. There are some parts of NNTSC and netevmon that assume a fairly constant measurement rate, so these will need to be re-evaluated.
Started adding PF_RING support to libtrace. For a start, I'm just working with the standard PF_RING driver (not the ZC extension) and I've written code that should work with the old API. Once I've tested that, I'll start adding native parallel support using one thread per receive queue in the driver.
Also spent a bit of time planning a paper on parallel libtrace. I'm anticipating the main narrative to be about how we've achieved better potential performance by adding parallelism (depending on the workload and the number of threads), while still maintaining the key design goals of library (e.g. abstraction of complexity, format agnosticism, etc.). We'll show that the same parallel libtrace code can achieve better performance across multiple input formats, i.e. DAG, ring, PF_RING (once complete).
Spent the early part of my week reading over Dan's and Darren's revised Honours reports and offering a final batch of suggestions.
Continued poking at libprotoident and the unknown traffic on various Web ports. Finally managed to get Blade and Soul (a Chinese MMO) installed and running and was able to confirm that it was responsible for some of my unknown flows.
Started turning my attention towards our STRATUS research this week. Initially, we are going to look at general metrics that we can extract from cloud infrastructure and see if any of our existing event detection techniques are useful for finding anomalous behaviour. For a start, we are using data collected by the Ceilometer module on the Waikato OpenStack instance. Spent some time bringing Harris up to speed on NNTSC and netevmon so that he can experiment with the data within our system. In the meantime, I'm going to take a closer look at the data that we've collected to see which series will be most suitable to focus on in the short term.
Gave more details about our STRATUS work / goals to the designers who will be producing a poster about our research for the upcoming STRATUS forum.
Also played with a service called ThisData which claimed to offer something similar to what we have envisioned from STRATUS. ThisData is certainly pretty, but doesn't really seem to offer much more than daily revision control for your cloud data.
Spent a fair chunk of my week proof-reading, first a document responding to questions about the BTM project, then Dan and Darren's Honours reports.
Tracked down and fixed a bug in parallel libtrace where ticks were messing with the ordered combiner, causing some packets to be sent to the reporter out of order. Also managed to replicate and fix the memory leak bug that was causing Yindong's wdcap on wraith to invoke the OOM killer.
Continued poking at unknown port 443 and port 80 traffic in libprotoident. Most of my time was spent trying to install and capture traffic from various Chinese applications that I had reason to suspect were causing most of my remaining unknown traffic, with mixed success.
Finally released the libtrace4 beta on Tuesday, after doing some final testing with the DAG cards in the 10G dev machines.
Managed to find a few more protocols to add to libprotoident, but am now trying to move towards releasing a new version. Starting having a closer look at TCP port 80 and TCP port 443 traffic in my Waikato traces, with the aim of trying to get as much traffic correctly classified as I can prior to doing an in-depth analysis of what is actually using those ports.
Spent Friday afternoon reading over Darren's honours report and providing some hopefully useful feedback.
The long-awaited libtrace 4 is now available for public consumption! This version of libtrace includes an all new API that resulted from Richard Sanger's Parallel Libtrace project, which aimed to add the ability to read and process packets in parallel to libtrace. Libtrace can now also better leverage any native parallelism in the packet source, e.g. multiple streams on DAG, DPDK pipelines or packet fanout on Linux interfaces.
At this stage, we are considering the software to be a beta release, so we reserve the right to make any major API-breaking changes we deem necessary prior to a final release but I'm fairly confident that the beta release will be fairly close to the final product. At the same time, now is a good time to try the new API and let us know if there are any problems with it, as it will be difficult to make API changes once libtrace 4 moves out of beta.
Please note that the old libtrace 3 API is still entirely intact and will continue to be supported and maintained throughout the lifetime of libtrace 4. All of your old libtrace 3 programs should still build and run happily against libtrace 4; please let us know if this turns out to not be the case so we can fix it!
Learn about the new API and how parallel libtrace works by reading the Parallel Libtrace HOWTO.
Download the beta release from the libtrace website.
Send any questions, bug reports or complaints to contact [at] wand [dot] net [dot] nz