Shane Alcock's Blog
Updated the event tooltips to better describe the group that the event belongs to, as it was previously difficult to tell which line the event corresponded to when multiple lines were drawn on the graph.
Brad's rainbow graph is now used whenever an AMP traceroute event is clicked on in the dashboard. Fixed a couple of bugs with the rainbow graph, the main one being that the detail graph was rendering the heavily aggregated summary data instead of the detailed data.
Replaced the old hop count event detection for traceroute data with a detector that reports when a hop in the path has changed.
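To illustrate why a per-hop comparison catches more than a hop count check, here is a minimal sketch. It is not the netevmon detector; the function name `changed_hops` and the list-of-addresses path format are assumptions made for the example. A path can swap one router for another without changing length, which a hop count comparison would miss.

```python
from typing import List, Optional, Tuple

# Hypothetical path format: one address per hop, None for a hop that did not respond.
Path = List[Optional[str]]


def changed_hops(previous: Path, current: Path) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (hop_index, old_address, new_address) for every hop that differs."""
    changes = []
    for i in range(max(len(previous), len(current))):
        old = previous[i] if i < len(previous) else None
        new = current[i] if i < len(current) else None
        if old != new:
            changes.append((i, old, new))
    return changes


before = ["10.0.0.1", "10.0.1.1", "192.0.2.9"]
after = ["10.0.0.1", "10.0.2.1", "192.0.2.9"]

# Same hop count, but the second hop has changed.
print(changed_hops(before, after))
```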
Fixed a tricky little bug in NNTSC where large aggregate data queries were being broken up into time periods that did not align with the requested binsize, so a bin would straddle two queries. This would produce two results for the same bin and was causing the summary graph to stop several hours short of the right hand edge.
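The fix amounts to making every internal query boundary land on a bin edge. The following is only a sketch of that idea, not the NNTSC code: `split_query` and `max_bins` are made-up names, and it assumes bins are aligned to multiples of the binsize counted from timestamp 0.

```python
from typing import List, Tuple


def split_query(start: int, end: int, binsize: int, max_bins: int) -> List[Tuple[int, int]]:
    """Split [start, end) into sub-queries whose internal boundaries fall on bin edges."""
    if binsize <= 0 or max_bins <= 0:
        raise ValueError("binsize and max_bins must be positive")

    chunk = binsize * max_bins
    # First boundary: a whole number of bins past the bin edge at or before 'start'.
    boundary = start - (start % binsize) + chunk

    periods = []
    lo = start
    while lo < end:
        hi = min(boundary, end)
        periods.append((lo, hi))
        lo = hi
        boundary += chunk
    return periods


# 300 second bins, at most 2 bins per sub-query.
print(split_query(100, 1000, 300, 2))
```

Because only the first and last periods can be shorter than a full chunk, no bin is ever shared between two sub-queries.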
Started working on making the tabs allowing access to "similar" graphs operational again. Have got this working for LPI, which is the most complicated case, so it shouldn't be too hard to get tabs going for everything else again before the end of the year.
Spent most of the week adding view support to all of the existing collections within ampy. Much of the work was modifying the code to be more generic rather than the AMP-specific original implementation Brendon wrote as a proof of concept.
Added a new API to amp-web called eventview that will generate a suitable view for a given event, e.g. an AMP ICMP event will produce a view showing a single line for the address family where the event was detected.
Updated the legend generation code for views to work for all collections as well. Added a short label for each line so it will be possible to display a pop-up which will distinguish between the different colours for the same line group.
Finished the re-implementation of anomalyfeed to support grouping of streams into a single time series. Now our AMP ICMP tests are considered as one time series despite being spread across multiple addresses (and therefore multiple streams).
Brendon changed the way that we store AMP traceroute test results to improve the query performance, so this required a further update to anomalyfeed to be able to parse the new row format.
Updated NNTSC to always use labels rather than stream ids when querying the database. Eventually, all incoming queries will use labels, but ampy still uses stream ids for many collections so we have to support both methods for now. Any queries that use stream ids are converted to labels by the NNTSC client API.
Updated Brendon's view / stream group management code in ampy to not be so AMP-specific. The collection-specific code has now moved into the parser code for each collection so it should be much easier to implement views for the remaining collections now.
Spent the first part of the week fixing various bugs and less than ideal behaviours in netevmon and nntsc. Some examples include:
* Prevented an event from being triggered when an amp-traceroute stream reactivates after a long idle time
* Fixed a crash bug in anomalyfeed due to an incorrect field name being used
* Fixed a problem in NNTSC where the HTTP dataparser would fall over if a path contained a ' character.
* Added a rounding threshold to the Mode detector so that it can be used with AMP ICMP streams, as these measure in usec rather than msec. Now we can round to the nearest msec.
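The rounding threshold in the last item can be sketched as follows. This is an illustration rather than the Mode detector itself; `rounded_mode` and `round_to` are hypothetical names. With raw usec values nearly every measurement is unique, so the mode carries no information until the values are rounded.

```python
from collections import Counter
from typing import Iterable


def rounded_mode(values_usec: Iterable[int], round_to: int = 1000) -> int:
    """Round each value to the nearest multiple of round_to, then return the most common one."""
    rounded = [((v + round_to // 2) // round_to) * round_to for v in values_usec]
    return Counter(rounded).most_common(1)[0][0]


latencies = [10234, 10480, 9870, 10012, 25400]  # made-up ICMP latencies in usec

# Every raw value is distinct, but four of the five round to 10000 usec (10 msec).
print(rounded_mode(latencies))
```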
Brendon finally merged his view changes back into the development branches of our software. This caused a number of problems with netevmon, which had been overlooked when the changes were originally tested. Managed to patch up all the problems in a rather hurried session on Tuesday afternoon and got everything back up and running.
Restarted netevmon with the TEntropy detectors running. They seem to be performing very well so far and are a useful addition.
Started working on adding the ability to group streams into a single time series within anomalyfeed. The main reason for this is to be able to cope better with the variety of addresses that AMP ICMP typically tests to. It makes more sense to consider all of these streams as a single aggregated stream rather than trying to run the event detectors against each stream individually, especially considering that many addresses are only tested intermittently. Grouping them should ensure that there is a result at every measurement interval. So far I've got this working for AMP ICMP, AMP traceroute and AMP DNS and will need to reimplement the other collections using the new system.
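A rough sketch of the grouping idea, under stated assumptions: the `(stream_id, timestamp, value)` tuple format and the function `group_streams` are invented for the example, and averaging the values within an interval is just one plausible way to aggregate, not necessarily what anomalyfeed does.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple

Measurement = Tuple[int, int, Optional[float]]  # (stream_id, timestamp, value)


def group_streams(measurements: Iterable[Measurement], interval: int) -> List[Tuple[int, float]]:
    """Merge measurements from several streams into one series, one point per interval."""
    bins = defaultdict(list)
    for _stream_id, ts, value in measurements:
        if value is not None:  # skip lost measurements
            bins[ts - (ts % interval)].append(value)
    # Assumption: the mean is used to combine values that share an interval.
    return [(ts, mean(vals)) for ts, vals in sorted(bins.items())]


# Three streams (addresses) for one target, each tested only some of the time.
data = [(1, 0, 10.0), (2, 0, 20.0), (1, 60, 12.0), (3, 120, 30.0)]
print(group_streams(data, 60))
```

No single stream has a result at every interval here, but the grouped series does.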
Spent a fair chunk of time reading up on belief theory and Dempster-Shafer so that I could give Meena some pointers on what she will need to be able to apply them to our event data. Managed to come up with some rough ideas that seem to work, but not sure if the theory is being applied 100% correctly.
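For reference, the textbook form of Dempster's rule of combination looks like this in code. This is not the scheme worked out for Meena's project; the two-hypothesis frame and the mass values assigned to the two detectors are invented purely for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions using Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Normalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


EVENT = frozenset({"event"})
EITHER = frozenset({"event", "noevent"})  # mass left uncommitted

# Made-up beliefs: each detector commits part of its mass to "event" and leaves the rest open.
detector_a = {EVENT: 0.6, EITHER: 0.4}
detector_b = {EVENT: 0.7, EITHER: 0.3}

result = combine(detector_a, detector_b)
print(round(result[EVENT], 2), round(result[EITHER], 2))
```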
Spent some time tweaking the new TEntropy-based detectors in netevmon to reduce the number of false positives and insignificant events that they were reporting. Mostly this involved tuning the various thresholds used by the Plateau detector that is run over the TEntropy values rather than the TEntropy methodology itself.
As I was doing this, I started putting together a gigantic spreadsheet of the events observed, their significance, which detectors were picking them up, and the delay between the event starting and the detector reporting it. This is useful for two main reasons:
* As I adjust and tweak the existing detectors I can easily compare the events I used to detect with what I am detecting now (and what I think I should be getting).
* We will need to calculate the probability that a given detector is right for the next major phase of Meena's project. This spreadsheet will form the basis for estimating these probabilities.
Added support to NNTSC for collecting and storing AMP HTTP test results. Seems to work reasonably well (after fixing a bug or two in the test itself!) but it'll be interesting to see how query performance pans out once the table starts to get large, given our travails with the traceroute data.
Managed to write libprotoident rules for a couple of new applications, WeChat and Funshion. Released a new version of libprotoident (2.0.7).
Added support for the AMP DNS test to NNTSC, netevmon and amp-web. Wrote a new detector that looks for changes in response codes, e.g. the DNS response going from NOERROR to REFUSED or some other error state. This should also be useful for the HTTP test in the future.
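The core of a response code change detector can be sketched in a few lines. This is a simplified illustration, not the netevmon detector: `response_code_changes` and the `(timestamp, code)` input format are assumptions, and a real detector would likely need some way to ignore one-off blips.

```python
from typing import Iterable, List, Tuple


def response_code_changes(observations: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Return (timestamp, old_code, new_code) each time the response code changes."""
    events = []
    previous = None
    for ts, code in observations:
        if previous is not None and code != previous:
            events.append((ts, previous, code))
        previous = code
    return events


series = [
    (0, "NOERROR"),
    (60, "NOERROR"),
    (120, "REFUSED"),
    (180, "REFUSED"),
    (240, "NOERROR"),
]
print(response_code_changes(series))
```

Nothing here is DNS-specific, so the same comparison would work on HTTP status codes.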
Fixed a bug in the ChangepointDetector where it wasn't dealing well with streams that featured large values (i.e. >100,000). Also spent a bit more time tweaking the Plateau detector, mainly dealing with problems that show up when either the mean or the standard deviation is very small.
This release adds support for 14 new protocols including League of Legends, WhatsApp, Funshion, Minecraft, Kik and Viber. A new category for Caching has also been added.
A further 13 protocols have had their rules refined and improved including Steam, BitTorrent UDP, RDP, RTMP and Pando.
This release also fixes the bug where flows were erroneously being classified as No Payload, despite payload being present.
The full list of changes can be found in the libprotoident ChangeLog.
Short week due to remaining in Aus for a holiday after LCN.
Upon my return, I spent a bit of time trying to capture traffic for WhatsApp and other mobile messaging services. I had earlier found some flows that were possibly WhatsApp in some traffic I had captured before going away and wanted to confirm it.
It turned out to be a bit trickier to get this traffic than originally anticipated. WhatsApp required a mobile phone number to register an account so we needed to acquire a couple of new 2degrees SIM cards and receive the confirmation text messages on them. Also, the Android VM that we had created for this purpose wouldn't install WhatsApp because the image was intended for a tablet rather than a phone so we had to use Blue Stacks instead.
I also captured traffic for Kik, another similar application, and found that we were erroneously classifying Kik traffic as Apple Push notifications as they both use SSL on port 5223. Fortunately, some very subtle differences in the SSL handshake allowed me to write a rule that could reliably identify Kik traffic. Also tried to capture GroupMe traffic but could not reliably receive the text message required to register an account.
Spent most of Friday going over events reported by the Plateau detector in netevmon and made a number of tweaks which should hopefully make it quicker to pick up on obvious changes in latency time series as well as more reliable than before.
Spent most of the week in Sydney attending the LCN 2013 conference. Gave my presentation in the Workshop on Network Measurement to little fanfare.
Learned a few things at the conference:
* Named Data Networking exists and some people are taking it seriously: http://named-data.net/ . My first thoughts were that we've had enough trouble getting people to adopt a new IP version, let alone a system that completely changes how routers work.
* Lots of people are still using ns-2 to validate their research.
* The bar for publication is pretty low in some conferences / workshops, as long as you do something "innovative".
Spent most of the week preparing for my Sydney trip. Wrote the talk I will be presenting this coming Thursday and gave a practice rendition on Friday.
The rest of my time was spent fixing minor issues in Cuz -- trying not to break anything major before I go away for a week. Replaced the bad SQLAlchemy code in the ampy netevmon engine with some psycopg2 code, which should make us slightly more secure. Also tweaked some of the event display stuff on the dashboard so that useful information is displayed in a sensible format, i.e. fewer '|' characters all over the place.
Had a useful meeting with Lightwire on Wednesday. Was pleased to hear that their general impression of our software is good; we will start working towards making it more useful to them over the summer.