Spent most of the week adding view support to all of the existing collections within ampy. Much of the work involved making the code more generic than the original AMP-specific implementation that Brendon wrote as a proof of concept.
Added a new API to amp-web called eventview that will generate a suitable view for a given event, e.g. an AMP ICMP event will produce a view showing a single line for the address family where the event was detected.
Updated the legend generation code for views to work for all collections as well. Added a short label for each line so that it will be possible to display a pop-up distinguishing between the different colours within the same line group.
I've been creating the traceroute map this week (the first half spent on the initial coding and the second half spent making it work with a much larger real data set). Instead of trying to port the existing PHP or RaphaelJS (SVG library) code, I decided it would be easier to roll my own for Flotr2. I've had a fair amount of success so far, and my graph now looks like this image:
You can also hover over a path to highlight the entire path and show information about it, and hover over an individual hop to highlight all hops to the same host.
Colours represent unique hosts, and certain conditions govern path divergence and convergence so that it remains clear to the human eye which path is which. In terms of implementation, the data structure used is a tree in which each node is a path with zero or more branches. One constraint, therefore, is that a path will only ever join up with its root node (the same path it diverged from), and it will currently only ever diverge and converge a maximum of once each. Each alternative path is also drawn on a new line for clarity. I think these constraints strike an effective balance between accurate network representation and visual complexity.
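As a rough sketch of the tree described above (all names here are hypothetical, not the actual implementation), each node holds one path and a list of branches, and a branch may only rejoin the node it diverged from:

```python
class PathNode:
    """One path in the traceroute tree; branches are alternative paths."""

    def __init__(self, hops, parent=None):
        self.hops = hops          # ordered list of hop addresses
        self.parent = parent      # the path this one diverged from
        self.branches = []        # zero or more alternative sub-paths
        self.diverge_at = None    # hop index where this branch leaves the parent
        self.converge_at = None   # hop index where it rejoins (at most once)

    def add_branch(self, hops, diverge_at, converge_at=None):
        # A branch may only ever converge back into this node, its root.
        branch = PathNode(hops, parent=self)
        branch.diverge_at = diverge_at
        branch.converge_at = converge_at
        self.branches.append(branch)
        return branch

# A main path with one alternative route between hops 1 and 3; the
# alternative would be drawn on its own line in the map.
main = PathNode(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
alt = main.add_branch(["10.0.1.2"], diverge_at=1, converge_at=3)
```

Because a branch keeps a reference to its parent, the "only converge into your root" constraint is easy to enforce when drawing.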
Decided that we needed to simplify the database schema for storing
traceroute data, so spent some time working on that. The new schema
works better with the existing aggregation functions and is faster to
query. Moved all the existing data to the new schema.
Merged in the rainbow traceroute graphs that Brad created and got them
using data from the new traceroute schema. Moved the default view of
combined traceroutes to use smokeping rather than a basic line graph to
better show what is happening with multiple addresses.
General tidyup of code that had got a bit crufty, removed some sections
that were duplicated or no longer required. Started work on moving the
DNS test to use views.
I have modified the Hidden Markov Model class I wrote to use log-transformed probabilities. This allows smaller probabilities to be expressed without risking underflow.
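The key trick is that products of probabilities become sums of logs, and any summing of probabilities is done with the log-sum-exp identity. A minimal sketch (not the actual class):

```python
import math

def log_sum_exp(log_probs):
    """Sum probabilities that are stored in log space, staying in log space."""
    m = max(log_probs)
    if m == float("-inf"):
        return m  # every probability is zero
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs))

# A chain of tiny transition/emission probabilities underflows to 0.0
# in ordinary floats, but is perfectly representable as a sum of logs:
assert 1e-200 * 1e-200 == 0.0                      # underflow
log_joint = math.log(1e-200) + math.log(1e-200)    # fine: about -921
```

Subtracting the maximum before exponentiating keeps the intermediate values in a safe range.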
I am currently still working on the genetic algorithm that will be needed to determine the initial parameters of the Hidden Markov Model for use with anomaly_ts.
Updated scamper on PlanetLab to use more destination addresses when carrying out per-destination MDA (Multipath Detection Algorithm) analysis. The modified version has been run on one node to test the changes.
The code for detecting traces with no change in non-load-balancer nodes has been applied on a per-trace basis to the code for detecting turnover. This has been run to produce more results, which have been added to the paper.
I have been investigating the validity of the results I have for the number of load balancing diamonds attached to the same load balancing node. Dumps of unique LB interfaces and successor sets have been examined, and a count of unique successor sets has been added. In calculating this, two sets with one address in common were taken as a match.
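A sketch of that counting rule (hypothetical names, not the analysis code itself): successor sets sharing at least one address are folded into a single group, and the groups are counted:

```python
def count_unique_successor_sets(successor_sets):
    """Count successor sets, treating two sets with at least one
    address in common as a match (i.e. the same set)."""
    groups = []
    for s in successor_sets:
        merged = set(s)
        remaining = []
        for g in groups:
            if g & merged:        # one address in common -> match
                merged |= g       # fold the existing group in
            else:
                remaining.append(g)
        remaining.append(merged)
        groups = remaining
    return len(groups)

# Three successor sets, two of which share 192.0.2.2 -> two unique sets.
sets = [{"192.0.2.1", "192.0.2.2"}, {"192.0.2.2", "192.0.2.3"}, {"192.0.2.9"}]
```

Rebuilding the group list on every insertion also handles the transitive case where one new set bridges two previously separate groups.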
More work has been done on the paper. This has included Richard's corrections, counts of collapsed load balancers, and the incorporation of data based on the detection of route changes.
Finished a hash-based version of splitting a trace across multiple mapper cores, along with the faster first-in-first-served version for cases where state doesn't need to be maintained per mapper thread.
Implemented basic code outline for the reducer stage and passing results from the mappers. The basic implementation works with tracestats which reports a single set of results at the end. Started working on combining the results from each map step to deal with ordered results such as the modified packets returned by traceanon which can then be written back to disk during the reduce step.
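The two dispatch strategies can be sketched like this (a toy model, not the actual code; the flow key here is a stand-in for hashing the real packet headers):

```python
class HashDispatch:
    """Hash-based: packets from the same flow always reach the same
    mapper, so per-flow state can live on one thread."""

    def __init__(self, n_mappers):
        self.n = n_mappers

    def dispatch(self, packet):
        # Sort the endpoints so both directions of a flow hash alike.
        key = tuple(sorted((packet["src"], packet["dst"])))
        return hash(key) % self.n

class FifoDispatch:
    """First-in-first-served: each packet goes to the next mapper in
    turn. Faster, but only usable when mappers keep no per-flow state."""

    def __init__(self, n_mappers):
        self.n = n_mappers
        self.next = 0

    def dispatch(self, packet):
        core = self.next
        self.next = (self.next + 1) % self.n
        return core
```

The hash version pays the cost of computing a key per packet but lets stateful tools keep their counters thread-local; the FIFO version simply keeps every core busy.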
Spent the first half of the week implementing a version of the Dempster-Shafer belief function in the eventing Python script. After debugging and testing to make sure that it worked properly, I went on to analysing the events for a few Smokeping streams. This consisted of finding the start of a detected event, finding it in the AMP graphs, giving it a significance rating of 0-5 (with 0 being a false positive and 5 being a very significant event), and then entering the details of the event group in a spreadsheet. This was rather tedious and, depending on the stream, sometimes took forever.
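For reference, Dempster's rule of combination for two mass functions looks roughly like this (a generic sketch, not the eventing script's actual code):

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal elements
    are frozensets over the same frame of discernment."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb   # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Two detectors offering evidence about whether an event occurred:
EVENT = frozenset({"event"})
EITHER = frozenset({"event", "normal"})   # "don't know"
result = combine({EVENT: 0.6, EITHER: 0.4},
                 {EVENT: 0.7, EITHER: 0.3})
# Combined belief in an event rises to 0.88.
```

Two weak detectors agreeing produce stronger combined belief than either alone, which is exactly the behaviour wanted when fusing multiple anomaly detectors.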
I plan on seeing Dr. Joshi from the Statistics Department next week to confirm the Dempster-Shafer calculations, after which I will have to resume the event analysis.
Finished the re-implementation of anomalyfeed to support grouping of streams into a single time series. Now our AMP ICMP tests are considered as one time series despite being spread across multiple addresses (and therefore multiple streams).
Brendon changed the way that we store AMP traceroute test results to improve the query performance, so this required a further update to anomalyfeed to be able to parse the new row format.
Updated NNTSC to always use labels rather than stream ids when querying the database. Eventually all incoming queries will use labels, but ampy still uses stream ids for many collections, so we still have to support both methods. Any queries that use stream ids are converted to labels by the NNTSC client API.
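The compatibility shim is conceptually simple; a hypothetical sketch of the conversion (not NNTSC's actual code):

```python
def streams_to_labels(stream_ids):
    """Wrap a legacy stream-id query in label form: one label per
    stream, named after the stream id it contains."""
    return {str(sid): [sid] for sid in stream_ids}

# A legacy query for streams 101 and 102 becomes a two-label query,
# so the database layer only ever has to deal with labels.
labels = streams_to_labels([101, 102])
```

With each label wrapping exactly one stream, legacy callers get the same per-stream results as before while the query path stays label-only.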
Updated Brendon's view / stream group management code in ampy to not be so AMP-specific. The collection-specific code has now moved into the parser code for each collection so it should be much easier to implement views for the remaining collections now.
Further to my simple event marker implementation (showing "signpost" markers above the plot area of a graph, along with the vertical lines currently used to mark events), I set out to determine the best way to draw these markers and lines so that they would be clearly visible but not so prominent as to obscure or draw attention away from the plot. One particular challenge was being able to gauge the severity of an event at a glance, which is currently done by colour (yellow/orange/red), while avoiding conflict with the colours used by series on the plot. One solution I tried was to prevent graph styles from using any colours in a particular hue range (so as to avoid the yellow-red range used by the events), but because both the events and the plot were in colour, it was still confusing to tell which was which.
After much very scientific experimentation in the field of colour perception I found that plots drawn with the same saturation or lightness appeared to be related, and changing the hue divided this related group further; it was easiest to tell event markers apart from data when one of these groups was drawn with a different level of saturation, regardless of hue. The obvious conclusion, therefore, was to draw one group at 0% saturation and the other at 100%, for maximum differentiation. As such, the data plot is drawn at 100% saturation with any hue, the event lines are drawn grey, and the event markers are still drawn in colour where they will not detract from the plot.
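In HLS terms the rule is simply: data series get full saturation at any hue, event lines get zero saturation (grey). A small sketch using Python's colorsys, purely to illustrate the scheme (the real styling code is in the graphing JavaScript):

```python
import colorsys

def series_colour(i, n):
    """Data series: full saturation, hues spread evenly around the wheel."""
    return colorsys.hls_to_rgb(i / max(n, 1), 0.5, 1.0)

def event_line_colour():
    """Event lines: zero saturation, i.e. grey, so they can never be
    mistaken for a data series regardless of hue."""
    return colorsys.hls_to_rgb(0.0, 0.5, 0.0)
```

Because saturation is the only axis separating the two groups, the scheme keeps working however many hues the data series consume.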
For the rest of the week I have been looking at reproducing (to some extent) the AMP traceroute map in Cuz using Flotr2. I've been familiarising myself with the old code and researching alternative representation methods, such as directed graphs: the data we want to visualise is not necessarily a tree, although we want to view it in a tree-like form, and we may not want the nodes to be as interconnected as they would be in a general graph. While thinking about this I've created a graph type that can draw static plots while moving through time, and I'm going to start drawing things in it next week.
I spent the week dealing with the disappearing packets. My initial attempts to recreate the problem in a simpler setup kept resulting in kernel panics. To try to expedite the process of diagnosing the kernel panics we set up a virtual machine. This didn't particularly help with diagnosis, however, as it fixed the problem immediately.
The mystery of the disappearing packets seems to be related to recirculating packets. That is, when you add or remove an MPLS label the packet is recirculated to update the header information.
When I push an MPLS label and send the packet to another table, if the flow on that table attempts to push another label the packet will be dropped instead. The flow count for the second flow is incremented, but its actions don't seem to occur.
This only seems to happen with pushing labels. Other actions, like updating the MPLS label fields (TTL or label), don't seem to cause this.
There are some other bizarre outcomes I am coming across. Popping MPLS labels seems to pop the innermost label: I end up with packets carrying one MPLS tag, without the bottom-of-stack bit set and with the wrong label, arriving at the hosts.
Also in some cases the flow counters are not getting updated, which could be a big problem for me.
All of this is also occurring in what appears (to me at least) to be a delightfully non-deterministic fashion.
I emailed the maintainer of the branch I am using, but haven't heard back yet. Hopefully he can shed some light on things.
But, in general, things are not going as well as they might be just at the moment.