Continued marching towards being able to migrate our prophet database to the updated NNTSC database schema. Discovered a number of cases where AMP tests were reporting failed results but values were still being inserted into the database for fields that should be invalid due to the test failing.
Updated the RRD-Smokeping schema to store the individual ping results as a single column using an array. This caused some problems with our approach for calculating the "smoke" that we show on the graphs, but Brendon the SQL-master was able to come up with some custom aggregation functions that should fix this problem.
Finished looking at the events on amp.wand.net.nz. Also managed to come up with a solution to the single-large-spike problem I had last week. It's not perfect (mainly in that it only works if the spike is exactly one measurement, a two measurement spike will still have the same problem), but it gets rid of a few annoying insignificant events.
Modified the traceroute pathchange detector to try and reduce the number of events we've been getting for certain targets, most notably NetFlix. The main change is that we now only consider a hop to be "new" if it doesn't match the subnet of any existing hops for that TTL. It's all very naive: a /24 is considered a subnet for IPv4, a /48 for IPv6, but it results in a big improvement. Eventually, I expect us to plug a BGP feed into the detector and look for changes in the AS path rather than the IP path, but this should tide us over until then.
Worked with Brad to set up a passive monitor to help ITS diagnose some problems they are having on their network related to broadcast and multicast traffic. Just waiting on ITS to let us know when the problems are occurring so we can narrow down our search for strange behaviour to just traces covering the time periods of interest.
Research has been carried out into the possibility that some load balancers partition packets to their successor nodes based on packet fields other than those that are considered usual. Thirteen fields where chosen for the test and scamper, the program we use for traceroute MDA, was modified to carry them out. Traceroute MDA is an analysis that scans the Internet for load balancers and other network topology information.
Small numbers of load balancers have been found that do this, and many of the same load balancers are found by different field tests. The next question was to find out if these load balancers behaved as load balancers all the time. The case of also being a per packet load balancer was excluded from the counts i.e one that partitions randomly. Three cases emerged that of a small number of hits and misses, a large number of misses and a small number of hits and no hits. A miss means that no load balancer was found though the same node that was a load balancer in another trace exists in the topology, but with only one successor if any. There may be some more special cases to exclude from the data such as the node being last in the trace etc. It appears that some of these load balancers show fairly consistent load balancing across a couple of field types, others show the load balancing behaviour only rarely.
The Internet simulator simulates doubletree and traceroute in its unmodified state. Traceroute determines Internet topology between a source and destination, but not including load balancers. Doubletree is a modified version of traceroute that is designed to avoid repeatedly analysing the same parts of the Internet, by sharing information between vantage points.
We are currently trying to simulate unmodified doubletree using very large datasets. This caused the simulator to run a long time. Analysis with valgrind callgrind indicates that the gang analysis case which is being used is the major problem here. So, the question is now being asked: can we do without gang analysis for this work?
A fast mapping like protocol has been set up for detecting black holes in load balancers. It involves one round of traceroute MDA analysis followed by eight rounds of Paris traceroute, followed by one further round of MDA. Paris traceroute causes the same path through a load balancer to be consistently followed. Comparing the initial and final MDA traces determines if a change in topology has occurred in the Internet. If this is the case finding a short Paris trace may not be due to a blackhole. After excluding all of the special cases that I can think of there remain some cases which appear to be blackholes in load balancers in our data.
In order to run this type of analysis on PlanetLab we prefer to use ICMP packets. Modifications were made to scamper on Yoyo our University computer outside the firewall, to randomly choose a flow ID for each trace and a run was initiated.
It has been decided that regression equations may not be sufficiently fundamental to understand the mathematical models predictions. In the first instance I am extracting 'proportional to' type relationships from these equations to see if more fundamental inferences may be made about the structure of the model.
It will also be desirable to try and derive these proportional to relationships from theoretical arguments. For example, is the number of packets sent from a vantage point proportional to the number of destinations probed in system where there are no savings measures taken. If a match or resolution is found between these two approaches then that would be good progress.
It may also be necessary to focus on the data sets with a large packet limit and determine if incomplete load balancer information should be counted if it seems sensible to do so. This is important because a small number of very high cost load balancers may exist, and these are the most important ones to avoid repeated analysis of.
Another week split between figuring out how to do the path generation and helping Brad with the REANNZ SDN.
We built a nice vlan capabale switch, and then built several more versions of it and got annoyed as they all failed to work properly on the pronto. But we got there in the end.
I also have been implementing the redundant path pair code, but it is proving more and more complicated all the time. I have what I am convinced is a correct algorithm but actually turning it into functioning code is really awkard.
Spent a fair bit of time finishing my detector probability script and making it look less awful. Then, spent the res of the week updating the eventing script to use the new detector probabilities and also updated the initial Sig and FP probabilities used by the Bayes fusion method. Then, added options to the eventing script to allow outputting the values of the different fusion methods, detectors and event grouping methods. The remaining time was spent testing the new changes and fixing a couple of minor bugs.
Worked on my proposal this week, along with some more research review.
I found quite an interesting paper 'libpcap-MTA General Purpose Packet Capture Library with Multi-Thread' which was in Chinese where they created a parallel version of libpcap. Many of the issues that where discussed confirmed what I had found out over summer. Made myself a github account now that the libtrace repository and downloaded a copy of the repository.
Updated the HTTP test to use a particular source address or interface if
specified. Though libcurl has options to set one of these, it doesn't
work well in the case where you need to set both an IPv4 and IPv6 source
address, before knowing what the target name resolves to. Luckily it has
a callback to completely replace the call to socket(), after name
resolution, so I can create my own socket and bind to the appropriate
Had similar problems while updating the throughput test to use a
particular interface - need to listen on both address families and then
once the control connection happens, make sure the test connection is on
the same interface.
Updated the DNS test to query the local servers listed in
/etc/resolv.conf by default if no targets are given. This works fine for
the standalone test, but it's not quite clear the best way to schedule a
test like this when it may get merged with others that do have destinations.
Added a few new unit tests for the DNS test coding/encoding of names and
fixed a few things that unit testing, valgrind and different versions of
Spent some time looking at paris traceroute and the AMP traceroute test.
Turns out that our traceroute already keeps the important header fields
stable during a test run and so behaves like paris. Confirmed this with
the fakeroute tool used to test paris traceroute.
Finished updating NNTSC to deal with traceroute data. The new QueryBuilder code should make query construction a bit less convoluted within the NNTSC dbselect module. Everything seems to work OK in basic testing, so it's now just a matter of migrating over one of our production setups and seeing what breaks.
Continued working through the events on amp.wand.net.nz, looking at events for streams that fall in the 25-100ms and the 300+ms ranges. Results still look very promising overall. Tried to fix another common source of insignificant events (namely a single very large spike that moves our mean so much that subsequent "normal" measurements are treated as slightly abnormal due to their distance from the new mean) but without any tangible success.
Moved libtrace and libprotoident from svn to git and put the repositories up on github. This should make the projects more accessible, particularly to the increasing number of people who want to add support for various formats and protocols. It should also make life easier for me when it comes to pushing out bug fixes to people having specific problems and merging in code contributed by our users.
Wrote and handed in my honours proposal on Friday(/Saturday), in the process doing a lot more research into the 6LoWPAN/RPL/CoAP stack and compiling reading material for future reference. I wrote my proposal in LaTeX also, which is new to me -- I figured it would be useful to learn now in order to use efficiently later.
My project is fairly open ended right now (evaluating the state of the art of 802.15.4 CoAP security), as nailing down goals is still going to involve further research. I found one paper in particular titled "Security of CoAP-based End-to-End Scenarios for the Internet of Things" (a master's thesis) that I think could be very relevant and perhaps help narrow my focus, but the thesis doesn't appear to be available online anywhere. Now that I've finished my initial proposal I may look at contacting the author.
The source code for both our libtrace and libprotoident libraries is now available on GitHub. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
We're also more than happy to consider pull requests for code that adds useful features or support for new protocols / trace formats to our libraries.
Look out for more of our open-source projects to make their way onto GitHub soon!
My week has been split into two main projects, the first was continuing to work on my graph theory problems. In the end I decided since it the requirement of minimising flows is a specifically openflow problem it could be worth spending a bit of effort on, so I believe I have sorted out all the kinks in that and have just started implementing it again.
The other part was helping Brad set up the REANNZ SDN. We have it all connected now, and it passes packets, but we met a few snafus. The main one being that when you try to use the "normal" forwarding action the prontos seems to ignore the vlan for the first packet. So with 2 prontos along the path we were losing 4 pings (and 4 arp messages in each direction) every time we started over. The only solution to this was to either not use vlans or not use the normal action.
Brad also plugged things into the wrong ports and lots of nonsense like that. He might tell you it wasnt his fault, but he never writes weekly reports...