Further development and debugging has been carried out on the warts-based simulator, which uses Traceroute MDA load-balancer data. It simulates a process called Megatree, which limits the number of times the same load balancer is discovered. The simulator carries out all of its simulations in a single overnight pass, which means a new round of debugging can be carried out each day if needed. Development has included a new factor: varying the number of destinations for the simulations. Also planned are some analyses of many sources to few destinations, which involve using the warts data in the reverse direction, and applying Doubletree to MDA data; the latter may involve combining Megatree and Doubletree. Debugging has included some more complex logic to keep only one copy of the local and global load-balancer sets when an identical set would otherwise be created.
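The set-deduplication idea can be sketched as follows. This is an illustrative Python sketch, not the simulator's actual code, and the names (`canonical`, `registry`) are invented: each load-balancer set is reduced to a frozenset and stored once in a registry, so an identical set maps back to the existing copy instead of creating a duplicate.

```python
# Illustrative sketch: keep only one copy of identical load-balancer sets.
# The simulator itself is not written like this; names here are invented.

def make_registry():
    """Map each distinct load-balancer set to its single stored copy."""
    return {}

def canonical(registry, lb_set):
    """Return the stored copy of lb_set, storing it first if it is new."""
    key = frozenset(lb_set)
    if key not in registry:
        registry[key] = key
    return registry[key]

registry = make_registry()
a = canonical(registry, {"10.0.0.1", "10.0.0.2"})
b = canonical(registry, {"10.0.0.2", "10.0.0.1"})  # identical set, other order
assert a is b          # only one copy is kept
assert len(registry) == 1
```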
One of the statistics output files from the conventional Traceroute-based simulator is missing because the file was corrupted. I am waiting for the disk on Wraith to be replaced so that the simulation can be repeated. In the meantime, I am working on adapting the graphing routine that comes with the simulator so that categorical data can be used in the graphs, in particular the parameter referred to in the software as the range. This has been made more complex because the range parameter of the Traceroute files does not default to a single value the way the TTL does; in this case the range parameter is instead the direction of the trace: either “few” or “many” sources. Note that the initial TTL for Traceroute is always the beginning of the trace, nominally one, whereas Doubletree starts somewhere in the middle, and this may vary between approaches. Because the Traceroute data uses two values of the range parameter, it will be necessary to update the way Traceroute statistics are incorporated into the graphs.
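The kind of change needed can be sketched in Python (the field names here are hypothetical, and the actual graphing routine is structured differently): rather than treating the range parameter as numeric, the statistics rows are grouped by its categorical value, giving one plotted series per value.

```python
# Sketch: group statistics rows by a categorical "range" parameter
# ("few" or "many" sources) so each value becomes its own plotted series.
# Field names and row layout here are hypothetical, not the real schema.
from collections import defaultdict

def series_by_range(rows):
    """rows: iterable of (range_value, x, y) tuples.
    Returns {range_value: [(x, y), ...]}, one series per category."""
    series = defaultdict(list)
    for range_value, x, y in rows:
        series[range_value].append((x, y))
    return dict(series)

rows = [("few", 1, 10.0), ("many", 1, 12.5), ("few", 2, 9.0)]
print(series_by_range(rows))
# {'few': [(1, 10.0), (2, 9.0)], 'many': [(1, 12.5)]}
```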
Spent some time working through my use of libunbound to get a better
understanding of exactly what it was doing at each point, and fixed the
memory leak I was experiencing. All of my worker threads can see
responses to any query (or none at all), so knowing when all their names
are resolved and the test can continue is important. They can update
each other's results lists, so proper locking is also needed.
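The pattern here can be sketched as follows. The real code is C around libunbound; this is a Python sketch of the same idea with invented names: every update to the shared results happens under a lock, and a condition variable lets a waiter know when all names are resolved.

```python
# Sketch of the idea: worker threads share a results structure, every
# update is made under a lock, and a condition signals when all names
# are resolved. The real code is C; all names here are illustrative.
import threading

class SharedResults:
    def __init__(self, names):
        self.pending = set(names)
        self.results = {}
        self.cond = threading.Condition()

    def record(self, name, addresses):
        """Called by any worker that sees a response for any query."""
        with self.cond:
            self.results[name] = addresses
            self.pending.discard(name)
            if not self.pending:
                self.cond.notify_all()

    def wait_all(self, timeout=None):
        """Block until every name has been resolved (or timeout)."""
        with self.cond:
            return self.cond.wait_for(lambda: not self.pending, timeout)

shared = SharedResults(["a.example", "b.example"])
threads = [
    threading.Thread(target=shared.record, args=("a.example", ["192.0.2.1"])),
    threading.Thread(target=shared.record, args=("b.example", ["192.0.2.2"])),
]
for t in threads:
    t.start()
assert shared.wait_all(timeout=5)   # returns once both names are recorded
for t in threads:
    t.join()
```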
Updated Debian packaging files in preparation for making a new release on
all our current monitors. Tried a few iterations as the upgrade path
from the old version needed a bit of work, especially in conjunction
with puppet managed configuration spaces. This will go out after the
data migration next week.
Making the path builders work proved far harder than I thought possible for what is a fairly simple algorithm.
I have three systems now. The first just finds any two minimally overlapping paths; this was fairly simple to put together. The second, which proved the most difficult, involves having two minimally overlapping paths between each pair of nodes while using only two flows per switch. The third doesn't create full paths; instead, each node has two paths it can use to send packets towards a given destination, each leading to another node that is closer to the destination, or to the destination itself.
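The first, simple approach can be sketched along these lines (an illustrative heuristic, not the project's actual implementation): find one shortest path with BFS, then find a second path that avoids the first path's edges where possible by giving those edges a heavy penalty in a Dijkstra search.

```python
# Sketch of a simple way to find two minimally overlapping paths: BFS for
# the first path, then Dijkstra where reusing a first-path edge costs
# `penalty` instead of 1. Illustrative only, not the project's real code.
import heapq
from collections import deque

def bfs_path(graph, src, dst):
    """graph: {node: set(neighbours)}. Returns a shortest path as a list."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    if dst not in prev:
        return None
    path = []
    while dst is not None:
        path.append(dst)
        dst = prev[dst]
    return path[::-1]

def min_overlap_pair(graph, src, dst, penalty=1000):
    """Return two src->dst paths that overlap as little as possible."""
    first = bfs_path(graph, src, dst)
    if first is None:
        return None, None
    used = {frozenset(e) for e in zip(first, first[1:])}
    dist = {src: 0}
    prev = {src: None}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue
        for v in graph[u]:
            cost = penalty if frozenset((u, v)) in used else 1
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    if dst not in prev:
        return first, None
    second = []
    node = dst
    while node is not None:
        second.append(node)
        node = prev[node]
    return first, second[::-1]

# A square topology: two fully edge-disjoint paths exist between corners.
g = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
p1, p2 = min_overlap_pair(g, "a", "d")
shared = ({frozenset(e) for e in zip(p1, p1[1:])} &
          {frozenset(e) for e in zip(p2, p2[1:])})
assert not shared   # no overlapping edges on this topology
```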
Yeah, implementing the second one of these turned into a bit of a nightmare: what I thought would be a one-afternoon job took me all week. So many little edge cases with no elegant solutions I could find, all turning the code more and more labyrinthine. It wasn't a very fun week.
Anyway, it's all going now, so I am going to do comparisons based on the number of flows, the length of the paths found by these algorithms, and how effectively they distribute packets around the network.
Then it's mostly writing from here on. I would like to have a full network running this, into which I can introduce some loss and test how quickly I can locate the problem and work around it, and how many packets get lost in the process, but that seems like a distant goal at the moment considering the time I have left.
Following the authoring of my last weekly report, I successfully discovered and resolved the packet truncation issue I was having. It appears that there is a (possibly reintroduced) bug in the Open vSwitch implementation where the OFPCML_NO_BUFFER option (which instructs the switch not to create a buffer for an incoming packet) was being ignored when the incoming packets matched a flow with a priority of 0. Changing the priority to a different number allowed us to successfully read the full contents of a DHCP packet, options and all, with no truncation. I discovered the problem by using scapy to throw some fixed-length packets at my controller and observing how the controller interpreted them as they were intercepted by different flows. A simple fix, but exasperating to find.
I've spent the rest of this week assembling my presentation for my in-class honours practice talk, which I gave today. It went OK, but I ran over time and found I had too many slides with too much content; then again, I found it difficult to strike a happy medium in accommodating my audience's understanding of some of the concepts I was to speak about. Hat-tip to Brad and the other members of the WAND group for their constructive feedback on my draft presentation.
Other than that, I have spent the rest of my project time this week working on my interim project report. I haven't found LaTeX too much of a struggle to grasp, and now I'm finding myself wondering how I managed report writing before now without it. I've spent a lot more time than I intended writing longer introduction and background chapters, with the intention of borrowing some of that material for my final report where applicable.
I gave my practice presentation to the 520 class last Wednesday; it went well.
I finished updating tracertstats to use the tick packets and removed the previous system of holding temporary results. This has also removed a lot of duplicated code, which is nice.
I've started on the mid-term report, which is due this Friday; I'm hoping to be able to reuse much of the introductory content in the final report.
I've been flat out with other assignments for the past little while and haven't managed to get any honours work done, but I'm about to spend this week writing my interim report. As with my proposal I plan for it to consist mostly of background research, since my experiences with the STM32W RFCKIT have thus far yielded more questions than results.
Spent Mon-Wed on Jury service.
Continued fixing problems with gcc-isms in libtrace. Added proper checks for each of the various gcc optimisations that we use in libtrace, e.g. 'pure', 'deprecated', 'unused'. Tested the changes on a variety of systems and they seem to be working as expected.
Started testing the new ampy/amp-web on prophet. Found plenty of little bugs that needed fixing, but it now seems to be capable of drawing sensible graphs for most of the collections. Just a couple more to test, along with the matrix.
Replaced the libc resolver with libunbound. Wrote a few wrapper
functions around the library calls to give me data in a linked list of
addrinfo structs in a similar way to getaddrinfo() so that I don't need
to modify the code around tests too much. The older approach with each
test managing the resolver didn't allow caching to work (there was no
way for them to share context/cache), so I moved that all into the main
process. Tests now connect to the main process across a unix socket and
ask for the addresses for their targets.
Using asynchronous calls to the resolver has massively cut the time
taken pre-test, and the caching has cut the number of queries that we
actually have to make. We shouldn't be hammering the DNS servers any more.
Spent a lot of time testing this new approach and trying to track down
one last infrequently occurring memory leak.
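The caching side of this can be sketched as follows. The real implementation is C around libunbound; this is a purely illustrative Python sketch with invented names, showing why repeated queries for the same target no longer hit the DNS servers.

```python
# Illustrative sketch of the shared-cache idea (the real code is C using
# libunbound): one resolver object in the main process caches answers, so
# repeated queries for the same target don't generate new DNS traffic.
class CachingResolver:
    def __init__(self, lookup):
        """lookup(name) -> list of addresses; a stand-in here for the
        actual (asynchronous) libunbound resolution."""
        self._lookup = lookup
        self._cache = {}
        self.queries_made = 0

    def resolve(self, name):
        if name not in self._cache:
            self.queries_made += 1
            self._cache[name] = self._lookup(name)
        return self._cache[name]

fake_dns = {"amp.example": ["192.0.2.10"]}
resolver = CachingResolver(lambda name: fake_dns.get(name, []))
for _ in range(3):                   # three tests asking for one target...
    resolver.resolve("amp.example")
assert resolver.queries_made == 1    # ...but only one real query is made
```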
Finished most of the ampy reimplementation. Implemented all of the remaining collections and documented everything that I hadn't done the previous week, including the external API. Added caching for stream->view and view->groups mappings, and added extra methods for querying aspects of the amp meta-data that I had forgotten about, e.g. site information and a list of available meshes.
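Since ampy is Python, this sort of mapping cache can be expressed directly as memoization; the sketch below uses `functools.lru_cache` with a hypothetical function name and a stand-in backend lookup, which is not how the real ampy code is structured.

```python
# Sketch of caching a stream->view mapping by memoizing the lookup
# function. The function name and backend stand-in are hypothetical;
# the real ampy code organises its caching differently.
from functools import lru_cache

calls = []  # records how often the "backend" is really queried

def _query_view(stream_id):
    """Stand-in for the database lookup mapping a stream to its view."""
    calls.append(stream_id)
    return "view-%d" % (stream_id % 10)

@lru_cache(maxsize=None)
def view_for_stream(stream_id):
    return _query_view(stream_id)

for _ in range(5):
    view_for_stream(42)     # repeated lookups for the same stream...
assert calls == [42]        # ...only touch the backend once
```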
Started re-working amp-web to use the new ampy API, tidying up a lot of the python side of amp-web as I went. In particular, I've removed a lot of web API functions that we don't use anymore and also broken the matrix handling code down into more manageable functions. Next job is to actually install and test the new ampy and amp-web.
Spent a decent chunk of time chasing down a libtrace bug on Mac OS X, which was proving difficult to replicate. Unfortunately, it turned out that I had already fixed the bug in libtrace 3.0.19, but the reporter didn't realise they were using 3.0.18 instead. Also received a patch to the libtrace build system to better support compilers other than gcc (e.g. clang), which prompted me to take a closer look at some of the gcc-isms in our build process. In the process, I found that our attempt to check whether -fvisibility is available was not working at all. Once I had replaced the configure check with something that works, the whole libtrace build broke because some function symbols were no longer being exported. Managed to get it all working again late on Friday afternoon, but I'll need to make sure the new checks work properly on other systems, particularly FreeBSD 10, which only has clang by default.
Further development has been carried out on the warts-analysis-based MDA (load-balancer topology) simulator. The local data is looking reasonable now, and a start has been made on coding the global (distributed) data processing part; that code is currently being debugged. It also seems that some analysis of the many-to-few scenario would be a good idea for this work, as it would tie in with the emphasis on many vantage points. Factors covered so far include various numbers of stages at which controller information is made use of, and a window of 500 traces. The frequency of sending control data is also varied.
The Doubletree and Traceroute simulator runs have been carried out for one day of CAIDA data collection, and I am now producing graphs from the results. Factors include many versus few sources, Doubletree versus Traceroute, and varying the number of stages at which control data is sent. It also seems like a good idea to reduce the number of vantage points in several stages and repeat the simulator runs.