Continued the ampy reimplementation. Finished writing the code for the core API (aside from any little functions that we use in amp-web that I've forgotten about) and have implemented and tested the modules for the 3 AMP collections and Smokeping. Have also been adding detailed documentation to the new libampy classes, which has taken a fair chunk of my time.
Read over a couple of draft chapters from Meena's thesis and spent a bit of time working with her on improving the order and clarity of her writing.
Fixed a libtrace bug that Nevil reported where setting the snap length was not having any effect when using the ring: format.
I am a little suspicious that the results showing load balancing behaviour for fields that do not normally cause load balancing may be caused by some kind of noise. There are cases, however, where a particular node is tested only a small number of times and a load balancer is still found. I then looked at some cases of load balancers found and noticed that they tend to involve only two routers which are joined together by multiple links. Note that a node has been excluded from the summary if it has ever been seen to show per-packet behaviour. It is not clear at this stage why some nodes show this limited type of load balancing while at other times they show only a single link and no load balancing.
After the successful run of the Internet simulator on one day's team data, a new statistic was added to the program to show the number of packets used excluding controller packets, alongside the existing statistic that includes both controller and probe packets. A run was carried out using the modified code and sensible results were observed. Using the existing memory map, the simulation took only one day. This may mean that, for this amount of data, gang probing is again a possibility, though not essential.
The second-phase driver for the scamper black hole detector was tested and debugged. The latest testing involved manually adding a line to the targets file while the driver was running and waiting for it to be picked up. Once that test was successful, a full black hole detection run was initiated on Yoyo, and it appears to be behaving correctly.
The Megatree MDA simulator kernel is being tested to see whether the program can correctly alternate between data from different vantage points for a given run. This work is in the debugging and rerunning stage and the simulator is gradually starting to behave correctly. It is based on the scamper warts analysis program that I use for analysis, and directly accesses topology data rather than running packet by packet. This makes processing large data files feasible while still allowing access to packet-level information.
Working through the pausing code to ensure all packets are cleaned up correctly and that packets are not lost when, for example, reading from a file. I've implemented pausing for the case where a hasher thread is in use, which ensures all queued packets are emptied to the per-packet threads for processing. This is the favoured behaviour when reading from a file because it means that packets are not lost. Added 'state' to the trace object to help ensure that pause cannot be called twice, that start cannot be called while a pause is still in progress, and so on.
I still have a couple of things to do to complete the pause functionality nicely. The main thing left to finish off is making copies of packets in the reducer/reporter queues if the buffers belong to the trace format, since pausing is often implemented by the underlying format as closing the trace, which invalidates these packets.
I might also look into the data structures used here: FIFO operation is pretty much the only way these queues will be used, so optimising for that would be worthwhile.
This pause code will also be called first before stopping a trace, say on a SIGINT.
Found and fixed a couple of other bugs as I've been working on this.
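As a rough illustration of the state tracking idea (the real code is C inside libtrace; the state names and structure here are my own assumptions, not the actual implementation):

```python
from enum import Enum

class TraceState(Enum):
    NEW = 0
    RUNNING = 1
    PAUSING = 2
    PAUSED = 3

class Trace:
    """Minimal sketch of guarding start/pause transitions on a trace."""

    def __init__(self):
        self.state = TraceState.NEW

    def start(self):
        # Starting is invalid while a pause is still in progress, and
        # a second start on a running trace is rejected too.
        if self.state == TraceState.PAUSING:
            raise RuntimeError("cannot start while a pause is in progress")
        if self.state == TraceState.RUNNING:
            raise RuntimeError("trace is already running")
        self.state = TraceState.RUNNING

    def pause(self):
        # Pause is only valid on a running trace, so it cannot be
        # called twice in a row.
        if self.state != TraceState.RUNNING:
            raise RuntimeError("pause is only valid on a running trace")
        self.state = TraceState.PAUSING
        # ... drain any queued packets to the per-packet threads here ...
        self.state = TraceState.PAUSED
```

The PAUSING state is what lets the object reject a start or a second pause while the queues are still being drained.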
Fixed up the local rabbitmq configuration to properly generate vhosts,
users and shovels that will actually work and allow data to be passed
around. The configuration file needed some tweaking to make this work,
and may need to be reworked in the near future to make it more clear how
the collector needs to be configured.
Created a Debian init script to deal with starting multiple instances of
amplet clients on a single machine, each using different configuration
files. They can be started/stopped individually or as a whole. Updated some
of the other Debian packaging scripts to deal with the new config
directory layout. Started testing them to make sure that everything
required is present and still ends up in the right places.
Added options to create a pidfile in the amplet2 client so that it works
better with the new init scripts, which use the pidfiles to target
individual instances.
Spent some time looking into test results and performance after seeing
some results from a student testing on a loaded system. Latency
measurements degrade as load increases, but those made by ping remain
quite stable. Briefly tried testing the difference between using
gettimeofday() and SO_TIMESTAMP but found no obvious differences under load.
Started on re-implementing ampy afresh. The ampy code-base had grown rather organically since we started on the project and the structure was quite messy and difficult to work with.
The main changes so far are as follows:
* Better use of OO to minimise code duplication, especially in the collection handling code
* Top-level API is all located in one module rather than being spread across several modules
* Added a StreamManager class that handles the dictionary hierarchy for storing stream properties. Collections can now simply express their hierarchy as an ordered list, e.g. ['source', 'dest', 'packetsize', 'family']. Inserting and searching are handled by the StreamManager -- no need to write code for each collection to manage dictionaries.
* Simplified view management code that does NOT call back into the collection modules.
* Fresh implementation of the block management and caching code, which will hopefully be easier to debug.
* Removed a whole lot of redundant or unused code.
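The StreamManager idea above could be sketched roughly as follows (the method names and exact interface here are my own illustration, not necessarily what libampy uses):

```python
class StreamManager:
    """Stores stream properties in a nested dictionary keyed by an
    ordered list of property names, e.g. ['source', 'dest', 'family']."""

    def __init__(self, hierarchy):
        self.hierarchy = hierarchy
        self.streams = {}

    def insert(self, properties, stream_id):
        # Walk (creating as needed) one dictionary level per property
        # in the hierarchy, then record the stream id at the leaf.
        level = self.streams
        for prop in self.hierarchy[:-1]:
            level = level.setdefault(properties[prop], {})
        level.setdefault(properties[self.hierarchy[-1]], []).append(stream_id)

    def find(self, properties):
        # Look up the stream ids matching a full set of properties.
        level = self.streams
        for prop in self.hierarchy:
            level = level.get(properties[prop])
            if level is None:
                return []
        return level
```

A collection then only has to declare its hierarchy as an ordered list; insertion and lookup are handled by the shared class.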
So far, I'm about half-way through the re-implementation. Most of the API is there but I've only implemented one collection thus far. Since the goal is to make it easier to add new collections to ampy, hopefully adding the rest shouldn't take too long :)
Have been busy with assignments the last couple of weeks, although one was a literature review on wireless sensor networks that I did together with Richard Sanger, which conveniently relates to my project. It gave me some really good further background knowledge (and a solid page of references) with regards to the technologies I'm working with, so I'm hoping that will be useful when it comes to writing later on.
This week I was able to sit down for a few hours and play with Contiki a bit more. I spent much longer than I should have trying to get a toolchain set up for cross compiling Contiki from my own Debian system, so that I don't have to rely on the Instant Contiki virtual machine any more. It turns out that the process of setting up the toolchain (at least for the platform I am working with) is actually really straightforward - there's just no documentation on it. I also worked out some interesting differences between the most recent stable release of Contiki and the master branch currently in Git - it seems there has been quite a lot of development since that release which brings about useful changes, but the mbxxx platform hasn't quite been brought up to date with the core changes and so it's not so usable. I've decided to go off the 2.7 release for application implementations (example CoAP/HTTP servers etc) but backported the improvements made to the stm32 tools so that it's possible to flash the devices from Debian without HAL.
So I've got to the stage where I've flashed a device with a CoAP server but I don't have a way of easily testing it since I don't have a gateway device. I'm thinking of putting a client on the second device with a shell that I can control it through, and I'll have to look into how the devices actually pair with each other etc. I ran into issues with overflowing RAM and ROM with newer versions of the apps from the master branch, but r2.7 versions seem fine. Once I've determined whether memory is going to be sufficient on this platform we might need to acquire a couple more devices to test RPL (or the simulator might also do the trick, but that's boring).
I took the first step towards actually writing this week. Figured out my Chapter layout and made a bunch of notes about what to include.
I also completed my first test. It took pretty much all week to run, so I may have to cut down the testing somehow: either fewer repetitions or fewer different values. Probably both.
I figured out a solution to my multiple-paths algorithm, basically by prioritising the paths from the current node and having everything else fall into line with that. It makes the algorithm fairly slow, and it means that in a lot of situations longer paths end up prioritised over shorter ones, but there is a limit to how bad the paths can be and it works. It may still be non-polynomial, but at least it can be precalculated. Networks can't have all that many nodes, right?
Further work has been carried out on the black hole detection system based on a fast mapping approach. An initial data set has been collected and construction of an analysis routine has begun to investigate the series of MDA and Paris traceroute runs. Much of the code from the earlier routine can be reused; however, the new data sets have all the traces mixed in together, so the ones for analysis must be identified and grouped according to destination address. This is so that destinations where a black hole is found can be reported.
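The grouping step amounts to bucketing the mixed traces by destination; a minimal sketch (the record format here is assumed, not the actual warts structure):

```python
from collections import defaultdict

def group_by_destination(traces):
    """Group mixed MDA/Paris traceroute records by destination address
    so each destination can be analysed (and reported) as a unit."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace['dst']].append(trace)
    return groups
```

Each group then contains every run towards one destination, which is what the black hole analysis needs to compare.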
Another angle on this same work is the development of the drivers. It turns out that the program loop waits at certain points if no new results need processing. This means that scheduled regular tasks will not be triggered if they rely on the loop circulating: in particular, changes to the targets list will not be processed and new targets will not be analysed. This will require investigating how to avoid the waiting at those steps. Once that is achieved, some sleeps will also need to be added to avoid excessive CPU usage.
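One common way to fix this kind of stall is to wait with a timeout rather than blocking indefinitely, so periodic work (such as re-reading the targets file) still runs and the bounded wait also keeps CPU usage down. A Python sketch under those assumptions (the function names are hypothetical):

```python
import select
import time

def driver_loop(result_fd, handle_result, periodic_tasks,
                interval=1.0, running=lambda: True):
    """Wait for new results with a timeout instead of blocking forever,
    so scheduled tasks still fire even when no results arrive."""
    next_periodic = time.monotonic()
    while running():
        # The bounded select() doubles as the sleep that avoids
        # spinning at 100% CPU.
        readable, _, _ = select.select([result_fd], [], [], interval)
        if readable:
            handle_result(result_fd)
        if time.monotonic() >= next_periodic:
            for task in periodic_tasks:  # e.g. reload the targets file
                task()
            next_periodic = time.monotonic() + interval
```

This is only a sketch of the general pattern; the real driver would need to fit whatever event handling scamper's result processing already does.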
The Internet simulator appears to have carried out a successful simulation when the data set was reduced to a third. This success was achieved after changing an existing assertion about some data variables; it seems that under certain circumstances the assertion was rejecting an allowable condition. The assertion in question is assert(firstHintTime <= simTime); there is a method, initialiseHints(void), which can occasionally reset firstHintTime and possibly make it greater than simTime.
I have also started on an algorithm to process warts data and approximate a simulation without the great cost of processing packet by packet. This approach is still able to provide information about packet costs as warts records most commonly needed packet details.
Updated some configuration in amp-web to allow fully specifying how to
connect to the amp/views/event databases.
Set up some throughput tests to collect data for Shane to test inserting
the data. While doing so I found and fixed some small issues with
schedule parsing (test parameters that included the schedule delimiter
were being truncated) and test establishment (EADDRINUSE wasn't being
picked up in some situations).
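The schedule parsing fix is essentially the split-with-limit problem: only split on the first few delimiters so the trailing parameter field survives intact. A hedged sketch (the field layout here is invented for illustration, not the real amplet schedule format):

```python
def parse_schedule_line(line, delimiter=',', nfields=4):
    """Split a schedule line into a fixed number of fields, leaving the
    final (test parameter) field intact even if it contains the
    delimiter character itself."""
    # maxsplit = nfields - 1 stops splitting once the fixed-position
    # fields have been consumed.
    fields = line.split(delimiter, nfields - 1)
    if len(fields) < nfields:
        raise ValueError("short schedule line: %r" % line)
    return fields
```

Splitting on every delimiter instead would truncate any parameter string containing one, which is the bug described above.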
Started adding configuration support for running multiple amplet clients
on a single machine. Some schedule configuration can be shared globally
between all clients, but they also need to be able to specify schedules
that belong only to a single client. Nametables, keys, etc also need to
be set up so that each client knows where they are.
Started writing code to configure rabbitmq on a client and isolate our
data from anything else that might already be on that broker (e.g.
another amplet client). Each amplet client should now operate within a
private vhost and no longer require permissions on the default one.
Fixed problems we were having with netevmon causing NNTSC to fill up its queues and therefore use huge amounts of memory. There were two components to this fix. The most effective change was to modify netevmon to only ask for one stream at a time (previously we asked for them all at once, because that was the most efficient way to query the old database schema). The other change was to compress the pickled query result before exporting it, which reduced the queue footprint and also meant we could send the data faster, so the queue drained more quickly.
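The compression change amounts to something like the following (zlib is an assumption here; the real code may use a different codec, and the function names are mine):

```python
import pickle
import zlib

def export_result(result):
    """Pickle and compress a query result before queueing it, shrinking
    the queue footprint and reducing the amount of data to send."""
    return zlib.compress(pickle.dumps(result))

def import_result(blob):
    """Reverse the export step on the receiving side."""
    return pickle.loads(zlib.decompress(blob))
```

Pickled time series data tends to be highly repetitive, so the compression ratio is usually well worth the extra CPU spent on each message.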
Fixed a bug in ampy that was preventing events from showing up on the graphs or the dashboard. We now have a fully functioning netevmon running on prophet again.
Spent a couple of days going over the AMP event ground truth I generated a few weeks back after Meena reported that there were a number of events being reported now that didn't have ground truth. This was due to the changes and improvements I had made to netevmon while working on the ground truth -- as a result, some events disappeared but there were also a few new ones that took their place. Noticed a few bugs in Meena's new eventing script while I was doing this where it was reporting incorrect stream properties, so I tracked those down for her while I was at it.
Wrote a NNTSC dataparser for the new AMP throughput test. Found a few bugs in the test itself for Brendon to solve, but both the test and the dataparser seem to be working in the most basic cases.
Had a play with Nevil's python-libtrace code and reported a few bugs and missing features back to him. Looking forward to those being fixed as it is pretty nifty otherwise.