Started on re-implementing ampy afresh. The ampy code-base had grown rather organically since we started on the project and the structure was quite messy and difficult to work with.
The main changes so far are as follows:
* Better use of OO to minimise code duplication, especially in the collection handling code
* Top-level API is all located in one module rather than being spread across several modules
* Added a StreamManager class that handles the dictionary hierarchy for storing stream properties. Collections can now simply express their hierarchy as an ordered list, e.g. ['source', 'dest', 'packetsize', 'family']. Inserting and searching are handled by the StreamManager -- no need to write code for each collection to manage dictionaries.
* Simplified view management code that does NOT call back into the collection modules.
* Fresh implementation of the block management and caching code, which will hopefully be easier to debug.
* Removed a whole lot of redundant or unused code.
So far, I'm about half-way through the re-implementation. Most of the API is there but I've only implemented one collection thus far. Since the goal is to make it easier to add new collections to ampy, hopefully adding the rest shouldn't take too long :)
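The StreamManager idea might look something like this minimal sketch (the class and method names here are illustrative, not ampy's actual API): collections declare their property hierarchy as an ordered list, and the manager walks that list to build and search the nested dictionaries.

```python
class StreamManager(object):
    """Stores stream ids in a dictionary hierarchy, where the hierarchy
    is described by an ordered list of stream property names."""

    def __init__(self, hierarchy):
        self.hierarchy = hierarchy
        self.streams = {}

    def insert(self, properties, stream_id):
        # Walk down the hierarchy, creating dictionaries as needed.
        level = self.streams
        for key in self.hierarchy[:-1]:
            level = level.setdefault(properties[key], {})
        level[properties[self.hierarchy[-1]]] = stream_id

    def find(self, properties):
        # Follow the same path through the hierarchy to look up a stream.
        level = self.streams
        for key in self.hierarchy:
            if properties[key] not in level:
                return None
            level = level[properties[key]]
        return level
```

A collection would then only need something like `StreamManager(['source', 'dest', 'packetsize', 'family'])` rather than its own dictionary-juggling code.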
Have been busy with assignments the last couple of weeks, although one was a literature review on wireless sensor networks that I did together with Richard Sanger, which conveniently relates to my project. It gave me some really good further background knowledge (and a solid page of references) with regards to the technologies I'm working with, so I'm hoping that will be useful when it comes to writing later on.
This week I was able to sit down for a few hours and play with Contiki a bit more. I spent much longer than I should have getting a toolchain set up for cross-compiling Contiki from my own Debian system, so that I no longer have to rely on the Instant Contiki virtual machine. It turns out that setting up the toolchain (at least for the platform I am working with) is actually really straightforward - there's just no documentation on it. I also worked out some interesting differences between the most recent stable release of Contiki and the master branch currently in Git - there has been quite a lot of useful development since that release, but the mbxxx platform hasn't quite been brought up to date with the core changes and so isn't as usable. I've decided to base my application implementations (example CoAP/HTTP servers etc.) on the 2.7 release, but I backported the improvements made to the stm32 tools so that it's possible to flash the devices from Debian without HAL.
So I've got to the stage where I've flashed a device with a CoAP server but I don't have a way of easily testing it since I don't have a gateway device. I'm thinking of putting a client on the second device with a shell that I can control it through, and I'll have to look into how the devices actually pair with each other etc. I ran into issues with overflowing RAM and ROM with newer versions of the apps from the master branch, but r2.7 versions seem fine. Once I've determined whether memory is going to be sufficient on this platform we might need to acquire a couple more devices to test RPL (or the simulator might also do the trick, but that's boring).
I took the first step towards actually writing this week: figured out my chapter layout and made a bunch of notes about what to include.
I also completed my first test. It took pretty much all week to run, so I may have to do slightly fewer tests somehow: either run fewer repetitions or test fewer different values. Probably both.
I figured out a solution to my multiple-paths algorithm, basically by prioritising the paths from the current node and having everything else fall into line with that. It makes the algorithm fairly slow, and in many situations it ends up prioritising longer paths over shorter ones, but there is a limit to how bad the paths can be and it works. It may still be non-polynomial, but at least it is precalculable. Networks can't have all that many nodes, right?
Further work has been carried out on the black hole detection system based on a fast mapping approach. An initial data set has been collected and construction of an analysis routine has begun to investigate the series of MDA and Paris traceroute runs. Much of the code from the earlier routine can be reused; however, the new data sets have all the traces mixed in together, so the ones for analysis must be identified and grouped according to destination address. This is so that destinations where a black hole is found can be reported.
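The grouping step is straightforward; a minimal sketch (assuming each trace is represented as a dict with a `dst` key, which is my guess at the structure rather than the actual record format):

```python
from collections import defaultdict

def group_by_destination(traces):
    """Group a mixed list of trace results by destination address so
    that each destination can be analysed, and any black hole found
    there reported, separately."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace['dst']].append(trace)
    return groups
```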
Another angle relating to this same work is the development of the drivers. It turns out that the program loop waits at some points if no new results need processing. This means that scheduled regular tasks will not be triggered if they rely on the loop circulating. In particular changes to the targets list will not be processed and new targets will not be analysed. This will require investigation into how to avoid the waiting at certain steps. Once this is achieved some sleeps will also need to be added to avoid too much CPU usage.
The Internet simulator appears to have carried out a successful simulation when the data set was reduced to a third. This success was achieved after changing an existing assertion about some data variables, namely assert(firstHintTime <= simTime); it seems that under certain circumstances the assertion was being triggered by an allowable condition. There is a method, initialiseHints(void), which can occasionally reset firstHintTime and possibly make it greater than simTime.
I have also started on an algorithm to process warts data and approximate a simulation without the great cost of processing packet by packet. This approach is still able to provide information about packet costs as warts records most commonly needed packet details.
Updated some configuration in amp-web to allow fully specifying how to
connect to the amp/views/event databases.
Set up some throughput tests to collect data for Shane to test inserting
the data. While doing so I found and fixed some small issues with
schedule parsing (test parameters that included the schedule delimiter
were being truncated) and test establishment (EADDRINUSE wasn't being
picked up in some situations).
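The schedule truncation bug is the classic splitting pitfall: if a line is split on every occurrence of the delimiter, any delimiter characters inside the final test-parameter field split it too. A sketch of the fix, using a bounded split (the field count and delimiter here are illustrative, not amplet's actual schedule format):

```python
def parse_schedule_line(line, delimiter=',', nfields=5):
    """Split a schedule line into a fixed number of fields, leaving any
    delimiter characters inside the final (test parameter) field intact."""
    # maxsplit ensures the last field absorbs any remaining delimiters
    fields = line.split(delimiter, nfields - 1)
    if len(fields) < nfields:
        raise ValueError("schedule line has too few fields: %s" % line)
    return fields
```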
Started adding configuration support for running multiple amplet clients
on a single machine. Some schedule configuration can be shared globally
between all clients, but they also need to be able to specify schedules
that belong only to a single client. Nametables, keys, etc also need to
be set up so that each client knows where they are.
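The shared-versus-private schedule split could work something like this sketch (representing schedules as dicts keyed by test name is my assumption, not the actual amplet configuration format):

```python
def build_schedule(global_schedule, client_schedule):
    """Combine the shared global schedule with one client's own
    schedule.  Client-specific entries override the global ones if
    both define the same test."""
    merged = dict(global_schedule)
    merged.update(client_schedule)
    return merged
```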
Started writing code to configure rabbitmq on a client and isolate our
data from anything else that might already be on that broker (e.g.
another amplet client). Each amplet client should now operate within a
private vhost and no longer require permissions on the default one.
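The per-client isolation boils down to a few rabbitmqctl invocations. A sketch of building them (the vhost naming scheme is illustrative; the `add_vhost`, `add_user` and `set_permissions` subcommands are standard rabbitmqctl):

```python
def vhost_setup_commands(client, user, password):
    """Build the rabbitmqctl commands needed to give an amplet client
    its own private vhost, with permissions confined to that vhost."""
    vhost = "/amplet-%s" % client
    return [
        ["rabbitmqctl", "add_vhost", vhost],
        ["rabbitmqctl", "add_user", user, password],
        # configure/write/read on everything, but only within the vhost
        ["rabbitmqctl", "set_permissions", "-p", vhost, user,
         ".*", ".*", ".*"],
    ]
```

Each list is suitable for passing to `subprocess.check_call()`.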
Fixed problems we were having with netevmon causing NNTSC to fill up its queues and therefore use huge amounts of memory. There were two components to this fix: the most effective change was to modify netevmon to only ask for one stream at a time (previously we asked for them all at once because this was the most efficient way to query the old database schema). The other change was to compress the pickled query result before exporting it which reduced the queue footprint and also meant we could send the data faster, meaning that the queue would drain quicker.
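The compression side of that fix is simple to sketch (function names here are illustrative, not NNTSC's actual export API):

```python
import pickle
import zlib

def export_result(result):
    # Compress the pickled query result before it goes on the export
    # queue: the queued data is smaller and transfers faster, so the
    # queue drains more quickly.
    return zlib.compress(pickle.dumps(result))

def import_result(blob):
    # Reverse the process on the receiving side.
    return pickle.loads(zlib.decompress(blob))
```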
Fixed a bug in ampy that was preventing events from showing up on the graphs or the dashboard. We now have a fully functioning netevmon running on prophet again.
Spent a couple of days going over the AMP event ground truth I generated a few weeks back after Meena reported that there were a number of events being reported now that didn't have ground truth. This was due to the changes and improvements I had made to netevmon while working on the ground truth -- as a result, some events disappeared but there were also a few new ones that took their place. Noticed a few bugs in Meena's new eventing script while I was doing this where it was reporting incorrect stream properties, so I tracked those down for her while I was at it.
Wrote a NNTSC dataparser for the new AMP throughput test. Found a few bugs in the test itself for Brendon to solve, but both the test and the dataparser seem to be working in the most basic cases.
Had a play with Nevil's python-libtrace code and reported a few bugs and missing features back to him. Looking forward to those being fixed as it is pretty nifty otherwise.
I have tests up and running.
There was a bit of an issue with the fact that it relies on using os.system to call tc. The test activates loss after a random time period and then times how long it takes for the program to notice the packet loss. If the random time period was too short, the whole program would lock up, so I solved that by adding a minimum time period to the sleep. This seems like a bad way of fixing the problem, but it worked and I have no idea how else I would get around it.
Anyway the test is running and I'm guessing it will be running all week.
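A sketch of that loss-activation step with the minimum-sleep workaround (the interface name, delay range and 100% loss figure are assumptions; the `tc qdisc ... netem loss` syntax is standard, and subprocess stands in for the os.system call):

```python
import random
import subprocess
import time

# Very short random delays were locking the whole program up, so clamp
# them to a minimum before sleeping.
MIN_DELAY = 1.0

def clamp_delay(delay, minimum=MIN_DELAY):
    return max(minimum, delay)

def activate_loss(dev="eth0"):
    # Introduce 100% loss on the interface via netem (requires root).
    subprocess.check_call(["tc", "qdisc", "add", "dev", dev,
                           "root", "netem", "loss", "100%"])

def run_trial(dev="eth0"):
    time.sleep(clamp_delay(random.uniform(0, 5)))
    start = time.time()
    activate_loss(dev)
    # ... now wait for the program under test to report the loss and
    # record how long that took ...
    return start
```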
There is a new ovs version and a new picos version, so I am checking all of our past issues against those again just in case someone fixed them. Fingers crossed...
Also played around with our SDN with Brad. We fixed the issue with it constantly disconnecting from the controller: it seems our drop rule was taking precedence over the hidden flows Open vSwitch uses to allow in-band control, so all our control traffic was being dropped. We had an awful workaround, but then the new picos version fixed it.
I have been working on the fastmapping-like approach. Two drivers are required for this. One detects Paris traceroute runs that are shorter than the original MDA (load balancer detecting) run, and the other uses the data from this to initiate a further series of Paris traceroute runs using the flow ID from the short Paris run. Paris traceroute consistently uses the same flow ID within a trace. The first driver has worked correctly and the second is under test.
For the Internet simulator, a shortage of virtual memory was encountered when processing a complete team traceroute data set from CAIDA. Tony has said it may be possible to reduce the memory requirements by adjusting the simulator program. The focus of this work is to try to quantify the cost of control packets when using Doubletree, and to compare this with traceroute.
It was also decided recently to try to simulate Megatree using the Internet simulator. However, the amount of data required is even larger than in the already too-large case above. Alternatively, it may be possible to simulate Megatree using a variation of my warts analysis programs, which operates at the discovered-topology level rather than the packet level. The warts data could be processed in the order it was collected, and the Megatree savings calculated from the available data. An approximation of packet usage per load balancer will be required, as only the grand total of packets used for bringing forward is recorded. (Bringing forward is the way that MDA gains access to nested load balancers and finds flow IDs that give access to the successor set of nodes for probing; the actual probing packets are recorded.) This approach should require much less computing power while still giving detailed information on the performance of Megatree with various factors and levels applied, including sequential versus parallel probing of destinations (and various degrees of this), with and without distributed Megatree, and with and without Megatree itself.
Spent some time tidying up the code to adjust nameservers for AMP at
runtime, and adding in configuration options to allow them to be set.
While doing this realised that name resolution wasn't necessarily going
to respect the interface/address bindings set up for the tests, so
looked into ways I could make this happen. The best/easiest way so far
seems to be to create my own sockets for the resolver to use and then
bind them how I like. This appears to work with my testing so far, but
is possibly getting a bit too specific to the internals of the libc
library I'm using.
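The idea is easier to show than the libc internals. Here is a sketch of the same approach in Python: build a minimal DNS query by hand, then send it over a socket we created and bound ourselves, so the query leaves from the address we chose rather than whatever the system resolver picks (the server address and port are whatever the test is configured with; this is an illustration, not the amplet code):

```python
import random
import socket
import struct

def build_query(name, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record)."""
    # Header: id, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", random.randint(0, 0xffff),
                         0x0100, 1, 0, 0, 0)
    # Name encoded as length-prefixed labels, terminated by a zero byte.
    qname = b"".join(struct.pack("B", len(label)) + label.encode()
                     for label in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # qtype, IN class

def resolve_from(source, name, server):
    # Create our own socket and bind it to the chosen source address so
    # resolution respects the same bindings as the tests.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((source, 0))
    sock.settimeout(5)
    sock.sendto(build_query(name), (server, 53))
    return sock.recvfrom(512)[0]
```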
Also wrote some unit tests around the ICMP test response packet
processing to help make sure that malformed or incorrect packets are
correctly dealt with.
Updated the AMP dataparser in NNTSC to process more messages in a single batch before committing. This should improve speed when working through a large message backlog, as well as save on some I/O time during normal operation. This change required some modification to the way we handle disconnects and other errors, as we now have to re-insert all the previously uncommitted messages so we can't just disconnect and retry the current message.
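The batching-with-reinsertion logic might be sketched like this (the database interface here is hypothetical, not NNTSC's actual code):

```python
class BatchInserter(object):
    """Commit messages in batches.  On an error, keep the whole
    uncommitted batch to retry after reconnecting -- we can't just
    retry the current message, because the earlier inserts in the
    batch were rolled back along with it."""

    def __init__(self, db, batchsize=100):
        self.db = db
        self.batchsize = batchsize
        self.pending = []

    def process(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batchsize:
            self.flush()

    def flush(self):
        try:
            for msg in self.pending:
                self.db.insert(msg)
            self.db.commit()
            self.pending = []
        except Exception:
            # Batch stays in self.pending for the retry.
            self.db.reconnect()
```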
Tried to bring our database cursor management in line with suggested best practice, i.e. closing cursors whenever we're done with them.
Improved exporting performance by limiting frequency calculations to the first 200 rows and using a RealDictCursor rather than a DictCursor to fetch query results. The RealDictCursor means we don't need to convert results into dictionaries ourselves -- they are already in the right format so we can avoid touching most rows by simply chucking them straight into our result.
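The capped frequency calculation could look something like this sketch (taking the most common gap between consecutive timestamps as the frequency is one plausible reading of the calculation; the row structure is assumed):

```python
from collections import Counter

def estimate_frequency(rows, maxrows=200):
    """Estimate a stream's measurement frequency as the most common gap
    between consecutive timestamps, considering at most the first
    maxrows rows rather than the entire result set."""
    times = [r["timestamp"] for r in rows[:maxrows]]
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]
```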
Spent some time helping Meena write a script to batch-process her event data. This should allow us to easily repeat her event grouping and significance calculations using various parameters without requiring manual intervention. Found a few bugs along the way which have now been fixed.
Was planning to work the short week between Easter and Anzac day but fell ill with a cold instead.