Fixed the URL parsing to allow partial specification of the desired
data. If the URL is incomplete then the user is returned a list of valid
values for the next portion. Default values are automatically selected
if there is only a single possible value. If the URL is missing all
parameters then the user is presented with documentation giving a basic
overview of the API.
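The behaviour can be sketched roughly as follows. The option tree here is invented for illustration; the real interface queries ampy for the valid values at each level, and an empty URL triggers the API documentation instead.

```python
# Hypothetical option tree: collection -> source -> destinations.
OPTIONS = {
    "amp-icmp": {"source1": ["dest1", "dest2"]},
    "amp-dns": {"source2": ["dest3"]},
}

def resolve(parts):
    """Follow URL components down the option tree. Auto-select a level
    with only one possible value; otherwise return the valid choices
    for the first missing component."""
    node = OPTIONS
    selected = []
    for depth in range(3):
        choices = sorted(node)
        if depth < len(parts):
            pick = parts[depth]
        elif len(choices) == 1:
            pick = choices[0]            # single possible value: default it
        else:
            return selected, choices     # incomplete: list valid next values
        selected.append(pick)
        node = node[pick] if isinstance(node, dict) else None
    return selected, None                # fully specified
```

For example, `resolve(["amp-icmp"])` auto-selects the only source and then returns the two possible destinations for the user to choose from.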
Added some smarts to deal with all the different data columns that the
amp-latency tests can return (icmp, dns, tcpping are all slightly
different). This keeps a consistent order of columns and makes sure that
the labels all line up appropriately with the data.
Updated the way data is fetched to be in a more sensible json format
that can easily be converted to CSV so that both formats can be supported.
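The idea is that results come back as a list of flat JSON objects with a known column order, so converting to CSV is trivial. A sketch, with invented field names rather than the real schema:

```python
import csv
import io
import json

def json_to_csv(json_text, columns):
    """Convert a JSON list of flat objects into CSV with a fixed
    column order."""
    rows = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

data = '[{"timestamp": 100, "rtt": 23.1}, {"timestamp": 110, "rtt": 24.6}]'
print(json_to_csv(data, ["timestamp", "rtt"]))
```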
Spent some time checking that normal behaviour was not impacted by some
small changes I had made to ampy, and tidying up a couple of places
where changes had accidentally affected graph drawing.
Fixed my remaining issues with threaded anomaly_ts. Had a few problems where a call to sscanf was interfering with some strtok_r calls I was making, but once I replaced the sscanf with some manual string parsing everything worked again.
Continued looking into my NNTSC live queue delays. Narrowed the problem down to there being a time delay between publishing a message to the live rabbit queue and the message actually appearing in the queue (thanks to the firehose feature in rabbitmq!). After doing a fair bit of reading and experimenting, I theorised that the cause was the live queue being 'durable'. Even though the published messages themselves are not marked as persistent, publishing to a durable queue seems to require touching disk which can be slow on a resource-constrained machine like prophet. Removed the durable flag from the live queue and managed to run successfully over the long weekend without ever falling behind.
Migrated all netevmon configuration to use a single YAML config file for all three components. Previously, each component supported a series of getopt command line arguments which was a bit unwieldy.
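One way such a file might be laid out is sketched below; the section and option names are invented for illustration, not the real netevmon schema.

```yaml
# Single shared config file; each component reads only its own section.
anomaly_ts:
  threads: 4
anomaly_feed:
  host: localhost
  port: 5672
eventing:
  database: events
```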
Continued to work on the raw data interface to fetch AMP data through
the website. It took some time to find the appropriate place to deviate
from the normal aggregated fetching used for the graphs, but with
minimal code changes there is now a path that follows the full data
fetching used by standalone programs (e.g. netevmon).
Fetching now works for data described by a stream id, following almost
the same path as usual for graphs. To allow some degree of data
exploration and easy generation of URLs, it's also important to deal with
data described by the human-readable stream properties. I'm currently in
the process of converting a URL with stream properties into a stream id,
and alerting the user to any missing properties that are required to
define a stream.
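A hypothetical illustration of that property-to-stream-id conversion; the property names and the lookup table are invented, not the actual ampy code.

```python
# Invented property set and stream table, purely for illustration.
REQUIRED = ("collection", "source", "destination")
STREAMS = {
    ("amp-icmp", "ampz.example.org", "www.example.com"): 42,
}

def stream_from_properties(props):
    """Return (stream id, missing properties) for a set of URL
    properties."""
    missing = [p for p in REQUIRED if p not in props]
    if missing:
        return None, missing   # tell the user what still needs specifying
    key = tuple(props[p] for p in REQUIRED)
    return STREAMS.get(key), []
```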
To get my application using an open-source collector instead of nProbe, I had the idea of writing a Python program to parse the output files written by nfcapd. nfcapd has an option to call a program when a new file becomes available, so I call nfdump to read all flows from the new file and output them as comma-separated strings. I then pipe this output to my Python program to save to a database. This solution will also let me control the size of the database by deleting flows that are older than a certain date.
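A rough sketch of the parsing step, assuming nfdump emits CSV lines. The column layout here (start time, source, destination, bytes) is a simplification; real nfdump CSV output carries many more fields.

```python
import csv

def parse_flows(lines):
    """Parse comma-separated flow records piped in from nfdump."""
    flows = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue            # skip blank lines and summary output
        flows.append({
            "start": row[0],
            "src": row[1],
            "dst": row[2],
            "bytes": int(row[3]),
        })
    return flows

# In the real pipeline this would read sys.stdin and then insert the
# resulting flow records into the database.
```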
This week Brad sliced off a portion of the pronto for me to experiment with: an OVS bridge with some ports attached. The processing resources on the pronto are still shared with the production RouteFlow and Valve instances.
I encountered a few minor issues with rule priorities and VLANs and some bugs with my addition to RouteFlow.
From preliminary testing it seems that the PACKET-IN performance of the pronto is very low. This appears to be related to exhausting the available CPU time on the switch.
I also started looking for other research into this; so far, existing work seems primarily to target the performance of modifying OpenFlow rules.
Finished off the word count program that outputs comma-separated format: word, occurrences, frequency. This should help give a better understanding of the documents being worked with. Next I think it will be useful to get statistics on how many words occur only once, how many occur more than 10 times, etc.
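The output format can be sketched as below; the tokenising regex is a placeholder for however the real program splits words.

```python
import collections
import re

def word_counts(text):
    """Return (word, occurrences, frequency) rows, most common first."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counter = collections.Counter(words)
    return [(w, n, n / total) for w, n in counter.most_common()]

for word, n, freq in word_counts("the cat sat on the mat"):
    print("%s,%d,%.3f" % (word, n, freq))
```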
On Friday 22/5 I had a meeting with Bob Durrant from the Statistics department to get his opinion on possible approaches. I now have a better idea of what I'm going to do next: ignore Topic Modeling for now and look at Clustering, starting with only the bearwall firewall logs, then look into comparing multiple types of log files later.
I will start with a simple version of K-Means and add functionality as I go along, e.g. seeing if there is more meaning in an IP address by separating the network and host portions, among other possibilities. I will also need to look into the languages and libraries I have found for this type of work to decide what I will be using.
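As a toy example of the network/host idea, assuming a fixed /24 split (the right prefix length is exactly the kind of thing to experiment with):

```python
def ip_features(addr, prefix=24):
    """Split an IPv4 address into (network, host) integer features, so
    addresses on the same subnet share a network feature value."""
    octets = [int(o) for o in addr.split(".")]
    value = sum(o << (8 * (3 - i)) for i, o in enumerate(octets))
    network = value >> (32 - prefix)
    host = value & ((1 << (32 - prefix)) - 1)
    return (network, host)
```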
I have also been working on the presentation that I will give on Wednesday the 27th.
Carried out validation on the trace-based simulators and the IS0 simulator. The completed parts have been written into my thesis.
The validation programs for IS0 needed to be updated because of the extra terms I have included in the analysis: source windows and a maximum number of destinations per AS. Source windows are groupings of source addresses that are analysed together before moving on to the next group. Limiting the number of destinations per AS keeps the analysis a manageable size and maximises the spread of analysis across the Internet.
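A small sketch of the two terms, with invented window and cap sizes:

```python
def source_windows(sources, window_size):
    """Group source addresses into fixed-size windows, analysed in turn."""
    return [sources[i:i + window_size]
            for i in range(0, len(sources), window_size)]

def cap_destinations(dests_by_as, max_per_as):
    """Keep at most max_per_as destinations for each AS."""
    return {asn: dests[:max_per_as] for asn, dests in dests_by_as.items()}
```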
The current state of 802.15.4 in the Linux kernel has support for basic functionality only: send/receive (no scan or associate), and only SoftMAC drivers are available at this stage.
Moving away from the low-level implementation (drivers) and more toward higher-level interaction (but not quite the application layer).
Will be setting up the IPv6 gateway using radvd for addressing clients (assuming the nodes and gateway are already set up on the same channel/PAN id, as association is not functional yet).
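A minimal radvd.conf along these lines might look like the following; the interface name and prefix are placeholders.

```
# Advertise a single /64 prefix for stateless autoconfiguration.
interface lowpan0
{
    AdvSendAdvert on;
    prefix 2001:db8:1::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};
```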
Another possible solution for automatic addressing would be DHCPv6 (the IPv6 version of DHCP), but this would require a DHCP server on the gateway and knowledge of address blocks/ranges.
Started testing my new parallel anomaly_ts code. The main hiccup was that embedded R is not thread-safe, so I've had to wrap any calls out to R with a mutex. This creates a bit of a bottleneck in the parallel system so we may need to revisit writing our own implementation of the complex math that I've been fobbing off to R. After fixing that, latency time series seem to work fairly well in parallel but AS traceroute series definitely do not so I'll be looking into that some more next week.
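The workaround amounts to serialising every call into R behind a single mutex, which can be sketched (in Python rather than the actual anomaly_ts code) as:

```python
import threading

r_lock = threading.Lock()

def call_r(r_function, *args):
    """Serialise all calls into embedded R, which is not thread-safe."""
    with r_lock:                 # only one thread may be inside R at a time
        return r_function(*args)
```

Every thread funnels its R work through `call_r`, which is exactly why it becomes a bottleneck as the thread count grows.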
This week saw a good step in progress. I can connect a host, it achieves a DHCP lease, and it can talk to a router. Beyond this it should work; I just need to get a VM running that has access to the interwebs, or run a webserver.
I will probably look at expanding my test environment to multiple hosts connecting to the HOP on different VLANs, so I can start emulating a network as similar as possible to a real ISP network in terms of customer connectivity to the internet.
I have a few assignments due over the next couple of weeks, as well as the interim report, so I'll probably focus on those until the end of the semester, or at least until study week when they're all due.