Read a paper about T-Entropy and started implementing a detector that uses sliding windows: it calculates some statistics for each window, assigns an appropriate character ("class") to the window, and concatenates those characters into a string, which is then used to obtain the average T-Entropy for a sliding window. However, NNTSC/Netevmon was down until Wednesday, so I did not get much of a chance to test it.
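The window-to-character step could look something like the following sketch. Everything here is illustrative: the function name, the window parameters, and the per-window statistic (a plain mean) are my assumptions, not the detector's actual statistics.

```python
def classify_windows(series, window=5, step=5, classes="abcde"):
    """Slide a window over a numeric series, compute a statistic for each
    window, and map that statistic onto a character class. The resulting
    string is what a T-Entropy calculation would then be run over.
    The mean is a placeholder statistic for illustration only."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    out = []
    for i in range(0, len(series) - window + 1, step):
        w = series[i:i + window]
        stat = sum(w) / len(w)  # placeholder statistic: window mean
        # bucket the statistic into one of len(classes) equal-width bins
        bucket = min(int((stat - lo) / span * len(classes)), len(classes) - 1)
        out.append(classes[bucket])
    return "".join(out)
```

For example, `classify_windows(list(range(10)))` produces one character per non-overlapping five-sample window.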
The rest of the week was spent taking care of GA duties, marking a ridiculous amount of assignments, updating Moodle grades, yadda yadda. Didn't manage to get any work done on the project, unfortunately.
Next week, I plan on working on the T-Entropy detector some more, especially adding new statistics and trying to figure out a combination of stats that "work".
I've been writing the proposal and thinking about how things will scale.
To poll both ends of a path I can add an extra flow on each switch that matches on the ingress switch, so that when a packet leaves the system it is counted with the other packets that entered the fabric at that switch. This will require extra tables and stacked MPLS labels (three layers of MPLS, though it could probably be made to work with two).
This way I can poll both ends of a path, but in between I am aggregating paths, since the alternative requires a number of flows on each switch, to distinguish paths, that is quadratic in the number of switches in the fabric. This aggregation will complicate accurately locating problems just by polling counters.
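To make the scaling concern concrete, this is the arithmetic behind the quadratic claim. Both function names and the exact counting model (one counter per ordered ingress/egress pair) are my illustration, not the actual flow layout:

```python
def per_path_flows(num_switches):
    # distinguishing every end-to-end path needs roughly one counter per
    # ordered (ingress, egress) pair of switches
    return num_switches * (num_switches - 1)

def per_ingress_flows(num_switches):
    # the aggregated scheme described above: one counter per ingress switch
    return num_switches
```

At ten switches the difference is already 90 flows versus 10, and the gap widens quadratically as the fabric grows.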
The internet simulator has been run using a complete data set from CAIDA. It was necessary to increase the allowed heap size, and the list of traces to be carried out was reduced to 100,000.
The next round of scamper data collection on Caida and Planetlab has been initiated.
The modification to the analysis of scamper data has been further updated to count diamonds based on the same load balancing node. There is a certain amount of trial and error involved, so it will take some more time to do this.
Spent some more time looking at the broken NAT in virtual box (it looks
to use lwip) and how it was affecting my traceroute tests. In doing so,
found and fixed a few issues where incorrectly sized packets could be
sent, or correctly sized packets could be sent with incorrect length
fields. After switching from NAT to bridged mode everything seems to
work correctly.
A lot more time than I would have liked was spent trying to differentiate between
problems caused by the NAT and problems with my tests. The ability to
run tests as standalone programs was very helpful for this and made it a
lot easier to pinpoint problems such as the NAT incorrectly sizing
embedded packets in ICMP responses.
Spent most of the week on leave, so not much got done this week.
In the time I was here, I fixed a number of bugs with the auto-scaling summary graph that occurred when there was no data to plot in the detail view.
I implemented yet another new algorithm for trying to determine if a time series is constant or noisy, as the previous one was pretty awful at recognising that the time series had moved from constant to noisy. The new one is better at that, but still appears to have problems for some of our streams -- it now tends to flick between constant and noisy a little too frequently -- so it will be back to the drawing board somewhat on that one.
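One way to reduce the flickering between classes is hysteresis: require a higher threshold to enter the "noisy" state than to leave it. This sketch is my own illustration of that idea using the coefficient of variation, not the algorithm actually implemented; the thresholds are arbitrary.

```python
import statistics

def classify_series(series, state="constant", enter=0.2, leave=0.1):
    """Label a window of a time series as 'constant' or 'noisy' using the
    coefficient of variation (stddev / mean), with hysteresis: the
    threshold to become noisy (enter) is higher than the threshold to
    return to constant (leave), so borderline windows do not cause the
    state to flick back and forth. Illustrative only."""
    mean = statistics.mean(series)
    cv = statistics.pstdev(series) / abs(mean) if mean else float("inf")
    if state == "constant" and cv > enter:
        state = "noisy"
    elif state == "noisy" and cv < leave:
        state = "constant"
    return state
```

Feeding each new window's classification back in as `state` gives the sticky behaviour.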
Continued work on the change-point detection algorithm and optimised its memory usage, so that only a single vector of probabilities is now stored. Testing against data shows that it is too sensitive at the moment.
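Keeping a single vector of probabilities is characteristic of Bayesian online change-point detection in the style of Adams and MacKay, where one vector of run-length probabilities is updated per observation. The sketch below illustrates that general idea, not necessarily this detector; the Gaussian model, `mu0`, `var`, and `hazard` are all assumed, and a real implementation would also update per-run-length sufficient statistics.

```python
import math

def bocpd_step(probs, x, mu0=0.0, var=1.0, hazard=1 / 250.0):
    """One observation's update to a vector of run-length probabilities.
    probs[i] is the probability that the current run (time since the
    last change point) has length i. Returns the updated, renormalised
    vector, one element longer. Fixed-Gaussian sketch for illustration."""
    # predictive likelihood of x under each run length (fixed parameters
    # here for brevity)
    like = [math.exp(-(x - mu0) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
            for _ in probs]
    # each run either grows by one...
    growth = [p * l * (1 - hazard) for p, l in zip(probs, like)]
    # ...or a change point occurs, resetting the run length to zero
    cp = sum(p * l * hazard for p, l in zip(probs, like))
    new = [cp] + growth
    total = sum(new) or 1.0
    return [p / total for p in new]
```

Sensitivity in this framing is largely governed by the hazard rate: a larger `hazard` makes change points more likely a priori, so lowering it is one lever against over-triggering.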
I got quite busy this week with assignments so I only worked Wednesday.
Again, not a lot to report this week as my time was spent on other assignments.
Tested out the effect of manual changes to the nova compute access policy. I have decided on project-role-based access control rather than project-based access control, in order to minimise changes to the access policy file. I also still need to test creating custom user accounts through the admin panel of the dashboard.
Added config options to amp-web for specifying the location of the netevmon and amp meta-data databases. Previously we had assumed these were on the local machine, which proved troublesome when Brad tried to get Cuz running on warlock.
Capped the maximum range of the summary graph to prevent users from zooming out into empty space.
Fixed some byte-ordering bugs in libpacketdump's RadioTap and 802.11 header parsing on big endian architectures.
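The usual shape of this class of bug is reading a field in host byte order when the wire format fixes the order: the Radiotap header is always little-endian, so parsing it with host-order reads works on x86 but breaks on big-endian machines. A sketch of the explicit-byte-order approach (in Python rather than libpacketdump's C, and with an assumed function name):

```python
import struct

def parse_radiotap_header(buf):
    """Parse the fixed part of a Radiotap header. Radiotap fields are
    always little-endian regardless of host byte order, so the '<'
    prefix is required to make this correct on big-endian hosts --
    the kind of byte-ordering bug described above."""
    version, _pad, length, present = struct.unpack_from("<BBHI", buf, 0)
    return {"version": version, "length": length, "present": present}
```

In C the equivalent fix is wrapping each multi-byte field access in an explicit little-endian conversion rather than dereferencing it directly.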
Got the AMP packaging for CentOS working well enough that I can now
build and install packages that almost work straight out of the box.
Split the package into two parts - one that operates with a local broker
and one without. Merged some of the required changes back into the
Debian versions too.
While testing the packaging on a CentOS virtual machine I found a few
interesting issues. Tests where the targets were resolved at runtime
were being run against two targets with the same address - for some
reason duplicate IPv4 addresses were being returned when
getaddrinfo() was given AF_UNSPEC (which I do because I want both IPv4
and IPv6 addresses if available). Also, with the traceroute test I was
seeing some very high latency to some hops, extra hops at the end of my
paths, etc. Some of this was due to late responses arriving and being
treated the same as on time ones, and some of this appears to be related
to possibly broken behaviour in the VM NAT - ICMP TTL expired messages
are being received where the TTL in the embedded packet is too large to
have expired on that hop!
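One defensive fix for the duplicate-address issue is to deduplicate the getaddrinfo() results before creating test targets. A sketch in Python (the test itself is C; the function name is my own):

```python
import socket

def dedup_addrinfo(entries):
    """Drop duplicate (family, address) pairs from a getaddrinfo()-style
    result list, guarding against resolvers that return the same IPv4
    address more than once under AF_UNSPEC."""
    seen, out = set(), []
    for family, socktype, proto, canonname, sockaddr in entries:
        key = (family, sockaddr[0])  # keyed on family and address only
        if key not in seen:
            seen.add(key)
            out.append((family, socktype, proto, canonname, sockaddr))
    return out
```

The same handful of lines translates directly to walking the `addrinfo` linked list in C and skipping nodes whose family and address match an earlier node.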
Spent some time trying to figure out exactly what was going on in the VM
case, and how best to make the test robust in these cases.