Continued investigating why traceroute tests were sometimes lingering
when the main amplet2 process was terminated. Eventually discovered that
I wasn't closing some file descriptors after forking, so that the test
children were able to connect to a listening local unix socket that
should have been closed. Despite listening, no running process was
actually expecting this connection, so it stalled waiting for it to be
Also tidied up more of the ASN socket querying code to better detect if
it had closed, and to actually report the error back so that it could be
dealt with in a smarter way, helping prevent the test hanging around in
a bad state.
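
The descriptor leak described above comes down to a forked child keeping
its own copy of the parent's listening socket. Below is a minimal sketch
of the fix, written in Python for brevity rather than taken from the
amplet2 source. The names run_child and demo and the socket name
control.sock are hypothetical.

```python
import os
import socket
import tempfile


def run_child(listener, work):
    """Fork a child that closes the inherited listening socket, then works."""
    pid = os.fork()
    if pid == 0:
        # Child: close our copy of the parent's listening socket so it is
        # not kept open (or used) by a process that never accepts on it.
        listener.close()
        try:
            work()
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    return pid


def demo():
    """Hypothetical demonstration: report the socket state seen by the child."""
    with tempfile.TemporaryDirectory() as tmp:
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(os.path.join(tmp, "control.sock"))
        listener.listen(1)
        read_end, write_end = os.pipe()

        def work():
            # A closed socket object reports a file descriptor of -1.
            state = b"closed" if listener.fileno() == -1 else b"open"
            os.write(write_end, state)

        pid = run_child(listener, work)
        os.close(write_end)
        os.waitpid(pid, 0)
        child_state = os.read(read_end, 16)
        os.close(read_end)
        # Closing in the child does not affect the parent's descriptor.
        parent_still_listening = listener.fileno() != -1
        listener.close()
        return child_state, parent_still_listening
```

Closing the descriptor in the child leaves the parent's copy untouched,
so the parent can keep accepting connections as before.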
Had a quick look at the HTTP test after seeing a few unusual results and
found that some software does a poor job of following the standards
(surprise!). Updated the header parser to be slightly smarter and deal
with some different combinations of capital letters, whitespace and
Spent some time working with Brad to get an example amplet machine
running that he can use to work through the upgrade process, bringing
them up to date with Debian.
Continued the painful process of migrating my Python prototype for mode detection over to C++ for inclusion in netevmon. Managed to get the embedded R portion working correctly, which should be the trickiest part.
Spent a bit of time with our new libtrace testbed, getting the DAG 7.5G2s configured and capturing correctly. Ran into some problems getting the card to steer packets captured on each interface into separate stream buffers, as the firmware we are currently running doesn't appear to support steering.
The latest configuration of the event-based Doubletree simulator was put under version control. A source windows mode was then added to the simulator: traces are executed in blocks, and the upcoming block is sent control information just before it starts probing. This economises on both probe and control packets, and the simulator showed large savings over the basic mode of operation.
A bug in the non-event-based Doubletree simulator was corrected, so these simulations now need to be rerun. The program now prints its progress to the standard error stream. The simulations are taking a bit longer than before but seem to be running correctly.
The PhD conference talk was shortened to only include Megatree, as Doubletree is being rerun, and there is only room for one analysis in the time allocated.
Spent the first couple of days rewriting/restructuring the eventing script since it was a real abomination of a script (at least the functions had been well documented/named so it was not too painful). Also rewrote the probabilities script so that each time series subtype (e.g. AMP ICMP/rrd Smokeping) would be a separate module and have its own set of probabilities. This also makes it easier to add new modules later on. Using the AMP-specific probabilities, I re-ran anomalyts using the original series used for the ground truth and got a list of event groups and their significance ratings. Then, I attempted to match the output produced by the eventing script (i.e. event groups and their significance probability) against the original manually classified ground truth. In theory, most of the detectors' behaviour should have been very similar to that found in the ground truth since they are using the exact same latency values, but for some reason there were missing/additional events. This was expected behaviour for the Changepoint/HMM Detectors, but there were some differences with detectors that relied on the Plateau algorithm (Plateau, TEntropy-Stddev, and TEntropy-Meandiff detectors). Spent the remainder of the week comparing events from the two sources and flagging those that needed to be investigated.
Spent some time building new amplet2 Debian packages to make sure that
the build process was up to date with any new dependencies added with
the recent changes. Had to deal with a few packages in Debian Lenny
being well out of date and missing features (though an upgrade is on the
Installed new packages on a test amplet, and configured the schedule
using the web interface. In doing so, found a few test options that
weren't properly hooked up and were setting the wrong values, and that
sites were including themselves in their test schedules.
Accidentally left some firewall rules in place while testing and found
some broken behaviour when parts of tests failed. Watchdog timers
weren't being removed if the test exited badly, which was leading to
extraneous messages reporting tests being killed (when they had already
stopped). Broken connections to the ASN server could also trigger a
SIGPIPE when querying the local cache, which wasn't being properly
Spent the latter part of the week reading student honours reports.
I found a bug in the non event based Doubletree simulator. I had noticed that the packet counts for the local stop set only case were lower than expected. After fixing the code I set several reruns going. This will affect the design of my PhD student conference slides depending on when the new results are ready. If it takes too long I can present the Megatree results instead of Doubletree.
I processed some more black hole detector results.
I designed a structural layout for my thesis. This included chapter titles and the problem addressed.
Managed to get my Python prototype doing a reasonable job of finding modes in a selection of time series from the current prophet database. Added a new system for determining the 'width' of a detected mode -- wide modes cover a large range of values in the probability density function and are therefore more likely to indicate a noisy data series. Width is calculated using both the relative standard deviation and the quartile coefficient of dispersion.
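
The two width statistics named above can be computed with the standard library alone. This is only a sketch of the statistics themselves, not the prototype's code: the function name mode_width is hypothetical, the sample values are invented, and how the two numbers are combined into a single width is not stated here, so both are simply returned.

```python
import statistics


def mode_width(values):
    """Return (relative standard deviation, quartile coefficient of dispersion).

    Assumes positive measurements such as latencies, so neither the mean
    nor the sum of the quartiles is zero.
    """
    mean = statistics.fmean(values)
    # Relative standard deviation: sample standard deviation over the mean.
    rsd = statistics.stdev(values) / abs(mean)
    # Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1).
    q1, _median, q3 = statistics.quantiles(values, n=4)
    qcd = (q3 - q1) / (q3 + q1)
    return rsd, qcd


# Invented example values: a tight mode and a wide one around a similar centre.
narrow = [100, 101, 99, 100, 102, 98, 100, 101]
wide = [60, 140, 90, 110, 75, 125, 100, 130]
narrow_rsd, narrow_qcd = mode_width(narrow)
wide_rsd, wide_qcd = mode_width(wide)
```

Both statistics are dimensionless, so widths can be compared between modes centred on very different latency values.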
Started converting the Python prototype into C++ code so it can be incorporated into netevmon.
Spent the remainder of my week reading over Richard and Craig's Honours reports and making plenty of little suggestions as to how to improve the language and make sure the important points come across clearly to the reader.
Started working on using a data fusion method (Dempster-Shafer) with AMP-specific values. The AMP ground truth dataset collected as part of my masters project was used to calculate a set of probabilities for each detector based on the variability of the time series during a particular event group. These probabilities might not be final because of the introduction of a new variability class (MultiModal), so I will need to collect more data later on, especially for MM streams. Wrote a script that parses a time series log file and sends the latency values to anomaly_ts, so that anomaly_ts can be tested against the ground truth data now that it is no longer in the active database. I also generated graphs covering the period leading up to and 2 hours after each event in the ground truth to determine if any of those streams were multimodal, but no luck so far. This also means that the probabilities for Noisy and Constant will remain unchanged.
My conference slides were updated to include a diagram of the many-sources-to-many-destinations situation that we wish to investigate. A short blurb on stage counts was also added to describe the stage information in the graphs, in particular that no control information is sent on the final stage as it would serve no purpose.
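
For reference, Dempster's rule of combination multiplies the masses of every pair of hypotheses from two sources, keeps the mass on each non-empty intersection, and renormalises by the mass lost to conflict. The sketch below shows the rule in general form. The two-hypothesis frame, the detector labels and the mass values are invented for illustration and are not the probabilities derived from the ground truth.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is discarded.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to one.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical frame of discernment and detector beliefs (invented numbers).
SIGNIFICANT = frozenset({"significant"})
UNKNOWN = frozenset({"significant", "insignificant"})
detector_a = {SIGNIFICANT: 0.6, UNKNOWN: 0.4}
detector_b = {SIGNIFICANT: 0.7, UNKNOWN: 0.3}
fused = combine(detector_a, detector_b)
```

With these invented inputs there is no conflicting mass, and the fused belief that the event is significant is 0.6*0.7 + 0.6*0.3 + 0.4*0.7 = 0.88.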
The event based simulator has been updated to use source windows to give it a more realistic approach to the many sources to few destinations scenario. Source windows involve dividing traces into equal sized blocks and only forwarding control information to the next block to be run. The traces in each block or window run simultaneously. Initially the changes did not compile easily, until a few type handling related bugs were ironed out. The first run is underway, so I will soon see if the results are sensible.
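
The windowing idea above can be illustrated with a toy model. This is not the simulator's code and it does not reproduce Doubletree's actual stop set logic: the function names, the example paths, and the assumption that control information accumulates from one window to the next are all hypothetical. Each trace here simply probes every hop it has not been told about.

```python
def source_windows(sources, window_size):
    """Yield the sources in consecutive blocks of window_size."""
    for start in range(0, len(sources), window_size):
        yield sources[start:start + window_size]


def count_probes(paths, window_size):
    """Count probes sent when control information only moves between windows.

    paths maps each source to the list of hops on its trace (toy data).
    """
    known_hops = set()  # control information forwarded to the next window
    probes_sent = 0
    for window in source_windows(list(paths), window_size):
        # Traces in a window run simultaneously, so they all see the
        # control information as it stood when the window started.
        snapshot = frozenset(known_hops)
        for source in window:
            new_hops = [hop for hop in paths[source] if hop not in snapshot]
            probes_sent += len(new_hops)
            known_hops.update(new_hops)
    return probes_sent


# Invented paths from four sources that share several hops.
paths = {
    "s1": ["a", "b", "c"],
    "s2": ["a", "b", "d"],
    "s3": ["a", "e", "c"],
    "s4": ["a", "b", "c"],
}
```

In this toy model a single window of four sources sends 12 probes, while windows of two send 7, since the second window starts with the hops already found by the first.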
Spent some more time checking up on the traceroute test, after merging
all the stopset/ASN changes. Found and fixed a case where ICMP error
codes weren't being properly recorded. Also found and fixed what appears
to be the main cause of the test running too long - some targets will
decrement the TTL before responding with a port unreachable message,
which throws the path length estimate off by one and can cause the same
TTL to be probed multiple times.
Added the ability to signal tests that their time is running out, giving
them an opportunity to report any partial results they have collected
and to gracefully exit before they get killed. This is configurable per
test type, depending on whether or not it is possible to get useful
information without the test entirely completing.
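
The early-warning behaviour described above can be sketched as a signal
handler that sets a flag, which the test loop checks between
measurements. This is a Python illustration and not the amplet2
implementation: the choice of SIGUSR1, the class name PartialResultTest
and the measure callback are assumptions. It needs a POSIX system and
must run in the main thread.

```python
import os
import signal


class PartialResultTest:
    """Hypothetical test that stops early and keeps what it has collected."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.results = []
        self.time_up = False

    def _on_warning(self, signum, frame):
        # Only record the warning; the main loop decides when to stop.
        self.time_up = True

    def run(self, measure):
        # SIGUSR1 as the "time is running out" signal is an assumption.
        previous = signal.signal(signal.SIGUSR1, self._on_warning)
        try:
            for target in self.targets:
                if self.time_up:
                    break  # report partial results instead of being killed
                self.results.append((target, measure(target)))
        finally:
            signal.signal(signal.SIGUSR1, previous)
        return self.results


def demo():
    """Simulate the warning arriving while the second target is measured."""
    def measure(target):
        if target == "b":
            os.kill(os.getpid(), signal.SIGUSR1)
        return len(target)

    return PartialResultTest(["a", "b", "c", "d"]).run(measure)
```

A test type that cannot produce useful partial results would simply not
install the handler and would be killed at the deadline as before.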
Updated the schedule interface to display a bit more information about
test timings, and tidied up some documentation about the new format.
Fixed the raw interface to properly check if-modified-since headers from
amplets requesting new configs, so only new configs are sent.
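
The if-modified-since check amounts to comparing the date sent by the
client with the time the config last changed. Below is a minimal sketch
of that comparison, not the interface's actual code. The function name
config_response_status and the example timestamps are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def config_response_status(if_modified_since, config_mtime):
    """Return 304 if the client's copy is current, otherwise 200.

    if_modified_since is the raw header value or None; config_mtime is a
    timezone-aware datetime for the last change to the config.
    """
    if if_modified_since is None:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: behave as if it was not sent
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if config_mtime.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Invented example: a config last changed at noon UTC on 20 October 2014.
mtime = datetime(2014, 10, 20, 12, 0, 0, tzinfo=timezone.utc)
unchanged = config_response_status("Mon, 20 Oct 2014 12:00:00 GMT", mtime)
changed = config_response_status("Mon, 20 Oct 2014 11:59:59 GMT", mtime)
```

Treating a missing or malformed header as a request for the full config
keeps the failure mode safe: the amplet gets a config it may already
have rather than missing a new one.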