Brendon Jones's blog
Built new Debian and CentOS packages for the updated libwandevent code,
and used those to build new amplet2 packages for CentOS. Debian packages
still need a bit more work to build in my new environment. Deployed a
couple of the new packages to further test some of the new traceroute
reporting for Shane.
Hooked up the rest of the test arguments in the form to schedule a new
test, so they are all now properly added to the database when the form
is submitted.
Filtered the YAML output to only include meshes that are used in the
schedule to reduce file size. Added code to track the time that
schedules were last updated, so that I can return a 304 Not Modified to
clients that request the YAML when there have been no changes.
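
As a rough illustration of the conditional response (not the actual website code), the check could look like the sketch below. It assumes the last-update time is kept as a Unix timestamp and that clients send an If-Modified-Since header; the function name `schedule_response` and its return shape are hypothetical.

```python
from email.utils import formatdate, parsedate_to_datetime


def schedule_response(last_updated, if_modified_since=None):
    """Return (status, headers) for a request for the YAML schedule.

    last_updated: Unix timestamp (seconds) of the last schedule change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    headers = {"Last-Modified": formatdate(last_updated, usegmt=True)}
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            # Unparseable header: ignore it and send the full schedule.
            client_time = None
        # HTTP dates have one-second resolution, so compare whole seconds.
        if client_time is not None and int(last_updated) <= int(client_time):
            return 304, headers
    return 200, headers
```

A client that echoes back the Last-Modified value it was given gets a 304 until the schedule changes again.
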
Spent Wednesday watching student honours presentations. Well done to our
students who presented.
Fixed the way I build the data for the YAML output so that the emitter
can better tell which parts should be used as aliases/anchors (which
makes groups of test destinations a lot tidier looking).
Added more dynamic content to the schedule pages using data from the
actual metadata/schedule tables rather than hard coding it to test
layout/behaviour. Sources and destinations are all fetched from the
database, and current test schedules are displayed.
Added API functions to insert tests into a schedule, and hooked it up to
the data coming from the schedule modal form. Most of the data for
creating a new test is now understood and inserted into the schedule table.
Built the basic interface to schedule new tests for sites and meshes,
based around the modal system we already have in place for displaying
relevant, it's a pretty simple interface that should let us do what we
want easily. Each test shows only its own options, with sensible
defaults, and only shows as much time scheduling information as needed.
Wrote the database schema to describe test schedules and their
destinations. Started working on the code to put tests into and fetch
them from the database. So far I can successfully fetch the test
configuration for an individual site and display it in a textual form on
the website. The YAML output is mostly working, but needs some changes
to the way I structure the data in order to properly use aliases/anchors.
Changed some of the options in the traceroute test to better match what
Shane is expecting to see when saving the data, and to better specify
what data is present.
Built a sample YAML schedule file and updated the schedule parsing code
to generate test events from the new format. The old approach was very
deterministic, with every field present and in a fixed order, while the
new approach can have fields appearing in any order and makes better use
of default values.
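
To illustrate the difference, here is a minimal sketch of turning one parsed schedule entry (already loaded from YAML into a mapping) into a test event, with fields accepted in any order and defaults filled in. The field names and default values are hypothetical and are not the real amplet2 schedule format.

```python
# Hypothetical defaults; the real schedule format may use different names.
DEFAULTS = {
    "frequency": 60,   # seconds between runs
    "start": 0,        # offset into the period, in seconds
    "end": 86400,      # end of the period, in seconds
    "args": "",        # extra test arguments
}
REQUIRED = ("test", "target")


def build_test_event(entry):
    """Turn one parsed schedule mapping into a fully populated test event."""
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        raise ValueError("schedule entry is missing: " + ", ".join(missing))
    event = dict(DEFAULTS)   # start from the defaults...
    event.update(entry)      # ...then let the entry override them
    return event
```

Because the entry is a mapping, key order in the file does not matter, and an entry only needs to mention the fields that differ from the defaults.
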
Started exploring how I might construct an interface to easily schedule
tests, and to visualise the tests within a schedule.
Fixed the AS lookups in the traceroute test to ignore RFC1918 addresses.
Wouldn't have been too much of a problem, but the NXDOMAIN responses
were only being cached for 5 minutes and ended up generating too many
extra queries. Also tidied up some checks for stopset membership to use
the TTL to better match, or prevent inserting pointless addresses.
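
The RFC1918 check itself is simple. A sketch in Python using the standard `ipaddress` module is shown below; the function name `needs_as_lookup` is hypothetical, and the real test is not written this way.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def needs_as_lookup(address):
    """Return False for RFC1918 addresses, which have no origin AS to find."""
    addr = ipaddress.ip_address(address)
    if addr.version != 4:
        return True  # RFC1918 only covers IPv4
    return not any(addr in network for network in RFC1918_NETWORKS)
```

Skipping these addresses before querying avoids the NXDOMAIN responses entirely, rather than relying on how long they stay cached.
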
Merged the timing changes I made the other week into the main branch
so that they can be used.
Had a good meeting with Shane and Brad about where we need to go with
AMP in the next few months. After this, started looking at better ways
to represent and generate test schedule files so they are easier to
understand and edit. Spent some time looking into YAML and how an
example schedule might look, and experimented with the libyaml parser to
see how the data looks.
Tidied up the traceroute stopset code to add addresses in a more
consistent manner regardless of whether an address in the stopset was
found or if the TTL hit 1. This also allowed me to more easily check
that parts of a completed path don't already exist in the stopset (they
might have been added since they were last checked for) to prevent
Added the ability to look up and report AS numbers for all addresses seen
in the paths (using the Team Cymru data). This currently works for the
standalone test (which doesn't have access to the built-in DNS cache)
but requires some slight modification to run as part of amplet itself.
Added local stop sets to the traceroute test to record paths near to the
source and prevent them from being reprobed for every single
destination. Due to the highly parallel nature of the test this
initially had only a very minimal impact on the number of probes
required. At a suggestion from Shane I began probing destinations using
a smaller, fixed-size window rather than all at once in order to
populate the stopset early on in the test. With the current destination
list, this reduced the number of probes required by about 30% without
any real impact on the duration of the test.
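
The effect of the window can be seen in a toy simulation. The sketch below is not the test's real algorithm: it assumes each path is probed from the destination back towards the source, that destinations within one window run in parallel (so they only see the stopset as it was when the window started), and that the stopset holds (TTL, address) pairs. The function name and the sample paths are made up for illustration.

```python
def count_probes(paths, window):
    """Count probes needed when destinations are probed `window` at a time.

    paths: list of paths, each a list of hop addresses from source to
           destination (index 0 is TTL 1).
    """
    stopset = set()
    probes = 0
    for start in range(0, len(paths), window):
        batch = paths[start:start + window]
        # Destinations in the same window start together, so none of them
        # benefit from hops the others are still discovering.
        snapshot = frozenset(stopset)
        for path in batch:
            for ttl in range(len(path), 0, -1):
                hop = path[ttl - 1]
                probes += 1
                if (ttl, hop) in snapshot:
                    break  # the rest of the path towards the source is known
                stopset.add((ttl, hop))
    return probes


paths = [
    ["a", "b", "c", "d1"],
    ["a", "b", "c", "d2"],
    ["a", "b", "e", "d3"],
    ["a", "b", "e", "d4"],
]
print(count_probes(paths, 4))  # all at once: 16 probes
print(count_probes(paths, 2))  # window of two: 14 probes
print(count_probes(paths, 1))  # one at a time: 11 probes
```

Smaller windows save more probes but reduce parallelism; the simulation does not model time, so it says nothing about the effect on test duration.
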
Spent some time confirming that the results were the same as the
original test produced, and that they matched the results of other
traceroute programs. Found some slightly different behaviours where I
was treating certain ICMP error codes incorrectly, which I fixed.
Started to look at doing optional AS lookups for addresses on the path.
Appears the easiest solution is to have the test itself look them up
before returning results. Using something like the Team Cymru IP to AS
mappings (which are available over DNS) is simple and would make good
use of caching to minimise the number of queries.
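
For reference, the Team Cymru DNS interface maps an IPv4 address to a TXT query name by reversing the octets under `origin.asn.cymru.com`, and answers with pipe-separated fields. The sketch below only builds the query name and parses an answer; it sends no queries, the helper names are hypothetical, and the sample answer in the usage lines is illustrative rather than live data.

```python
import ipaddress


def cymru_origin_qname(address):
    """Build the TXT query name for an IPv4 origin-AS lookup."""
    addr = ipaddress.ip_address(address)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_origin_txt(txt):
    """Parse an answer such as 'ASN | prefix | country | registry | date'."""
    fields = [field.strip() for field in txt.split("|")]
    # The first field can list more than one AS for multi-origin prefixes.
    return {"asns": fields[0].split(), "prefix": fields[1]}


print(cymru_origin_qname("216.90.108.31"))
print(parse_origin_txt("23028 | 216.90.108.0/24 | US | arin | 1998-09-25"))
```

Since the answer covers a whole prefix, other addresses on the same path often fall under a prefix that has already been looked up, which is where caching helps.
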
Finished updating the traceroute test to use libwandevent3 to schedule
packets and track timeouts. The aim was to make each action
self-contained and easily understandable, to aid in adding the extra
complexity of stop sets and AS lookups later on. Modified the probing
algorithm to start partway through the path and probe forward, then
probe backwards from the initial point - we can probe forward into paths
that we likely haven't seen before, and then stop probing on the reverse
when we see familiar addresses.
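
The probing order can be sketched as follows. This is a simplified, sequential illustration rather than the event-driven code in the test: it assumes the full path is known in advance (so "reaching the destination" is just reaching the end of the list) and that familiar addresses are held as (TTL, address) pairs. The function name `probe_order` is hypothetical.

```python
def probe_order(path, start_ttl, known):
    """Return the TTLs probed for one destination, in order.

    path: hop addresses from source to destination (index 0 is TTL 1).
    start_ttl: TTL at which probing begins.
    known: set of (ttl, address) pairs already seen on earlier paths.
    """
    start = min(start_ttl, len(path))
    # Probe forward from the starting point until the destination responds.
    order = list(range(start, len(path) + 1))
    # Then probe backwards towards the source, stopping at a familiar hop.
    for ttl in range(start - 1, 0, -1):
        order.append(ttl)
        if (ttl, path[ttl - 1]) in known:
            break
    return order


path = ["a", "b", "c", "d", "e"]
print(probe_order(path, 3, set()))         # [3, 4, 5, 2, 1]
print(probe_order(path, 3, {(2, "b")}))    # [3, 4, 5, 2]
```
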
Spent some more time reading student theses to provide feedback.
Started planning the best way to approach changing the traceroute test
to be faster and more network friendly. Making it more event driven and
sending new packets once we know earlier ones have left the network should help
speed up the test, rather than probing in waves and having to wait at
each TTL for all responses. Before changing the test in this way it made
sense to move from the deprecated libwandevent2 to libwandevent3, which
I did. I've also made the first few changes in the traceroute test to
use an event based approach.
Read up a bit on doubletree and had a look into how some other
traceroute implementations dealt with it. Will hopefully be able to
apply some of the ideas around stop sets to the updated traceroute test
too. Tidied up a bit more low hanging fruit in the amplet packaging and
Spent some time proofreading student theses to provide feedback.
Fixed a crash when changing the name of test processes, where getopt was
being unhappy after having argv changed underneath it, despite being
given a different array to operate on after forking. Logging has also
been made more sensible, with all amp processes using a fixed prefix
rather than using the full process name.
Spent some time comparing results of the new timestamping mechanism
against iputils ping. Timestamps are looking much more stable now in all
situations. There was a consistent small offset between the amp and ping
values, which appears to mostly be due to one timestamping packets
immediately before sending them and the other immediately after.
Changing amp to record timestamps at the same time as ping removes this
offset. Testing between a pair of hosts directly connected at gigabit
gives very similar results for both approaches, with identical quartiles
and only 0.2 microseconds difference in mean.
Tidied up packaging scripts for Debian and CentOS, removing some default
configuration files that were being installed but are no longer needed.
Updated CentOS init scripts to be more similar to the new Debian ones
that allow multiple clients to be run.