Brendon Jones's blog

08 Oct 2014

Finished moving all the standalone traceroute ASN fetching from DNS to
the TCP bulk interface. Decided to reuse the trie data structure to build
an actual unique set of addresses to query (rather than the previous
simple system that just looked at nearby ones), minimising the data
that needs to be sent and received. Fixed a few bugs in the buffer
management that meant new ASN data could clobber the last unprocessed
portion from a previous read. Merged all these changes and they should
now be running on a test amplet deployment.
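
The buffer problem described above is the usual partial-read one: a read from
the TCP socket can end halfway through a line, and that tail has to survive
until the next read arrives. A minimal Python sketch of the idea follows. It
is not the actual amplet code; the class name `LineBuffer` and the sample
data are hypothetical.

```python
from typing import List


class LineBuffer:
    """Accumulate raw reads and hand back only complete lines."""

    def __init__(self) -> None:
        # Unprocessed tail left over from the previous read.
        self._pending = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        # Append to the leftover data rather than overwriting it, so a
        # partial line from the last read is not clobbered by new data.
        data = self._pending + chunk
        # Everything before the final newline is complete; the remainder
        # (possibly empty) is kept for the next call.
        *lines, self._pending = data.split(b"\n")
        return lines
```

Feeding `b"64496 | 192.0."` followed by `b"2.1 | EXAMPLE-AS\n"` yields nothing
on the first call and the single joined line on the second.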

Fixed up some bugs in the new schedule parsing code, which didn't work
properly when the test type was not specified. Most other settings were
optional and had sensible default values, but it wasn't expected that
the most important option would be missing from (usually generated)
files. Schedule items without a test type are now properly ignored. Also
merged all these changes, which are now running on a test deployment.
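
The behaviour can be sketched in Python as below, working on schedule items
that have already been parsed into dictionaries. The field names (`test`,
`frequency`, `start`, `args`) and the default values are assumptions made for
illustration, not the real schedule format.

```python
# Hypothetical defaults and field names, for illustration only.
DEFAULTS = {"frequency": 60, "start": 0, "args": ""}


def parse_schedule(items):
    """Return usable schedule entries, skipping any without a test type."""
    tests = []
    for item in items:
        if not item.get("test"):
            # No test type means there is nothing sensible to run,
            # so the item is ignored rather than treated as an error.
            continue
        # Fields may appear in any order; missing ones fall back to defaults.
        tests.append({**DEFAULTS, **item})
    return tests
```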

Added parameters for the throughput and HTTP tests to the scheduling web
interface. Slightly modified the throughput test options to make it much
easier to schedule the sorts of tests that it is commonly used for. Also
updated the HTTP test to follow 3XX redirects and to record that they
happened (with timings, sizes, etc. for both the redirect and the follow-up
request).

30 Sep 2014

Turned a lot of the scheduling web interface code into templates that
can be reused between creating and updating tests. They were similar
enough that most of it can be reused, with only a few minor changes
specific to each view.

Fixed up some small bugs in the ASN query code to make sure that all
addresses in the path are fetched (paths shorter than the initial TTL
weren't querying for the ASN of the final hop). The cache will now be
cleared regularly during operation and will also tidy up properly after
itself on program end. Started work on replacing the ASN fetching using
DNS with the TCP bulk whois for the standalone traceroute tests too.

Spent some time applying patches and building old bash from source to
update the old amplets against the new bash vulnerability. These
machines are really due for a software refresh!

24 Sep 2014

Spent some time setting up a properly scheduled throughput test between
machines in the real world. While doing so, found out a few things about
certificate management that may not have been fully thought through yet.
The certificates used for connections to the control socket (for
starting the remote end of the test) are currently only configured to be
clients; they can't act as a server without an extra setting being
enabled. Also, the server currently tries to validate client hostnames,
which relies on reverse DNS and won't be effective in most real world cases.

Added caching to the ASN lookups that use the bulk TCP interface, using
a data structure that looks similar to a radix trie. It seems to work well
and is fast. May also try to use this in the temporary test processes too,
to store addresses and ASNs while they get applied to a particular set
of traceroute data (it would more easily remove duplicates from the
query).
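
To illustrate the kind of structure involved, here is a minimal Python sketch
of a prefix cache with longest-prefix matching. It is a plain binary trie
without the path compression a real radix trie would have, it handles IPv4
only, and the class and method names are hypothetical rather than taken from
the amplet code.

```python
import ipaddress


class PrefixTrie:
    """Binary trie mapping IPv4 prefixes to ASNs (longest match wins)."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, asn):
        net = ipaddress.ip_network(prefix)
        # Only the first `prefixlen` bits of the network address matter.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["asn"] = asn

    def lookup(self, address):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("asn")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific prefix seen so far.
            best = node.get("asn", best)
        return best
```

An address that falls inside a cached prefix is answered locally, and a
return value of `None` means it still has to be queried.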

17 Sep 2014

Found and fixed a few bugs in the traceroute test now that there is a
small test deployment testing to real locations. IPv4 paths shorter than
the initial probe TTL of 6 were exhibiting inconsistent behaviour with
the value of the TTL in the packet embedded in the response. I now use
the response packet itself to calculate path length. Also fixed a bug
where multiple ASNs being returned in a single result were being parsed
incorrectly.

Started working on fetching ASNs in a single bulk TCP connection rather
than using the DNS infrastructure, as requested by Team Cymru (whose
data we are using). Fetching all appears to work fine but there is
currently no caching, so it will generate queries for all addresses every time.
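
A rough Python sketch of the bulk approach is shown below: one request that
lists every address, and a parser for the pipe-separated reply. The
`begin`/`end` framing and the "AS | IP | AS Name" field order reflect my
understanding of the Team Cymru bulk whois format and should be checked
against their documentation. The function names are hypothetical, and the
socket handling is left out so that the sketch does not touch the network.

```python
def build_bulk_query(addresses):
    """Build one bulk request covering every address."""
    lines = ["begin"] + list(addresses) + ["end"]
    return ("\n".join(lines) + "\n").encode("ascii")


def parse_bulk_response(text):
    """Map each address to a list of ASNs from pipe-separated lines."""
    result = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue  # banner or blank line, nothing to parse
        # The AS field may hold several space-separated ASNs, or a
        # non-numeric placeholder when no origin AS is known (assumption).
        asns = [int(a) for a in fields[0].split() if a.isdigit()]
        result[fields[1]] = asns
    return result
```

Keeping each result as a list means an address with more than one origin ASN
is handled the same way as one with a single ASN or none at all.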

Updated the scheduling interface to allow scheduling/viewing tests for
meshes as well. All mesh tests are treated pretty much the same as those
for individual sites, and are merged when generating YAML configs for
sites. Tidied up some of the schedule display to hide headings and
sections that are empty or not currently relevant.

11 Sep 2014

Continued to work on the interface for scheduling tests. As well as
adding new tests to a site, you can now modify an existing test. Full
details on a specific test can be viewed in a modal window very similar
to that used to create the test, and options/scheduling can be modified
there. Extra destinations can be added and existing destinations can be
removed, and the test itself can be completely deleted.

Added backend support to deal with all the above - including
adding/deleting test destinations, deleting tests, modifying test
arguments, modifying test schedules.

Spent some time merging the JavaScript for adding and modifying tests -
it was very similar and didn't warrant being entirely separate, but had
diverged enough to be annoying. The templates for the modals will need a
similar job done on them.

03 Sep 2014

Built new Debian and CentOS packages for the updated libwandevent code,
and used those to build new amplet2 packages for CentOS. Debian packages
still need a bit more work to build in my new environment. Deployed a
couple of the new packages to further test some of the new traceroute
reporting for Shane.

Hooked up the rest of the test arguments in the form to schedule a new
test, so they are all now properly added to the database when the form
is submitted.

Filtered the YAML output to only include meshes that are used in the
schedule to reduce file size. Added code to track the time that
schedules were last updated, so that I can return a 304 Not Modified to
clients that request the YAML when there have been no changes.
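
The conditional response can be sketched in Python as follows, using only the
standard library. This is an illustration of the idea rather than the web
interface's real code: the function name `schedule_response` and the
`render_yaml` callback are hypothetical, and `last_updated` is assumed to be
a timezone-aware UTC datetime.

```python
from email.utils import format_datetime, parsedate_to_datetime


def not_modified(last_updated, if_modified_since):
    """True if the client's copy is at least as new as the schedule."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_updated.replace(microsecond=0) <= since
    except (TypeError, ValueError):
        # Unparseable or timezone-less header: just send the full body.
        return False


def schedule_response(last_updated, if_modified_since, render_yaml):
    """Return (status, headers, body) for a schedule request."""
    headers = {"Last-Modified": format_datetime(last_updated, usegmt=True)}
    if not_modified(last_updated, if_modified_since):
        return 304, headers, b""
    # Only build the YAML when it actually has to be sent.
    return 200, headers, render_yaml()
```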

Spent Wednesday watching student honours presentations. Well done to our
students who presented.

25 Aug 2014

Fixed the way I build the data for the YAML output so that the emitter
can better tell which parts should be used as aliases/anchors (which
makes groups of test destinations a lot tidier looking).

Added more dynamic content to the schedule pages using data from the
actual metadata/schedule tables rather than hard coding it to test
layout/behaviour. Sources and destinations are all fetched from the
database, and current test schedules are displayed.

Added API functions to insert tests into a schedule, and hooked them up to
the data coming from the schedule modal form. Most of the data for
creating a new test is now understood and inserted into the schedule table.

20 Aug 2014

Built the basic interface to schedule new tests for sites and meshes,
based around the modal system we already have in place for displaying
results. With a little bit of JavaScript to hide options that aren't
relevant, it's a pretty simple interface that should let us do what we
want easily. Each test shows only its own options, with sensible
defaults, and only shows as much time scheduling information as needed.

Wrote the database schema to describe test schedules and their
destinations. Started working on the code to put tests into and fetch
them from the database. So far I can successfully fetch the test
configuration for an individual site and display it in a textual form on
the website. The YAML output is mostly working, but needs some changes
to the way I structure the data in order to properly use aliases/anchors.

13 Aug 2014

Changed some of the options in the traceroute test to better match what
Shane is expecting to see when saving the data, and to better specify
what data is present.

Built a sample YAML schedule file and updated the schedule parsing code
to generate test events from the new format. The old approach was very
deterministic, with every field present and in a fixed order, while the
new approach can have fields appearing in any order and makes better use
of default values.

Started exploring how I might construct an interface to easily schedule
tests, and to visualise the tests within a schedule.

06 Aug 2014

Fixed the AS lookups in the traceroute test to ignore RFC1918 addresses.
Wouldn't have been too much of a problem, but the NXDOMAIN responses
were only being cached for 5 minutes and ended up generating too many
extra queries. Also tidied up some checks for stopset membership to use
the TTL for better matching and to prevent inserting pointless addresses.
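
The RFC1918 filter amounts to a membership check against the three private
IPv4 ranges before a query is generated. A minimal Python sketch using the
standard `ipaddress` module is below; the function name `needs_asn_lookup` is
hypothetical, and the real test does not necessarily structure it this way.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def needs_asn_lookup(address):
    """Return False for RFC1918 addresses, which have no public origin AS."""
    ip = ipaddress.ip_address(address)
    # An IPv6 address is never "in" an IPv4 network, so it passes through.
    return not any(ip in net for net in RFC1918)
```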

Merged the timing changes I made the other week into the main branch
so that they can be used.

Had a good meeting with Shane and Brad about where we need to go with
AMP in the next few months. After this, started looking at better ways
to represent and generate test schedule files so they are easier to
understand and edit. Spent some time looking into YAML and how an
example schedule might look, and experimented with the libyaml parser to
see how the data looks.