Brendon Jones's blog
Successfully built Debian packages of the new amplet client and
installed them on a new machine with multiple network interfaces. Spent
some time making sure that all the configuration files ended up in the
right place, and that the init script performed as expected.
Spent a lot of time looking into how well the DNS lookups behaved with
multiple clients running at once, and that they respected interface
bindings when they were set. In general, everything co-existed nicely
and worked but some possible failure modes could bring the whole thing
down. If DNS sockets were reopened due to a query failure then they
would reset to normal behaviour. Started to investigate other approaches
to name resolution - it looks like using libunbound will be the way
forward from here as it also gives us asynchronous queries (synchronous
lookups were becoming time consuming) and caching.
Fixed up the local rabbitmq configuration to properly generate vhosts,
users and shovels that will actually work and allow data to be passed
around. The configuration file needed some tweaking to make this work,
and may need to be reworked in the near future to make it more clear how
the collector needs to be configured.
Created a Debian init script to deal with starting multiple instances of
amplet clients on a single machine, each using different configuration
files. They be started/stopped individually or as a whole. Updated some
of the other Debian packaging scripts to deal with the new config
directory layout. Started testing them to make sure that everything
required is present and still ends up in the right places.
Added options to create a pidfile in the amplet2 client so that it works
better with the new init scripts, and targeting individual instances of
Spent some time looking into test results and performance after seeing
some results from a student testing on a loaded system. Latency
measurements degrade as load increases, but those made by ping remain
quite stable. Briefly tried testing the difference between using
gettimeofday() and SO_TIMESTAMP but found no obvious differences under load.
Updated some configuration in amp-web to allow fully specifying how to
connect to the amp/views/event databases.
Set up some throughput tests to collect data for Shane to test inserting
the data. While doing so I found and fixed some small issues with
schedule parsing (test parameters that included the schedule delimiter
were being truncated) and test establishment (EADDRINUSE wasn't being
picked up in some situations).
Started adding configuration support for running multiple amplet clients
on a single machine. Some schedule configuration can be shared globally
between all clients, but they also need to be able to specify schedules
that belong only to a single client. Nametables, keys, etc also need to
be set up so that each client knows where they are.
Started writing code to configure rabbitmq on a client and isolate our
data from anything else that might already be on that broker (e.g.
another amplet client). Each amplet client should now operate within a
private vhost and no longer require permissions on the default one.
Spent some time tidying up the code to adjust nameservers for AMP at
runtime, and adding in configuration options to allow them to be set.
While doing this realised that name resolution wasn't neccessarily going
to respect the interface/address bindings set up for the tests, so
looked into ways I could make this happen. The best/easiest way so far
seems to be to create my own sockets for the resolver to use and then
bind them how I like. This appears to work with my testing so far, but
is possibly getting a bit too specific to the internals of the libc
library I'm using.
Also wrote some unit tests around the ICMP test response packet
processing to help make sure that malformed or incorrect packets are
correctly dealt with.
Tidied up some arbitrarily sized buffers in the icmp test to be the
actual size required for the data. Accidentally made them too small, so
fixed that and then wrote some more unit tests to cover the
sending/receiving of data and buffer management. Also updated the icmp
test to be able to short circuit the loss wait timeout once all data has
been accounted for - previously it was always waiting a minimum of
200ms, even if all responses had been received.
Spent some time examining query logs from the newly migrated test
database on prophet to see where slowdowns were now occurring. Found and
fixed a simple case where we were over-querying for data, and have a few
ideas for other places to look for more improvements.
Investigated how it might be possible to set DNS servers per process in
order to run multiple amplet clients on the same linux host without
putting them in individual containers. It isn't made obvious in libc how
to do this, but it seems to be possible by modifying some internal
resolver structures. If I set these right, then getaddrinfo() etc will
all work as normal except using the specified name server rather than
whatever is in /etc/resolv.conf. The alternative here seems to be
replacing the name resolution functions with another library or custom code.
Built new CentOS and Debian amplet packages for testing and deployed to
a test machine to check that both old and new versions of the transfer
format could be saved. After a bit of tweaking to the save functions
this looks to work fine.
Tested the full data path from capture to display, which included fixing
the way aggregation of data streams is performed for matrix tooltips.
Everything works well together, except the magic new aggregation
function fails in the case where entire bins are NULL. Will have to
spend some time next week making this work properly.
Wrote some more unit tests for the amplet client testing address
binding, sending data and scheduling tests. While doing so, found what
appears to be a bug in scheduling tests with period end times that were
shorter than hour/day/week.
Updated the throughput test to report data in a manner more consistent
with the other tests, including sending an ampname for the test target.
Added some simple unit tests to the throughput test to check connection
establishment/hello/configuration messages between server and client.
Updated the control socket to properly listen on specific
interfaces/addresses for both IPv4 and IPv6, rather than listening on
all or one single address.
Added long options to all of the standalone tests that didn't have them,
to be consistent with the manpages and other tests.
Fixed the parsing of reported test data to properly null fields that are
undefined because we received no response. This lets the database
insertion code properly record them without a value rather than storing
zero. Also added code to deal with both the old and new protocol
versions so that we can keep data that was reported during the
Wrote a pair of sql aggregate functions to operate on our new data
formats and perform percentile calculations across arrays of values and
single values. These should hopefully be able to replace some of the
more confusing query code with a simple call to the appropriate
Updated the HTTP test to use a particular source address or interface if
specified. Though libcurl has options to set one of these, it doesn't
work well in the case where you need to set both an IPv4 and IPv6 source
address, before knowing what the target name resolves to. Luckily it has
a callback to completely replace the call to socket(), after name
resolution, so I can create my own socket and bind to the appropriate
Had similar problems while updating the throughput test to use a
particular interface - need to listen on both address families and then
once the control connection happens, make sure the test connection is on
the same interface.
Updated the DNS test to query the local servers listed in
/etc/resolv.conf by default if no targets are given. This works fine for
the standalone test, but it's not quite clear the best way to schedule a
test like this when it may get merged with others that do have destinations.
Added a few new unit tests for the DNS test coding/encoding of names and
fixed a few things that unit testing, valgrind and different versions of
Spent some time looking at paris traceroute and the AMP traceroute test.
Turns out that our traceroute already keeps the important header fields
stable during a test run and so behaves like paris. Confirmed this with
the fakeroute tool used to test paris traceroute.
Got the new throughput test running both as a single binary, standalone
test in the style of iperf and as part of the amplet2 server. It no
longer blocks when the remote end doesn't like our SSL credentials as we
no longer expect them to follow the proper shutdown protocol, and I now
correctly check success of SSL reads and writes.
Wrote an initial attempt at a python throughput data extractor to use in
nntsc, but it is currently missing the rest of the chain (dataparser,
database tables, etc).
Spent a bit of time trying to polish small areas of documentation and
unit tests while waiting for throughput tests to run. Made basic
manpages and started work on adding a few more tests to check code
Added the ability to set the source interface/address in AMP so that all
tests will use the specified source. Individual tests can also be
configured in the schedule to use a particular source.
Updated the sample configuration files to include all the new config
options that have been added recently, with a small bit of documentation
about how they all work.
Started working on getting the new throughput test that Richard wrote to
work with the new remote-server code. The throughput server will now be
started when something asks for it on the control connection. I have
some issues around the control connection blocking or closing early,
making it hard to send/receive the listening port number, but quickly
hacking around that makes the test work - just need to find the correct
and tidy solution to this.
Built a new Debian Wheezy image for the emulation network, for John to
use with his virtual machine testing.