Brendon Jones's blog

14 Apr 2014

Tidied up some arbitrarily sized buffers in the icmp test to be the
actual size required for the data. Accidentally made them too small, so
fixed that and then wrote some more unit tests to cover the
sending/receiving of data and buffer management. Also updated the icmp
test to short-circuit the loss wait timeout once all data has been
accounted for. Previously it always waited a minimum of 200ms, even if
all responses had been received.
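
The early exit from the loss wait can be sketched roughly as follows. This is a Python illustration rather than the icmp test's own code; `wait_for_responses`, `poll_response` and the `outstanding` set are hypothetical names, and only the 200ms figure comes from the text above.

```python
import time


def wait_for_responses(outstanding, poll_response, loss_timeout=0.2):
    """Wait up to loss_timeout seconds, returning early once nothing is outstanding.

    outstanding: set of sequence numbers still awaiting a reply (hypothetical).
    poll_response: callable returning a received sequence number, or None.
    Returns the set of sequence numbers that are treated as lost.
    """
    deadline = time.monotonic() + loss_timeout
    while outstanding and time.monotonic() < deadline:
        seq = poll_response()
        if seq is None:
            time.sleep(0.001)  # nothing ready yet, avoid spinning
            continue
        outstanding.discard(seq)
    return outstanding
```

The loop condition checks `outstanding` first, so the function returns as soon as every probe is accounted for instead of sleeping out the full timeout.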

Spent some time examining query logs from the newly migrated test
database on prophet to see where slowdowns were now occurring. Found and
fixed a simple case where we were over-querying for data, and have a few
ideas for other places to look for more improvements.

Investigated how it might be possible to set DNS servers per process in
order to run multiple amplet clients on the same Linux host without
putting them in individual containers. libc doesn't make it obvious how
to do this, but it seems to be possible by modifying some internal
resolver structures. If I set these right, then getaddrinfo() etc. will
all work as normal except using the specified name server rather than
whatever is in /etc/resolv.conf. The alternative here seems to be
replacing the name resolution functions with another library or custom code.

07 Apr 2014

Built new CentOS and Debian amplet packages for testing and deployed to
a test machine to check that both old and new versions of the transfer
format could be saved. After a bit of tweaking to the save functions
this looks to work fine.

Tested the full data path from capture to display, which included fixing
the way aggregation of data streams is performed for matrix tooltips.
Everything works well together, except the magic new aggregation
function fails in the case where entire bins are NULL. Will have to
spend some time next week making this work properly.

Wrote some more unit tests for the amplet client, covering address
binding, sending data and scheduling tests. While doing so, found what
appears to be a bug in scheduling tests with period end times that were
shorter than hour/day/week.

31 Mar 2014

Updated the throughput test to report data in a manner more consistent
with the other tests, including sending an ampname for the test target.
Added some simple unit tests to the throughput test to check connection
establishment/hello/configuration messages between server and client.

Updated the control socket to properly listen on specific
interfaces/addresses for both IPv4 and IPv6, rather than listening on
all or one single address.
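
Listening on a chosen set of addresses across both families usually means one socket per address. The sketch below shows that pattern in Python using only standard socket calls; `listen_on` and its parameters are hypothetical names, not the control socket's actual interface.

```python
import socket


def listen_on(addresses, port=0, backlog=16):
    """Open one listening TCP socket per literal address, IPv4 or IPv6."""
    sockets = []
    for address in addresses:
        # AI_NUMERICHOST: the address is a literal, so no DNS lookup happens.
        family, socktype, proto, _, sockaddr = socket.getaddrinfo(
            address, port, type=socket.SOCK_STREAM,
            flags=socket.AI_NUMERICHOST | socket.AI_PASSIVE)[0]
        sock = socket.socket(family, socktype, proto)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if family == socket.AF_INET6:
            # Keep the IPv6 socket from also claiming IPv4 traffic.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        sock.bind(sockaddr)
        sock.listen(backlog)
        sockets.append(sock)
    return sockets
```

A caller would then wait on all returned sockets together (for example with `select`) and accept from whichever becomes readable.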

Added long options to all of the standalone tests that didn't have them,
to be consistent with the manpages and other tests.

Fixed the parsing of reported test data to properly null fields that are
undefined because we received no response. This lets the database
insertion code properly record them without a value rather than storing
zero. Also added code to deal with both the old and new protocol
versions so that we can keep data that was reported during the
database/nntsc upgrade.
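
Recording missing values as NULL rather than zero, for either protocol version, might look like the sketch below. Everything about the layout is hypothetical: the version numbers, the field names, and the assumption that the old format marked a missing reply with a negative RTT are invented for illustration and are not the real AMP report format.

```python
def normalise_result(version, item):
    """Return a row for insertion, with None where no response was received."""
    if version == 1:
        # Assumed old format: a negative rtt means no reply arrived.
        replied = item["rtt"] >= 0
    elif version == 2:
        # Assumed new format: an explicit flag says whether a reply arrived.
        replied = item["replied"]
    else:
        raise ValueError("unknown protocol version: %r" % version)
    return {
        "target": item["target"],
        "rtt": item["rtt"] if replied else None,
        "ttl": item.get("ttl") if replied else None,
    }
```

Keeping `None` distinct from `0` matters because a genuine zero is a valid measurement, while a missing response should not drag averages or percentiles down.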

Wrote a pair of SQL aggregate functions to operate on our new data
formats and perform percentile calculations across arrays of values and
single values. These should hopefully be able to replace some of the
more confusing query code with a simple call to the appropriate
aggregate function.
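
To show the kind of calculation such an aggregate performs, here is a Python sketch, not the SQL functions themselves. It assumes a nearest-rank percentile and assumes that NULLs are skipped, with an all-NULL input yielding NULL; the real functions may define both differently.

```python
import math


def percentile(samples, fraction):
    """Nearest-rank percentile over a mix of scalars, arrays and None (NULL)."""
    values = []
    for sample in samples:
        if sample is None:
            continue  # a NULL row contributes nothing
        if isinstance(sample, (list, tuple)):
            values.extend(v for v in sample if v is not None)
        else:
            values.append(sample)
    if not values:
        return None  # every input was NULL
    values.sort()
    rank = max(1, math.ceil(fraction * len(values)))
    return values[rank - 1]
```

Flattening arrays and single values into one list first means the same function serves both the array-valued and scalar-valued columns.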

24 Mar 2014

Updated the HTTP test to use a particular source address or interface if
specified. Though libcurl has options to set one of these, it doesn't
work well in the case where you need to set both an IPv4 and IPv6 source
address, before knowing what the target name resolves to. Luckily it has
a callback to completely replace the call to socket(), after name
resolution, so I can create my own socket and bind to the appropriate
source address.
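
The callback in question is presumably libcurl's `CURLOPT_OPENSOCKETFUNCTION`, which is handed the address family once resolution is done. The same idea, choosing the source address only after the family is known, can be sketched in Python with plain sockets. The function names and the `sources` mapping are hypothetical, and this is not the HTTP test's code.

```python
import socket


def open_bound_socket(family, socktype, proto, sources):
    """Create a socket and bind it to the source address configured for its family.

    sources: dict mapping an address family to a source address string.
    """
    sock = socket.socket(family, socktype, proto)
    source = sources.get(family)
    if source is not None:
        try:
            sock.bind((source, 0))  # port 0: let the kernel pick a source port
        except OSError:
            sock.close()
            raise
    return sock


def connect_from(host, port, sources):
    """Resolve host, then connect from the matching per-family source address."""
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = open_bound_socket(family, socktype, proto, sources)
        except OSError as err:
            last_error = err
            continue
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_error = err
            sock.close()
    raise last_error or OSError("no addresses found for %s" % host)
```

Because the bind happens per resolved address, both an IPv4 and an IPv6 source can be configured up front without knowing which family the target will use.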

Had similar problems while updating the throughput test to use a
particular interface - need to listen on both address families and then
once the control connection happens, make sure the test connection is on
the same interface.

Updated the DNS test to query the local servers listed in
/etc/resolv.conf by default if no targets are given. This works fine for
the standalone test, but it's not yet clear how best to schedule a test
like this when it may get merged with others that do have destinations.
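
Picking up the default targets amounts to reading the `nameserver` lines from the resolver configuration. A minimal Python sketch, with hypothetical function names and no claim to match how the DNS test does it:

```python
def parse_nameservers(text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def local_nameservers(path="/etc/resolv.conf"):
    """Read the given file and return the name servers it lists."""
    with open(path) as conf:
        return parse_nameservers(conf.read())
```

Calling `local_nameservers()` when no targets are given yields the list to use as default destinations.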

Added a few new unit tests for the DNS test encoding/decoding of names and
fixed a few things that unit testing, valgrind and different versions of
gcc noticed.
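
For reference, DNS names on the wire are a sequence of length-prefixed labels ending in a zero byte. The Python sketch below shows a round trip of that encoding; it handles uncompressed names only, and the function names are hypothetical rather than taken from the DNS test.

```python
def encode_name(name):
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the root name encodes as a single zero byte
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 bytes: %r" % label)
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def decode_name(data, offset=0):
    """Decode an uncompressed name, returning (name, offset after the name)."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compression pointers are not handled in this sketch")
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset
```

Round-trip checks such as `decode_name(encode_name(x))` make convenient unit tests, since any off-by-one in the length handling shows up immediately.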

Spent some time looking at Paris traceroute and the AMP traceroute test.
Turns out that our traceroute already keeps the important header fields
stable during a test run and so behaves like Paris traceroute. Confirmed
this with the fakeroute tool used to test Paris traceroute.

18 Mar 2014

Got the new throughput test running both as a single binary, standalone
test in the style of iperf and as part of the amplet2 server. It no
longer blocks when the remote end doesn't like our SSL credentials as we
no longer expect them to follow the proper shutdown protocol, and I now
correctly check success of SSL reads and writes.

Wrote an initial attempt at a Python throughput data extractor to use in
nntsc, but it is currently missing the rest of the chain (dataparser,
database tables, etc).

Spent a bit of time trying to polish small areas of documentation and
unit tests while waiting for throughput tests to run. Made basic
manpages and started work on adding a few more tests to check code
behaviour.

10 Mar 2014

Added the ability to set the source interface/address in AMP so that all
tests will use the specified source. Individual tests can also be
configured in the schedule to use a particular source.

Updated the sample configuration files to include all the new config
options that have been added recently, with a small bit of documentation
about how they all work.

Started working on getting the new throughput test that Richard wrote to
work with the new remote-server code. The throughput server will now be
started when something asks for it on the control connection. I have
some issues around the control connection blocking or closing early,
making it hard to send/receive the listening port number, but quickly
hacking around that makes the test work - just need to find the correct
and tidy solution to this.

Built a new Debian Wheezy image for the emulation network, for John to
use with his virtual machine testing.

03 Mar 2014

Spent some time fighting with automake to get the right build options to
be passed down to all the makefiles to allow me to build standalone
tests in AMP using remote test servers. Tidied up various loose ends
when starting remote servers and wrote a very simple example test to
show how starting them could work. It doesn't do much except print out
port numbers, but it calls the new functions and deals with the results.

Got regular schedule fetching happening in a separate thread so that the
test schedule can be kept up to date without requiring puppet (or
similar). If there is a change then the main loop gets signaled to load
the new schedule.
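
The fetch-in-a-thread pattern can be sketched as below. This is a Python illustration only: it assumes an event object stands in for whatever signal the main loop really receives, and `schedule_watcher`, `fetch` and the interval are hypothetical.

```python
import threading


def schedule_watcher(fetch, changed, stop, interval=60.0):
    """Poll fetch() until stop is set, setting changed when the schedule differs.

    fetch: callable returning the latest schedule contents, or None on failure.
    changed: threading.Event the main loop checks before reloading.
    stop: threading.Event used to shut the watcher down.
    """
    current = None
    while not stop.is_set():
        latest = fetch()
        if latest is not None and latest != current:
            current = latest
            changed.set()
        stop.wait(interval)  # sleeps, but wakes immediately on shutdown
```

The main loop would call `changed.clear()` after reloading, so that only a later change triggers another reload.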

Brad put together a machine for testing databases and filled a large
PostgreSQL database with data similar to AMP. Spent some time with him
looking at query performance and seeing what we can do to improve it.

24 Feb 2014

Added SSL support to the amplet client for querying a remote server to
fetch schedule files. This should give us the ability to have clients we
don't really control stay up to date with test schedules, but needs a
bit more thought put into how often it should run and how it should
interact with the main schedule process.

Added a control server to the amplet client that will accept connections
from other clients that require specific test servers to be run (e.g.
throughput tests), and run them. Currently it accepts the id of the test
to be run and returns a port number that the new server is running on so
that the test knows where to connect to. Wrapped all this up in SSL as
well, validating both the certificate and the hostname/commonname, but
not yet checking revocation of certs.
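
For comparison, this is how the same three checks (certificate, hostname, and optionally revocation) are switched on with Python's standard `ssl` module. It is an illustration of the settings involved, not the amplet client's code; `make_client_context` and its parameters are hypothetical.

```python
import ssl


def make_client_context(cafile=None, check_revocation=False):
    """Build a client-side TLS context that verifies the peer certificate and name."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    # create_default_context already enables both; restated to make the intent explicit.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    if check_revocation:
        # Needs CRLs loaded into the context, otherwise handshakes will fail.
        context.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return context
```

Leaving revocation as an opt-in flag mirrors the state described above, where certificates and names are validated but revocation is not yet checked.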

17 Feb 2014

Spent some time working on things to help keep the amplet code clean and
tidy. Added stricter compilation options and fixed up some cases where
these triggered warnings. Started working on unit tests for amplet based
on the built-in automake target "check". Wrote very simple unit tests
for the icmp and traceroute tests as well as the nametable management.
While writing the nametable unit tests I found and fixed a bug that
would limit the nametable to only a single item.

Briefly had a look at different database options available to us that
might perform better with our data than postgres. There are still
further optimisations we can make to how we store our data in postgres,
but it will be interesting to see how they compare to something like
Cassandra, HBase or Riak.

10 Feb 2014

Tidied up the reporting done on the icmp, traceroute and dns tests in
AMP to use variable length strings for names, as well as properly
packing and byteswapping the reporting structures. The average report
message size should now be much smaller than it was. Also updated the
nntsc plugins for amp data to deal with the new format.
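
A packed, network-byte-order record with a variable-length name might look like the following Python sketch. The field layout (32-bit RTT, 8-bit TTL, 16-bit name length) is invented for illustration and is not the actual AMP report format.

```python
import struct

# Hypothetical header: rtt (u32), ttl (u8), name length (u16).
# "!" selects network byte order with no padding between fields.
HEADER = struct.Struct("!IBH")


def pack_item(rtt_us, ttl, name):
    """Pack one result as a fixed header followed by the name bytes."""
    raw = name.encode("ascii")
    return HEADER.pack(rtt_us, ttl, len(raw)) + raw


def unpack_item(data, offset=0):
    """Unpack one result, returning ((rtt_us, ttl, name), next offset)."""
    rtt_us, ttl, length = HEADER.unpack_from(data, offset)
    start = offset + HEADER.size
    name = data[start:start + length].decode("ascii")
    return (rtt_us, ttl, name), start + length
```

With this layout a five-character name costs 12 bytes in total, whereas a fixed-size name field would cost the same large amount for every entry regardless of name length.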

Tweaked the parser for the http test to better ignore strings that look
a little like URLs within JavaScript blocks. It will still fetch
JavaScript files that are sourced, but won't try to include URLs that
are generated on the fly within the <script> block.
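
The distinction between sourced scripts and URLs built inside a script body can be illustrated with Python's standard HTML parser, which treats script contents as opaque text. This is a stand-in for the idea, not the http test's parser, and `ObjectFinder` is a hypothetical name.

```python
from html.parser import HTMLParser


class ObjectFinder(HTMLParser):
    """Collect URLs named in tag attributes, ignoring anything inside script bodies."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])
```

Feeding a page to `ObjectFinder().feed(page)` fills `urls` with the `src` of each sourced script, while markup written out by inline JavaScript never reaches `handle_starttag`.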