Brendon Jones's blog
Short week as I was off sick on Monday and Tuesday.
Spent some time looking into using a headless web testing environment
as an alternative to the current HTTP test. This would give us
don't (due to them being generated programmatically or obfuscated). Not
all of the headless testing software appears to give full access to the
events that I'm interested in, while some are written such that they
will be awkward to integrate into an AMP test. Currently looking at
embedded Chromium as most likely to be useful.
Started refactoring some of the configuration parsing code in amplet to
remove some unnecessary globals and remove some cruft from the main loop
that didn't really need to be there.
Updated the website authentication to make it easy to toggle on and off,
as we don't want to protect the public site. Merged this and the rest of
the recent changes (raw data fetching etc) back into the develop branch.
Spent some time looking into what appear to be periodic MTU issues on
one of our test connections that are preventing the throughput test from
running. Confusing matters is that I'm not sure how well the route cache
deals with network namespaces - it sometimes appears as if it is all
shared between all connections, but sometimes it doesn't. It's possible
these symptoms would go away with a newer kernel version (route cache
was removed, better network namespace support).
Fixed the URL parsing to allow partial specification of the desired
data. If the URL is incomplete then the user is given a list of valid
values for the next portion. Default values are automatically selected
if there is only a single possible value. If the URL is missing all
parameters then the user is presented with documentation giving a basic
overview of the API.
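The partial URL handling described above can be sketched roughly as
follows. This is a minimal illustration only, not the actual website
code: the STREAMS catalogue, the property order and the resolve()
function are all hypothetical names made up for the example.

```python
# Hypothetical catalogue: each known stream is a tuple of property values
# in the order they appear in the URL (test, source, destination, family).
STREAMS = [
    ("amp-icmp", "amplet-a", "target-1", "ipv4"),
    ("amp-icmp", "amplet-a", "target-2", "ipv4"),
    ("amp-dns", "amplet-b", "target-1", "ipv4"),
]


def resolve(path, streams=STREAMS):
    """Resolve a possibly incomplete URL path against the known streams."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        # No parameters at all: point the user at the API overview.
        return {"status": "docs", "message": "basic overview of the API"}

    # Keep only the streams whose leading properties match what was given.
    candidates = [s for s in streams if list(s[:len(parts)]) == parts]
    if not candidates:
        return {"status": "error", "selected": parts}

    width = len(streams[0])
    while len(parts) < width:
        options = sorted({s[len(parts)] for s in candidates})
        if len(options) > 1:
            # More than one valid value: list them for the next portion.
            return {"status": "incomplete", "selected": parts,
                    "options": options}
        # Only a single possible value, so select it automatically.
        parts.append(options[0])

    return {"status": "complete", "selected": parts}
```

With this catalogue, resolve("amp-dns") fills in every remaining property
automatically, while resolve("amp-icmp") selects the only source and then
stops to list the two possible destinations.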
Added some smarts to deal with all the different data columns that the
amp-latency tests can return (icmp, dns, tcpping are all slightly
different). This keeps a consistent order of columns and makes sure that
the labels all line up appropriately with the data.
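One way to keep the columns consistent is to order them against a fixed
preference list and fill in blanks for anything a particular test does
not report. The sketch below assumes each result is a dict; the column
names in PREFERRED_ORDER are hypothetical and are not the real
amp-latency column names.

```python
# Hypothetical preferred ordering; the real column names may differ.
PREFERRED_ORDER = ["timestamp", "rtt", "loss", "packet_size", "query_len"]


def ordered_columns(rows):
    """Return every column seen in rows, in a stable, predictable order."""
    present = set()
    for row in rows:
        present.update(row)
    # Known columns first, in the preferred order, then any others sorted.
    known = [c for c in PREFERRED_ORDER if c in present]
    extra = sorted(present - set(PREFERRED_ORDER))
    return known + extra


def to_table(rows):
    """Build (labels, data) so each value lines up with its label."""
    labels = ordered_columns(rows)
    data = [[row.get(c) for c in labels] for row in rows]
    return labels, data
```

Rows that lack a column get None in that position, so the labels and the
data always have the same width regardless of which test produced them.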
Updated the way data is fetched to be in a more sensible JSON format
that can easily be converted to CSV so that both formats can be supported.
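As a rough illustration of why a flat list of records works well here,
the sketch below turns a list of dicts (as would be decoded from JSON)
into CSV using only the Python standard library. The rows_to_csv() name
and the example columns are assumptions for the example, not the format
the website actually uses.

```python
import csv
import io
import json


def rows_to_csv(rows, columns):
    """Convert a list of dicts into CSV text with the given column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    # Missing keys are written as empty fields (DictWriter's default).
    writer.writerows(rows)
    return buf.getvalue()


# The same records can be served as JSON or as CSV.
rows = [{"timestamp": 1, "rtt": 10}, {"timestamp": 2}]
as_json = json.dumps(rows)
as_csv = rows_to_csv(json.loads(as_json), ["timestamp", "rtt"])
```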
Spent some time checking that normal behaviour was not impacted by some
small changes I had made to ampy, and tidying up a couple of places
where changes had accidentally affected graph drawing.
Continued to work on the raw data interface to fetch AMP data through
the website. It took some time to find the appropriate place to deviate
from the normal aggregated fetching used for the graphs, but with
minimal code changes there is now a path that will follow the full data
fetching used by standalone programs (e.g. netevmon).
Fetching now works for data described by a stream id, following almost
the same path as usual for graphs. To allow some degree of data
exploration and easy generation of URLs it's also important to deal with
data described by the human readable stream properties. I'm currently in
the process of converting a URL with stream properties into a stream id,
and alerting the user to missing properties that are required to define
Updated the HTTP test to not include time spent fetching objects that
eventually timed out, as all that was doing was recording the curl
timeout duration. Instead, we need to report the number of failed
objects, the last time an object was successfully/unsuccessfully fetched,
and possibly try to update the timeouts to match those commonly used by
Swapped the meaning of "in" and "out" for throughput tests, as
somewhere along the way they had been reversed. This involved updating
existing data in the database as well as the code that saves the data.
Added a bit more information to log messages to help identify the
specific amplet client that was responsible, as it was becoming
confusing in situations with multiple clients running on the same machine.
Started adding an interface to download raw data from the graph pages.
Partway through it was taking longer than expected, so took a slight
detour and wrote a standalone tool to dump the data from NNTSC.
Thursday was my first day back after my break, so spent some time
catching up on things that had happened while I was away.
Shane and Brad had found some unusual data being reported, so I looked
into that and updated schedules to help solve some of the problems. Also
exposed some more tuning knobs so that we can change inter-packet delay
when sending probes (we were sending too fast in some cases) and merged
in some fixes that Shane had written.
Built some new Debian packages with these changes and pushed them out,
which appears to have immediately improved the quality of the data we
Configured the third throughput test target and updated the test
schedule to properly include all three throughput test targets. Went
through all the results to make sure that all are reporting - found and
fixed a couple where incorrect HTTP targets had been set and redirects
were happening. Double checked that some unusual throughput results were
correct (they appear to be).
Spent some time investigating some connections that appeared to be up, but
wouldn't forward my data. The modem seems to think everything is fine
and there is nothing obviously wrong at my end, so I've asked for it to
Short week again with the Easter break. Finished the setup of the second
measurement machine and the central collector/graphing machine. Got the
measurement machine shipped, which should hopefully be physically
installed in a few days.
Added some temporary throughput tests to the main schedule to test
performance across different times of day to each of the sites. The
proper schedule will need to be slightly tweaked to include the extra
throughput targets. So far the test data shows the targets to be
Went to Auckland with Brad to install a new measurement machine. It's
now up and running and performing a subset of tests, but generally looks
to be doing the right thing. Will be watching the data and adding more
tests over the next little while.
Performed some testing on the first throughput targets that we have
available to make sure that they will be fast enough. Two out of the
three so far look good, but the third is probably not well connected
enough to push the amount of data we are expecting.
Started to install a second measurement machine based on the first, as
well as documenting parts of the process that will need to be performed
by remote hands.
Spent most of the week getting the last few things ready to go ahead of
the test deployment next week, including diagrams of how various
components fit together.
Fixed a copy and paste error when comparing schedule items that meant
that there was a small chance of two tests being considered the same
even if they had different end times. Also updated some error handling
branches to properly free some resources that had been forgotten about,
and had a general tidy up based on feedback from a slightly newer gcc.
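The schedule comparison bug is the kind that is easy to make when
comparing records field by field. The sketch below shows the intended
behaviour in Python rather than the C used by amplet; the ScheduleItem
fields and the same_schedule() name are hypothetical and only
illustrate the idea that every field, including the end time, has to be
compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleItem:
    # Hypothetical fields; the real amplet schedule item differs.
    test: str
    start_ms: int
    end_ms: int
    period_ms: int


def same_schedule(a, b):
    """Two items match only if every field matches, end time included."""
    return (a.test == b.test
            and a.start_ms == b.start_ms
            # A copy and paste slip here (for example comparing the
            # start time twice) would let different end times through.
            and a.end_ms == b.end_ms
            and a.period_ms == b.period_ms)
```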
Generated all the certificates for the test connections, and while doing
so found and fixed a bug that could prevent multiple certificates from
being generated when listed on the command line.
Built new packages with the recent minor updates and deployed them on
one of my test amplets to verify. While watching the results I found
some interesting behaviour with tests to www.amazon.com, where the
object counts are fluctuating between two values. A quick investigation
suggests that it's not caused by any changes to the test, but I haven't
discovered what is actually going on.