Brendon Jones's blog
Worked with Brad to update a test amplet and the first 2 production
amplets from Debian Lenny to Wheezy. Everything has gone well so far,
though some of the older machines have a lot of cruft to tidy up (some
have already been through multiple Debian upgrades in the past!).
Hopefully we can get the rest of the machines sorted over the next few
Built new 32bit amplet2 client packages for deployment on the NZ AMP
mesh as the machines are updated. Extracted all the current
configuration from the database on erg to use as a configuration guide
while updating them.
Spent some time getting all the AMP server components
(website/events/storage/etc) installed on the new Lightwire server. This
is the first time that most of these components have been installed in
this configuration and the first time on Jessie, so the process wasn't
particularly smooth. Everything is now installed and running without any
clients, so the next step will be to see if I can configure a new client
using the new web interface.
Brad added a new select dropdown widget that includes filtering of the
option list, and I spent some time adding missing functionality to it.
Keyboard navigation should all work as expected now -
pageup/pagedown/home/end all move around the list, and tab will select
and move on to the next input element. I also integrated this with all
of the dropdowns I've added for scheduling and site management, which
involved making sure all the onchange events were hooked up correctly and
that the dropdowns followed visibility changes as dynamic forms are updated.
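The keyboard handling described above boils down to mapping a key to a new highlighted position in the filtered option list. The following is only a rough model of that logic in Python, not the widget's actual code: the function name, the key names, and the page size of 10 are all assumptions made for illustration.

```python
def move_selection(index, key, count, page=10):
    """Return the new highlighted index after a navigation key press.

    index: currently highlighted option (0-based)
    key:   one of "PageUp", "PageDown", "Home", "End" (hypothetical names)
    count: number of options left after filtering
    page:  how many options one page jump covers (assumed value)
    """
    if count == 0:
        # Nothing matches the filter, so nothing can be highlighted.
        return -1
    moves = {
        "PageUp": index - page,
        "PageDown": index + page,
        "Home": 0,
        "End": count - 1,
    }
    # Unknown keys leave the selection where it is.
    target = moves.get(key, index)
    # Clamp so page jumps near either end stop at the first/last option.
    return max(0, min(count - 1, target))
```

Clamping in one place keeps the four keys consistent when the filter shrinks the list to fewer options than a full page.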
Spent some time tidying up labels, styling, etc on the scheduling pages
to make sure they are consistent with each other, and showing the right
level of information. Found and fixed a few instances where similar
fields were named differently between meshes and sites, leading to
missing data being displayed.
Spent most of my time working on input validation when editing schedules
and sites, and making sure that buttons and fields were enabled
appropriately based on user actions. Also updated some of the templates
to use the longer display names where possible (rather than short
internal names), and link them to the appropriate configuration pages.
Added permissions to the security model to allow separation between
users that are allowed to view the data and those that are allowed to
Confirmed that the data I was getting from the throughput test was the
same from both the 32bit and 64bit amplet clients. Initially they were
reporting quite different data, but after comparing TCP settings I
discovered that the 64bit VM had been tuned somewhat - after applying
the same values to the 32bit VM they now agree.
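A comparison like this can be done by dumping the kernel settings on each machine (for example with `sysctl -a`) and diffing the results. The sketch below assumes that approach and assumes the interesting keys live under `net.ipv4.tcp_`; the post does not say which settings actually differed, and the function names are made up for illustration.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" lines) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Collapse tabs/spaces so "4096\t87380" and "4096 87380" compare equal.
            settings[key.strip()] = " ".join(value.split())
    return settings


def diff_settings(a, b, prefix="net.ipv4.tcp_"):
    """Return {key: (value_in_a, value_in_b)} for differing keys under prefix.

    A key missing on one side shows up with None for that side.
    """
    keys = {k for k in set(a) | set(b) if k.startswith(prefix)}
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(keys)
        if a.get(k) != b.get(k)
    }
```

Feeding it the saved output from both VMs gives a short list of settings to copy across, rather than having to compare the full dumps by eye.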
Spent most of the week continuing to work on the test scheduling web
interface. The lists of meshes and sites are now the primary entry
points, and if you click through then you have access to the metadata
about the site/mesh and the specific schedule that applies. These can
all be edited to change the names displayed in the results interface,
and schedules that are updated are made available for amplet clients to
The layout and flow are mostly settled now, though they will likely be
updated after more frequent use. I've got the base functionality working
and have started adding some of the nice features that help make sure
the right data gets added, or inform the user what is expected. Slow and
Updated the HTTP test to run correctly with the newer libcurl libraries
on Debian Jessie. As part of that I tidied up the overly complicated
main loop, and fixed a parsing bug when encountering "hreflang"
attributes. Also updated the amplet client build system to be more
explicit about which libraries are included so newer gcc doesn't complain.
Added an alternate path through nntsc to use old-style AMP save
functions for test data that isn't in the new protocol buffer format.
Hopefully this is only temporary, but it will be required for a
while during the transition period as we update all the old clients.
Spent some time comparing data between amplets running on Wheezy and
Jessie, as well as with 32bit Wheezy clients to make sure that all the
data is consistent. Most of the tests look good, except the throughput
data doesn't appear to agree between monitors and I still need to keep
an eye on the changes in the HTTP test to make sure that is fine.
Reworked the layout of the schedule webpages to include more information
about meshes/sites, and link them together for easy navigation.
Spent some time trying to diagnose the cause of incorrectly formatted
data being reported to prophet. All my test Wheezy clients were
reporting data that had correct-looking elements, but didn't follow the
protocol. Thinking that it might be a bug in the Wheezy version of a
library, I started testing in Jessie before realising that the wrong
version of the reporting code was being used.
In the process I fixed NNTSC to deal better with malformed protocol
buffer data, and got the amplet client (and most of the tests) working
on Jessie. There are quite a few changes in libraries between Wheezy and
Jessie; in particular, libpcap now needs to be forced into immediate mode
if you want to receive packets in a timely fashion. Something has also
changed in the behaviour of libcurl that I am still trying to track down.
Also fixed a possible race condition in the tcpping test that might lose
the response to the first probe, and tidied up some of the logging when
clients shut down so that it still happens when triggered by the init script.
Tried a couple of different approaches to display test schedules for a
site/mesh and settled on a fairly condensed table. The most useful
information is shown in brief, and you can click on a row to expand it
and see the full listing of test arguments, targets, etc. From here you
can also add new tests to a site/mesh, with nice bootstrap modals and
human-readable text hiding most of the raw scheduling options, which is
important if we want people to be able to easily update test schedules.
Spent some time working with Brad on the upgrade process for the amplets
running puppet. As part of that I updated the amplet client package with
logrotate configuration, and updated the init script to wait until it
was sure the client had started so that puppet didn't get overzealous
and try to start multiple copies. Also had to track down a few instances
of unusual behaviour to determine that everything was in fact acting as
Spent some time designing a new test schedule for an upcoming
deployment. Did the maths around how much storage is required per test
at the frequencies I want to measure, so I hopefully have a pretty good
handle on the amount of storage required to support the schedule.
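The storage arithmetic is simple enough to sketch: each test produces one result row per target per run, so storage grows with test frequency, target count, and retention time. The field names (`interval`, `targets`, `row_bytes`) and the idea of a fixed per-row size are assumptions for illustration; the real schedule and row sizes are not given here.

```python
SECONDS_PER_DAY = 86400


def rows_per_day(interval_seconds, targets):
    """Number of result rows one test produces per day across all targets."""
    return (SECONDS_PER_DAY // interval_seconds) * targets


def estimate_bytes(schedule, days):
    """Estimate storage for a schedule kept for `days` days.

    schedule: list of dicts with hypothetical keys
        interval  - seconds between runs of the test
        targets   - number of destinations tested each run
        row_bytes - assumed average stored size of one result row
    """
    total = 0
    for test in schedule:
        daily_rows = rows_per_day(test["interval"], test["targets"])
        total += daily_rows * test["row_bytes"] * days
    return total
```

This ignores index and per-table overhead, so the result is better treated as a lower bound than as a precise figure.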
Investigated how our current database trimming scripts work, and why
they are so slow to remove data from prophet's database - turns out that
there hadn't been an analyze run in quite some time (vacuum/analyze had
been "temporarily" disabled). Tried another look at timestamp
partitioning with a generic trigger (that wouldn't require upkeep) and
got it working, but inserting with a trigger is approximately 10 times
Picked up the work I had been doing previously on a web-based interface
to AMP test scheduling. Got it working again and mostly up to date with
the changes in the develop branch, before spending some time reading up
more on some possible visualisation techniques for schedules. Still not
sure there is a useful visual approach to this, and the right answer may
just be to very simply present the data in some sort of list or table.
Brought a lot of amplet documentation up to date, including getopt
strings and man pages that had been lagging behind and did not include
many new command line options. Merged in my recent changes that
restructure the configuration file parsing and tidied a bunch of global
variables that were only required locally.
Spent some time updating the packaging for ampy/nntsc and creating new
packages for some of their new components.
Finished up the simple reporting test for the HTTP test, making sure
that the data going into the protocol buffers agrees with what we are
reading out of it.
Tidied up some amplet makefiles to reuse already built components - it
was often compiling the same code repeatedly when building standalone
tests and unit tests. We can be much more efficient by compiling all the
shared code together once, then compiling and linking in the small parts
Installed new amplet packages onto prophet to test reporting using
protocol buffers in a live environment. Found and fixed a couple of
minor issues with the way I was trying to access some protocol buffer
fields, and missing Python dependencies. The very first message was
invalid in some way, but after a day spent investigating it and trying
(and failing) to narrow down the cause I now have all our test clients
successfully running their full schedule. It hasn't happened again and
I'm hoping that it was just some data left in a queue somewhere or
Started work on some simple unit tests for the HTTP test, which has been
without them until now.
Spent Wednesday at the student honours conference. Was good to see all
our students had worked on their presentations since the practice run;
there were quite a few interesting talks of a high quality.