Brendon Jones's blog
Made some changes to the amplet client in response to things I observed
while installing test clients for the Lightwire machine. Changed the log
level of some informational messages to avoid filling logfiles,
rearranged startup to create the pidfile earlier so it works better with
puppet, and added some more smarts for guessing the ampname when one isn't
supplied. Also rearranged some directory structure to better represent
the python modules involved.
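As a rough illustration of the ampname fallback, here is a minimal sketch. It assumes the guess comes from the machine's short hostname, which may not be what the amplet client actually does. The function name `guess_ampname` and its parameters are hypothetical.

```python
import socket


def guess_ampname(configured=None, hostname=None):
    """Return the configured ampname, or guess one when none is supplied.

    Assumption: the guess is the lowercased short hostname. The real
    client may use different rules.
    """
    if configured:
        return configured
    # Allow the hostname to be passed in (handy for testing); otherwise
    # ask the operating system for it.
    name = hostname if hostname is not None else socket.gethostname()
    short = name.split(".")[0].strip().lower()
    # Fall back to a placeholder if the hostname is empty.
    return short or "unknown"
```

With this sketch, an explicitly supplied name always wins and the hostname is only consulted as a fallback.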
Found and fixed a few bugs in various things on the server side as well.
Values from the new dropdowns weren't being fetched appropriately in
some cases, percentage loss was sometimes calculated incorrectly and
incomplete traceroute paths weren't being stored correctly.
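For reference, a percentage loss calculation can be sketched as below. This is not the AMP server code and does not reproduce the bug that was fixed. It is just an illustration of the edge cases (no probes sent, more responses than probes) that are easy to get wrong. The function name `loss_percent` is hypothetical.

```python
def loss_percent(sent, received):
    """Return packet loss as a percentage of probes sent.

    Returns None when nothing was sent, since loss is undefined then.
    """
    if sent <= 0:
        return None
    # Clamp at zero so duplicate responses don't give negative loss.
    lost = max(sent - received, 0)
    return 100.0 * lost / sent
```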
Got the event detection systems up and running on the Lightwire machine,
which was delayed due to issues with embedded R behaving slightly
differently in the Jessie version. Also spent some time with Shane
chasing up some unusual-looking events and unusual merging of event groups.
Brad and I finished updating the last of the reachable amplets to Debian
Wheezy, which brings us up to 13 monitors all running the new code now.
Updated the website auth to allow locking down the graphs and the
configuration pages separately. Also added an option to hide/show the
terms of service on the login page. This should hopefully cover the
different public and private deployments we have.
Installed my scheduling branch on the new Lightwire server and started
working through an example client install to make sure that all the
components are operating correctly. Set up a new amplet client
configured to fetch keys, schedules etc from the server, which works
fine. Created meshes and began creating test schedules through the web
interface and fixed some missing parts: matrix metadata tables were not
being updated with mesh/test combinations, and target sites weren't being
created in some codepaths. Also found and fixed some test controls that
weren't properly updating the test schedule modals.
Spent some more time working with Brad updating more amplets to Wheezy.
We've got the bulk of them done now, just a few more of the older (and
significantly slower!) ones to go.
Worked with Brad to update a test amplet and the first 2 production
amplets from Debian Lenny to Wheezy. Everything has gone well so far,
though some of the older machines have a lot of cruft to tidy up (some
have already been through multiple Debian upgrades in the past!).
Hopefully we can get the rest of the machines sorted over the next few
Built new 32bit amplet2 client packages for deployment on the NZ AMP
mesh as the machines are updated. Extracted all the current
configuration from the database on erg to use as a configuration guide
while updating them.
Spent some time getting all the AMP server components
(website/events/storage/etc) installed on the new Lightwire server. This
is the first time that most of these components have been installed in
this configuration and the first time on Jessie, so the process wasn't
particularly smooth. Everything is now installed and running without any
clients, so the next step will be to see if I can configure a new client
using the new web interface.
Brad added a new select dropdown widget that includes filtering of the
option list, and I spent some time adding missing functionality to it.
Keyboard navigation should all work as expected now -
pageup/pagedown/home/end all move around the list, and tab will select
and move on to the next input element. I also integrated this with all
of the dropdowns I've added for scheduling and site management which
involved making sure all the onchange events were properly hooked up and
that they properly followed visibility changes as dynamic forms are updated.
Spent some time tidying up labels, styling, etc on the scheduling pages
to make sure they are consistent with each other, and showing the right
level of information. Found and fixed a few instances where similar
fields were named differently between meshes and sites, leading to
missing data being displayed.
Spent most of my time working on input validation when editing schedules
and sites, and making sure that buttons and fields were enabled
appropriately based on user actions. Also updated some of the templates
to use the longer display names where possible (rather than short
internal names), and link them to the appropriate configuration pages.
Added permissions to the security model to allow separation between
users that are allowed to view the data and those that are allowed to
Confirmed that the data I was getting from the throughput test was the
same from both the 32bit and 64bit amplet clients. Initially they were
reporting quite different data, but after comparing TCP settings I
discovered that the 64bit VM had been tuned somewhat. After applying
the same values to the 32bit VM they now agree.
Spent most of the week continuing to work on the test scheduling web
interface. The lists of meshes and sites are now the primary entry
points, and if you click through then you have access to the metadata
about the site/mesh and the specific schedule that applies. These can
all be edited to change the names displayed in the results interface,
and schedules that are updated are made available for amplet clients to
The layout and flow are mostly settled now, though they will likely be
updated after more frequent use. I've got the base functionality working
and have started adding some of the nice features that help make sure
the right data gets added, or inform the user what is expected. Slow and
Updated the HTTP test to run correctly with the newer libcurl libraries
on Debian Jessie. As part of that I tidied up the overly complicated
main loop, and fixed a parsing bug when encountering "hreflang"
attributes. Also updated the amplet client build system to be more
explicit about which libraries are included so newer gcc doesn't complain.
Added an alternate path through nntsc to use old-style AMP save
functions for test data that isn't in the new protocol buffer format.
Hopefully these are only temporary, but they will be required for a
while during the transition period as we update all the old clients.
Spent some time comparing data between amplets running on Wheezy and
Jessie, as well as with 32bit Wheezy clients to make sure that all the
data is consistent. Most of the tests look good, but the throughput
data doesn't appear to agree between monitors, and I still need to keep
an eye on the changes in the HTTP test to make sure they are fine.
Reworked the layout of the schedule webpages to include more information
about meshes/sites, and link them together for easy navigation.
Spent some time trying to diagnose the cause of incorrectly formatted
data being reported to prophet. All my test Wheezy clients were
reporting data that had correct-looking elements, but didn't follow the
protocol. Thinking that it might be a bug in the Wheezy version of a
library, I started testing in Jessie before realising that the wrong
version of the reporting code was being used.
In the process I fixed NNTSC to deal better with malformed protocol
buffer data, and got the amplet client (and most of the tests) working
on Jessie. There are quite a few changes in libraries between Wheezy and
Jessie, in particular libpcap now needs to be forced into immediate mode
if you want to receive packets in a timely fashion. Something has also
changed in the behaviour of libcurl that I am still trying to track down.
Also fixed a possible race condition in the tcpping test that might lose
the response to the first probe, and tidied up some of the logging when
clients shut down so that it still happens when triggered by the init script.
Tried a couple of different approaches to display test schedules for a
site/mesh and settled on a fairly condensed table. The most useful
information is shown in brief, and you can click on a row to expand it
and see the full listing of test arguments, targets, etc. From here you
can also add new tests to a site/mesh, with nice bootstrap modals and
human readable text hiding most of the raw scheduling options, which is
important if we want people to be able to easily update test schedules.
Spent some time working with Brad on the upgrade process for the amplets
running puppet. As part of that I updated the amplet client package with
logrotate configuration, and updated the init script to wait until it
was sure the client had started so that puppet didn't get overzealous
and try to start multiple copies. Also had to track down a few instances
of unusual behaviour to determine that everything was in fact acting as
Spent some time designing a new test schedule for an upcoming
deployment. Did the maths around how much storage is required per test
at the frequencies I want to measure, so hopefully I have a pretty good
handle on the amount of storage required to support the schedule.
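The storage maths is simple enough to sketch in a few lines. The intervals, target counts and bytes-per-row figure in the usage note are made-up placeholders, not the numbers from the real schedule, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def rows_per_day(interval_seconds, targets):
    """Rows stored per day for one test, assuming one row per target per run."""
    return (SECONDS_PER_DAY // interval_seconds) * targets


def storage_bytes(schedule, bytes_per_row, days):
    """Estimate storage for a schedule over a number of days.

    schedule is a list of (interval_seconds, targets) pairs, one per test.
    bytes_per_row is an assumed average row size, index overhead included.
    """
    total_rows = sum(rows_per_day(i, t) for i, t in schedule) * days
    return total_rows * bytes_per_row
```

For example, `storage_bytes([(10, 20), (600, 5)], 100, 365)` estimates a year of one test run every 10 seconds to 20 targets plus another run every 10 minutes to 5 targets, at an assumed 100 bytes per row.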
Investigated how our current database trimming scripts work, and why
they are so slow to remove data from prophet's database - turns out that
there hadn't been an analyze run in quite some time (vacuum/analyze had
been "temporarily" disabled). Tried another look at timestamp
partitioning with a generic trigger (that wouldn't require upkeep) and
got it working, but inserting with a trigger is approximately 10 times
Picked up the work I had been doing previously on a web-based interface
to AMP test scheduling. Got it working again and mostly up to date with
the changes in the develop branch, before spending some time reading up
more on some possible visualisation techniques for schedules. Still not
sure there is a useful visual approach to this, and the right answer may
just be to very simply present the data in some sort of list or table.