Brendon Jones's blog
Found and fixed a few bugs that had been reported after installing the new amplet2 client onto the first batch of live machines. Differences in the behaviour of libcurl between versions meant that URL fragments were handled differently depending on which version was installed, so I updated the parser to strip all fragments before fetching. Also updated URL splitting so that URLs with query parameters but no slash after the hostname are properly split on the '?', rather than the whole string being treated as the hostname. Fixed a timing issue where the next timeout value was overflowing on 32-bit clients, by making sure it is always treated as a 64-bit value.
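A minimal sketch of the two URL fixes, using plain C string handling rather than libcurl's own URL API. The functions strip_fragment(), split_url() and copy_range() are hypothetical illustrations, not the actual amplet2 parser. The key point is that the hostname ends at the first '/' or '?', whichever comes first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove any "#fragment" from the URL in place; fragments are only
 * meaningful to the client and are never sent to the server. */
static void strip_fragment(char *url)
{
    char *hash = strchr(url, '#');

    if (hash != NULL) {
        *hash = '\0';
    }
}

/* Return a newly allocated, nul-terminated copy of len bytes from start. */
static char *copy_range(const char *start, size_t len)
{
    char *out = malloc(len + 1);

    if (out != NULL) {
        memcpy(out, start, len);
        out[len] = '\0';
    }
    return out;
}

/*
 * Split "scheme://host[/path][?query]" into a host and a path. The host ends
 * at the first '/' or '?', so a query directly after the hostname is not
 * swallowed into it. If there is no '/', a root path is prepended.
 * Returns 0 on success (caller frees *host and *path), -1 on error.
 */
static int split_url(const char *url, char **host, char **path)
{
    const char *start;
    const char *end;

    assert(url != NULL && host != NULL && path != NULL);

    start = strstr(url, "://");
    if (start == NULL) {
        return -1;
    }
    start += 3;

    /* strcspn() gives the length of the prefix containing neither '/' nor '?' */
    end = start + strcspn(start, "/?");

    *host = copy_range(start, (size_t)(end - start));
    if (*end == '/') {
        *path = copy_range(end, strlen(end));
    } else {
        /* no slash: build "/" followed by the query (or nothing at all) */
        *path = malloc(strlen(end) + 2);
        if (*path != NULL) {
            (*path)[0] = '/';
            strcpy(*path + 1, end);
        }
    }

    if (*host == NULL || *path == NULL) {
        free(*host);
        free(*path);
        *host = *path = NULL;
        return -1;
    }
    return 0;
}
```

With this approach "http://example.com?a=b#top" becomes host "example.com" and path "/?a=b", rather than "example.com?a=b" being used as the hostname.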
Spent a bit of time trying to improve the amplet2 client configuration file. The default value of "vialocal" now depends on whether a rabbitmq-server is present, so that it should just do the right thing rather than needing to be set manually. Started splitting the SSL settings out from the collector settings, as they are used for a lot more than just the collector and having them in the same section was causing confusion.
Had a look at doing work during the Debian postinst stage, conditional on the package versions being upgraded from and to. Started using this with the new ampy packages to make sure that the database constraints are updated appropriately, and have other ideas for upgrade paths where it will be useful too.
Spent some time working on the test scheduling web interface to try to make the flow clearer for tests that don't require a hostname as a target (i.e. the HTTP test). Tests that don't require a hostname target will now hide that option and display the other options required. Still need to do some work around creating meshes of non-hostname targets.
Updated some more documentation to fill in gaps or bring it up to date after making changes. Based on feedback I changed some default values for configuration options, making some of them compulsory rather than defaulting to values that were not particularly useful.
Made various other smaller fixes - the traceroute test now uses the same library functions to receive and timestamp results as the other tests do, the Python ampsave code was tidied up slightly and made more consistent, the CentOS packaging scripts were updated to build new packages, etc.
Finally got the amplet2 code up on GitHub.
Found and fixed a few small bugs that had shown up in my recent testing packages, such as one where the DNS loss timeout was set too large, causing the test to be killed if any of the targets failed to respond. Also spent some time looking at the ASN lookup errors in my logs, and added further logging around them to try to track down the cause - so far it appears to be the server having issues, not us.
Created a GitHub repository for the amplet2 code and started adding documentation. Cleaned up the source to the manpages so that I can generate nicer markdown from them to put in the wiki. The code should hopefully be added in the very near future too; I'm just waiting to test a few things in the latest client so I can make a new release at the same time. Made a few last-minute tidy-ups to the repository ahead of this, making sure no autogenerated files remain and that I've removed unrelated content (e.g. libwandevent and librabbitmq CentOS spec files).
Updated CentOS spec files for the amplet2 client and new librabbitmq. Built and deployed new CentOS and Debian Wheezy packages onto a couple of test clients to test over the weekend.
While dealing with some user queries about test scheduling, found and fixed a bug in the scheduler. The default value for test frequency (used if not explicitly specified) had the wrong units, which caused tests to be scheduled more often than intended. Also fixed a couple of places where wrapping could occur, and wrote some unit tests to cover those code paths.
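One way to avoid both the unit confusion and the wrapping is to carry the unit in the name and widen to 64 bits before multiplying. This is a sketch assuming frequencies are configured in seconds and run times are tracked in milliseconds; DEFAULT_FREQUENCY_SEC, sec_to_ms() and next_run_ms() are hypothetical, and the 60 second default is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical default, with the unit spelled out in the name so that it
 * cannot be mistaken for milliseconds. */
#define DEFAULT_FREQUENCY_SEC 60

/* Widen before multiplying: 5,000,000 seconds is 5e9 ms, which would wrap
 * a 32-bit value. */
static uint64_t sec_to_ms(uint32_t seconds)
{
    return (uint64_t)seconds * 1000;
}

/*
 * Time of the next run in milliseconds, given the previous run time in
 * milliseconds and a frequency in seconds. A frequency of 0 means "not
 * specified" and selects the default.
 */
static uint64_t next_run_ms(uint64_t last_run_ms, uint32_t frequency_sec)
{
    if (frequency_sec == 0) {
        frequency_sec = DEFAULT_FREQUENCY_SEC;
    }
    /* the sum itself must not wrap either */
    assert(last_run_ms <= UINT64_MAX - sec_to_ms(frequency_sec));
    return last_run_ms + sec_to_ms(frequency_sec);
}
```

These are exactly the sort of boundary cases (default value, very large frequencies) that are worth covering with unit tests.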
Removed an unused option and related code paths from the traceroute test that weren't adding much except some hideous-looking code. While looking at this I tightened up the timers being used for sending/timing out packets to make sure that timeouts were correctly based on the oldest outstanding probe, and that probes were being sent as close to the desired rate as possible. Also added some knobs for changing how many targets are probed at once, and now randomise the initial TTL probed to try not to hammer any nearby hops quite so hard.
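Basing the timeout on the oldest outstanding probe can be sketched as follows: nothing can expire before the oldest unanswered probe does, so that probe alone sets the deadline. This assumes each probe records its send time in microseconds; struct probe and next_timeout_us() are hypothetical and not the traceroute test's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a probe that has been sent. */
struct probe {
    uint64_t sent_us;   /* when the probe was sent, in microseconds */
    int outstanding;    /* non-zero until a response arrives or it times out */
};

/*
 * How long to wait before the next timeout check. Returns the remaining
 * time until the oldest outstanding probe expires, 0 if it has already
 * expired, or -1 if no probes are outstanding.
 */
static int64_t next_timeout_us(const struct probe *probes, size_t count,
        uint64_t now_us, uint64_t timeout_us)
{
    const struct probe *oldest = NULL;
    size_t i;

    assert(probes != NULL || count == 0);

    /* answered probes are ignored, even if they were sent earlier */
    for (i = 0; i < count; i++) {
        if (probes[i].outstanding &&
                (oldest == NULL || probes[i].sent_us < oldest->sent_us)) {
            oldest = &probes[i];
        }
    }

    if (oldest == NULL) {
        return -1;
    }
    if (now_us >= oldest->sent_us + timeout_us) {
        return 0;
    }
    return (int64_t)(oldest->sent_us + timeout_us - now_us);
}
```

A linear scan keeps the sketch short; if probes are kept in send order, the oldest outstanding one is simply the first unanswered entry.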
Removed some old and unused files from the amplet2 repository. Updated more documentation to be accurate with the current state of things.
Fixed up another couple of minor issues that had been reported. Fixed the loss timer in the tcpping test to start after the last packet is sent so that long interpacket delays are possible if desired. Tidied up a regex to match certificate filenames more accurately. Made more documentation updates. Tried to improve packaging to make sure that default configuration files were as usable as possible without manual edits.
Built new packages and pushed them out to one of our test deployments. Worked through a few issues with getting the HTTP tests and target meshes lined up properly so that they display. Still need to figure out the correct way to fix this so that users don't need to worry about this special case.
Spent some time reworking my Debian package build system after accidentally building with incorrect source due to some release candidate versioning suffixes being missed. The new system will also better deal with release specific Debian directories.
Made lots of small changes based on things that had been reported by users or that I had noticed behaving incorrectly in the last week. Fixed the cap on a retry timer, which was causing the delay to alternate between two different values. Updated the HTTP test to always store the full URL including scheme, even if the user didn't explicitly specify it. Updated some error messages to try to be more useful and accurate.
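The post doesn't say how the retry timer grows, so this sketch assumes a simple doubling backoff; RETRY_INITIAL_SEC, RETRY_MAX_SEC and next_retry_delay() are hypothetical. The property that matters is that once the delay reaches the cap it stays there, rather than bouncing between two values.

```c
#include <assert.h>
#include <stdint.h>

#define RETRY_INITIAL_SEC 1     /* hypothetical starting delay */
#define RETRY_MAX_SEC 300       /* hypothetical cap */

/*
 * Double the retry delay, clamping at RETRY_MAX_SEC. The comparison happens
 * before the multiplication so the doubling itself can never overflow, and
 * a delay at the cap always maps back to the cap.
 */
static uint32_t next_retry_delay(uint32_t current_sec)
{
    assert(current_sec > 0);

    if (current_sec >= RETRY_MAX_SEC / 2) {
        return RETRY_MAX_SEC;
    }
    return current_sec * 2;
}
```

Starting from RETRY_INITIAL_SEC this gives 1, 2, 4, ... 256, then 300 for every retry after that.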
Fixed the apache2 configuration in the amppki packages to work once everything is installed in the correct system locations. There were issues with the Python path being incorrect so the libraries couldn't be found, as well as naming collisions with the ampweb WSGI processes.
Spent some time with Shane trying to track down the cause of some missing data in the web graphs. Found the cause of the missing DNS data (wrong column names being used) and why some sites didn't have path length data available (it's only sourced from one style of traceroute test).
Tried to expose through the web interface the ability to force the address family used when resolving test targets. This was a bit more complicated than expected, because new targets are automatically added to the database and those entries were including the various suffixes used internally to represent address families.
Put together some new server packages to test the new changes and started working through verifying that they worked, ahead of another release.
Spent some time running the amplet2 client in a few different ways on my test machines, with all the tests running to different sorts of targets. Found a few more slight issues with tests that I investigated and fixed, including one where a test server would be listening on all interfaces and addresses even if the client that spawned it was bound to something specific.
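A sketch of the fix for the test server binding, using getaddrinfo() so that the same code handles IPv4, IPv6 and the wildcard case. listen_on() is hypothetical; the idea is simply that the server binds to the client's address when one is configured, and only falls back to the wildcard address when none is.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a listening TCP socket. If bind_address is non-NULL, listen only on
 * that address (e.g. the one the client is bound to); if it is NULL, listen
 * on the wildcard address. Returns the socket, or -1 on error.
 */
static int listen_on(const char *bind_address, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int sock = -1;

    assert(port != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    /* AI_PASSIVE only takes effect when the node is NULL: wildcard address */
    hints.ai_flags = AI_PASSIVE;

    if (getaddrinfo(bind_address, port, &hints, &res) != 0) {
        return -1;
    }

    /* use the first returned address that we can bind and listen on */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock < 0) {
            continue;
        }
        if (bind(sock, ai->ai_addr, ai->ai_addrlen) == 0 &&
                listen(sock, 16) == 0) {
            break;
        }
        close(sock);
        sock = -1;
    }

    freeaddrinfo(res);
    return sock;
}
```

For example, listen_on("127.0.0.1", "0") listens only on loopback with a kernel-chosen port, while listen_on(NULL, "0") listens on all addresses.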
Tidied up some memory management issues as reported by valgrind. Freeing the easy and obvious allocations before exiting makes the actual leaks or errors a lot easier to find. Also did some general tidying of data structures, removing some duplication that is no longer required and putting some limits on others.
Built some new server packages with updates from the last few weeks and installed them on one of our test deployments. Ran into some issues with dependencies not being correct, as Ubuntu and Debian Wheezy don't have them all packaged.
Backported the current librabbitmq-c packages from Stretch to use with our Wheezy and Jessie amplet2 packages, replacing the ones previously patched to include external authentication support. Fixed a few uses of deprecated/changed functions with the new version and deployed them on some test sites to check that they still worked as expected.
Found and fixed a few edge cases where failure to connect to remote test servers was causing crashes rather than a graceful exit. Tidied up some more documentation and compiler warnings. Had another look at boot dependency ordering, this time around upstart. Couldn't find an easy way to make my init scripts work nicely, but forcibly delaying the start of my init scripts when run under upstart will do the trick until we move on to systemd.
Exposed configuration options for the udpstream test in the web interface so that it can now be scheduled. Made a few other small bug fixes here as well, including updating install documentation/scripts and allowing IPv6 addresses as valid names.
Investigated boot ordering in sysvinit and systemd to try to fix a problem observed in one deployment, where the amplet client sometimes starts too soon and doesn't have access to DNS (and occasionally RabbitMQ). Attempted some fixes and built new packages, but have yet to hear if the changes made any improvement.
Rewrote the icmp and dns tests to use libwandevent to manage sending and receiving probe packets, which brings them into line with many of the other tests and removes some code that may have had uncertain licensing. Should be able to factor out a lot more of the similarities between these simple tests (icmp, dns, tcpping) at a later date.
Lots of minor tidy-ups, and quietened some log messages that weren't very relevant. Updated sample configuration files, code documentation and some licensing.
Continued tidying up parts of the amplet2-client code that I had been meaning to look at for a while, but not had the time. Reworked some of the server handling so that less information needs to be passed around to configure interfaces, addresses, etc., as this was already available in other ways. Made the way that the tests use this information a lot more consistent. Split some of the server connection code into SSL and non-SSL sections so that they can be more easily reused, rather than duplicating a lot of work each time.
Spent too much time trying to determine why some very simple code was giving incorrect results when I removed debugging output. Compiling without optimisations would also fix it. Ended up changing the way I called it to stop the compiler optimising it out.
Investigated and replaced a few small sections of code that had come from various sources with implementations under more compatible licenses.