Brendon Jones's blog
Spent some time tidying up control messages and configuration when scheduling tests that require cooperation from the server. As part of the previous changes, the port number was no longer being sent to tests, which meant they could only operate using the default port; this is now fixed and works for both scheduled and standalone tests. Also fixed some parameter parsing in standalone tests, where empty parameter lists were not being created properly.
Wrote some basic unit tests for the udpstream test and its control messages. Fixed a possible memory leak when failing to send udpstream packets. Made sure the documentation and protobuf files agreed on the default values of test parameters.
Started installing the server-side components of AMP on another machine for a test deployment, writing documentation as I go that I can then use to help build/update the packaging for the most recent versions.
Added a latency measure to the udpstream test by reflecting probe packets at the receiver. The original sender can combine the RTT information with jitter and loss to calculate Mean Opinion Scores, which was slightly annoying as (depending on the test direction) the remote end of the test now has to collate and send back partial result data. Updated the ampsave function to reflect the new data reported by the test.
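To make the "combine RTT with jitter" step concrete, here is a minimal sketch of turning reflected-probe RTTs and a jitter figure into the single delay value that MOS models consume. It assumes the path is symmetric (so one-way delay is half the RTT) and, following the models that fold jitter into delay, assumes a de-jitter buffer sized at twice the measured jitter. Both assumptions and the name `effective_delay_ms` are illustrative, not necessarily what the udpstream test does.

```python
from statistics import mean


def effective_delay_ms(rtts_ms, jitter_ms, jitter_buffer_factor=2.0,
                       codec_delay_ms=0.0):
    """Estimate the one-way delay a VoIP call would experience.

    Assumptions (illustrative only):
      * the path is symmetric, so one-way delay is half the mean RTT;
      * a de-jitter buffer adds jitter_buffer_factor * jitter of delay;
      * codec/packetisation delay is a fixed extra term.
    """
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    one_way = mean(rtts_ms) / 2.0
    return one_way + jitter_buffer_factor * jitter_ms + codec_delay_ms
```

For example, RTTs of 40 ms and 60 ms with 5 ms of jitter give 25 ms of one-way delay plus 10 ms of assumed buffering, 35 ms in total.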
Updated the display of tcpping test information in the scheduling website to reflect the new packet size options. Worked with Shane to update the lamp deployment to the newest version of all the event detection and web display/management software.
Tidied up some more documentation and sent it to a prospective AMP user. Will hopefully get some feedback next week as they try to install it and I can see which areas of the documentation are still lacking.
Lots of minor fixes this week. Fixed the init script commands so that they properly kill the entire process group when stopping the AMP client. I still need a cleaner way to do this as part of the main process. Updated the AMP schedule fetching to follow HTTP redirects, which was required to make it work on the Lightwire deployment. Fixed the tcpping test to properly match response packets when the initial SYN contains a payload: in some cases RSTs acknowledged a different sequence number than a SYN ACK did, and only one of these was being checked for.
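A minimal sketch of that matching rule, assuming (as the observed behaviour suggests) that a SYN ACK acknowledges only the SYN (sequence + 1) while an RST may also acknowledge the payload bytes (sequence + 1 + payload length). The function name and signature are hypothetical; the modulo keeps the comparison correct when sequence numbers wrap at 2^32.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def acks_our_syn(syn_seq, payload_len, ack_num):
    """Return True if ack_num acknowledges a SYN sent with syn_seq.

    Accepts either form seen in practice (an assumption, not a spec rule):
      * syn_seq + 1: the SYN alone (typical SYN ACK)
      * syn_seq + 1 + payload_len: SYN plus its payload (seen on some RSTs)
    """
    syn_only = (syn_seq + 1) % SEQ_MOD
    with_payload = (syn_seq + 1 + payload_len) % SEQ_MOD
    return ack_num in (syn_only, with_payload)
```

Checking both values in one place means the SYN ACK and RST code paths cannot drift apart again.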
Updated all the tests to report the DSCP settings that they used. They are not currently saved into the database, but they are being sent to the collector now.
Set the default packet interval of the udpstream test to 20ms, which is closer to VoIP than the global AMP minimum interval that it was using. Also wrote most of the code for the test to calculate Mean Opinion Scores based on the ITU recommendations, just need to add a latency measure to complete the calculation.
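For reference, a minimal sketch of an E-model style calculation, assuming G.711 with every parameter other than delay and loss at its default. The effective equipment impairment (Ie_eff) and the R-to-MOS mapping follow ITU-T G.107; the delay impairment uses the widely cited simplified approximation (Cole and Rosenbluth) rather than the full G.107 Id calculation, and the codec constants (Ie = 0, Bpl = 25.1 assuming packet loss concealment, random loss) are assumptions for illustration.

```python
def r_factor(delay_ms, loss_pct, ie=0.0, bpl=25.1, burst_r=1.0):
    """Simplified E-model R factor from one-way delay (ms) and loss (%).

    Assumptions: G.711 with packet loss concealment (ie=0, bpl=25.1),
    random loss (burst_r=1), all other G.107 parameters at defaults.
    """
    r_default = 93.2  # R with all G.107 defaults (no delay, no loss)
    # Simplified delay impairment (Cole & Rosenbluth approximation).
    i_d = 0.024 * delay_ms
    if delay_ms > 177.3:
        i_d += 0.11 * (delay_ms - 177.3)
    # Effective equipment impairment including packet loss (G.107).
    ie_eff = ie + (95.0 - ie) * loss_pct / (loss_pct / burst_r + bpl)
    return r_default - i_d - ie_eff


def mos_from_r(r):
    """Map an R factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
```

With no delay or loss this gives R = 93.2 and a MOS of about 4.41, the ceiling for these defaults.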
Did some reading around calculating mean opinion scores for VoIP and started adding code to the udpstream test to calculate them both the Cisco way and the ITU E-model way. Neither of them explicitly takes jitter into account, which seems unusual; my best guess so far is that they count jitter as part of the delay. Other models I've found do include jitter as part of the delay calculation.
Spent some time writing more documentation about installing and configuring an amplet client. The install process, configuration options and schedule file options all have a first-draft description, hopefully enough to help people install monitors with minimal assistance, though I expect they will need to be expanded. Updated the example configuration files to agree with the new documentation.
Various small fixes, including updating the standalone icmp and tcpping tests to print human-readable icmp errors rather than the raw type and code, and using the Python .egg format in the ampsave packages.
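The idea can be sketched as a lookup table keyed on (type, code) with a fallback to the raw numbers for anything unknown. The table covers only a subset of the ICMPv4 meanings from RFC 792, and the names are hypothetical.

```python
# Subset of ICMPv4 type/code meanings from RFC 792.
ICMP_ERRORS = {
    (3, 0): "Network unreachable",
    (3, 1): "Host unreachable",
    (3, 2): "Protocol unreachable",
    (3, 3): "Port unreachable",
    (3, 4): "Fragmentation needed but DF set",
    (3, 5): "Source route failed",
    (11, 0): "TTL exceeded in transit",
    (11, 1): "Fragment reassembly time exceeded",
}


def describe_icmp_error(icmp_type, icmp_code):
    """Return a human-readable description, falling back to raw numbers."""
    text = ICMP_ERRORS.get((icmp_type, icmp_code))
    if text is None:
        return f"ICMP type {icmp_type} code {icmp_code}"
    return text
```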
Merged my scheduling parts of the website back into the main branch so that others can start using the features I've added.
Worked with Brad to get access to the Netspace amplet and bring it slightly more up to date (site firewalls had been preventing us from making changes until now). That was the last of the amplets reporting to erg, so they have now all been upgraded.
Merged my amplet client control socket changes back into the main branch without too much apparent trouble. Need to do some more testing to make sure everything is sensible. After merging, added options to set DSCP bits for the new udpstream test like all the existing tests had.
Continued working on documentation and tidied up the sample program showing how to read data from a RabbitMQ queue and extract the AMP messages.
Had a quick visit to Lightwire on Thursday, which generated some more interesting ideas, especially (for me) around automating test target selection. Some of these line up nicely with wishlist items I already had, so hopefully I can find time to work on those features soon.
Short week due to Easter holidays.
Went through the protocol buffer descriptions for all of the AMP test formats and wrote some short documentation about the fields, and some sample code to fetch, unpack and understand test result messages.
While working on the documentation I found a few instances of test options that were inconsistent or didn't quite behave correctly. In particular I made the TCP ping size parameter behave the same as the ICMP one (total packet size rather than payload size), and made sure that setting just the UDP payload size in the DNS test would correctly add an EDNS header.
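Per RFC 6891, advertising a UDP payload size means appending an OPT pseudo-record to the additional section of the query, with the size carried in the CLASS field. A minimal sketch of building that record with no options; the function name is hypothetical, and the surrounding query building (including incrementing ARCOUNT in the header) is left out.

```python
import struct

OPT_RR_TYPE = 41  # OPT pseudo-RR type from RFC 6891


def build_opt_record(udp_payload_size, dnssec_ok=False):
    """Build an EDNS(0) OPT pseudo-RR with no options, in wire format."""
    # Receivers treat values below 512 as 512, so reject them here
    # (a design choice for this sketch).
    if not 512 <= udp_payload_size <= 65535:
        raise ValueError("payload size must be between 512 and 65535")
    flags = 0x8000 if dnssec_ok else 0  # DO bit is the top flag bit
    return struct.pack(
        "!BHHBBHH",
        0,                 # NAME: root domain
        OPT_RR_TYPE,       # TYPE: OPT
        udp_payload_size,  # CLASS: requestor's UDP payload size
        0,                 # extended RCODE (upper 8 bits)
        0,                 # EDNS version
        flags,             # DO bit plus reserved zero flags
        0,                 # RDLEN: no options follow
    )
```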
Updated some of the signals used with the amplet client to provide better management: as well as being able to reload configuration from disk, it can now be forced to refetch remote schedule files with a SIGUSR2. Also made sure that all children (tests, servers, etc.) have their signals unblocked and their signal handlers restored to the default. Libwandevent sets all of these in the main process, and they were being propagated to the children and causing some unexpected behaviour. The init scripts now try to kill the entire process group of the amplet client, which means children should now get the signal too.
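A minimal sketch of both halves of the fix, assuming the parent's event loop has installed handlers and blocked some signals (as described for libwandevent above) and that fork() copies that state into the child. The signal set, the function names and the use of a fresh process group to stand in for the client's group are all illustrative; `os.killpg` is the Python equivalent of passing a negative PID to kill(1).

```python
import os
import signal

# Signals a hypothetical parent event loop takes over.
MANAGED = {signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
           signal.SIGUSR1, signal.SIGUSR2}


def reset_child_signals():
    """Undo signal state inherited across fork(): default handlers, unblocked."""
    for sig in MANAGED:
        signal.signal(sig, signal.SIG_DFL)
    signal.pthread_sigmask(signal.SIG_UNBLOCK, MANAGED)


def fork_worker():
    """Fork a child that leads its own process group and waits for signals.

    A pipe makes the parent wait until the child has finished its setup.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.setpgid(0, 0)       # new group, standing in for the client's
            reset_child_signals()
            os.write(w, b"x")      # tell the parent setup is complete
            os.close(w)
            while True:
                signal.pause()     # placeholder for real test/server work
        finally:
            os._exit(1)            # only reached if setup raised
    os.close(w)
    os.read(r, 1)
    os.close(r)
    return pid


def stop_process_group(pgid, sig=signal.SIGTERM):
    """Signal every process in the group, as the init scripts now do."""
    os.killpg(pgid, sig)
```

Without `reset_child_signals()`, a child that inherited an ignored or blocked SIGTERM would survive the group kill.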
Renamed server processes in ps so that it is obvious what task they are performing.
Refactored more of the repeated server code out of the udpstream/throughput tests so they are now a lot cleaner. Moved some of the test server control message code around so that it is grouped together in a sensible place.
Spent some time updating unit tests to work properly with the new watchdog and control API. Improved checks to make sure that only valid control messages are parsed. Made other small fixes to make sure that errors are caught and reported properly.
Started refactoring the test control connections to use an SSL BIO so that exactly the same code paths can be used to read and write control messages whether SSL is in use (amplet, standalone tests) or not (standalone tests), which has removed or simplified a lot of code. Also figured out how to properly do non-blocking IO when the BIO functions behave differently from normal read/write.
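Python's `ssl.SSLSocket` exposes the same `send`/`recv` interface as a plain socket, which makes the "one code path" idea easy to sketch. The helpers below read and write length-prefixed control messages on either kind of socket in non-blocking mode. A plain socket signals "try again" with `BlockingIOError`, while an SSL socket raises `ssl.SSLWantReadError` or `ssl.SSLWantWriteError`, and an SSL read may need the socket to become writable first, which is the kind of difference described above. The 4-byte length prefix, the size limit and the function names are assumptions for illustration, not the AMP wire format.

```python
import select
import ssl
import struct

MAX_MESSAGE = 64 * 1024  # assumed upper bound on a control message


def _wait(sock, writable, timeout):
    """Block until sock is readable/writable, or raise on timeout."""
    rlist, wlist = ([], [sock]) if writable else ([sock], [])
    if not any(select.select(rlist, wlist, [], timeout)):
        raise TimeoutError("control connection timed out")


def send_all(sock, data, timeout=5.0):
    """Send every byte of data on a non-blocking plain or SSL socket."""
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
        except ssl.SSLWantReadError:
            _wait(sock, False, timeout)
            continue
        except (BlockingIOError, ssl.SSLWantWriteError):
            _wait(sock, True, timeout)
            continue
        view = view[sent:]


def recv_exact(sock, size, timeout=5.0):
    """Read exactly size bytes from a non-blocking plain or SSL socket."""
    buf = bytearray()
    while len(buf) < size:
        try:
            chunk = sock.recv(size - len(buf))
        except ssl.SSLWantWriteError:
            _wait(sock, True, timeout)
            continue
        except (BlockingIOError, ssl.SSLWantReadError):
            _wait(sock, False, timeout)
            continue
        if not chunk:
            raise ConnectionError("peer closed the control connection")
        buf += chunk
    return bytes(buf)


def write_message(sock, payload):
    """Send one control message with a 4-byte big-endian length prefix."""
    send_all(sock, struct.pack("!I", len(payload)) + payload)


def read_message(sock):
    """Read one control message, rejecting implausible lengths."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length == 0 or length > MAX_MESSAGE:
        raise ValueError(f"invalid control message length {length}")
    return recv_exact(sock, length)
```

Rejecting bad lengths before reading the body keeps a malformed or hostile peer from making the reader allocate or wait for an arbitrary amount of data.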
Went with Shane to visit Lightwire on Thursday and discussed how we can make event detection, measurements, graphs etc. work better for them.
Finished adding the ability to set DSCP bits for all the amplet tests, both individually and globally. Slightly tidied up the way the global options are turned into individual test options, now that there are a few more of them.
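One way the global-to-per-test conversion could look, as a minimal sketch: each test starts from the global defaults and overrides only the options it sets itself. The option names (`dscp`, `packet_size`) and the dataclass representation are hypothetical, not the amplet client's actual configuration structures.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MeasurementOptions:
    """Options every test understands (hypothetical subset)."""
    dscp: int = 0                      # DSCP codepoint, 0-63
    packet_size: Optional[int] = None  # None means the test's own default


def options_for_test(global_opts, per_test_overrides):
    """Apply a test's own settings on top of the global ones.

    Unknown option names raise TypeError from dataclasses.replace().
    """
    opts = replace(global_opts, **per_test_overrides)
    if not 0 <= opts.dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return opts
```

Keeping the globals immutable means one test's overrides can never leak into another test's options.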
Tidied up the management connections to try to reuse the existing SSL connection that started the server, rather than always expecting a separate connection (as is sometimes the case when run standalone). As part of this, added SSL support to the standalone tests, so now they can be run standalone with/without SSL, or using it to connect to a normal
Reworked the watchdogs to make sure they will properly monitor new server threads or remotely scheduled tests. The central watchdog management has now been replaced by a timer inside each server/test process that ensures the test completes on time.
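A minimal sketch of a per-process watchdog built on a real-time interval timer: the timer is armed when the work starts and, if it fires first, the handler aborts the work. Raising an exception is a choice made here so the example can be tested; a real test process might instead log and exit. `run_with_watchdog` and `WatchdogExpired` are hypothetical names, and `signal.setitimer` is Unix-only and must be called from the main thread.

```python
import signal


class WatchdogExpired(Exception):
    """Raised when a test overruns its time limit."""


def _on_alarm(signum, frame):
    raise WatchdogExpired(f"test exceeded its time limit (signal {signum})")


def run_with_watchdog(func, limit_seconds):
    """Run func, aborting it if it takes longer than limit_seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, limit_seconds)  # arm the timer
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm
        signal.signal(signal.SIGALRM, previous)
```

Because each process owns its own timer, a newly started server or a remotely scheduled test is covered without registering anything with a central manager.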
Decided to start doing systemd scripts properly and wrote a service file for the amplet client. Also slightly tweaked the debhelper scripts to make sure the client doesn't start without configuration and end up reporting errors to systemd. Had to officially split the Debian directories for Wheezy and Jessie now, as they are starting to diverge.
Started work on adding the ability to set the differentiated services bits in the IP header for all of the AMP tests. This can be set at a global level or on a per-test basis. So far only the icmp test obeys the setting; I'll update the rest of the tests next week.
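On an IPv4 socket the differentiated services bits are set with the `IP_TOS` socket option. The DSCP codepoint occupies the top six bits of the old TOS byte, so it is shifted left by two, leaving the two ECN bits clear. A minimal sketch with a hypothetical helper name; IPv6 sockets would use `IPV6_TCLASS` instead, and exact kernel behaviour can vary by platform.

```python
import socket


def set_dscp(sock, dscp):
    """Mark packets from an IPv4 socket with the given DSCP codepoint."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    tos = dscp << 2  # DSCP is the upper 6 bits; the low 2 bits are ECN
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos
```

For example, Expedited Forwarding (DSCP 46) becomes a TOS byte of 184 (0xB8).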
Spent some time trying to remove an unnecessary extra control connection for tests involving servers started by a remote amplet client. It looks like I should be able to reuse the connection used to start the server as the ongoing control channel, but I'm not quite sure how best to make this work with standalone tests (which expect the server to already be running, and don't currently encrypt anything). I should be able to tell whether I have a secure control connection or not and take the appropriate actions, but a bit more planning is required.