Brendon Jones's blog
Fixed some minor edge cases that I found while repeatedly killing and
restarting various parts of the amplet client. Some strings describing
tests were invalidated when tests got reloaded, so they are now stored. Some
sockets could be leaked if the control socket failed to start.
Tidied up the freeing of timers (tests, watchdogs, schedule updates) to
properly remove everything on exit. Also cleaned up the build process
and removed some extraneous flags, fixed some warnings and enabled
silent rules in automake.
Put together a new test schedule that covers more web targets (mostly
top ranked Alexa sites), as well as performing latency measurements to
some of the major gaming infrastructure targets. Took a while to find
some of these as they appear to be mostly undocumented.
Spent some time looking at measuring streaming video from YouTube - the
embedded HTML5 player exposes some information that looks like it could
be used to make measurements. Currently trying to get it working with
the Zombie.js headless browser, but am stalled just before the video is
meant to start playing.
Spent some time tracking down scheduling problems, particularly around the
edges of schedule periods. Added the ability to dump the current
schedule on demand and while using that figured out why some tests were
not being rescheduled. For some reason two queries were being made to
fetch timestamps and sometimes they fell either side of the boundary,
which broke the maths and generated a rescheduling time that
libwandevent would reject.
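
The boundary problem above can be sketched roughly as follows. This is only an illustration in Python, not the amplet client code: the function names, the use of plain seconds and the exact form of the buggy calculation are all assumptions. The point is that the period start and the delay are derived from one timestamp, so they cannot land on opposite sides of a period boundary.

```python
def next_run_delay(now, period, offset):
    """Seconds from `now` until the next run at `offset` seconds into a period.

    The period start and the delay both come from the same `now`.
    """
    period_start = now - (now % period)
    run_at = period_start + offset
    if run_at <= now:
        # This period's slot has already passed, so use the next period.
        run_at += period
    return run_at - now


def mismatched_delay(first_now, second_now, period, offset):
    """Hypothetical failure mode: period start and delay use different clock reads."""
    period_start = first_now - (first_now % period)
    return (period_start + offset) - second_now
```

With a daily period, reading the clock once at 86399 and again at 86401 makes `mismatched_delay` return a negative delay, which is the kind of value an event library could reasonably refuse to schedule.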
Also tidied up some other small issues that could cause hangs or
crashes - fixed a possible infinite loop in the tcpping test and removed an
assert in the watchdog code that was being hit in an uncommon but
actually legitimate case (if a test ended at roughly the same time as
the watchdog tried to kill it).
Built new packages for the amplet client and pushed them out to the
Updated the certificate packages to properly generate certificates to be
used with Apache, and updated the configuration to enable SSL and
actually use them. Built new packages with the changes.
Updated the certificate signing script to also create and set up
RabbitMQ users for new amplet clients. Also, the script now requires the
user to be more explicit when revoking certificates in the case of
ambiguity, and improved checking of user input for host/cert names etc.
Rewrote the logic in the client around requesting/fetching certificates,
to make sure that the timeouts and wait periods apply to the whole
process, not just to the last step of fetching a certificate.
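
A minimal sketch of the idea, assuming each stage (send request, wait for signing, fetch certificate) can be written as a callable that returns true once it has succeeded. The function and parameter names are made up for illustration and the real client is not written in Python. One deadline is computed up front and shared by every step, rather than each step getting its own fresh timeout.

```python
import time


def run_with_deadline(steps, timeout, wait=0.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Run each step in order, retrying until it succeeds or the deadline passes.

    Returns True if every step finished before the shared deadline,
    False otherwise.
    """
    deadline = clock() + timeout
    for step in steps:
        while not step():
            remaining = deadline - clock()
            if remaining <= 0:
                return False
            # Never sleep past the overall deadline.
            sleep(min(wait, remaining))
    return True
```

Passing `clock` and `sleep` as parameters is just a convenience so the logic can be exercised without real delays.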
Started work on building Debian packages for the certificate signing
scripts. Spent some time making sure that all the required files were
included, installed at the correct location and with the correct
permissions. Created web configuration scripts to allow the web side of
things to run almost out of the box.
Tidied up the way signing requests were dealt with, to help make sure
that they weren't cluttering things up - checking for the certificate
before sending a (possibly unnecessary) request, deleting them when no
longer required, making sure memory is freed.
Spent Wednesday to Friday in Rotorua at the NZNOG conference.
First week back this week, so spent some time catching up on my notes
about where I left off last year. Began testing the AMP CA
initialisation, key/certificate generation and distribution from start
to finish to make sure the system worked together with the amplet
client. Also made lots of minor fixes, removing extraneous debug
messages, documentation updates, etc.
Some inconsistencies had crept into the directory structure that the
webscripts expected compared with the command line tool, so these all
now share a common configuration space. This also ties in to a new
initialisation command which will set up the directory structure as
I generate my certificates with slightly different options than the
default openssl tools do, which meant that the key portion was in a
different place in the certificate. Instead of blindly trying to load
portions of the certificate as a public key, I now properly parse them
all as ASN.1/DER strings and look for the object identifier tag that
describes an RSA key.
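
To show what looking for the object identifier involves, here is a small Python sketch of a DER walker. It is not the code used in AMP, and the helper names are mine. It assumes well-formed input with single-byte tags and does no error handling. The only fixed fact it relies on is that rsaEncryption is OID 1.2.840.113549.1.1.1.

```python
RSA_ENCRYPTION_OID = "1.2.840.113549.1.1.1"  # rsaEncryption (PKCS #1)


def read_tlv(data, pos):
    """Return (tag, value_start, value_end) for the DER element at `pos`."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def decode_oid(value):
    """Decode the contents of an OBJECT IDENTIFIER into dotted notation."""
    subids, acc = [], 0
    for byte in value:
        acc = (acc << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subids.append(acc)
            acc = 0
    # The first two arcs are packed into the first subidentifier.
    first = min(subids[0] // 40, 2)
    return ".".join(str(n) for n in [first, subids[0] - 40 * first] + subids[1:])


def contains_oid(data, wanted, start=0, end=None):
    """Walk nested DER elements and report whether OID `wanted` appears."""
    end = len(data) if end is None else end
    pos = start
    while pos < end:
        tag, value_start, value_end = read_tlv(data, pos)
        if tag == 0x06 and decode_oid(data[value_start:value_end]) == wanted:
            return True
        # Constructed types (SEQUENCE, SET, ...) contain further elements.
        if tag & 0x20 and contains_oid(data, wanted, value_start, value_end):
            return True
        pos = value_end
    return False
```

A real parser would also want to return where the matching element sits, so that the key can be read from the structure that follows it.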
Started work on a program to help manage signing certificate requests
from amplet clients, similar to the puppet CA. Got most of the required
behaviour implemented - listing outstanding requests, signing them, and
revoking signed certificates. Still need to do a bit of thinking about
how the amplet clients deal with revocation (how best to do OCSP or
similar) and how to reissue certificates that have expired.
Kept working through improving the SSL code to exchange keys and operate
the control socket on the amplet client. It now validates the
Diffie-Hellman parameters before using them (aborting if they are not up
to scratch), and I disabled compression to avoid another known attack
vector. Validated that the options I was setting were set, and the
protocols/ciphers I enabled were in fact the only ones being used. Spent
some time refactoring the code to be cleaner and easier to follow and
also added more logging to help make it obvious what was going on at
each step where something could fail.
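
The "set it, then check it was really set" step might look something like this when written against Python's ssl module. The amplet client uses the OpenSSL C API directly, so this is only an analogue. The cipher string `ECDHE+AESGCM` and both function names are assumptions chosen for the example, not the list AMP actually uses.

```python
import ssl


def make_server_context(cipher_string="ECDHE+AESGCM"):
    """Build a server-side context limited to TLS >= 1.2 without compression."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.options |= ssl.OP_NO_COMPRESSION
    ctx.set_ciphers(cipher_string)
    return ctx


def check_context(ctx):
    """Return a list of problems; an empty list means the settings took effect."""
    problems = []
    if ctx.minimum_version < ssl.TLSVersion.TLSv1_2:
        problems.append("protocols older than TLS 1.2 are allowed")
    if not ctx.options & ssl.OP_NO_COMPRESSION:
        problems.append("compression is still enabled")
    for cipher in ctx.get_ciphers():
        name = cipher["name"]
        # TLS 1.3 suites (named TLS_*) are not controlled by set_ciphers().
        if name.startswith("TLS_"):
            continue
        if not name.startswith("ECDHE-") or "GCM" not in name:
            problems.append("unexpected cipher: " + name)
    return problems
```

Reading the settings back out of the context, rather than trusting the calls that set them, is what catches an option that was silently ignored.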
Found a test that was failing when building packages for 32-bit
Debian and spent some time trying to fix it. Some constants I was using
to test edge cases in scheduling were too large and were overflowing
variables, giving incorrect results. Forcing them all to be the expected
size fixed that and the tests all pass.
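
The effect is easy to reproduce outside of C. The sketch below uses Python's ctypes fixed-width integers to stand in for a 32-bit and a 64-bit variable. The constant is a made-up example (two days in microseconds), not one of the values from the actual unit tests.

```python
import ctypes

TWO_DAYS_US = 2 * 24 * 60 * 60 * 1000000  # two days in microseconds

# ctypes integers do no overflow checking, so a value that does not fit in
# 32 bits is silently truncated, much like assigning it to a 32-bit C variable.
as_32bit = ctypes.c_int32(TWO_DAYS_US).value
as_64bit = ctypes.c_int64(TWO_DAYS_US).value
```

Here `as_64bit` still equals the constant while `as_32bit` does not, which is the sort of silent change that produces wrong answers on a 32-bit build but not on a 64-bit one.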
Spent some time trying to figure out why my server and client could not
find any ciphers in common that they could use, despite having identical
lists in exactly the same order. Turns out that if you want to use DHE
or ECDHE ciphers there is extra setup required, but very little
documentation pointing this out (the documentation about how to do it is
fine, but nothing makes you aware that you have to do this).
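
For comparison, the equivalent extra setup in Python's ssl module looks roughly like this. It is an analogue of the OpenSSL C calls rather than the code in the amplet client. The function name, the default curve and the optional parameter file are assumptions for the example. Depending on the OpenSSL version, some of this may be done automatically.

```python
import ssl


def enable_ephemeral_key_exchange(ctx, dh_params_path=None, curve="prime256v1"):
    """Give a server context what it needs to negotiate ECDHE and DHE ciphers."""
    # ECDHE: choose the curve used for the ephemeral key exchange.
    ctx.set_ecdh_curve(curve)
    if dh_params_path is not None:
        # DHE: load Diffie-Hellman parameters from a PEM file,
        # e.g. one generated with `openssl dhparam`.
        ctx.load_dh_params(dh_params_path)
    return ctx
```

A server context that lists DHE ciphers but has had no parameters loaded is a plausible way to end up with "no shared cipher" errors even though both cipher lists match.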
Added code to tests so that they only resolve address families that are
present on available test interfaces. This used to be set with a flag to
getaddrinfo, but we've since moved to using libunbound instead. The old
flag would also consider all interfaces on the host, while the new code
is aware that a test can be bound to a particular interface and only
checks for address families there.
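
As a rough sketch of the per-interface check, assume the addresses configured on the bound interface are already available as strings (how they are gathered is left out, and the function names are hypothetical). The address families present then decide which record types are worth asking the resolver for.

```python
import ipaddress
import socket


def families_on_interface(addresses):
    """Return the address families present among an interface's addresses."""
    families = set()
    for text in addresses:
        version = ipaddress.ip_address(text).version
        families.add(socket.AF_INET if version == 4 else socket.AF_INET6)
    return families


def wanted_record_types(addresses):
    """Choose which DNS record types to resolve for this interface."""
    families = families_on_interface(addresses)
    types = []
    if socket.AF_INET in families:
        types.append("A")
    if socket.AF_INET6 in families:
        types.append("AAAA")
    return types
```

An interface with only `192.0.2.10` configured would ask for A records only, even if some other interface on the host has IPv6.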
Merged in the scheduling fixes I had been testing, after they had been
running for some time without rescheduling problems. Tidied up a bunch
of compilation warnings, logging, documentation etc too, in preparation
for a new amplet release. Built some new packages and deployed them on a
test amplet to run over the weekend.
Got certificate request verification working, so now signed certificates
will only be sent to those clients that can prove their identity and
their right to retrieve that certificate. Requests are fingerprinted and
saved on the server to await signing.
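
A minimal sketch of fingerprinting and saving a request, with several assumptions: SHA-256 as the hash, a `host.fingerprint.csr` naming scheme, and function names invented for the example. None of these are confirmed details of the AMP server, and the hostname is assumed to have been validated already.

```python
import hashlib
import os


def fingerprint(request_bytes):
    """SHA-256 fingerprint of a signing request, as a hex string."""
    return hashlib.sha256(request_bytes).hexdigest()


def save_pending_request(directory, hostname, request_bytes):
    """Store a request under a name built from the host and its fingerprint."""
    name = "%s.%s.csr" % (hostname, fingerprint(request_bytes))
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(request_bytes)
    return path
```

Putting the fingerprint in the name means two different requests claiming the same hostname cannot overwrite each other while they wait to be signed.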
Had a quick audit of the key/certificate management code with Brad, and
found a few places where security needs to be tightened. In particular,
careful attention needs to be paid to making sure that insecure ciphers
can't be used. Because we have full control over all the client and
server software, we can limit all communication to using only TLS >= 1.2
with a small set of strong ciphers. Spent some time investigating
exactly which ciphers we want to use, and how to enable them.
Also spent a lot of time reading the OpenSSL wiki and various
patches/changelogs to determine when certain fixes were made. The Debian
packages are fairly up to date, but are still missing some well-known
changes that are needed to improve security, so I implemented them
within the AMP environment.
Spent most of the week working on the key distribution for the amplet
clients. The server will now save certificate signing requests as well
as offer signed certificates (if present) to clients that can prove that
the certificate is theirs. Everything looks to be in order here
(certificate requests, signatures, etc all look correct) except that I
can't get the final step to verify from within code, despite working
fine using command line tools.
Also spent some more time looking into the problem of tests being run
slightly early and then being rescheduled almost immediately. The fix I
tested over the weekend didn't work, so had to try a few more things.
Each test info block now also contains the wall-clock time that the test
was intended to run, and tests that are triggered too soon can be safely
rescheduled at the correct time. I've built a new client to test over
the week; so far it looks promising.
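
The early-trigger check can be sketched like this. It is an illustration in Python with hypothetical names and a made-up `tolerance` parameter, not the client code. The timer callback compares the current wall-clock time against the intended run time stored with the test, and either runs the test or asks to be rescheduled for the remaining time.

```python
def handle_trigger(intended_time, now, tolerance=0.0):
    """Decide what to do when a test's timer fires.

    Returns ("run", 0.0) if the intended wall-clock time has been reached,
    or ("reschedule", delay) with the seconds still to wait if it fired early.
    """
    remaining = intended_time - now
    if remaining > tolerance:
        return ("reschedule", remaining)
    return ("run", 0.0)
```

Because the decision uses the stored intended time rather than recomputing the schedule, a timer that fires early just waits out the difference instead of being pushed to the next period.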