Brendon Jones's blog
Updated the certificate signing script to also create and set up
RabbitMQ users for new amplet clients. Also, the user now has to be more
explicit when revoking certificates in the case of ambiguity, and I
improved checking of user input for host/cert names etc.
Rewrote the logic in the client around requesting/fetching certificates,
to make sure that the timeouts and wait periods apply to the whole
process, not just to the last step of fetching a certificate.
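
A minimal sketch of the idea of one timeout covering every step, using Python purely for illustration. The function name, the list of step callables and the injectable clock are all hypothetical and not taken from the real client; the point is only that the deadline is computed once, up front, and shared by every step.

```python
import time


def fetch_with_deadline(steps, timeout, wait=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Run each step in order, retrying until it succeeds or the single
    shared deadline passes.

    steps is a list of callables that return a truthy result when done
    (hypothetical stand-ins for "send request", "fetch certificate", ...).
    """
    deadline = clock() + timeout      # computed once for the whole process
    results = []
    for step in steps:
        while True:
            result = step()
            if result:
                results.append(result)
                break
            # Give up if another wait period would take us past the deadline.
            if clock() + wait > deadline:
                raise TimeoutError("certificate not available before deadline")
            sleep(wait)
    return results
```

Because the deadline is fixed before the first step runs, time spent retrying an early step reduces the time left for the later ones, rather than each step getting a fresh allowance.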
Started work on building Debian packages for the certificate signing
scripts. Spent some time making sure that all the required files were
included, installed at the correct location and with the correct
permissions. Created web configuration scripts to allow the web side of
things to run almost out of the box.
Tidied up the way signing requests were dealt with, to help make sure
that they weren't cluttering things up: checking for the certificate
before sending a (possibly unnecessary) request, deleting requests when
no longer required, and making sure memory is freed.
Spent Wednesday to Friday in Rotorua at the NZNOG conference.
First week back this week, so spent some time catching up on my notes
about where I left off last year. Began testing the AMP CA
initialisation, key/certificate generation and distribution from start
to finish to make sure the system worked together with the amplet
client. Also made lots of minor fixes, removing extraneous debug
messages, documentation updates, etc.
Some inconsistencies had crept into the directory structure that the
webscripts expected compared with the command line tool, so these all
now share a common configuration space. This also ties in to a new
initialisation command which will set up the directory structure as
I generate my certificates with slightly different options than the
default openssl tools do, which meant that the key portion was in a
different place in the certificate. Instead of blindly trying to load
portions of the certificate as a public key, I now properly parse them
all as ASN.1/DER strings and look for the object identifier tag that
describes an RSA key.
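
As a rough illustration of searching by object identifier rather than by position, here is a small sketch in Python (used purely for illustration). It walks DER tag/length/value elements and reports whether the rsaEncryption OID (1.2.840.113549.1.1.1) appears anywhere. The function names are hypothetical, and it handles only single-byte tags, which is an assumption that holds for the elements found in a typical certificate.

```python
# DER encoding of OID 1.2.840.113549.1.1.1 (rsaEncryption), value bytes only.
RSA_ENCRYPTION_OID = bytes.fromhex("2a864886f70d010101")


def read_tlv(data, offset):
    """Return (tag, value_start, value_end) for the DER element at offset.

    Assumes a single-byte tag; multi-byte tags are not handled here.
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        count = length & 0x7F
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    if offset + length > len(data):
        raise ValueError("DER element runs past end of data")
    return tag, offset, offset + length


def find_oid(data, oid, start=0, end=None):
    """Search data[start:end], descending into constructed elements,
    for an OBJECT IDENTIFIER whose value equals oid."""
    end = len(data) if end is None else end
    offset = start
    while offset < end:
        tag, vstart, vend = read_tlv(data, offset)
        if tag == 0x06 and data[vstart:vend] == oid:
            return True
        # Bit 0x20 marks a constructed element (SEQUENCE, SET, [0], ...).
        if tag & 0x20 and find_oid(data, oid, vstart, vend):
            return True
        offset = vend
    return False
```

Comparing the full OID value matters: the signature algorithm OIDs in a certificate share the same prefix but differ in the final byte, so a prefix match would give false positives.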
Started work on a program to help manage signing certificate requests
from amplet clients, similar to the puppet CA. Got most of the required
behaviour implemented - listing outstanding requests, signing them, and
revoking signed certificates. There also still needs to be a bit of
thinking done around how the amplet clients deal with revocation (how to
best do OCSP or similar) and how to reissue certificates that have expired.
Kept working through improving the SSL code to exchange keys and operate
the control socket on the amplet client. It now validates the
Diffie-Hellman parameters before using them (aborting if they are not up
to scratch), and I disabled compression to avoid another known attack
vector. Validated that the options I was setting were set, and the
protocols/ciphers I enabled were in fact the only ones being used. Spent
some time refactoring the code to be cleaner and easier to follow and
also added more logging to help make it obvious what was going on at
each step where something could fail.
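
The "set the options, then check they really took effect" step can be sketched with Python's standard ssl module (used purely for illustration). The helper names, the TLS 1.2 minimum and the cipher string below are illustrative assumptions, not the actual settings used on the amplet.

```python
import ssl


def make_strict_context():
    """Build a context with compression off and a narrow cipher list."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # assumed protocol floor
    ctx.options |= ssl.OP_NO_COMPRESSION           # no TLS-level compression
    ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM")     # assumed cipher string
    return ctx


def enabled_ciphers(ctx):
    """Read back what the context will actually offer, as (name, protocol)."""
    return [(c["name"], c["protocol"]) for c in ctx.get_ciphers()]


if __name__ == "__main__":
    ctx = make_strict_context()
    print("compression disabled:", bool(ctx.options & ssl.OP_NO_COMPRESSION))
    for name, protocol in enabled_ciphers(ctx):
        print(protocol, name)
```

Reading the list back from the context, rather than trusting the string that was passed in, is what catches a cipher string that silently enables more than intended.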
Found a test that was failing when building packages for 32-bit Debian
and spent some time trying to fix it. Some constants I was using to test
edge cases in scheduling were too large and overflowed their variables,
giving incorrect results. Forcing them all to be the expected size fixed
that, and the tests now all pass.
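
The effect is easy to reproduce with fixed-width integers. This sketch uses Python's ctypes purely for illustration; the constant is a made-up value larger than a signed 32-bit integer can hold, not one of the real test constants.

```python
import ctypes


def store_as(ctype, value):
    """Store value in a fixed-width C integer type and read it back.

    ctypes performs no overflow checking, so the value wraps just as it
    would when assigned to a variable of that width.
    """
    return ctype(value).value


BIG = 3_000_000_000   # hypothetical constant, greater than 2**31 - 1

if __name__ == "__main__":
    print(store_as(ctypes.c_int32, BIG))   # wraps to -1294967296
    print(store_as(ctypes.c_int64, BIG))   # stays 3000000000
```

Using an explicitly sized type for such values gives the same result regardless of how wide the platform's default integer types are.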
Spent some time trying to figure out why my server and client could not
find any ciphers in common that they could use, despite having identical
lists in exactly the same order. Turns out that if you want to use DHE
or ECDHE ciphers there is extra setup required, but very little
documentation pointing this out (the documentation about how to do it is
fine, but nothing makes you aware that you have to do this).
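
For illustration, the equivalent extra setup in Python's standard ssl module looks like the sketch below. The helper name and the default curve are assumptions, and recent OpenSSL versions select ECDH curves automatically, so this mainly shows the shape of the step that is easy to miss.

```python
import ssl


def enable_ephemeral_key_exchange(ctx, dh_params_path=None,
                                  curve="prime256v1"):
    """Give the context what it needs to negotiate ECDHE and DHE ciphers."""
    # ECDHE needs a curve to generate ephemeral keys on.
    ctx.set_ecdh_curve(curve)
    # DHE needs Diffie-Hellman parameters, loaded from a PEM file here.
    if dh_params_path is not None:
        ctx.load_dh_params(dh_params_path)
    return ctx


if __name__ == "__main__":
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    enable_ephemeral_key_exchange(ctx)
```

Without parameters on the server side, the DHE suites are simply skipped during negotiation, which shows up as "no shared cipher" even though both cipher lists match.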
Added code to tests so that they only resolve address families that are
present on available test interfaces. This used to be set with a flag to
getaddrinfo, but we've since moved to using libunbound instead. The old
flag would also consider all interfaces on the host, while the new code
is aware that a test can be bound to a particular interface and only
checks for address families there.
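
A small sketch of the per-interface check, in Python purely for illustration. It assumes the caller already has the list of addresses configured on the interface the test is bound to (how that list is obtained is left out), and the function names are hypothetical.

```python
import ipaddress
import socket


def families_on_interface(addresses):
    """Return the set of address families present among the addresses
    configured on one interface."""
    families = set()
    for text in addresses:
        # Drop any IPv6 zone id such as "%eth0" before parsing.
        addr = ipaddress.ip_address(text.split("%")[0])
        families.add(socket.AF_INET if addr.version == 4 else socket.AF_INET6)
    return families


def record_types_to_query(addresses):
    """Only ask the resolver for record types the interface can use."""
    families = families_on_interface(addresses)
    types = []
    if socket.AF_INET in families:
        types.append("A")
    if socket.AF_INET6 in families:
        types.append("AAAA")
    return types


if __name__ == "__main__":
    print(record_types_to_query(["192.0.2.10", "fe80::1%eth0"]))
```

This sketch counts link-local addresses as usable; whether they should count is a policy decision it does not try to settle.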
Merged in the scheduling fixes I had been testing, after they had been
running for some time without rescheduling problems. Tidied up a bunch
of compilation warnings, logging, documentation etc too, in preparation
for a new amplet release. Built some new packages and deployed them on a
test amplet to run over the weekend.
Got certificate request verification working, so now signed certificates
will only be sent to those clients that can prove their identity and
their right to retrieve that certificate. Requests are fingerprinted and
saved on the server to await signing.
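
Fingerprinting and storing a request might look something like this sketch, written in Python purely for illustration. The choice of SHA-256, the file naming scheme and the directory layout are all assumptions, and the hostname is assumed to have been validated already.

```python
import hashlib
import os


def fingerprint(csr_bytes):
    """Hex SHA-256 digest of the raw request bytes (hash choice assumed)."""
    return hashlib.sha256(csr_bytes).hexdigest()


def save_request(csr_bytes, hostname, pending_dir):
    """Save a signing request under a name that includes its fingerprint.

    hostname is assumed to be validated (no path separators etc.).
    """
    os.makedirs(pending_dir, mode=0o700, exist_ok=True)
    name = "%s.%s.csr" % (hostname, fingerprint(csr_bytes)[:16])
    path = os.path.join(pending_dir, name)
    # Mode "x" refuses to overwrite an existing file with the same name.
    with open(path, "xb") as handle:
        handle.write(csr_bytes)
    return path
```

Including the fingerprint in the stored name means a later request for the certificate can be matched against exactly the request that was submitted, not just the hostname.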
Had a quick audit of the key/certificate management code with Brad, and
found a few places where security needs to be tightened. In particular,
careful attention needs to be paid to making sure that insecure ciphers
can't be used. Because we have full control over all the client and
server software, we can limit all communication to using only TLS >= 1.2
with a small set of strong ciphers. Spent some time investigating
exactly which ciphers we want to use, and how to enable them.
Also spent a lot of time reading the OpenSSL wiki and various
patches/changelogs to determine when certain fixes were made. The Debian
packages are fairly up to date, but are still missing some well known
changes that are needed to improve security, so I implemented them
within the AMP environment.
Spent most of the week working on the key distribution for the amplet
clients. The server will now save certificate signing requests as well
as offer signed certificates (if present) to clients that can prove that
the certificate is theirs. Everything looks to be in order here
(certificate requests, signatures, etc. all look correct) except that I
can't get the final step to verify from within code, despite it working
fine using command line tools.
Also spent some more time looking into the problem of tests being run
slightly early and then being rescheduled almost immediately. The fix I
tested over the weekend didn't work, so had to try a few more things.
Each test info block now also contains the wall-clock time that the test
was intended to run, and tests that are triggered too soon can be safely
rescheduled at the correct time. I've built a new client to test over
the week, and so far it looks promising.
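
The decision made when a test's timer fires can be sketched as below (Python, purely for illustration). The function name and return values are hypothetical, and computing the next run from the intended time rather than from the current time is an assumption about how the schedule is kept from drifting.

```python
def handle_trigger(intended, now, period):
    """Decide what to do when a test's timer fires.

    intended: wall-clock time (seconds since the epoch) the test was
              meant to run, as stored in its info block.
    now:      current wall-clock time.
    period:   seconds between runs of this test.

    Returns ("wait", delay) if the timer fired early, otherwise
    ("run", next_intended).
    """
    if now < intended:
        # Fired too soon: wait out the remainder instead of running now
        # and then being scheduled again a fraction of a second later.
        return ("wait", intended - now)
    return ("run", intended + period)


if __name__ == "__main__":
    print(handle_trigger(1000.0, 999.75, 60))   # ('wait', 0.25)
    print(handle_trigger(1000.0, 1000.5, 60))   # ('run', 1060.0)
```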
Started working on implementing a nice way to generate/sign/distribute
certificates for amplets to use, similar to the way that puppet does it.
Clients that are missing certificates can send a signing request to the
central server (which will probably be signed after manual verification)
and then wait for the certificate to be signed before proceeding. So far
I have the client fetching the server cert, generating keys if required,
generating the signing request and then sending it. The server currently
offers its certificate, and waits for connections (but does nothing with
Spent some time trying to chase down a case where duplicate results are
being reported for some tests. It appears that tests are sometimes being
run slightly early (according to the host clock, which drifts) and so
the next scheduled run is set to occur almost immediately, resulting in
two tests run a fraction of a second apart. Even though actual
scheduling in libwandevent is based on the monotonic clock, the AMP
scheduling uses epoch based time so both clocks are required. I've
implemented a test fix and some more debugging, and will check how it
goes after the weekend.
Built new amplet packages for CentOS and Debian to deploy the newest
version in the test mesh. Found a few problems running the tcpping test
on machines with multiple interfaces, which were fixed and the packages
rebuilt. Also updated the schedules on the test amplets to be closer to
what we are currently using on the main mesh, to better match a proper
deployment scenario.
Added some more sanity checking to the way result messages are unpacked
by the server after (what appears to be a rather old, outdated version
of) the amplet client reported less data than it claimed to have
available, breaking the collector.
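
The kind of check involved can be sketched as follows, in Python purely for illustration. The message layout (a 4-byte big-endian length followed by the payload) is a hypothetical stand-in, not the real AMP result format.

```python
import struct

# Hypothetical header: 4-byte big-endian count of payload bytes.
HEADER = struct.Struct("!I")


def unpack_result(message):
    """Return the payload, refusing messages that claim more data than
    they actually contain."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for header")
    (claimed,) = HEADER.unpack_from(message)
    available = len(message) - HEADER.size
    if claimed > available:
        raise ValueError("message claims %d bytes but only %d present"
                         % (claimed, available))
    return message[HEADER.size:HEADER.size + claimed]
```

Rejecting the message with a clear error lets the collector drop one bad report and carry on, instead of failing part way through unpacking.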
Spent some time looking into how puppet does initial certificate/key
distribution to its clients so that we might do something similar. We
need a sensible way to get certificates onto each amplet that doesn't
require a lot of manual generation and copying of files.