Brendon Jones's blog

29 Jun 2011

Got my AMP package for Ubiquiti AirOS to the point where it checks for
updates on startup before running (with a small random offset), and checks
for updates to the AMP configs and the firmware image at random but known
times of day. If there is an update it will apply it and restart anything
that needs restarting.
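
One way to get check times that are random across devices but still known in advance is to derive them from a stable per-device identifier. The sketch below only illustrates that idea and is not the actual AMP code; the device identifier, the function names and the 60 second startup window are all assumptions.

```python
import hashlib
import random

SECONDS_PER_DAY = 24 * 60 * 60


def daily_check_time(device_id: str, purpose: str) -> int:
    """Return a fixed second-of-day for this device and kind of update.

    Hashing the (hypothetical) device identifier spreads devices across the
    day, while the same device always gets the same time.
    """
    digest = hashlib.sha256(f"{device_id}:{purpose}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % SECONDS_PER_DAY


def startup_delay(max_offset: float = 60.0) -> float:
    """Return a small random delay in seconds to apply before the startup check."""
    return random.uniform(0.0, max_offset)
```

Using a different `purpose` string for config and firmware checks gives each kind of update its own slot in the day.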

Having certain tests wait for the link to be idle before running looks to
be best accomplished using chained tests - the first test can check the
traffic on the link and delay as required, before either aborting or
allowing the chain to continue. Started to write up a simple version of
this test and it looks like it should do the trick.
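
The chained idea can be sketched roughly as follows. This is an illustration under assumptions rather than the real test: the byte counter is passed in as a function (on a real device it might come from the interface statistics), and the threshold, interval and attempt limit are made-up parameters.

```python
import time
from typing import Callable, Sequence


def wait_for_idle(read_bytes: Callable[[], int],
                  threshold_bytes_per_sec: float,
                  interval: float = 1.0,
                  max_attempts: int = 10,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once the link rate drops below the threshold.

    Returns False if the link is still busy after max_attempts samples,
    which tells the chain to abort.
    """
    previous = read_bytes()
    for _ in range(max_attempts):
        sleep(interval)
        current = read_bytes()
        rate = (current - previous) / interval
        if rate < threshold_bytes_per_sec:
            return True
        previous = current
    return False


def run_chain(tests: Sequence[Callable[[], bool]]) -> bool:
    """Run tests in order, stopping at the first one that returns False."""
    return all(test() for test in tests)
```

Putting the idle check first in the list passed to `run_chain` means the later tests only run when the link was quiet.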

21 Jun 2011

Got AMP running happily on the Ubiquiti AirRouter and reporting results to
another machine. There were a few byte ordering issues with the AirRouter
being MIPS and the collector running on an x86 machine, but most of the
work here had already been planned for so I wasn't required to make
wholesale changes to get it running. Still had to spend a bit of time
tracing through the code checking what values were being used where and
making sure all communications were appropriately byte swapped.
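
As an illustration of the kind of fix involved, the sketch below packs a result record in network byte order so that it decodes identically on MIPS and x86. The record layout (test id, round-trip time, packet size) is hypothetical and is not the actual AMP wire format.

```python
import struct

# Hypothetical result record: a 32-bit test id, a 32-bit round-trip time in
# microseconds and a 16-bit packet size. "!" selects network (big-endian)
# byte order with no padding, regardless of the host CPU.
RESULT_FORMAT = "!IIH"


def pack_result(test_id, rtt_us, size):
    """Serialise a result so any collector can decode it."""
    return struct.pack(RESULT_FORMAT, test_id, rtt_us, size)


def unpack_result(payload):
    """Decode a result packed by pack_result, on any architecture."""
    return struct.unpack(RESULT_FORMAT, payload)
```

In C the equivalent is to pass every multi-byte field through `htonl`/`htons` before sending and `ntohl`/`ntohs` after receiving.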

Some changes between libcurl versions were throwing off the results
generated by the http2 test, which had to be tracked down and fixed to get
it running on the device.

Investigated in greater depth the init system used for the AirRouter and
how to get AMP running on startup. Looks to be a few options on how this
can be done, but I think I've figured out the nicest approach to get it
doing what I want.

14 Jun 2011

Spent most of the week getting better acquainted with the build process
for both packages and for firmware images. I got an AMP package building
fine within the environment last week but getting it running properly also
needs the supporting configuration files to be installed in the right
place with the correct file permissions. The filesystem layout is a bit
different to normal and (in most cases) is read only so I have to make my
changes at build time. Took me a while to discover that the final script
run before building the image clobbers all my permissions changes - had to
put in a few exceptions, which meant making changes outside of my
packages, which I would prefer not to have to do.

Also spent some time dealing with getting the ntpclient working properly
in my image. The slightly newer version I'm using accepts different
arguments to the version the Ubiquiti config generation binary blob
expects. Also noticed that this clobbers or adds to various configuration
files that are provided by the base packages. Chasing this also made
explicit to me the separation between firmware and configuration as stored
on the device itself, and the different ways that each may be updated.

07 Jun 2011

Started to really dig into OpenWRT this week. Downloaded and built the
build environment/toolchains to let me cross compile AMP for an OpenWRT
router and set about getting it going. The method of constructing packages
is quite similar to how Debian does it, so it was fairly easy to get the
Makefiles etc set up correctly. Ran into a few troubles with specifying
(and actually installing) dependencies and had to learn about "quilt" to
automatically patch the upstream source as part of the build process
(quilt seems like quite an interesting patch management tool).

A Ubiquiti AirRouter arrived on Friday for me to start testing with. I
moved my development over to their SDK, which is based around OpenWRT. This
means we can keep the existing web interface, at the cost of an older
kernel version. Successfully flashed it with an image I had built and
managed to run the amplet client code! Getting config files into the
appropriate places doesn't quite work as expected so it exited instantly,
but it did actually run, which is a confidence boosting start.

31 May 2011

Extracted some more data from ISP traces to use in predicting client MTA
and possible spam status. It isn't obvious that throwing more data at it
than I already had has helped, but even with the limited number of flows
with client MTAs that I can identify it is accurately predicting the MTA
70-80% of the time (on flows where it could be one of multiple MTAs -
ignoring flows that traverse a link used only by a single MTA).

Also ran some ISP traces with a full traffic mix against my machine
generated from a subset of SMTP flows. It missed lots of prematurely
terminated SMTP flows (e.g. ones that ended immediately after the HELO/EHLO)
because the training data only included connections that sent DATA. Of the
49,000 non-SMTP flows it classified 158 as matching SMTP. These were
entirely FTP and POP3 flows, which are quite similar to SMTP.

Started reading up on OpenWRT in preparation for getting AMP to run on it.
Should hopefully have a device to test on in the next week or so but in
the meantime I need to get my head around how it all fits together.

23 May 2011

Spent some time getting the state machine generation code to read in a
machine from a previously output dot file so that the same machine can be
quickly reused to run different traces. Added extra reporting on spam
counts etc per link so that these can be used by programs later in the
chain when generating graphs of the paths spam/ham take through the
machine. This will hopefully let me run a few large traces through the
machine once and then use that data to test and evaluate others in a
fraction of the time.
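
Reading a machine back in could look something like the sketch below. It assumes a very simple dot layout with one edge per line and an optional `label` attribute; the real output files may well carry more attributes (such as the per-link spam counts), which this does not handle.

```python
import re
from collections import defaultdict

# Matches lines such as:   helo -> mail [label="MAIL"];
EDGE = re.compile(
    r'^\s*"?([\w.]+)"?\s*->\s*"?([\w.]+)"?\s*(?:\[label="([^"]*)"\])?\s*;?\s*$'
)


def load_machine(dot_text):
    """Return {state: {label: next_state}} parsed from simple dot edges."""
    machine = defaultdict(dict)
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if match:
            source, target, label = match.groups()
            # Unlabelled edges are stored under the empty string.
            machine[source][label or ""] = target
    return dict(machine)
```

Lines that are not edges (the `digraph` header, node declarations, the closing brace) are silently skipped.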

Started investigating which MTAs the clients in my traces were using to
see if there were any interesting patterns. Approximately 20% of clients
accepted my test connection and 94% of those gave me something useful in
the banner or help message to identify them. Am currently waiting on
another run through of the trace with extra reporting identifying the MTAs
involved so I can compare between them.
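
Identifying an MTA from its banner can be as simple as looking for a known name in the greeting. The sketch below is only an assumption about how that matching might be done; the list of names is illustrative and far from complete, and the example banners are made up.

```python
import re

# Substrings that commonly appear in SMTP greeting banners; this list is
# illustrative rather than exhaustive.
KNOWN_MTAS = ("Postfix", "Exim", "Sendmail", "qmail", "Microsoft ESMTP")


def identify_mta(banner):
    """Return the first known MTA name found in a banner, or None."""
    for name in KNOWN_MTAS:
        if re.search(re.escape(name), banner, re.IGNORECASE):
            return name
    return None
```

A banner such as `220 mail.example.com ESMTP Postfix` would be reported as Postfix, while a bare `220 mail.example.net ESMTP` gives nothing to go on.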

Worked on getting the WRAMP simulator up and running on a more modern
version of wxWindows and with a newer compiler. Lots of search and
replace later, it seems to be working fine. Most of the issues were with
wxWindows no longer accepting a good old fashioned char* as a string and
needing to convert everything to a Unicode capable wxString.

17 May 2011

Successfully got the state machine generation running across ISP traces,
fixing a few bugs that the new dataset exposed along the way. Took the
machine that was generated using the ISP data and ran it with the older
data with known spam status to see how they compared (quite similar).
Again, it is quite clear what is spam after the point it is rejected by
the mail server but the distinction is much less clear prior to that.

Started to work on reading the machine back in from the output dot graph
files so that a pre-built machine can be used to run against any object
trace without having to rebuild the machine every time.

Spent some time working on documentation about embedding R in C code in
response to an email query I got. I've been tinkering with this off and on
for a while and should blog about it when it's more complete.

10 May 2011

Further refined the spam classification for my existing dataset based on
the SpamAssassin logs. Building the state machine for the new data shows
every flow tagged as spam going through the same set of transitions
(corresponding to 550 errors and exiting), which makes sense seeing as
anything considered spam gets rejected. From that point on it is very
clear which flows are spam and which aren't, but the small amount of spam
left in the dataset isn't enough to differentiate any of the preceding
links.
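
To make the idea of links being "clearly spam" concrete, the sketch below counts how many spam and ham flows traverse each transition and picks out the links used by only one class. It is a simplified stand-in for the real state machine code; representing a flow as a list of state names is an assumption.

```python
from collections import Counter, defaultdict


def count_links(flows):
    """Count spam and ham traversals per transition.

    `flows` is an iterable of (states, is_spam) pairs, where `states` is the
    ordered list of states a flow passed through.
    """
    links = defaultdict(Counter)
    for states, is_spam in flows:
        label = "spam" if is_spam else "ham"
        for link in zip(states, states[1:]):
            links[link][label] += 1
    return links


def pure_links(links):
    """Return links that were only ever traversed by one class of flow."""
    return {link: next(iter(counts)) for link, counts in links.items()
            if len(counts) == 1}
```

With rejected spam, only the links after the 550 response end up in `pure_links`, which matches the observation that the earlier links do not separate the two classes.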

Started looking at some of our recent ISP traces to build a larger dataset
with more spam flows. The data is more useful with some idea as to which
flows are spam and which are ham, so I've used the Spamhaus block lists to
get an approximate classification. The data is current enough that the
block lists should be fairly relevant and accurate, and if this looks
promising I can capture new data or perhaps try to get access to mail
server logs. At the moment the state machine generation code is being run
over approximately 1400 SMTP flows (of which one third are spam) to see
how this differs from my old dataset.
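
For reference, a DNS block list is queried by reversing the octets of the client address and looking the result up under the list's zone. The sketch below shows the mechanism only; the choice of the `zen.spamhaus.org` zone and the handling of return codes are assumptions, not a record of how the traces were actually classified.

```python
import ipaddress
import socket


def dnsbl_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the address appears to be listed in the zone.

    Lookup failures are treated as "not listed". Only 127.0.x.x answers are
    counted as listings, on the assumption that other answers are error codes.
    """
    try:
        answer = socket.gethostbyname(dnsbl_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.0.")
```

For example, `dnsbl_name("192.0.2.10")` gives `10.2.0.192.zen.spamhaus.org`. Block lists reflect current listings, so results for older traces will only be approximate.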

Also made a few updates to the KAREN weathermap and spent some time on
documentation covering how to make similar updates.

02 May 2011

It looks like the major problem with the spam dataset I've been using is
the classification of greylisted flows as spam. Greylisting is very common
for incoming flows on our mail server and mostly looks to occur early on,
before data is sent. This meant there was a vast
number of almost identical flows that were being counted as spam. Removing
these flows from consideration gives me a smaller dataset, but one in
which almost every flow traverses at least one link that is entirely
classified as spam or ham. At this point the small number of flows that
don't do this appear to involve TLS and will require closer investigation.
Will also need to expand into newer and larger datasets, hopefully some
without greylisting that see larger volumes of spam.