10 May 2011

Further refined the spam classification for my existing dataset based on
the SpamAssassin logs. Building the state machine for the new data shows
every flow tagged as spam going through the same set of transitions
(corresponding to 550 errors and exiting), which makes sense seeing as
anything considered spam gets rejected. From that point on it is very
clear which flows are spam and which aren't, but the small amount of spam
left in the dataset isn't enough to differentiate any of the preceding
links.
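As a sketch of the per-link idea (the flow representation and state names here are invented for illustration, not taken from my actual code), each observed transition can accumulate spam/ham counts, which makes the all-spam 550 path stand out:

```python
from collections import defaultdict

def link_spam_counts(flows):
    """Count spam/ham traversals of each state-machine link.

    Each flow is (transitions, is_spam), where transitions is a list of
    (from_state, to_state) links observed in the SMTP exchange.
    """
    counts = defaultdict(lambda: [0, 0])  # link -> [ham, spam]
    for transitions, is_spam in flows:
        for link in transitions:
            counts[link][1 if is_spam else 0] += 1
    return counts

# Hypothetical flows: the rejected (spam) flow exits via a 550 transition.
flows = [
    ([("HELO", "MAIL"), ("MAIL", "550"), ("550", "EXIT")], True),
    ([("HELO", "MAIL"), ("MAIL", "RCPT"), ("RCPT", "DATA")], False),
]
counts = link_spam_counts(flows)
# Links after the 550 rejection are traversed only by spam flows,
# while earlier links (e.g. HELO -> MAIL) are shared by both classes.
```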

Started looking at some of our recent ISP traces to build a larger dataset
with more spam flows. The data is more useful with some idea as to which
flows are spam and which are ham, so I've used the Spamhaus block lists to
get an approximate classification. The data is current enough that the
block lists should be fairly relevant and accurate, and if this looks
promising I can capture new data or perhaps try to get access to mail
server logs. At the moment the state machine generation code is being run
over approximately 1400 SMTP flows (of which one third are spam) to see
how this differs from my old dataset.
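For reference, a DNSBL such as the Spamhaus lists is consulted by reversing the IPv4 octets and appending the list's zone; if the resulting name resolves, the address is listed. A minimal sketch of building that query name (the zone is real; the classification step itself would be a DNS lookup, omitted here):

```python
def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name used to look up an IPv4 address in a DNSBL:
    the octets are reversed and the block-list zone is appended."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

# A flow's source address is (approximately) classified as spam if this
# name resolves, e.g. via socket.gethostbyname().
name = dnsbl_query_name("192.0.2.99")
# name == "99.2.0.192.zen.spamhaus.org"
```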

Also made a few updates to the KAREN weathermap and spent some time on
documentation covering how to make similar updates.

06 May 2011

This week (and some of last week) I took the KAREN weathermap and implemented it using my current network map visualisation. I used a static layout for the POPs and a basic star layout for the regional devices connected to those POPs. Layouts can be applied to a given subnetwork's nodes and can be changed at runtime. As you zoom in on a POP, the nodes and links connected to it become more visible and labels appear. The underlying concept is that subnetworks can contain subnetworks, which contain subnetworks, and so on.
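The star layout amounts to spacing the regional devices evenly on a circle around their POP; a minimal sketch (the function and names are mine for illustration, not from the visualisation code):

```python
import math

def star_layout(centre, n_leaves, radius):
    """Place n_leaves nodes evenly on a circle around a central POP node."""
    cx, cy = centre
    positions = []
    for i in range(n_leaves):
        angle = 2 * math.pi * i / n_leaves  # evenly spaced angles
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle)))
    return positions

# Four regional devices around a POP at the origin:
pts = star_layout((0.0, 0.0), 4, 10.0)
```

Because the layout is a pure function of the subnetwork's node count and centre, it can be recomputed at runtime whenever the layout for a subnetwork is switched.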

04 May 2011

This is a test of a bug that Brad doesn't think exists

EDIT: and done!

04 May 2011

As you may have noticed, we've upgraded the WAND website. Aside from the new theme, the biggest change is the addition of blogs for each WAND member. This provides a means for us to keep the wider world up to date with the discoveries we are making as they happen. The blogs have also been tied to our weekly reporting system, so there will be weekly updates from all research staff and students at the very least. Feel free to comment on the blogs if you have anything useful to add or wish to ask questions about the work we're doing.

At this stage, the site is still somewhat of a work in progress. Now that we've migrated successfully, we'll be auditing the content to remove out-of-date information and replace it with new content that reflects what we are doing now rather than what we did several years ago. Expect to see a few changes over the next few months...

03 May 2011

Finished the Bayesian forecast algorithm and sampling code last week. I also checked out my old Kalman filter and ARIMA code and started changing it to take the sampling code's output as input. Both pieces of code need some parameter tuning; to do it properly, I borrowed a book called "Time Series Analysis" from the library and started reading the relevant chapters.
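As a reminder of what the Kalman side involves, here is a minimal one-dimensional filter with a random-walk state model (the parameters and structure are illustrative, not my actual forecasting code):

```python
def kalman_1d(observations, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter with a random-walk state model.

    q: process noise variance, r: observation noise variance,
    x0/p0: initial state estimate and its variance.
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: a random walk carries the state estimate over,
        # but uncertainty grows by the process noise.
        p = p + q
        # Update: blend prediction and observation by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

smoothed = kalman_1d([1.0, 1.2, 0.9, 1.1, 1.0])
```

The tuning problem mentioned above is exactly the choice of q and r: too much process noise and the filter chases every observation, too little and it lags behind real level shifts.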

02 May 2011

It looks like the major problem with the spam dataset I've been using is
the classification of greylisted flows as spam. Greylisting happens to a
large proportion of incoming flows on our mail server, mostly early on,
before any data is sent. This meant there was a vast
number of almost identical flows that were being counted as spam. Removing
these flows from consideration gives me a smaller dataset, but one in
which almost every flow traverses at least one link that is entirely
classified as spam or ham. At this point the small number of flows that
don't do this appear to involve TLS and will require closer investigation.
Will also need to expand into newer and larger datasets, hopefully some
without greylisting that see larger volumes of spam.
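The filtering step can be sketched as follows: a flow looks greylisted when the server issued a 4xx temporary failure (typically 450/451) before the DATA phase was reached, which distinguishes it from a genuine 550 spam rejection. The flow record format below is invented for illustration; real flows would come from the trace parser:

```python
def drop_greylisted(flows):
    """Filter out flows that look greylisted: a 4xx temporary failure
    was issued before any message data was sent."""
    kept = []
    for flow in flows:
        greylisted = (not flow["reached_data"]
                      and any(400 <= c < 500 for c in flow["codes"]))
        if not greylisted:
            kept.append(flow)
    return kept

flows = [
    {"codes": [220, 250, 451], "reached_data": False},       # greylisted
    {"codes": [220, 250, 250, 354, 250], "reached_data": True},  # delivered
    {"codes": [220, 250, 550], "reached_data": False},       # spam rejection
]
kept = drop_greylisted(flows)
# Only the greylisted flow is removed; the 550 rejection stays as spam.
```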

02 May 2011

I said I'd start doing these again when the new WAND website came online, gambling that this was never going to happen. Unfortunately it seems that I was wrong.


Next week I'll try and have my automatic report generation system going - the prototype generates reports from SVN logs, though I want it to integrate with my Trac tickets as well. Until then, enjoy this painstakingly hand-crafted report.
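The SVN half of that prototype boils down to parsing `svn log --xml` output into report lines; a minimal sketch (the field handling is simplified and the sample revision is invented):

```python
import xml.etree.ElementTree as ET

def report_from_svn_log(log_xml):
    """Turn `svn log --xml` output into one summary line per revision."""
    lines = []
    for entry in ET.fromstring(log_xml).findall("logentry"):
        rev = entry.get("revision")
        author = entry.findtext("author", default="?")
        # Use only the first line of the commit message as the summary.
        msg = (entry.findtext("msg") or "").strip().splitlines()
        summary = msg[0] if msg else "(no message)"
        lines.append("r%s (%s): %s" % (rev, author, summary))
    return lines

sample = """<log>
  <logentry revision="1204">
    <author>jpo</author>
    <msg>Fix viewport offsets on the wall</msg>
  </logentry>
</log>"""
print("\n".join(report_from_svn_log(sample)))
```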


This week I have been:

  • Cleaning up ClusterGL for packaging. There was a bunch of nasty Symphony-specific code (primarily the launch scripts) that needed to be made generic, and I'd quite like to have some .deb packages too. I ran into a few people at Eurographics who were interested in running it, so it's probably important I get it in a releasable state sooner rather than later.

  • Working on the hydrac compiler. In order to shift data from (relatively scarce) RAM to (relatively abundant) flash memory, I've developed a system that takes Java bytecode and converts it to a .c file. This means that read-only data structures can be created statically offline, and not have to be initialized at runtime on the nodes - this saves lots of RAM and CPU.


    This means I have enough space to fit more complex application code - before, I had only enough space for 'hello world' and not much more, which was not overly useful.

  • Investigating accelerometer data processing on my sensor nodes to detect different types of human movement patterns. This is intended to integrate with a project we're running with Arts, and is a nice test of my phd work :)

  • Prototyping frustum-rotate code in CGL. Google are funding CGL development now, and they want it to run on their hardware platform. They don't use a display wall as such - theirs looks like this. As you can see, this is a rather different shape to our wall, so our existing viewport offset code won't be correct. Luckily, it appears I can simply inject a 'rotate by n degrees' command and it will be correct, or at least as correct as Google's current solution.


    This gets slightly more complicated when more complex applications are involved, as they will attempt to do view-frustum culling on an incorrect frustum.
    I suspect this can be solved by simply applying a large application-side FOV - 'cl_fov 180' or something in OpenArena, for example. Further testing is required, but it's early days yet.

  • Recovering from jetlag. I wake up at 7am now, it's awful.
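The static-data trick in the hydrac bullet can be illustrated with a toy generator (the names and output format are mine; the real compiler consumes Java bytecode). The point is that a `const` array emitted as C source can live in flash on the nodes and never needs to be built up in RAM at runtime:

```python
def emit_c_array(name, data):
    """Emit a C definition for a read-only byte array.

    A const array like this can be placed in flash by the toolchain,
    so the data never has to be initialised in RAM on the node.
    """
    body = ", ".join("0x%02x" % b for b in data)
    return ("const unsigned char %s[%d] = { %s };\n"
            % (name, len(data), body))

c_source = emit_c_array("lookup_table", bytes([1, 2, 255]))
# -> 'const unsigned char lookup_table[3] = { 0x01, 0x02, 0xff };\n'
```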



EDIT: switched to HTML formatting.

EDIT 2: Okay, markdown+html appears to work, kinda. Except every time I edit, it resets to 'filtered HTML' and won't save my choice. This isn't entirely obvious, as it's a minimized field.

EDIT 3: This is all rather broken, perhaps I should have said I would do reports when it works properly. It would look a lot better if the list items were indented properly. Also this doesn't appear to respect paragraph breaks properly regardless of input format, and forcing one using a br tag causes it to leave a much larger gap than you'd expect.

EDIT 4: Readability formatting.

02 May 2011

I continued working on my paper for CCS. I've got most of the paper sorted, and have been focusing on which numbers I wish to show, and on extracting and verifying them from my results.
I have also found a lovely macro for OpenOffice that converts spreadsheets to LaTeX tables.

02 May 2011

Finished up the draft of the paper for ATNAC on inbound sessions.

Wrote a draft of my talk for ICT. Gave a practice run on Friday, which
received lots of helpful feedback. Will be working on incorporating that
in before I leave on Friday.

New website went up on Thursday, so I spent a fair bit of time poking at
it and finding problems for Brad to fix. Trac spam seems to be a fairly
big problem, even after adding captchas.

29 Apr 2011

This week has mainly been focused on finishing my 'user study' chapter and getting the chapter reviewed. The feedback was pretty good but there are a few more changes to make.

The aim over the weekend and next week is to get as many chapters ready for Tony as I can. The main difficulty will be making changes to my introduction and getting it reviewed again before passing on to Tony.

Chapters 2 and 3 have some changes to make, then 4, 5, 6, 7 and 8 are pretty much ready as they are. Chapter 9 requires some changes and then once those chapters are sorted, I can write my conclusion chapter.