Message Queuing for Network Monitoring

Network monitoring systems, such as the AMP project, are more useful when measurements are taken from a diverse range of locations in order to provide information from different parts of networks. Modern message queuing protocols, such as those based on AMQP (Advanced Message Queuing Protocol), allow for more flexible and potentially interoperable message routing and more sophisticated test co-ordination than those currently used by AMP.

This project is to investigate message queuing protocols and their reliability and scalability in wide area networks, particularly with regard to coping when networks may not be behaving correctly (which is when network monitoring is most useful), and to recommend a solution for network monitoring systems. The chosen system will then be integrated into AMP to replace the existing simple message queuing and test co-ordination systems, for increased flexibility.

24

Oct

2012

Final report all done and submitted at last. Fairly happy with pretty much all of it, though there are always improvements that could be made. Also did a little tidying and memory leak fixing of my code. Going to have a meeting with Brendon and Richard at some point in the next week, after tidying up my commenting, to discuss whether it is useful.

Found while writing my report that there are a couple of new systems around since I made my choice. Qpid Proton is a lightweight AMQP 1.0 library with peer-to-peer support that sounds rather interesting (and was just starting out when I started). Mosquitto MQTT also seems to have SSL support nearly added. STOMP 1.2 was also released yesterday but changes very little. Don't think any of these were far enough along to have had a chance in the project but it's a little irritating.

16

Oct

2012

Been chipping away at my report and made a lot of headway over the weekend; I've now preliminarily filled in pretty much all sections, with most of my introduction, existing systems and evaluation chapters properly drafted. Here and there, when I haven't been able to think of anything to write, I have spent a couple of hours tidying up non-blocking I/O in the implementation, and wrote a useful wrapper around select() to properly wait on the right thing with should_read/should_write.
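A minimal sketch of what such a select() wrapper could look like, using only POSIX calls. The name wait_for_io and its parameters are hypothetical; want_read and want_write stand in for the results of BIO_should_read() and BIO_should_write() after a retryable BIO call, and the real wrapper may well differ.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe(), write(): only needed to try the helper out */

/* Hypothetical helper: block until fd is ready in the requested direction(s)
 * or the timeout expires. want_read / want_write are assumed to come from
 * BIO_should_read() / BIO_should_write() on the BIO that owns fd.
 * Returns what select() returns: >0 if ready, 0 on timeout, -1 on error. */
int wait_for_io(int fd, int want_read, int want_write, int timeout_ms)
{
    fd_set rfds, wfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    if (want_read)
        FD_SET(fd, &rfds);
    if (want_write)
        FD_SET(fd, &wfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &rfds, &wfds, NULL, &tv);
}
```

One way to try it without a network connection is on a pipe: the read end should time out while the pipe is empty and become ready once a byte has been written.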

Only one more week to go!

09

Oct

2012

Spent a large part of the week on an assignment, working on my report here and there. Did some implementation over the weekend so now non-blocking connecting and reconnection is mostly sorted, using a state machine. I doubt I'll get any more time to do implementation, with so many assignments in addition to the report, but at least the foundation is there for the reliable part.
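As a rough illustration of the state machine idea, here is a small transition function. The states, events and names are all hypothetical and chosen for the example; they are not taken from the actual implementation.

```c
#include <assert.h>

/* Hypothetical connection states and events, for illustration only. */
enum conn_state {
    CONN_DISCONNECTED,  /* idle, nothing in progress                */
    CONN_CONNECTING,    /* non-blocking connect() in flight         */
    CONN_HANDSHAKING,   /* socket up, waiting for STOMP CONNECTED   */
    CONN_CONNECTED,     /* ready to send and receive frames         */
    CONN_BACKOFF        /* connection lost, waiting before retrying */
};

enum conn_event {
    EV_START,            /* caller asked for a connection  */
    EV_SOCKET_WRITABLE,  /* select() says connect finished */
    EV_STOMP_CONNECTED,  /* CONNECTED frame received       */
    EV_ERROR,            /* any I/O or protocol failure    */
    EV_TIMER             /* back-off timer expired         */
};

/* Pure transition function: no I/O here, so it is easy to unit test. */
enum conn_state conn_next(enum conn_state s, enum conn_event e)
{
    if (e == EV_ERROR)
        return s == CONN_DISCONNECTED ? CONN_DISCONNECTED : CONN_BACKOFF;

    switch (s) {
    case CONN_DISCONNECTED:
        return e == EV_START ? CONN_CONNECTING : s;
    case CONN_CONNECTING:
        return e == EV_SOCKET_WRITABLE ? CONN_HANDSHAKING : s;
    case CONN_HANDSHAKING:
        return e == EV_STOMP_CONNECTED ? CONN_CONNECTED : s;
    case CONN_BACKOFF:
        return e == EV_TIMER ? CONN_CONNECTING : s;
    case CONN_CONNECTED:
        return s;
    }
    return s;
}
```

Keeping the transitions separate from the socket code means reconnection is just another path through the same table: an error from any active state leads to the back-off state, and the timer leads back to connecting.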

In terms of the report I have written a little over 5 pages so far, including a large part of the existing systems section and a few other paragraphs here and there in introduction and implementation. I hope I have enough to write for implementation given there won't be a lot of testing to talk about (I'm really glad I have been doing unit testing now).

30

Sep

2012

Made some pretty decent progress on the remaining implementation. Added proper subscription management, acknowledgement sending (including end-to-end) and sequence numbers. Also added the beginnings of general frame processing and receipt checking. Decided to use the persistence store for queuing for now (as it should be well cached). A little worried about adding persistence into the basic stomp handling, so I might separate things out a bit if I get time later. Left to do is finishing the other half of queuing, and finally dealing with attempting to re-establish connections when they go down. Neither of these things should take long but they are reasonably crucial to the project and there is not a lot of time left. Haven't made terribly much progress with my report yet, have set up the document and written a few hundred words - it definitely needs to take priority now.

24

Sep

2012

Managed to forget my report last week with studying for a test. Over the last couple of weeks I've got non-blocking sending working, wrote an outline of my final report, finished an assignment and almost finished one not due for another week, so I should have plenty of time to work on my project this week. Acknowledgements and tying the persistence in with the sending are the last things that need to be done in terms of implementation; I also need to make some decent headway into actually writing the final report this week, as it is due in just four weeks. Still need to do some proper system testing too. I'll also need to do a little code tidying and function documentation at some point (the code itself is already well documented) so others at WAND may be able to use my work. Due to accepting a job offer last week (so nice to have everything organised!), I won't be able to finish off things like peer-to-peer or integration into AMP over summer as I had hoped.

10

Sep

2012

My presentation on Tuesday went pretty well. I stumbled in a couple of places but overall was reasonably happy with how it turned out, and I enjoyed learning of the progress everyone has made on their projects. After the presentation I went home for a break for a few days. I spent a little bit of time tinkering at my project and other assignments here and there. I have the receiving pretty much done but hit a little snag with the sending - I had forgotten just how linear and non-conducive to partial sends it was. Had a chat with Richard about it and decided to go with re-using the function I made for persistence that encodes all the headers into a single string. This will be a little less efficient and use a little more memory than the complicated state for keeping track of attempting to (possibly partially) send headers line by line, but that state seemed to need more complication the closer I got to finishing it. I intend to get this and the plumbing things like acknowledgements organised this week.
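A minimal sketch of encoding all the headers into one string, assuming headers are kept in a simple linked list. The struct header layout and the name headers_to_string are hypothetical, and escaping of header values is left out to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header list node; the real structure may differ. */
struct header {
    const char *key;
    const char *value;
    struct header *next;
};

/* Serialise every header as "key:value\n" into one malloc'd string, so the
 * whole block can be stored or sent (possibly partially) from one buffer.
 * No escaping is done in this sketch. Caller frees the result.
 * Returns NULL only if allocation fails. */
char *headers_to_string(const struct header *h)
{
    const struct header *p;
    size_t len = 0;
    char *out, *w;

    /* First pass: work out how much space is needed. */
    for (p = h; p != NULL; p = p->next)
        len += strlen(p->key) + 1 + strlen(p->value) + 1;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each header into place. */
    w = out;
    for (p = h; p != NULL; p = p->next) {
        size_t k = strlen(p->key), v = strlen(p->value);
        memcpy(w, p->key, k);
        w += k;
        *w++ = ':';
        memcpy(w, p->value, v);
        w += v;
        *w++ = '\n';
    }
    *w = '\0';
    return out;
}
```

With a single contiguous string, a partial send only needs an offset into that string to resume, rather than remembering which header and which part of it was being written.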

03

Sep

2012

Updated my presentation for a WAND practice talk afternoon on Tuesday. Got a lot of useful feedback and have made some changes to improve things (still a few more to do). The main thing was it was too long, so I'll need to cut a few slides and gloss over the existing systems some more.

Read up some more and made some progress implementing non-blocking I/O (though it busy waits at the moment so I'd better learn select() in a hurry - according to the documentation select() on blocking won't work as a read may involve a write). Turns out it isn't nearly as complicated as I thought - BIO_should_read and BIO_should_write return codes are from the underlying socket/BIO's perspective (i.e. that it should do some reading before the next read). This only really has bearing on what to wait on in a select(), rather than the complicated system of sometimes needing to do a write before another read and the like that I first thought it was. Things are also simplified by re-handshaking happening automatically on the client side (might slightly complicate things on the server side). At this point I am glad I am using BIO rather than the underlying SSL_* as re-handshaking is rather complicated on the server side.

Unfortunately with non-blocking I/O I can't rely on BIO_puts to succeed or fail like BIO_gets, it can do a partial write; this means I'll need to refactor frame sending a fair bit as it is spread over a number of writes at the moment that don't handle this.
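One possible way to cope with partial writes is to keep the whole outgoing frame in one buffer along with an offset of how much has gone out so far. The sketch below assumes that approach; send_buf, send_buf_flush and the write_fn callback are hypothetical names, and the callback stands in for BIO_write() so the example does not depend on OpenSSL. fake_write is only a test double that simulates short writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outgoing data plus how much of it has been written so far. */
struct send_buf {
    const char *data;
    size_t len;
    size_t off;
};

/* Stand-in for BIO_write(): returns bytes written (>0), or <=0 when
 * nothing could be written right now. Real code would also check
 * BIO_should_retry() to tell "try again later" from a hard error. */
typedef int (*write_fn)(void *ctx, const char *buf, int len);

/* Push out as much as possible. Returns 1 once everything is sent,
 * 0 if data remains (call again when the socket is writable). */
int send_buf_flush(struct send_buf *sb, write_fn wr, void *ctx)
{
    while (sb->off < sb->len) {
        int n = wr(ctx, sb->data + sb->off, (int)(sb->len - sb->off));
        if (n <= 0)
            return 0;
        sb->off += (size_t)n;
    }
    return 1;
}

/* Test double: takes at most `cap` bytes per call and refuses every
 * second call, to imitate a socket that accepts short writes. */
struct fake_sink {
    char out[64];
    size_t used;
    int cap;
    int calls;
};

int fake_write(void *ctx, const char *buf, int len)
{
    struct fake_sink *s = ctx;
    int n = len < s->cap ? len : s->cap;

    if (s->calls++ % 2 == 1)
        return -1;                      /* pretend it would block */
    if (s->used + (size_t)n > sizeof s->out)
        return -1;
    memcpy(s->out + s->used, buf, (size_t)n);
    s->used += (size_t)n;
    return n;
}
```

Because the offset survives between calls, the caller can return to its select() loop whenever the flush reports unfinished data and resume later without re-sending anything.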

Had a good think about some overall architecture things (how to check receipts and the like), which should make things easier to tie together. Realised I will need a queue on the client side when things are non-blocking to make sure everything is sent in the end.

My conference presentation is at 1:10PM in S.G.01 this Tuesday (4 September 2012).

28

Aug

2012

This has been one of the busiest weeks I've had, so I had very little time to work on 520. Fixed some bugs in last week's code and did some investigation into using non-blocking OpenSSL. I found out that the BIO_gets() line reading also works correctly with non-blocking I/O (it returns a line or it doesn't). Not sure how I missed that the first time around, as it means I can get rid of all of my nasty pointer arithmetic and newline finding in my receive code. Made a lot of useful progress on Monday (when I am posting this) towards that, with much more elegant code. Combining this with BIO_pending() (which is documented in a strange part of the OpenSSL documentation) means I can avoid issues with reading too far in frame bodies without a content length by reading a byte at a time from the buffer, without stooping to reading a single byte per user frame getting call or being only partly non-blocking. I have also started preparing for my conference presentation.

21

Aug

2012

Busy week last week. Earlier in the week I ran callgrind profiling to look at why persistence was being slow. I then broke persistence for several hours while trying to convert it to re-using prepared statements; the cause turned out to be something completely silly but it used up time. Unfortunately it made little difference to the speed (so it must just be the disk transactions). I can always bundle transactions if it is a problem.

The middle of the week was mostly taken up with other assignments. On Thursday afternoon I managed to come down with the flu (which luckily only lasted a few days), which while not too bad, meant I couldn't work quickly on my assignment or concentrate properly on my project. I did manage to drag myself to in2securITy on Saturday, and was glad I did. So, instead of taking the risk of breaking everything re-writing sending/receiving as I planned, I cleaned up the error handling of persistence and tested it doesn't crash when the database is locked. I also took the time to get rid of some of the worst extraneous string duplication in other parts of the code to whittle down the list of refactoring to do later.

13

Aug

2012

Got a little bogged down for the first part of the week with other course work, and was a little slow starting after that. Made up for most of this over the weekend though. I have completed pretty much all of the basic persistence/retrieval of frames to/from the database and wrote some unit tests for this. Still to enable the WAL and recovery flags, and test reloading when the program exits (though I have inspected the database with sqlite3, and before I removed duplicates they were being read back successfully by the test program). The main thing left to do (apart from a large number of minor code cleanups, some of which I have been doing as I revisit code), is to refactor frame sending/receiving for asynchronous use, which I am determined to have done by the end of the week.

I have been having a think about what is needed for peer-to-peer and have realised that nearly all of my code is re-usable for server-client communications. The only additions that should be needed are accepting rather than opening a connection and responding to CONNECT, receipts and disconnects appropriately (all of which can use the existing basic sending/receiving), as well as some kind of destination to connection subscription tracking (both client-server and server-client).

05

Aug

2012

Haven't had a lot of time this week, but managed to make some decent headway into persistence. The SQLite API is quite straightforward (if a little verbose) despite the lack of examples in the documentation. I've written a function to serialize all the headers at once, which I'll use for sending them in one chunk and to store in the database (using a relational table seems like asking for bugs, though I can always change that). I'm halfway through implementing the SQLite access abstraction to store and retrieve frames (which I should easily be able to finish on Monday), then I can just write frames to the database before sending messages. I'll need to put some careful thought and refactoring into connections going down, multiple connections and when to re-send frames. I'm going to need to carefully manage my time in order to have time for at least some useful testing before the conference.

30

Jul

2012

This week I finally added a bunch of helper functions for handshaking and version negotiation, as well as for message sending and subscriptions. Also fixed a couple of fairly major frame sending and receiving header parsing bugs that weren't obvious until I tried to actually use the program like a STOMP client. Finally got around to running valgrind and fixed a few memory leaks, a couple of which were head scratchers. Found a handy trace logging mode for ActiveMQ which shows message contents, making debugging of sent messages much easier. Most of what is left to be done will need to relate well to persistence (like acknowledgement and receipts); so I think I'll do a final refactoring push on Monday then switch to getting basic persistence going as I really need to move forward.

23

Jul

2012

Spent the first half of the week preparing for my presentation that was on Wednesday. Thought it didn't go too badly, though I think I tried to cram too much information in. I've made some progress refactoring my code into something more like a library (abstracting stomp connections and the like). Still a fair bit left to do - will really need to get stuck in next week and get on to persistence as I'm about a week behind the rather tight remaining schedule. Richard and I decided it is better to test and evaluate the system thoroughly than try and half-implement it into AMP in a hurry, given the time left.

15

Jul

2012

Finally got arbitrary length frames more or less working, and in the process thought of some nice refactoring to do to tidy things up. After a little research and confirming with Brendon, decided on SQLite for persistence, using Write Ahead Logging (WAL) journal mode to reduce overhead while ensuring reliability. Will abstract away from the underlying storage system to allow swapping in a different one later if necessary. Started drafting my presentation for Wednesday.

08

Jul

2012

Wrote a bunch of tests and thus fixed a number of bugs in my stomp frame handling. Still having some trouble with an elegant way of receiving arbitrary length frames, made some progress towards reorganising things before going home for the weekend on Thursday evening. Think I'll need an internal buffer regardless, as it is theoretically possible for a read to include the start of another frame (I'll have to be careful of this for persistence).
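A small sketch of the internal buffer idea, assuming frames without a content-length header so that the first NUL byte marks the end of a frame. The names rx_buf, rx_append and rx_take_frame are hypothetical, and the fixed buffer size is only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes received so far that have not yet been handed out as a frame. */
struct rx_buf {
    char data[256];
    size_t used;
};

/* Append freshly read bytes. Returns 0 on success, -1 if they won't fit. */
int rx_append(struct rx_buf *b, const char *src, size_t n)
{
    if (n > sizeof b->data - b->used)
        return -1;
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}

/* If a complete NUL-terminated frame is buffered, copy it to `out`
 * (including the NUL), keep whatever followed it for the next call,
 * and return the frame length without the NUL. Returns -1 if no
 * complete frame is available yet or `out` is too small.
 * Assumes no content-length, i.e. the body contains no NUL bytes. */
long rx_take_frame(struct rx_buf *b, char *out, size_t out_size)
{
    char *nul = memchr(b->data, '\0', b->used);
    size_t flen;

    if (nul == NULL)
        return -1;
    flen = (size_t)(nul - b->data);
    if (flen + 1 > out_size)
        return -1;

    memcpy(out, b->data, flen + 1);
    b->used -= flen + 1;
    memmove(b->data, nul + 1, b->used);  /* start of next frame moves up */
    return (long)flen;
}
```

The leftover bytes after the NUL are exactly the "start of another frame" case: they stay in the buffer and are joined by later reads, so a frame that straddles two reads comes out whole.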

Speaking of persistence, started having a look at potential systems over the weekend. I'm still leaning towards a well-known, well-tested, fairly bulletproof, lightweight embedded database and/or key-value store. The main contenders are Berkeley DB and SQLite, especially as they are so widely used and so likely already installed, though I am looking at others like LevelDB and Tokyo/Kyoto Cabinet. A little worried about potential version issues with BDB, though it should be less of an issue as the queue isn't permanent. Also looking to see if there are any well-tested persistent queuing libraries (which is a hard term to search for). Don't see much point in a totally custom or little-used solution, as the potential for edge-case bugs would mean it wouldn't really give any benefit over current AMP. Avoiding proprietary/commercial systems also helps narrow down the list.

01

Jul

2012

Ran into an issue trying to use a BIO_f_buffer to read frames line by line, where even the normal read method tries to wait to fill the passed number of bytes; making it unhelpful for reading until the null byte at the end of messages (it happened to work last week because ActiveMQ STOMP terminates messages with \0\n). Had a go at handling an arbitrary number of lines per unbuffered read, coming up with a rather hacked together solution that so far only works for lines that don't straddle reads. Looked at libstomp for some inspiration, but that gets around the issue by reading a single byte at a time (and also does nested reads). I might use the idea there of a linked list of data blocks for messages without content-length though.

Rather than spending a couple more days staring at the code, I moved on to writing a number of the helper functions for dealing with creating and destroying frames and headers, finding a header in a list, and string escaping/unescaping for STOMP 1.1. Started writing a few tests for those (as the main sending/receiving isn't very useful yet). Also had a chat with Brendon about the best way of eventually having multiple connections.
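For reference, STOMP 1.1 header escaping covers three characters: newline becomes \n, colon becomes \c and backslash becomes \\. A minimal sketch of the two helpers follows; the function names are hypothetical and the versions in the project may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Escape a header key or value for STOMP 1.1:
 * newline -> "\n", ':' -> "\c", '\' -> "\\".
 * Returns a malloc'd string (caller frees), or NULL if out of memory. */
char *stomp_escape(const char *in)
{
    /* Worst case every character doubles in size. */
    char *out = malloc(2 * strlen(in) + 1);
    char *w = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        switch (*in) {
        case '\n': *w++ = '\\'; *w++ = 'n';  break;
        case ':':  *w++ = '\\'; *w++ = 'c';  break;
        case '\\': *w++ = '\\'; *w++ = '\\'; break;
        default:   *w++ = *in;               break;
        }
    }
    *w = '\0';
    return out;
}

/* Reverse of stomp_escape(). Returns NULL for an undefined escape
 * sequence or a trailing backslash, as well as on allocation failure. */
char *stomp_unescape(const char *in)
{
    char *out = malloc(strlen(in) + 1);
    char *w = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        if (*in != '\\') {
            *w++ = *in;
            continue;
        }
        in++;                       /* look at the character after '\' */
        if (*in == 'n')
            *w++ = '\n';
        else if (*in == 'c')
            *w++ = ':';
        else if (*in == '\\')
            *w++ = '\\';
        else {                      /* includes hitting the end of input */
            free(out);
            return NULL;
        }
    }
    *w = '\0';
    return out;
}
```

Rejecting unknown escape sequences rather than passing them through follows the 1.1 specification, which treats them as a fatal protocol error.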

24

Jun

2012

Managed to get OpenSSL certificates playing together with the Java system in the ActiveMQ STOMP server, which took far more effort than it should have and documented how I did it (there may be a more sensible way, I'll look into that later). Made some headway into some basic STOMP frame parsing - string manipulation in C is so simple and bullet proof secure! Didn't make quite as much headway as I would have liked but I'm getting there.

18

Jun

2012

Didn't spend any time on 520 this week as I had my final two assignments for the semester due. I plan to work more or less full time on the project for the next few weeks to catch up. This will probably be reduced a bit to start with now that I need to reformat my computer thanks to a new motherboard and CPU (RMAing the old motherboard was apparently going to take a month!).

11

Jun

2012

Spent the week writing my interim report (attached) and trying to diagnose my computer (which has now completely died and won't do anything when I press the power button). It is almost certainly not the PSU, so it is most likely the motherboard - fun. Think I went a little overboard in describing existing messaging systems in my report, but at least it should be useful for the final report. Had great fun collecting references on my laptop, which was constantly running out of RAM, being used to 16GB.

Had an interesting idea while writing up a potential architecture in the report. With the peer-to-peer STOMP extension idea, nodes could advertise address(es) they could be reached on via normal STOMP headers through the broker. Weighing up whether to have peer-to-peer part of the client or a very lightweight simple re-sender-like co-resident broker (which would mean no custom protocol changes). Liking the latter at the moment but it would mean more to do.