
Christopher Lorier's blog




So last week I did a bit of reading about SNMP, SQL and Perl, and not much more.

This week, I started properly on making a poller output to an SQL database rather than to RRD.

I looked about on the internet and found a plugin for collectd that would already do that, so this week I had a go at making it work.

After a barrage of requests for help from Brendon, we eventually got all the software installed on a computer and the plugin actually working. So now I am working on writing PHP to read the data out of the database. I also spent about 5 minutes working on a 5-minute talk for the summer scholarship end function.




This week I tidied up all of the loose ends in the code. It will now make a graph that displays unknown nodes, assuming that any unresponsive hop in between any two given nodes is the same unresponsive hop. There was a bug in this code that took me a very long time to find, so that took up most of my week. Other than that, I re-added the memory freeing and fixed it so it properly escapes characters in its outputs when necessary. I improved the Python programme that resizes the nodes after they have been grouped so it will also remove edge weights, and I wrote a text file that explains exactly how to use the programme.
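A minimal sketch of the unknown-hop assumption described above (the labelling scheme and the function name are my own, not from the original code): any unresponsive hop seen at the same position between the same pair of known addresses gets the same synthetic label, so repeated traces over the same route don't multiply unknown nodes.

```python
def label_hops(trace):
    """Replace None (unresponsive) hops with a synthetic label built from
    the nearest known neighbours, so the same gap in different traces
    collapses onto the same node."""
    labelled = []
    for i, hop in enumerate(trace):
        if hop is not None:
            labelled.append(hop)
            continue
        # Nearest responsive hop before and after this gap.
        prev_i = max((j for j in range(i) if trace[j] is not None), default=-1)
        prev_known = trace[prev_i] if prev_i >= 0 else "src"
        next_known = next((h for h in trace[i + 1:] if h is not None), "dst")
        # Include the offset so consecutive unknowns stay distinct nodes.
        labelled.append(f"unknown:{prev_known}+{i - prev_i}->{next_known}")
    return labelled
```

Two traces with an unresponsive hop in the same gap will both produce the same label, so the graph builder sees one node rather than two.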




Brendon had a larger set of alias data, so I had a go at using that. It took me a lot of effort that I still can't fully explain, and at one point produced a graph in which a single node connected to over a third of all nodes on the graph. We couldn't rule that out as a possibility entirely, so it took a lot of investigation. Eventually I got to the point where I could no longer reproduce the bug, so I guess I fixed it.

Anyway, that led to trying to make the programme select aliases more intelligently, as opposed to picking them entirely arbitrarily. I made it pick an arbitrary address from whichever network contains more of the alias's addresses than any other (arbitrarily if there is a tie), with a special case that gives Reannz priority over other networks.
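A hedged sketch of that selection heuristic. The data layout and the `network_of` callback are assumptions for illustration; the real programme's structures aren't shown in the post.

```python
def pick_alias(aliases, network_of):
    """Pick one address from an alias set: prefer a Reannz address,
    otherwise pick one from whichever network contributes the most
    addresses to the set (arbitrary on a tie)."""
    # Special case: Reannz wins outright.
    for addr in aliases:
        if network_of(addr) == "Reannz":
            return addr
    # Count how many of the alias's addresses fall in each network.
    counts = {}
    for addr in aliases:
        net = network_of(addr)
        counts[net] = counts.get(net, 0) + 1
    best = max(counts, key=counts.get)  # ties broken arbitrarily
    return next(a for a in aliases if network_of(a) == best)
```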

This required a reasonably major change to how the programme works, which, on the plus side, should mean I can fix the memory leak from last week much more easily, though I haven't done that yet (actually I got rid of all of the memory leak removal stuff to get this working, and will put it back next week).

Then I produced graphs with the data and asked Shane what he thought of the idea of using the second graph to index the first. He suggested sizing the nodes on the network graph based on the size of the networks, so I wrote a programme in Python to resize the nodes. However, Gephi seems to rescale the node sizing when it produces the images, so whatever you do you end up with significant differences in the size of nodes.
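The resizing programme itself isn't shown in the post; this is just a plausible sketch of the sizing step, mapping each network's address count into a node-size range with square-root scaling so the largest networks don't swamp everything else.

```python
import math

def scale_sizes(network_counts, lo=10.0, hi=100.0):
    """Map each network's address count to a node size in [lo, hi],
    using sqrt scaling to compress the spread between networks."""
    roots = {n: math.sqrt(c) for n, c in network_counts.items()}
    rmin, rmax = min(roots.values()), max(roots.values())
    if rmax == rmin:
        return {n: lo for n in roots}  # all networks the same size
    return {n: lo + (r - rmin) / (rmax - rmin) * (hi - lo)
            for n, r in roots.items()}
```

As noted above, Gephi appears to rescale sizes again on export, so these values end up controlling only relative, not absolute, node sizes.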




I spent the week fixing memory leaks in the code, which are now all gone but one, which is small and not going to cause significant problems.

I also produced graphs using Brendon's alias data, which, disappointingly, looked pretty much the same as the original graphs, though there are several hundred fewer nodes (in the ungrouped version). The address used for each node is chosen arbitrarily (whichever appears first in Brendon's list). In the case of the original issue that tipped us off to the fact that we needed alias detection - that Waikato appears to connect directly to things it goes through Karen to reach - it chose a Waikato address. However, looking over the alias data, the only alternatives were a second Waikato address or a Fijian address.




I tried graphviz again and managed to produce a large black and white graph with a slightly smaller data set, but it doesn't cope with the full-sized data.

I modified Brendon's preprocessing programme so it checks the ASN list from Team Cymru before performing a lookup, so I could set the RTT cutoff much lower without having to do heaps and heaps of lookups.
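A sketch of that check-before-lookup idea. The function names, the prefix-table format, and the fallback `lookup` callback are all assumptions for illustration; the post doesn't describe Team Cymru's actual file format or the real programme's interface.

```python
import ipaddress

def make_resolver(known_prefixes, lookup):
    """Resolve an address to an ASN, consulting a pre-fetched
    prefix->ASN table first and calling `lookup` (e.g. a remote query)
    only on a miss. `known_prefixes` maps CIDR strings to ASNs."""
    nets = [(ipaddress.ip_network(p), asn) for p, asn in known_prefixes.items()]
    cache = {}

    def resolve(addr):
        if addr in cache:
            return cache[addr]
        ip = ipaddress.ip_address(addr)
        for net, asn in nets:       # cheap local check first
            if ip in net:
                cache[addr] = asn
                return asn
        cache[addr] = lookup(addr)  # expensive lookup only on a miss
        return cache[addr]

    return resolve
```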

Brendon made a list of aliases, so I made the programme read those in and choose an arbitrary address from the aliases for each node, though the results seem shaky, at least with this first data set.

Brendon also got some data from a lot more sources, so I produced graphs with those, using the preprocessing to eliminate the international nodes, but not with the aliasing.

I'm away all next week. Merry Christmas.




I tidied up my code a lot, and combined the 8 or so different versions of the programme into one that takes command line options.

I made a few more options for the data output. I tried removing all leaf nodes, so it would only display core nodes, but this did little to clarify the data. I tried eliminating the first arbitrary number of nodes in each hop, but the graph would break up before there was any significant improvement in clarity. I also allowed it to remove any hop that never had an RTT of less than 35ms, in order to remove the foreign nodes that had crept in, though this led to a lot of false positives.
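The 35ms cutoff can be sketched as a simple filter over per-hop RTT samples (the function name and data layout are my own, not from the original programme):

```python
def domestic_hops(rtt_samples, cutoff_ms=35.0):
    """Keep only hops that achieved at least one RTT below the cutoff;
    a hop whose every sample is at or above the cutoff is presumed
    foreign and dropped. As noted above, this misclassifies slow
    domestic hops, hence the false positives."""
    return {hop for hop, rtts in rtt_samples.items()
            if any(r < cutoff_ms for r in rtts)}
```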

I also managed to get Gephi to produce an ASN graph, though my solution was pretty clumsy: I saved the graph in Gephi, edited the file, and reopened it.

I also made a graph with LGL. Gephi is coping with the size of the graphs, however, and gives a lot more control over how the data is represented.




I added reading in of ASN data to my programme, from files from Team Cymru's website, and used a formula Perry gave me for converting ASNs to colours, so graphs are now coloured by ASN.
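Perry's formula isn't given in the post, so this is not it - just one common way to turn an ASN into a stable, distinguishable colour, spreading hues around the wheel with the golden ratio so numerically close ASNs land far apart:

```python
import colorsys

def asn_colour(asn):
    """Deterministically map an ASN to an RGB hex colour by spreading
    hues with the golden ratio. (An illustrative mapping, not the
    formula Perry supplied.)"""
    hue = (asn * 0.61803398875) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.7, 0.9)
    return "#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))
```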

Assuming every node with an unknown address was a new node resulted in 80% of the nodes in the graph being unknown, since the same routes are taken many times. So I made it not display unknown nodes.

Brendon got some new data that retries an address more times before giving up, which reduced the number of unknown nodes, but around 60% of nodes are still unknown.

Gephi can group nodes according to an attribute into one larger node, so I played around with that, but it leads to awkward-looking results, which I haven't been able to work out how to overcome.




I converted my graph-file-producing programme to C++ so it can take warts files as inputs instead of text files, and so it can store addresses as Scamper addresses, allowing it to deal with IPv6. I had some larger sample data from Brendon and tried producing graphs with it using graphviz sfdp, but it didn't cope well with more than 1000 nodes, and it tended to produce pretty ugly graphs.

So I had a go with Gephi, which, surprisingly, handled the data without any problems at all. Or at least without any more problems than it has handling small graphs.

I played around a bit with the algorithms in Gephi, and managed to produce something which I think reasonably elegantly demonstrates the data.

Currently I am trying to match the addresses to ASNs, so I can use that to colour the nodes.




I wrote programmes in Java that convert a text file of traceroutes to graph formats for various graphing programmes, then played around with them on a small set of data.
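The originals were in Java and aren't shown in the post, but the shape of the conversion is simple. A Python sketch targeting one of the possible output formats, Graphviz DOT, with one edge per consecutive hop pair, deduplicated:

```python
def traces_to_dot(traces):
    """Convert traceroutes (each a list of addresses) into a Graphviz
    DOT digraph: one directed edge per consecutive hop pair, with
    duplicate edges across traces collapsed."""
    edges = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges.add((a, b))
    lines = ["digraph traceroutes {"]
    for a, b in sorted(edges):
        lines.append(f'  "{a}" -> "{b}";')
    lines.append("}")
    return "\n".join(lines)
```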

I started rewriting them in C, so they can access the warts files directly.