User login

Stephen Eichler's blog

25

Jun

2014

The results from the first successful Doubletree run using the MDA data to approximate regular Traceroute make fairly good sense, and show that the steps taken to reduce unnecessary processing have been successful. The second half of the full analysis have now been included, now that initial testing is complete. A full analysis run is now underway. I also plan to analyse some many sources scenarios using the data in reverse direction.

The Megatree MDA data analyser has been updated to analyse some initial many sources scenarios, and a new run is underway. This meant modifying the factor structure of the program to include new combinations. So far Doubletree and Megatree use the same factor structure. Next I plan to include some many to few scenarios that activate varying sized sets of sources one after the other, and supplying global distributed information amongst the sources on this basis.

The ICMP black hole simulator run has completed successfully and results have been generated using warts analysis. This method finds less load balancers in total, but however is still useful as this can be countered by carrying out more runs, and this is intended to be done on Planetlab.

A graph has been produced of the probe packet counts versus timing modes using data from the Internet Simulator that runs on Caida data. The packet counts for few sources across the timing modes (1stage, 2stage, 10stage and staggered) are very similar showing that there is only a small affect of sharing global information. For many sources 1stage is higher than the others which are identical. Strangely the 1stage level is higher than the Traceroute level and so I am rerunning this data point after regenerating leaves topology data and then trace list data. The stage counts are the number of stages carried out before all subsequent probing is carried out simultaneously. It would appear that some of the window size options used in the other simulators might also be appropriate here. It is also interesting to note that the processing times recorded for these simulations are of the order of a few minutes, excluding generating the memory map file which in some cases is reused.

18

Jun

2014

A single simulation rerun was completed on the Doubletree Internet simulator. The automated graphs were updated to include this data point. The graphing was also modified to do bars rather than lines and points, and monochrome dashed boxes were also set up in the png file output. Looking at the data I noted that the numbers of control packets reported need to be checked. Once this is done I plan to run the simulator with reduced data to create another factor variable.

Automated graphing was also set up for the Megatree Internet simulator graphing. The monochrome png file settings were also applied to this program. Once again a number of categorical data variables were applied to the x axis. These included timing which means the number of consecutive traces that contributed to the local and global data sets, number of sources included in the simulations, and similarly the number of destinations included.

The black hole detector drivers were changed over to ICMP probing, and a run was set going on Yoyo. This was a step towards running on PlanetLab. The original data that made me think that I was finding black holes in load balancers was reviewed and a typical case was analysed. This was where Paris Traceroute ran one hop shorter than the two MDA Traceroute runs. The MDA traces showed that one arm of a terminal load balancer connected to the destination whereas the other had no further links found. One trace collected using Paris Traceroute goes down one arm of a given load balancer, because the flow ID is kept constant. It is interesting that the Paris trace stops short in the other arm in this case without reaching the destination, which there should be a valid path to. Note that the convergence point of the load balancer did not appear to be reached.

11

Jun

2014

The scamper warts based MDA load balancer simulator of Megatree was updated to calculate how many control packets are required using a size of 576 bytes where pairs of IpV4 addresses each take up 8 bytes. Megatree does an initial traceroute and determines if there are any load balancers that have been seen before, and if so avoids remapping them. After the first attempt at improved packet counting, it was necessary to adjust the counts to take into account partly full packets and count them as one packet.

A method for simulating Doubletree with the MDA data set used above has been developed. It was necessary to write routines to simulate conventional Traceroute and Doubletree. The calculations of control packets were along the same lines as above, but a bit different requiring redevelopment of the same code as a starting point. The first run of this code is underway after a small amount of debugging and some code checking.

I am also trying to decide if there is some way of combining Doubletree and Megatree for analysis of MDA data, however at first glance there is no easy way to do this. Doubletree could conceivably be applied at the initial Traceroute stage and at the later load balancer detection stage using MDA. This would be in addition to avoiding the remapping of load balancers.

The black hole detector is looking less promising than initially as I remove more and more exceptions to the discovery of a black hole. The most common remaining scenario is that of Paris traces that are one hop less than the MDA traces and the final hop of the Paris trace is not the MDA destination. I am in the process of looking at some warts dump data of these cases. In addition I have also started looking for cases where load balancers have dropped a successor due to a black hole in that arm of the load balancer. This is what might be expected if the network functions correctly. It is not immediately clear how long it would take a network to adapt to the loss of a node in a load balancer arm, and it may be that we would only very rarely detect a black hole in a load balancer.

05

Jun

2014

Further development and debugging of the warts based simulator based on Traceroute MDA load balancer data has been carried out. This is a simulation of a process called Megatree which limits the number of times the same load balancer is discovered. This simulator carries out all simulations in one pass overnight. This means that a new round of debugging can be carried out each day, if needed. Development has included a new factor which is varying the number of destinations for the simulations. Also planned are some analyses of many sources to few destinations, which involve using the warts data in reverse direction, and application of Doubletree to MDA data. This latter may involve combining Megatree and Doubletree. Debugging has included some more complex algorithms to keep only one copy of local and global load balancer sets when an identical set would be otherwise created.

One of the statistics output files from the conventional Traceroute based simulator is missing as the file was corrupted. I am waiting for the disk to be replaced on Wraith, so that the simulation may be repeated. In the mean time I am working on an adaptation of the graphing routine that comes with the simulator that allows categorical data to be used in the graphs, in particular the parameter referred to as the range in the software. This process has been made more complex because the range parameter value of the Traceroute files does not default to a single value like ttl does. The range parameter in this case, is instead direction of trace: either “few” or “many” sources. NB the initial ttl for Traceroute is always the beginning of the trace, nominally one, whereas Doubletree starts somewhere in the middle and this may vary according to different approaches. Because the Traceroute data uses two values of the range parameter it will be necessary to update the way in which Traceroute statistics data is incorporated into the graphs.

28

May

2014

Further development has been carried out on the warts analysis based MDA (load balancer topology) simulator. Local data is looking reasonable now and a start has been made on coding the global or distributed data processing part. The program is in the process of being debugged. Furthermore it seems like some analysis of the many to few scenario would be a good idea for this work as it would tie in with the emphasis on many vantage points. Factors included so far include various numbers of stages where controller info is made use of, and a window of 500 traces. The frequency of sending control data is also varied.

The Doubletree and Traceroute simulator runs have been carried out for the data from one day of Caida data collection. I am now in the process of producing graphs from it. Factors included many versus few sources, Doubletree versus Traceroute and varying numbers of stages where control data is sent. It seems like a good idea to also reduce the number of vantage points in several stages, and repeat the simulator runs.

20

May

2014

The black hole detector finished its run and so I attempted to run the analysis that I worked on previously. When I ran this it generated a segmentation fault and so I reran it with permission for a core dump. When I did this it came to the same point and failed to exit. As I was worried about running jobs that get out of control, I ran it on my laptop rather than Spectre and found that the same thing happened. When this happened I closed the terminal window to force the program to end. It however still took a while to end, and when it did I found a valid core dump. It turned out that because the program was using a lot of memory, when it dumped it took a while to say what it was doing and was in fact writing a large core to disk. After dealing with a couple of cases of null pointers, which I found out about from the cores, the program ran normally on Spectre again. I then further reduced the cases that it reports. I excluded cases where the destination was reached in a shorter number of hops which indicates asymmetry of a load balancer. I also excluded cases where the initial and final MDA Traceroutes differed. This latter point should perhaps be revisited to allow load balancer structure to differ i.e. If a black hole still exists at the final MDA trace, the data would not be seen if it has been excluded.

Further investigation of the non flow-ID, load balancing fields data has involved checking that the cases of missed load balancers where load balancers have been seen are actually now simple non load balancing nodes. In the cases that have been investigated this expectation has been confirmed.

The MDA internet simulator based on warts analyses programming has run without crashing to determine probe packet counts where local set information is used to avoid repeated analysis of the same load balancers. The program is now in the process of being debugged and extended to include distributed performance analysis. The analysis has 40 factor combinations which run in one go taking one or two days. It does use quite a bit of memory so will probably need to be run on Wraith as it grows.

13

May

2014

I am a little suspicious that the results for finding load balancing behaviour of normally non load balancing fields may be caused by some kind of noise. There are cases however where a particular node is tested a small number of times and some cases find a load balancer. I then looked at some cases of load balancers found and noticed that they tend to involve only two routers which are joined together by multiple links. In particular, if a node has been excluded from the summary if it has ever been seen to show per packet behaviour. It is not at this stage, clear why some nodes would show this limited type of load balancing and other times only show a single link and no load balancing.

After the successful run on the Internet Simulator with one days team data, a new stat was included in the program to show the number of packets used not including controller packets, as well as the existing statistic that includes both controller and probe packets. A run was carried out using the modified code and sensible results were observed. Using the existing memory map, the simulation took only one day. This may mean that using this amount of data, gang probing may again be a possibility though not essential.

The second phase driver for the scamper black hole detector was tested and debugged. The latest testing involved manually adding a line to the targets file once the driver was running and waiting. Once that test was successful a full blackhole detection run was initiated on Yoyo, and appears to be behaving correctly.

The Megatree MDA simulator kernel is being tested to see if the program can correctly alternate between data from different vantage points for a given run. This work is in the debugging and rerunning stage and is gradually starting to behave correctly. This simulator is based on the scamper warts analysis program that I use for analysis, and will involve directly accessing topology data rather than running packet by packet. This will make the processing of large data files feasible, and will still allow access to packet level information.

06

May

2014

Further work has been carried out on the black hole detection system based on a fast mapping approach. An initial data set has been collected and the construction of an analysis routine has begun to investigate the series of MDA and Paris traceroute runs. Much of the same code will be able to be used as in the earlier routine, however the new data sets have all the traces mixed in together so the ones for analysis must be identified and grouped according to destination address. This so that destination cases where a black hole is found can be reported.

Another angle relating to this same work is the development of the drivers. It turns out that the program loop waits at some points if no new results need processing. This means that scheduled regular tasks will not be triggered if they rely on the loop circulating. In particular changes to the targets list will not be processed and new targets will not be analysed. This will require investigation into how to avoid the waiting at certain steps. Once this is achieved some sleeps will also need to be added to avoid too much CPU usage.

The Internet simulator appears to have carried out a successful simulation when the data set was reduced to a third. This success was achieved after having to make a change to an existing assertion about some data variables. It seems that under certain circumstances the was able to detect an allowable condition. The following is the assertion: assert(firstHintTime <= simTime); There is a method which can occasionally reset the firstHintTime and possibly make it greater than the simTime: initialiseHints(void).

I have also started on an algorithm to process warts data and approximate a simulation without the great cost of processing packet by packet. This approach is still able to provide information about packet costs as warts records most commonly needed packet details.

29

Apr

2014

I have been working on the fastmapping like approach. Two drivers are required for this. One detects Paris Traceroute runs that are shorter than the original MDA (load balancer detecting) run and the other uses the data from this to initiate a further series of Paris traceroute runs using the flow ID that was used in the short Paris run. Paris traceroute uses consistently the same flow ID within a trace analysis. The first driver has worked correctly and the second is under test.

For the Internet simulator a shortage of virtual memory has been encountered when processing a complete team traceroute data set from Caida. Tony has said it may be possible to require less memory by adjusting the simulator program. The focus of this work is to try and quantify the cost of control packets when using Doubletree, and to compare this with Traceroute.

It was also decided recently to try and simulate Megatree using the Internet simulator. However the amount of data required to do this is much larger than the already too big case above. Alternatively it may be possible simulate Megatree using a variation of my warts analysis programs which operates at the discovered topology level rather than necessarily the packet level. The warts data could be processed in the order it was collected and the Megatree savings could be made using the available data. An approximation of packet usage for a given load balancer will be required as only the grand total of packets used for bringing forward are recorded. Bringing forward is the way that MDA gains access to nested load balancers and finds flow IDs that give access to the successor set of nodes for probing. The actual probing packets are recorded. This approach should require much less computing power and still give detailed information on the performance of Megatree with various factors and levels applied. These include sequential versus parallel probing of destinations and various degrees of this, with and without distributed Megatree, and with and without Megatree.

01

Apr

2014

Since it was decided to search for black holes in load balancers I have been developing the first driver of a pair which will add an entry to a targets list file if a short Paris traceroute trace is found, signalling a possible black hole. The Paris trace is compared to an initial MDA trace to determine if it is shorter. MDA traceroute is the mode that maps load balancers as well as linear parts of a trace from source to destination in the Internet. Paris traceroute follows the same path through a per flow load balancer with each subsequent packet sent. There has been some debugging to do with the driver and then subsequent reruns, to get it up to scratch.

A new run on the Internet simulator has been initiated using team rather than gang probing. The difference is that team probing probes to different destinations from each vantage point whereas gang probing does all the destinations from each vantage point. The CAIDA traceroute data is collected with team probing, so if the simulator is set to gang probing then extra links must be created in the memory model of the subset of the Internet under study. This addition of links takes a long time to do, and may not be practical in our case. In getting the new settings right the simulator has exited early a couple of times with an error message. Once there was a missing data file and once there was an inaccessible address for the controller.

It is desirable to now look at the big picture in regard to what the Internet simulator can be used for. Two possibilities are to analyse controller cost for doubletree and to analyse controller cost and savings for Megatree which is similar but analyses load balancers. Megatree would not start in the middle of the trace like doubletree, and would require an initial single packet per hop traceroute to determine if any load balancers which have been seen before have occurred again. Then the full MDA traceroute would be carried out. Savings would obviously have to cover this initial look-see as well as controller traffic. Doubletree is designed to avoid repeatedly discovering the same sections of topology by recording trace end points and nodes connected to these. Megatree records divergence and convergence points of load balancers.