AMP

AMP, the Active Measurement Project, is a system for making active measurements. It is deployed at most universities in New Zealand and at most of the non-telco ISPs. It has a large number of built-in tests. Performance measurements from the public system are available at http://erg.cs.waikato.ac.nz

Historical information:

The NLANR AMP active measurement project was led by Tony McGregor. At that time AMP was the largest and most widespread active measurement system. It was designed for high performance research and education networks, especially the US Internet2 networks. It was deployed by the research and education networks in 11 countries (USA, Canada, Taiwan, Norway, Finland, Australia, Thailand, Japan, Ireland, Hungary and Korea). There were approximately 140 measurement points worldwide.

31 Jul 2012

Started looking at using topology data to generate more datapoints on
which to group events. Should hopefully be able to group events between sites
that share common paths (at this stage I'm planning on starting with the
AS path) as well as those that share sources and targets. As part of this
added an event detector to alert on major path changes between sites and
realised that there appears to be a bug in the AMP code to determine
common paths. Spent some time trying to track it down and it looks to be
due to counting the sample time period incorrectly, which I'm now trying
to fix.
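
A minimal sketch of the grouping idea described above, assuming each event
carries a source, a target and an AS path. The field names and the
private-use AS numbers are hypothetical, not AMP's actual data model. Every
event is indexed under each key it matches, so an event can appear in
several groups.

```python
from collections import defaultdict


def group_events(events):
    """Index events by source, by target and by each AS on their path.

    Returns only the keys shared by at least two events, since a key
    matched by a single event does not form a group.
    """
    groups = defaultdict(list)
    for ev in events:
        groups[("source", ev["source"])].append(ev["id"])
        groups[("target", ev["target"])].append(ev["id"])
        # set() so an AS that appears twice on a path is only counted once
        for asn in set(ev["as_path"]):
            groups[("as", asn)].append(ev["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical events; AS numbers are from the private-use range.
EVENTS = [
    {"id": 1, "source": "a", "target": "b", "as_path": [64512, 64513]},
    {"id": 2, "source": "c", "target": "d", "as_path": [64514, 64513]},
    {"id": 3, "source": "a", "target": "d", "as_path": [64515]},
]
```

Here events 1 and 2 share AS 64513, events 1 and 3 share a source, and
events 2 and 3 share a target, so each event ends up in two groups.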

Figured out the cause of the AMP data interface module crashing on newer
PHP/Apache. An incorrectly sized variable was being used in the C portion
to receive data from the PHP portion and along the way it was clobbering
something it shouldn't have. I'm sure the compiler warned about this last
time, but not in this case.

17 Jul 2012

Spent some more time working on building useful groups of events for
RTT/loss data. I'm trying to find a compromise between including all
events that happen about the same time and grouping only those events that
are obviously related, while allowing events to be in multiple groups
where that makes sense. Some of these issues are coming about because my
sample data extraction program doesn't guarantee strictly increasing
timestamps in the warm-up phase while fetching historical data.
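
One way to guard against the timestamp problem is to filter the sample
stream before it reaches the detectors. This is only a sketch, assuming
samples arrive as (timestamp, value) pairs; it is not how the extraction
program itself works.

```python
def strictly_increasing(samples):
    """Yield (timestamp, value) pairs, dropping any whose timestamp
    is not later than the last one that was passed through."""
    last = None
    for ts, value in samples:
        if last is not None and ts <= last:
            continue
        last = ts
        yield ts, value
```

Dropping the late samples keeps the detectors simple, at the cost of losing
some warm-up data. Sorting the historical fetch first would keep everything
but needs the whole batch in memory.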

Tidied up some error messages in the ICMP test in AMP where non-echoreply
responses were being incorrectly examined for the embedded triggering
packet. It should now properly index into those packets and record the
correct error type codes.
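
For reference, a sketch of the indexing involved, assuming IPv4 and that the
outer IP header has already been removed. An ICMP error message is an 8-byte
ICMP header followed by the IP header of the triggering packet (whose length
comes from its IHL field) and then the start of the triggering packet
itself. The function name is hypothetical and this is not the AMP code.

```python
import struct

ICMP_HEADER_LEN = 8
MIN_IPV4_HEADER_LEN = 20


def parse_icmp_error(icmp):
    """Return (type, code, embedded_id, embedded_seq) for an ICMP error
    message that embeds an ICMP echo request."""
    if len(icmp) < ICMP_HEADER_LEN + MIN_IPV4_HEADER_LEN:
        raise ValueError("too short to hold an embedded IPv4 header")
    icmp_type, icmp_code = struct.unpack_from("!BB", icmp, 0)

    # The embedded IPv4 header starts straight after the ICMP header.
    # Its low nibble is the header length in 32-bit words.
    inner_ip = ICMP_HEADER_LEN
    ihl = (icmp[inner_ip] & 0x0F) * 4
    inner_icmp = inner_ip + ihl
    if len(icmp) < inner_icmp + 8:
        raise ValueError("embedded packet is truncated")

    # Embedded echo request header: type, code, checksum, id, sequence.
    _type, _code, _cksum, ident, seq = struct.unpack_from(
        "!BBHHH", icmp, inner_icmp
    )
    return icmp_type, icmp_code, ident, seq
```

Using the IHL field rather than a fixed offset of 20 bytes matters when the
triggering packet carried IP options.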

Noticed that sometimes the AMP tput test was failing to run in both
directions on some nodes and tried to investigate why. Running the tests
manually works, but scheduling them through AMP often fails to get the
return path test to run. Looks like there is some sort of timing issue
where the connection takes a long time to close and this prevents it from
being re-established in the other direction (the single threaded server is
still waiting for close() to return). Have yet to figure out an answer to
this.

Spent some time with Shane, Brad and Jamie poking at the Network
Diagnostic Tool (NDT) used by perfSONAR, M-Lab and as part of the
nzbt.org.nz broadband test. Some of the results we were getting weren't of
the quality we were expecting, so we put together our own little test lab
to see how it works. Our initial tests using a virtualised server couldn't
sustain gigabit speeds across the network bridge in one direction, despite
working fine in the other (and NDT performed less than half as well as
iperf). With two physical, directly connected machines we finally managed
to get the expected TCP performance but the extra analysis that NDT
performed was still bogus - it reports network limits that are much lower
than they actually are (and lower than what the test just observed!), RTT
values that are 500 times larger than they actually are, etc.

11 Jul 2012

Updated ampcentral to build with a newer gcc, had missed this when I
rebuilt the amplet packages last week. Sent all that off to NLNOG RING
which will hopefully solve the problems they had getting it running.

Put some of my data into the web system Shane wrote for evaluating event
detection. Through that I found a couple of events being generated that
shouldn't have been due to a previous event still being active. Spent some
time reworking the detector classes to stop multiple events from
different instances of the same detector being triggered in close
proximity, as they add no new information.
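
A rough sketch of this kind of suppression, assuming a fixed hold-off window
keyed on the detector name and the stream being watched. The class and
parameter names are hypothetical and the real detector classes may decide
"still active" quite differently.

```python
class EventSuppressor:
    """Suppress repeat events from the same detector on the same stream
    that arrive within `holdoff` seconds of the last reported one."""

    def __init__(self, holdoff):
        self.holdoff = holdoff
        self._last_reported = {}  # (detector, stream) -> timestamp

    def should_report(self, detector, stream, ts):
        key = (detector, stream)
        last = self._last_reported.get(key)
        if last is not None and ts - last < self.holdoff:
            # A recent event from this detector already covers this one.
            return False
        self._last_reported[key] = ts
        return True
```

The window is measured from the last reported event, so a long run of
suppressed events cannot hold the detector silent indefinitely.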

02 Jul 2012

Worked on some LDAP scripts to add new voodoo users into appropriate
groups. Tested them successfully on a machine configured very nearly the
same as voodoo. Will hopefully give them a run early next week and see how
well they work in the real environment.

Expanded on some of the install instructions for ampcentral in response to
some new users having issues. There was lots missing about new database
configuration that needs to be done to make sites and data available via
the web interface. Also built new Ubuntu and Debian amplet packages for
them to use. Had a few issues with the SSL API changing slightly and new
gcc being much more pedantic, but it builds and runs now.

Updated the display of event groups to show all events from other sites
that are related to help give a feel for what is going on. Found and fixed
a few issues with recording the timing of event groups when starting with
a fresh database and importing historical data - they should now start at
the time of the first event in the group and end at the time of the last
event.

26 Mar 2012

Fixed the http2 test in AMP to properly share the DNS cache between
simultaneous connections which means it no longer performs unnecessary
lookups for the same name. The sharing interface in libcurl actually works
quite well.

Tried to build new amplet packages including the recent changes, but ran
into some problems with libraries when building in my lenny buildroot.
Autoconf/make is meant to build a particular binary with an extra library
that the rest don't need, but this doesn't make it through to the Makefile
in lenny.

Jamie put together new RJ45-DB9 serial connectors for the emulation
network, so I created some sensible minicom configs for all the machines.
It should be just as easy to use now as the old system with the Cyclades
terminal server was. Also set up udev on my Linux image to force a
consistent order of the network interfaces that matches the way they are
cabled.

13 Mar 2012

Had more trouble with the emulation network than expected. Built a custom
ramdisk using proper tools (mkinitramfs etc) and a default Debian kernel
rather than using entirely custom ones. This worked fine, but after
installing an old image the machines would refuse to boot. Turns out that
the disks used to be part of a RAID and didn't have a useful MBR (and
frisbee didn't fix this). Built a master machine to create images from,
built a new Squeeze image and am now trying to convince frisbee to send
the entire image. I'm thinking a new version of frisbee may be required.

Tried to track down why the http2 test was performing DNS queries multiple
times for the same hostnames. Even with all the DNS cache sharing options
set in libcurl it will repeat requests, unless I force IPv4 only. A vague
line in a changelog looks like this might be fixed, but too recently for
the change to make it into Debian.

Finished putting together basic historical usage data for KAREN; it shows
a nice up-and-to-the-right trend.

06 Mar 2012

Added some new http2 test destinations to the main AMP test schedule.
Started running them on Massey in response to a query about web
performance and in doing so found and fixed a few display bugs. Had
another look at using the logs from the test to generate waterfall graphs
of http connections (using http://www.softwareishard.com/har/viewer/) and
found a few cases where libcurl might not be behaving as expected when
resolving addresses.

Spent some time talking with Shane and planning out how we can fit
everything together in a useful fashion for the MSI project.

Started investigating the best way to aggregate measurements from the last
few years of the KAREN weathermap to look at the growth of the network.

Watched some of the streamed presentations by Josh on OpenFlow, which
look quite interesting.

28 Feb 2012

Got the new graphs for latency and traceroute data rolled out onto the
live ampcentral site. Updated the install scripts to properly create the
new databases for API keys etc required to access the data in the new
style. Put this and the most recent amplet tarballs online on the WAND
research software site so they are now available. Should write a quick
blog post advertising them now.

Tried further to track down the cause of incorrect data sometimes being
reported in the new combined dns/icmp test. In the process fixed a couple
of small bugs in the options parsing, but the main one refuses to show up
while being observed.

Watched a couple of streamed sessions of the Future Broadband conference
that seemed interesting. A lot of businesses at the conference seem quite
cautious and risk averse with UFB, not wanting to invest in it till it has
proven itself in some way, while user advocacy groups can't wait to get
online. Could be some interesting developments here in the next while.

21 Feb 2012

Built AMP on FreeBSD for the first time in quite a while and found that
only very minor changes were required to build successfully. Getting the
self checks running in Linux and FreeBSD took a bit more effort with
things like data formats having changed without the checks being updated
to match. Updated a lot of tests, contact details, documentation etc to
useful current values.

Started setting up a staging point to get the new traceroute graphs
working in ampcentral and deployed. A few changes needed to work with
newer PHP as well as tidying up the way some of the data formats are used.

13 Feb 2012

Tidied up some more of the AMP documentation to make it more friendly to
others installing it, made some of the Debian package scripts a bit
smarter and worked through closing a few of the more simple outstanding
tickets. Fixed a whole lot more errors on the PHP side of things in
ampcentral also.

Worked with Chris to try to recreate in SQL some of the ways we use RRD in
the weathermap. This seems to be coming along - it is functional and
pretty quick but just needs tweaking to make sure that it is fast when
dealing with large amounts of data.
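
As an illustration of the general approach, the sketch below averages raw
samples into fixed-width time buckets in SQL, which is roughly what an RRD
archive with the AVERAGE consolidation function provides. The table and
column names are made up and SQLite is used only to keep the example
self-contained; the weathermap schema and database may look nothing like
this.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory table of (ts, value) samples for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (ts INTEGER NOT NULL, value REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return conn


def consolidate(conn, step):
    """Average samples into buckets `step` seconds wide.

    Integer division rounds each timestamp down to the start of its
    bucket, and the rows are then grouped on that bucket start.
    """
    return conn.execute(
        "SELECT (ts / ?) * ? AS bucket, AVG(value) "
        "FROM samples GROUP BY bucket ORDER BY bucket",
        (step, step),
    ).fetchall()
```

For example, with samples at 0, 100, 300, 400 and 650 seconds and a step of
300, the query returns one averaged row each for the buckets starting at 0,
300 and 600.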

08 Feb 2012

Tried to find out some more information as to why the Waikato and VUW AMP
monitors are unable to reach any of the KAREN monitors over IPv6. Found a
few interesting reachability issues and changes in routing, but nothing to
point to a definitive answer. While doing so I found what appeared to be a
bug in the AMP matrix display that was showing long term average latency
for some links that were down. In actual fact they do have brief periods
of connectivity that result in an average being available.

Spent some time cleaning up PHP errors in the AMP webpages that look to be
the result of newer PHP being more strict. Nothing too dangerous but they
really make a mess of the log files.

Had a couple of corrupted results from the new dns/icmp test that I can't
recreate. The code isn't too long or complex and valgrind generally has no
problems with it, but I can't cause the problem to occur when I have any
sort of monitoring in place which makes it hard to track down.

31 Jan 2012

Short week this week as Wednesday to Friday was spent in Christchurch at
NZNOG. Quite a few interesting talks this year and got to catch up with
some people I had been corresponding with which was really good.

Spent the first part of the week working on the server side of the new AMP
test, reporting and saving data appropriately.

24 Jan 2012

Spent some time working on adding a latency test to AMP that will perform
DNS lookups and then test to the (possibly multiple, possibly changing)
addresses resolved. This can hopefully be used to give a bit of insight
into some of the google services. In doing this I found and fixed a few
more small bugs in AMP that were showing up with newer compilers etc on
the emulation network.
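
The shape of such a test can be sketched as follows. The name is resolved on
every run so that a changing set of addresses is picked up, and each
resolved address is then probed in turn. The function names are hypothetical
and a TCP connect time is swapped in for the probe, since the real test is
written in C inside AMP and would not need to work this way.

```python
import socket
import time


def resolve_addresses(name, resolver=socket.getaddrinfo):
    """Return the unique addresses a name resolves to, in resolver order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in resolver(name, None):
        addr = sockaddr[0]
        if addr not in seen:
            seen.append(addr)
    return seen


def tcp_connect_time(addr, port=80, timeout=2.0):
    """Seconds taken to complete a TCP handshake, or None on failure.
    A stand-in for a real latency probe."""
    start = time.monotonic()
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def measure(name, probe=tcp_connect_time, resolver=socket.getaddrinfo):
    """Resolve `name` afresh, then probe every address it returned."""
    return {addr: probe(addr) for addr in resolve_addresses(name, resolver)}
```

Keeping the resolver and the probe as parameters makes it easy to exercise
the logic without touching the network.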

The Cyclades terminal server in the emulation network won't be getting
its firmware upgraded - it now refuses to boot at all. Bringing it into
the lab and having a closer look at it doesn't really show any options on
how to improve the situation.

Put together a bit of pydoc documentation for the new emulation network
set up stuff. It's not complete but it covers most of the user facing
functionality and is very easy to use.

Spent some time working with Chris and William to get scripts in place and
working, testing various bits and trying to provide some helpful
direction.

16 Jan 2012

Recent software upgrades have meant that the imaging and configuration
program for the emulation network (that was based around ns-2 years
ago) needed to be recompiled to work with new libraries etc. Decided a
better solution was to finish the python version I started writing last
year to do the job. It's fairly simple now to install and configure
machines again, and hopefully won't be too hard to add in support for
virtualised machines as well.

In doing so I found some issues with the Cyclades terminal server used
in the emulation network - it kernel panics after a short amount of
uptime, sometimes while booting, and sometimes fails self memory checks
while booting. There aren't any obvious user serviceable parts inside (no
flash that we can replace), so going to try updating the firmware and see
if that improves the situation.

Also spent some time trying to track down unusual IPv6 results that were
showing some sites as having no connectivity to other sites that they
normally should. Both Waikato and Victoria started failing to reach test
destinations at the same time last month, which I'm trying to track down.

29 Nov 2011

Collected some new traceroute datasets for Chris containing fewer
unresponsive hops. Still quite a significant number in the middle of
the path don't reply and it seems they probably won't due to the way they
are configured. May need some heuristics to try to merge unresponsive hops
that are probably the same device.
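
One possible heuristic, sketched below, is to label each unresponsive hop by
the responsive hops on either side of the gap, the length of the gap and the
position within it. Unresponsive hops from different traces that end up with
the same label are then treated as the same device. This is only an
assumption about what might work, with paths represented as lists of
addresses and None for a hop that did not reply.

```python
def label_unresponsive(path):
    """Replace each None hop with a label built from its surroundings.

    The label is ("anon", hop_before, hop_after, gap_length, index), so
    two gaps only match if they sit between the same responsive hops
    and contain the same number of silent hops.
    """
    labelled = []
    i = 0
    while i < len(path):
        if path[i] is not None:
            labelled.append(path[i])
            i += 1
            continue
        # Find the end of this run of unresponsive hops.
        j = i
        while j < len(path) and path[j] is None:
            j += 1
        before = path[i - 1] if i > 0 else "start"
        after = path[j] if j < len(path) else "end"
        for k in range(i, j):
            labelled.append(("anon", before, after, j - i, k - i))
        i = j
    return labelled
```

Including the gap length is deliberately conservative: a single silent hop
between two routers is not merged with a pair of silent hops between the
same two routers.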

Saw an NZNOG post about web page loading times ("user experience") and
thought it might be interesting to do a bit more work with the web
download speed data we've been collecting on a couple of AMP monitors.
Wrote a script to convert the data from the http2 test into the HTTP
Archive (HAR) format so I could use their tools to generate waterfall
graphs similar to what firebug produces. This seems to be working fine
except that data from the Waikato monitor is showing sub-millisecond
connection times to websites hosted in the US. Spent some time looking
inside libcurl to see if the problem could be there, but it turns out that
it is the Fortinet on the edge of the university network acting as a web
proxy that is breaking my results.
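
The conversion itself is mostly a matter of reshaping per-object timings
into HAR entries. The sketch below assumes each fetched object is a dict
with a URL, a start time in Unix seconds, a status, a size and per-phase
timings in milliseconds. Those input field names are invented for
illustration and are not the http2 test's real output format. The output
follows the HAR 1.2 layout, where -1 marks a timing that does not apply.

```python
import json
from datetime import datetime, timezone


def to_har(objects, creator="http2-to-har-sketch"):
    """Build a HAR 1.2 document from a list of fetched-object records."""
    entries = []
    for obj in objects:
        started = datetime.fromtimestamp(obj["start"], tz=timezone.utc)
        timings = {
            "blocked": -1,
            "dns": obj["dns_ms"],
            "connect": obj["connect_ms"],
            "send": 0,
            "wait": obj["wait_ms"],
            "receive": obj["receive_ms"],
            "ssl": -1,
        }
        entries.append({
            "startedDateTime": started.isoformat(),
            # Total time is the sum of the timings that apply (not -1).
            "time": sum(v for v in timings.values() if v >= 0),
            "request": {
                "method": "GET", "url": obj["url"],
                "httpVersion": "HTTP/1.1", "cookies": [], "headers": [],
                "queryString": [], "headersSize": -1, "bodySize": -1,
            },
            "response": {
                "status": obj["status"], "statusText": "",
                "httpVersion": "HTTP/1.1", "cookies": [], "headers": [],
                "content": {"size": obj["size"], "mimeType": ""},
                "redirectURL": "", "headersSize": -1,
                "bodySize": obj["size"],
            },
            "cache": {},
            "timings": timings,
        })
    return {"log": {"version": "1.2",
                    "creator": {"name": creator, "version": "0"},
                    "entries": entries}}


def to_har_json(objects):
    """Serialise the HAR document for loading into a viewer."""
    return json.dumps(to_har(objects), indent=2)
```

The result of `to_har_json()` can be saved to a file and loaded into a HAR
viewer to draw the waterfall.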

In trying to generate a nice state machine graph to use to illustrate some
points I decided that the merged tree looked really bad - it had large
numbers of transitions between the same pairs of nodes. Generated a lot of
graphs showing the distributions of object sizes across those transitions
to confirm that they were all distinct (in general they were).

08 Nov 2011

Short week back due to travel. Most of the week was spent catching up on
emails and things that needed to be done while I was away (weathermap
updates, etc). Spent some time looking at the way scamper was packaged for
AMP machines and checking that it would work fine for an upcoming topology
collection project, as well as investigating NZ IPv6 documentation
(community best practices) to see if there were any clues that might make
it easier to perform measurement to addresses that are actually in use.