WAND Network Research Group, University of Waikato

Investigating Sleeper and Dark Traffic at a New Zealand ISP

Motivation

This study investigates the traffic destined for hosts that had been previously active but had not sent any packets for some time (so are presumably now inactive) to see whether ISP customers were receiving, and therefore being charged for, significant quantities of traffic when they were not actively using their Internet connection.

We also look at the applications that were responsible for any such traffic, as this may provide some indication as to why customers were receiving the traffic even though they were no longer actively using their connection. Finally, we also examine traffic sent to IP addresses that were never active so that we can distinguish between regular unsolicited background traffic and the traffic patterns that result from previous customer activity.

We use the word "Dark" to refer to IP addresses within the ISP address space that were never active, i.e. never transmitted any packets. "Dark" is also used to refer to traffic destined for hosts that were dark.

We use the word "Sleeper" to refer to IP addresses that were previously active, but had not transmitted a packet for some time. We assume that these hosts were no longer active. A threshold of 30 minutes of inactivity was used to determine whether a host had become a sleeper. A host ceased to be a sleeper as soon as it transmitted a packet.

A flow is defined as a sleeper flow if the customer it is destined for is in the sleeper state for the entire duration of the flow. Thus, only flows consisting entirely of inbound traffic can be sleeper flows.
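The definitions above can be sketched in code. This is only an illustration of the stated rules, not the tooling used for the study: the function name, the input format (a sorted list of outbound packet timestamps per local address) and the "not yet active" label for flows seen before an address first transmits are all assumptions made for the example.

```python
from bisect import bisect_right

SLEEPER_THRESHOLD_S = 30 * 60  # 30 minutes of outbound silence, as in the study


def classify_flow(flow_start, flow_end, outbound_times):
    """Label an inbound-only flow as 'dark', 'sleeper' or 'active'.

    outbound_times: sorted timestamps (seconds) of every packet the local
    address transmitted during the whole trace (hypothetical input format).
    """
    if not outbound_times:
        return "dark"  # the address never transmitted a packet

    # Index of the first outbound packet strictly after the flow start.
    idx = bisect_right(outbound_times, flow_start)
    if idx == 0:
        # The address has not been active yet. The study does not name
        # this case, so it is reported separately here (an assumption).
        return "not yet active"

    last_sent = outbound_times[idx - 1]
    asleep_at_start = flow_start - last_sent >= SLEEPER_THRESHOLD_S
    # Any outbound packet before the flow ends wakes the host up.
    woke_during_flow = idx < len(outbound_times) and outbound_times[idx] <= flow_end
    if asleep_at_start and not woke_during_flow:
        return "sleeper"
    return "active"
```

For example, a flow that starts 2000 seconds after the host's only outbound packet is a sleeper flow, while the same flow is not a sleeper flow if the host transmits again before the flow ends.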

The Datasets

We have examined trace sets captured from a New Zealand ISP. The traces were filtered to contain residential DSL traffic only and were captured in January 2009, 2010 and 2011.

Basic Stats

Firstly, we examined the amount of traffic contributed by dark and sleeper flows. The results are given below - note that an IP is only classified as a sleeper IP *if* it is the recipient of a sleeper flow.

                             Dark (2009)  Sleeper (2009)  Dark (2010)  Sleeper (2010)  Dark (2011)  Sleeper (2011)
Inbound Flows                     6.22 %          8.93 %       6.06 %         11.11 %      13.27 %          9.77 %
Inbound Bytes                     0.03 %          0.04 %       0.02 %          0.04 %       0.03 %          0.03 %
Inbound Packets                   0.20 %          0.32 %       0.25 %          0.41 %       0.35 %          0.31 %
Observed Local IP Addresses      44.62 %         54.77 %      34.67 %         65.14 %      38.31 %         60.79 %

These results show that, while there are a lot of flows targeted at dark or inactive space, only a very small proportion of the data entering the ISP network is destined for unresponsive addresses. Unsurprisingly, almost all active IP addresses receive traffic while they are inactive. Broadly speaking, these results do not suggest that sleeper hosts receive significantly more unsolicited traffic than a typical dark IP address.

Applications

Using libprotoident, we have attempted to identify the application involved with each of the dark and sleeper flows. Any flows where there is no payload transmitted (i.e. no data after the TCP or UDP header) cannot be identified by libprotoident and are classified as "No Payload".

The following graphs show the top 10 application protocols (as well as Other, in appropriate instances) based on flow, byte, packet and local IP count.

[Graphs: top 10 application protocols by IPs, flows, bytes and packets, one row of graphs each for Dark (2009), Sleeper (2009), Dark (2010), Sleeper (2010), Dark (2011) and Sleeper (2011)]
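The "top 10 plus Other" grouping used for these graphs can be sketched as follows. This is a minimal illustration, assuming per-protocol totals are already available as a mapping. The function name and input format are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter


def top_n_with_other(counts, n=10):
    """Return the n largest (label, count) pairs, plus an 'Other' bucket.

    counts: mapping of protocol name to a total (flows, bytes, packets
    or local IP addresses).
    """
    ranked = Counter(counts).most_common()  # sorted by count, largest first
    top = ranked[:n]
    remainder = sum(value for _, value in ranked[n:])
    if remainder:
        # Only add 'Other' when something was actually left out.
        top.append(("Other", remainder))
    return top
```

With `n=2`, the input `{"No Payload": 5, "BitTorrent UDP": 3, "Skype": 1}` yields the two largest protocols followed by `("Other", 1)`.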

A few notable results from these graphs:

  • No Payload flows account for the majority of the dark traffic and are also a big contributor of sleeper traffic.
  • BitTorrent UDP is much more prominent in sleeper traffic than dark traffic.
  • The same is true of Skype, to a lesser extent.
  • Windows Messenger spam and SIP UDP were big in 2009 but had vanished by 2010.
  • There appears to have been some sort of DNS scan in 2011.

Ports used by 'No Payload' Flows

As No Payload flows are a significant contributor of both dark and sleeper traffic, we decided to examine these in further detail to see what ports are being probed by these flows.

As with the earlier graphs, we have plotted the ten most popular port numbers used by flows that were classified as No Payload.

[Graphs: top 10 ports used by No Payload flows by IPs, flows, bytes and packets, one row of graphs each for Dark (2009), Sleeper (2009), Dark (2010), Sleeper (2010), Dark (2011) and Sleeper (2011)]

The simplest conclusion to draw from this is that there is still a lot of scanning of old favourite TCP ports, such as 135, 23 and 445.

The Customers

In this section, we have examined individual sleeper IPs to see if there are any instances where a particular user might be the recipient of a significant quantity of sleeper traffic. In the tables below, we list the top 5 sleeper hosts in each dataset, ordered by the number of sleeper bytes received. We also report the protocol that contributes the most sleeper bytes for that host, so that it might be possible to determine the cause of the higher quantities of sleeper traffic.

2009

Rank  Sleeper Flows  Sleeper Packets  Sleeper MBs  Primary Protocol  % of Bytes matching Primary Protocol
1            48,101          379,789         47.9  PPStream          98.758
2           161,420          233,269         27.3  BitTorrent UDP    80.360
3           163,841          184,117         25.4  BitTorrent UDP    93.957
4           105,020          175,706         19.5  BitTorrent UDP    45.413
5             1,019           18,552         19.0  Unknown UDP       99.301

2010

Rank  Sleeper Flows  Sleeper Packets  Sleeper MBs  Primary Protocol  % of Bytes matching Primary Protocol
1           311,762          342,557         48.5  BitTorrent UDP    97.833
2           153,474          161,804         23.9  BitTorrent UDP    99.438
3           100,658          295,796         20.8  No Payload        99.723
4           120,007          128,086         18.3  BitTorrent UDP    98.093
5           110,490          119,953         17.3  BitTorrent UDP    97.156

2011

Rank  Sleeper Flows  Sleeper Packets  Sleeper MBs  Primary Protocol  % of Bytes matching Primary Protocol
1           313,566          761,430         60.4  BitTorrent UDP    54.223
2             1,118          131,589         25.2  Unknown UDP       99.412
3           148,722          186,875         20.1  BitTorrent UDP    96.204
4           110,935          148,891         19.0  BitTorrent UDP    87.262
5           115,419          129,652         18.8  BitTorrent UDP    97.785

One user in each dataset received approximately 50 MB of data while their IP address was inactive. As each dataset covers approximately a week, this would amount to over 200 MB each month (assuming a constant rate of sleeper traffic). This amount is not insignificant, but it would probably only concern users that were operating under a very small data cap. For such users, these results suggest that disabling their P2P clients would be a prudent course of action.
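The monthly figure quoted above is a simple linear extrapolation, sketched below. The 30-day month and the function name are assumptions made for illustration. The constant-rate assumption is the one stated in the text.

```python
DAYS_PER_WEEK = 7


def monthly_estimate_mb(weekly_mb, days_in_month=30):
    """Scale one week of sleeper traffic to a month at a constant rate."""
    return weekly_mb * days_in_month / DAYS_PER_WEEK


# Roughly 50 MB per week works out to a little over 200 MB per month.
print(round(monthly_estimate_mb(50), 1))
```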

Having said that, the PPStream case in the 2009 dataset is not quite as simple. In our analysis, there was no evidence of PPStream traffic to that IP address in the nine-hour period prior to the appearance of the sleeper PPStream traffic. The host was inactive for eight hours before the first PPStream flow was observed. All of the PPStream sleeper traffic then occurred in the space of 20 minutes before ceasing and not being observed again. This suggests that PPStream sleeper traffic may be both very bursty and unpredictable, although it only affected a single user in our dataset.

The "Unknown UDP" instance in the 2011 data appears to be the result of a data transfer using Cisco IPSec. Libprotoident was not confident enough to classify the contributing flows as Cisco IPSec, but both endpoints are using UDP port 10000 (which is the Cisco IPSec port).

By contrast, sleeper hosts that observed large quantities of UDP BitTorrent traffic typically saw the first UDP flow within 2 minutes of becoming inactive and the flows were spread throughout the period of inactivity. The length of time that the address was inactive was the major contributing factor to the amount of sleeper traffic observed.

The full distribution of the amount of sleeper traffic observed for each sleeper IP address is shown below. Note the log scale on the X-axis. The distribution is long-tailed, meaning that the top 5 sleeper hosts discussed earlier are the exception rather than the norm. By comparison, 90% of sleeper hosts received less than 1 MB of sleeper traffic over the course of a week.
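A figure such as "90% of sleeper hosts received less than 1 MB" is one point on the empirical distribution of per-host totals. A minimal sketch of that calculation follows. The function name and the choice of 1 MB as 1,000,000 bytes are assumptions, as the text does not say which definition of MB was used.

```python
ONE_MB = 1_000_000  # assumed decimal megabyte; the study may have used 2**20


def fraction_below(per_host_bytes, limit=ONE_MB):
    """Fraction of sleeper hosts whose total sleeper bytes are under limit."""
    totals = list(per_host_bytes)
    if not totals:
        raise ValueError("no hosts supplied")
    return sum(1 for b in totals if b < limit) / len(totals)
```

Evaluating this at a range of limits, spaced evenly on a log scale, would give the points of a cumulative distribution curve.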

Sleeper Traffic Degradation

[Graphs: sleeper traffic degradation for the 2009, 2010 and 2011 datasets]

Another interesting property of sleeper traffic is the degradation rate, i.e. the decline in sleeper traffic as the host remains idle for longer periods of time. In theory, we expect that most sleeper flows would occur within the first few minutes of an address becoming inactive and the rate of sleeper traffic would rapidly degrade as external hosts realise the address is no longer responding. Different applications will likely have different degradation rates.

The above graphs show the amount of traffic observed for flows beginning X minutes after the time that the local address was last active, plotted as rolling means. The "All" line shows the degradation rate across all observed sleeper flows while the other lines show the rate for individual prominent protocols.
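The binning and smoothing described above can be sketched as follows. This is an illustration only: the input tuple format, the function names and the window size are assumptions, since the text does not state the window used for the rolling means.

```python
from collections import defaultdict


def bytes_by_idle_minute(flows):
    """Sum flow bytes by minutes elapsed since the host was last active.

    flows: iterable of (flow_start_s, last_active_s, nbytes) tuples
    (hypothetical format, timestamps in seconds).
    """
    totals = defaultdict(int)
    for flow_start, last_active, nbytes in flows:
        minute = int((flow_start - last_active) // 60)
        totals[minute] += nbytes
    return dict(totals)


def rolling_mean(values, window):
    """Trailing rolling mean; early points average over what is available."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out
```

To plot one line per protocol, the same two steps would be applied separately to the flows of each protocol and once to all flows together.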

The 2009 dataset contains some interesting spikes. The "Other" spike at approximately 470 minutes is due to the flood of PPStream traffic discussed earlier. The "Unknown" spike at 180 minutes is the result of a single large flow, which we have been unable to identify.

In all three datasets, it appears the BitTorrent UDP traffic degrades faster than scan / probe traffic, i.e. the No Payload and Windows Messenger lines. This could be due to the hosts slowly being removed from DHTs as they remain idle.

Conclusions

Overall, these results show that genuine (i.e. non-scan) sleeper traffic does exist in ISP networks, but they also show that it is not present in any significant quantity. Much of the traffic that was sent to inactive hosts was not obviously distinguishable from regular dark traffic, aside from a certain amount of BitTorrent UDP traffic. Most inactive IP addresses received less than 1 MB of traffic over the course of a single week, which suggests that the average user is unlikely to notice it.

For the few users who do receive a significant quantity of sleeper traffic (where significant is more than 20 MB per week), the cause can be easily related back to the use of P2P software.