Category: announcement

Maintenance: Upgrade to Ubuntu 22.04 LTS

On February 7 2023 we will be upgrading all Ubuntu 18.04 LTS (aka “Bionic”) RING nodes to Ubuntu 22.04 LTS (aka “Jammy”). This upgrade is necessary to ensure that all RING nodes receive timely security updates. An additional benefit is that we’ll have more modern software.

Note: Do not perform the Ubuntu upgrade yourself; the RING admins will attempt the upgrade first.

Timeline

2023-02-06 17:00 UTC: Disable RING SQA on all nodes

By disabling SQA we make sure no excessive alerts are being sent while the nodes are being upgraded.

2023-02-07 08:00 UTC: Configure upgrade playbook on all nodes

Each node will be automatically upgraded to Ubuntu 20.04, rebooted, upgraded to Ubuntu 22.04 and rebooted again. We expect all nodes to complete this process around 11:00 UTC.

2023-02-07 13:00 UTC: Reach out to owners of nodes which didn’t survive

We expect that a number of nodes (for whatever reason) will not successfully upgrade to 22.04. We will notify you if the upgrade was not successful. In such cases we’ll simply ask that you re-provision the node with Ubuntu 22.04 LTS and not bother trying to repair the upgrade.

2023-02-08: Re-enable RING SQA, celebrate

By then most of the NLNOG RING should be back to normal, until next time in 2027 :-)

If you have any questions or concerns, let us know!

10 years of NLNOG RING

This month marks the tenth anniversary of the NLNOG RING project. In this article we look back on how the project came to be and how it evolved over this past decade.

A network engineer’s tale…

The story of NLNOG RING starts on the #nlnog IRC channel, at the end of 2010. A network engineer received complaints that his customers had difficulty reaching various destinations in several Dutch networks. The case was a curious one, because the problem would come and go. Some TCP sessions would establish immediately, whereas others would take multiple attempts before a connection was made. It was clear something was broken, but locating the root cause proved to be difficult.

To find the source of the problem, the engineer proceeded to ask engineers from other networks for traceroute outputs, gathering data about how packets would travel from their networks to his. The other engineers were of course happy to help, but because each question had to be answered individually it took a long time to gather the necessary data. All in all it took several days to get a complete picture and identify a root cause, which turned out to be a faulty backbone link within the fabric of a large Dutch internet exchange point.

Manual coordination of network troubleshooting

During the surrounding discussion on the IRC channel, seeing the amount of effort it took to collect the required information from different vantage points, the question came up: “What if we had a way for an engineer to access other networks securely, and collect troubleshooting data, without having to wait for the other side?” Several people immediately offered to dedicate servers or virtual machines to the project, and a few others started building tooling for software installation and user management. And so, in January 2011, NLNOG RING was born.

Architecture and tools

The NLNOG RING is a “looking glass on steroids”. Participants join the project by making a (virtual) server available, hosted inside their own network. In return they gain access to their own shell account on all the machines provided by all other participating networks.

Right from the start we were conscious of the fact that we would have to manage a potentially large number of systems with a small group of volunteers. To do this in a time-efficient manner we deployed Puppet on all provided systems. This allowed us to install software tools and configure users in a centralized manner. To further limit the scope of work we decided to support only a single operating system: Ubuntu LTS. For security we did not want to rely on passwords. All user access is controlled through SSH keys and there is no superuser access for any of the participants.

The basis of the NLNOG RING is a shell account, which offers a lot of freedom to participants to run their own troubleshooting scripts or programs. To add extra value to this, each machine is provisioned with a collection of commonly used network troubleshooting tools. We provide a DNS-interface and a RESTful API for retrieving participant and node information, and we have a regular BGP looking glass providing insight into many networks.

NLNOG RING is a community project. Over the years, many people have contributed tools and code to make the project more useful. One of the first tools was ring-trace, a piece of software to run traceroutes from different vantage points, and display them in a graphical format.

Example of ring-trace output

Another user-contributed tool is ring-sqa, a piece of software that attempts to automatically detect connectivity problems between NLNOG RING nodes and notifies their owners. Events are also correlated to detect larger, sometimes Internet-wide outages, which are published on a dashboard.

Since 2013 we also cooperate with RIPE Atlas, to combine the strengths of the two platforms. NLNOG RING nodes are selectable as measurement targets in the RIPE Atlas interface. Furthermore, the RIPE Atlas tools package is installed on all NLNOG RING nodes, so participants can integrate RIPE Atlas measurements in scripts run on the RING.

Operating model and sponsoring

The NLNOG RING was started by a couple of network engineers in their free time, and is still completely run by a small number of volunteers. All participating networks provide their own machines. In most cases (75%) this is a VM, making the barrier to participate very low.

At the start of the project all management tooling was running on infrastructure from InTouch, the employer of one of our founding volunteers. As the project grew the requirement for some dedicated management infrastructure arose. In 2013 we successfully held a fundraiser, which enabled us to obtain the necessary hardware for hosting our management tooling. Over the years more sponsors donated resources. These generous donations help us to run the project on essentially zero budget.

Growth

Because the project originated within the Dutch ISP community, the first participants were all Dutch network operators. After we gave our first public presentation at RIPE62 in May 2011, ISPs from outside the Netherlands also showed interest in participating. While the Netherlands is still the country with the most active participants (93 nodes as of January 2021), the majority of participants are based elsewhere. At the time of writing we have 472 participating autonomous systems, with (virtual) machines in 56 countries.

Map of NLNOG RING nodes (January 2021)

Supporting all these machines significantly increased the load on our central Puppet server, to the point where in 2016 configuring a single machine would take more than 30 minutes. In addition, we were facing the planned obsolescence of Puppet 2, which meant we would have to rewrite a significant part of our configuration in a syntax supported by Puppet 3. Altogether, this was a good opportunity to re-evaluate our architecture.

After evaluating several configuration management systems we decided on Ansible, mostly because of its support for a masterless “pull” model. In this model all servers download their latest configs from a source code repository, and apply their changes locally. This removes the need for a centralized management server, which means we can scale to a virtually unlimited number of machines. All configuration files are published on our GitHub repository, so that all participants can contribute.

To further cope with the increased growth in participants and machines we automated health monitoring, to automatically notify participants of problems with the (virtual) machines they provided to us.

What’s next?

In ten years the NLNOG RING has grown from a handful of machines in the Netherlands to over 500 nodes worldwide, and we continue to see the number of active nodes grow. To scale the platform further we plan to invest some time in building more self service tooling for provisioning of machines. Another item high on our wishlist is a graphing solution that displays latency and packet loss on the full mesh of network paths between all nodes. We will also continue to add features and tools requested by participants.

We of course hope to continue to see a diverse set of ISPs join the project. The success of the project largely depends on the networks that provide us with resources. We thank all current participants for making the NLNOG RING a huge success! Tell your friends to join too!

NLNOG RING has reached 500 active nodes!

We are pleased to announce that this week we have reached the milestone of 500 active ring nodes!

The NLNOG RING started in December 2010 as a debugging tool for Dutch network operators to troubleshoot connectivity problems between their networks. Over the last nine years we’ve seen steady growth and international expansion. We now have a presence in 441 autonomous systems, spread over 55 countries on 6 continents.

We thank all participants and sponsors for making this project a massive success!

Ubuntu 18.04 (Bionic Beaver) now supported on NLNOG RING

We are pleased to inform you that we now support Ubuntu 18.04 (Bionic Beaver) for new ring node installs.

Unlike the migration to Ubuntu 16.04 we will not be performing a ‘big bang’ upgrade of all existing ring nodes to release 18.04. Instead, we will be asking each node owner to upgrade their own hosts. We will contact all node owners with instructions in a couple of weeks.

We will be supporting Ubuntu 16.04 for some time to come. To simplify our provisioning system we will most likely end our support before the EOL date set by Canonical. Further updates on our plans will follow once a timeline has been determined.

If you have any questions or concerns, please let us know.

Upgrading the entire RING to Ubuntu 16.04 LTS

Next week we’re going to upgrade all Ubuntu 12.04 LTS (aka “Precise”) hosts to Ubuntu 16.04 LTS (aka “Xenial”). This upgrade is necessary to ensure that all RING nodes receive timely security updates. An additional benefit is that we’ll have more modern software.

Do not upgrade Ubuntu yourself; the RING admins will first attempt an upgrade. If possible, a RAM upgrade to at least 1GB would be appreciated.

Timeline:

Sun 09-04-2017T23:59Z - disable RING SQA on all nodes

Since the upgrades require reboots & daemon restarts, and we can’t exactly pace and predict when what happens, we don’t want to risk sending people false RING SQA alerts due to clusters of nodes being upgraded. The solution is that we will temporarily disable RING SQA.

Mon 10-04-2017 - launch ansible upgrade script on all nodes

Martin Pels authored a really cool ansible playbook which facilitates the upgrade path (which actually is 12.04 -> 14.04 -> 16.04).

We’ll disable all auxiliary services (scamper, ringfpingd, munin-node), run the playbook against all nodes, reboot them, and hope for the best!

Wed 11-04-2017 - reach out to owners of nodes which didn’t survive

We expect that a number of nodes (for whatever reason) will not successfully upgrade to 16.04. Our management system will notify you if the upgrade was not successful. In such cases we’ll simply ask that you re-provision the node with Ubuntu 16.04 LTS and not bother trying to repair the upgrade.

Thu 12-04-2017 - enable RING SQA on all nodes, celebrate

Thursday most of the NLNOG RING should be back to normal, until next time in 2022 :-)

If you have any questions or concerns, let us know!

New monitoring tool: RING SQA

A new partial outage detector dubbed RING SQA is available to all RING participants. The purpose of the tool is to detect, as fast as possible, outages that affect only a subset of all internet destinations.

RING SQA pings all other nodes (v4 + v6) every 30 seconds to derive a baseline, and compares this baseline to the last 3 minutes of measurements. If the last three consecutive minutes all exceed the median of the baseline by a set margin, an alarm is raised.
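
As a rough illustration of that comparison, here is a minimal Python sketch. It is not the actual RING SQA code. It assumes the per-minute buffer of failed measurements described in the example alert later in this post (27 baseline slots followed by the 3 most recent minutes), and it reads "1.2 above the median" as a multiplicative factor, which is an assumption. The function and parameter names are made up for illustration.

```python
from statistics import median


def partial_outage(failed_per_minute, tolerance=1.2, recent=3):
    """Return True if each of the `recent` newest slots exceeds the
    median of the older (baseline) slots multiplied by `tolerance`.

    `tolerance` as a multiplier is an assumption, not confirmed by the post.
    """
    if len(failed_per_minute) <= recent:
        return False  # not enough history to form a baseline
    baseline = failed_per_minute[:-recent]
    threshold = median(baseline) * tolerance
    return all(count > threshold for count in failed_per_minute[-recent:])


# Failure counts per minute, oldest first, copied from the example alert
# in this post (29 min ago ... 0 min ago).
example_buffer = [41, 41, 41, 42] + [41] * 22 + [45, 66, 65, 65]
print(partial_outage(example_buffer))  # True
```

Requiring all three recent slots to exceed the threshold means a single noisy minute does not trigger an alert.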

When an alarm is raised, three MTRs are immediately launched towards destinations that were previously reachable but suddenly are not. The purpose of these traces is to provide an investigation starting point for your NOC.

All in all super fast outage detection. All participants are invited to use this system! Gratis! :-)

You can configure where alerts are sent by changing the /etc/ring-sqa/alarm.conf file on your own RING node(s) to something like this (do keep the indentation in mind!):

job@ringnode01.ring.nlnog.net:~$ sudo cat /etc/ring-sqa/alarm.conf
---
email:
  to: noc@yourcompany.com
  from: sqa-alert@your_ring_node.ring.nlnog.net
  prefix: 'RING ALERT '
irc:
  host: 1.2.3.4
  port: 5502
  password: derp
  channel: ! '#noc'

Afterwards restart the ring-sqa daemons to load the new config:

job@ringnode01:~$ sudo restart ring-sqa4
ring-sqa4 start/running, process 18715
job@ringnode01:~$ sudo restart ring-sqa6
ring-sqa6 start/running, process 18727

Et voilà! After 30 minutes the machine will stand guard over your network. RING participants with multiple hubs or datacenters will benefit from spinning up more nodes, as monitoring is from each RING node’s individual perspective.

I’d like to extend a HUGE thank you to Saky Ytti who wrote RING SQA just last weekend. Please send him beer, chocolate and flowers.

Below I’ve included an example outage alert.

------------------ Example RING SQA Message ---------------------------

From: sqa@xing02.ring.nlnog.net
To: noc@
Subject: RING ALERT raising ipv4 alarm - 16 new nodes down
Body:

Regarding: xing02.ring.nlnog.net ipv4

This is an automated alert from the distributed partial outage
monitoring system "RING SQA".

At 2014-07-27 10:18:05 UTC the following measurements were analysed
as indicating that there is a high probability your NLNOG RING node
cannot reach the entire internet. Possible causes could be an outage
in your upstream's or peer's network.

The following nodes previously were reachable, but became unreachable
over the course of the last 3 minutes:

- itps01.ring.nlnog.net            128.65.97.93 AS42010 GB
- fullsave01.ring.nlnog.net       141.0.202.201 AS39405 FR
- globalaxs01.ring.nlnog.net       176.10.80.10 AS 9009 GB
- kwaoo01.ring.nlnog.net         178.250.209.33 AS24904 CH
- suretec01.ring.nlnog.net          185.8.92.17 AS199659 GB
- swisscom01.ring.nlnog.net      193.247.170.254 AS 3303 CH
- claranet01.ring.nlnog.net         195.157.9.4 AS 8426 GB
- claranet04.ring.nlnog.net        195.22.19.34 AS 8426 PT
- dcsone01.ring.nlnog.net         203.123.48.14 AS37989 SG
- trueinternet01.ring.nlnog.net  203.144.167.57 AS 7470 TH
- jump01.ring.nlnog.net          212.13.217.117 AS 8943 GB
- lchost01.ring.nlnog.net        213.230.217.125 AS25098 GB
- suomi01.ring.nlnog.net         217.119.42.194 AS16302 FI
- melbourne01.ring.nlnog.net     37.128.187.253 AS39451 GB
- netability01.ring.nlnog.net       46.182.9.20 AS 1197 IE
- viatel02.ring.nlnog.net          46.183.108.2 AS31122 FR
- claranet06.ring.nlnog.net          92.54.7.29 AS 8426 ES

As a debug starting point 3 traceroutes were launched right after
detecting the event, they might assist in pinpointing what broke:

trueinternet01.ring.nlnog.net  AS 7470 (TH)
mtr -i0.5 -c5 -r -w -n 203.144.167.57
  1.|-- 109.233.156.241        0.0%     6    0.5   0.5   0.5   0.6   0.0
  2.|-- 109.233.156.1          0.0%     5    0.8   0.9   0.8   1.1   0.1
  3.|-- 109.233.156.2          0.0%     5    0.8   0.8   0.8   0.9   0.0
  4.|-- 64.209.88.33           0.0%     5    0.9   1.0   0.9   1.5   0.3
  5.|-- 159.63.23.198         60.0%     5  265.1 264.9 264.7 265.1   0.3
  6.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
  9.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
 10.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
 11.|-- 203.144.144.30        80.0%     5  297.4 297.4 297.4 297.4   0.0
 12.|-- ???                   100.0     4    0.0   0.0   0.0   0.0   0.0

fullsave01.ring.nlnog.net      AS39405 (FR)
mtr -i0.5 -c5 -r -w -n 141.0.202.201
  1.|-- 109.233.156.241        0.0%     6    0.5   0.5   0.5   0.5   0.0
  2.|-- 109.233.156.1          0.0%     5    0.8   3.2   0.8  12.2   5.0
  3.|-- 109.233.156.2          0.0%     5    0.8   0.9   0.8   1.0   0.1
  4.|-- 109.233.156.37         0.0%     5    1.0   1.0   0.9   1.5   0.3
  5.|-- 149.11.106.1           0.0%     5    1.1   1.4   1.1   1.7   0.2
  6.|-- 130.117.3.137          0.0%     5    1.5   1.7   1.5   1.8   0.2
  7.|-- 154.54.62.77           0.0%     5   11.4  11.7  11.3  13.0   0.7
  8.|-- 154.54.75.154          0.0%     5  201.0 166.9  66.9 323.0 101.5
  9.|-- 154.54.56.214          0.0%     5   23.0  23.0  22.8  23.0   0.1
 10.|-- 149.11.58.62          80.0%     5   26.4  26.4  26.4  26.4   0.0
 11.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
 12.|-- 141.0.202.201         80.0%     5   25.0  25.0  25.0  25.0   0.0

globalaxs01.ring.nlnog.net     AS 9009 (GB)
mtr -i0.5 -c5 -r -w -n 176.10.80.10
  1.|-- 109.233.156.241        0.0%     6    0.4   0.5   0.4   0.5   0.0
  2.|-- 109.233.156.1          0.0%     5    0.9   1.8   0.7   5.3   1.9
  3.|-- 81.201.115.41          0.0%     5    0.9   0.9   0.8   1.0   0.1
  4.|-- 62.209.32.18          40.0%     5    1.3   1.2   1.2   1.3   0.1
  5.|-- 80.81.192.165          0.0%     5    1.3   9.3   1.2  41.5  18.0
  6.|-- 193.27.64.245         60.0%     5  191.9 108.1  24.3 191.9 118.5
  7.|-- 193.27.64.66          80.0%     5   43.6  43.6  43.6  43.6   0.0
  8.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
  9.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
 10.|-- 176.10.80.2           80.0%     5   26.1  26.1  26.1  26.1   0.0
 11.|-- ???                   100.0     5    0.0   0.0   0.0   0.0   0.0
 12.|-- ???                   100.0     4    0.0   0.0   0.0   0.0   0.0
 13.|-- ???                   100.0     3    0.0   0.0   0.0   0.0   0.0
 14.|-- ???                   100.0     2    0.0   0.0   0.0   0.0   0.0
 15.|-- 176.10.80.10           0.0%     1   24.3  24.3  24.3  24.3   0.0

An alarm is raised under the following conditions: every 30 seconds
your node pings all other nodes. The amount of nodes that cannot be
reached is stored in a circular buffer, with each element representing
a minute of measurements. In the event that the last three minutes are
1.2 above the median of the previous 27 measurement slots, a partial
outage is assumed. The ring buffer's output is as following:

29 min ago  41 measurements failed (baseline)
28 min ago  41 measurements failed (baseline)
27 min ago  41 measurements failed (baseline)
26 min ago  42 measurements failed (baseline)
25 min ago  41 measurements failed (baseline)
24 min ago  41 measurements failed (baseline)
23 min ago  41 measurements failed (baseline)
22 min ago  41 measurements failed (baseline)
21 min ago  41 measurements failed (baseline)
20 min ago  41 measurements failed (baseline)
19 min ago  41 measurements failed (baseline)
18 min ago  41 measurements failed (baseline)
17 min ago  41 measurements failed (baseline)
16 min ago  41 measurements failed (baseline)
15 min ago  41 measurements failed (baseline)
14 min ago  41 measurements failed (baseline)
13 min ago  41 measurements failed (baseline)
12 min ago  41 measurements failed (baseline)
11 min ago  41 measurements failed (baseline)
10 min ago  41 measurements failed (baseline)
 9 min ago  41 measurements failed (baseline)
 8 min ago  41 measurements failed (baseline)
 7 min ago  41 measurements failed (baseline)
 6 min ago  41 measurements failed (baseline)
 5 min ago  41 measurements failed (baseline)
 4 min ago  41 measurements failed (baseline)
 3 min ago  45 measurements failed (baseline)
 2 min ago  66 measurements failed (raised alarm)
 1 min ago  65 measurements failed (raised alarm)
 0 min ago  65 measurements failed (raised alarm)


Kind regards,

NLNOG RING

ntp.ring.nlnog.net time service now available

The NLNOG RING is happy to announce the launch of a novel service, available for the general public.

A stratum-1, GPS synced, high-quality time server is available over IPv4 and IPv6 at:

ntp.ring.nlnog.net

  • Location: InterXion5, Amsterdam, Netherlands
  • ACL: restrict -{4,6} default kod notrap nomodify nopeer noquery
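
For readers who want to try the service from a script, the following is a minimal SNTP client sketch in Python using only the standard library. It sends a single client-mode request and reads the stratum and transmit timestamp from the reply. The function names and the timeout are illustrative choices, and the sketch skips the validation a real client should do (for example checking for kiss-o'-death replies, which the `kod` flag in the ACL above allows the server to send).

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """48-byte SNTP request: LI=0, version 4, mode 3 (client)."""
    return bytes([0x23]) + bytes(47)


def parse_response(packet):
    """Return (stratum, unix_time) from a raw NTP reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    stratum = packet[1]
    # Transmit timestamp: 32-bit seconds and 32-bit fraction at offset 40.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query(host="ntp.ring.nlnog.net", timeout=2.0):
    """Send one request over UDP port 123 and parse the reply."""
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, 123, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), addr)
        packet, _ = sock.recvfrom(512)
    return parse_response(packet)
```

Calling `query()` needs network access to UDP port 123. A stratum-1 server should report a stratum of 1 in its reply; whether the server still answers today is not something this sketch can guarantee.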

The service was made possible through generous contributions from the following companies:

State of the RING 2012

A yearly newsletter about the best debugging project ever

Edition 2, 28 Dec 2012

Contents

  1. Introduction: growth
  2. New services
  3. Into the future!
  4. Testimonials & conference exposure
  5. Closing notes

1. Introduction: growth

Today, 28th of December 2012, the RING turns 2 years old! The project’s origins can be traced back to the following line on IRC (translated):

[28 Dec 2010/10:26 CET - #NLNOG]
"hey, maybe we should create a #nlnog test platform"

And now here we are, 2 years later: 188 machines provided by 165 organisations in 39 countries. Compared to 2011 we have more than doubled in size!

From our usage statistics we gather that 65% of organisations actively make use of the RING through SSH.

Other statistics gathered from our code repositories:

  • 1111 commits were made (doubled since last year!)
  • Most code commits happened, again, on Tuesdays
  • Lines of code: 20267 insertions(+), 5774 deletions(-)

We could not have sustained this level of growth without the continuing support of our infrastructure sponsors and the 2012 fundraiser contributors:

  • XS4ALL, Amazon Web Services, Leaseweb, Atrato IP Networks,
  • Gossamer Threads, BIT, PCextreme, SoftLayer, Snijders IT,
  • Solido Hosting, Duocast, A2B Internet, Nedzone, Tetaneutral,
  • LCHost, Previder, Triple IT and eBay Classifieds Group.

A full overview of all supporters can be found here: https://ring.nlnog.net/patrons/

2. New services

AMP (Active Measurement Project):

One of the biggest changes this year was moving from a distributed master/slave smokeping setup to something much better: AMP. The AMP software performs measurements from every RING node to every other RING node, and reports the results to central collectors which in turn feed a web interface.

AMP as a tool offers us insight into end-to-end MTU, jitter, latency, packet loss and historic traceroutes, both for IPv4 and IPv6, between all RING nodes.

URL: http://amp.ring.nlnog.net/

BGP Looking glass:

Due to popular request we set up a BGP looking glass, which currently receives full IPv4 and IPv6 tables from 35+ participants. The LG uses the BIRD BGP daemon with a web interface written by one of the RING participants!

We are currently exploring whether we can use the collected data for a monitoring and alerting system to help participants gain insight into prefix visibility and, for instance, hijack events. Stay tuned!

URL: http://lg.ring.nlnog.net/

IRR/RPSL services:

Although the IRR system and RPSL have been around for a long time, there still is a lot of room for improvement in terms of performance, ease of use and standardised methods and tools.

We believe that the RING community can make a difference in the popularity of proper filtering. One of the first things we offer (in beta) is a web service to expand RPSL objects such as AS-SETs, AUTNUMs and RS-SETs and expose the data in JSON.

URL: http://irr.ring.nlnog.net/
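
To make "expanding" an AS-SET concrete: an AS-SET can list AS numbers and other AS-SETs, so expansion means following those references recursively until only AS numbers remain. The Python sketch below shows the idea on a hard-coded dictionary. It does not call the web service above, whose request format is not described here; the object names and AS numbers (taken from the range reserved for documentation) are invented for illustration.

```python
import json
import re

AS_NUMBER = re.compile(r"AS\d+")


def expand_as_set(name, objects, _seen=None):
    """Recursively resolve an AS-SET to the set of AS numbers it contains.

    `objects` maps a set name to its list of members (hypothetical data).
    Sets already visited are skipped, so circular references terminate.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        return set()
    seen.add(name)
    asns = set()
    for member in objects.get(name, []):
        if AS_NUMBER.fullmatch(member):
            asns.add(member)
        else:
            asns |= expand_as_set(member, objects, seen)
    return asns


# Hypothetical objects; note the deliberate loop back to AS-EXAMPLE.
objects = {
    "AS-EXAMPLE": ["AS64496", "AS-EXAMPLE-CUSTOMERS"],
    "AS-EXAMPLE-CUSTOMERS": ["AS64497", "AS64498", "AS-EXAMPLE"],
}
print(json.dumps(sorted(expand_as_set("AS-EXAMPLE", objects))))
# ["AS64496", "AS64497", "AS64498"]
```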

3. Into the future!

In 2013 we will continue to automate as many aspects of the RING as possible. More importantly, the RING has to become the best Swiss army knife a network engineer can imagine, so we will focus on usability, more advanced tools and security.

We are making a lot of progress towards publishing all kinds of RING-related information in an easily accessible database. We imagine this will accelerate the development of a new generation of tools!

We also started talks with other debugging projects such as RIPE Atlas to explore if cooperation and exchange of information can further such projects.

As the RING is a community effort, we can only become more valuable to our members by help from the community. We need you for new creative ideas, high quality code, and of course more RING nodes. If you can help out in our efforts, don’t hesitate to contact us!

4. Testimonials & Exposure

Two debugging cases have been documented, where the RING proved to be of vital importance when debugging an issue at hand.

Route leak:

A root-cause analysis based on historic data collected by the RING.

URL: https://ring.nlnog.net/news/root-cause-analysis-using-amp/

The IPv4 address that ended with .255:

How the variety in RING nodes helped locate an ancient, dysfunctional ACL.

URL: https://ring.nlnog.net/news/ring-success-the-ipv4-255-problem/

Conference exposure:

Various RING participants have spoken at Internet oriented conferences around the world. The following meetings made a presentation slot available to the RING: eduPERT, LINX79, MENOG11, NANOG56 and RIPE65.

All of these presentations have helped the RING grow, as in the days after such a presentation we saw a spike in RING participant applications.

If you want to present about the RING at a meeting in your region or local operator community, please contact us. We have great slides available for this purpose!

5. Closing notes

We conclude this newsletter by saying to you, the participants, THANK YOU! Without the continued support from lots of participants the RING would not be where it is today. We are proud to be playing a small role in making the Internet an easier thing to debug and research.

Again, thank you!

Kind regards,

Job Snijders, Martin Pels, Peter van Dijk, Edwin Hermans, ringthing

RING success - the IPv4 .255 problem

This is an example one of the RING participants (BelWü) wanted to share with all RING participants.

Some days ago a customer of ours encountered problems connecting from another ISP (via DSL) to the VPN concentrator in his head-office network, which is connected through us.

Unfortunately the other ISP does not participate within the RING. But nevertheless I did a RING-traceroute to the VPN concentrator to check if there are (other) reachability problems. The customer was reachable from all other sites but not from keenondots01. So I looked deeper into that, tried some traceroutes etc. and noticed this in /etc/hosts:

91.218.150.255 keenondots01 keenondots01.ring.nlnog.net

Mh, a .255 IPv4 address… Until that time I did not know the source IP address on the remote (DSL) site, so I asked the customer, and yes, it was a .255 address. So the problem was that some old firewall statements denied all IPv4 addresses with 255 in the last octet. And this was easy to fix.
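
Why such a filter is wrong can be shown in a few lines of Python: an address ending in .255 is only a broadcast address when it is the last address of its prefix. The /23 prefix below is an assumption chosen for illustration (the actual prefix length of this network is not stated here), and `is_usable_host` is a hypothetical helper.

```python
import ipaddress


def is_usable_host(address, prefix):
    """True if `address` lies in `prefix` and is neither its network
    address nor its broadcast address (ignores /31 and /32 special cases)."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(prefix)
    return addr in net and addr not in (net.network_address, net.broadcast_address)


# In an assumed /23, .150.255 sits in the middle of the range: a normal host.
print(is_usable_host("91.218.150.255", "91.218.150.0/23"))  # True
# Only in a /24 would the same address be the broadcast address.
print(is_usable_host("91.218.150.255", "91.218.150.0/24"))  # False
```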

I think it’s great to have that variation of IP addresses, AS numbers, upstreams, firewalls (or at least basic access-lists) etc. within the RING. So even if another ISP does not participate in the RING, there may be a RING participant with a similar setup for its RING node, which will give you the hint you need to find the solution to your problem.

Routing issue root cause analysis using AMP

This is an example one of the RING participants (eBay Classifieds Group) wanted to share with all RING participants.

Several weeks ago we (eBay Classifieds Group) encountered an issue with some customers coming from Denmark (more precisely, TDC customers) having trouble reaching websites in the eBay Classifieds Group network. These issues showed up as slow websites and packet loss to our network. This lasted for some time, but it was not escalated to me in time, so by the time it was, the issue was already gone.

Now I haven’t been following the connected member list for the NLNOG/RING project, but Job Snijders pointed out to me that TDC does have a RING node!

That weekend, while we had some drinks, Job showed me his cool new tool on the RING; it was even better than what I had pitched to him some moons ago. Not just latency monitoring: the NLNOG/RING project also keeps track of the number of hops and keeps archives of traceroutes. And it presents it all in a very nice interface.

Debugging:

First, I looked up the actual issue at hand.

AMP Latency Measurement: A

From 13:40 there is increased jitter and packet loss visible! So, let’s check out that cool graph that displays the history of the number of hops.

AMP Path Analysis: B

And we see an increase in the number of hops. Now let’s take a look at the actual ‘traceroute’ history.

AMP Traceroute History: C

Take a quick look at 13:45 on August 1st: hey… why has the traceroute from ECG towards TDC changed to go over the AMS-IX platform instead of the usual Level3 path? We see the real cause at 14:00: hop number 3 has become Novatel in Bulgaria.

Now, a quick search in my mailbox reveals that Novatel recently connected to AMS-IX; this is also listed on the AMS-IX “connected” page.

AMS-IX

Conclusion:

It seems Novatel went live at AMS-IX the day before. My idea is that they accidentally leaked their NTT routes via the AMS-IX routeservers, and congested their NTT link by doing so. This might have been prevented if the AMS-IX routeservers had done strict RPSL checking.

If you have any questions regarding the use of the tool, or questions about this article, don’t hesitate to contact me: Maarten Moerman / mmoerman@ebay.com