Understanding networking concepts

Category: STP

STP related content

IEEE 802.1D Spanning Tree Topology Changes

The Spanning Tree Protocol (STP) was designed to avoid Layer-2 loops in a LAN with redundant links. The potential loops are avoided in STP by forcing some switch ports to move into a blocking state in which they are not allowed to forward data frames. But the protocol must also be able to react to topology changes and converge to a new logical topology in which some of the blocked ports need to be moved to a forwarding state.

In the original IEEE 802.1D specification of the STP protocol, there is no mechanism to ensure that all the switches of the STP domain are already aware of the change in the topology and have reacted accordingly. As a consecuence, a switch running STP cannot move a port directly to the forwarding state because it may cause a temporary loop. Unused MAC table entries can also be a source of temporary loops and outages in the network during a topology change. STP rely on timers and the reduction of the MAC table aging timer to handle topology changes.

In this blog post, we will examine the events and BPDUs generated during a topology change with the help of Wireshark and a sample network in GNS3:

S1 has been configured as the root bridge using the command spanning-tree vlan 1 root primary:

S1 configured as root

S2 has been configured as the secondary root bridge using the command spanning-tree vlan 1 root secondary:

S2 configured as secondary root

S2 will become the designated switch on the network segment between S2 and S3. Therefore, S3 will place its Gi0/1 port into a blocking state to prevent a forwarding loop.

In a stable scenario, S1 will generate configuration BPDUs every 2 seconds (hello timer). These BPDUs will be received by S2 and S3 on their root ports. In the following Wireshark capture, we can see that S1 is sending configuration BPDUs to S2 every 2 seconds. Notice that S2 does not send any BPDU to S1 because a switch does not send BPDUs through its root port in a stable topology:

From the traffic capture on the link between S2 and S3, we can see that S2 is relaying the BPDUs it receives from S1 to S3. When the BPDUs are forwarded to S3, S2 changes the root path cost from 0 to 4 and uses its Bridge ID (BID) as the sender BID (the Bridge Identifier field):

When S3 receives these BPDUs, it realizes that there is a path to the root through S2, but it choose its Gi0/0 port as root port because it is the best path to reach the root. S3 does not send BPDUs to S2 because its Gi0/1 port is in a blocking state.

In this network topology, all traffic between PC2 and PC3 must go through S1. Let’s generate some traffic between PC3 and PC2 to verify how their MAC address are learned by the switches:

S3 has learned the MAC address of PC2 through its root port Gi0/0:

PC2 MAC address
S3 MAC table

S2 has learned the MAC address of PC3 through its root port Gi0/0:

PC3 MAC address
S2 MAC table

Now let’s shut down the interface Gi0/0 on S2 to simulate a link failure between S1 and S2. Before shutting down the port, let’s activate the debugging of STP events on S3:

Immediately after the port Gi0/0 on S2 is shut down, S2 starts annoucing itself as the root. Since S3 is blocking on its Gi0/1 port, S2 was only receiving BPDUs from its root port which is now administratively down. S2 realizes that all the paths to the root has been lost and starts an election process which results in declaring itself as the root bridge.

Let’s examine the events captured on S3:

During aproximately 20 seconds (MaxAge timer), S3 ignores the inferior BPDUs it receives from S2. Only after the MaxAge timer expires, S3 reacts to those inferior BPDUs and unblocks its Gi0/1 port. As soon as the port Gi0/1 enters the Listening state, S3 starts sending BPDUs to S2. S2 realizes that these BPDUs are superior and stops claiming itself as the root. S2 selects its Gi0/1 port as the root port and generates a Topology Change Notification (TCN) BPDU, which is received by S3 on its Gi0/1 port. Then, S3 forward the TCN towards the root S1. From the debug, we can see that S3 also generates a TCN when the Gi0/1 port transitions to the Forwarding state.

Let’s examine the Wireshark captures on the link between S2 and S3. After the link failure, S2 starts announcing itself as the root:

When the MaxAge timer expires, S3 starts relaying the BPDUs generated by S1 to S2:

When S2 receives the superior BPDU, it elect its Gi0/1 port as the root port and generates a TCN:

S3 acknowledges the TCN received from S2:

Examining the Wireshark capture on the link between S1 and S3, we can see the TCN relayed from S3 to S1:

As soon as S1 receives the TCN, it acknowledges the TCN and sets the Topology Change (TC) bit on the subsequent BPDUs:

S3 sends a second TCN to S1 when its Gi0/1 port transitions to Forwarding state:

S1 sets the TC bit on the configuration BPDUs for the duration of MaxAge+Forward-delay seconds:

When S2 and S3 receive a configuration BPDU with the TC bit set, they reduce their MAC address table aging time from the default value of 300 seconds to Forward-delay seconds (15 seconds by default). This way, during a topology change, switches can age out old dynamic entries that were learned using the network topology prior to the link failure. This also prevents network outages due to “blackholes“. For example, imagine that PC3 is sending traffic to PC2. In the initial topology, the traffic will follow the path PC3->S3->S1->S2->PC2. If the link between S2 and S3 fails, the data traffic will be interrupted for about 50 seconds until the STP converges to the new topology. But without the MAC aging reduction mechanism, S3 will keep sending the frames from PC3 to S1 for 300 seconds until the dynamic entry for PC2’s MAC address expires.

To conclude, let’s explain why S3 waits for MaxAge seconds before reacting to the inferior BPDUs received from S2. The reception of an inferior BPDU from S2 may indicate that S2 has lost its conection to the root or that the root is down. The key point is to understand that, even though S3 is still receiving BPDUs from S1 every 2 seconds, it is not safe to assume that S1 is still alive. In this small topology, we could probably safely conclude that S1 is alive because it is directly connected to S3, but in a larger topology with several hops between the switch and the root, we may receive some BPDUs after the root switch goes down due to propagation delays. This is the reason why STP maintain the old root information for a port during MaxAge seconds before accepting the new information.

Temporary loops in STP

When I was studying all the basic concepts related to the traditional STP protocol, I start asking myself one question: why do we need the listening and learning states? Why can’t a switch interface move from a blocking state directly to a forwarding state?

The theory says that if the LAN topology suddenly changes (for example, a link fails) and an interface that was previously blocked is moved to a forwarding state, a temporary loop may be created. The reason for these temporary loops could be the old MAC table entries that were learned using the old topology. To solve this problem, STP defines two interim states (listening and learning states).

During the listening state, the old MAC table entries are removed and during the learning state the interface starts to learn the source MAC addresses of the received frames. These two transitory states help the switches to adapt to the new topology and avoid the creation of potential temporary loops.

All of this may sound reasonable, but I wasn’t able to find a scenario where a temporary loop was created as a consequence of moving an interface directly from a blocking state to a forwarding state. Then I tried to search on the Internet and found some forum discussions about this topic and, surprisingly, I came up with a blog post from the author of the Cisco Official Cert Guide for the CCNA certification, Wendell Odom. In that post, the author admitted he wasn’t able to find a case in which the listening state is really necessary in STP to avoid temporary loops. He also quoted a fragment from a book written by Radia Perlman, the creator of STP. In that book, Radia Perlman even suggested the listening state wasn’t really necessary.

I strongly recommend reading the Wendell Odom article. But, as a summary, it seems that learning MAC addresses immediately after unblocking an interface isn’t harmful. Even though an interface could potentially learn a wrong MAC address it will not create a loop. Therefore, the listening and learning state could have been merged into a simple “preforwarding state” in the original STP definition, as suggested by Radia Perlman in her book.

© 2022 Networking Tales

Theme by Anders NorenUp ↑