Understanding networking concepts

Tag: STP

IEEE 802.1D Spanning Tree Topology Changes

The Spanning Tree Protocol (STP) was designed to avoid Layer-2 loops in a LAN with redundant links. The potential loops are avoided in STP by forcing some switch ports to move into a blocking state in which they are not allowed to forward data frames. But the protocol must also be able to react to topology changes and converge to a new logical topology in which some of the blocked ports need to be moved to a forwarding state.

In the original IEEE 802.1D specification of the STP protocol, there is no mechanism to ensure that all the switches of the STP domain are already aware of the change in the topology and have reacted accordingly. As a consecuence, a switch running STP cannot move a port directly to the forwarding state because it may cause a temporary loop. Unused MAC table entries can also be a source of temporary loops and outages in the network during a topology change. STP rely on timers and the reduction of the MAC table aging timer to handle topology changes.

In this blog post, we will examine the events and BPDUs generated during a topology change with the help of Wireshark and a sample network in GNS3:

S1 has been configured as the root bridge using the command spanning-tree vlan 1 root primary:

S1 configured as root

S2 has been configured as the secondary root bridge using the command spanning-tree vlan 1 root secondary:

S2 configured as secondary root

S2 will become the designated switch on the network segment between S2 and S3. Therefore, S3 will place its Gi0/1 port into a blocking state to prevent a forwarding loop.

In a stable scenario, S1 will generate configuration BPDUs every 2 seconds (hello timer). These BPDUs will be received by S2 and S3 on their root ports. In the following Wireshark capture, we can see that S1 is sending configuration BPDUs to S2 every 2 seconds. Notice that S2 does not send any BPDU to S1 because a switch does not send BPDUs through its root port in a stable topology:

From the traffic capture on the link between S2 and S3, we can see that S2 is relaying the BPDUs it receives from S1 to S3. When the BPDUs are forwarded to S3, S2 changes the root path cost from 0 to 4 and uses its Bridge ID (BID) as the sender BID (the Bridge Identifier field):

When S3 receives these BPDUs, it realizes that there is a path to the root through S2, but it choose its Gi0/0 port as root port because it is the best path to reach the root. S3 does not send BPDUs to S2 because its Gi0/1 port is in a blocking state.

In this network topology, all traffic between PC2 and PC3 must go through S1. Let’s generate some traffic between PC3 and PC2 to verify how their MAC address are learned by the switches:

S3 has learned the MAC address of PC2 through its root port Gi0/0:

PC2 MAC address
S3 MAC table

S2 has learned the MAC address of PC3 through its root port Gi0/0:

PC3 MAC address
S2 MAC table

Now let’s shut down the interface Gi0/0 on S2 to simulate a link failure between S1 and S2. Before shutting down the port, let’s activate the debugging of STP events on S3:

Immediately after the port Gi0/0 on S2 is shut down, S2 starts annoucing itself as the root. Since S3 is blocking on its Gi0/1 port, S2 was only receiving BPDUs from its root port which is now administratively down. S2 realizes that all the paths to the root has been lost and starts an election process which results in declaring itself as the root bridge.

Let’s examine the events captured on S3:

During aproximately 20 seconds (MaxAge timer), S3 ignores the inferior BPDUs it receives from S2. Only after the MaxAge timer expires, S3 reacts to those inferior BPDUs and unblocks its Gi0/1 port. As soon as the port Gi0/1 enters the Listening state, S3 starts sending BPDUs to S2. S2 realizes that these BPDUs are superior and stops claiming itself as the root. S2 selects its Gi0/1 port as the root port and generates a Topology Change Notification (TCN) BPDU, which is received by S3 on its Gi0/1 port. Then, S3 forward the TCN towards the root S1. From the debug, we can see that S3 also generates a TCN when the Gi0/1 port transitions to the Forwarding state.

Let’s examine the Wireshark captures on the link between S2 and S3. After the link failure, S2 starts announcing itself as the root:

When the MaxAge timer expires, S3 starts relaying the BPDUs generated by S1 to S2:

When S2 receives the superior BPDU, it elect its Gi0/1 port as the root port and generates a TCN:

S3 acknowledges the TCN received from S2:

Examining the Wireshark capture on the link between S1 and S3, we can see the TCN relayed from S3 to S1:

As soon as S1 receives the TCN, it acknowledges the TCN and sets the Topology Change (TC) bit on the subsequent BPDUs:

S3 sends a second TCN to S1 when its Gi0/1 port transitions to Forwarding state:

S1 sets the TC bit on the configuration BPDUs for the duration of MaxAge+Forward-delay seconds:

When S2 and S3 receive a configuration BPDU with the TC bit set, they reduce their MAC address table aging time from the default value of 300 seconds to Forward-delay seconds (15 seconds by default). This way, during a topology change, switches can age out old dynamic entries that were learned using the network topology prior to the link failure. This also prevents network outages due to “blackholes“. For example, imagine that PC3 is sending traffic to PC2. In the initial topology, the traffic will follow the path PC3->S3->S1->S2->PC2. If the link between S2 and S3 fails, the data traffic will be interrupted for about 50 seconds until the STP converges to the new topology. But without the MAC aging reduction mechanism, S3 will keep sending the frames from PC3 to S1 for 300 seconds until the dynamic entry for PC2’s MAC address expires.

To conclude, let’s explain why S3 waits for MaxAge seconds before reacting to the inferior BPDUs received from S2. The reception of an inferior BPDU from S2 may indicate that S2 has lost its conection to the root or that the root is down. The key point is to understand that, even though S3 is still receiving BPDUs from S1 every 2 seconds, it is not safe to assume that S1 is still alive. In this small topology, we could probably safely conclude that S1 is alive because it is directly connected to S3, but in a larger topology with several hops between the switch and the root, we may receive some BPDUs after the root switch goes down due to propagation delays. This is the reason why STP maintain the old root information for a port during MaxAge seconds before accepting the new information.

Temporary loops in STP

When I was studying all the basic concepts related to the traditional STP protocol, I start asking myself one question: why do we need the listening and learning states? Why can’t a switch interface move from a blocking state directly to a forwarding state?

The theory says that if the LAN topology suddenly changes (for example, a link fails) and an interface that was previously blocked is moved to a forwarding state, a temporary loop may be created. The reason for these temporary loops could be the old MAC table entries that were learned using the old topology. To solve this problem, STP defines two interim states (listening and learning states).

During the listening state, the old MAC table entries are removed and during the learning state the interface starts to learn the source MAC addresses of the received frames. These two transitory states help the switches to adapt to the new topology and avoid the creation of potential temporary loops.

All of this may sound reasonable, but I wasn’t able to find a scenario where a temporary loop was created as a consequence of moving an interface directly from a blocking state to a forwarding state. Then I tried to search on the Internet and found some forum discussions about this topic and, surprisingly, I came up with a blog post from the author of the Cisco Official Cert Guide for the CCNA certification, Wendell Odom. In that post, the author admitted he wasn’t able to find a case in which the listening state is really necessary in STP to avoid temporary loops. He also quoted a fragment from a book written by Radia Perlman, the creator of STP. In that book, Radia Perlman even suggested the listening state wasn’t really necessary.

I strongly recommend reading the Wendell Odom article. But, as a summary, it seems that learning MAC addresses immediately after unblocking an interface isn’t harmful. Even though an interface could potentially learn a wrong MAC address it will not create a loop. Therefore, the listening and learning state could have been merged into a simple “preforwarding state” in the original STP definition, as suggested by Radia Perlman in her book.

Ethernet frame types and BPDUs

The first time I saw the structure of a Bridge Protocol Data Unit (BPDU) in Packet Tracer, I noticed something unusual in its Ethernet encapsulation. BPDUs are the type of message used by the Spanning-Tree Protocol (STP) in switches to avoid creating loops in a LAN. I will probably talk about STP later in this blog, but now I want to focus the discussion on the structure of an Ethernet frame.

The most common Ethernet frame type used today is known as Ethernet II. If you look at the traffic captured by WireShark or the PDU details shown by Packet Tracer, you will probably see the structure of an Ethernet II frame. For example, let’s look at the output of the “Outbound PDU details” of a ping message taken from Packet Tracer:

The first section of the output is named “Ethernet II” and represents the Layer-2 Ethernet frame that encapsulates the upper layers data. Let’s review the different frame fields in order, from left to right:

  • Preamble (7 bytes): bit pattern of alternating 1s and 0s for clock synchronization between the transmitter and the receiver.
  • Start Frame Delimiter (SFD) (1 byte): bit pattern 10101011 that marks the beginning of the frame.
  • Destination MAC address (6 bytes): the destination physical Layer-2 address.
  • Source MAC address (6 bytes): the source physical Layer-2 address.
  • Type (2 bytes): specifies the upper level protocol encapsulated. In this case, 0x0800 represents IPv4.
  • Data (variable length): the data or payload from the upper layers.
  • Frame Check Sequence (FCS) (4 bytes): a 32-bit CRC value for error checking.

Now let’s look at the PDU details of a BPDU packet:

Now, the Ethernet section is called “Ethernet 802.3“. The frame fields are basically the same, except for the Type field, which it is now called LEN (length) and represents the length in bytes of the data portion of the frame. Therefore, the STP protocol messages do not use the common Ethernet II encapsulation. An Ethernet 802.3 frame with LLC 802.2 encapsulation is used instead.

Ethernet II, also known as DIX Ethernet, is the version 2 of the original Ethernet implementation developed by DEC, Intel and Xerox. In the first IEEE definition of the 802.3 Ethernet standard, the Ethertype was replaced by the data length field and the protocol type was specified in an additional header using the LLC 802.2 protocol. The LLC header consists of 3 fields:

  • Destination Service Access Point (DSAP) (1 byte): represents the destination layer-3 process. In this example, the value 0x42 represents the STP protocol
  • Source Service Access Point (SSAP) (1 byte): represents the source layer-3 process: 0x42 for STP, again.
  • Control (1 or 2 bytes): represents the type of communication (unacknowledge connectionless, connection-oriented or acknowledged connectionless).

These fields are shown in Packet Tracer under the “LLC” section:

The last section called “STP BPDU” shows the fields of the BPDU message, as defined by the STP protocol.

LLC encapsulation has a variation called SNAP extension, that defines two additional fields after the control field:

  • OUI (3 bytes): 24-bit number that uniquely identifies a vendor.
  • Protocol ID (2 bytes): specify the particular protocol defined by that vendor.

For example, Cisco proprietary PVST+ protocol is encapsulated using a value of 0x00000c for the OUI field, and a value of 0x010b for the protocol ID. If both the DSAP and SSAP fields have a value of 0xAA and the control field is set to 0x03, it means that the frame is using the SNAP extension.

802.2 LLC and 802.2 SNAP framing types were used in some old technologies like FDDI, Token Ring or AppleTalk. Since IEEE approved the use of the Ethernet II in its 802.3 standard, clearly, this frame format won the battle and it is used in almost every local area network today. However, we can still see the old LLC encapsulation in some protocols like STP.

To finish the discussion about Ethernet frame types, I tried to do an experiment in Packet Tracer. It seems that Packet Tracer always shows BPDUs using LLC without SNAP extension. I created a simple topology with two switches and forced them to used the Rapid PVST+ instead of the default PVST+, by entering the IOs command: “spanning-tree mode rapid-pvst“.

Now the BPDU section is called “RSTP 802.1w”, showing that we are using the “rapid version” of the Spanning-Tree Protocol, but the LLC encapsulation shown is the same as before.

© 2022 Networking Tales

Theme by Anders NorenUp ↑