Understanding networking concepts

Category: CCNA

CCNA-related content

IEEE 802.1D Spanning Tree Topology Changes

The Spanning Tree Protocol (STP) was designed to avoid Layer-2 loops in a LAN with redundant links. The potential loops are avoided in STP by forcing some switch ports to move into a blocking state in which they are not allowed to forward data frames. But the protocol must also be able to react to topology changes and converge to a new logical topology in which some of the blocked ports need to be moved to a forwarding state.

In the original IEEE 802.1D specification of the STP protocol, there is no mechanism to ensure that all the switches of the STP domain are already aware of the change in the topology and have reacted accordingly. As a consecuence, a switch running STP cannot move a port directly to the forwarding state because it may cause a temporary loop. Unused MAC table entries can also be a source of temporary loops and outages in the network during a topology change. STP rely on timers and the reduction of the MAC table aging timer to handle topology changes.

In this blog post, we will examine the events and BPDUs generated during a topology change with the help of Wireshark and a sample network in GNS3:

S1 has been configured as the root bridge using the command spanning-tree vlan 1 root primary:

S1 configured as root

S2 has been configured as the secondary root bridge using the command spanning-tree vlan 1 root secondary:

S2 configured as secondary root

S2 will become the designated switch on the network segment between S2 and S3. Therefore, S3 will place its Gi0/1 port into a blocking state to prevent a forwarding loop.

In a stable scenario, S1 will generate configuration BPDUs every 2 seconds (hello timer). These BPDUs will be received by S2 and S3 on their root ports. In the following Wireshark capture, we can see that S1 is sending configuration BPDUs to S2 every 2 seconds. Notice that S2 does not send any BPDU to S1 because a switch does not send BPDUs through its root port in a stable topology:

From the traffic capture on the link between S2 and S3, we can see that S2 is relaying the BPDUs it receives from S1 to S3. When the BPDUs are forwarded to S3, S2 changes the root path cost from 0 to 4 and uses its Bridge ID (BID) as the sender BID (the Bridge Identifier field):

When S3 receives these BPDUs, it realizes that there is a path to the root through S2, but it choose its Gi0/0 port as root port because it is the best path to reach the root. S3 does not send BPDUs to S2 because its Gi0/1 port is in a blocking state.

In this network topology, all traffic between PC2 and PC3 must go through S1. Let’s generate some traffic between PC3 and PC2 to verify how their MAC address are learned by the switches:

S3 has learned the MAC address of PC2 through its root port Gi0/0:

PC2 MAC address
S3 MAC table

S2 has learned the MAC address of PC3 through its root port Gi0/0:

PC3 MAC address
S2 MAC table

Now let’s shut down the interface Gi0/0 on S2 to simulate a link failure between S1 and S2. Before shutting down the port, let’s activate the debugging of STP events on S3:

Immediately after the port Gi0/0 on S2 is shut down, S2 starts annoucing itself as the root. Since S3 is blocking on its Gi0/1 port, S2 was only receiving BPDUs from its root port which is now administratively down. S2 realizes that all the paths to the root has been lost and starts an election process which results in declaring itself as the root bridge.

Let’s examine the events captured on S3:

During aproximately 20 seconds (MaxAge timer), S3 ignores the inferior BPDUs it receives from S2. Only after the MaxAge timer expires, S3 reacts to those inferior BPDUs and unblocks its Gi0/1 port. As soon as the port Gi0/1 enters the Listening state, S3 starts sending BPDUs to S2. S2 realizes that these BPDUs are superior and stops claiming itself as the root. S2 selects its Gi0/1 port as the root port and generates a Topology Change Notification (TCN) BPDU, which is received by S3 on its Gi0/1 port. Then, S3 forward the TCN towards the root S1. From the debug, we can see that S3 also generates a TCN when the Gi0/1 port transitions to the Forwarding state.

Let’s examine the Wireshark captures on the link between S2 and S3. After the link failure, S2 starts announcing itself as the root:

When the MaxAge timer expires, S3 starts relaying the BPDUs generated by S1 to S2:

When S2 receives the superior BPDU, it elect its Gi0/1 port as the root port and generates a TCN:

S3 acknowledges the TCN received from S2:

Examining the Wireshark capture on the link between S1 and S3, we can see the TCN relayed from S3 to S1:

As soon as S1 receives the TCN, it acknowledges the TCN and sets the Topology Change (TC) bit on the subsequent BPDUs:

S3 sends a second TCN to S1 when its Gi0/1 port transitions to Forwarding state:

S1 sets the TC bit on the configuration BPDUs for the duration of MaxAge+Forward-delay seconds:

When S2 and S3 receive a configuration BPDU with the TC bit set, they reduce their MAC address table aging time from the default value of 300 seconds to Forward-delay seconds (15 seconds by default). This way, during a topology change, switches can age out old dynamic entries that were learned using the network topology prior to the link failure. This also prevents network outages due to “blackholes“. For example, imagine that PC3 is sending traffic to PC2. In the initial topology, the traffic will follow the path PC3->S3->S1->S2->PC2. If the link between S2 and S3 fails, the data traffic will be interrupted for about 50 seconds until the STP converges to the new topology. But without the MAC aging reduction mechanism, S3 will keep sending the frames from PC3 to S1 for 300 seconds until the dynamic entry for PC2’s MAC address expires.

To conclude, let’s explain why S3 waits for MaxAge seconds before reacting to the inferior BPDUs received from S2. The reception of an inferior BPDU from S2 may indicate that S2 has lost its conection to the root or that the root is down. The key point is to understand that, even though S3 is still receiving BPDUs from S1 every 2 seconds, it is not safe to assume that S1 is still alive. In this small topology, we could probably safely conclude that S1 is alive because it is directly connected to S3, but in a larger topology with several hops between the switch and the root, we may receive some BPDUs after the root switch goes down due to propagation delays. This is the reason why STP maintain the old root information for a port during MaxAge seconds before accepting the new information.

OSPFv3: building the topology

In the previous blog post, we reviewed the process of building a simple single-area OSPFv2 topology using the output from the “show ip ospf database” command. In this post, we will repeat the same process but using OPSFv3 and IPv6.

We will be using the same network topology, but the router interfaces have been configured with IPv6 addresses:

Let’s start the topology reconstruction process by examining the Router-LSA generated by R1:

The previous output shows some differences between the format of the Router-LSA in OSPFv3 and its counterpart in OSPFv2:

  • The option field is longer (24 bits)
  • The Link State ID is no longer equal to the RID of the advertising router
  • All addressing information has been removed from the router-LSA
  • Link state information about interfaces connected to stub networks has been removed from the router-LSA

In OSPFv3, a router may generate one or more Router-LSAs. Remember that the tuple LS type, Link State ID, advertising router RID uniquely identify an LSA in the Link State Database (LSDB). If more than one router-LSA is generated, they can be distinguised by generating a different Link State ID for each of them.

In OSPFv2, both Router and Network-LSAs store some addressing information. Specifically, in OSPFv2 Router-LSA we may found the following addressing information:

  • For point-to-point links: local interface IPv4 address
  • For links connected to transit networks: DR interface IPv4 address and local interface IPv4 address
  • For links connected to stub networks: local interface IPv4 address and network mask

In OPSFv2, the network mask for transit networks is stored in the Network-LSA generated by the Designated Router (DR).

By contrast, OSPFv3 removes all addressing information originally stored in Router and Network-LSAs and moves it to other types of LSAs, as discussed later in this post. OSPFv3 router and network LSAs only contain topological information necessary to build the Shortest Path First (SPF) tree. It means that Router and Network-LSAs are only flooded when a topological change occurs and prefix changes will not trigger any SPF recalculation, which is an improvement over OSPFv2.

The Router-LSA generated by R1 describes two pont-to-point links:

The fields used to describe the attached interfaces are:

  • Metric
  • Local Interface ID
  • Neighbor Interface ID
  • Neighbor Router ID

Metric represents the outbound cost assigned to the interface. Local Interface ID is a number that uniquely identifies the local router interface. Neighbor Interface ID corresponds to the interface ID of the neighboring router and Neighbor Router ID identify the node attached at the other end of the link.

The two links described in the R1 Router-LSA can be translated into the following graph:

The neighbor RID can be used by the local router as a key to search in the LSDB for the next Router-LSA and place another piece of the puzzle. Let’s examine both R2 and R3 Router-LSAs:

R3 router-LSA
R2 router-LSA

These new Router-LSAs contain a link description that points back to R1. This link corresponds to the serial link that connects R1 to R2 and R3. These LSAs contain another interface type: “Link connected to a Transit Network”. The transit network corresponds to the Ethernet segment that connects routers R2 and R3 through the switch SW1. Like OSPFv2, OSPFv3 represents broadcast and Non-broadcast Multiaccess Networks (NBMA) by adding a pseudonode to the SPF topology. This pseudonode is described by a Network-LSA created by the DR, which is R3 in this topology. The Link State ID of a Network-LSA corresponds to the interface ID of the DR. This field in combination with the RID of the DR can be used as a key to search in the LSDB for the Network-LSA associated with the Ethernet segment:

If we compare the previous output with the structure of a Network-LSA in OSPFv2, we may notice two main differences. First, the Link State ID in field in OSPFv2 is equal to the IP address of the DR. In OSPFv3, this value is replaced by the interface ID of the DR, as mentioned before. In addition, in OSPFv3 the network mask field is missing. Remember that Router and Network-LSAs do not store any address information.

We can find information about prefixes and IPv6 addresses in the last two LSAs that need to be analyzed in order to complete the starting network topology: Link and Intra-Area-Prefix-LSAs. But before moving on to the next LSAs, let’s add another piece to the puzzle using the information stored on the Network-LSA generated by R3:

Basically, the Network-LSA describes the list of routers connected to the same network segment (R2 and R3). The relationship between R2 and R3 can be described by the Network-LSA pseudonode, which is represented by the cloud in the previous graph. The cost to reach this pseudonode can be obtaind from the Router-LSAs.

At this point, you may notice that Router and Network-LSAs provide all the topological information needed to build the SPF tree from any router to any other router. However, in addition to the topological information, routers need information about prefixes and neighbor link-local addresses in order to build their routing tables. In OSPFv2, the next-hop IP address is determined from the Router-LSA received from the neighboring routers. But Router-LSAs have an area flooding scope. Therefore, if an interface IP changes, a new Router-LSA is re-flooded throughout the entire area. But such change is only relevant to neighboring routers connected to the same link where the interface is attached. Another improvement of OSPFv3 over OSPFv2 is the addition of a link-local flooding scope. In OSPFv3, neighbor-specific information is placed in another LSA type: Link-LSAs. A router originates a separate Link-LSA for each attached link connected to one or more neighbors. A Link-LSA provides the following information:

  • The router’s link-local address
  • The list of IPv6 prefixes associated with the interface
  • A collection of options bits (beyond the scope of this discussion)

Let’s examine the Link-LSA generated by R1 for its interface connected to the stub network where PC1 is attached:

From the previous output we can derive the following information:

  • Link State ID: 4 (interface ID of R1 interface)
  • Router Priority: 1
  • Link Local Address: FE80::1
  • Prefix associated with R1 interface: 2001:1::/64

The link-local scope of Link-LSAs can be confirmed by looking at the Link-LSAs generated by R2 from the point of view of R1:

The previous output shows that R1 receives only one Link-LSA from R2, the one associated with the point-to-point link between R1 and R2.

Finally, let’s examine the last LSA we will be discussing in this post: Intra-Area-Prefix LSA. An Intra-Area-Prefix LSA provides a list of IPv6 prefixes which are associated with either a Router-LSA or a Network-LSA. A router may generate multiple Intra-Area-Prefix LSAs in order to keep the size of the LSA small. The different LSAs are distinguished by their Link State ID.

Let’s examine the Intra-Area-Prefix LSA generated by R1:

The previous LSA is linked to the R1’s Router-LSA by the fields: Referenced LSA Type, Referenced Link State ID and Referenced Advertising Router. A total of three IPv6 address prefixes are described, which correspond to the two point-to-point links and the stub network.

If we examine the Intra-Area-Prefix LSAs generated by R3, we get the following output:

R3 generates two Intra-Area-Prefix LSAs: one for the point-to-point link to R1 and another for the transit network connecting R2 and R3. The Intra-Area-Prefix LSA associated with the transit network is linked to the Network-LSA generated by R3.

As a final note, let’s examine the output of the Intra-Area-Prefix LSA generated by R2:

The previous LSA only describe the IPv6 prefix associated with the point-to-point link connecting R1 to R2. R2’s Intra-Area-Prefix LSA omits the IPv6 prefix associated with the transit network because it is already described in the Intra-Area-Prefix-LSA generated by the DR.

Combining all the information from the different Link and Intra-Area-Prefix LSAs we can build the final topology:

OSPFv2: building the topology

This blog post describes the process of building a simple network topology using the information derived from the “show ip ospf database” command. This is an interesting exercise to understand how to read the information contained in the different LSA types and how they are linked together like a puzzle.

We will begin the process using a sample topology that consists of three routers configured using single area OSPFv2. Therefore, we will review only Type-1 and Type-2 LSAs. In a later blog post, I will repeat the same exercise but using IPv6 and OSPFv3.

The goal of this exercise is to reconstruct the same topology using only the information from the Link State Database (LSDB). Let’s start by examining the output of the “show ip ospf database router self-originate” command in router R1:

The previous command shows the detailed information about the Router LSA (Type-1 LSA) originated from R1. From the previous output, I want to highlight the following information:

  • Link State ID: 1.1.1.1
  • Advertising Router: 1.1.1.1
  • Number of links: 5

The Link State ID identifies a node in the OSPF graph. In a Type-1 LSA, the Link State ID value is the Router ID (RID) of the advertising router. The Link State ID is the key used to search in the LSDB in order to link the different LSAs together. The final OSPF graph is used by the routers to calculate the shortest path using the Shortest Path First (SPF) algorithm (Dijkstra algorithm).

The output shows a total number of 5 links in R1 instead of 3. This is due to the way that point-to-point links are described in OSPFv2. When addressing information is assigned to the interfaces in a point-to-point link, it is modeled as if it were a stub network attached to the router. This extra node is used in SPF calculations only when the traffic goes to the network assigned to the point-to-point link (for example, if the destination is one of the serial interfaces). For simplicity, we will ignore those two extra stub networks associated with the serial links on R1 and consider only the point-to-point links and the real stub network where PC1 is attached.

The first connection described in the output of the previous command corresponds to the point-to-point connection between R1 and R3:

It can be read as node 1.1.1.1 is connected to node 3.3.3.3 through a link that has an associated IP of 10.0.13.1 and a cost of 64. The Link ID field points to the Link State ID of the Router LSA generated by R3. This relationship between R1 and R3 can be represented as follows:

The second connection corresponds to the additional stub network associated with the addressing information assigned to the previous serial link between R1 and R3, so we will ignore it when building the topology.

The third connection corresponds to the point-to-point connection between R1 and R2:

This information allows us to add another node to the previous graph:

The next connection is the other extra stub network associated with the serial link between R1 and R2, so we will skip it. The final connection corresponds to the real stub network associated with the network segment where PC1 is attached:

In this case, the Link State ID corresponds to the subnet ID (192.168.10.0) and the Link Data field carry the subnet mask information (255.255.255.0). The cost to reach this network is also specify. With that information, we can add another piece to the graph:

At this point, we have two choices to continue the reconstruction of the topology. The R1’s Router LSA is connected to the R2 and R3’s Router LSAs through the Link ID fields. For example, the serial link between R1 and R2 has a Link ID of 2.2.2.2. This value can be used by R1 as a key to look for the R2’s Router LSA in the LSDB:

The first connection described in the R2’s Router LSA is the serial connection to R1:

The Link ID 1.1.1.1 points to the R1’s Router LSA, which allows us to place the two pieces of the puzzle together. Notices that the Link Data field corresponds to the IP address of R2’s serial interface. This IP can be used by R1 as the next-hop address when computing routes through R2. The interface cost value of 64 is used by R2 when computing routes using its serial interface as the outgoing interface.

Before examining the third connection described in the R2’s Router LSA, let’s have a look at the output associated with R3’s Router LSA:

In this LSA, the first connection corresponds to the serial connection between R1 and R3. At this point, we can add the costs and IPs associated with the point-to-point links connecting R1 to R2 and R3:

In order to complete the final topology, let’s analyze the link to the transit network described in the Router LSAs of R2 and R3. The transit network corresponds to the Ethernet segment that connects routers R2 and R3 through the switch SW1. Ethernet networks are modeled in OSPF by adding a pseudonode to the topology. In multiaccess networks like Ethernet, more than two routers can be attached to the same network segment. In order to simplify the complexity and reduce the size of the LSDB, instead of creating a single connection between every possible pair of routers in the LSDB, an extra node is created (the pseudonode) and it is used as a vertex in the graph to which all the other routers connect.

The pseudonode is described by the Type-2 LSA created by the Designated Router (DR), which is R3 in this topology:

The Link State ID of the Type-2 LSA corresponds to the IP address of the DR (192.168.20.2). The network mask is also specified in the LSA, which can be used to calculate the network address using the IP of the DR. Finally, the Type-2 LSA stores a list of RIDs associated with all the routers connected to the transit network. This RIDs can be used as keys to search in the LSDB for the corresponding Router LSAs, in order to find all the possible paths in the graph.

We can conclude the process of building the topology by adding the information provided by the Type-2 LSA, resulting in the following diagram:

In the next post, I will repeat the same process but using IPv6 and OPSFv3 to discuss the differences.

The Magic Number

Mastering subnetting is a fundamental skill in network engineering. Finding the subnet ID given an IP address and a non-trivial subnet mask can be a challenge at first, but with practice, it can become a second nature making possible to solve subnetting problems in just a few seconds.

One of the main study resources I used for the CCNA was the Cisco Official Cert Guide Book (OCG). In this book, the author describe a process to find the subnet ID when using a difficult mask. A difficult mask is a subnet mask that has one octet that is neither 0 nor 255. Since, by definition, a subnet mask cannot interleave 0s and 1s, the possible values for that octet are: 128, 192, 224, 240, 248, 252 and 254. The OCG book calls that octet the interesting octet.

The process of finding the resident subnet ID of an IP address described in the book can be summarized as follows. For every octet in the subnet mask:

  1. If the mask octet is 255, just copy the corresponding IP octet in the same position.
  2. If the mask octet is 0, put a 0 in the same position.
  3. If the mask octet is an interesting octet:
    • Calculate the magic number as 256 – V, where V is the decimal value of the interesting octet.
    • Calculate the subnet ID octet as the largest multiple of the magic number that is less than the correspoding IP octet.

As an example, let’s find the resident subnet ID of the IP address 10.0.50.76/29. In this case, the subnet mask in dotted decimal notation (DDN) is 255.255.255.248. Following the previous procedure, since the first three octets are 255, to find the resulting subnet ID we just need to copy the first three octets of the IP address: 10.0.50.X. To find the last octet ‘X’, we have to follow the step 3 and calculate the magic number as 256 – 248 = 8. The value ‘X’ will be the largest multiple of 8 that is less than 76, which is 72. Therefore, the subnet ID will be 10.0.50.72/29.

But, at this point, you may ask yourself: why does it work? In order to understand this process, let’s have a look at the following figure:

The figure shows four different subnetting schemes. The first one corresponds to a /25 mask, which divides the original network in two subnets. The second one corresponds to a /26, which divides the network in four subnets. The next one divides the network in eight subnets (/27) and the last one in sixteen subnets (/28). The figure highlights one important fact: in all subnetting schemes, the last subnet octet always corresponds to the interesting octet described in the OCG book. This is because the interesting octet, by definition, will have a binary value of all 0’s in the host portion of the octet and all 1’s in the subnet portion. Another important fact we can notice from the figure is that the size of the resulting subnets can be calculated by subtracting the interesting octet value from 256, which corresponds to the magic number concept used in the OCG book.

Once we know the size (magic number) of the resulting divisions of the original network range, finding the multiple of this number closest to the original IP octet value without going over gives us the desired resident subnet ID.

Layer 3 Switches

Trying to explain how a layer 3 switch works is, in my opinion, a very useful exercise when you are starting to learn basic switching and routing concepts. It is an opportunity to review both the layer 2 switching logic and the layer 3 routing logic.

Let’s review the operation of a L3 switch using an example based on the following simple topology:

If the switch receives a frame from PC1 destined to PC2, since both PCs reside on the same network segment (VLAN 10), the switch will behave as a L2 switch while forwarding the receiving frame. It will look for PC2 MAC address in its MAC address table and the frame will be forwarded out the port connected to PC2 (assuming the PC2 address is found in the MAC table).

In this first forwarding scenario, if the destination MAC address belongs to another host in the same VLAN where the receiving port resides, the corresponding layer 2 forwarding process is triggered depending on the type of incoming frame (known unicast frame, unknown unicast frame, broadcast or multicasts).

Now let’s see what happens when PC1 sends a frame to a device in another subnet, for example PC4. In this case, a layer 3 device is needed to communicate hosts in different VLANs. PC1 can’t send the frame directly to PC4 because the destination IP resides in another network. Therefore, PC1 will send the frame to its default gateway, which corresponds to the layer 3 switch in the topology. When the L3 switch receives the frame, it realizes the destination MAC address corresponds to its own MAC address in that VLAN and the layer 3 forwarding process is triggered. Since the frame is destined to the switch itself, the switch acts like an end device in this scenario and cannot behave like a L2 switch. The switch takes the role of a router and performs the following steps to process the incoming frame:

  1. The IP packet is de-encapsulated from the received data-link frame.
  2. The destination IP address is compared to the L3 switch routing table to find a route that matches the destination address.
  3. If a route is found, the packet is encapsulated in a new data-link frame and the frame is transmitted through the outgoing interface specified in the matched routing table entry. If no route is found, the packet is discarded.
  4. In order to send the new frame, the switch must find the destination MAC address using its local ARP cache or through an ARP request.

If the outgoing interface identified at step 3 is a routed interface, the new frame is forwarded out the physical routed interface. If the outgoing interface is a Switch Virtual Interface (SVI), the forwarding process is a bit more complex.

An SVI is a logical interface inside the switch that is bound to a VLAN and allows the switch to send IP packets to that VLAN like another IP device. You can think of an SVI like a physical Netword Card Interface (NIC) of a PC, but the SVI is virtual (which means it is build by software). The SVI allows the switch to send and receive IP packets through the physical ports associated with the corresponding VLAN.

If the outgoing interface identified by the routing table is an SVI, out which physical port is the frame forwarded? The answer to this question depends on the destination MAC address:

  • If the destination MAC address is a broadcast or multicast address, the frame is forwarded out all physical ports assigned to the outgoing VLAN.
  • If the destination MAC address is a unicast address (learned through the ARP process or find in the ARP table), the switch must look for the destination MAC in its MAC address table, like a regular L2 switch. The matched MAC table entry will identify the outgoing physical port that will be used to forward the frame.

During my studies for my CCNA exam preparation, I used to review all these steps and concepts involved in the operation of a L3 switch to make sure I really understand all the fundamentals of routing and switching: MAC address tables, ARP tables, routing tables, layer 2 forwarding logic, SVIs, layer 3 routing logic, etc. In future blog posts, I’ll make a deeper review of some of those specific switching and routing topics.

Temporary loops in STP

When I was studying all the basic concepts related to the traditional STP protocol, I start asking myself one question: why do we need the listening and learning states? Why can’t a switch interface move from a blocking state directly to a forwarding state?

The theory says that if the LAN topology suddenly changes (for example, a link fails) and an interface that was previously blocked is moved to a forwarding state, a temporary loop may be created. The reason for these temporary loops could be the old MAC table entries that were learned using the old topology. To solve this problem, STP defines two interim states (listening and learning states).

During the listening state, the old MAC table entries are removed and during the learning state the interface starts to learn the source MAC addresses of the received frames. These two transitory states help the switches to adapt to the new topology and avoid the creation of potential temporary loops.

All of this may sound reasonable, but I wasn’t able to find a scenario where a temporary loop was created as a consequence of moving an interface directly from a blocking state to a forwarding state. Then I tried to search on the Internet and found some forum discussions about this topic and, surprisingly, I came up with a blog post from the author of the Cisco Official Cert Guide for the CCNA certification, Wendell Odom. In that post, the author admitted he wasn’t able to find a case in which the listening state is really necessary in STP to avoid temporary loops. He also quoted a fragment from a book written by Radia Perlman, the creator of STP. In that book, Radia Perlman even suggested the listening state wasn’t really necessary.

I strongly recommend reading the Wendell Odom article. But, as a summary, it seems that learning MAC addresses immediately after unblocking an interface isn’t harmful. Even though an interface could potentially learn a wrong MAC address it will not create a loop. Therefore, the listening and learning state could have been merged into a simple “preforwarding state” in the original STP definition, as suggested by Radia Perlman in her book.

A unicast ARP request?

After analyzing some Wireshark traffic captures, I noticed that some ARP requests were broadcast frames, as expected, but I randomly found some ARP requests that were unicast frames. I thought ARP requests were always broadcast frames, but I was wrong.

Let’s do a quick review of the ARP protocol using a simple topology in Packet Tracer:

Imagine that PC1 (10.10.10.1) wants to access the Web Server (10.10.10.3). PC1 will generate an IP packet that needs to be encapsulated inside an Ethernet frame. To create the Ethernet frame, PC1 needs to know the MAC address of the Web Server. Now is when the ARP process come into play. PC1 will send an ARP request asking for the MAC address of 10.10.10.3. Since the destination MAC address is unknown, this frame must be a broadcast frame and it will be received by all the devices in the LAN. Only the device with IP 10.10.10.3 (the Web Server) will reply with its MAC address.

The Wireshark capture shown below corresponds to an ARP request. In the Ethernet section of the capture, we can see that it is a broadcast frame (the destination MAC address is FF:FF:FF:FF:FF:FF).

The next Wireshark capture is also an ARP request, but the destination address is a unicast MAC address instead of a broadcast address.

What is, then, the purpose of a unicast ARP request if the destination MAC address is already known? I found the answer of this question in the RFC 1122 – “Requirements for Internet Hosts – Communication Layers”. Section 2.3.2.1 describe four mechanisms for flushing out-of-date ARP entries. One of the mechanisms is called “unicast poll” and is described as: “Actively poll the remote host by periodically sending a point-to-point ARP Request to it, and delete the entry if no ARP Reply is received from N successive polls“.

Following the example shown in the previous capture, the sending host has an entry in its ARP table that matches the IP 192.168.18.31 with the MAC address 60:f2:62:0f:50:a5. In order to verify if the MAC address is still valid, a unicast ARP request is sent using the MAC address recorded in the ARP table. If the destination host is still using that MAC address, it will reply with an ARP reply message. If no reply is received, it means that the MAC address is no longer in use and the sending host will delete the corresponding ARP table entry.

One final observation. In this particular capture of a unicast ARP request, the “Target MAC address” field is filled with a MAC address of all zeros (00:00:00:00:00:00). However, it is possible to find ARP requests that also use the MAC address of the destination host in this field. For example, the capture shown below:

Does it depend on the particular implementation of the ARP protocol running on the sending host? Or maybe another protocol feature? I’ll keep digging into it.

Ethernet frame types and BPDUs

The first time I saw the structure of a Bridge Protocol Data Unit (BPDU) in Packet Tracer, I noticed something unusual in its Ethernet encapsulation. BPDUs are the type of message used by the Spanning-Tree Protocol (STP) in switches to avoid creating loops in a LAN. I will probably talk about STP later in this blog, but now I want to focus the discussion on the structure of an Ethernet frame.

The most common Ethernet frame type used today is known as Ethernet II. If you look at the traffic captured by WireShark or the PDU details shown by Packet Tracer, you will probably see the structure of an Ethernet II frame. For example, let’s look at the output of the “Outbound PDU details” of a ping message taken from Packet Tracer:

The first section of the output is named “Ethernet II” and represents the Layer-2 Ethernet frame that encapsulates the upper layers data. Let’s review the different frame fields in order, from left to right:

  • Preamble (7 bytes): bit pattern of alternating 1s and 0s for clock synchronization between the transmitter and the receiver.
  • Start Frame Delimiter (SFD) (1 byte): bit pattern 10101011 that marks the beginning of the frame.
  • Destination MAC address (6 bytes): the destination physical Layer-2 address.
  • Source MAC address (6 bytes): the source physical Layer-2 address.
  • Type (2 bytes): specifies the upper level protocol encapsulated. In this case, 0x0800 represents IPv4.
  • Data (variable length): the data or payload from the upper layers.
  • Frame Check Sequence (FCS) (4 bytes): a 32-bit CRC value for error checking.

Now let’s look at the PDU details of a BPDU packet:

Now, the Ethernet section is called “Ethernet 802.3“. The frame fields are basically the same, except for the Type field, which it is now called LEN (length) and represents the length in bytes of the data portion of the frame. Therefore, the STP protocol messages do not use the common Ethernet II encapsulation. An Ethernet 802.3 frame with LLC 802.2 encapsulation is used instead.

Ethernet II, also known as DIX Ethernet, is the version 2 of the original Ethernet implementation developed by DEC, Intel and Xerox. In the first IEEE definition of the 802.3 Ethernet standard, the Ethertype was replaced by the data length field and the protocol type was specified in an additional header using the LLC 802.2 protocol. The LLC header consists of 3 fields:

  • Destination Service Access Point (DSAP) (1 byte): represents the destination layer-3 process. In this example, the value 0x42 represents the STP protocol
  • Source Service Access Point (SSAP) (1 byte): represents the source layer-3 process: 0x42 for STP, again.
  • Control (1 or 2 bytes): represents the type of communication (unacknowledge connectionless, connection-oriented or acknowledged connectionless).

These fields are shown in Packet Tracer under the “LLC” section:

The last section called “STP BPDU” shows the fields of the BPDU message, as defined by the STP protocol.

LLC encapsulation has a variation called SNAP extension, that defines two additional fields after the control field:

  • OUI (3 bytes): 24-bit number that uniquely identifies a vendor.
  • Protocol ID (2 bytes): specify the particular protocol defined by that vendor.

For example, Cisco proprietary PVST+ protocol is encapsulated using a value of 0x00000c for the OUI field, and a value of 0x010b for the protocol ID. If both the DSAP and SSAP fields have a value of 0xAA and the control field is set to 0x03, it means that the frame is using the SNAP extension.

802.2 LLC and 802.2 SNAP framing types were used in some old technologies like FDDI, Token Ring or AppleTalk. Since IEEE approved the use of the Ethernet II in its 802.3 standard, clearly, this frame format won the battle and it is used in almost every local area network today. However, we can still see the old LLC encapsulation in some protocols like STP.

To finish the discussion about Ethernet frame types, I tried to do an experiment in Packet Tracer. It seems that Packet Tracer always shows BPDUs using LLC without SNAP extension. I created a simple topology with two switches and forced them to used the Rapid PVST+ instead of the default PVST+, by entering the IOs command: “spanning-tree mode rapid-pvst“.

Now the BPDU section is called “RSTP 802.1w”, showing that we are using the “rapid version” of the Spanning-Tree Protocol, but the LLC encapsulation shown is the same as before.

Collisions in Ethernet

In my first post of this blog, I will discuss the concept of collision in Ethernet transmissions. We all know that the use of switches instead of hubs in modern Ethernet networks has made collisions a thing of the past. But how?

When I started reading about Ethernet fundamentals, I was a little confused about the process of collision. If we think about the early days of Ethernet, it is easy to see how collisions occured. The standards 10BASE5 (Thick Ethernet) and 10BASE2 (Thin Ethernet) both used coaxial cables and were based on a bus topology. In a coaxial cable, we only have two wires: the central conductor and the outer metallic shield, both separated by a dielectric material. Since all the devices are connected to the same cable, if two devices happen to transmit at the same time, the two signals sent are mixed and a collision occur.

But, what happen in a twisted-pair cable? For example, in the old standards 10BASE-T or 100BASE-T, two pairs of cables are used. Since one pair is used for transmission and the other pair for reception, how can we get collisions? In this case, the collisions are a consecuence of the use of hubs.

Hubs are dumb devices. When a hub receives a frame, it sends the frame out all the ports except the port the frame it was received on. For example, the diagram below represents three devices connected via a hub:

If PC1 sends a frame to PC3, the hub will receive the frame on its Fa0 port and it will forward the frame out ports Fa1 and Fa2. So, PC2 will also receive a copy of the frame destined to PC3, that will be dropped. But, if PC2 also sends a frame to PC3 at the same time, the hub will forward the frame out ports Fa0 and Fa2. So, the hub will transmit two electrical signals corresponding to the frames sent by PC1 and PC2 through port Fa2 (using its transmission pairs 3 and 6). It will generate a collision, and PC3 will receive a corrupted sequence of bits.

The introduction of switches solved this problem because switches are intelligent devices that work at the layer 2 in the OSI model, and they know out which ports send the frames and when to send them. If we replace the hub in the previous topology by a switch, when PC1 sends a frame to PC3 we may find two different scenarios:

If PC3 MAC address is in the MAC address table of the switch, the frame will only be forwarded out Fa0/3. If PC3 MAC address is not in the switch MAC address table, the switch will forward the frame out all the ports except the receiving port, like a hub. In both scenarios, if PC1 and PC2 send a frame to PC3 simultaneously, the switch will not start forwarding both frames time out Fa0/3 generating a collision. One of the two frames will be stored in buffer while the other frame is being sent, therefore, collisions will never occur. In a switch, every port represents a different collision domain. Even though a port is configured in half-duplex mode, collisions will never occur as long as there is only one device connected to that port. If more than one device is connected to a switch port via a hub, then only one device can transmit at a time and collisions will need to be handled by the CSMA/CD algorithm.

© 2022 Networking Tales

Theme by Anders NorenUp ↑