Navigating the Azure Networking Maze: 1 Flow, 2 Directions, 3 Metrics

Photo by Priscilla Du Preez 🇨🇦 / Unsplash

As an infrastructure architect setting up services in Azure, you will inevitably need to conduct network testing frequently. There are 1 flow, 2 directions, and 3 metrics to consider.

1 Flow

Take the 5-tuple as an example:

| Source IP | Source Port | Destination IP | Destination Port | Protocol |
| --- | --- | --- | --- | --- |
| 1.1.1.1 | 1024 | 2.2.2.2 | 22 | TCP |

If the focus is on ICMP-based testing tools (such as ping), attention goes mainly to the Source IP and Destination IP. As long as ICMP is allowed through, these 2 pieces of information are sufficient for many network engineers to determine whether the network is reachable.

mtr

If the focus is on TCP protocol testing tools (such as mtr / psping / tcpping), apart from paying attention to the Source IP and Destination IP, there will also be more focus on the Destination Port and Protocol. In Azure, you need to be very clear about the protocol you are using - TCP, UDP or ICMP? This could be the key to identifying the issue in certain specific situations.
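As a rough illustration of what a TCP-based probe checks, here is a minimal Python sketch that times one TCP connection attempt to a given Destination IP and Destination Port. The function name `tcp_probe` and the 2-second timeout are assumptions made for this example; it is not how mtr, psping, or tcpping are implemented.

```python
import socket
import time


def tcp_probe(dest_ip: str, dest_port: int, timeout: float = 2.0):
    """Return (reachable, seconds) for one TCP connection attempt."""
    start = time.perf_counter()
    try:
        # create_connection completes the TCP 3-way handshake before returning.
        with socket.create_connection((dest_ip, dest_port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False, time.perf_counter() - start
```

Calling `tcp_probe("2.2.2.2", 22)` would test the flow from the 5-tuple table. Unlike ping, the result depends on the Destination Port and on TCP being allowed end to end.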

Most of the time, you don't need to pay much attention to the Source Port. But if one day you are troubleshooting a network that uses ECMP and run into an abnormal router, you can increment the Source Port (1024 / 1025 / 1026 ...) to change the hash value, which can send the flow down a different path. This allows you to attempt to steer traffic away from the suspicious path and cross-check which router is malfunctioning. It works because the ECMP path-selection algorithm primarily hashes the 5-tuple, which keeps every packet of one connection on the same path.
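To make the Source Port trick concrete, here is a minimal Python sketch of 5-tuple hashing. Real routers use vendor-specific hash functions, so the SHA-256 based hash, the function names, and the path count here are assumptions for illustration only.

```python
import hashlib


def ecmp_path(src_ip, src_port, dst_ip, dst_port, protocol, num_paths):
    """Map a 5-tuple to one of num_paths next hops (illustrative hash only)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def ports_per_path(src_ip, dst_ip, dst_port, protocol, num_paths,
                   first_port=1024, count=64):
    """Group candidate source ports by the path the hash selects."""
    groups = {}
    for port in range(first_port, first_port + count):
        path = ecmp_path(src_ip, port, dst_ip, dst_port, protocol, num_paths)
        groups.setdefault(path, []).append(port)
    return groups
```

The same 5-tuple always lands on the same path, while stepping through source ports spreads probes across the available paths. That is what lets you compare results per path and narrow down the suspicious router.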

2 Directions

In addition to the inbound traffic (destination -> source) and outbound traffic (source -> destination) settings commonly configured for Azure Firewall and Network Security Groups (NSG), you also need to ensure that routing remains symmetrical.

example-network-security-group

As a cloud network engineer, apart from filling out firewall whitelist forms, most of the time is spent researching how to avoid issues caused by asymmetric routing in cross-border networks. It is possible that the path from the Source IP to the Destination IP is correct, but the reverse path is incorrect and detours to another place.

asymmetric routing

To resolve asymmetric routing, you need a clear Source IP and Destination IP, along with route tables that can be looked up at each hop along the path, so that the root cause of the incorrect routing can be identified. This is an extremely challenging journey. If you also have a firewall (e.g. Azure Firewall or Palo Alto NGFW) in the middle, the difficulty level increases further because these are stateful firewalls.
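Once hop lists have been collected for both directions (for example by running traceroute or mtr from each end), comparing them can be sketched in Python as below. This assumes each hop is identified by a stable router name; raw interface IPs usually differ per direction, so they would need to be mapped to routers first. The function names and router names are hypothetical.

```python
def is_symmetric(forward_hops, return_hops):
    """True when the return path visits the same routers in reverse order."""
    return list(forward_hops) == list(reversed(return_hops))


def divergent_hops(forward_hops, return_hops):
    """Routers that appear in only one direction, as candidates to inspect."""
    forward, back = set(forward_hops), set(return_hops)
    return sorted(forward ^ back)


# Hypothetical paths: the return traffic detours through a second firewall.
forward_path = ["edge-tw", "fw-1", "hub-kr"]
return_path = ["hub-kr", "fw-2", "edge-tw"]
suspects = divergent_hops(forward_path, return_path)
```

The routers in `suspects` are where the route tables are worth looking up first.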


The core function of a stateful firewall is to track the state of network connections. It records connection information, such as src. IP, dest. IP, port number, and the state of TCP connections (e.g. SYN, ACK flags), in its internal state table. The firewall expects to see a complete TCP 3-way handshake process to verify and log a legitimate connection.

  1. The client sends a SYN packet to request establishing a connection
  2. The server responds with a SYN/ACK packet.
  3. The client then sends an ACK packet to confirm the connection is established.

TCP 3-way handshake
Diagram by Wireshark Wiki

In an asymmetric routing environment, the path taken by data packets from the source to the destination differs from the path taken when returning from the destination to the source. If return packets (such as SYN/ACK or ACK packets) arrive at a firewall because of a different routing path, and that firewall has not recorded the initial request for the connection (e.g. it did not see the corresponding SYN packet), the firewall will treat the packet as invalid or potentially malicious and discard it. This occurs because the firewall cannot find the initial connection information in its state table, or deems the packet inconsistent with the established connection state.
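The drop behaviour described above can be sketched with a toy connection tracker in Python. This is a deliberately simplified model: it ignores rule evaluation, sequence numbers, and timeouts, and the class and method names are assumptions for this example rather than how Azure Firewall or any NGFW is implemented.

```python
class StatefulFirewall:
    """Toy connection tracker: only flows whose SYN was seen are allowed."""

    def __init__(self):
        # Keys are (client_ip, client_port, server_ip, server_port).
        self.state_table = set()

    def handle(self, src_ip, src_port, dst_ip, dst_port, flags):
        """Return 'allow' or 'drop' for one TCP packet."""
        forward = (src_ip, src_port, dst_ip, dst_port)
        reverse = (dst_ip, dst_port, src_ip, src_port)
        if flags == {"SYN"}:
            # First packet of a new connection: record it in the state table.
            self.state_table.add(forward)
            return "allow"
        if forward in self.state_table or reverse in self.state_table:
            return "allow"
        # No matching entry: this firewall never saw the SYN for the flow.
        return "drop"


# The SYN goes through fw_a, but the SYN/ACK returns through fw_b.
fw_a, fw_b = StatefulFirewall(), StatefulFirewall()
syn_result = fw_a.handle("1.1.1.1", 1024, "2.2.2.2", 22, {"SYN"})
reply_on_other_path = fw_b.handle("2.2.2.2", 22, "1.1.1.1", 1024, {"SYN", "ACK"})
```

With symmetric routing the SYN/ACK would return through `fw_a`, which holds the state entry and allows it. Through `fw_b` there is no entry, so the reply is dropped.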

Adjusting routing in Azure comes down to little more than...

  1. AS Path Prepending
  2. Longest Prefix Match (LPM) and Short Prefix Match (advertising larger and smaller network ranges)

But overall, this was not easy to adjust in the past, until Azure Virtual WAN introduced the Route Map feature, which has become GA and is available for use in production environments. Ref: Announcing the general availability of Route-Maps for Azure virtual WAN

azure-route-map
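The two levers listed above, AS path prepending and prefix length, can be sketched in Python as follows. This is a simplified model: real BGP best-path selection evaluates other attributes (such as weight and local preference) before AS path length, and the route names and private ASNs below are hypothetical.

```python
import ipaddress


def longest_prefix_match(dest_ip, routes):
    """Return the next hop of the most specific route containing dest_ip."""
    address = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]


def shortest_as_path(candidates):
    """Among routes for the same prefix, prefer the shortest AS path."""
    return min(candidates, key=lambda item: len(item[1]))[0]


# A more specific /24 overrides the covering /16 for addresses inside it.
routes = [("10.0.0.0/16", "hub-a"), ("10.0.1.0/24", "hub-b")]
# Prepending 65001 twice makes path-a longer, so path-b is preferred.
candidates = [("path-a", [65001, 65001, 65001, 65010]), ("path-b", [65002, 65010])]
```

Advertising a more specific prefix pulls traffic toward a path, while prepending pushes traffic away from one. Both are ways to force the forward and return directions back onto the same route.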

3 Metrics

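| Metric | ping | traceroute | mtr | ntttcp | sockperf | iperf3 | deadman |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Network Latency | ⚠️ Unreliable | ⚠️ Unreliable | ⚠️ Unreliable | - | ✅ Reliable | ⚠️ Unreliable | ⚠️ Unreliable |
| Network Throughput | - | - | - | ✅ Reliable | ✅ Reliable | ✅ Reliable | - |
| Network Routing | - | ✅ Reliable | ✅ Reliable | - | - | - | - |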

Every network connection depends on the network being reachable in the first place, but when categorized by tool, testing can be divided into roughly 3 metrics:

  1. Network Latency
  2. Network Throughput
  3. Network Routing

About Network Latency

Although many tools can measure network latency, their results may not be particularly accurate within Azure. Azure Accelerated Networking primarily improves VM network performance by offloading host network packet processing to FPGA-based SmartNICs, significantly enhancing efficiency, especially for TCP/UDP workloads. It reduces latency, jitter, and CPU usage. Because the performance is excellent, it is generally recommended to enable this feature.

However, even with Azure Accelerated Networking enabled, ICMP results can still be suboptimal. This is mainly because Azure Accelerated Networking was NOT originally designed to directly optimize ICMP. Additionally, ICMP traffic is typically assigned lower priority within the Microsoft Backbone (MSBB).

Although this metric is not very reliable, if you want to understand the lowest latency using ping, it is recommended to refer to the round-trip min value, as it primarily represents latency under the lightest network load and the most ideal conditions (e.g. when packets match the fast path).
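As a small illustration of why the min value is the more stable reference, the Python sketch below summarizes a set of round-trip samples. The sample values are made up for this example and are not measurements from Azure.

```python
import statistics


def summarize_rtt(samples_ms):
    """Summarize round-trip samples as min / avg / max, like ping's last line."""
    return {
        "min": min(samples_ms),
        "avg": statistics.fmean(samples_ms),
        "max": max(samples_ms),
    }


# Hypothetical samples in milliseconds: one reply was delayed along the way.
samples = [32.1, 32.4, 31.9, 88.0, 32.2]
summary = summarize_rtt(samples)
```

A single delayed reply pulls the average up by more than 10 ms here, while the min stays close to what the path can deliver under ideal conditions.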

upa/deadman

BTW, if you need to temporarily monitor multiple destinations, I highly recommend upa/deadman.

About Network Throughput

throughput-bandwidth

According to Azure's recommendations, ntttcp or sockperf should be used to obtain accurate throughput values. However, if you frequently use iperf3, you can still use it, but be mindful of the Bandwidth-Delay Product (BDP). It is a key factor affecting TCP throughput, especially in modern high-bandwidth, high-latency network environments (often referred to as Long Fat Networks, LFNs). BDP determines the maximum amount of data that can be in flight on the network at any given time, that is, data that has been sent but not yet acknowledged. Here is an article that I think is worth reading. Ref
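The relationship can be written down in a few lines of Python. BDP is bandwidth multiplied by round-trip time, and a single TCP stream cannot exceed its window size divided by the round-trip time. The link speed, RTT, and window size used below are assumed values for illustration, not figures from Azure.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound for one TCP stream that is limited by its window."""
    return window_bytes * 8 / rtt_s


# Assumed example: a 1 Gbit/s path with a 50 ms round-trip time.
path_bdp = bdp_bytes(1_000_000_000, 0.05)
# Assumed example: a 65,535-byte window (no window scaling) on the same path.
single_stream_limit = max_throughput_bits_per_s(65_535, 0.05)
```

Under these assumptions the path needs about 6.25 MB in flight to be full, while the 65,535-byte window caps a single stream at roughly 10.5 Mbit/s, far below the link speed.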

Aside from increasing throughput by adjusting the TCP window size (-w), which in most cases you are unlikely to modify, I personally recommend using parallel streams (the -P parameter) to improve throughput. Multiple streams can share the data transmission load, and even if the window size of each stream is not optimal, their combined effect may more effectively 'fill' the network pipeline, thereby achieving higher overall throughput. This is also useful for simulating scenarios where multiple applications are using the network simultaneously.
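A back-of-the-envelope estimate of how parallel streams help can be sketched in Python. The model assumes every stream is limited only by its own window and that streams do not compete with each other, which is optimistic. The 256 KiB window, 50 ms RTT, and 1 Gbit/s link are assumed values.

```python
import math


def streams_to_fill(bandwidth_bits_per_s, rtt_s, window_bytes):
    """Estimate how many window-limited streams are needed to fill the link."""
    per_stream = window_bytes * 8 / rtt_s
    return math.ceil(bandwidth_bits_per_s / per_stream)


def aggregate_throughput(bandwidth_bits_per_s, rtt_s, window_bytes, streams):
    """Combined throughput of the streams, capped by the link bandwidth."""
    per_stream = window_bytes * 8 / rtt_s
    return min(bandwidth_bits_per_s, streams * per_stream)


# Assumed example: 1 Gbit/s link, 50 ms RTT, 256 KiB window per stream.
needed = streams_to_fill(1_000_000_000, 0.05, 262_144)
with_8_streams = aggregate_throughput(1_000_000_000, 0.05, 262_144, 8)
with_32_streams = aggregate_throughput(1_000_000_000, 0.05, 262_144, 32)
```

In this model 8 streams reach only about a third of the link, while 32 streams are enough to saturate it.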

This is the command I often use:

iperf3 --client ${DEST_IP} --bidir --parallel 32 --time 30 --interval 1 --omit 1

About Network Routing

This is the network route to Google DNS displayed when I was on Starlux Airlines, using the in-flight wireless network via satellite.

mtr-via-satellite

# ICMP
mtr -zb ${DEST_IP}

# TCP
mtr -zb --tcp -P ${DEST_PORT} ${DEST_IP}

mtr

The most important part is the -z parameter, which displays the ASN of each router along the path, so you can see how many ISPs have been traversed.
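Counting the networks traversed can be sketched in Python by pulling the ASNs out of saved report lines. The column layout of the sample lines is an assumption, and the ASNs and IPs come from ranges reserved for documentation, so none of this is real mtr output.

```python
import re

AS_PATTERN = re.compile(r"\bAS(\d+)\b")


def asns_in_order(report_lines):
    """Return the distinct ASNs found in the lines, in first-seen order."""
    seen = []
    for line in report_lines:
        match = AS_PATTERN.search(line)
        # Hops with an unknown ASN (shown as AS???) do not match and are skipped.
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical report lines for illustration only.
sample_lines = [
    "1. AS???    192.0.2.1",
    "2. AS64496  198.51.100.1",
    "3. AS64496  198.51.100.9",
    "4. AS64511  203.0.113.7",
]
traversed = asns_in_order(sample_lines)
```

The length of `traversed` gives the number of distinct autonomous systems seen on the path, which is a reasonable proxy for how many ISPs were crossed.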

3 Connection Paths from Taiwan on-premise to Azure Korea Central

There are roughly 3 routing methods for accessing Azure services from the Internet:

  1. ExpressRoute Circuit with or without Global Reach
  2. Microsoft Routing Preferred (Cold Potato Routing)
  3. Internet Routing Preferred (Hot Potato Routing)

For the best latency and throughput, use Microsoft Routing Preferred, meaning traffic is routed via the Microsoft network rather than over the public Internet as with Internet Routing Preferred. Ref
