Realistic Latency Measurement in the Application Layers

We are often asked how to measure one-way latency with Avalanche, or “why don’t I see a one-way latency metric in my results?” This entry explains why we don’t measure one-way latency, and how you can measure two-way latency instead.

In order to measure the one-way latency, you have two options:

  1. Use TCP Timestamps. TCP Timestamps are a TCP extension and can therefore sometimes be blocked or stripped off by Devices Under Test. We also learn from RFC 1323 that a “1ms precision can run up to 8 Tbps.” In other words, a timestamp clock that ticks once per millisecond is already fast enough for the option’s own purpose, so you should not expect finer precision from it. By using an L2-3 tester, you get a far more precise measurement (Spirent Test Center has a precision of 2.5 nanoseconds at 100Gbps, scaling to the Tbps).
  2. Add a signature field in your frames. For high accuracy you need dedicated clocks, and we use a highly accurate one in our Spirent Test Center modules. This is fine for L2-3 testing because it’s the only way to do it, and also because it doesn’t impact the payload. However, when you test a full application-layer product, where do you put that timestamp/signature? Nowhere. All the fields of all the layers are used by their respective protocols. If we add a signature field, our traffic is no longer 100% realistic.

Last but not least, both of these solutions only work in two-arm testing scenarios, since you need the receiving end of a packet to implement the logic to measure and report the latency. Two-way latency works in both two-arm and one-arm scenarios.
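To make the precision limit of the first option concrete, here is a minimal Python sketch of the TCP Timestamps option. The option layout (kind 8, length 10, then two 32-bit values, TSval and TSecr) comes from the RFC; the 1 ms tick, the helper names and the sample times are my own assumptions for illustration. Real stacks pick their own tick length.

```python
import struct
from typing import Tuple

TCP_OPT_TIMESTAMP = 8   # option kind for TCP Timestamps
TCP_OPT_TS_LEN = 10     # kind (1) + length (1) + TSval (4) + TSecr (4)


def build_ts_option(tsval: int, tsecr: int) -> bytes:
    """Pack a TCP Timestamps option in network byte order."""
    return struct.pack("!BBII", TCP_OPT_TIMESTAMP, TCP_OPT_TS_LEN,
                       tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)


def parse_ts_option(raw: bytes) -> Tuple[int, int]:
    """Unpack a TCP Timestamps option and return (TSval, TSecr)."""
    kind, length, tsval, tsecr = struct.unpack("!BBII", raw)
    if kind != TCP_OPT_TIMESTAMP or length != TCP_OPT_TS_LEN:
        raise ValueError("not a TCP Timestamps option")
    return tsval, tsecr


def to_ticks(time_ns: int, tick_ns: int = 1_000_000) -> int:
    """Quantize a time in nanoseconds to timestamp ticks (assumed 1 ms tick)."""
    return (time_ns // tick_ns) & 0xFFFFFFFF


if __name__ == "__main__":
    # Two hypothetical send times, 300 microseconds apart.
    first_ns = 5_000_100_000
    second_ns = 5_000_400_000
    opt_a = build_ts_option(to_ticks(first_ns), 0)
    opt_b = build_ts_option(to_ticks(second_ns), 0)
    print(parse_ts_option(opt_a), parse_ts_option(opt_b))
```

With a 1 ms tick, both sample times land on the same TSval, so the 300 microseconds between them is invisible to anything that reads the option. That is the gap a dedicated L2-3 clock closes.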

Avalanche measures latency in a passive way, by tracking the state of the TCP connections. This limits the measurement to two-way latency, but it gives us all kinds of interesting information. More importantly, we capture the end user’s perspective. Keep this in mind: the goal of any network is to provide data transfer in the most efficient way possible for its users. Their perspective, at the end of the day, is all that counts!

What matters when you move up in the stack is, for instance, the HTTP response time, or the time it takes to open a connection – because those are the metrics of the protocols belonging to the OSI layer you are currently testing.
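As a rough idea of what such a metric looks like from the client side, here is a minimal Python sketch that times an HTTP request. This is not how Avalanche computes its statistic. The function name is hypothetical, and defining “response time” as the time from sending the request until the status line and headers have been read is my own assumption.

```python
import http.client
import time
from typing import Tuple


def http_response_time(host: str, port: int, path: str = "/",
                       timeout: float = 5.0) -> Tuple[int, float]:
    """Return (status code, seconds from request sent to headers received)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Open the TCP connection first so the handshake is not counted.
        conn.connect()
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                # drain the body before closing
        return response.status, elapsed
    finally:
        conn.close()
```

The value includes client-side processing time, so treat it as an approximation of what a user would experience rather than a precise wire-level figure.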

But I must admit that sometimes this view of things is not enough. For instance, some devices will not let traffic pass through unless it’s stateful, and you need this traffic to be realistic because it’s a smart DPI device. It would be hard to pull this test off with an L2-3 tester (I’m not saying it’s not technically possible, because it is, but it is time-consuming and obviously provides poor L4-7 stats).

One of the stats to look at is the Time to SYN/ACK. This metric tells you how long it takes to send a SYN packet and receive the SYN/ACK response back. SYN packets also happen to be what gets inspected the most by firewalls (to check IP and TCP options against their rule base, at the very least) so it essentially tells you the two-way latency including the initial processing time of the firewall.
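If you want to get a feel for this metric outside of Avalanche, a blocking TCP connect is a reasonable approximation: the call returns once the SYN/ACK has come back. The sketch below is a minimal Python illustration under that assumption, not Avalanche’s implementation. The function name is hypothetical, and the result also includes a little local stack overhead.

```python
import socket
import time


def time_to_connect(ip: str, port: int, timeout: float = 3.0) -> float:
    """Return the seconds a blocking TCP connect() takes (roughly SYN to SYN/ACK).

    Pass an IP address rather than a host name so that DNS resolution
    is not included in the measurement.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.connect((ip, port))  # returns after the SYN/ACK is received
        return time.perf_counter() - start


if __name__ == "__main__":
    # Local listener so the sketch runs without any external host.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    ip, port = server.getsockname()
    print(f"connect took {time_to_connect(ip, port) * 1000:.3f} ms")
    server.close()
```

Point it at a server behind the firewall under test and the value includes the firewall’s handling of the SYN, which is exactly why this stat is interesting.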

The counterargument is that sometimes you don’t want to focus so much on the SYN/ACK, for exactly the same reasons that you sometimes do want to focus on it. Fortunately, Avalanche records another stat: the Round-Trip Time. Here’s the Wikipedia definition:

“[…]the round-trip delay time (RTD) or round-trip time (RTT) is the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgment of that signal to be received. This time delay therefore consists of the transmission times between the two points of a signal.

[…]In the context of computer networks, the signal is generally a data packet, and the RTT is also known as the ping time. An internet user can determine the RTT by using the ping command.”

A more technical definition is given in RFC 6349:

“Round-Trip Time (RTT) is the elapsed time between the clocking in of the first bit of a TCP segment sent and the receipt of the last bit of the corresponding TCP Acknowledgment.”

The RTT cannot be seen while the test is running, but it’s possible to see it in the post-run “Realtime” results. Simply open the results in the Analyzer and create a Real-Time Trending Graph by clicking on the appropriate button.

A window pops up where you need to choose which statistics to graph. Note that you can:

  1. Graph as many metrics as you like, for instance Average Time to SYN/ACK and Average RTT, which is what I will do here.
  2. Open as many tests as you like. If you do, the graph will include all the metrics you specified for all of the open tests.

Creating a trending graph - Time to SYN/ACK and RTT here

This creates the graph. Right-click on it to export it to the clipboard, or to a JPG or CSV file.

Trending Graph showing average RTT and Time to SYN/ACK.

I said earlier that you could open multiple tests at the same time. Below is an example. You will see the same test as above, as well as a second test (which didn’t go so well).

Trending Graph showing the same metrics over two test runs.

One last important comment: the two-way latency is the sum of the downstream and the upstream latency, and these are rarely equal. This means you cannot simply take the RTT value, divide it by two, and conclude that the result is the one-way latency.

In most networks, the Internet first among them, packets in the two directions don’t necessarily follow the same path. Routing protocols are very dynamic: they take price, performance and availability into consideration when doing their work, and these conditions change all the time. For these reasons you can’t treat the two-way latency as twice the one-way latency.

Do not report the one-way latency as half of the two-way latency. Even when you have complete control over the network this can be misleading. If you test a firewall, for instance, the inbound traffic can be analyzed more heavily than the outbound traffic, so in theory the latency will be higher for inbound than for outbound traffic.
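To put numbers on this, here is a tiny Python sketch. The 2 ms outbound and 8 ms inbound figures are made up purely for illustration, and the function name is hypothetical. It only shows the arithmetic of why halving the RTT misleads when the two directions differ.

```python
def half_rtt_error(outbound_ms: float, inbound_ms: float) -> dict:
    """Compare the RTT/2 estimate with the true one-way latencies."""
    rtt_ms = outbound_ms + inbound_ms  # two-way latency is the sum of both directions
    half_ms = rtt_ms / 2               # the tempting but misleading estimate
    return {
        "rtt_ms": rtt_ms,
        "half_rtt_ms": half_ms,
        "outbound_error_ms": half_ms - outbound_ms,
        "inbound_error_ms": half_ms - inbound_ms,
    }


if __name__ == "__main__":
    # Hypothetical firewall that inspects inbound traffic more heavily.
    print(half_rtt_error(outbound_ms=2.0, inbound_ms=8.0))
```

With these sample values the RTT is 10 ms, so RTT/2 reports 5 ms for both directions: 3 ms too much for outbound traffic and 3 ms too little for inbound traffic. The two errors always cancel out in the sum, which is why the RTT itself gives no hint of the asymmetry.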


About acastaner

I'm the EMEA Technical Lead for Application & Security at Spirent. I specialize in layer 4-7 technologies, Cloud, Programming and CyberSecurity.

3 Responses to Realistic Latency Measurement in the Application Layers

  1. Sabu T says:

    So how do we find one-way latency? Or are you saying we should only be mentioning two-way latency, and that when we say latency, it should mean two-way latency?

    • acastaner says:

      Yes. Because of the nature of the application layer, it’s impossible to measure one-way latency, to my knowledge, unless we use a protocol in that layer specifically designed to do that. I guess somebody could write a web application that would put a high-accuracy (that’s the tricky part) timestamp in some HTTP header or body and measure it on the other end. But in that case it’s not “realistic” traffic, i.e. it’s test traffic, not, say, the Twitter or Facebook or Netflix traffic you might want to emulate.

      With that in mind, my personal opinion is that, yes, we should accept this constraint and only mention two-way latency (or “round-trip time”) when testing at layer 7. This is good because it represents the actual quality of experience of a user, and that’s what matters in these layers. One-way latency is a lower-layer metric, useful when you develop or test a switch or router and want to understand how much delay it adds to the network.
