Thursday, December 8, 2016

Opportunistic Key Caching - Fast roaming with OKC

For devices (and wireless networks) that support Opportunistic Key Caching, this non-standard fast-roaming technique can make roaming times very fast.

In a WPA2-Enterprise Wi-Fi network, a Pairwise Master Key (PMK) is created during the process of EAP authentication between the wireless client and the AP it is connecting to. The PMK represents the Robust Security Network Security Association (RSN-SA) between the client and the AP. The PMK is also used to create the Pairwise Transient Key (PTK), which is used to encrypt frames between the client and AP.

The PMK generated after a full EAP authentication is only good between the client and the AP it initially connected to. If the client roams to a new AP, a new PMK must be generated through a new EAP authentication. The EAP authentication is followed by the 4-way handshake, which generates the PTK for encrypting data. The first frame of the 4-way handshake, which is from the AP to the client, includes an identifier for the PMK, called the PMKID. The PMKID is simply a 128-bit hash of the PMK, the AP's MAC address, and the client's MAC address. Below is an example of a PMKID seen in a wireless packet capture.

Figure 1: PMKID Captured During 4-Way Handshake
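For the usual WPA2 AKMs, the PMKID is the first 128 bits of HMAC-SHA1 keyed with the PMK, computed over the string "PMK Name" followed by the AP (authenticator) MAC address and then the client (supplicant) MAC address. The sketch below shows that calculation. The PMK and MAC addresses are made up for illustration. AKMs that use SHA-256 truncate an HMAC-SHA-256 instead, which this sketch does not cover.

```python
import hashlib
import hmac


def compute_pmkid(pmk: bytes, ap_mac: str, client_mac: str) -> bytes:
    """Return the 128-bit PMKID: HMAC-SHA1(PMK, "PMK Name" || AP MAC || client MAC), truncated."""
    aa = bytes.fromhex(ap_mac.replace(":", ""))       # authenticator (AP/BSSID) address
    spa = bytes.fromhex(client_mac.replace(":", ""))  # supplicant (client) address
    digest = hmac.new(pmk, b"PMK Name" + aa + spa, hashlib.sha1).digest()
    return digest[:16]  # keep 128 of the 160 HMAC-SHA1 output bits


# Illustrative values only: a made-up 256-bit PMK and made-up MAC addresses.
pmk = bytes(range(32))
pmkid = compute_pmkid(pmk, "00:11:22:33:44:55", "66:77:88:99:aa:bb")
print(pmkid.hex())  # 32 hex characters, the same length as the PMKID in a capture
```

Because both MAC addresses are inputs, the same PMK produces a different PMKID for every AP.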
If wireless clients and wireless distribution systems cache PMKs between clients and APs, the PMKID can be used when a client roams "back" to an AP that it had been authenticated to previously. This would speed up roaming "back" to an old AP, since the full EAP authentication would not need to take place; the PMK already exists. Just the 4-way handshake would be necessary to generate the PTK. Think of the scenario shown below, where a client roams between two APs.

Figure 2: Roaming Back to an Old Friend
When the client roams "back" to AP1, it can send the PMKID in the re-association request. The client already has PMK1, and if the wireless distribution system cached PMK1, the authentication can proceed directly to the 4-way handshake without a full EAP authentication.

This certainly helps, but only if the client roams back to an old AP. The client still needs to complete a full EAP authentication when roaming to AP2, which usually takes at least 200ms. This is where Opportunistic Key Caching comes in. OKC is a method that lets a client use a PMK with an AP it has never authenticated to before. As long as the client has authenticated to one AP in the distribution system, both the client and the distribution system can reuse that cached PMK with the new AP and calculate a new PMKID for it, without a full EAP authentication. All it requires is that both the client and the distribution system use the same formula to calculate the new PMKID.
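Here is a toy model of OKC: the client and the distribution system share one cached PMK, and each side computes the PMKID for the new BSSID on its own. The `OkcCache` class, keying the cache by client MAC address, and all of the addresses are hypothetical simplifications. This is not how any particular controller stores PMKs.

```python
import hashlib
import hmac
from typing import Dict, Optional


def compute_pmkid(pmk: bytes, bssid: bytes, client_mac: bytes) -> bytes:
    # 128-bit PMKID: HMAC-SHA1 over "PMK Name" || AP address || client address, truncated
    return hmac.new(pmk, b"PMK Name" + bssid + client_mac, hashlib.sha1).digest()[:16]


class OkcCache:
    """Hypothetical model of a distribution system's OKC cache, not vendor code."""

    def __init__(self) -> None:
        self.pmk_by_client: Dict[bytes, bytes] = {}

    def store(self, client_mac: bytes, pmk: bytes) -> None:
        # Called once, after the client's full EAP authentication to its first AP.
        self.pmk_by_client[client_mac] = pmk

    def lookup(self, client_mac: bytes, new_bssid: bytes, pmkid: bytes) -> Optional[bytes]:
        # On reassociation, recompute the expected PMKID for the new AP from the cached PMK.
        pmk = self.pmk_by_client.get(client_mac)
        if pmk is not None and hmac.compare_digest(compute_pmkid(pmk, new_bssid, client_mac), pmkid):
            return pmk  # match: skip EAP and go straight to the 4-way handshake
        return None  # no match: fall back to a full EAP authentication


# Made-up addresses and PMK, for illustration only.
client = bytes.fromhex("665544332211")
ap1 = bytes.fromhex("001122334401")
ap2 = bytes.fromhex("001122334402")
pmk = bytes(32)

ds = OkcCache()
ds.store(client, pmk)                            # full EAP to AP1 happened earlier
pmkid_ap2 = compute_pmkid(pmk, ap2, client)      # client computes this before reassociating to AP2
print(ds.lookup(client, ap2, pmkid_ap2) == pmk)  # True: OKC succeeds without EAP
```

Only the BSSID input changes between AP1 and AP2. That is why the PMKID in an OKC reassociation request differs from the one the client saw in its original 4-way handshake.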

A sure-fire way to tell that a client supports OKC is to look at the reassociation request it sends when roaming to an AP it has not previously authenticated to. It will include a PMKID in the reassociation request, even though it has never completed an EAP authentication with that AP.

Figure 3: PMKID in Re-association Request
Note that this is not the same PMKID that is shown in Figure 1. At this point, if the wireless distribution system the client is connected to does not support OKC, a full EAP authentication will start. If the distribution system does support OKC, the 4-way handshake will start after the re-association response.
Figure 4: OKC In Action
In this example, the use of OKC results in a roam time of 36ms.

OKC is supported by default in recent versions of controller-based Cisco wireless solutions. You can watch the magic happen by using the "debug client <macaddress>" command from the CLI. When the client roams using OKC, you will see this in the output:

Figure 5: Computing New PMKID
If your wireless clients support it, OKC can be handy for making clients roam faster. Unfortunately, not all clients do. Most notably, OKC is not supported by any Apple iOS devices. The standard for fast roaming, 802.11r, results in roam times that can be even faster than OKC.

Thursday, November 24, 2016

Cisco Optimized Roaming: Client behavior with 11v vs. without 11v

With or Without V

In my previous blog entry, I discussed what is necessary to get BSS Transition Management working with Cisco controller-based Wi-Fi networks. In this entry, I want to compare client behavior with BSS Transition Management enabled and with it disabled.

For you to see 11v frames from your Cisco network, either aggressive load balancing or optimized roaming must be enabled. For this discussion, I will focus on optimized roaming. The optimized roaming engine will keep track of client statistics, such as RSSI and data rate, and disassociate clients that don't meet configurable thresholds. If BSS Transition Management is enabled (at the WLAN level) in combination with optimized roaming, the AP will send a transition request to a client before disassociating it, giving the client time to roam to a better AP. If BSS Transition Management is not enabled, or the client does not support it, the client will simply be disassociated.

Before I dive into the packet captures, we need to discuss a specific detail of optimized roaming: the engine only looks at the RSSI of data frames, not management or action frames. To see optimized roaming work, the client must be moving data.
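The decision described above can be modeled roughly as follows. This is a hypothetical sketch, not Cisco's engine. The -80 dBm threshold, averaging the data-frame RSSI, and the `evaluate_client` function are all assumptions. The model captures two points from the text: only data frames count, and a BTM request is sent before disassociation only when BTM is enabled and the client supports it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    NONE = "leave the client alone"
    SEND_BTM_REQUEST = "send an 11v BTM request before disassociating"
    DISASSOCIATE = "disassociate the client"


@dataclass
class Frame:
    frame_type: str  # "data", "management", or "action"
    rssi_dbm: int


def evaluate_client(frames: List[Frame], rssi_threshold_dbm: int,
                    btm_enabled: bool, client_supports_btm: bool) -> Action:
    # Only data frames feed the decision; management and action frames are ignored.
    data_rssi = [f.rssi_dbm for f in frames if f.frame_type == "data"]
    if not data_rssi:
        return Action.NONE  # an idle client never triggers optimized roaming
    average = sum(data_rssi) / len(data_rssi)  # averaging is an assumption of this sketch
    if average >= rssi_threshold_dbm:
        return Action.NONE
    if btm_enabled and client_supports_btm:
        return Action.SEND_BTM_REQUEST
    return Action.DISASSOCIATE


# Hypothetical threshold of -80 dBm; weak management/action frames alone trigger nothing.
weak_probes = [Frame("management", -88), Frame("action", -90)]
print(evaluate_client(weak_probes, -80, True, True))  # Action.NONE
```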

The test client was my Windows 10 Dell laptop with an integrated Intel 8260 dual-band wireless adapter. The advanced configuration for the adapter has a parameter called "roaming aggressiveness," which has 5 options, from lowest to highest. According to Intel's documentation, the setting "lowest" means the client will not roam unless it loses connectivity. I set roaming aggressiveness to "lowest" for the tests, so the optimized roaming engine would try to get my client to roam before it decided to itself.

The test WLAN had an SSID of Test, 5 GHz only. The WLAN was configured for WPA2-Enterprise with Fast Transition. BSS Transition Management was on for the first test, then turned off for the second.

The test setup was identical to my first post: two APs in local mode and two APs in sniffer mode, watching the same channels as the local mode APs nearest to them.

Figure 1: Testing setup
I would start off by associating to the "Back" AP, then move towards the "Front" AP. Because I had set the roaming aggressiveness of my client to its lowest setting, I could get within line-of-sight of the "Front" AP and still be connected to the "Back" AP. I would verify which AP my client was connected to by issuing a netsh wlan show interface command from a command prompt. The output of this command shows what channel the client is on, so I could tell which AP it was connected to.
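To script the "which AP am I on?" check, you can pull the Channel line out of the netsh output and map it to an AP. This is a sketch under some assumptions. The channel-to-AP map is hypothetical: channel 64 for the "Back" AP comes from this post, but channel 36 for the "Front" AP is a guess. The regex expects English-language Windows output. `run_netsh` uses the documented `netsh wlan show interfaces` form and only works on Windows.

```python
import re
import subprocess
from typing import Optional

# Hypothetical map for this test bench; only channel 64 ("Back") is stated in the post.
AP_BY_CHANNEL = {64: "Back", 36: "Front"}


def current_channel(netsh_output: str) -> Optional[int]:
    # Looks for a line like "    Channel                : 64".
    match = re.search(r"^\s*Channel\s*:\s*(\d+)\s*$", netsh_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def run_netsh() -> str:
    # Windows only: capture the output of the netsh command.
    return subprocess.run(["netsh", "wlan", "show", "interfaces"],
                          capture_output=True, text=True, check=True).stdout


# Made-up excerpt of the command's output, for illustration.
sample = """
    SSID                   : Test
    Radio type             : 802.11ac
    Channel                : 64
"""
channel = current_channel(sample)
print(channel, AP_BY_CHANNEL.get(channel, "unknown"))  # 64 Back
```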

With 11v BSS Transition Management


Once the client reaches a point where the optimized roaming engine determines it should roam, a transition management request is sent to it.

Figure 2: Transition Management 

You can see in Figure 2 that the client sends probes before sending the transition management response. I guess it wanted to confirm that there was an AP on the channel indicated in the transition request frame. Looking at the timestamps, it took about 50ms for the client to re-associate to another AP.

To be honest here, it looks like the client was not happy with the SNR it was seeing from the new AP. There are probe requests/responses on channel 64, which was the channel of the "Back" AP it had been associated to. This could explain why the roam took 50ms.

Without 11v BSS Transition Management 


In this test, my client was line of sight to the "Front" AP when the optimized roaming engine sent a disassociate frame.

Figure 3: Abrupt Disassociation

You can see from Figure 3 that it took 800ms for the client to realize what had happened and send out probe requests, then another 90ms to get connected. Total time from disassociation to re-association response is nearly 900ms. Luckily, it was able to re-associate without having to do a complete EAP authentication cycle; otherwise the roam would have taken about a full second.

Conclusion


While 900ms may be a tolerable roam time for a data client, it is too long for voice applications. Even if you only have data clients, sticky clients ruin the party for other clients by using low data rates and consuming more air time. If you are going to use Optimized Roaming, 11v BSS Transition Management offers a way to gracefully move sticky clients to a better AP.

Comments? Suggestions? Please leave a comment below or reach me on Twitter @GiantsNerd.

Monday, November 7, 2016

Cisco 11v BSS Transition Management Frames





Welcome to my first post in what will hopefully become a series of posts on Wi-Fi technologies. Today, I want to talk about 802.11v. More specifically, the BSS Transition Management component of the 802.11v standard.

One of the problems encountered in wireless networks with mobile devices is "sticky" clients. As a wireless device moves around an area where an ESS is deployed, it makes the choice of which AP to associate to. This choice is usually based on RSSI; the client will connect to the AP that it hears the best signal from. As that client moves away from the AP it initially associated to and toward other APs with a better RSSI, the client may choose to associate to a new AP.

The problem is the "may choose" part. The decision to roam is entirely up to the client. Some clients will only roam when they can no longer hear the AP they initially connected to. Some clients let users configure a roaming aggressiveness setting so they can tune how likely the client is to roam. Either way, the decision to roam is the client's to make. This can lead to sub-optimal conditions where a client would be better served by roaming to a different AP.

802.11v BSS Transition Management to the rescue. The 11v BSS Transition Management function allows a wireless distribution system to request a client that supports 11v to roam to an AP that will serve the client better. The client still makes the decision to honor the request or not, but it does offer more control over the roaming process.

There are 3 types of 11v BSS Transition Management frames: query, request, and response. A query can be made from a client that supports 11v after it roams to make sure that the AP it roamed to is the best one. A request is a frame sent from a DS to a client requesting that it roam to a different AP. This request can include a BSSID and channel of an AP that the client could roam to. Finally, the response is sent by a client to a DS in reply to a request. All three types are seen over the air as 802.11 action frames. 
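On the air, the three BTM frame types share the WNM action category (10). They differ only in the Action field value: 6 for a query, 7 for a request, and 8 for a response. The classifier below is a minimal sketch, assuming you already have the action frame body starting at the Category field. The function name is my own.

```python
from enum import IntEnum
from typing import Optional

WNM_CATEGORY = 10  # 802.11 action category "WNM"


class BtmAction(IntEnum):
    QUERY = 6     # client -> DS
    REQUEST = 7   # DS -> client
    RESPONSE = 8  # client -> DS, in reply to a request


def classify_action_frame(body: bytes) -> Optional[BtmAction]:
    """Return the BTM frame type for an action frame body, or None if it is not BTM."""
    if len(body) < 2 or body[0] != WNM_CATEGORY:
        return None
    try:
        return BtmAction(body[1])
    except ValueError:
        return None  # some other WNM action


print(classify_action_frame(bytes([10, 7, 1])).name)  # REQUEST
```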

To see 802.11v in action, you need a client that supports it. Apple devices running iOS 7 or later support 11v. Windows 10 encourages 11v, but you will need a wireless adapter and driver that supports it. As always, make sure you have the latest drivers; sometimes the drivers that ship with your device don't support the latest features. To see if a client truly supports 802.11v Transition Management, you can capture an association request frame from a client. If you want to learn how to do this with a Cisco WLC and lightweight APs, see this article. Here's an association request from my Windows 10 laptop with an Intel 8260 adapter. The information element (IE) is under tagged parameters/Extended Capabilities.

Figure 1
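The support flag lives in the Extended Capabilities element (element ID 127). BSS Transition is bit 19 of that field, meaning octet 2, bit 3, with bits counted from the least significant bit. The sketch below walks the tagged parameters of an association request and checks that bit. It assumes you have already stripped the fixed fields that come before the tagged parameters. The sample bytes are made up.

```python
from typing import Iterator, Tuple

EXTENDED_CAPABILITIES_ID = 127
BSS_TRANSITION_BIT = 19  # Extended Capabilities bit 19 = BSS Transition


def iter_elements(tagged: bytes) -> Iterator[Tuple[int, bytes]]:
    # Walk the ID/Length/Value list that Wireshark shows as "Tagged parameters".
    pos = 0
    while pos + 2 <= len(tagged):
        element_id, length = tagged[pos], tagged[pos + 1]
        yield element_id, tagged[pos + 2:pos + 2 + length]
        pos += 2 + length


def supports_bss_transition(tagged: bytes) -> bool:
    byte_index, bit = divmod(BSS_TRANSITION_BIT, 8)  # octet 2, bit 3
    for element_id, value in iter_elements(tagged):
        if element_id == EXTENDED_CAPABILITIES_ID:
            return len(value) > byte_index and bool(value[byte_index] & (1 << bit))
    return False  # no Extended Capabilities element at all


# Made-up tagged parameters: an SSID element ("Test") plus Extended Capabilities with bit 19 set.
tagged = bytes([0, 4]) + b"Test" + bytes([127, 8, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x40])
print(supports_bss_transition(tagged))  # True
```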
You will also need to enable 11v support on your wireless network. On Cisco wireless networks, this is set at the WLAN level in the Advanced tab.

Figure 2
The Disassociation Imminent option sets a flag in the 11v request telling the client that it needs to roam, or it will be disassociated after a certain amount of time. In this dialog, "TBTT" (Target Beacon Transmission Time) is used as a unit equal to one beacon interval.
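Because a timer expressed in TBTTs counts beacon intervals, converting it to wall-clock time requires the beacon interval in Time Units (1 TU = 1024 microseconds). The sketch assumes a beacon interval of 100 TU, which is a common default. It does not try to reproduce how the WLC encodes the values entered in this dialog.

```python
TU_MICROSECONDS = 1024  # one 802.11 Time Unit (TU) is 1024 microseconds


def tbtt_to_ms(tbtt_count: int, beacon_interval_tu: int = 100) -> float:
    """Time spanned by tbtt_count beacon intervals, in milliseconds."""
    return tbtt_count * beacon_interval_tu * TU_MICROSECONDS / 1000


print(tbtt_to_ms(1))   # 102.4 with the assumed 100 TU beacon interval
print(tbtt_to_ms(10))  # 1024.0
```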

To save my fingers from typing "BSS Transition Management," I'm going to abbreviate it as "BTM." No matter how hard I tried, I could not get my Windows 10 laptop to send a BTM query. I also tested roaming back and forth between two APs, and never once did I see a BTM request from the DS. Something was missing in the configuration of my Cisco wireless network. It turns out that there are only two conditions under which a Cisco AP will send a BTM request: if you have aggressive load balancing enabled on the WLAN, or if you have optimized roaming enabled at the radio level.

Aggressive load balancing allows an administrator to attempt to balance the number of clients associated to APs in an area. If more than one AP can hear the probe response ACK from a client, the controller will try to steer the client to a less-loaded AP when it attempts to associate. Aggressive load balancing combined with 11v BTM will allow the client to associate to an AP, after which the AP will send a BTM request, asking the client to move to a less loaded AP.

Figure 3
Without 11v, aggressive load balancing will deny association to the more loaded AP until the client reaches a defined number of attempts, after which it will allow the association.
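The non-11v behavior can be modeled as a per-client denial counter. This is a hypothetical sketch, not Cisco's algorithm. The `window` and `max_denials` parameters and the `LoadBalancer` class are illustrative, and the real WLC logic and defaults may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LoadBalancer:
    window: int       # hypothetical: extra clients tolerated over the least-loaded neighbor
    max_denials: int  # hypothetical: denials before the association is allowed anyway
    denials: Dict[str, int] = field(default_factory=dict)

    def should_deny(self, client_mac: str, ap_load: int, least_loaded_neighbor: int) -> bool:
        if ap_load - least_loaded_neighbor <= self.window:
            return False  # this AP is not overloaded relative to its neighbors
        count = self.denials.get(client_mac, 0)
        if count >= self.max_denials:
            self.denials.pop(client_mac, None)
            return False  # the client kept trying, so let it associate
        self.denials[client_mac] = count + 1
        return True


# Loosely mirrors the scenario below: two clients on AP1, none on AP2.
lb = LoadBalancer(window=1, max_denials=3)
attempts = [lb.should_deny("client-3", ap_load=2, least_loaded_neighbor=0) for _ in range(4)]
print(attempts)  # [True, True, True, False]
```

With 11v, the same overload check would instead allow the association and then trigger a BTM request pointing at the less loaded AP.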

I set up the scenario in Figure 3 with my Cisco wireless network. AP1 was using channel 64,60 and AP2 was using channel 36,40. I also had two more APs in sniffer mode watching the same channels. Aggressive load balancing was enabled for the WLAN. Two devices were already connected to AP1 when I tried to connect a third device that supported 11v.
Figure 4

You can see in Figure 4 that immediately after associating, the client is sent a BTM request, because the AP already has two other clients connected to it, and nearby AP2 has none. The details of the BTM request are shown below. A reference on the details of BTM request frames can be found at Allen Huotari's excellent Cisco Blog post.

Figure 5
The latest version of Wireshark does not completely decode the BTM request frame, but I have indicated the important parts of the candidate list. Underlined in red is the BSSID of the AP the client is being requested to roam to, and highlighted in yellow is the channel number of the target AP. 0x24 translates to 36 in decimal, so we are seeing the expected channel of AP2. Note that the disassociation imminent flag is set, telling the client it needs to roam or it will be disassociated. Take a look at the disassociation timer field, which is set to 1953. This correlates to the "Disassociation Time" in Figure 2.
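If your Wireshark build doesn't decode the candidate list either, you can parse it by hand. Each candidate is a Neighbor Report element (ID 52): a 6-byte BSSID, 4 bytes of BSSID Information, an Operating Class byte, and then the Channel Number byte. The fixed fields before the list are the dialog token, the request mode bits (bit 2 is Disassociation Imminent), a 2-byte little-endian disassociation timer in TBTTs, and the validity interval. The sketch below assumes the body starts at the Category field. The test frame is constructed by hand, not the captured one. Its BSSID is made up, and it reuses the timer value from the capture.

```python
import struct
from typing import Any, Dict, List, Tuple

NEIGHBOR_REPORT_ID = 52


def parse_btm_request(body: bytes) -> Dict[str, Any]:
    category, action, dialog_token, mode = body[0], body[1], body[2], body[3]
    if (category, action) != (10, 7):
        raise ValueError("not a BTM request")
    (disassoc_timer,) = struct.unpack_from("<H", body, 4)  # little-endian, in TBTTs
    validity = body[6]
    pos = 7
    if mode & 0x08:  # BSS Termination Included: skip the BSS Termination Duration element
        pos += 2 + body[pos + 1]
    if mode & 0x10:  # ESS Disassociation Imminent: skip the Session Information URL
        pos += 1 + body[pos]
    candidates: List[Tuple[str, int]] = []
    while pos + 2 <= len(body):
        eid, length = body[pos], body[pos + 1]
        data = body[pos + 2:pos + 2 + length]
        if eid == NEIGHBOR_REPORT_ID and len(data) >= 13:
            bssid = ":".join(f"{b:02x}" for b in data[0:6])
            channel = data[11]  # after BSSID (6), BSSID Information (4), Operating Class (1)
            candidates.append((bssid, channel))
        pos += 2 + length
    return {
        "dialog_token": dialog_token,
        "disassociation_imminent": bool(mode & 0x04),
        "disassociation_timer_tbtt": disassoc_timer,
        "validity_interval": validity,
        "candidates": candidates,
    }


# Hand-built frame: mode 0x05 = candidate list included + disassociation imminent.
frame = (bytes([10, 7, 1, 0x05]) + struct.pack("<H", 1953) + bytes([255])
         + bytes([52, 13]) + bytes.fromhex("001122334455") + bytes(4) + bytes([0, 0x24, 0]))
info = parse_btm_request(frame)
print(info["candidates"], info["disassociation_timer_tbtt"])  # [('00:11:22:33:44:55', 36)] 1953
```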

The client responds with a BTM response frame, shown below.

Figure 6
The key items to look for here are that the "BSS Transition Target BSS" matches the BSSID given in the BTM request, and that the values for the Dialog token field match.
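The two checks above can be sketched in code. A BTM response body carries the category (10), the action (8), the dialog token, a status code, and a BSS Termination Delay byte. The Target BSSID follows when the status is 0 (accept). The function names and sample values below are hypothetical.

```python
from typing import Optional, Tuple


def parse_btm_response(body: bytes) -> Tuple[int, int, Optional[str]]:
    if len(body) < 5 or body[0] != 10 or body[1] != 8:
        raise ValueError("not a BTM response")
    dialog_token, status = body[2], body[3]
    target = None
    if status == 0 and len(body) >= 11:  # Target BSSID is present when the client accepts
        target = ":".join(f"{b:02x}" for b in body[5:11])
    return dialog_token, status, target


def response_matches_request(req_token: int, req_bssid: str, resp_body: bytes) -> bool:
    token, status, target = parse_btm_response(resp_body)
    return token == req_token and status == 0 and target == req_bssid


response = bytes([10, 8, 1, 0, 0]) + bytes.fromhex("001122334455")
print(response_matches_request(1, "00:11:22:33:44:55", response))  # True
```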

The other scenario in which a Cisco wireless DS will send a BTM request is when optimized roaming is enabled, and the AP detects that the connection between the AP and client is not ideal and thinks that asking the client to roam to another AP would provide better connectivity. Optimized Roaming is configured globally for the entire radio (802.11a or 802.11b), under Wireless->Advanced.

Figure 7
Optimized Roaming can assess the health of a client's connection by using RSSI, data rate, or both. By default, only RSSI is considered. For Optimized Roaming to work, Coverage Hole Detection must be enabled for the same radio band. It is under the Coverage Hole Detection section where you can set the RSSI values that will trigger an Optimized Roaming condition.

Figure 8
Optimized Roaming does not depend on 11v. Without 11v support, Optimized Roaming will simply disassociate a client in the hopes that it will re-associate to a better AP. With 11v enabled, the Cisco AP will send a BTM request to the client, telling it where to go. This should result in a better roaming experience for the client, since it will not have to scan the channel for the next AP.

To see this in action, you can enable client and 11v events debugging on the WLC. Here is what I saw when my client triggered an Optimized Roaming event:

*apfMsConnTask_5: Nov 07 11:25:49.305: e4:b3:18:67:54:d0 Optimized Roaming : Client RSSI(-77) is lower than the association RSSI threshold(-74), reject the association request
*apfMsConnTask_5: Nov 07 11:25:49.305: e4:b3:18:67:54:d0 Client is triggering BSS Transition
*apf80211vTask: Nov 07 11:25:49.306: e4:b3:18:67:54:d0 apf80211vSendPacketToMs: 802.11v Action Frame sent successfully to wlc
*apf80211vTask: Nov 07 11:25:49.306: e4:b3:18:67:54:d0 Setting Session Timeout to 4 sec - starting session timer for the mobile
*apf80211vTask: Nov 07 11:25:49.306: e4:b3:18:67:54:d0 Setting Session Timeout to 40 sec - starting session timer for the mobile 
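If you're collecting a lot of these debugs, a regex can pull the client MAC, RSSI, and threshold out of the Optimized Roaming line. This sketch assumes the line format is exactly as shown in the output above. Other WLC software versions may word the message differently.

```python
import re
from typing import Optional, Tuple

OPT_ROAM_RE = re.compile(
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) Optimized Roaming : "
    r"Client RSSI\((?P<rssi>-?\d+)\) is lower than the association RSSI threshold\((?P<threshold>-?\d+)\)"
)


def parse_optimized_roaming(line: str) -> Optional[Tuple[str, int, int]]:
    match = OPT_ROAM_RE.search(line)
    if match is None:
        return None  # not an Optimized Roaming RSSI line
    return match.group("mac"), int(match.group("rssi")), int(match.group("threshold"))


line = ("*apfMsConnTask_5: Nov 07 11:25:49.305: e4:b3:18:67:54:d0 Optimized Roaming : "
        "Client RSSI(-77) is lower than the association RSSI threshold(-74), reject the association request")
print(parse_optimized_roaming(line))  # ('e4:b3:18:67:54:d0', -77, -74)
```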

On the air, the Optimized Roaming BTM request looks similar to the aggressive load balancing one shown in figure 5. In my scenario, my client was connected to AP2, and a BTM request was sent to suggest I move to AP1.

Figure 9
Again, the target BSSID is underlined in red and the channel is highlighted in yellow. I'll leave it up to the reader to confirm that the BSSID matches what you would expect from looking at figure 4. A key difference between this frame and the one shown in figure 5 is the value of the disassociation timer. It's close to 400, and is an order of magnitude smaller. This correlates with the "Optimized Roaming disassociation time" value in Figure 2.

Overall, the BSS Transition Management feature is a good way to gracefully tell clients that they should roam, and where they should roam to. I see this being an important feature in wireless VoIP networks. A key takeaway from this blog is that in order to see any benefit from 11v features in Cisco wireless networks, you have to use either aggressive load balancing or Optimized Roaming. I don't really recommend aggressive load balancing, and together with Optimized Roaming it could cause some unwanted problems. Remember that Optimized Roaming requires Coverage Hole Detection.

That's it for now. I hope to add more entries in the future. Stay tuned!