Friday, September 29, 2017

Using SQL Queries to Analyze AP Neighbor Information

There's been debate on the state of 2.4 GHz Wi-Fi. Some say 2.4 GHz is deceased, that it has kicked the bucket, shuffled off its mortal coil, joined the choir invisible. Others say it's not quite dead yet. I'm in the latter group, primarily because I have devices and applications that rely on it.

One thing most Wi-Fi engineers will agree on: if you have a high-density 5 GHz network of dual-radio APs, you will need to turn off some of the 2.4 GHz radios. There are only 3 non-overlapping channels available, and since 2.4 GHz propagates better than 5 GHz, leaving all radios enabled will result in large amounts of co-channel interference. And that's bad.

*Heavy Sigh*
So how do you decide which radios to turn off?

If you have a Cisco wireless network with lightweight APs, you can leverage RRM to help. RRM uses Neighbor Discovery Packets (NDP) to measure the RF "distance" between radios. The end result is two sets of tables: the receive neighbor table and the transmit neighbor table. The receive neighbor table contains a list, for each radio, of which other radios it can hear NDP packets from. The list contains the channel the neighbor was heard on and the signal strength at which it was last heard. The transmit neighbor table contains a list, again for each radio, of how well other radios can hear it. The transmit neighbor table also contains channel and power information for each neighbor.

With this information, you could set a criterion for disabling a 2.4 GHz radio. If a radio has more than X receive neighbors on the same channel as it does, with a power greater than Y, that radio should be disabled. Another criterion could be that if a radio has more than A transmit neighbors on the same channel as it does, with a power greater than B, that radio should be disabled.

Getting the information to determine this is not easy through the CLI or web interfaces. You could do it, but it would be very time-consuming and tedious. Thankfully, there is WLCCA. WLCCA is a GUI tool that can read the output of the "show run-config" command and parse it for very valuable information. One of the things it can do is export the receive neighbor table in CSV format. Once it is in CSV format, it can be imported into a database and T-SQL can be used to get the information we want.

For WLCCA to work, you need to give it the output of "show run-config." One way of getting the output is logging the output of a CLI session to a text file and issuing the command. Another way is to establish a CLI session and use the transfer upload commands to send the output to a TFTP server. This is the preferred method, especially if you have a controller with hundreds of APs. You may need to extend the timeouts on your TFTP server past the defaults for the transfer to complete successfully.

After you have the exported run-config, open WLCCA and import the file.

You'll be prompted with the dialog below to choose certain analysis options. You can uncheck most of the options. After clicking OK, you'll be prompted to select the file.

After importing the file, WLCCA can export the neighbor table. You do this from the Report Center menu.

A file dialog will open to allow you to choose the location and name of the exported file. The name you enter will be appended with "-Nearby" and given a .csv extension. Repeat these steps and export the AP Configuration List (CSV). This file will have "-APsConfig" appended to the file name you enter.

The next step is to import the CSV files into a database. For this blog, I chose Microsoft Access. Before importing in Access, the CSV files need to be cleaned up a bit to make the process smoother.

Open the CSV file for the neighbor list with Excel or a text editor. You will need to delete the first and third lines of the text file, and change the column headings so they don't contain spaces. Below is an example of the Nearby file with the edits made.
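If you would rather script the cleanup than do it by hand, here is a minimal Python sketch of the same edits: drop the first and third lines, then strip spaces out of the column headings. The function name and the sample text are made up for illustration; a real WLCCA export will have its own headings and content, so treat this as a starting point only.

```python
import csv
import io


def clean_wlcca_csv(raw_text):
    """Drop the 1st and 3rd lines and remove spaces from the column headings."""
    lines = raw_text.splitlines()
    # Keep every line except the first and third (indexes 0 and 2).
    kept = [line for i, line in enumerate(lines) if i not in (0, 2)]
    rows = list(csv.reader(kept))
    # The first remaining row holds the column headings.
    rows[0] = [heading.replace(" ", "") for heading in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample text, not a real WLCCA export.
sample = (
    "Nearby APs report\n"
    "AP,Rx Nbr,Slot,Channel,Power\n"
    "\n"
    "AP-1,AP-2,0,1,-58\n"
)
cleaned = clean_wlcca_csv(sample)
print(cleaned)
```

The cleaned text can be written back to a file and fed to the Access import wizard.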

You'll also need to edit the APsConfig file in a similar way. This file has many columns, but for our purposes only four are necessary: the ID, AP name, 2.4 GHz channel, and 5 GHz channel. I used Excel for this. My polished data looked like this.

Now that the raw data is in an acceptable format, it can be imported into Access. Launch Access and create a new blank desktop database. Click on the External Data tab, then Text File to launch the wizard. First select the Nearby CSV file. Choose "Import the source data into a new table in the current database," and click OK. Next you'll be asked how to parse the file. Select "delimited" as shown and click Next.

Next, select the delimiter as a comma, and check the box to indicate that the first row contains field names.

Click Next, then Next again. Select the option to let Access add a primary key field, then click Next.

Give the new table a name. I use RxNbrs, which I will reference later in queries. Click Finish.

Repeat these steps to import the APsConfig file. There is one different step in the import process for the APsConfig file: you can choose the ID field as the primary key instead of asking Access to add one for you. When you finish, name the table APsConfig.

I promise, we're getting to the good stuff now. The first SQL query I will write will create the transmit neighbor table from the receive neighbor table. Click on the Create tab, then Query Design.

Don't add any tables, just click Close on the Show Table dialog. Switch to SQL view by selecting it in the upper left corner. Here's the query that will get the transmit neighbor table from the receive neighbor table:

SELECT RxNbr AS [TxAP], AP AS [TxNbr], Power, Slot, Channel
FROM RxNbrs;

The tx neighbor table is really just the inverse of the rx neighbor table! Click on the Save icon and name the query TxNbrs.
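To make the "inverse" idea concrete outside of Access, here is a small Python sketch that does the same column swap on a list of rows. The dictionary keys mirror the column names used in the queries in this post; the function name and the sample row are invented for illustration.

```python
def build_tx_neighbors(rx_rows):
    """Swap the AP and RxNbr columns to turn rx neighbor rows into tx neighbor rows."""
    return [
        {
            "TxAP": row["RxNbr"],   # the radio that was heard
            "TxNbr": row["AP"],     # the radio that heard it
            "Power": row["Power"],
            "Slot": row["Slot"],
            "Channel": row["Channel"],
        }
        for row in rx_rows
    ]


# Hypothetical row: AP-1 hears AP-2 on channel 1 at -58 dBm.
rx_rows = [
    {"AP": "AP-1", "RxNbr": "AP-2", "Power": -58, "Slot": 0, "Channel": 1},
]
tx_rows = build_tx_neighbors(rx_rows)
print(tx_rows)
```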

Now the real fun begins. Let's combine the information in the APsConfig table and the RxNbrs table to see what radios have Rx neighbors on the same channel they are configured for. Go to the Create tab again, and click Query Design. Click close on the Show Table dialog, and enter SQL view. Here is the query.

SELECT APsConfig.Name, COUNT(RxNbrs.RxNbr)
FROM APsConfig LEFT JOIN RxNbrs ON (APsConfig.[2dot4channel] = RxNbrs.Channel) AND (APsConfig.Name = RxNbrs.AP)
WHERE RxNbrs.Power > -61
GROUP BY APsConfig.Name
HAVING COUNT(RxNbrs.RxNbr) > 2
ORDER BY APsConfig.Name;

This query will produce a list of 2.4 GHz radios that hear 3 or more neighbors on their own channel at a power greater than -61 dBm. You can tune the power level and count to your liking. I chose -61 because that is the level at which, even if the neighbor was transmitting at minimum power, the radio would hear it at about -82 dBm. That works out to about 21 dB between maximum and minimum transmit power. (Remember, NDP packets are sent at the maximum power supported by the radio.) My environment has several radios that meet these criteria.
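If you don't have Access handy, the same query can be tried against toy data with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for Access, the table layouts are trimmed to the columns the query touches, and every AP name and power value below is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE APsConfig (ID INTEGER PRIMARY KEY, Name TEXT, [2dot4channel] INTEGER)"
)
conn.execute(
    "CREATE TABLE RxNbrs (AP TEXT, RxNbr TEXT, Slot INTEGER, Channel INTEGER, Power INTEGER)"
)
conn.executemany(
    "INSERT INTO APsConfig (Name, [2dot4channel]) VALUES (?, ?)",
    [("AP-1", 1), ("AP-2", 1)],
)
# Hypothetical neighbor rows: (AP, RxNbr, Slot, Channel, Power).
conn.executemany(
    "INSERT INTO RxNbrs VALUES (?, ?, ?, ?, ?)",
    [
        ("AP-1", "AP-2", 0, 1, -50),
        ("AP-1", "AP-3", 0, 1, -55),
        ("AP-1", "AP-4", 0, 1, -60),
        ("AP-1", "AP-5", 0, 6, -40),  # different channel, not counted
        ("AP-2", "AP-1", 0, 1, -50),
        ("AP-2", "AP-3", 0, 1, -70),  # too weak, not counted
    ],
)

query = """
SELECT APsConfig.Name, COUNT(RxNbrs.RxNbr)
FROM APsConfig LEFT JOIN RxNbrs
  ON (APsConfig.[2dot4channel] = RxNbrs.Channel) AND (APsConfig.Name = RxNbrs.AP)
WHERE RxNbrs.Power > -61
GROUP BY APsConfig.Name
HAVING COUNT(RxNbrs.RxNbr) > 2
ORDER BY APsConfig.Name;
"""
rx_candidates = conn.execute(query).fetchall()
print(rx_candidates)
```

With this toy data only AP-1 qualifies, because it hears three co-channel neighbors stronger than -61 dBm.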

Save this query as RxCandidates. Next, let's do the same analysis for the tx neighbors. Here is the query:

SELECT APsConfig.Name, Count(TxNbrs.TxNbr) AS [CountOfTxNbr]
FROM APsConfig LEFT JOIN TxNbrs ON TxNbrs.Channel = APsConfig.[2dot4channel] AND TxNbrs.TxNbr = APsConfig.Name
WHERE TxNbrs.slot=0 AND TxNbrs.power >-61
GROUP BY APsConfig.Name
HAVING Count(TxNbrs.TxNbr) > 2

This query is getting the list of radios that have more than 2 tx neighbors that see the radio with a power greater than -61 dBm. Again, my environment has plenty of those.

Save this query as TxCandidates. Now we have a pretty good picture of which radios can see other radios at high RSSI, and which radios can be heard by others at high RSSI. We can select radios that meet both criteria by executing the following query:

SELECT Name
FROM RxCandidates
WHERE Name IN (SELECT Name FROM TxCandidates);

Out of 480 APs in my sample deployment, 56 matched both my Rx and Tx criteria. This tells me that these 2.4 GHz radios occupy spaces that are already well covered by other radios, and can probably be disabled without affecting coverage. There are always caveats; make sure that the 2.4 GHz radio isn't necessary for RTLS or other services.
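The "meets both criteria" step is just a set intersection. If you have exported the two candidate lists, a few lines of Python do the same thing; the function name and the AP names are hypothetical.

```python
def both_candidates(rx_names, tx_names):
    """Return the names that appear in both candidate lists, in rx list order."""
    tx_set = set(tx_names)
    return [name for name in rx_names if name in tx_set]


# Made-up candidate lists for illustration.
rx_list = ["AP-1", "AP-2", "AP-7"]
tx_list = ["AP-2", "AP-7", "AP-9"]
redundant = both_candidates(rx_list, tx_list)
print(redundant)
```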

I know that the WLCCA tool is awesome, and has built-in reporting to help you find redundant radios. I just like working with data in SQL. If you are interested, try this in your own environment. If you get stuck, reach out to me on Twitter and I'll see if I can help.

Saturday, August 26, 2017

FRA and Macro/Micro Cell Operation - Part 2

Part 1 of this blog series looked at how Cisco 2800/3800 APs running in dual-5 GHz mode can steer clients from the macro cell to the micro cell using 802.11v BSS Transition Management frames. In this installment, I will look at what methods can be used if your clients don't support 802.11v.

Before going into the details of the other method (probe suppression), here is what I have observed while testing a mix of clients:
  • Both Android and iOS devices responded well to 802.11v Transition Management Requests. Sometimes the iOS device I was testing with would reject the request with a reason code 6, but most of the time it accepted the transition. 
  • If there are enough clients connected to the macro cell to warrant a transition to the micro cell, a client that does support 802.11v will be moved, even if it was in the macro cell "first." 
  • According to the latest RRM White Paper, if a client does not support 802.11v, but does support 802.11k, it can be transitioned, but not as gracefully. The client must request a neighbor report, and the returned neighbor list will be limited to the BSSID of the micro cell. The client will then be disassociated, after which it will hopefully connect to the micro cell. I was not able to replicate this; it was hard to find a client that supported 11k but not 11v. Turning off 802.11v on the WLAN resulted in no clients being transitioned at all, whether or not they supported 11k. 
Configuring probe suppression is shown below. Probe suppression can be configured to suppress only probe responses, or both probe responses and auth responses. 

(Cisco Controller) >config advanced client-steering probe-suppression enable probe-and-auth

(Cisco Controller) >show advanced client-steering

Client Steering Configuration Information

  Macro to micro transition threshold............ -55 dBm
  micro to Macro transition threshold............ -65 dBm
  micro-Macro transition minimum client count.... 1
  micro-Macro transition client balancing win.... 1
  Probe suppression mode......................... probe-and-auth
  Probe suppression validity window.............. 100 s
  Probe suppression aggregate window............. 200 ms
  Probe suppression transition aggressiveness.... 3
  Probe suppression hysteresis................... -6 dBm

The macro to micro transition threshold has a similar meaning with probe suppression as it did with 11v transition. If a new client is a transition candidate, probes received on the macro radio with an RSSI stronger than the macro to micro threshold will have their responses suppressed.

Probe suppression steering introduces four new parameters, only two of which are user configurable. The parameters perform the same function as those under Wireless -> Advanced -> Band Select, but have slightly different names.

The probe suppression aggregate window is the amount of time within which a burst of probes from a client on a single channel is considered a single probe. This is similar to the Scan Cycle Period Threshold value in Band Select. Sometimes clients will send out probes in bursts of multiple probes. Below is a Motorola G4 probing out on 5 GHz. It sends bursts of 5 probes on the same channel, just milliseconds apart. The client-steering engine will treat these 5 probes as a single probe because they all happened within 200 ms.
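Here is a rough Python sketch of how that aggregation might work. I'm assuming the window is measured from the first probe of each burst; the controller may well measure it differently. The function name and the timestamps are invented.

```python
def count_probe_bursts(timestamps_ms, aggregate_window_ms=200):
    """Count probes, treating probes within the window of a burst's first probe as one."""
    bursts = 0
    burst_start = None
    for t in sorted(timestamps_ms):
        # A probe outside the current window starts a new burst.
        if burst_start is None or t - burst_start > aggregate_window_ms:
            bursts += 1
            burst_start = t
    return bursts


# Hypothetical capture: two bursts of 5 probes, about 6 seconds apart.
probes = [0, 3, 6, 9, 12, 6000, 6004, 6008, 6011, 6015]
burst_count = count_probe_bursts(probes)
print(burst_count)
```

Ten probe frames on the air count as only two probes for steering purposes.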

Probe Bursts

The probe suppression validity window is the maximum amount of time that can elapse between probes (or bursts of probes) from a single client received on the macro radio. The default value is 100 seconds, and it acts as an age-out timer.

The validity window works with the transition aggressiveness value, which corresponds to the probe cycle count value under Band Select. The transition aggressiveness value sets a limit on the number of times probe responses from the macro radio will be suppressed. The default is 3. If a probing client was a candidate to have probe responses from the macro cell suppressed, and the client had probed out on the macro channel 3 times within 100 seconds, the fourth probe (or burst) on the macro radio would be answered. This allows clients to connect to the macro cell if they refuse to connect to the micro cell because the RSSI at the client is too low.
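The interaction between the two values can be sketched in Python. This is my reading of the behavior, not Cisco's implementation: I'm assuming the suppression count resets when the gap between bursts exceeds the validity window, and that the client keeps getting answers once the limit is reached. The function name is made up.

```python
def probe_responses(burst_times_s, aggressiveness=3, validity_window_s=100):
    """Return True (answered) or False (suppressed) for each probe burst on the macro radio."""
    answered = []
    suppressed = 0
    last_seen = None
    for t in burst_times_s:
        # Age out the client's history if it has been quiet for too long.
        if last_seen is not None and t - last_seen > validity_window_s:
            suppressed = 0
        if suppressed < aggressiveness:
            suppressed += 1
            answered.append(False)
        else:
            answered.append(True)
        last_seen = t
    return answered


# A client that probes the macro channel every 6 seconds.
steady = probe_responses([0, 6, 12, 18])
# A client that goes quiet for longer than the validity window and starts over.
aged_out = probe_responses([0, 6, 200, 206, 212, 218])
print(steady, aged_out)
```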

The probe suppression hysteresis is a user configurable value between -3 and -6 dBm, with the default being -6. When Cisco uses the word hysteresis, it refers to a dampening method to prevent clients from bouncing back and forth between radios. In the context of Client Roaming, under Wireless -> 802.11a/b, the hysteresis value tells CCX clients to move to a new AP only if the RSSI value is 3 dB better than the current AP. I stumbled across the meaning of the hysteresis in probe suppression by trying to adjust the values of the transition RSSI thresholds.

(Cisco Controller) >config advanced client-steering transition-threshold macro-to-micro -60

Value must be greater than micro to Macro RSSI - probe suppression hysteresis

(Cisco Controller) >config advanced client-steering transition-threshold micro-to-macro -60

Value must be less than Macro to micro RSSI + probe suppression hysteresis

In this case, it looks like the -6 dBm hysteresis means that probes for clients already associated to the AP would have to be 6 dB weaker/stronger to get moved to the other cell. This makes sense, as you don't want the client bouncing back and forth between the micro and macro cells because of small differences in RSSI that could just be from different client device orientations.
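Taking the two error messages literally, the rule can be written as a one-line check. This Python sketch is only my interpretation of the messages above; the function name is invented and the controller's real validation may differ.

```python
def thresholds_valid(macro_to_micro, micro_to_macro, hysteresis=-6):
    """Literal reading of the CLI errors: macro-to-micro > micro-to-macro - hysteresis."""
    # Rearranged, this is the same as micro-to-macro < macro-to-micro + hysteresis.
    return macro_to_micro > micro_to_macro - hysteresis


defaults_ok = thresholds_valid(-55, -65)       # -55 > -59
macro_change_ok = thresholds_valid(-60, -65)   # -60 > -59 is false
micro_change_ok = thresholds_valid(-55, -60)   # -55 > -54 is false
print(defaults_ok, macro_change_ok, micro_change_ok)
```

With the default -6 hysteresis, the two thresholds have to stay more than 6 dB apart, which is why both of my attempts to set a threshold to -60 were rejected.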

My testing with probe suppression for client steering was mostly subjective. Since the clients did not associate, I could not use "show client detail" to see the RSSI of the probe requests at the AP. I could definitely see probe suppression in action over the air. Below is a capture on channels 44 and 161. The macro cell was on channel 161, and you can see probes on 161 being ignored.

Probe Suppression of Macro Cell
The client connects to the micro cell on channel 44.

Other testing I conducted involved the transition aggressiveness factor. My Moto G4 cycles through the 5 GHz channels in about 6 seconds. With a transition aggressiveness factor of three, the first three passes are suppressed and the fourth is answered, so it should take about 24 seconds (4 x 6) to see probe responses from the macro cell. My observations lined up with this prediction within a few seconds.

Overall, I didn't find the probe suppression method of client steering to be as predictable as the 11v method, but it did work satisfactorily. Given that most clients now support 11v, I would prefer using that method over probe suppression.

Sunday, August 13, 2017

FRA and Macro/Micro Cell Operation - Part 1

NOTE: This blog is not about the merits, performance, or lack thereof with dual-5 GHz radios. This is a blog about the operation of dual-cell APs, specifically how the AP transitions clients between the macro and micro cells.

Cisco 2800/3800 APs support Flexible Radio Assignment, which allows the 2.4 GHz radio to flip to either a monitor or another client-serving 5 GHz radio. When the AP is operating in this dual-5 GHz mode, the normal radio (slot 1) powers the macro cell, and the flexible radio (slot 0) powers the micro cell. The terms macro and micro are used for two reasons. When the flexible radio is put into 5 GHz mode, either automatically through the FRA algorithm or manually, the radio switches to a more directional antenna than the normal 2.4 GHz antenna. See the antenna radiation patterns of the 3800 from the AP2800/3800 Deployment Guide:

AP3800 Antenna Patterns
The second reason is the reduction in power on the flexible radio when operating in dual-5 GHz mode. The flexible radio is locked into transmitting at the lowest power supported, which is usually 2 dBm. The reduction in power, along with a mandatory separation of at least 100 MHz between the micro and macro radios, is to reduce the near-field effects of the two radios interfering with one another.

Looking at the elevation pattern (right), you can see that the macro radio has a "dead spot" directly below the AP. The micro radio (blue line) has a 15 dB advantage over the macro radio in this dead spot. Unfortunately, because of the power limit on the micro radio, most clients will still perceive the macro radio as having a higher signal strength. This is even more true when not directly under the AP.

In order to take advantage of the micro cell, the AP/Controller has to have a way to nudge clients that connect to the macro cell over to the micro cell. This mechanism is called client steering, and the default method to steer clients is the 802.11v BSS Transition Request.

To see how client steering works with flexible radios, the following settings must be made on the controller:
  • Flexible Radio Assignment must be enabled globally under Wireless -> Advanced. 
  • The flexible radio in the AP2800/3800 can either be set to Auto or client serving. When in auto mode, the FRA algorithm determines if it is better to leave the radio in 2.4 GHz client serving mode, monitor mode, or 5 GHz client serving mode. For my testing I manually configured the flexible radio as 5 GHz client serving. 
  • BSS Transition Management must be enabled for the WLAN that the clients will connect to. 
Unlike normal BSS Transition Management between distinct APs, Optimized Roaming is not required for 11v frames to be used to transition clients between the macro and micro cells. Radio power and channel settings can be left to auto. The power will not be adjustable on the micro radio, even if it is set to manual. For ease of testing, I removed DFS channels from the 802.11a channel plan. This is how my AP3800 looked:

AP Name                          Channel    TxPower       Allowed Power Levels    
-------------------------------- ---------- ------------- ------------------------
FRA-AP                           44*        *8/8 ( 2 dBm) [22/19/16/13/10/7/4/2]
FRA-AP                           149*       *2/7 (16 dBm) [19/16/13/10/7/4/2/0]

It's also helpful to see the BSSID values that are assigned to the macro and micro cells, to be able to confirm values in the BSS Transition Management requests.

(Cisco Controller) >show ap wlan 802.11a FRA-AP

Site Name........................................ TestGroup
Site Description................................. 

WLAN ID          Interface          BSSID                            
-------         -----------        --------------------------       
14              xxxxxxxxxx           58:ac:78:xx:xx:3f  
(Cisco Controller) >show ap wlan 802.11-abgn FRA-AP

Site Name........................................ TestGroup
Site Description................................. 

WLAN ID          Interface          BSSID                         
-------         -----------        --------------------------     
14              xxxxxxxxxx           58:ac:78:xx:xx:30            

Note the difference in the last octet of the BSSIDs between the macro and micro cells. I will reference this later.

The command to list the client steering parameters is shown below, along with the default values. An explanation of each parameter follows the output:

(Cisco Controller) >show advanced client-steering

Client Steering Configuration Information

  Macro to micro transition threshold............ -55 dBm
  micro to Macro transition threshold............ -65 dBm
  micro-Macro transition minimum client count.... 3
  micro-Macro transition client balancing win.... 3
  Probe suppression mode......................... disabled
  Probe suppression validity window.............. 100 s
  Probe suppression aggregate window............. 200 ms
  Probe suppression transition aggressiveness.... 3
  Probe suppression hysteresis................... -6 dBm

Macro to micro transition threshold: This is a value in RSSI above which a client can be transitioned from the macro cell to the micro cell. The default is -55 dBm. This is the RSSI at the AP. For example, if a client connects to the macro cell and its RSSI at the AP is greater than -55 dBm, it is a candidate to be transitioned to the micro cell. 

Micro to macro transition threshold: This is a value in RSSI below which a client can be transitioned from the micro cell to the macro cell. The default is -65 dBm. For example, if a client connected to the micro cell initially and had an RSSI at the AP of less than -65 dBm, it will be transitioned to the macro cell. Given the power difference between the macro and micro cells, it is rare for clients to be transitioned from the micro cell to the macro cell.

micro-Macro transition minimum client count: The minimum number of clients in either macro or micro cells that will trigger a transition for the next client connecting. The default is 3. For example, if there are 3 clients in the macro cell, the 4th client that tries to connect to the macro cell will be transitioned to the micro cell, if it meets the requirements for macro-to-micro transition threshold RSSI. 

micro-Macro transition client balancing window: This specifies the minimum difference in client count between the macro and micro cells that must exist before a client can be transitioned between cells. The default is three. Imagine a scenario where there are 5 clients in the macro cell and 3 in the micro cell. The difference in clients between the cells is 2, which is below the default balancing window value of 3. The next client that connects to the macro cell will not be transitioned to the micro cell, even if it meets the RSSI requirements. Now there are 6 clients in the macro cell and 3 in the micro cell, and the difference in client count now meets the balancing window requirement. The next client that connects to the macro cell will be transitioned to the micro cell, IF it meets the RSSI requirement. 
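Putting the three macro-to-micro conditions together, the decision can be sketched as a small Python function. This is my reading of the parameter descriptions above, not Cisco's code: I'm assuming the client counts are taken before the new client joins, and that "meets" the balancing window means greater than or equal to. The function and argument names are invented.

```python
def eligible_for_micro(rssi_dbm, macro_clients, micro_clients,
                       rssi_threshold=-55, min_client_count=3, balancing_window=3):
    """Decide whether a client joining the macro cell is a candidate for the micro cell.

    macro_clients and micro_clients are the counts before the new client joins.
    """
    return (
        rssi_dbm > rssi_threshold
        and macro_clients >= min_client_count
        and macro_clients - micro_clients >= balancing_window
    )


# The balancing window scenario from the text, with a strong client at -50 dBm.
five_vs_three = eligible_for_micro(-50, macro_clients=5, micro_clients=3)
six_vs_three = eligible_for_micro(-50, macro_clients=6, micro_clients=3)
# Same counts, but the client is too weak at the AP.
weak_client = eligible_for_micro(-60, macro_clients=6, micro_clients=3)
print(five_vs_three, six_vs_three, weak_client)
```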

I only had two 802.11v capable clients to test with, so I changed both the transition minimum client count and transition client balancing window to 1.

(Cisco Controller) >show advanced client-steering

Client Steering Configuration Information

  Macro to micro transition threshold............ -55 dBm
  micro to Macro transition threshold............ -65 dBm
  micro-Macro transition minimum client count.... 1
  micro-Macro transition client balancing win.... 1

With these parameters, I could connect one client to the macro cell, and the second client to connect would (hopefully) get transitioned to the micro cell. I used 'debug client' and specified the MAC addresses for both clients.

I connected the first client to the SSID and confirmed through the CLI that it had connected to the macro cell. When using an AP setup in macro/micro, you will see extra lines in the 'debug client' output that contain XOR:

f8:95:c7:xx:xx:xx Association received from mobile on BSSID 58:ac:78:xx:xx:xx AP FRA-AP
f8:95:c7:xx:xx:xx Station:  F8:95:C7:xx:xx:xx  trying to join WLAN with RSSI 208. Checking for XOR roam conditions on AP:  58:AC:78:xx:xx:xx  Slot: 1
f8:95:c7:xx:xx:xx Station:  F8:95:C7:xx:xx:xx  is not eligible for XOR roam on AP  58:AC:78:xx:xx:xx 

The first client is not eligible for transition to the micro cell because it is the only client in the macro cell. Let's see what happens when the second client connects to the macro cell.

80:00:6e:xx:xx:xx Processing assoc-req station:80:00:6e:xx:xx:xx AP:58:ac:xx:xx:xx-01 ssid : xxxxxx thread:1a722e30
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  trying to join WLAN with RSSI 212. Checking for XOR roam conditions on AP:  58:AC:78:xx:xx:xx  Slot: 1
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  scheduled to transition to new BSS on AP  58:AC:78:xx:xx:xx

We see in the second line that the debug output shows an RSSI value of 212. I'm not sure how this scale of RSSI equates to a power level in dBm, but it appears to be above the threshold of -55 dBm. The third line indicates that the client will be scheduled for transition.
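One guess, and it is only a guess: the debug value could be a signed dBm reading printed as an unsigned 8-bit number. Nothing in the debug output confirms this, so treat the Python sketch below as an unverified assumption rather than the actual scale. The function name is made up.

```python
def as_signed_byte(value):
    """Reinterpret an unsigned 8-bit value (0-255) as a signed one (-128 to 127)."""
    return value - 256 if value > 127 else value


# The values seen in the debug output for the two clients joining the macro cell.
first_client = as_signed_byte(208)
second_client = as_signed_byte(212)
print(first_client, second_client)
```

If the guess is right, 212 would be -44 dBm, which is at least consistent with the client being above the -55 dBm threshold.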

Before the client is transitioned, it sends an 802.11k Neighbor Report request. The debug output is interesting.

80:00:6e:xx:xx:xx Got action frame from this client.
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  sent 802.11K neighbor request to AP  58:AC:78:xx:xx:xx 
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  sent request with RSSI (0) to XOR roam capable AP  58:AC:78:xx:xx:xx  Slot 1
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  limiting neighbors to sibling radios on AP  58:AC:78:xx:xx:xx 

Because the client-steering engine had already decided to transition this client to the micro cell, it limits the list of neighbors it will send back to the BSSIDs on the micro cell.

Note that the transition of the client is scheduled; it doesn't happen immediately. Perhaps this is to prevent flapping of clients transitioning between the micro and macro cells as clients join and leave the cell. It may also be delayed to allow clients in motion to roam to other macro cells, instead of being pulled back into the micro cell of a far away AP. My testing indicates that the amount of time that elapses between the association of the client that triggers the XOR roam and the transmission of the 802.11v BSS Transition Management Request varies. If I find more information I will update this blog.

Below we see the sequence of events as the second client is transitioned to the micro cell.

80:00:6e:xx:xx:xx apf80211vSendPacketToMs: 802.11v Action Frame sent successfully to wlc
80:00:6e:xx:xx:xx Setting Session Timeout to 4 sec - starting session timer for the mobile 
80:00:6e:xx:xx:xx Setting Session Timeout to 40 sec - starting session timer for the mobile 
80:00:6e:xx:xx:xx Got action frame from this client.
80:00:6e:xx:xx:xx Processing assoc-req station:80:00:6e:xx:xx:xx AP:58:ac:78:xx:xx:xx-00 ssid : xxxxx thread:18f453d8
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  trying to join WLAN with RSSI 217. Checking for XOR roam conditions on AP:  58:AC:78:xx:xx:xx  Slot: 0
80:00:6e:xx:xx:xx Station:  80:00:6E:xx:xx:xx  is not eligible for XOR roam on AP  58:AC:78:xx:xx:xx

Line 1 is the debug entry for sending the 11v BSS Transition Request, which is acknowledged in line 4. The client re-associates in line 5. Line 6 indicates that the client is associating to the XOR radio slot 0, which is the micro cell. In line 7 we see that the client is not eligible for transition from the micro back to the macro cell: the difference in the number of clients between the cells (0) is not greater than the transition client balancing window (1).

Over the air, we see a BSS Transition Management request that includes a candidate list. The only entry in the candidate list is the BSSID of the WLAN on the micro radio.

BSS Transition Management Request
Wireshark does not completely decode the candidate list entries, but they are in the same format as an 802.11k Neighbor Report. I highlighted the important elements: the BSSID and the channel number. The BSSID matches what we expect, and so does the channel number, 44. The client responds to the request with the following action frame.

BSS Transition Management Response

The important fields here are the BSS Transition Response Status Code and the BSS Transition Target BSS. The Status Code communicates whether the client accepts or rejects the request, and a value of 0 indicates that the client accepts it. The Transition Target BSS indicates the BSSID that the client intends to transition to. In this case, it matches the BSSID in the candidate list from the request frame.
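If you want to pull these two fields out of a capture yourself, here is a minimal Python sketch that parses the body of the action frame. It assumes the layout from the 802.11 standard as I understand it (category 10 for WNM, action 8 for the BSS Transition Management Response, then dialog token, status code, BSS termination delay, and a target BSSID when the status is 0). The function name and the sample bytes, including the BSSID, are made up.

```python
def parse_btm_response(body):
    """Parse the body of a BSS Transition Management Response action frame."""
    if len(body) < 5:
        raise ValueError("frame body too short")
    category, action, dialog_token, status, term_delay = body[:5]
    if category != 10 or action != 8:
        raise ValueError("not a BSS Transition Management Response")
    target = None
    # The target BSSID follows the fixed fields when the client accepts.
    if status == 0 and len(body) >= 11:
        target = ":".join(f"{b:02x}" for b in body[5:11])
    return {
        "dialog_token": dialog_token,
        "status": status,
        "accepted": status == 0,
        "termination_delay": term_delay,
        "target_bssid": target,
    }


# Hypothetical frame body: accepted, with a made-up target BSSID.
frame = bytes([10, 8, 1, 0, 0]) + bytes.fromhex("020000000030")
response = parse_btm_response(frame)
print(response)
```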

Now we see that the second client has been transitioned to the micro radio on slot 0.

(Cisco Controller) >show client summary 
MAC Address       AP Name           Slot Status        WLAN  Auth Protocol         Port Wired Tunnel  Role
----------------- ----------------- ---- ------------- ----- ---- ---------------- ---- ----- ------- ----------------
80:00:6e:xx:xx:xx FRA-AP             0   Associated     14   Yes   802.11n(5 GHz)   13   No    No      Local           
f8:95:c7:xx:xx:xx FRA-AP             1   Associated     14   Yes   802.11n(5 GHz)   13   No    No      Local    

That's it for Part 1 of this series on client steering between macro/micro cells on an AP3800. Next I will look at what can be done if clients do not support 11v.

Saturday, July 15, 2017

A Story About Knowledge

I worked as a lab assistant while an undergrad Physics major at a small college in Pennsylvania. My duties involved setting up experiments in lab rooms. Some of the experiments required electricity, which I was not that comfortable with at the time (read: terrified). A professor told me that if I learned the proper respect for electricity, all would be well. I managed to not electrocute myself over the next few years.

In the early 1990s I was accepted into the PhD program in the department of Physics at Rensselaer Polytechnic Institute. I attended on an assistantship, so I was again required to help set up labs. One class I worked in was taught by Wayne Roberge, a PhD in astrophysics from Harvard.

For one lecture session I had to set up a demonstration on electricity and magnetism that required a car battery for a power source. As I was setting up the equipment Dr. Roberge watched with concern. He looked at me and said, "You're being pretty cavalier with that battery." I asked what he meant by that. He said that the battery was dangerous, and I should be more careful with it.

"Why do you think it is dangerous?," I asked. "Well, it can produce 500 amps!," he replied. He may have been referring to the old saying "volts jolts, amps kill," an oversimplification of the fact that only small amounts of electrical current are required to stop a human heart.

At this point, I had the car battery on a table, with the terminals exposed. I grabbed the terminals, one in each hand. Dr. Roberge looked on in horror, as he thought I was about to die. Surprisingly to him, but not to me, nothing happened. I then explained to Dr. Roberge: the 12 volts of "push" supplied by a car battery is not enough to push current through dry human skin. Even if it could, the human body acts as a capacitor, which will not conduct direct current well.

Here's a guy with a PhD in Astrophysics from Harvard who didn't have a practical working knowledge of everyday electricity. Does that make him dumb, or does it make me smarter than him? No way.

The moral of the story: there are differing kinds of knowledge out there. Theoretical and practical are two kinds, but there are more. All have their place, and all have benefit. Because you think that, in theory, something is bad, doesn't necessarily mean that the reality is bad. When you look at what other people do and think, "that's dumb, I would never do that," consider that you may not have all the information that person had when that decision was made.



Thursday, July 6, 2017

Cisco WLC Fastlane for iOS - What it Does

If you run a Cisco wireless network, chances are you have heard about Fastlane for iOS. Introduced with 8.3 firmware, Fastlane is a set of configuration changes that tune the wireless network for iOS 10 devices. It is part of a suite of features that resulted from the Cisco and Apple partnership that includes Adaptive 802.11r, and robust 11k/v support on iOS devices. This partnership came about because of the increase in the use of mission-critical applications like Jabber and Citrix on iOS 10 devices.

The main idea of Fastlane is to allow certain apps on an iOS device to send traffic with Voice Access Category (AC) on a network with Call Admission Control without having to use TSPEC. If you have worked on Cisco wireless networks that support phones like the 7925 or 8821, you may be familiar with TSPEC, which is a method that wireless devices use to reserve bandwidth when accessing the Voice AC on a network where Admission Control Mandatory (a.k.a. CAC) is enabled. Fastlane is basically a hack that allows iOS devices to access the Voice AC in a non-standard way. In addition, the APs and upstream WLC will preserve any inner DSCP markings on IP packets from iOS devices. This ensures that CAPWAP tunnel packets between the AP and controller are marked with the appropriate DSCP values across the wired portion of the network.

Fastlane configuration is focused on tuning QoS settings for a WLAN, but there are also changes to the entire wireless network. Here is an overview of what happens when you enable Fastlane on a WLAN.
  • Sets the QoS profile for the WLAN to Platinum. Remember that the QoS profile acts as a limiter on the QoS markings upstream and downstream between the controller and the APs. To preserve Voice DSCP markings, the WLAN's QoS profile must be set to Platinum.
  • Enables AVC on the WLAN, and maps an automatically-created AVC profile to the WLAN. More on this later. 
  • Enables Admission Control Mandatory for the Voice AC on both 802.11a and 802.11b networks. Load-based CAC is selected, with maximum bandwidth for Voice set at 50%. This differs from the default value of 75%, which is the recommendation when configuring for Cisco phones. Expedited Bandwidth is also enabled. 
  • Enables QoS Map, Trust DSCP upstream, and creates a DSCP to UP exception map. These are global settings. 
  • Changes the EDCA profile on both 802.11b and 802.11a networks to a built-in profile called Fastlane. More on this later. 
Making all these changes requires disabling both bands, so it is quite disruptive. Plan accordingly. 

For all of this to work, there has to be a way for iOS devices to tell Cisco APs that, well, they are iOS devices. This is accomplished through a tagged Information Element that is included in probe and association requests. 
Apple iOS IE

Conversely, the WLAN needs to let the iOS device know that Fastlane is enabled. This is also accomplished with a tagged IE in probe and association responses, and also beacons. 

IE that tells iOS devices Fastlane is enabled
Note that this IE decodes as Aironet, and appears in beacons even if Aironet IE is disabled on the WLAN. Apparently the value for "Aironet IE data" varies, but the OUI Type is consistent. 

Now I will expand on each of the items that Fastlane configures, starting with the AVC profile. When you enable Fastlane on a WLAN, a new AVC profile called AUTOQOS-AVC-PROFILE is created and mapped to the WLAN. This profile maps DSCP settings to well-known applications used on iOS devices. The details of the AVC profile can be found here, but a picture is worth a bunch of words.

There are many more rules in this set, some of which punish apps like Netflix and BitTorrent with very low DSCP values.

Next in the list is Call Admission Control for Voice. Enabling CAC for the Voice AC will limit client access to the Voice AC to devices that support TSPEC and iOS devices that support Fastlane. Load-based CAC is enabled for Voice on both bands, which is a little suspect. Load-based CAC will reject a lot of calls on 2.4 GHz due to the normally higher channel utilization values. I recommend being careful here if you have voice devices on 2.4 GHz that support TSPEC.

A notable difference between Fastlane CAC and the recommended CAC settings for Cisco voice is the max bandwidth setting. The recommended max bandwidth percentage for Cisco voice is 75%, and Fastlane sets it at 50%. This means that each radio will be able to admit fewer TSPEC calls. It's hard to pin an exact number on how many fewer, however.

Another CAC feature that is enabled with Fastlane is Expedited Bandwidth. Expedited Bandwidth allows a radio to admit a TSPEC call from a CCXv5-capable client that indicates the call is urgent, even if there wasn't bandwidth available to admit the call. Use of this feature requires that the Call Manager be configured appropriately to mark certain calls, like 911 calls, as urgent.

Why don't we take a 5 minute break? 

Fastlane also makes changes to global QoS settings. The QoS Map setting is set to Enabled. This adds a tagged element to Association/Re-association Response frames that tells clients what UP values should be used for sending IP traffic tagged with a certain DSCP value. Below is an image which shows the configuration of the map on the WLC, with the resulting elements included in the Association Response.

QoS Mapping
In addition to the QoS map, there is an exception list, which maps certain DSCP values to UP values outside of the ranges defined in the map. Take a close look at the map above; DSCP 46, normally used for voice packets, is mapped to UP 5, which is classified for video.  The DSCP exception list fixes this, as well as adjusting other values. Below is an image showing part of the exception list, along with the tagged element that appears in the Association Response as a result.

DSCP Exception List

Trust DSCP UpStream is also enabled. This allows the AP to copy the DSCP marking of a packet sent from a wireless client to the DSCP of the CAPWAP tunneled packet towards the WLC. At the WLC, the CAPWAP header is stripped off and the original IP packet is sent on the wire, using the original DSCP marking. For a detailed discussion of Trust DSCP UpStream, I highly recommend the BRKEWN-2000 session presentation (login required) from Cisco Live Berlin 2017, from the one and only Jerome Henry.

Remember that the AVC profile above will also be manipulating DSCP markings. In the downstream direction, when a packet arrives at the AP and is queued to be sent to a client, the AVC profile is applied first, and then the manipulated DSCP value is converted to a UP value through the DSCP exception table. For example, if a Cisco Jabber audio packet arrived at the WLC with a DSCP value of 0x0, the AUTOQOS-AVC-PROFILE would rewrite the DSCP value to 46. The DSCP exception table would convert that DSCP value of 46 to a UP value of 6. In turn, the UP of 6 would become an Access Category of 3 (Voice).
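The downstream sequence above (AVC rewrite, then DSCP-to-UP with exceptions, then UP-to-AC) can be sketched in a few lines of Python. This is only an illustration, not the WLC's actual logic: the application key "cisco-jabber-audio" is a made-up name, the fallback range map is assumed to use the top three DSCP bits (which matches the DSCP 46 to UP 5 mapping noted earlier), and only the single exception mentioned in the text is included. The UP-to-AC table is the standard 802.11 mapping.

```python
# Hypothetical sketch of the downstream marking pipeline described above.
# Only the Jabber -> DSCP 46 rewrite and the DSCP 46 -> UP 6 exception come
# from the text; every other name and value is an assumption.

# AVC step: application name -> rewritten DSCP (the app key is made up).
AVC_PROFILE = {"cisco-jabber-audio": 46}

# Exception list: DSCP values that override the range-based map.
DSCP_EXCEPTIONS = {46: 6}

# Standard 802.11 mapping from User Priority to (ACI, access category name).
UP_TO_AC = {
    1: (1, "Background"), 2: (1, "Background"),
    0: (0, "Best Effort"), 3: (0, "Best Effort"),
    4: (2, "Video"), 5: (2, "Video"),
    6: (3, "Voice"), 7: (3, "Voice"),
}


def dscp_to_up(dscp):
    """Check the exception list first, then fall back to the range map."""
    if dscp in DSCP_EXCEPTIONS:
        return DSCP_EXCEPTIONS[dscp]
    return dscp >> 3  # assumed range map: top three bits of the DSCP value


def classify(app, dscp):
    """Return (dscp, up, aci, ac_name) for a downstream packet."""
    dscp = AVC_PROFILE.get(app, dscp)  # AVC rewrite is applied first
    up = dscp_to_up(dscp)
    aci, name = UP_TO_AC[up]
    return dscp, up, aci, name


print(classify("cisco-jabber-audio", 0))
```

Removing the entry from DSCP_EXCEPTIONS makes the same packet land in Video (UP 5), which is the problem the exception list exists to fix.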

The final and perhaps most impactful change Fastlane makes is setting the EDCA profile on both 802.11a and 802.11b networks to the Fastlane built-in profile. If you are unfamiliar with Wireless QoS and the role of EDCA, I recommend Andrew von Nagy's 5-part Wireless QoS series.

In short, EDCA parameters determine how long a client must wait to transmit a frame depending on Access Category. Voice AC frames should wait less for access to the wireless medium than Best Effort in order for Voice frames to have a better chance at getting transmitted. The Fastlane EDCA profile sets parameters according to the latest recommendations from the IEEE. Below is a comparison of the EDCA parameters between the default (WMM), Cisco's Voice Optimized, and Fastlane.
EDCA parameters compared
There are two things that stand out to me. First, there is a big difference between Voice Optimized and Fastlane. The Voice Optimized profile is highly biased towards voice traffic, based on the values of AIFSN and CWMIN. If you are currently using the Voice Optimized profile (recommended by Cisco if you are using Cisco wireless phones), changing to Fastlane could potentially have a negative effect on your voice applications.
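To see how AIFSN and CWmin translate into waiting time, here is a rough Python sketch. It uses the standard WMM default client parameters, not the Voice Optimized or Fastlane values from the comparison above, and assumes 5 GHz OFDM timing (16 µs SIFS, 9 µs slot). The function names are my own. It only estimates the first transmission attempt on an idle channel, ignoring retries and collisions.

```python
# Rough estimate of initial medium-access wait per access category.
# Assumptions: 5 GHz OFDM timing and the standard WMM default client
# parameters. These are NOT the Fastlane or Voice Optimized values.

SIFS_US = 16  # short interframe space, OFDM PHY
SLOT_US = 9   # slot time, OFDM PHY

# Access category -> (AIFSN, CWmin), WMM defaults
WMM_DEFAULTS = {
    "Voice": (2, 3),
    "Video": (2, 7),
    "Best Effort": (3, 15),
    "Background": (7, 15),
}


def aifs_us(aifsn):
    """Fixed wait before backoff starts: SIFS plus AIFSN slots."""
    return SIFS_US + aifsn * SLOT_US


def mean_initial_wait_us(aifsn, cwmin):
    """AIFS plus the average random backoff (uniform over 0..CWmin slots)."""
    return aifs_us(aifsn) + (cwmin / 2) * SLOT_US


for ac, (aifsn, cwmin) in WMM_DEFAULTS.items():
    print(f"{ac}: AIFS {aifs_us(aifsn)} us, "
          f"mean wait {mean_initial_wait_us(aifsn, cwmin)} us")
```

A smaller AIFSN shortens the fixed wait and a smaller CWmin shortens the average random backoff, which is why a profile with low values for Voice gives that category the advantage.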

The second difference is the change in TXOP values between the default WMM profile and Fastlane. The TXOP value specifies a limit, in intervals of 32 µs, that a client can hold access to the medium for. Why would you want to limit the amount of time, and is that necessary? It's necessary because of the possibility of large aggregated A-MPDU frames with 802.11ac clients (up to 4 MB!). Without a TXOP limit, once a client gains access to the medium it can send a single gigantic A-MPDU. If that client is at the edge of the cell, that could result in the client sending that big frame at a low data rate, consuming lots of airtime. Setting the TXOP limit requires the client to size its A-MPDU frames small enough that they can be transmitted, including the following SIFS and Block Ack, within the allotted time. This means that low data rate clients cannot monopolize airtime while sending large frames.

What impact will setting a TXOP limit for BE have on your network? In theory, TXOP limits result in smaller A-MPDUs, which means more contention for access to the medium for clients trying to transmit large amounts of data. It also means less efficiency, since there will be more airtime used for headers and block acks. In a very unscientific experiment, I transferred a 250 MB file from a laptop to a network share over a wireless network (clean channel 100, 20 MHz width, 2 spatial streams, all HT and VHT rates enabled). First I used the default WMM EDCA profile, which does not set a TXOP limit for Best Effort traffic. I then used the Fastlane EDCA profile, which specifies a TXOP limit of 79 for BE (about 2.5 ms).

With no TXOP limit, I saw A-MPDU frames with up to 60 sub-frames, with total size close to 100 KB. With TXOP limit of 79, most A-MPDUs had 30 sub-frames, with total size of only about 45 KB.
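The 2.5 ms figure, and a rough ceiling on how much data fits in it, can be checked with quick arithmetic. The sketch below is only that: the function names are my own, and the 173.3 Mbps rate is an assumption (the top VHT rate for 2 spatial streams at 20 MHz with short guard interval), since I did not record which rates the frames were actually sent at. The optional overhead argument is a placeholder for preamble, SIFS, and Block Ack time.

```python
# Back-of-the-envelope TXOP arithmetic. The PHY rate and overhead values
# passed in are assumptions for illustration, not measurements.

TXOP_UNIT_US = 32  # TXOP limits are expressed in units of 32 microseconds


def txop_limit_us(units):
    """Convert a TXOP limit value to microseconds."""
    return units * TXOP_UNIT_US


def max_bytes_in_txop(units, phy_rate_mbps, overhead_us=0.0):
    """Upper bound on bytes that fit in the TXOP at a given PHY rate.

    Mbps multiplied by microseconds gives bits; divide by 8 for bytes.
    """
    airtime_us = txop_limit_us(units) - overhead_us
    return max(0, int(airtime_us * phy_rate_mbps / 8))


print(txop_limit_us(79))             # 2528 microseconds, about 2.5 ms
print(max_bytes_in_txop(79, 173.3))  # ceiling at the assumed top rate
```

At the assumed top rate the ceiling works out to roughly 54 KB before any overhead is subtracted, so A-MPDUs in the neighborhood of 45 KB are at least plausible for a 2.5 ms limit.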

To compare efficiency, I graphed the number of Block Acks per second sent from the AP to the client. Here's what that looked like.

Restricting the A-MPDU size clearly results in more overhead in Block Acks. It's also clear from the graph that the data transfer without a TXOP limit finished a couple of seconds faster. We can also look at channel utilization by looking at the QBSS element in the beacons while the data transfer took place.

Channel Utilization Compared for TXOP Values
Without a TXOP limit, peak channel utilization is a bit higher, but takes longer to ramp up. With the TXOP limit of 79, peak utilization is lower, but steady.

My purpose with this blog was to detail all of the changes that enabling Fastlane will make to your wireless network, and what the impact of those changes might be to non-iOS devices. Hopefully the reader has enough information on Fastlane to make an informed choice on whether to enable it on a wireless network they manage.

Tuesday, May 30, 2017

Always something new to learn

"Forgive me. I am just a fledgling learning to fly" - Koro to Paikea, Whale Rider

In a recent post, I described what I thought to be odd behavior of an iPhone probing on channel 52. Channel 52 requires DFS, and a client device shouldn't probe unless it can hear an AP on the channel. I wasn't seeing anything on the channel but probes, and it was quite a mystery.

I removed the post, because I now know that wasn't what was happening. To summarize:

  • I saw probes on channels 52 and 56, but not 60 or 64. 
  • No other traffic on 52 or 56. 
  • iPhone was right next to IAP-315 I was capturing with. 
  • When I captured on channel 48, the probes were about 40 dB stronger than the probes I saw on 52. Probes on 56 were only about 1-2 dB weaker than those on 52. 
  • It wasn't just the iPhone; it was my Moto G4, and the laptop I was running Wireshark on. 
Here's a picture of the test setup:

Here's another picture related to this story:

OFDM Spectral Mask

My original post generated a lot of discussion on Twitter, with questions on iOS versions, DFS rules, and more. I researched the FCC report on the iPhone SE, looked into DFS rule changes, but couldn't find anything that would explain the behavior. Then Ben Miller suggested this:

This was the most plausible explanation of what I was seeing. The probe requests that I captured on 52 were actually transmitted on 48. The phone and AP were so close to one another that there was enough energy on the adjacent channel, 20 MHz away, to be decoded on channel 52. Looking at the spectral mask, it explains the 40 dB drop in power, and why I saw not only the iPhone, but also my laptop probe on 52.

To test further, I started capturing on 52, with the iPhone right next to the AP. I saw probe requests at -75 dBm. I left Wireshark running, and switched the capture channel from 52 to 48. I picked up the iPhone and moved it about 4 meters away from the AP. I saw probe requests at -61 dBm. Even though the phone was much farther away, the signal was received 14 dB stronger. To confirm things, I switched the capture back to channel 52 with the phone still 4 meters away. I saw no frames at all.
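For a sense of scale, the dB differences in this test can be converted to linear power ratios. This is just a small Python helper of my own, using the readings quoted above:

```python
# Convert the dB differences from the capture into linear power ratios.

def db_difference(rssi_a_dbm, rssi_b_dbm):
    """Difference in dB between two received power readings in dBm."""
    return rssi_a_dbm - rssi_b_dbm


def db_to_power_ratio(db):
    """Linear power ratio corresponding to a difference in dB."""
    return 10 ** (db / 10)


delta = db_difference(-61, -75)     # on-channel at 4 m vs. off-channel up close
print(delta)                        # 14 dB
print(db_to_power_ratio(delta))     # roughly 25 times the power
print(db_to_power_ratio(40))        # the 40 dB drop is a factor of 10,000
```

So even from 4 meters away, the on-channel probe arrived with about 25 times the power of the adjacent-channel leakage measured right next to the AP.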

The first frame is from my laptop, right next to the AP. Note the receive power. The next frames are from the iPhone, which uses a randomized MAC when probing. The phone was placed next to the AP when probes were seen on 52, and 4 meters away when seen on 48.

There's been some discussion lately about APs that have dual 5 GHz radios, and why that can be a bad thing. After what I experienced, I tend to believe it. It's also a cautionary tale on how you set up your captures.

Thank you to Ben and all who viewed the blog and commented on Twitter. Ultimately, I was wrong, but I learned a lot.


Sunday, May 28, 2017

Are iOS Devices Breaking DFS Rules?

NOTE: What I thought were probes on channel 52 were actually not transmitted on channel 52. They were transmitted on channel 48. The transmitting devices were close enough to the capturing device that the OFDM signal was strong enough off-channel to be decoded. Click here for the updated blog that explains what really happened. What follows below is my (incorrect) interpretation of what I saw.

I was looking at my Twitter feed not too long ago, and there were a few tweets from a webinar that I was not able to attend. The webinar was hosted by Ekahau, and the presentation was by the excellent Jerome Henry. The slides are available here.

One of the slides describes the channel scanning behavior of iOS clients, particularly how they scan the U-NII-2e channels 100 - 144. The slide indicated that these channels must be scanned passively: the client must dwell on the channel and listen for beacons, since DFS rules prevent it from sending probes.

The first question that came to mind when I saw the slide: what about U-NII-2, channels 52 - 64? These channels also require DFS, but were not listed on the slide. I thought that it was just an omission. I did some testing with my Motorola G4 and saw that it will not probe out on 52 - 64 unless it hears a beacon. Were iOS devices different? I had to test for myself.

I set up an Aruba IAP-315 in sniffer mode on channel 52 and captured in Wireshark. I used a display filter to see only beacons and probe requests. I took an iPhone SE running 10.3 and removed saved networks to simulate the phone being in a new environment. I turned on Wi-Fi on the phone and placed it less than a foot away from the IAP. This is what I saw:

No beacon frames, but probes from an unregistered MAC address received by the AP at -69 dBm. Keep in mind the phone is less than a foot from the AP. For comparison, I sniffed on channel 36 and saw probes from the same unregistered MAC at -30 dBm.

Next I ran a capture where I switched the channel from 52 to 56. Probe requests were seen on both channels, again with no beacons.

You can see from the time column that enough time is elapsing to see beacons. I didn't see any. I also captured on channels 60 and 64, but did not see any traffic on these channels at all.

So what is happening here? It looks like a client device is transmitting on a DFS channel without first hearing a beacon from a master AP on that channel. I don't think the phone is listening for radar like an AP would, because I see a probe on channel 52 within a second or two of turning Wi-Fi on.

Are DFS rules being broken?