Tuesday, May 30, 2017

Always something new to learn

"Forgive me. I am just a fledgling learning to fly" - Koro to Paikea, Whale Rider

In a recent post, I described what I thought to be odd behavior of an iPhone probing on channel 52. Channel 52 requires DFS, and a client device shouldn't probe unless it can hear an AP on the channel. I wasn't seeing anything on the channel but probes, and it was quite a mystery.

I removed the post, because I now know that wasn't what was happening. To summarize:

  • I saw probes on channels 52 and 56, but not 60 or 64. 
  • No other traffic on 52 or 56. 
  • iPhone was right next to the IAP-315 I was capturing with. 
  • When I captured on channel 48, the probes were about 40 dB stronger than the probes I saw on 52. Probes on 56 were only about 1-2 dB weaker than those on 52. 
  • It wasn't just the iPhone; it was my Moto G4, and the laptop I was running Wireshark on. 
Here's a picture of the test setup:

 Here's another picture related to this story

OFDM Spectral Mask

My original post generated a lot of discussion on Twitter, with questions on iOS versions, DFS rules, and more. I researched the FCC report on the iPhone SE, looked into DFS rule changes, but couldn't find anything that would explain the behavior. Then Ben Miller suggested this:

This was the most plausible explanation of what I was seeing. The probe requests that I captured on 52 were actually transmitted on 48. The phone and AP were so close to one another that there was enough energy on the adjacent channel, 20 MHz away, to be decoded on channel 52. Looking at the spectral mask, it explains the 40 dB drop in power, and why I saw not only the iPhone, but also my laptop probe on 52.
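To put rough numbers on Ben's explanation, here is a small Python sketch of the 20 MHz OFDM transmit spectral mask, as I understand it from the 802.11 standard:

- 0 dBr out to ±9 MHz
- -20 dBr at ±11 MHz
- -28 dBr at ±20 MHz
- -40 dBr at ±30 MHz and beyond

Two caveats. The linear-in-dB interpolation between those points is an assumption of this sketch. The mask is also only an upper limit on what a transmitter may emit; real radios are often cleaner than the mask.

```python
# Sketch of the 20 MHz OFDM transmit spectral mask (relative to the in-channel level).
# Points: 0 dBr to 9 MHz, -20 dBr at 11 MHz, -28 dBr at 20 MHz, -40 dBr at 30 MHz and beyond.
# Linear interpolation in dB between points is an assumption of this sketch.
MASK_POINTS = [(0.0, 0.0), (9.0, 0.0), (11.0, -20.0), (20.0, -28.0), (30.0, -40.0)]


def mask_dbr(offset_mhz: float) -> float:
    """Return the maximum allowed relative power (dBr) at an offset from the channel center."""
    f = abs(offset_mhz)
    if f >= MASK_POINTS[-1][0]:
        return MASK_POINTS[-1][1]
    for (f0, p0), (f1, p1) in zip(MASK_POINTS, MASK_POINTS[1:]):
        if f0 <= f <= f1:
            return p0 + (p1 - p0) * (f - f0) / (f1 - f0)
    raise ValueError("unreachable: every offset below 30 MHz falls in a segment")


# Channel 48 is centered at 5240 MHz and channel 52 at 5260 MHz, 20 MHz apart, so
# channel 52's 20 MHz-wide signal sits roughly 10 to 30 MHz from channel 48's center.
for offset in (10, 15, 20, 25, 30):
    print(f"{offset:>2} MHz from channel 48's center: {mask_dbr(offset):6.1f} dBr")
```

Over the part of the spectrum a channel 52 receiver actually listens to, the mask still allows energy roughly 20 to 40 dB below the channel 48 signal. That is at least consistent with the roughly 40 dB drop I saw.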

To test further, I started capturing on 52, with the iPhone right next to the AP. I saw probe requests at -75 dBm. I left Wireshark running, and switched the capture channel from 52 to 48. I picked up the iPhone and moved it about 4 meters away from the AP. I saw probe requests at -61 dBm. Even though the phone was much farther away, the signal was received 14 dB stronger. To confirm things, I switched the capture back to channel 52 with the phone still 4 meters away. I saw no frames at all.

The first frame is from my laptop, right next to the AP. Note the receive power. The next frames are from the iPhone, which uses a randomized MAC when probing. The phone was placed next to the AP when probes were seen on 52, and 4 meters away when seen on 48.
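As a sanity check on the -75 dBm vs -61 dBm result, the sketch below makes two assumptions:

- Propagation is free-space.
- "Right next to the AP" meant about 0.3 m (the earlier post said less than a foot).

Indoor propagation at 4 m is rarely free space, so treat the output as a rough estimate only.

```python
import math


def fspl_delta_db(d_near_m: float, d_far_m: float) -> float:
    """Extra free-space path loss (dB) when moving from d_near_m to d_far_m."""
    return 20 * math.log10(d_far_m / d_near_m)


# Distances: 0.3 m is an assumed value for "right next to the AP"; 4 m is from the test.
near_m, far_m = 0.3, 4.0
rssi_ch52_near = -75.0  # probes decoded on channel 52, phone next to the AP
rssi_ch48_far = -61.0   # probes decoded on channel 48, phone 4 m away

extra_loss = fspl_delta_db(near_m, far_m)
# Estimate what channel 48 would have read with the phone next to the AP.
est_ch48_near = rssi_ch48_far + extra_loss
adjacent_channel_drop = est_ch48_near - rssi_ch52_near

print(f"Extra path loss at {far_m} m: {extra_loss:.1f} dB")
print(f"Estimated channel 48 RSSI next to the AP: {est_ch48_near:.1f} dBm")
print(f"Implied drop when decoded on channel 52: {adjacent_channel_drop:.1f} dB")
```

Under those assumptions, the phone would have read about -38.5 dBm on channel 48 from right next to the AP. The copy decoded on channel 52 was therefore roughly 36 dB weaker, in the same neighborhood as the ~40 dB difference I measured directly.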

There's been some discussion lately about APs that have dual 5 GHz radios, and why that can be a bad thing. After what I experienced, I tend to believe it. It's also a cautionary tale about how you set up your captures.

Thank you to Ben and all who viewed the blog and commented on Twitter. Ultimately, I was wrong, but I learned a lot.


Sunday, May 28, 2017

Are iOS Devices Breaking DFS Rules?

NOTE: What I thought were probes on channel 52 were actually not transmitted on channel 52. They were transmitted on channel 48. The transmitting devices were close enough to the capturing device that the OFDM signal was strong enough off-channel to be decoded. Click here for the updated blog that explains what really happened. What follows below is my (incorrect) interpretation of what I saw.

I was looking at my Twitter feed not too long ago, and there were a few tweets from a webinar that I was not able to attend. The webinar was hosted by Ekahau, and the presentation was by the excellent Jerome Henry. The slides are available here.

One of the slides describes the channel scanning behavior of iOS clients, particularly how they scan the U-NII-2e channels 100 - 144. The slide indicated that these channels must be scanned passively: the client must dwell on the channel and listen for beacons, since DFS rules prevent it from sending probes.

The first question that came to mind when I saw the slide: what about U-NII-2, channels 52 - 64? These channels also require DFS, but were not listed on the slide. I thought that it was just an omission. I did some testing with my Motorola G4 and saw that it will not probe out on 52 - 64 unless it hears a beacon. Were iOS devices different? I had to test for myself.

I set up an Aruba IAP-315 in sniffer mode on channel 52 and captured in Wireshark. I used a display filter to see only beacons and probe requests. I took an iPhone SE running iOS 10.3 and removed its saved networks to simulate the phone being in a new environment. I turned on Wi-Fi on the phone and placed it less than a foot away from the IAP. This is what I saw:

No beacon frames, but probes from an unregistered MAC address received by the AP at -69 dBm. Keep in mind the phone is less than a foot from the AP. For comparison, I sniffed on channel 36 and saw probes from the same unregistered MAC at -30 dBm.
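Why would the MAC be unregistered? Randomized addresses used for probing typically have the locally administered bit (0x02 in the first octet) set, so they don't map to any vendor OUI. A minimal Python check is below; the example addresses are made up for illustration.

```python
def first_octet(mac: str) -> int:
    """Return the first octet of a MAC written as aa:bb:cc:dd:ee:ff or aa-bb-cc-dd-ee-ff."""
    return int(mac.replace("-", ":").split(":")[0], 16)


def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02 of the first octet) is set, as on randomized MACs."""
    return bool(first_octet(mac) & 0x02)


def is_multicast(mac: str) -> bool:
    """True if the I/G bit (0x01 of the first octet) is set."""
    return bool(first_octet(mac) & 0x01)


# Made-up addresses for illustration only.
for mac in ("da:a1:19:3c:7e:01", "00:11:22:33:44:55"):
    print(mac, "locally administered:", is_locally_administered(mac))
```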

Next I ran a capture where I switched the channel from 52 to 56. Probe requests were seen on both channels, again with no beacons.

You can see from the time column that enough time is elapsing to see beacons. I didn't see any. I also captured on channels 60 and 64, but did not see any traffic on these channels at all.

So what is happening here? It looks like a client device is transmitting on a DFS channel without first hearing a beacon from a master AP on that channel. I don't think the phone is listening for radar like an AP does, because I see a probe on channel 52 within a second or two of turning Wi-Fi on. An AP has to complete a 60-second Channel Availability Check before it can transmit on a DFS channel.

Are DFS rules being broken?

Friday, April 21, 2017

RRM Neighbor Timeout Factor and Channel Utilization

If you work with Cisco wireless networks, I highly recommend that you read their Radio Resource Management white paper. If you want to understand how RRM works and how to tune it for your environment, this is the document you need to read.

The foundation of all RRM operations is the neighbor table. It is a list of each radio in the system, how other radios hear it, and how it hears other radios. The neighbor table is built using Neighbor Discovery Protocol, or NDP, frames. NDP frames are sent at the maximum power and minimum data rate supported by the radio and channel. The frequency at which NDP frames are sent depends on the Neighbor Packet Frequency setting under 802.11a/b - RRM - General. The Neighbor Packet Frequency determines how often a radio goes off-channel (yes, I said off-channel) to transmit an NDP frame for each channel in the band. Which channels are used to transmit NDP frames is controlled by the Noise/Interference/Rogue/CleanAir Monitoring Channels setting, also in 802.11a/b - RRM - General.

The default setting is Country Channels, which means the radio will try to send NDP frames for every channel allowed in your defined regulatory domain for that band. (DFS channels are special; see the RRM white paper for more details.)

Like all frames sent on a wireless medium, NDP frames must follow the rules of the DCF. The medium must be idle in order to transmit an NDP frame on the specific channel it is going to be sent on. Unlike regular frames, NDP frames will not wait very long for the opportunity to transmit; remember, the radio is off-channel and can't serve clients. The radio will simply reschedule the next NDP transmission for that channel at the defined Neighbor Packet Frequency.

To compensate for short-term problems with transmitting NDP frames, RRM operation uses a Neighbor Pruning Interval value. After a neighbor is discovered, it will stay in the Neighbor Table for a specific amount of time, even if NDP transmission fails due to high channel utilization.

Prior to 8.0, the Neighbor Pruning Interval was fixed at 1 hour. In 8.0, it is fixed at 15 minutes.

Let's look at the defaults for 7.6 code and do an exercise. The default Neighbor Packet Frequency in 7.6 is 60 seconds. In order for a radio to drop off the neighbor table, NDP transmission would need to fail 3,600 / 60 = 60 times in a row. That's a lot of chances for an NDP to get through, which results in a very stable Neighbor Table.

In 8.0 and above, things change. The default Neighbor Packet Frequency increases to 3 minutes, and the Neighbor Pruning Interval shortens to 15 minutes. This means a neighbor could drop off the table if 5 NDP transmissions in a row fail.

Why shorten the Neighbor Pruning Interval? Since the neighbor table is used for both DCA and TPC, a neighbor dropping out of the table could result in radios near it increasing their power. A shorter Neighbor Pruning Interval results in faster adjustment to the loss of an AP.

In 8.1 and above, Cisco introduced a new parameter called the Neighbor Timeout Factor, or NTF for short. The NTF allows the user to adjust the Neighbor Pruning Interval in the following way:

$$\text{Neighbor Pruning Interval} = \text{NTF} \times \text{Neighbor Packet Frequency}$$

In order for a radio to drop out of the Neighbor Table in 8.1 and above, the NDP transmission would have to fail NTF times in a row.
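The arithmetic in the last few paragraphs boils down to dividing the pruning interval by the Neighbor Packet Frequency. Here is a quick Python sketch. It assumes, as described above, that a neighbor is pruned after NTF consecutive failures, so the pruning interval works out to NTF × Neighbor Packet Frequency. The function names are my own, not anything from Cisco.

```python
def pruning_interval_s(npf_s: int, ntf: int) -> int:
    """Neighbor Pruning Interval (seconds) = NTF x Neighbor Packet Frequency."""
    return npf_s * ntf


def allowed_consecutive_failures(pruning_s: int, npf_s: int) -> int:
    """How many NDP transmissions in a row must fail before a neighbor is pruned."""
    return pruning_s // npf_s


print(allowed_consecutive_failures(3600, 60))       # 7.6 defaults -> 60
print(allowed_consecutive_failures(900, 180))       # 8.0 defaults -> 5
print(pruning_interval_s(180, 5) / 60, "minutes")   # 8.1+ with NTF 5 -> 15.0 minutes
```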

Now let's take a look at how this all ties in with channel utilization. Suppose there are two APs: AP "A" on channel 36 and AP "B" on channel 149. The two radios are close enough to one another to "hear" each other's NDP frames. Every 180 seconds, "A" goes off-channel to send an NDP frame on channel 149.

Now suppose that channel utilization on "B" is x%. It's a bit of a simplification, but this means that "A" has an x% chance to fail its NDP transmission on channel 149.

The chance that "A" will fail to transmit its NDP frame on channel 149 NTF times in a row is

$$P_{\text{fail}} = x^{\text{NTF}}$$

Let's say that x is 50% and NTF is 5. The chance that "A" would fail to transmit an NDP frame on channel 149 five times in a row would be $0.5^{5} \approx 0.03$, or 3%. Conversely, the chance that NDP transmission would succeed on at least one of the 5 attempts would be 97%, or

$$P_{\text{success}} = 1 - x^{\text{NTF}} = 1 - 0.5^{5} \approx 0.97$$

Suppose you wanted the chance of at least one NDP frame to be transmitted out of 5 attempts to be 99%. What would the channel utilization have to be under for this to happen?

Solving $1 - x^{5} \ge 0.99$ for x:

$$x \le (1 - 0.99)^{1/5} = 0.01^{1/5} \approx 0.398$$

Channel utilization on "B" would need to be under about 40% for a 99% chance that at least one of the 5 NDP transmissions succeeds. Keep in mind that channel utilization is not the only factor in NDP transmission; you also have to deal with scan defer settings for voice traffic.
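Here are the same formulas as a small Python sketch. It uses the simplified model above: each NDP attempt independently fails with probability equal to the channel utilization x. That is an assumption, not how the radio really contends for the medium.

```python
def p_all_fail(utilization: float, ntf: int) -> float:
    """Chance that NTF consecutive NDP attempts all fail: x ** NTF."""
    return utilization ** ntf


def p_at_least_one_success(utilization: float, ntf: int) -> float:
    """Chance that at least one of NTF attempts succeeds: 1 - x ** NTF."""
    return 1 - p_all_fail(utilization, ntf)


def max_utilization(target_success: float, ntf: int) -> float:
    """Highest x for which 1 - x ** NTF >= target_success, i.e. (1 - target) ** (1 / NTF)."""
    return (1 - target_success) ** (1 / ntf)


print(f"{p_all_fail(0.5, 5):.5f}")              # 0.03125
print(f"{p_at_least_one_success(0.5, 5):.5f}")  # 0.96875
print(f"{max_utilization(0.99, 5):.3f}")        # 0.398
print(f"{max_utilization(0.99, 10):.3f}")       # 0.631
```

Doubling NTF to 10 raises the utilization ceiling from about 40% to about 63% for the same 99% target. The cost is a 30-minute pruning interval with the 180-second default Neighbor Packet Frequency.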

The takeaways:

  • The longer the Neighbor Pruning Interval (the higher the NTF), the more stable RRM will be. The tradeoff is not adjusting to the loss of APs as quickly. 
  • Use the formulas above to calculate what channel utilization you need to stay under in order for NDP transmission to succeed for a given NTF. Plug that number into your trap thresholds. 
  • Dense environments with high channel utilization or voice clients will need higher NTF values. The default of 5 may not be enough. 
  • What works for 5 GHz may not work for 2.4 GHz. Consider using different NTF values for each band.

Sunday, April 16, 2017

Broadcast Key Rotation - Part 2

At the end of my last blog, I discussed what happens if a client misses the broadcast key rotation for the AP it is connected to. We know that a client that misses the key rotation will be disconnected, but how many retries are made before the client is removed?

Here are the default EAP settings for a Cisco controller-based wireless network:

The parameters EAPOL-Key Timeout and EAPOL-Key Max Retries should be the answer to the question. The default settings (a 1-second timeout and 2 retries) would mean there are three attempts at sending the broadcast key to a client, spaced 1 second apart. If a client hasn't acknowledged the new broadcast key about 3 seconds after the first attempt, it is disconnected.

I tested this by setting an AP to minimum power on channel 165 and connecting my Moto G4 to it. After connecting, I moved my phone away from the AP and watched debug output on the controller console to see if the key rotation was successful. Eventually I got far enough away, with enough attenuation between my phone and the APs, for the key rotation to fail.

First, let's have a look at the output of debug dot1x all:

The first seven lines show that the key rotation has started and that the first attempt is being made at transmitting the new key to my phone. Take note of the third line, where it states "message 5 - group." About 1 second later, the first retransmission happens, after the timeoutEvt message appears in the log. Note that at the end of that line it reads "message  = M5." M5 must mean the group key, based on line 3 in the debug. Another second goes by, and there is another timeoutEvt message. The key is transmitted one more time. Another second goes by, and at 15:04:33 the client is disconnected.
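The timeline in the debug matches a simple model:

1. One initial transmission.
2. EAPOL-Key Max Retries retransmissions, each preceded by a full EAPOL-Key Timeout.
3. A disconnect after the last timeout expires.

Here is that model as a Python sketch. It is my reading of the debug output, not Cisco's documented algorithm.

```python
def group_key_timeline(timeout_s: float, max_retries: int):
    """Return (send times, disconnect time) in seconds relative to the first attempt.

    Model: one initial send plus max_retries resends, each followed by a full timeout.
    """
    sends = [i * timeout_s for i in range(max_retries + 1)]
    disconnect = (max_retries + 1) * timeout_s
    return sends, disconnect


# Defaults: 1-second EAPOL-Key Timeout, 2 retries.
sends, disconnect = group_key_timeline(1.0, 2)
print(sends, disconnect)  # [0.0, 1.0, 2.0] 3.0
```

With the defaults, that puts the sends at 0, 1, and 2 seconds and the disconnect at 3 seconds, which lines up with the rotation starting at 15:04:30 and the client being removed at 15:04:33.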

It appears that the EAPOL-Key Timeout and EAPOL-Key Max Retries parameters do indeed control the behavior of broadcast key retransmissions. While I was logging the debug output, I was also capturing frames on channel 165 with an Aruba IAP. I fired up Wireshark, applied the filter that shows the key rotation frame (wlan.ta = wlan.sa = BSSID, wlan.ra = Moto G4), and scrolled to 15:04:30. And, there's nothing there! The key rotation frames were not seen over the air.

The debug output clearly says that the key was sent three times, so what happened? To find out, I had to go back in the capture to 15:03:48, where I saw this:

Less than a minute before the key rotation, my client sent a Null Data Packet to the AP saying that it was going into a sleep mode. I wrote a Wireshark filter to look for all Null Data Packets from my phone and what power management message it was sending. The message at 15:03:48 was the last one sent by the phone on channel 165.

Since the AP had not received an NDP from my phone by the time the broadcast key rotation started, the AP believed the client was still in a sleep state. An AP will not transmit a queued frame to an associated client if it thinks the client is asleep. It needs to know the client is awake, by receiving an NDP with the Power Management bit set to 0 (meaning "I will stay awake"). This goes for key rotation frames too. Looking further in the capture, the AP did not attempt to transmit de-authentication frames to the client either.
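The Frame Control field is two octets:

- The first octet holds the type and subtype.
- The second octet holds the flags, with To DS at 0x01 and From DS at 0x02. The Power Management bit is 0x10: 1 means the station is going to sleep, 0 means it is staying awake.

In Wireshark the same bit is exposed as wlan.fc.pwrmgt. A small Python decoder is below; the frame bytes in the example are made up. The wireshark_type_subtype helper combines type and subtype as (type << 4) | subtype. That is the encoding a filter like wlan.fc.type_subtype == 0x8 relies on.

```python
from dataclasses import dataclass


@dataclass
class FrameControl:
    ftype: int     # 0 = management, 1 = control, 2 = data
    subtype: int
    to_ds: bool
    from_ds: bool
    pwr_mgt: bool  # True = station is going to sleep, False = station stays awake

    @property
    def wireshark_type_subtype(self) -> int:
        """Type and subtype combined as (type << 4) | subtype."""
        return (self.ftype << 4) | self.subtype


def parse_frame_control(fc: bytes) -> FrameControl:
    """Decode the two Frame Control octets in the order they appear on the air."""
    first, flags = fc[0], fc[1]
    return FrameControl(
        ftype=(first >> 2) & 0x3,    # bits 2-3
        subtype=(first >> 4) & 0xF,  # bits 4-7
        to_ds=bool(flags & 0x01),
        from_ds=bool(flags & 0x02),
        pwr_mgt=bool(flags & 0x10),
    )


# Hypothetical Null Data frames from a client to its AP (To DS set).
going_to_sleep = parse_frame_control(bytes([0x48, 0x11]))
staying_awake = parse_frame_control(bytes([0x48, 0x01]))
print(hex(going_to_sleep.wireshark_type_subtype), going_to_sleep.pwr_mgt)  # 0x24 True
print(hex(staying_awake.wireshark_type_subtype), staying_awake.pwr_mgt)    # 0x24 False
```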

What's the take away here? The combination of power save measures and key rotation can result in clients being disconnected from a WLAN without knowing they have been kicked off. It's known that some clients ignore the DTIM interval in beacons, preferring to save power over receiving broadcast traffic (remember, broadcast and multicast traffic is delivered at the DTIM interval beacon, when the DTIM counter value is zero). Clients are expected to be awake at the DTIM interval beacon to receive broadcast and multicast traffic, but some clients would rather save battery power.

Personally, I recommend increasing the broadcast key rotation interval from the default of 1 hour to something longer, like 12 or 24 hours. If you have a WLAN that is not supporting voice, consider increasing the DTIM period to 3. This will allow clients that do honor the DTIM interval to conserve power, while avoiding problems with clients that don't honor it.

Friday, April 14, 2017

Broadcast Key Rotation in WPA2-Enterprise WLANs

Wireless networks using WPA2-Enterprise security with 802.1X authentication are a common sight in corporate environments. WPA2-Enterprise provides a secure way for devices to communicate over the air.

While studying for the CWSP exam, I became familiar with the mechanisms of WPA2-Enterprise authentication. After a client has provided the correct credentials, the AP (and the DS behind it) performs what is known as the four-way handshake with the client.

ACKs not shown for brevity!
The purpose of the 4-way handshake is to securely establish a pair of encryption keys. One of the two keys is the Pairwise Transient Key, or PTK for short. This key is used to encrypt/decrypt unicast traffic to/from the client. The other key is the Group Temporal Key, or GTK. This key is used to encrypt/decrypt broadcast and multicast traffic for all stations on the BSSID. Because of its purpose, the GTK is also referred to as the broadcast key.
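One detail worth knowing: the PTK itself is never sent over the air. Both sides derive it from the PMK, the two MAC addresses, and the two nonces exchanged in the handshake. They then split it into a KCK, a KEK, and the TK that actually encrypts unicast data. The KEK is what wraps the GTK when it is delivered.

Below is a Python sketch of the 802.11i PRF-384 derivation for WPA2 with CCMP (SHA-1 based). The PMK, addresses, and nonces are dummy values. It does not cover the SHA-256 based AKMs.

```python
import hashlib
import hmac


def prf(key: bytes, label: bytes, data: bytes, n_bits: int) -> bytes:
    """802.11i PRF: concatenate HMAC-SHA1(key, label || 0x00 || data || counter) blocks."""
    output = b""
    counter = 0
    while len(output) * 8 < n_bits:
        block_input = label + b"\x00" + data + bytes([counter])
        output += hmac.new(key, block_input, hashlib.sha1).digest()
        counter += 1
    return output[: n_bits // 8]


def derive_ptk(pmk: bytes, aa: bytes, spa: bytes, anonce: bytes, snonce: bytes) -> dict:
    """Derive a 384-bit CCMP PTK and split it into KCK, KEK, and TK (16 octets each)."""
    # Addresses and nonces are ordered min/max so both sides build identical input.
    data = min(aa, spa) + max(aa, spa) + min(anonce, snonce) + max(anonce, snonce)
    ptk = prf(pmk, b"Pairwise key expansion", data, 384)
    return {"KCK": ptk[0:16], "KEK": ptk[16:32], "TK": ptk[32:48]}


# Dummy inputs only: all-zero PMK, invented MACs (AA = AP, SPA = client) and nonces.
pmk = bytes(32)
aa = bytes.fromhex("001122334455")
spa = bytes.fromhex("66778899aabb")
anonce = bytes(range(32))
snonce = bytes(range(32, 64))

keys = derive_ptk(pmk, aa, spa, anonce, snonce)
for name, value in keys.items():
    print(name, value.hex())
```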

Since anyone within earshot of a wireless network can see its traffic, and since all broadcast traffic is encrypted with the same GTK, there is a possibility that an eavesdropper could collect enough broadcast traffic to guess the key. For this reason the GTK is rotated, or changed, for all stations on the BSSID periodically. The new GTK needs to be delivered securely to each station on the BSSID, which means it needs to be sent via unicast to each station, and encrypted with each station's PTK. The lifetime of the GTK is often called the broadcast key rotation interval, and it specifies how often the GTK must be changed for all stations on a BSSID that uses WPA.

For Cisco lightweight-AP based networks, the default broadcast key rotation interval is 3,600 seconds, or 1 hour. You can see the defined interval by issuing the show advanced eap command.

To see broadcast key rotation in action, it helps to shorten the interval to something manageable. I don't know about you, but I'm not waiting an hour to watch the key rotate.

The change to the broadcast key interval takes effect after the next scheduled key rotation. If there are clients connected to an AP, you could end up waiting a while. My lab environment had no clients when I made the change.

There are two ways to "watch" the key rotation: via debug or over the air. To see the broadcast key rotation via debug, use debug dot1x all enable. Keep in mind that doing this in a production environment will likely produce a lot of output to the terminal. Here is what you will see when the broadcast key is rotated.

The Easy Way

You can see that the AP sends the GTK to the client, and that the AP resets the timer for the next key rotation.

Seeing the key rotation over the air with a packet analyzer is a bit trickier. It's easy to tell when a client associates and completes the 4-way handshake, but what do you look for to see the broadcast key rotation? The key rotation does not decode in Wireshark as an EAPOL packet. The station is already authenticated and the 802.1X port is unblocked, so the new group key is sent encrypted with the client's PTK like any other unicast data frame.

The client has to be in an awake state to receive the new GTK, so I decided the best way to find the key rotation was to watch power management null-data packets around the time that I expected to see the key rotation take place. I configured my trusty Moto G4 to connect to the WPA2-Enterprise WLAN on my lab AP, and just let it sit without moving a lot of data. Taking a look at my capture, I want to see when a beacon tells my client to wake up, and what happens after that.

To see that, I need the Association ID of my client on the AP, which you can see from the Association Response when the client first connects.

No surprise, I'm the only client
So my association ID is 1. Now I'm going to look for Beacons from the SSID I'm connected to that have the DTIM count value of 0 and are telling my client to wake up.

Wakey Wakey
(Now, I know what you're thinking: If the client is asleep, how can it hear the beacon? This brings up an interesting discussion. The client is supposed to wake up for every DTIM period. The AP doesn't know for sure that the client woke up until it receives a null-data packet from the client indicating that it will stay awake.)
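Checking whether a beacon is "telling my client to wake up" means reading the TIM element. After the Element ID and Length octets it holds:

- the DTIM count
- the DTIM period
- a Bitmap Control octet, whose low bit flags buffered broadcast/multicast traffic
- a partial virtual bitmap with one bit per Association ID

The Python sketch below tests the bit for a given AID; the sample TIM bytes are invented for illustration.

```python
def parse_tim(body: bytes):
    """Split a TIM element body (the octets after Element ID 5 and Length)."""
    dtim_count, dtim_period, bitmap_control = body[0], body[1], body[2]
    partial_virtual_bitmap = body[3:]
    return dtim_count, dtim_period, bitmap_control, partial_virtual_bitmap


def group_traffic_buffered(bitmap_control: int) -> bool:
    """Bit 0 of Bitmap Control: group-addressed frames are buffered (set when DTIM count is 0)."""
    return bool(bitmap_control & 0x01)


def aid_has_buffered_traffic(bitmap_control: int, pvb: bytes, aid: int) -> bool:
    """True if the partial virtual bitmap has the bit for this Association ID set."""
    n1 = bitmap_control & 0xFE  # bits 1-7 hold N1/2, so clearing bit 0 gives N1
    octet = aid // 8
    if not n1 <= octet < n1 + len(pvb):
        return False  # octets outside the partial bitmap are implicitly zero
    return bool(pvb[octet - n1] & (1 << (aid % 8)))


# Invented TIM body: DTIM count 0, DTIM period 1, Bitmap Control 0, bitmap octet 0x02 (AID 1).
count, period, control, pvb = parse_tim(bytes([0, 1, 0x00, 0x02]))
print(count == 0, aid_has_buffered_traffic(control, pvb, aid=1))  # True True
```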

I'm up, what do you want? 

Immediately after, I see this QoS Data frame sent from the AP to my client. What was interesting about this frame was the Source Address field was the same as the Transmitter Address field. It was not a frame being delivered from an upstream source; it was coming directly from the AP.

You're going to have to trust me here.

My client was connected for a while, at least long enough to see two or three broadcast key rotations at the 120-second interval. So I made a Wireshark filter to match frames where the transmitter address is the AP, the Source Address is the AP, and the frame type is data: wlan.ta == bssid && wlan.sa == bssid && wlan.fc.type == 2 .

Well would you look at that.
The time deltas between those frames are lining up perfectly with the broadcast key rotation interval of 120 seconds.
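If you'd rather not eyeball the Time column, the check is just a list of differences between consecutive frame times. Here is a Python sketch with made-up timestamps and an arbitrary 1-second tolerance; paste in the times of your own filtered frames.

```python
def rotation_deltas(timestamps):
    """Seconds between consecutive frames."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def matches_interval(timestamps, interval_s, tolerance_s=1.0):
    """True if every gap between consecutive frames is within tolerance_s of interval_s."""
    deltas = rotation_deltas(timestamps)
    return bool(deltas) and all(abs(d - interval_s) <= tolerance_s for d in deltas)


# Made-up frame times in seconds; replace with the Time column from your filtered capture.
times = [12.4, 132.5, 252.4]
print(rotation_deltas(times))
print(matches_interval(times, interval_s=120))
```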

So, you may be asking yourself what happens if a client does not wake up from a sleep state to receive the new GTK. The answer: they are de-authenticated from the BSS. How long will the AP wait, and how many retries will happen before the client is disconnected?

Hello again
Look at the EAPOL-Key Timeout and Max Retries parameters here. You could infer that the client is given 1 second and two retries before being disconnected. But is that what really determines it?

For another blog, perhaps. But I do know for certain that broadcast key rotation intervals that are too short will cause problems for clients that enter power-save states. This is especially true for clients that will not wake up for short DTIM values, like iPhones. It's best to extend the broadcast key interval out past the default of one hour, and make sure your WLANs have DTIM values greater than 2 where applicable (not recommended for voice networks).

Friday, January 27, 2017

The Importance of Soft Skills

In wireless networking, we tend to focus on technical details. Wi-Fi is complicated, and the strength of a Wi-Fi professional should be in their expert knowledge of how Wi-Fi works.

If you are looking to break into working in Wi-Fi, there is also another important thing to brush-up on: your soft skills. Information Technology workers often get so wrapped up in the "Technology" part of their job that they forget about the most important part: people. We work primarily with, and for, people. The solutions you create and problems you fix ultimately help other people.

What if your personal physician was a brilliant M.D. from Harvard that was well respected in their field for in-depth knowledge, but who was also rude, late to appointments, and could not communicate well? Would you keep that doctor?

Soft skills are defined as "personal attributes that enable someone to interact effectively and harmoniously with other people." In other words, behave in a way that doesn't make your co-workers want to stab you. Here are some of those skills:
  • Effective oral and written communication. Be able to clearly communicate the information that you want your audience to digest. 
  • Describe technical details to non-technical people. Be able to describe why something, for technical reasons, will/won't work to people not versed in the jargon. Use analogies and metaphors to get a point across. 
  • Don't scoff at people for their lack of knowledge of something you are knowledgeable in. Making someone feel stupid is a quick way to sour your relationship with them. Conversely, don't be intimidated by people that may be knowledgeable in other fields that may question your expertise. Be confident, but not cocky. 
  • Have integrity. Do what you say you will do. 
  • Be transparent. Don't hide your reasoning for choices you make. 
  • Be a team-player. Find value in your coworkers and encourage them to learn more. 
Developing these skills takes time and effort. One sure way to develop many of these skills is to teach. Hold seminars or workshops, or teach at a community college. I taught college classes for years before I started in I.T., and even a few years after. Teaching helped me hone my soft skills. 

Be an expert in your field, but don't neglect the soft side of Information Technology. 

Wednesday, January 11, 2017

Using Cisco APs in Sniffer Mode to Measure Attenuation

My previous blog entries have relied heavily on using Cisco lightweight APs in sniffer mode for packet analysis. This entry is no different. For a primer on using lightweight APs for packet capture, click here.

I had the idea of using a lightweight AP in sniffer mode to measure the attenuation of a wall in my office. I understand that my method here is not typical, doesn't translate well to pre-installation techniques, and doesn't replace AP-on-a-stick. This blog is more of a "what's a cool thing I can do with a sniffer-mode AP and Wireshark."

Measuring attenuation through an obstacle is more involved than one may think, and I learned a few things studying the standard methods before capturing any packets. The signal source should be at least 4 meters from the obstruction, and the measuring device should be at least 1 meter from the other side of the obstruction. Using these distances, instead of something closer, means the dB loss will be more linear with distance, as opposed to inverse-square. See Nigel Bowden's excellent blog on this subject at Ekahau's website.
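One way to see why the distances matter is the free-space path loss formula:

FSPL (dB) = 20 log10(d in km) + 20 log10(f in MHz) + 32.44

Near the source, a small positioning error changes the received level a lot. Farther away, the same error barely registers. The sketch below assumes free space and a 0.5 m error; neither of those comes from Nigel's blog.

```python
import math


def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB, with distance in meters and frequency in MHz."""
    return 20 * math.log10(distance_m / 1000) + 20 * math.log10(freq_mhz) + 32.44


def positioning_error_db(distance_m: float, error_m: float, freq_mhz: float) -> float:
    """Change in free-space loss if the device ends up error_m farther away than intended."""
    return fspl_db(distance_m + error_m, freq_mhz) - fspl_db(distance_m, freq_mhz)


# Channel 36 is centered at 5180 MHz; the 0.5 m error is an arbitrary assumption.
for d in (1.0, 4.0):
    print(f"{d} m: {positioning_error_db(d, 0.5, 5180):.2f} dB")
```

A 0.5 m error costs about 3.5 dB at 1 m but only about 1 dB at 4 m. The result is the same at any frequency, because the frequency term cancels out of the difference.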

A lightweight AP that was already installed on the ceiling was used as a signal source. The AP was broadcasting two SSIDs on both 2.4 (channel 11) and 5 GHz (channel 36) bands with a beacon interval of 100ms. If you plan on trying this yourself, you should map at least 2 SSIDs to each radio; I'll explain later.

I used a sniffer mode AP on a long patch cable so I could move around. I started the packet capture, held the sniffer AP line-of-sight to the signal source, then moved a few feet to my left to put the obstruction between the sniffer and the source.

Once I had the capture, I had to filter out any packets that were not from the AP I was measuring. The easiest way I found to do this was to look for beacon frames (wlan.fc.type_subtype == 0x8), and frames received with better than -65 dBm strength (wlan_radio.signal_dbm > -65). After reviewing to make sure my filter worked, I exported the packets (File -> Export Specified Packets), making sure to select "Displayed" for the export. This step will make generating the graphs later a bit easier.

Open the capture file created by the export, and select Statistics -> I/O graph. Uncheck the box next to the "All Packets" default graph; we don't need to see it. Click the plus sign to add a new graph. I changed the name to "Channel 11". In the display filter field, enter a filter in Wireshark display filter syntax to limit what packets will be considered for the graph. I only want to see packets on channel 11, so I enter the filter wlan_radio.channel == 11. For the Y axis, change the drop-down from "Packets" to "AVG(Y Field)". In the Y-field box, enter a Wireshark display filter of the thing you want to graph. In our case, we want to see signal strength in dBm, so I put in wlan_radio.signal_dbm. (If you didn't know, when you highlight an item in the Packet Details window that Wireshark has a decoder for, it will show you relevant filter syntax in the status bar.)

Here is what the graph looks like. At left is line-of-sight, then behind the obstruction for about ten seconds. After that, I put the sniffer AP on a table and walked back to my workstation to stop the capture.

Note that the interval value is set to 1 second. This tells Wireshark to average the signal strength of all packets on channel 11 over each 1-second period. With a beacon interval of 100ms, this should give you 10 samples for each SSID mapped to the radio. This is why you want more than one SSID; it gives you more samples to average over. If the interval were set to 100ms, there would be points on the graph where no packets were received during the interval. Wireshark treats this as a value of zero, which I guess would be fine if we weren't working with negative numbers.

Repeat the process for channel 36. Click on the "Duplicate this graph" button, and change the display filter to wlan_radio.channel == 36. To make things easy to read, change the color of the line so it is distinguishable from the first one.

Eyeballing the graph, it looks like channel 11 encountered about 4 dB of loss from the obstruction, but channel 36 had a whopping 10 - 12 dB of loss.
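The I/O graph is doing a per-interval average, and "eyeballing" the loss amounts to subtracting two averages. Here is that logic as a Python sketch with invented sample values. Unlike the I/O graph, it drops empty intervals instead of treating them as 0 dBm, so a finer interval wouldn't drag the line toward zero.

```python
from collections import defaultdict
from statistics import mean


def average_per_interval(samples, interval_s=1.0):
    """Average (time_s, dbm) samples per fixed interval, keyed by interval start time.

    Intervals with no samples are omitted rather than reported as 0.
    """
    bins = defaultdict(list)
    for t, dbm in samples:
        bins[int(t // interval_s)].append(dbm)
    return {index * interval_s: mean(values) for index, values in sorted(bins.items())}


def attenuation_db(line_of_sight_dbm, obstructed_dbm):
    """Loss through the obstruction: mean line-of-sight level minus mean obstructed level."""
    return mean(line_of_sight_dbm) - mean(obstructed_dbm)


# Invented beacon RSSI samples (seconds, dBm), just to exercise the functions.
samples = [(0.1, -48), (0.2, -50), (0.6, -49), (1.1, -60), (1.4, -61), (1.8, -59)]
per_second = average_per_interval(samples)
print(per_second)
print(attenuation_db([-48, -50, -49], [-60, -61, -59]))
```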

I know it's not going to change the way Wi-Fi pros measure attenuation, but this was a fun way to visualize RF loss from obstructions using tools anyone with a Cisco lightweight infrastructure can replicate.