When Enterprise IT Meets Consumer IoT

As this is my first ever blog post, I’m going to share a story.

I ran into a very curious problem recently, one that, if I were not a wireless engineer, A) would likely not have happened in the first place and B) I would not have been able to troubleshoot and ultimately resolve.

Like many people, I have purchased numerous consumer IoT devices to automate or control various functions within my home. Unlike most people though, I am a wireless engineer, fortunate enough to have access to enterprise-grade Cisco wireless access points (APs), predominantly for testing and evaluation purposes as part of my day-to-day role.

I have one AP installed internally and one externally. These provide pervasive coverage in all rooms indoors and for my garden office, which is situated ~20m from my home.

Only on the indoor AP do I advertise a BSSID that all my IoT devices attach to, all of which, up until recently, worked harmoniously.

At the beginning of the year I purchased a Ring Video Doorbell and a Ring Chime v2. The Chime is integrated with the doorbell to provide audible alerts when a visitor to my front door presses the Ring Doorbell button.

It is a fairly simple device and, according to the datasheet, only supports 2.4GHz 802.11b/g/n PHYs and 1 spatial stream (confirmed by the WLAN Pi Profiler).

Ring Chimev2 PCAP – 9c-76-13-ce-22-5e_2.4GHz.pcap

WLAN Pi Profiler – Ring Chime v2

The initial setup of this device, using the Ring smartphone app and a QR code on the reverse of the unit, provides a very straightforward method of out-of-the-box configuration, and took less than 5 minutes to get up and running. Once associated to a wireless network it displays a solid blue status LED.

Recently, I noticed that I was no longer receiving audible alerts, and the Chime was showing as offline in the Ring cloud dashboard. A quick visual check led me to believe that the device was no longer receiving power: the blue status LED on the front of the unit, which was previously illuminated, was now extinguished.

Normal operation and what I was now experiencing are shown in the following videos.

Ring Chimev2 – Operational
Ring Chimev2 – Non-Operational

Initially I thought that it was a mains power issue. However, after testing various sockets on different ring mains in the home, using a reference device that I knew was functional, this didn’t seem to be the case. This led me to think that the Ring Chime itself was faulty. A subsequent call to Ring ensued, several troubleshooting steps were taken, and a new device was soon on its way. Perfect. It never occurred to me at the time that this was anything other than a hardware fault.

The replacement device arrived, it was powered on, and began the installation sequence, indicated by a green, flashing LED. However, it would not complete the onboarding process, nor would it attach to my IoT BSSID.

In a flurry of frustration, I retreated to my office to try and debug the issue further, using a LAB AP of the same model (Cisco 9120AXI) as the one installed internally, with a base configuration for my IoT BSSID applied manually in haste, using the same passphrase.

I could have advertised the PROD IoT BSSID on my outdoor AP, but I’m a great fan of paranoia-based security, and any network broadcast externally is running WPA3/SAE only. But that’s a conversation for another day.

Now, when power was applied to both units, the LED status indicator became active on the old and new one, cue additional confusion and anger…the onboarding process completed on the new unit, and the old one was also now visible on the Ring cloud dashboard again.

At this point I started to ask myself what was different. I made the comment earlier that I only advertise the IoT BSSID indoors, and it wasn’t until this point that I had considered that it could be a WiFi issue, because, well, nothing had changed on my WLAN configuration recently... or had it?

So, as any good wireless engineer would do, I fired up my WLAN Pi again to do an OTA (over-the-air) packet capture. This used to be a more lengthy task, as detailed by WiFi Nigel on his excellent blog, but Wireshark 4.x now includes a WiFi Remote Capture feature based on wifidump, written by Adrian Granados of Intuitibits fame.

WiFi Remote Capture in Wireshark 4.x

To my surprise, I saw frame exchanges between both Ring Chimes and the LAB AP, and all looked well: Probe Request/Probe Response, Open System Authentication, followed by Association Request/Response and a 4-Way Handshake.

Cisco AP Radio Configuration
Packet Capture – Ring Chimev2 – Authentication Success

Now we know that both Ring Chimes are indeed fully functional, and I feel stupid for asking for a replacement… but why?!

I took both devices back to where they were originally installed inside and repeated the test. For the avoidance of doubt, I pulled the power on the LAB AP beforehand. The key difference this time was that, when connected to mains power, there was no visual cue to indicate that either device was active. The packet captures told a different story though.

Ring Chimev2 – Authentication Failure

Again, we observe Probe Request/Probe Response, Open System Authentication, followed by Association Request/Response. However, immediately prior to the EAPOL 4-Way Handshake, there is a Radio Measurement Request Action frame sent by the AP. Following this, the authenticator (AP) sends EAPOL Message 1 to the supplicant several times, but eventually times out and de-authenticates the client.
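To make the difference between the two captures easier to see, each join attempt can be reduced to an ordered list of frame labels and classified. This is only a minimal sketch in Python: the labels (`auth_req`, `rm_request`, `eapol_m1` and so on) and the `classify_join` helper are hypothetical shorthand for what the capture shows, not Wireshark field names or anything taken from the Ring or Cisco software.

```python
# Hypothetical shorthand labels for the frames seen in the capture.
EXPECTED_PREAMBLE = ["auth_req", "auth_resp", "assoc_req", "assoc_resp"]


def classify_join(frames):
    """Return a short verdict for one client's join attempt.

    frames: list of frame labels in the order they were captured.
    """
    # Probe traffic can appear any number of times, so ignore it.
    seq = [f for f in frames if not f.startswith("probe")]
    if seq[:4] != EXPECTED_PREAMBLE:
        return "incomplete authentication/association"

    rest = seq[4:]
    if "eapol_m4" in rest:
        return "4-way handshake completed"

    m1_count = rest.count("eapol_m1")
    if m1_count >= 1 and "eapol_m2" not in rest:
        if "deauth" in rest:
            return f"handshake timeout after {m1_count} x M1, client deauthenticated"
        return f"no M2 seen after {m1_count} x M1"
    return "handshake stalled mid-exchange"


# The failing exchange described above: the AP retries M1, then gives up.
failing = [
    "probe_req", "probe_resp", "auth_req", "auth_resp",
    "assoc_req", "assoc_resp", "rm_request",
    "eapol_m1", "eapol_m1", "eapol_m1", "deauth",
]
print(classify_join(failing))
```

The number of M1 retries in the example list is illustrative; the real retry count depends on the controller's EAPOL timers.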

Action Frame – Radio Measurement Request

Now, it’s at this point I think back to a presentation that Peter Mackenzie did at the Open Reality – Wifi Design Day in 2021, the topic of which was The Art of Troubleshooting. If you haven’t seen this before, I highly recommend watching it, and for that matter, any content that Peter shares.

The reason I thought of that presentation was that, based on my experience, I quickly assumed that this was related to 802.11k (Art), as I’m fully aware that this protocol has the capability to both request and share neighbour information for the purpose of improving client roaming. As such, my first response was to try turning off 802.11k on the WLAN profile associated with this BSSID on my Cisco 9800 WLC and run the test again.

First we can check to verify that 802.11k is indeed enabled on this WLAN profile by referencing the probe response frame sent by the AP.

RM Information Element included in probe response
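The same check can be done programmatically on the tagged parameters of a probe response. The sketch below walks the information elements and looks for the RM Enabled Capabilities element (element ID 70 in IEEE 802.11). The helper names and the sample bytes are made up for illustration, and the capability bits themselves are left as zero placeholders rather than decoded.

```python
RM_ENABLED_CAPABILITIES = 70  # element ID for RM Enabled Capabilities


def iter_elements(tagged):
    """Yield (element_id, payload) pairs from a tagged-parameters blob."""
    offset = 0
    while offset + 2 <= len(tagged):
        element_id = tagged[offset]
        length = tagged[offset + 1]
        payload = tagged[offset + 2 : offset + 2 + length]
        if len(payload) < length:
            break  # truncated element, stop parsing
        yield element_id, payload
        offset += 2 + length


def advertises_rm(tagged):
    """True if the RM Enabled Capabilities element is present."""
    return any(eid == RM_ENABLED_CAPABILITIES for eid, _ in iter_elements(tagged))


# Illustrative blobs: an SSID element, with and without the RM element.
ssid_only = b"\x00\x03IoT"
with_rm = ssid_only + bytes([RM_ENABLED_CAPABILITIES, 5, 0, 0, 0, 0, 0])

print(advertises_rm(with_rm), advertises_rm(ssid_only))
```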

Time to turn off 802.11k by un-checking these fields within the WLAN profile.

802.11k Disabled on Cisco 9800 WLAN Profile

Verify that 802.11k is disabled.

RM Information Element missing from probe response

After verification, the test was repeated. However, the result was the same, with no change in the client behaviour.

Now for the science.

In addition to the Assisted Roaming (11k) WLAN configuration on the Cisco 9800 WLC, there are also two options under the heading 11k Beacon Radio Measurement.

GUI – 11k Radio Beacon Measurement
CLI – 11k Radio Beacon Measurement

To be honest, this isn’t something I have given much thought to in the past. I have tampered with 802.11k on many occasions, but after checking several Cisco 9800 WLCs running code between 17.3 and 17.9, these options seem to be disabled on new WLAN profiles by default.

A review of the 17.x IOS-XE configuration guides doesn’t provide any further insight either; the feature isn’t even referenced.

Time to go down the rabbit hole that is the IEEE 802.11-2020 Standard to try and get an authoritative description of what this feature does, or at least should do.

There are numerous explanations online of what this feature is used for, but I decided to take ChatGPT for a spin. End result, not bad!

An 802.11k radio measurement request is a request for information about the radio environment that is made by a WLAN device using the 802.11k protocol. The request may be made by a client device, such as a laptop or a smartphone, or by an AP or wireless controller. The request may specify the type of measurement to be performed, the duration of the measurement, and other parameters. The requesting device may also specify a reason for the measurement, such as to determine the best channel to use, to evaluate the performance of the WLAN system, or to troubleshoot a problem.

Upon receiving an 802.11k radio measurement request, the WLAN device that is being asked to perform the measurement will collect the requested information and report it back to the requesting device. The requesting device can then use the measurement results to make decisions about how to optimize its own performance or the performance of the WLAN system as a whole.

OpenAI

The Action frame format for this feature is detailed in Section 9.6.6 of the IEEE 802.11-2020 Standard.

We can verify that the Action frame seen in the capture is indeed a Radio Measurement Request by looking at both the Category and Subfield.

9.4.1.11 Action field – The Category field is set to one of the nonreserved values shown in the Code column of Table 9-51. Action frames of a given category are referred to as Action frames. For example, frames in the QoS category are called QoS Action frames. Action frames of a given category and further identified by a subfield in the Action Details field are referred to as frames. For example, frames in the QoS category with a QoS Action subfield of ADDTS Request are called ADDTS Request frames.

IEEE 802.11-2020
Radio Measurement Action Frame
Radio Measurement Request Frame
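As a rough illustration of that check, the sketch below reads the first two octets of an Action frame body. Category 5 is Radio Measurement, and within that category an action value of 0 is a Radio Measurement Request, which is followed by a Dialog Token, a two-octet Number of Repetitions and the Measurement Request elements. The function name and the sample bytes are hypothetical, and the input is assumed to start at the Category field with the MAC header already stripped.

```python
import struct

CATEGORY_RADIO_MEASUREMENT = 5
RADIO_MEASUREMENT_ACTIONS = {
    0: "Radio Measurement Request",
    1: "Radio Measurement Report",
    2: "Link Measurement Request",
    3: "Link Measurement Report",
    4: "Neighbor Report Request",
    5: "Neighbor Report Response",
}


def parse_radio_measurement_action(body):
    """Decode an Action frame body that starts at the Category field.

    Returns None when the frame is not in the Radio Measurement category.
    """
    if len(body) < 2:
        raise ValueError("Action frame body too short")
    category, action = body[0], body[1]
    if category != CATEGORY_RADIO_MEASUREMENT:
        return None

    parsed = {"action": RADIO_MEASUREMENT_ACTIONS.get(action, f"unknown ({action})")}
    if action == 0 and len(body) >= 5:
        # Dialog Token (1 octet) + Number of Repetitions (2 octets, little-endian)
        dialog_token, repetitions = struct.unpack_from("<BH", body, 2)
        parsed.update(
            dialog_token=dialog_token,
            repetitions=repetitions,
            elements=body[5:],  # raw Measurement Request elements
        )
    return parsed


# Illustrative body: category 5, action 0, token 7, no repetitions,
# followed by a minimal Measurement Request element.
sample = bytes([5, 0, 7, 0, 0]) + b"\x26\x03\x01\x00\x05"
print(parse_radio_measurement_action(sample))
```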

The next piece of the puzzle refers to the Measurement Request elements.

9.4.2.20.1 General – The Measurement Request element contains a request that the receiving STA undertake the specified measurement action. The Measurement Request element is included in Spectrum Measurement Request frames, as described in 9.6.2.2, or Radio Measurement Request frames, as described in 9.6.6.2. Measurement types 0, 1, and 2 are defined for spectrum management and are included only in Spectrum Measurement Request frames. The use of Measurement Request elements for spectrum management is described in 11.8.7. All other measurement types are included only in Radio Measurement Request frames. The use of Measurement Request elements for radio measurement is described in 11.10.

IEEE 802.11-2020
Measurement Request Mode field format

Although the information is decoded by Wireshark in plain text, by looking at the bits which are set in the Measurement Request Mode field, together with the Measurement Type field, we can determine that the AP requested that the STA (Ring Chime) make a beacon measurement.

The Enable bit (bit 1) is used to differentiate between a request to make a measurement and a request to control the measurement requests and triggered or autonomous reports generated by the destination STA. The Enable bit is further described in Table 9-97.

IEEE 802.11-2020
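For anyone who prefers to check the bits by hand rather than trust the dissector, here is a small sketch that decodes the header of a Measurement Request element (element ID 38). It assumes the usual layout of Measurement Token, Measurement Request Mode and Measurement Type, with mode bits 0 to 4 being Parallel, Enable, Request, Report and Duration Mandatory, and measurement type 5 meaning Beacon. The function names and the sample element are made up, and the Beacon request body is left as placeholder bytes rather than decoded.

```python
MEASUREMENT_REQUEST_ELEMENT = 38
MEASUREMENT_TYPE_BEACON = 5

# Bit positions in the Measurement Request Mode field (bits 5-7 reserved).
MODE_BITS = {
    0: "parallel",
    1: "enable",
    2: "request",
    3: "report",
    4: "duration_mandatory",
}


def decode_request_mode(mode):
    """Map each named bit of the Measurement Request Mode field to a bool."""
    return {name: bool((mode >> bit) & 1) for bit, name in MODE_BITS.items()}


def decode_measurement_request(element):
    """Decode the fixed header of a Measurement Request element."""
    if len(element) < 5:
        raise ValueError("element too short")
    element_id, length = element[0], element[1]
    if element_id != MEASUREMENT_REQUEST_ELEMENT or len(element) < 2 + length:
        raise ValueError("not a complete Measurement Request element")
    token, mode, measurement_type = element[2], element[3], element[4]
    flags = decode_request_mode(mode)
    return {
        "token": token,
        "mode": flags,
        # Enable = 0 means "please make this measurement".
        "is_measurement_request": not flags["enable"],
        "is_beacon_request": measurement_type == MEASUREMENT_TYPE_BEACON,
    }


# Illustrative element: token 1, mode 0x00, type 5 (Beacon), followed by
# 13 placeholder bytes standing in for the Beacon request body.
sample_element = bytes([38, 16, 1, 0x00, 5]) + bytes(13)
print(decode_measurement_request(sample_element))
```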

Ok, so now we’re beginning to understand the purpose of this Action frame, but what still isn’t clear is why the STA (Ring Chime) reacted the way it did. Section 11.10 Radio Measurement Procedures seemed to give some additional insight, specifically 11.10.5 Station Responsibility for Conducting Measurements.

11.10.5 Station responsibility for conducting measurement – A radio measurement-capable STA shall decode and interpret each Radio Measurement Request frame that it receives and shall assess the contents against its capabilities and the impact on its own performance. A measurement request may be refused by the receiving STA by sending a Radio Measurement Report frame in which the refused bit in the Measurement Report Mode field is set to 1. The reasons for refusing a measurement request are outside the scope of this standard but may include reduced quality of service, unacceptable power consumption, measurement scheduling conflicts, or other significant factors. A STA shall cancel all in-process radio measurements and shall delete all pending, unprocessed radio measurement requests upon receipt of a Disassociation frame or upon (re)association with a BSSID different from its most recent association.

IEEE 802.11-2020
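To picture what a well-behaved refusal could look like on the wire, the sketch below builds a Radio Measurement Report frame body (category 5, action 1) carrying a single Measurement Report element (element ID 39) with the Refused bit, bit 2 of the Measurement Report Mode field, set. This is an assumption-laden illustration: the helper names are hypothetical, and the element is built without a Measurement Report field on the understanding that none is carried when a request is refused. It is not something observed in my captures.

```python
CATEGORY_RADIO_MEASUREMENT = 5
ACTION_RADIO_MEASUREMENT_REPORT = 1
MEASUREMENT_REPORT_ELEMENT = 39

# Measurement Report Mode bits (bits 3-7 reserved).
LATE = 1 << 0
INCAPABLE = 1 << 1
REFUSED = 1 << 2


def build_refusal(dialog_token, measurement_token, measurement_type):
    """Build a report body that refuses a single measurement request."""
    element = bytes([
        MEASUREMENT_REPORT_ELEMENT,
        3,                  # length: token + report mode + type
        measurement_token,  # copied from the request being refused
        REFUSED,
        measurement_type,
    ])
    header = bytes([
        CATEGORY_RADIO_MEASUREMENT,
        ACTION_RADIO_MEASUREMENT_REPORT,
        dialog_token,       # copied from the request frame
    ])
    return header + element


def is_refusal(body):
    """True if the first Measurement Report element has Refused set."""
    return (
        len(body) >= 8
        and body[0] == CATEGORY_RADIO_MEASUREMENT
        and body[1] == ACTION_RADIO_MEASUREMENT_REPORT
        and body[3] == MEASUREMENT_REPORT_ELEMENT
        and bool(body[6] & REFUSED)
    )


# Refuse a Beacon request (measurement type 5) with dialog token 7.
refusal = build_refusal(dialog_token=7, measurement_token=1, measurement_type=5)
print(refusal.hex(), is_refusal(refusal))
```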

Again, still no smoking gun, but the evidence seems to lean towards a chipset and/or driver problem: the client is unable to interpret the Action frame and fails to respond with EAPOL Message 2, causing a 4-Way Handshake timeout and ultimately leading to a disassociation and de-authentication by the authenticator (AP).

I did also briefly test the following devices to see if I could re-create the issue. Each time, the Action frame was sent, and all were able to attach to the BSSID without issue and immediately pass traffic.

Other devices tested

  • iPad Pro 5th Generation
  • iPhone 13
  • Dell Latitude 5310 Laptop w/ Intel AX201 Chipset

None of these devices, even though successfully associated, responded with a Beacon Measurement Report.

11.10.9.1 Beacon report – If a STA accepts a Beacon request it shall respond with a Radio Measurement Report frame containing Beacon reports for all observed BSSs matching the BSSID and SSID in the Beacon request, at the level of detail requested in the Reporting Detail. If the Reporting Detail is 1 and the optional Request subelement is included in the Beacon request, the corresponding Beacon report shall include the list of elements listed in the Request subelement.

The RCPI in the Beacon report indicates the power level of the received Beacon, Measurement Pilot, or Probe Response frame. For repeated measurements (when the Radio Measurement Request frame contains a nonzero value for the Number of Repetitions field), the transmission of the Beacon report may be conditional on the measured RCPI or RSNI value.

IEEE 802.11-2020

For now, I can only surmise that the majority of clients choose not to implement this, although if anyone has a PCAP that shows a Beacon Measurement Report, I would love to see it, even just to believe it actually exists.

Conclusion

Beyond this explanation, I was unable to find anything concrete that would explain why the Ring Chime essentially froze and stopped responding to EAPOL messages. Once the beacon measurement settings were removed, both devices functioned perfectly.

It also seems that although this configuration references 802.11k, the two are independent of one another: even when 802.11k was explicitly disabled, Radio Measurement Requests were still sent to all STAs associating to the BSSID.

The moral of the story is that it was a change that I made to the operational configuration of the PROD AP in the first instance that caused the issue. It wasn’t audited, and happened several weeks prior to me noticing that there was an impact, which ultimately led me to believe there was a fault in the device itself. It also serves as notice that, when troubleshooting, you should build the test environment exactly as it is designed in production, else the test will yield non-deterministic results.

Whilst I wouldn’t expect most consumers to follow strict change control processes, we see this problem on a daily basis in the enterprise, and ultimately being able to track change activity allows for rapid root cause analysis (RCA) and roll back.

It’s also highly likely that a consumer-grade AP would not implement this feature, so the problem would go entirely unnoticed for the lifetime of the product. Perhaps I’ll drop Ring a link to this post and see if there is a second chapter to this story.

Thanks for reading! Any comments or opinions would be greatly appreciated.