[Wi-Fi Clinic] Sticky client | Wi-Fi Roaming

Teaching Points

In this article, you will learn:

  • Wi-Fi Roaming
    • Definition
    • How it works
    • Protocols
  • RSSI
  • Transmission Power Control
  • Minimum RSSI Control
  • Data Rate Control

Situation

  • “My smartphone does not switch to a closer access point.”
  • “My smartphone connects to 2.4 GHz rather than the faster 5 GHz.”
  • “My Wi-Fi phone call drops for a few seconds when moving from one part of my home to another.”

Scenario 1: Sticky Client

When I walk from my kitchen, where access point A is located, to the master bedroom, where access point B is, my smartphone remains connected to AP A even though AP B is physically much closer.

If I manually disconnect and reconnect in the bedroom, the client device connects properly to the bedroom AP B. However, the opposite direction works properly, i.e. walking from the bedroom to the kitchen switches appropriately from AP B to AP A.

Scenario 2: 2.4 GHz preferred over 5 GHz

I have a single SSID for both 2.4 GHz and 5 GHz so that the AP/client will automatically select the best band, i.e. Smart Connect.

When I test my speed using an internet speed test, I get only 10-20 Mbps in one area of my home. I found out that my connection is on the 2.4 GHz band when this happens. If I force a connection to the 5 GHz band by using separate SSIDs for 5 GHz and 2.4 GHz, I get 100-200 Mbps.

Scenario 3: Voice over internet protocol (VoIP) temporary disconnection

Despite having multiple access points throughout my home with good Wi-Fi signal everywhere, I lose a couple of seconds of conversation if I walk from my ground-level kitchen down to the basement office while talking on a Wi-Fi call.

All of these scenarios are real-world experiences of my own.

All three cases are related to a failure of [what we consider] proper/seamless roaming.

What’s roaming?

In Wi-Fi, roaming refers to the process in which a connected Wi-Fi client device, e.g. a tablet or smartphone, switches from one access point (AP) to another.

Why do we want roaming?

The idea is simple: you want the client device to be connected to the AP that delivers the best performance, i.e. throughput.

Does this mean if I have only one AP, there is no roaming?

Actually, the answer is no. As in the sample scenario, with a combined SSID where 2.4 GHz and 5 GHz are on the same SSID, roaming (or a roaming equivalent) between radio bands can and should happen. You want your client to be back on 5 GHz when that is available, but if you go too far from the AP, you want 2.4 GHz for its better range of coverage.

In an ideal world, though, you do not want any roaming at all. If you could have one single AP with the maximum-throughput radio, e.g. 5 GHz, covering the entire home with the fastest Wi-Fi connection, that would be ideal. It is analogous to travel: getting to your destination on a single bus or train without a transfer is the most convenient.

However, one’s home may be too big to be covered by any single AP available on the market, and the further you go from the AP, the slower the connection becomes. So in order to cover more area with faster Wi-Fi, you simply need more than one AP. If your destination is too far, you have no choice but to use more than one mass transit system.

In such a case, most users nowadays want a virtual single network, i.e. one SSID, as if multiple APs were forming a giant single AP covering the whole desired area. One of the main features required for users to feel this way is “seamless roaming”.

Seamless roaming

Seamless roaming is a concept where the switch from one AP to another happens so fast that the user does not notice or see any drop in the Wi-Fi connection during the switch.

Seamless roaming is relatively subjective and depends on the user’s application/requirements. For example, streaming Netflix while switching from one AP to another may feel seamless because part of the data is buffered on your client device.

However, the same setup may not provide a seamless experience with a real-time app, e.g. a Wi-Fi call. This is because in such an application even a couple of seconds of dropped connection is noticeable.

It is usually said that 50 ms or under is the target for seamless roaming, even for highly demanding real-time applications such as Wi-Fi calling (i.e. Voice over Internet Protocol = VoIP) (ref, ref), because the human brain generally cannot perceive an event faster than 100 ms (ref).

How does roaming occur?

This series is meant to explain Wi-Fi topics conceptually, without deep technical detail. So I will not be talking about the various steps involved in the actual roaming process; instead, let’s keep using analogies to understand the big picture.

There may be some confusion about how this process is described, especially when online forum users try to explain it to others who have just started to learn about Wi-Fi roaming. My personal description is “the roaming process is a client-driven, access point controlled/dependent process”. Let me explain what I mean by this.

If someone says it is completely client dependent and the AP has no say, I would not trust that person’s advice. They are either misunderstanding the concept entirely or compressing their deep understanding into one sentence without any depth of explanation, which confuses new users.

Wi-Fi roaming is a client-driven, access point controlled/dependent process.

In the big picture, roaming between a Wi-Fi client device and an access point can be thought of as the following general steps:

  1. APs advertise their availability.
  2. The client searches and chooses an AP from the available list.
  3. The AP accepts the request.
  4. The client and AP complete the association.

*The real roaming process consists of multiple back-and-forth steps (ref).
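
The four steps above can be pictured with a small Python sketch. It is a toy model, not the real 802.11 exchange: the `AccessPoint` class, its `accepts_clients` flag and the `roam` function are names made up for illustration, and having the client try the strongest advertised AP first is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessPoint:
    name: str
    rssi_dbm: int                  # signal strength the client sees for this AP
    accepts_clients: bool = True   # hypothetical stand-in for any AP-side policy

    def advertise(self) -> dict:
        # Step 1: the AP advertises its availability.
        return {"name": self.name, "rssi_dbm": self.rssi_dbm}

    def handle_request(self) -> bool:
        # Step 3: the AP accepts (or refuses) the client's request.
        return self.accepts_clients


def roam(aps: List[AccessPoint]) -> Optional[str]:
    """Return the name of the AP the client ends up on, or None."""
    by_name = {ap.name: ap for ap in aps}
    # Step 2: the client searches the advertisements and ranks them.
    # Assumption: strongest signal first; real clients use richer criteria.
    adverts = sorted((ap.advertise() for ap in aps),
                     key=lambda advert: advert["rssi_dbm"], reverse=True)
    for advert in adverts:
        if by_name[advert["name"]].handle_request():
            return advert["name"]  # Step 4: association completes.
    return None  # every AP refused, so the client stays disconnected
```

The client makes the choice, but an AP that refuses the request changes the outcome, which is the sense in which roaming is client driven yet AP controlled.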

Client Driven

Based on step 2, the process can be said to be client driven because the client chooses which AP to associate with.

For example, iOS devices use the following criteria to choose an AP. For Android or other devices, the criteria are different.

If multiple 5 GHz SSIDs are scored the same, iOS chooses a network based on the following criteria:

  • 802.11ax is preferred over 802.11ac
  • 802.11ac is preferred over 802.11n or 802.11a
  • 802.11n is preferred over 802.11a
  • 80 MHz channel width is preferred over 40 MHz or 20 MHz
  • 40 MHz channel width is preferred over 20 MHz

iOS and iPadOS select target BSSIDs based on:

  • Whether the client transmits or receives a series of 802.11 data packets
  • The difference in signal strength against the current BSSID’s RSSI
Reference: About wireless roaming for enterprise
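
The tie-break list above reads naturally as a sort key. The sketch below only illustrates those published criteria and is not Apple’s implementation: the dictionary layout and function names are hypothetical, and comparing the 802.11 generation before the channel width is an assumption, since the source does not say how the two rules are ordered against each other.

```python
# Higher number = more preferred, following the list above.
STANDARD_RANK = {"802.11a": 0, "802.11n": 1, "802.11ac": 2, "802.11ax": 3}


def tie_break_key(network: dict) -> tuple:
    # Assumption: generation is compared first, channel width second.
    return (STANDARD_RANK[network["standard"]], network["width_mhz"])


def pick_network(equally_scored: list) -> dict:
    """Pick one network among candidates that scored the same."""
    return max(equally_scored, key=tie_break_key)
```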

Access Point Controlled/Dependent

Steps 1 and 3 are where the access point has a say.

The simplest example where the AP actually has control over where a client associates is the common practice of splitting the SSID by radio, i.e. a 2.4 GHz SSID vs. a 5 GHz SSID. By doing so, the client is forced to choose one or the other. As a client device user, you choose one, and the client device is “forced” to use that radio or have no connection at all. This is clearly an AP-controlled example.

Another example is that an AP can actually “reject” or kick out an already connected client. I will come back to this later in more detail.

So why do some people believe roaming is a completely client-dependent process?

It is because the client-to-AP connection is always initiated from the client device. Even in the split radio example, technically you are selecting which radio band to connect to from the client device.

In fact, you can set up an AP so that it rejects or kicks out a client, but the client may still choose the same AP that it just got disconnected from. If it keeps doing that, you get an unstable connection, and the client may even get blacklisted on the AP and lose its Wi-Fi connection for a certain amount of time.

During the process of roaming, I see 3 categories of potential issues.

  1. Transition issue: slow roaming
  2. Sticky client issue
  3. Priority issue, i.e. X (signal strength) over Y (throughput)

Slow roaming issue: This is essentially about how long it takes from the moment the client disconnects from the original AP until it connects to the new AP.

Sticky client issue: This is a failure to switch from one AP to a physically closer AP because the client holds on to the original AP for too long.

X (signal strength) over Y (throughput) issue: When deciding which AP to associate with, the client could prioritize signal strength over throughput. If this happens, the client may decide to connect to the 2.4 GHz band even though a weaker signal on the 5 GHz radio often delivers higher throughput than a stronger signal on 2.4 GHz.

Now let’s look at how we can solve each of these issues, one at a time.

Transition Issue (Slow Roaming)

There are two options for solving the slow roaming issue:

  1. Use WPA Personal without 802.11r (recommended for home users)
  2. Create a dedicated SSID for client devices supporting 802.11r and use WPA Enterprise with 802.11r turned on

802.11k, v and r

Switching from one AP to another always takes a finite amount of time. This is a required part of the security. In order to reduce the re-association time, the IEEE (the organization behind the 802.11 standards that Wi-Fi is built on) has developed several protocols: 802.11k, v and r (ref).

If you are interested in how each technology tries to reduce roaming time, you can google them and find tons of articles. The key is that both the AP and the client must support each standard. Most recent iPhones, iPads (ref) and Samsung Galaxy devices (ref) appear to support these standards.

If you know that both your clients and your APs support these features, you should consider turning them on (if they are not on by default). Having said this, when I turned on 802.11r on my UniFi network, I got constant warning messages in the network history log. So I keep it off on mine and still have seamless roaming. In fact, this is a known issue/side effect of 802.11r.

There are a slew of devices, including some printers that can not, and never will, support 11r – it’s due to the way the handshake was implemented, and in all cases it is a bug in the WiFi software of the device/printer. So the only way to work-around it was to have separate SSIDs – one that supports 11r, and one that does not.

https://community.ui.com/questions/Unifi-support-for-Roaming-with-802-11r-802-11k-and-802-11v/8351c3d0-217b-411b-bfb6-4731818f2449#answer/feb02c30-9eb1-4b2a-bbaf-bab74fd47973

One of the issues with 802.11r is that many older client devices don’t have drivers that support it, and in fact even have trouble properly detecting and associating to networks with 802.11r enabled. […] many older client drivers cannot read and interpret the new FT information element in the beacon frames properly so they see the beacons as corrupted frames. Therefore, to ensure maximum client compatibility, the common recommendation is to disable fast roaming when using WPA2 Personal, and only use it for WPA2 Enterprise networks.

https://www.networkcomputing.com/wireless-infrastructure/wifi-fast-roaming-simplified

For this reason, Cisco came up with what is called adaptive 802.11r, where 802.11r does not necessarily need to be turned on across the network for supported devices to use it, i.e. supported devices benefit from the standard without hurting older devices that lack support (ref). On the other hand, a UniFi official forum moderator suggests separating the SSIDs.

WPA Personal vs. Enterprise

If you are reading this article, you may be one of the more security-conscious individuals. If so, given the choice between WPA Personal and Enterprise, you may feel like choosing Enterprise over Personal because it just sounds more secure. It is true that Enterprise is more secure, but that comes with more overhead/cost. In our case, it directly translates into roaming time.

The real benefit of 802.11r comes from not having to do the 802.1X/EAP exchange when using WPA2 Enterprise security.  Even with a local RADIUS server, this exchange can easily take several hundred milliseconds, and far longer if your RADIUS server is not on your LAN, but requires access over the Internet. Thus, fast roaming should ALWAYS be enabled when you are using WPA2 Enterprise security.

https://www.networkcomputing.com/wireless-infrastructure/wifi-fast-roaming-simplified

For most home users, WPA Personal is totally sufficient (in my personal opinion).

Cellular Roaming

Some platforms support roaming between the Wi-Fi and cellular networks during a Wi-Fi call to make calls even more seamless. Both HPE Aruba and Ruckus Unleashed support this. Neither system charges anything extra for this feature, but we must make sure that the proper cellular network entry is present in the correct settings section of each system.

For example, in HPE Aruba Instant the Wi-Fi calling section is populated automatically, as shown below. However, in Ruckus Unleashed I had to manually add the entry for my cellular company’s network address.

Aruba Instant Wi-Fi Calling

Once these are activated, I get truly seamless roaming for one of the most roaming-dependent applications, voice calls. Unfortunately, at the time of this writing, systems like Ubiquiti UniFi do not support this.

Cell Size

Before diving into the solutions for the other two issues, there is one key concept we need to discuss: cell size, also known as the basic service area (BSA). In the context of Wi-Fi, a cell refers to the physical range/area covered by a given access point (ref). For proper roaming, we need to adjust the cell size of each AP.

Cell size can be controlled in 3 main ways:

  • Transmission Power
  • Data Rate Control
  • RSSI Control

Transmission Power

Conceptually, this one is straightforward. By increasing the transmission power, the AP signal can reach a greater physical distance, i.e. its cell size increases.

For example, UniFi access points allow level-based control (low, medium, high and auto) as well as a custom setting. This can be done under device > config > radios.

UniFi AP: Settings > Transmission Power

In systems that have automatic radio resource management, like Ruckus Unleashed and HPE Aruba Instant Mode, the transmission power gets adjusted automatically based on the other APs on the same network. UniFi’s auto mode sounds as if it does the same, but it does not. They now say it chooses the power on startup, but in the past it was always set to the maximum.
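
To get a feel for how transmission power maps to cell size, here is a rough sketch using the log-distance path loss model, a common textbook approximation. It is not how any of the vendors above compute anything. The path loss exponent (3.0, a typical indoor guess) and the 40 dB loss at 1 m are illustrative assumptions, and real homes with walls and floors will differ.

```python
import math


def rssi_at_distance(tx_power_dbm: float, distance_m: float,
                     path_loss_exponent: float = 3.0,
                     loss_at_1m_db: float = 40.0) -> float:
    """Estimated RSSI (dBm) at a given distance from the AP."""
    path_loss_db = loss_at_1m_db + 10 * path_loss_exponent * math.log10(distance_m)
    return tx_power_dbm - path_loss_db


def cell_radius_m(tx_power_dbm: float, min_rssi_dbm: float,
                  path_loss_exponent: float = 3.0,
                  loss_at_1m_db: float = 40.0) -> float:
    """Distance at which the estimated RSSI drops to min_rssi_dbm."""
    # Invert the model above: solve rssi_at_distance(d) == min_rssi_dbm for d.
    budget_db = tx_power_dbm - loss_at_1m_db - min_rssi_dbm
    return 10 ** (budget_db / (10 * path_loss_exponent))
```

Under these assumptions, lowering the power from 20 dBm to 14 dBm shrinks the radius at which a client still sees -70 dBm from roughly 46 m to roughly 29 m.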

Data Rate Control

The greater the distance between the client and the AP, the lower the wireless link (PHY) rate, which is expressed as a data rate. By setting a minimum data rate required for a connection between the AP and a client, connections at lower data rates won’t be accepted, i.e. the cell size is reduced.

In UniFi this is called minimum data rate control. In Ruckus Unleashed, it is called BSS Min Rate.

RSSI Control

RSSI stands for received signal strength indicator, which is the signal strength seen by a given client for a given AP. It is expressed as a negative integer in units of dBm. The further away the client is, the weaker the signal it receives, i.e. the more negative the RSSI becomes.

Similar to data rate control, an AP can set a minimum RSSI required to associate with it. Below that minimum RSSI, the client is prevented from connecting to the AP.

For example, UniFi requires enabling advanced features to use this. Currently, the UniFi minimum RSSI setting is a soft approach, i.e. the AP kicks out an associated client, but there is no way to prevent the client from attempting to reconnect to the same AP (ref).

This setting can be found under AP > Settings > Minimum RSSI.
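
The “soft” behavior described above can be sketched as follows. This is a toy model with hypothetical names, not UniFi’s code, and it assumes the kicked client simply rejoins whichever AP it hears loudest.

```python
def clients_to_kick(min_rssi_dbm: int, client_rssi_dbm: dict) -> list:
    """Names of associated clients whose signal is below the minimum RSSI."""
    return sorted(name for name, rssi in client_rssi_dbm.items()
                  if rssi < min_rssi_dbm)


def rejoin_target(visible_aps_dbm: dict) -> str:
    """AP a kicked client tries next (assumption: the strongest one)."""
    return max(visible_aps_dbm, key=visible_aps_dbm.get)


def will_flap(kicking_ap: str, visible_aps_dbm: dict) -> bool:
    """True if the client would come straight back to the AP that kicked it."""
    return rejoin_target(visible_aps_dbm) == kicking_ap
```

If `will_flap` returns true, no other AP covers that spot well enough, and kicking the client only produces the connect/disconnect loop mentioned earlier.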

Sticky Client Issue

Now let’s look at the sticky client issue.

The key concept to remember here is the general rule that the client decides when to disassociate and start looking for a new AP. For example, iOS devices use an RSSI value of -70 dBm as their cut-off (ref).

iPhone, iPad, and iPod touch monitor and maintain the Basic Service Set Identifier (BSSID)’s connection until the Received Signal Strength Indicator (RSSI) exceeds -70 dBm. Then, the device scans for roam candidate BSSIDs for the new Extended Service Set Identifier (ESSID).

About wireless roaming for enterprise (Apple)

Basically, the client first needs to disconnect before it can associate with a closer AP. So if you do nothing and leave everything at the default settings in a system like UniFi, or perhaps on consumer routers, you may indeed be just waiting for the client to disassociate, and feel as if you have no control.

However, as we’ve seen in the section above, we can configure the AP’s cell size so that we kick out the client by design.

Here is a brief example.

Using the client whose sticky behavior you are aiming to fix:

  1. Gather the RSSI threshold of the device, e.g. -70 dBm for iOS.
  2. Check the RSSI at the location where you want the AP switch/roaming to happen, i.e. where you currently have the sticky client issue.

Most likely you will find that the sticky AP still has an RSSI value higher than the threshold at that spot.

  3. Reduce the cell size.

My preference is either transmission power or minimum data rate control before touching minimum RSSI. Whatever you change, you need to make sure the other AP is there to serve the spot.
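
Steps 1 and 2 boil down to a simple comparison, sketched below. The function names are hypothetical; the -70 dBm default is the iOS figure quoted above, and other devices will have different thresholds.

```python
def headroom_db(current_ap_rssi_dbm: int, roam_threshold_dbm: int = -70) -> int:
    """How many dB the current AP's signal sits above the roam threshold."""
    return current_ap_rssi_dbm - roam_threshold_dbm


def is_sticky_spot(current_ap_rssi_dbm: int, closer_ap_rssi_dbm: int,
                   roam_threshold_dbm: int = -70) -> bool:
    """True if a stronger AP exists but the client has no reason to scan yet."""
    still_above_threshold = current_ap_rssi_dbm > roam_threshold_dbm
    better_ap_available = closer_ap_rssi_dbm > current_ap_rssi_dbm
    return still_above_threshold and better_ap_available
```

A positive headroom at the problem spot is roughly how much the sticky AP’s signal there needs to drop, by whichever cell size control you choose, before the client starts scanning.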

I’d personally choose data rate control before RSSI. With rate control on 5 GHz, I’d start with 12 Mbps and see if it achieves the roaming you want. If everything is working but roaming is not happening at the desired spot, you can try going one level higher at a time. In general, if you are going this route, I think it’s safer and more logical to create an SSID specific to the devices you want to roam well, e.g. phones and tablets, and isolate it from IoT devices. This way, settings like minimum data rate control will not affect other devices.

It’s important to note that any of these settings can potentially create an unstable network. Imagine you set minimum data rate control to 24 Mbps on the 5 GHz radio, but one of your clients can only get a 12 Mbps connection, perhaps because it is too far away. That client now cannot connect to the network at all.
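
The “one level at a time” approach, together with the lockout risk just described, could be sketched like this. The rate steps are the standard 802.11a/g OFDM rates; the function names and the per-client rates are made up for illustration, and a real AP may offer a different set of steps.

```python
OFDM_RATES_MBPS = [6, 9, 12, 18, 24, 36, 48, 54]  # 802.11a/g rate set


def next_min_rate(current_mbps: int):
    """Next higher rate step, or None if already at the highest step."""
    idx = OFDM_RATES_MBPS.index(current_mbps)
    if idx + 1 < len(OFDM_RATES_MBPS):
        return OFDM_RATES_MBPS[idx + 1]
    return None


def locked_out(min_rate_mbps: int, best_rate_by_client: dict) -> list:
    """Clients whose best achievable rate is below the proposed minimum."""
    return sorted(name for name, rate in best_rate_by_client.items()
                  if rate < min_rate_mbps)
```

Checking `locked_out` before each step up is one way to catch the 24 Mbps vs. 12 Mbps situation before it takes a device offline.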

In certain systems, this is automatic and more advanced. Among the limited number of systems I have tested, HPE Aruba’s ClientMatch technology works really well.

What makes ClientMatch different is that it uses a system-level view of the entire network to continuously monitor the health of all associated clients. By dynamically gathering client information (e.g. signal strength and channel utilization) from each AP without any client-based software to install or maintain it’s easy to implement at scale. This client data is then aggregated and shared among all APs to coordinate and make real-time decisions as conditions change.

For instance, ClientMatch will identify when a client is connected to an oversubscribed AP and when there’s a less congested AP with a stronger signal only 15 feet away. It will then dynamically move clients appropriately.

What is Aruba Client Match?

I’ve deployed UniFi, Ruckus Unleashed and HPE Aruba multi-AP setups in my house in an attempt to achieve similar end-to-end high throughput throughout my home. Using the exact same client device (iPhone), Aruba was the only one able, without any manual configuration, to achieve nearly perfect and instantaneous roaming to the closest AP anywhere in my home, and hence the highest throughput. So with well-deployed, well-configured APs, it is perfectly possible to control the roaming behavior of client devices.

X (signal strength) over Y (throughput) issue

Although, as you have seen, it is not just signal strength that determines AP priority on certain devices like Apple’s, signal strength often feels like it is taking priority over throughput, especially when someone notices the issue. This is because most of us look at throughput and say “that’s slow, bad connection”. Most of us don’t look at 3 vs. 4 Wi-Fi bars and say that’s a bad signal.

In practice, this is often seen with a combined SSID design, i.e. 2.4 GHz and 5 GHz on the same SSID. 2.4 GHz has better range, so it often has the stronger signal. So once the client’s threshold is met, the 2.4 GHz connection may take priority over 5 GHz, even though you would actually get faster throughput if you forced a connection to 5 GHz.

The simplest solution in this situation is to separate the 2.4 GHz and 5 GHz bands into their own SSIDs. However, some still prefer to keep a smart/combined SSID, i.e. a single SSID for both the 2.4 GHz and 5 GHz bands.

In such a case, the solution for the 2.4 GHz band being chosen over 5 GHz is the same as for the sticky client. Conceptually, we just need to make sure the 2.4 GHz signal from a given AP is not stronger than the 5 GHz signal wherever 5 GHz is reachable. In the case of iOS, a device seeing -65 dBm or better on the 5 GHz band preferentially chooses 5 GHz over 2.4 GHz.

If the BSSID’s RSSI is greater than -65 dBm, the device prefers a 5 GHz network.

About wireless roaming for enterprise
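
The -65 dBm rule can be written as a tiny decision function. Only the threshold comes from the Apple quote above; the function name is hypothetical, and falling back to the stronger band when 5 GHz is at or below the threshold is an assumption made to keep the sketch complete.

```python
def preferred_band(rssi_5ghz_dbm: int, rssi_24ghz_dbm: int,
                   prefer_5ghz_above_dbm: int = -65) -> str:
    """Toy model of band choice on a combined SSID."""
    if rssi_5ghz_dbm > prefer_5ghz_above_dbm:
        return "5 GHz"  # strong enough: 5 GHz wins even if 2.4 GHz is louder
    # Assumption: otherwise the client goes with the stronger signal.
    return "5 GHz" if rssi_5ghz_dbm >= rssi_24ghz_dbm else "2.4 GHz"
```

In this model, the second branch is where the 2.4 GHz band wins by being louder, which is the case the cell size controls are meant to shrink.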

Again, a system like HPE Aruba takes care of this part well too, so I have a combined SSID and have never had 2.4 GHz chosen over 5 GHz with this setup. In fact, I can start in the 2.4 GHz range of a given AP and walk towards the AP, and the Aruba setup switches to 5 GHz even though the 2.4 GHz signal continues to get stronger.

Conclusion

In summary, roaming issues can be categorized into three types: transition, sticky client, and priority issues. Transition issues can primarily be fixed by using various protocols/technologies, which require support from both the client and the access points. Sticky client issues require either manual or automatic radio resource management, in particular of cell size. Priority issues, primarily the situation where signal strength is chosen over throughput, can be treated as a sticky client issue between different radio bands and solved the same way. But the easier way is to split the bands into dedicated SSIDs. I hope you find this helpful.

Reference