I had a familiar conversation at work recently. A colleague asked: "There's a feature I only want to show to users if some hardware's present. The Android API for hardware presence says the API must be called on a background thread. But I need it from the UI thread, and on application startup, at the first frame. How can I do this?"
- Where does the API get the data from? From some driver implemented by the device manufacturer.
- How long could the API call take? Up to 2 seconds, with retries.
- Does the data ever change? No, the same device will always return the same value.
If we call this API on the UI thread, we might block user interaction for two seconds. That’s probably long enough to cause "Application Not Responding" timeouts (crashes!) on some devices. And it's definitely long enough to annoy users.
It’s a shame that this logically static data is fetched from a general-purpose asynchronous API, backed by the manufacturer's hardware-abstraction-layer drivers, which could be arbitrary code.
The call is at least interprocess communication, which can be slow even if the other side is implemented well. But this app was running in a car (Android Automotive), so it's possible the driver might even fetch the data from another vehicle computer via the in-vehicle Ethernet. That might be slower still; the other computer might not reply at all!
What can we do?
So we have a choice, either:
- Fetch the data synchronously at app startup, and risk app downtime while the app waits for the response, or:
- Fetch the data asynchronously. This may need a reworking of the code to support async loading. The code will have to handle the case where the data isn’t ready yet, then update later once the data is ready, even if the user's in the middle of doing something with the UI. That's a bunch more complexity.
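The asynchronous option could be sketched roughly as follows, in plain Java so it runs off-device. Everything here is an assumption for illustration: `HardwareFeatureGate`, `slowHardwareQuery`, and the two executors are hypothetical names, not part of any Android API. On a real device the UI executor would probably be something like the one returned by `Context.getMainExecutor()`, and the supplier would wrap the actual hardware-presence call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical gate for a feature that depends on slow-to-query hardware. */
class HardwareFeatureGate {
    // Value used until the real answer arrives: assume the hardware is absent.
    private volatile boolean featureVisible = false;

    boolean isFeatureVisible() {
        return featureVisible;
    }

    /**
     * Starts the slow query on backgroundExecutor and returns immediately.
     * Once an answer is known, the state update and onResult callback
     * run on uiExecutor.
     */
    CompletableFuture<Void> load(Supplier<Boolean> slowHardwareQuery,
                                 Executor backgroundExecutor,
                                 Executor uiExecutor,
                                 Consumer<Boolean> onResult) {
        return CompletableFuture
                .supplyAsync(slowHardwareQuery, backgroundExecutor)
                // If the query throws, keep the feature hidden.
                .exceptionally(error -> false)
                .thenAcceptAsync(present -> {
                    featureVisible = present;
                    onResult.accept(present);
                }, uiExecutor);
    }
}
```

The first frame is drawn from `isFeatureVisible()`, which answers immediately with the default. The `onResult` callback is where the UI would be updated later, and that callback is exactly the extra complexity this option brings.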
Please don't block the app
I told my colleague: you have to change your code to be asynchronous. We can't risk app downtime for the sake of one feature if the API turns out to be slow.
You have to show something to the user while you're waiting for this data; perhaps you should hide your feature until the API responds? "Pessimistic UI" and "Optimistic UI" are the two opposite ways of handling this: assume the worse answer until you know better, or assume the better one.
This is an everyday example of the CAP theorem. The CAP theorem (loosely) states that out of:
- Consistency (serving the right answer)
- Availability (always serving data)
- Partition-Tolerance (robustness in face of network disconnections)
you can choose at most two of the three.
Our Android system can experience network partitions (between computers in the car), and we have some data we want to read across the network. Should we either:
- Block waiting for the data: choosing consistency, waiting, possibly for a long time, and blocking the user from doing their work while you wait? or
- Code asynchronously, 'guessing' until the data arrives with a default answer (maybe: "the hardware doesn't exist, disable the feature"). We lose consistency, but maintain availability.
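One way to picture the two choices is as the two ends of a wait budget. The sketch below is only an illustration of that idea, not a recommendation to block the UI thread: `BoundedWait`, `budgetMillis`, and `guess` are hypothetical names, and `pending` stands in for whatever asynchronous request is in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: give up on consistency once a wait budget is spent. */
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Waits up to budgetMillis for the real answer, then falls back to guess.
     * A budget of 0 never waits (the availability choice); a very large
     * budget approaches waiting indefinitely (the consistency choice).
     */
    static boolean hardwarePresent(CompletableFuture<Boolean> pending,
                                   long budgetMillis,
                                   boolean guess) {
        try {
            return pending.get(budgetMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // No answer in time, or the request failed: serve the guess.
            return guess;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers, then serve the guess.
            Thread.currentThread().interrupt();
            return guess;
        }
    }
}
```

With a budget of 0 and a guess of `false`, this reduces to "the hardware doesn't exist, disable the feature" until the response has actually arrived.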
Most modern applications are networked, so they have to deal with network partitions. Usually, availability correlates with business outcomes: users being able to do the work they need to do, and ultimately paying the business. So, for many applications, consistency is given up first.
What's Acceptable to Block On?
There's always a choice. Often, your app needs to make many async calls: over the network, between processes, or to disk. Any of these can be slow or fail. In our app, we generally accept blocking on resources where there's no other way:
- Blocking on disk reads to load our application's code.
- Making inter-process calls to known-mostly-reliable components, e.g. calling SurfaceFlinger to render frames to the screen.
We generally don't accept blocking on:
- Network calls.
- Inter-process calls backed by arbitrary unknown code.
Spotting CAP Problems
It's useful to spot when you're facing a CAP problem. CAP is stereotyped as mostly applying to server-side databases, but it comes up a lot in networked client applications.
Ask: when making a request that could take a long time, should our app wait for the response, however long it takes (choosing consistency), or assume an answer until the response arrives (choosing availability)?