Scaling your connected products

Posted 17 Nov 2016

Once your connected product works, how do you scale it? Pilgrim Beart, Founder of DevicePilot, explains various processes you’ll need to change in order to scale your IoT product.

There are many challenges to address in the early stages of delivering a new connected product, including:

  • Designing, building and integrating all parts of your solution (hardware, software, service and application)
  • User trials to find and address bugs in your technology and proposition.

These two stages are all-absorbing, but think ahead to the third and final stage: scaling! Your customer is convinced that your connected application works and that users will pay for it. It’s just a matter of scaling up now – right? Unfortunately scaling brings a whole new set of challenges, which perhaps did not feature in your original plans.

From IoT R&D to IoT production

In the early stages, doing things manually is often seen as the best solution. Humans’ ability to adapt to unforeseen problems is just what is needed. Human supervision of early users coaxes them through teething problems, and provides you with invaluable feedback.

But manual doesn’t scale. You can’t hire more and more people every year. That’s why scaling up is fundamentally different from the previous challenges.

With only 10 or 100 early users, it is entirely feasible to:

  • Run devices on flaky servers
  • Visit the customer to install, maintain and diagnose
  • Deal with customer problems reactively
  • Test new code manually
  • Upgrade individual devices manually

If this sounds familiar, it’s time to plan the transition from IoT R&D to IoT production.

Five processes you’ll have to change to scale

Above we listed five manual processes that do not scale; it’s not an exhaustive list, but it’s a good place to start. Let’s see how we can make these processes scalable.

1) Servers

As well as architecting your servers to “scale-out”, you will also need to make your human processes more reliable and available. A modern CI/CD workflow with immutable servers and a “DevOps” mindset is the right way to go, with servers now a cheap and disposable commodity item. People-time is a precious resource, so sweat the machines, not the people.
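To make that concrete, here is a minimal Python sketch of what disposable servers imply for your application code: all device state lives in a shared external store (Redis is used here purely as an illustration), so any individual server instance can be killed, replaced or multiplied without losing anything.

```python
# Sketch: a stateless handler. All device state lives in a shared external
# store (Redis here, purely as an example), so individual server instances are
# disposable and can be scaled out or replaced at will.
import json

import redis  # assumed dependency; any shared store would do

store = redis.Redis(host="state-store.internal", port=6379)

def handle_reading(device_id: str, payload: dict) -> dict:
    """Record a device reading; nothing survives locally between calls."""
    encoded = json.dumps(payload)
    store.set(f"device:{device_id}:latest", encoded)
    store.rpush(f"device:{device_id}:history", encoded)
    return {"status": "ok", "device": device_id}
```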

2) Site visits

If your kit needs on-site installation or repair, to scale you’ll need to outsource this and build processes to interact with those third parties efficiently. For instance, on a site visit an installer should only be able to claim they’ve installed or fixed a device once your central system confirms it, enforcing correct processes and quality. You should diagnose problems remotely to ensure repairers arrive with the right parts to fix the problem (perhaps before the customer even knows there is one). That’s fantastic customer service, delivered cost-effectively.
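As a rough illustration, the sketch below (in Python, with hypothetical Device and InstallJob records) shows the kind of check a central system might enforce: a site visit can only be closed once the device itself has checked in.

```python
# Sketch: an install job can only be closed once the central platform has
# actually heard from the device; the installer's claim alone is not enough.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RECENTLY = timedelta(minutes=10)  # assumed check-in window

@dataclass
class Device:
    device_id: str
    last_seen: Optional[datetime] = None  # updated by your ingestion pipeline

@dataclass
class InstallJob:
    device_id: str
    status: str = "open"

def close_install_job(job: InstallJob, device: Device) -> InstallJob:
    """Refuse to close the visit unless the device has checked in recently."""
    now = datetime.utcnow()
    if device.last_seen is None or now - device.last_seen > RECENTLY:
        raise RuntimeError("Device has not reported in; installation not confirmed")
    job.status = "complete"
    return job
```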

3) Customer support

There is always a role for people to support the customer and catch the long tail of infrequent, unanticipated problems. Don’t use support staff to catch the frequent problems, or they’ll drown in a sea of customer ill will.

An example of how to get this right is from my previous company AlertMe, where a small percentage of customers experienced wireless connectivity problems (that’s radio for you). Dealing with this reactively would have been a disaster, so we built a proactive system to automatically spot the problem and dispatch a wireless repeater, turning a bad customer experience into a delightful one at the lowest possible cost.
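A minimal sketch of that kind of proactive detection might look like the following. The DeviceStats record and the 20% dropout threshold are illustrative assumptions, not AlertMe’s actual implementation.

```python
# Sketch: proactively spot devices with poor wireless connectivity and queue a
# repeater shipment before the customer ever has to call support.
from dataclasses import dataclass
from typing import List

DROPOUT_THRESHOLD = 0.20  # assumed threshold; tune against real fleet data

@dataclass
class DeviceStats:
    device_id: str
    customer_id: str
    dropout_rate: float  # fraction of expected messages missed this week

def devices_needing_repeaters(fleet: List[DeviceStats]) -> List[DeviceStats]:
    """Return devices whose connectivity is bad enough to warrant a repeater."""
    return [d for d in fleet if d.dropout_rate > DROPOUT_THRESHOLD]

def dispatch_repeaters(fleet: List[DeviceStats]) -> None:
    for device in devices_needing_repeaters(fleet):
        # In a real system this would raise a fulfilment order and email the
        # customer; here we just print the action.
        print(f"Ship repeater to customer {device.customer_id} "
              f"(device {device.device_id}, dropout {device.dropout_rate:.0%})")
```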

4) Testing

In a modern workflow, every developer produces multiple user-visible features or fixes every day, producing a constant stream of new code that needs testing. Developers write unit tests and regression tests, which are run automatically by your build robot when new code is checked in.

However, with IoT the interactions of your connected device with the complex real world will test your product more thoroughly than you can imagine, so whole-product functional/integration testing is vital, and it cannot be done manually. It can take a person more than 10 days to exercise even a simple connected device through all its states (e.g. testing a radio interruption during a code upgrade), which will destroy your release frequency and stretch your release latency.

The solution is to instantiate your devices virtually. The embedded software in your products should also run in the cloud. That means you can exercise all features completely automatically, fast-forward time and spin up bazillions of virtual devices to throw against your service and prove it works at scale. It’s a big effort, but it’s much cheaper than leaving your customers to test your product at scale.
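As a toy sketch of the idea (the real value comes from running your actual embedded code against simulated time and I/O, not a stand-in state machine like this one):

```python
# Sketch: run the device's behaviour as a virtual device in the cloud, so
# thousands of instances can be simulated and "time" fast-forwarded in tests.
import random

class VirtualDevice:
    """A trivial stand-in for the embedded state machine, driven by events."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self.state = "idle"

    def step(self, event: str) -> None:
        if event == "start_upgrade" and self.state == "idle":
            self.state = "upgrading"
        elif event == "radio_drop" and self.state == "upgrading":
            self.state = "recovering"  # must not brick on interruption
        elif event == "radio_restore":
            self.state = "idle"

def simulate_fleet(n_devices: int = 10_000, steps: int = 100) -> int:
    """Fast-forward a whole fleet through random event sequences and count
    how many devices end up in a state we did not expect."""
    random.seed(0)
    devices = [VirtualDevice(f"dev-{i}") for i in range(n_devices)]
    events = ["start_upgrade", "radio_drop", "radio_restore"]
    for _ in range(steps):
        for d in devices:
            d.step(random.choice(events))
    expected = {"idle", "upgrading", "recovering"}
    return sum(1 for d in devices if d.state not in expected)

if __name__ == "__main__":
    print("devices in unexpected states:", simulate_fleet())
```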

5) Upgrading

Upgrading firmware is the totemic in-field device process. You’re fooling yourself if you think you won’t have to do it. Your device will contain not only bugs but also security holes, and you’ll need to track evolving standards too. Upgrading is a requirement fraught with peril.

You can’t upgrade if a battery is flat or a device is offline or in use. When you start an upgrade, there are many ways it can fail, “bricking” your devices. Through all of this you need to maintain a clear view of whether a particular upgrade is working, and whether that new code is better than the old. It’s vital to have a centralised process for triggering and monitoring upgrades, and for throttling that process so any systemic problems are discovered early.
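A minimal sketch of a throttled, staged rollout might look like this; the stage sizes, success threshold and the upgrade callable are all illustrative assumptions rather than a prescription.

```python
# Sketch: a throttled, staged firmware rollout. Upgrade a small cohort first,
# check its success rate, and halt before a systemic problem spreads.
from typing import Callable, List

STAGES = [0.01, 0.05, 0.25, 1.0]  # assumed fraction of the fleet per stage
SUCCESS_THRESHOLD = 0.95          # assumed minimum acceptable success rate

def staged_rollout(device_ids: List[str],
                   upgrade: Callable[[str], bool]) -> bool:
    """Upgrade the fleet in growing stages, aborting if any stage goes badly.

    `upgrade` is a hypothetical callable that triggers one device's upgrade
    and reports whether the device came back healthy.
    """
    done = 0
    for fraction in STAGES:
        target = int(len(device_ids) * fraction)
        batch = device_ids[done:target]
        if not batch:
            continue
        results = [upgrade(device_id) for device_id in batch]
        done = target
        success_rate = sum(results) / len(results)
        if success_rate < SUCCESS_THRESHOLD:
            print(f"Halting rollout: only {success_rate:.0%} of this stage succeeded")
            return False
    return True
```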

Identify your systemic failure modes

A theme emerges from the above. During the trial stage, between 10% and 50% of your kit won’t work properly. This is partly because you have various generations of prototype hardware and software in the field. You’ll be tempted to believe that once all your customers are running the latest versions, everything will be fine. You’d be wrong.

In my experience, giving users proper production hardware and software might bring that down to around 5-10% of customers being unhappy. With 1m customers, even the lower figure equates to 50,000 unhappy customers, potentially abusing you publicly on Twitter and returning your product. Your business and your brand won’t survive that.

As well as bug fixing, your other vital activity during trials is identifying all the predictable failure modes of your product and deciding how to address each at scale. Some can be addressed purely in software (e.g. making devices retry connections, or fall back to a bootloader if a firmware upgrade fails), but others require processes to resolve them, and those processes need to be made scalable.
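For example, one purely software fix for flaky connectivity is a retry with exponential backoff and jitter, sketched below (the connect callable is a placeholder for your own connection logic):

```python
# Sketch: a predictable failure mode handled purely in software. Retry a
# connection with exponential backoff and jitter instead of giving up (or
# hammering the server) whenever the network blips.
import random
import time
from typing import Callable

def connect_with_backoff(connect: Callable[[], bool], max_attempts: int = 8) -> bool:
    """Call `connect()` until it succeeds, backing off further each time."""
    delay = 1.0
    for _ in range(max_attempts):
        if connect():
            return True
        # Jitter stops the whole fleet retrying in lockstep after an outage.
        time.sleep(delay + random.uniform(0, delay))
        delay = min(delay * 2, 300)  # cap the backoff at five minutes
    return False
```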

But what is scalability?

We’ve mentioned “manual processes don’t scale”. But some processes are inherently manual (e.g. battery replacement). How do we make those processes scalable?

The key is recognising where you are the bottleneck. Battery replacement requires people, but if those people are your users, their number grows with the number of devices you ship. The problem becomes scalable once every centralised part of the process is automated: for example, an automated process that spots low batteries and sends reminder emails, or even dispatches new batteries.
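A minimal sketch of that automated, centralised half of the process might look like this; the voltage threshold and the send_reminder stand-in are assumptions for illustration.

```python
# Sketch: the centralised half of battery replacement. Spot low batteries
# automatically and nudge the user, so the only manual step is done by the
# users themselves and scales with the fleet rather than with your headcount.
from dataclasses import dataclass
from typing import List

LOW_BATTERY_VOLTS = 2.4  # assumed threshold, for illustration only

@dataclass
class DeviceReport:
    device_id: str
    owner_email: str
    battery_volts: float

def send_reminder(email: str, device_id: str) -> None:
    # Stand-in for your real email or fulfilment integration.
    print(f"Reminder sent to {email}: replace the batteries in {device_id}")

def check_batteries(reports: List[DeviceReport]) -> None:
    for report in reports:
        if report.battery_volts < LOW_BATTERY_VOLTS:
            send_reminder(report.owner_email, report.device_id)
```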

Conclusion

There’s more to delivering a connected product than initially meets the eye. Connecting your product turns you into a service provider – for life – and it’s essential to think ahead to the implications of scaling. We’ve considered how to address scaling with five specific examples, and how, during trials, you must identify your product’s systemic failure modes and automate the centralised processes needed to address them.

Reducing the barriers to IoT testing is crucial for managing scalability and the heterogeneity of devices and deployments. If you are looking to develop tools to simplify these processes, check Digital Catapult’s F-Interop open call, and see how DevicePilot can help your business manage its devices.

Pilgrim Beart is Founder of DevicePilot. You can follow him on Twitter @pilgrimbeart. Don’t forget to follow us too @DigiCatapult.