Machine learning, deep learning, neural networks, artificial intelligence... You can't work a day in tech without coming across one or more of these terms. For a developer just looking to get started, it's hard to wade through the jargon and ever-changing tools.
And when thinking about running machine learning models directly on mobile devices, the equation becomes even more complicated. There are unique considerations when working through the entire project lifecycle, from how to collect and label data, all the way to managing and improving models across platforms and devices.
The goal of this resource guide is to serve as a detailed glossary that will help you make sense of the quickly evolving landscape for mobile machine learning. There's a whole lot to cover, so let's jump right in.
Machine learning, a subset of AI, is composed of two modes: training models with data, predictably called training; and using those models to make predictions, called inference. Until recently, both of these modes ran exclusively on servers, cloud services, and desktop computers.
However, we're currently at an inflection point, and mobile devices are soon going to dominate inference. The growth of the entire AI ecosystem is going to be fueled by mobile inference capabilities. Thanks to the scale of the mobile ecosystem, the world will become filled with amazing experiences enabled by mainstream AI technology. In fact, we're already seeing evidence of this today.
A classic technology dynamic is being repeated with AI: Tasks that were once only practical in the cloud are now being pushed to the edge, as mobile devices become more powerful. Increases in processing power on desktops and the cloud have driven the last decade of AI growth. Amazon, Microsoft, and Google now offer services for both training and hosting machine learning models at scale. Relatively recently, mobile devices have emerged with specialized AI hardware and capabilities. But why would we do this on mobile devices when servers are so capable?
At some point, it becomes easier to move the models to the data than the data to the models. Sensors in your phone, like microphones and accelerometers, sample at a rate of thousands of points per second. Cameras capture millions of pixels at sixty frames-per-second. It's all too much to reliably send over a network connection to a server. As long as we're inseparable from our smartphones, we'll need to move the models to the data.
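To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The resolution (1080p), color depth (3 bytes per pixel), and the function name are assumptions chosen for illustration, not figures from any particular device.

```python
def raw_video_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed data rate of a video stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps

# Assumed example: 1080p RGB video (about 2 million pixels) at 60 frames per second.
rate = raw_video_bytes_per_second(1920, 1080, 60)
print(f"{rate / 1e6:.0f} MB/s uncompressed")  # 373 MB/s uncompressed
```

Compression reduces this considerably, but the raw figure shows why streaming every frame to a server is hard to do reliably.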
Today's best-of-breed mobile devices are reasonably good inference machines, but the next generation will be purpose-built for deep learning. We are currently making this transition—from AI-enabled to AI-first.
Without a fuller understanding of the potential benefits of on-device ML, it might seem like sticking with a traditional server-side/cloud-based approach makes the most sense. But there are a few distinct advantages to choosing on-device machine learning.
Because model predictions are made on-device, no device-to-cloud data transfer is required. Thus, models running on-device are capable of processing data streams in real-time, speeding up key UX components.
In recent months, the subject of protecting data privacy has become central in the larger AI industry. With on-device AI, data doesn't need to be sent to a server or the cloud for processing. This closed loop offers end users more control over who sees their data, when, and how.
Sending data to the cloud for processing also requires an active and reliable internet connection. This is a significant barrier to access for remote and developing areas of the world. On-device AI is therefore an essential mechanism for democratizing this transformative technology.
Avoiding heavy data processing between devices and the cloud can also be a huge cost-saver, as modern devices increasingly have advanced Neural Processing Units (NPUs) that can handle these workloads on-device.
Before actually building and deploying AI to mobile, you'll need to collect and label data on which to train models. This can be a time-intensive and costly process, but if you know what to pay attention to, you can avoid common pitfalls and collect and label high-quality datasets. This isn't an exhaustive list, but these are a few need-to-know concepts when working with data for mobile ML.
Data is the foundation of any successful ML project, and taking the time to gather, organize, and understand your data is essential.
One of the most important concepts when working with data to train ML models is cross-validation. Cross-validation is the process of splitting data into multiple, mutually exclusive sets so that we can objectively evaluate a model's accuracy and guard against overfitting. It might be tempting to train a model on all of the data we have—doing so might result in very high reported accuracy numbers. However, as soon as we use our model on new data submitted by users in the real world, we notice that performance is far lower. This is because the model has overfit or memorized the training data, but has not learned the general concepts of our dataset and is thus unable to function in novel environments. Cross-validation guards against this by holding out a specific portion of the dataset during training and reserving it for use as a novel test case for the model.
The training split is the portion of the dataset reserved for the model to learn from. While we often look at the accuracy and performance of models on training data, we can't rely on those numbers for accurate representations of how the model will perform in the wild.
The validation split is typically a small portion of the dataset (5-20%) held out from training and used for quick checks to make sure a model is learning during training. Because the validation split is not shown to the model during learning, it can't be memorized. And because it's small, it doesn't add much time to training jobs.
The test split is a portion of the dataset mutually exclusive of validation and training that's used to measure the model's final accuracy. Test splits are typically larger than validation splits, and thus are more time-consuming to evaluate. The more closely test splits match real-world data, the more confident you can be that your trained model will perform well in production.
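The three splits described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name, the default fractions, and the fixed seed are assumptions made for the example rather than recommended values.

```python
import random

def split_dataset(items, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle items and split them into mutually exclusive train/val/test lists."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 100 200
```

Because each example lands in exactly one list, nothing the model sees during training can leak into the validation or test measurements.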
Raw data comes in the form of images, audio clips, text, etc. But raw data alone isn't enough to train the types of models we're often interested in. Most deep learning models are trained via supervised learning. They are shown both questions and answers and learn statistical patterns between the two. For example, training an image classification model to answer the question "Is this picture a cat or a dog?" requires both images of cats and dogs, as well as labels denoting which is which.
Annotations refer to the specific labels or answers we want models to produce when shown a particular input. Annotations for image classification models might be string labels or class numbers. Annotations for object detection models might be the coordinates of bounding boxes surrounding each person in an image. A single image can have many annotations, including multiple labels, bounding boxes, or segmentation masks. Different annotations can then be used to train models to perform different tasks. The most important thing is to keep track of the associations between raw model inputs and their annotations.
Given the popularity of computer vision models, it's useful to understand the specific annotation types for common tasks:
Labels: Image labels are strings or integers denoting a specific class an image belongs to. For example, a classification model might output the label "cat" or "dog".
Bounding boxes: Bounding boxes are used for object detection tasks. They describe the four corners of a box enclosing an object in an image. Object detection models typically need both the coordinates of the box and a label for the object inside.
Keypoints: Keypoints are the coordinates of specific points of interest within an image, for example the tip of a finger or the center of a ball. Keypoints are used to train pose estimation models.
Segmentation masks: Segmentation masks denote areas of an image that cover specific objects. For example, a segmentation mask that covers people can be used to train a model that separates people from backgrounds of photos or videos—like a virtual green screen.
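One way to keep the four annotation types tied to their image is a small record per image. The sketch below is purely illustrative: the class names, field names, and file path are hypothetical and do not follow any particular dataset format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BoundingBox:
    label: str      # class of the object inside the box
    x_min: float
    y_min: float
    x_max: float
    y_max: float

@dataclass
class ImageAnnotations:
    image_path: str  # ties every annotation back to its raw input
    labels: List[str] = field(default_factory=list)
    boxes: List[BoundingBox] = field(default_factory=list)
    keypoints: List[Tuple[str, float, float]] = field(default_factory=list)  # (name, x, y)
    mask: List[List[int]] = field(default_factory=list)  # 1 where the object is, 0 elsewhere

example = ImageAnnotations(
    image_path="images/park_001.jpg",  # hypothetical file name
    labels=["dog"],
    boxes=[BoundingBox("dog", 12, 30, 118, 96)],
    keypoints=[("nose", 40.0, 52.0)],
    mask=[[0, 1, 1], [0, 1, 1], [0, 0, 0]],
)
```

The same record can feed a classifier (labels), a detector (boxes), a pose model (keypoints), or a segmentation model (mask).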
Collecting and annotating data can be expensive and time-consuming. Most of the time, it's not possible to collect data to match every possible scenario models will encounter in real-world use. Luckily, data augmentation techniques can help you get the most out of the data you do have.
Data augmentation works by taking a piece of data and associated annotations and randomly modifying it to introduce more variety for models to learn from. For example, images can be augmented by adjusting brightness, hue, and contrast, or by shifting positions, zoom, or adding noise. Audio can be pitched up or down or run through effects. When it comes to mobile machine learning, data augmentation is especially useful for simulating artifacts introduced by using the smaller, noisier sensors found in smartphones or IoT devices.
Most of the time, data augmentation is applied in real time during training loops, but it's also possible to create augmented datasets as a completely separate step in your workflow before training begins.
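Here is a minimal sketch of augmentation applied to a tiny grayscale image stored as nested lists. The flip probability, brightness range, and noise range are arbitrary assumptions, and a real pipeline would normally use an image library rather than plain lists.

```python
import random

def augment(image, box, rng):
    """Randomly flip and brighten a grayscale image, keeping its box in sync.

    image: list of rows of pixel values in [0, 255]
    box: (x_min, y_min, x_max, y_max) in pixel coordinates
    """
    width = len(image[0])
    # Random horizontal flip: mirror the pixels and the box's x-coordinates.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        x_min, y_min, x_max, y_max = box
        box = (width - x_max, y_min, width - x_min, y_max)
    # Random brightness shift plus per-pixel noise, clamped to the valid range.
    shift = rng.randint(-30, 30)
    image = [
        [min(255, max(0, p + shift + rng.randint(-5, 5))) for p in row]
        for row in image
    ]
    return image, box

rng = random.Random(42)
image = [[100] * 8 for _ in range(4)]
aug_image, aug_box = augment(image, (1, 0, 3, 2), rng)
```

Note that the annotation is transformed together with the pixels. Forgetting to flip the bounding box when the image is flipped is a common source of silently corrupted training data.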
Over the past few years, a new data source has emerged, and it's radically changing the economics of machine learning: synthetic data. Rather than collecting and annotating data by hand, we're getting better at creating it programmatically, and in some cases, it's even better for training models than the stuff collected from the real world.
Synthetic data is data that's generated programmatically. For example: photorealistic images of objects in arbitrary scenes rendered using video game engines, or audio generated by a speech synthesis model from known text. It's not unlike traditional data augmentation where crops, flips, rotations, and distortions are used to increase the variety of data that models have to learn from.
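As a toy illustration of the idea, the sketch below "renders" a bright rectangle on a dark background and returns the bounding box that comes for free with it. Real synthetic pipelines use game engines or generative models; the function name and image sizes here are assumptions made for the example.

```python
import random

def make_synthetic_sample(width, height, rng):
    """Render a bright rectangle on a dark background and return its box."""
    box_w = rng.randint(2, width // 2)
    box_h = rng.randint(2, height // 2)
    x_min = rng.randint(0, width - box_w)
    y_min = rng.randint(0, height - box_h)
    image = [[0] * width for _ in range(height)]
    for y in range(y_min, y_min + box_h):
        for x in range(x_min, x_min + box_w):
            image[y][x] = 255
    # The annotation is exact because we placed the object ourselves.
    return image, (x_min, y_min, x_min + box_w, y_min + box_h)

rng = random.Random(0)
image, box = make_synthetic_sample(16, 16, rng)
```

No human labeling is involved: every generated image arrives with a pixel-perfect annotation.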
While no small feat, collecting a dataset that's suitable to train a machine learning model is only part of the equation. The next step is using that data to train a machine learning model.
Thus far, we've used the phrases machine learning, AI, and deep learning interchangeably. Technically speaking, neural networks are just one type of machine learning model. Within the family of neural network models, a subset has proven to be particularly powerful. "Deep" models, featuring many stacked layers, perform extremely well on many tasks, which has given rise to the field of "deep learning".
Many machine learning algorithms don't use neural networks at all, but the versatility of neural networks makes them a popular choice for many projects. The more traditional, non-neural methods are typically considered less suitable for powering ML experiences on devices with compute and power constraints.
Underneath every machine learning model is a low-level framework supplying the basic operations used to train a model and make predictions. Though there are more than a dozen frameworks in use, the vast majority of deep learning projects are written within the TensorFlow or PyTorch ecosystems. If you're just getting started with mobile machine learning and you've done some research on a particular model or feature you hope to build, chances are you've come across a repository that uses one of these tools.
TensorFlow is one of the fastest-growing and most popular open source software projects of all time. You've probably heard TensorFlow in association with neural networks and deep learning, but it's a general framework for executing numeric operations using data flow graphs.
In recent months, TensorFlow has made a large push to improve usability and performance. Eager execution and prioritization of the Keras API make it easier than ever to build and debug models, while the new MLIR compiler and GPU delegates make it possible to run performant models on any hardware, from CPUs to TPUs. Specifically, TensorFlow Lite has become the de facto model specification for running neural networks on Android devices and microcontrollers. More on that later.
Keras is a popular Python-based framework for deep learning, most often used with a TensorFlow backend. For mobile ML, we recommend Keras over TensorFlow directly because it's simpler, and coremltools has the easiest time converting Keras models to Core ML. TensorFlow models can be extracted for conversion to TensorFlow Lite, as well.
Developed and maintained by Facebook AI, PyTorch is a relative newcomer to the deep learning scene, celebrating its third birthday in January 2020. In just a few years, though, it's gained considerable traction and is the framework of choice for popular deep learning courses like fast.ai. PyTorch has become the preferred framework for deep learning researchers due to its simple but flexible API. Recently, PyTorch has improved its offering for production use cases with the introduction of PyTorch Serving and PyTorch Mobile (covered below).
While models built with the above frameworks will work on server-side applications, they won't yet run on smartphones. To do that, you'll need to convert your model (which we'll cover later) into a mobile-ready model format. The model format you'll need will depend on the platform you're deploying the model to (iOS or Android).
Core ML was announced by Apple at WWDC '17. It's a specification for trained models that Apple devices can parse and compile into hardware-accelerated machine learning code. Converters now exist to transform models from most frameworks (e.g. TensorFlow or PyTorch) into Core ML files, which can be added to your iOS project. The most recent version of Core ML includes a promising first step towards on-device training. It also allows developers to write custom operations that leverage all available hardware, making it possible to implement almost any model if you're willing to dig into low-level GPU code.
If you're feeling up to it, you can convert raw TensorFlow models for use on-device. TensorFlow Lite is a pared-down, mobile-optimized runtime to execute models in iOS and Android apps. For now, though, there's limited support for many operations, and performance can be somewhat lacking. However, the TFLite team is making significant progress, and some recent benchmarks suggest performance improvements, especially when using the GPU delegate. One additional note: TensorFlow Lite has recently enhanced its support for iOS, adding a Core ML delegate that lets TFLite models run through Core ML.
PyTorch's cross-platform mobile ML framework is a relatively new tool for helping mobile developers and machine learning engineers embed PyTorch models on-device. Currently, it allows any TorchScript model to run directly inside iOS and Android applications. It also includes support for model optimization techniques like quantization and a dedicated runtime. It's still in its early days, and as such is a bit behind in terms of functionality and model performance, and the framework specifically lacks GPU support for now.
OpenCV is one of the most mature, well-supported computer vision frameworks. Both iOS and Android have been supported for years and there are tons of algorithms already implemented, including the latest neural networks. Like PyTorch Mobile, OpenCV does not currently support the use of GPUs on mobile devices.
The problem you're tackling, in terms of the kind of model you'll need, may have already been solved for you. Things like general image recognition with thousands of object categories, language translation, and voice recognition are all offered as APIs by major providers and startups alike.
Pre-trained models are great for getting started quickly, building an MVP, or validating an idea. For example, you might want to test out a feature that automatically organizes a user's photos via an image classification model. You can get up and running quickly with a pre-trained model that predicts 1000 labels from the popular ImageNet dataset. No training required. If the user experience is successful, you can then move up in sophistication and think about training a custom model for your use case.
There are a few popular SDKs providing high quality pre-trained mobile-ready models.
When deploying ML models to mobile, it's important to remember that these devices have certain power and compute resource constraints. As such, understanding and investing in tools and processes that optimize on-device models is crucial in order to effectively balance model size, speed, and accuracy.
Choosing the right architecture is one of the most important decisions you'll have to make. Many popular neural networks such as VGG or Mask-RCNN rose to fame thanks to their incredibly accurate predictions. Unfortunately, these models often contain hundreds of millions of parameters and can take up as much as 500MB of disk space.
This isn't going to cut it for mobile devices. Instead, mobile machine learning use cases require smaller, more efficient architectures like MobileNet or SqueezeNet. These models take up a fraction of the space (5-15MB) while sacrificing only a few percentage points of accuracy.
It's also important that architecture selection takes specific layers and mathematical operations into account. State-of-the-art models from the latest papers may have great performance on the latest generation of GPUs, but if mobile hardware doesn't support the specific calculations made within the model, they may not run at all or will be relegated to the CPU, making them unusable for your app.
By default, most machine learning models are trained with parameters stored as 32-bit floating point numbers. In practice, there is no reason for calculations to be accurate out to the 8th decimal place. Quantizing model parameters to 8-bit integers or smaller can reduce model size by a factor of 4 or more while improving speed. Amazingly, if quantization is simulated during training, this compression results in almost no loss in accuracy. Core ML supports quantization, as does TensorFlow in its broader Model Optimization Toolkit.
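To make the idea concrete, here is a minimal sketch of one simple scheme: mapping a list of float weights onto 8-bit integers with a shared scale and offset. This is an illustration written for this guide, not the method used by Core ML or TensorFlow, and the function names and sample weights are assumptions.

```python
def quantize(weights, num_bits=8):
    """Map floats to unsigned integers with a shared scale and minimum."""
    q_max = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are identical.
    scale = (w_max - w_min) / q_max or 1.0
    quantized = [round((w - w_min) / scale) for w in weights]
    return quantized, scale, w_min

def dequantize(quantized, scale, w_min):
    """Recover approximate float weights from their integer codes."""
    return [q * scale + w_min for q in quantized]

weights = [-0.51, -0.02, 0.0, 0.13, 0.74]
q, scale, w_min = quantize(weights)
restored = dequantize(q, scale, w_min)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the rounding error per weight is at most half of `scale`, which is where the "factor of 4" saving with little accuracy loss comes from.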
It turns out that only a very small fraction of a neural network's parameters are responsible for accurate predictions. Pruning techniques iteratively remove useless parameters during training, resulting in smaller, faster models, without a loss of accuracy. TensorFlow's Model Optimization Toolkit also includes support for model pruning.
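A much-simplified sketch of the idea is shown below: a single pass of magnitude-based pruning that zeroes out the smallest weights. Real pruning, as described above, is applied iteratively during training so the remaining weights can adapt; the function name, weights, and sparsity level here are assumptions for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

weights = [0.9, -0.01, 0.03, -0.7, 0.002, 0.4, -0.05, 0.0004]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.4, -0.05, 0.0]
```

The zeros compress well on disk, and runtimes that support sparse computation can skip them entirely.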
Mobile frameworks like Core ML and TensorFlow Lite have their own conversion tools, but even so, conversion remains a tricky proposition. There are a couple of best practices that will help ensure model conversion works properly. First, test model conversion early and often to ensure that the mathematical operations underlying deep learning architectures are supported by mobile model formats. And second, stay up to date with the newest converter releases, as these versions offer the broadest operation support and the best performance. New releases can be unstable, however, so test them regularly.
Federated learning is a model training technique that enables devices to learn collaboratively from a global, cloud-based model. This global model is first trained server-side, and each device then downloads the model and improves it by training a new model version on data collected locally on the device. These model changes are summarized as an update and sent back to the global model in the cloud. It's a promising technique for increasing model personalization and ensuring user data privacy. TensorFlow has its own open-source framework, TensorFlow Federated, for training models on decentralized data.
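The server-side step of combining device updates can be sketched as a weighted average, a scheme commonly known as federated averaging. The text above doesn't specify an aggregation method, so treat this as one plausible illustration; the function name and the toy numbers are assumptions, and this is not the TensorFlow Federated API.

```python
def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by local example counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical devices, each holding a locally trained 2-parameter model.
client_weights = [[1.0, 0.0], [2.0, 2.0], [4.0, 1.0]]
client_sizes = [10, 30, 60]  # number of local training examples per device
global_weights = federated_average(client_weights, client_sizes)
```

Only the weights and example counts leave each device. The raw data used for local training never does, which is the source of the privacy benefit.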
A prediction is the output of a machine learning model. Predictions may also be called inferences.
Accuracy is a quantitative measurement of how closely model predictions match ground-truth data. Quantifying accuracy for some types of models is easy. For example, the accuracy of an image classification model is the fraction of images the model correctly labels. For other model types, accuracy is more difficult to measure. For example, how do we define the accuracy of an object detection model that draws bounding boxes?
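The sketch below shows the easy case, classification accuracy, next to one widely used answer to the bounding box question: intersection over union (IoU), which scores how much a predicted box overlaps the ground-truth box. The function names and sample values are assumptions for illustration.

```python
def classification_accuracy(predictions, labels):
    """Fraction of predictions that exactly match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0

acc = classification_accuracy(
    ["cat", "dog", "dog", "cat"],  # model predictions
    ["cat", "dog", "cat", "cat"],  # ground truth
)
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(acc, round(overlap, 3))  # 0.75 0.333
```

An IoU of 1.0 means the boxes match exactly and 0.0 means they don't overlap at all, so a detection is typically counted as correct when its IoU clears a chosen threshold.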
One of the most important things for any mobile machine learning project is developing a good measure of accuracy that you can map onto the user experience. For example, an image classification model that achieves 95% accuracy when organizing photo albums may be more than enough to provide a seamless user experience. But a language translation model that achieves 95% word accuracy may be completely useless.
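A quick calculation shows why the same headline number can mean very different things. The sketch assumes word errors are independent and that a sentence is only usable when every word is right; both are simplifications, and the 20-word sentence length is an arbitrary choice.

```python
def sentence_accuracy(word_accuracy, words_per_sentence):
    """Chance that every word is right, assuming independent word errors."""
    return word_accuracy ** words_per_sentence

p = sentence_accuracy(0.95, 20)
print(f"{p:.2f}")  # 0.36
```

Under these assumptions, a translator with 95% word accuracy gets only about a third of 20-word sentences fully right, while a photo organizer with 95% accuracy misfiles just one photo in twenty.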
Model size refers to how much space models take up on disk or in memory. Size is more important for mobile machine learning projects than server-based projects. Neural networks that are hundreds of megabytes or even gigabytes in size may be just fine when deployed on large GPUs in the cloud, but they would never fit onto a mobile device. Techniques like pruning and quantization (covered above) help shrink models for edge deployment.
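A rough size estimate follows directly from the parameter count and the number of bits used to store each parameter. The parameter counts below are assumed round numbers for a large and a small architecture, and the estimate ignores file-format overhead.

```python
def model_size_mb(num_parameters, bits_per_parameter=32):
    """Approximate size of the stored weights in megabytes (10**6 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e6

# Assumed round parameter counts, for illustration only.
large = model_size_mb(138_000_000)         # 552.0 MB as 32-bit floats
small = model_size_mb(4_000_000)           # 16.0 MB as 32-bit floats
small_int8 = model_size_mb(4_000_000, 8)   # 4.0 MB after 8-bit quantization
print(large, small, small_int8)
```

Choosing a compact architecture and quantizing its weights compound: in this example the two steps together shrink the footprint by more than 100x.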
We've only scratched the surface of the many tools, frameworks, and terms you'll come across on your mobile ML journey. But our hope is that, with this guide, you'll have steady footing as you embark.
There's a whole lot more to know about many of the entries in this glossary. Here are a few additional resources that will help you explore mobile ML in more depth:
A curated list of awesome mobile machine learning resources for iOS, Android, and edge devices.
Our sponsored publication, which covers the intersection of mobile dev and ML.
The Mobile Machine Learning Lifecycle [Ebook]
An inside look at the challenges and opportunities of mobile machine learning.
Machine Learning on Mobile [Ebook]
14 real-world examples of what machine learning can do on mobile.