Style transfer is a computer vision technique that allows us to recompose the content of an image in the style of another. If you’ve ever imagined what a photo might look like if it were painted by a famous artist, then style transfer is the computer vision technique that turns this into a reality.
But how does style transfer actually work? What are the different approaches, what are its potential benefits and limitations, and how might you use it in your business?
In this guide, you’ll find answers to all of those questions and more. Whether you’re an experienced machine learning engineer considering implementation, a developer wanting to learn more, or a product manager looking to explore what’s possible with computer vision and style transfer, this guide is for you.
Here’s a look at what we’ll cover:
Style transfer is a computer vision technique that takes two images—a content image and a style reference image—and blends them together so that the resulting output image retains the core elements of the content image, but appears to be “painted” in the style of the style reference image.
A selfie, for example, would supply the content image, while a van Gogh painting would serve as the style reference image. The result would be a self-portrait that looks like a van Gogh original!
Here’s what this looks like in practice:
Style transfer is an example of image stylization, an image processing and manipulation technique that’s been studied for several decades within the broader field of non-photorealistic rendering.
If we think about solving this task via a traditional supervised learning approach, learned style transfer requires a pair of input images—both an original image and an artistic representation of that original image. From there, a machine learning model learns the transformation and can apply it to new original images.
Unfortunately, this approach is largely impractical, as these kinds of image pairs rarely exist. In recent years, a new approach, neural style transfer (NST), has changed what’s possible. NST employs deep neural networks to power these transformations. Neural networks are used to extract statistical features of images related to content and style, so that we can quantify how well the style transfer is working without requiring explicit image pairs. With this improved approach, only a single style reference image is needed to apply that style to any number of content images.
Early versions of NST, however, were not without shortcomings. They treated the task as an optimization problem, requiring hundreds or thousands of iterations to perform style transfer on a single image. To tackle this inefficiency, researchers developed what’s referred to as fast neural style transfer. Fast style transfer also uses deep neural networks but trains a standalone model to transform any image in a single, feed-forward pass. Trained models can stylize any image with just one iteration through the network, rather than thousands.
State-of-the-art style transfer models can even learn to imprint multiple styles via the same model so that a single input content image can be edited in any number of creative ways.
For the rest of this guide, we’ll use the term “style transfer” as a stand-in for current and state-of-the-art approaches to NST.
If we’re being honest with ourselves, not everyone is born an artist. Some of us are more adept with language, logic, or other skills. But with recent advances in technologies like style transfer, almost anyone can enjoy the pleasure that comes with creating and sharing an artistic masterpiece.
This is where the transformative power of style transfer lies. Artists can easily lend their creative aesthetic to others, allowing new and innovative representations of artistic styles to live alongside original masterpieces.
In addition to empowering people all around the world to experiment with their own creativity, we see the importance of style transfer playing out in the commercial art world. In 2018, Christie’s auctioned an AI-generated artwork that sold for more than $430,000.
And with the continued improvement of AI-accelerated hardware, both in the cloud and on the edge, style transfer can now be applied to captured and live video. This new capability opens up endless doors in design, content generation, and the development of creativity tools.
Given this evolution, we can see how style transfer can be applied in a number of ways:
Of course, this isn’t an exhaustive list, but it includes some of the primary ways in which style transfer is shaping our future.
Now that we know a bit about what style transfer is, the distinctions between different types of style transfer, and what it can be used for, let’s explore in more depth how it actually works.
In this section, we’ll look at several deep learning-based approaches to style transfer and assess their advantages and limitations. While a number of traditional, non-neural methods exist, for the purposes of this overview we’re going to focus on the approaches that use neural networks, which have become the state-of-the-art methods for style transfer.
In general, deep learning architectures suitable for style transfer are based on variations of convolutional neural networks (CNNs). For a gentle introduction to CNNs, check out this overview.
Tracing the evolution of style transfer research is a helpful way to make key distinctions between approaches. From single- and multiple-style model methods to arbitrary-style model approaches, examining the continued development of these techniques and optimizations can help us understand what’s possible with style transfer.
Training a style transfer model requires two networks: a pre-trained feature extractor and a transfer network. The pre-trained feature extractor is used to avoid having to use paired training data. Its usefulness arises from the curious tendency for individual layers of deep convolutional neural networks trained for image classification to specialize in understanding specific features of an image.
Some layers learn to extract the content of an image (the shape of a dog or the position of a car), while others learn to focus on texture (the small brush strokes of a painter or the fractal patterns of nature). Style transfer exploits this by running two images through a pre-trained neural network, looking at the pre-trained network’s output at multiple layers, and comparing their similarity. Images that produce similar outputs at one layer of the pre-trained model likely have similar content, while matching outputs at another layer signals similar style.
The pre-trained model enables us to compare the content and style of two images, but it doesn't actually help us create the stylized image. That’s the job of a second neural network, which we’ll call the transfer network. The transfer network is an image translation network that takes one image as input and outputs another image. Transfer networks typically have an encoder-decoder architecture.
At the beginning of training, one or more style images are run through the pre-trained feature extractor, and the outputs at various style layers are saved for later comparison. Content images are then fed into the system. Each content image passes through the pre-trained feature extractor, where outputs at various content layers are saved. The content image then passes through the transfer network, which outputs a stylized image. The stylized image is also run through the feature extractor, and outputs at both the content and style layers are saved.
The quality of the stylized image is defined by a custom loss function that has terms for both content and style. The extracted content features of the stylized image are compared to the original content image, while the extracted style features are compared to those from the reference style image(s). After each step, only the transfer network is updated. The weights of the pre-trained feature extractor remain fixed throughout. By weighting the different terms of the loss function, we can train models to produce output images with lighter or heavier stylization.
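To make that loss function concrete, here’s a minimal numpy sketch of the content and style terms, assuming the feature maps have already been extracted from the chosen layers of the frozen feature extractor. The Gram matrix is the standard statistic used to summarize style; the function names and default weights here are illustrative, not from any particular paper’s implementation:

```python
import numpy as np

def gram_matrix(features):
    """Style statistic: channel-wise correlations of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def style_transfer_loss(stylized_content_feats, content_feats,
                        stylized_style_feats, style_feats,
                        content_weight=1.0, style_weight=1e3):
    """Weighted sum of content and style terms.

    Each *_feats argument is a list of (C, H, W) feature maps taken from
    the chosen content or style layers of the frozen feature extractor.
    """
    # Content term: how far the stylized image's content features have drifted
    content_loss = sum(np.mean((a - b) ** 2)
                       for a, b in zip(stylized_content_feats, content_feats))
    # Style term: how far the stylized image's Gram matrices are from the style's
    style_loss = sum(np.mean((gram_matrix(a) - gram_matrix(b)) ** 2)
                     for a, b in zip(stylized_style_feats, style_feats))
    return content_weight * content_loss + style_weight * style_loss
```

Raising `style_weight` relative to `content_weight` is what produces the heavier stylization mentioned above.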
There are too many specific neural network architectures to cover them all here, but we’ll highlight a few robust, reliable ones that make good places to start.
A 2016 paper by Johnson et al. was the first to train an independent neural network to stylize images in a single, feed-forward pass. A large VGG16 model pre-trained on ImageNet serves as the feature extractor, and a relatively small encoder-decoder network serves as the transfer network. In this approach, a single transfer network is trained for each desired style.
In 2017, a year after the original fast style transfer technique was published, researchers at Google extended the technique to allow a single transfer network to produce images in multiple styles and even blend more than one style together. Their main contribution was the inclusion of “conditional instance normalization” layers within the network so that the stylized image produced could be conditioned on an additional model input.
These networks take a content image as input, along with an additional vector telling the network how much of each style to apply to the image. So for example, a model could be trained on a van Gogh, a Picasso, and a Matisse painting. When it’s time to stylize an image, the user can input [1, 0, 0] for van Gogh, [0, 1, 0] for Picasso, or [0.33, 0.33, 0.33] for a blend of all three. This is a nice approach because it removes the need to train and store multiple models for multiple styles, and it provides creative freedom by letting users mix and match a continuum of styles.
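Here’s a minimal numpy sketch of how a conditional instance normalization layer might blend styles at inference time. In a real network the per-style scale and shift parameters (`gammas`, `betas`) are learned during training; the shapes and names here are illustrative:

```python
import numpy as np

def conditional_instance_norm(features, gammas, betas, style_weights, eps=1e-5):
    """Normalize each channel of a (C, H, W) feature map, then scale and
    shift with a blend of per-style learned parameters.

    gammas, betas: (num_styles, C) arrays, one row per style.
    style_weights: e.g. [1, 0, 0] for pure style 0, or
                   [0.33, 0.33, 0.33] for an even blend of three styles.
    """
    w = np.asarray(style_weights, dtype=float)
    gamma = w @ gammas  # (C,) blended per-channel scale
    beta = w @ betas    # (C,) blended per-channel shift
    # Instance normalization: zero mean, unit variance per channel
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]
```

Because only `gamma` and `beta` change between styles, the rest of the network’s weights are shared, which is what lets one model serve many styles.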
A common characteristic of both single and multi-style transfer models is that they can only produce images in styles that they’ve seen during training. A model trained on van Gogh’s work cannot produce images like Picasso without retraining the entire network. Arbitrary style transfer by Huang et al changes that. Arbitrary style transfer models take a content image and a style image as input and perform style transfer in a single, feed-forward pass. In essence, the model learns to extract and apply any style to an image in one fell swoop.
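The key operation behind Huang et al.’s approach is adaptive instance normalization (AdaIN), which shifts the channel-wise statistics of the content features to match those of the style features. A minimal numpy sketch of the operation itself (the surrounding encoder and decoder are omitted):

```python
import numpy as np

def adain(content_feats, style_feats, eps=1e-5):
    """Adaptive instance normalization: re-scale a (C, H, W) content
    feature map so each channel takes on the mean and standard deviation
    of the corresponding style feature channel."""
    c_mean = content_feats.mean(axis=(1, 2), keepdims=True)
    c_std = content_feats.std(axis=(1, 2), keepdims=True)
    s_mean = style_feats.mean(axis=(1, 2), keepdims=True)
    s_std = style_feats.std(axis=(1, 2), keepdims=True)
    # Strip the content's statistics, then impose the style's
    normalized = (content_feats - c_mean) / (c_std + eps)
    return s_std * normalized + s_mean
```

Because the style statistics are computed from the style input at inference time rather than learned per style, the model can apply styles it has never seen during training.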
Finally, there are a few optimizations and extensions of style transfer that are worth mentioning.
The first is stable style transfer. If one were to train a style transfer model and then apply that model to frames of a video, they would find that the result contains a fair amount of flickering. Even small changes and noise from frame to frame in a video can be enough to introduce distracting temporal inconsistencies. For example, a soccer ball may change color mid-flight. To solve this, stable style transfer models introduce a new loss term related to “temporal coherence”, where two sequential frames are stylized and compared so that the model learns to produce the same stylization for an object as it moves through the frame.
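As a rough sketch of the idea, the temporal loss term penalizes differences between consecutive stylized frames. Note this is deliberately simplified: real implementations first warp the previous frame toward the current one using optical flow and mask out occluded pixels, so that only genuinely static regions are compared:

```python
import numpy as np

def temporal_coherence_loss(stylized_prev, stylized_curr):
    """Simplified temporal loss: mean squared difference between two
    consecutive stylized frames of shape (C, H, W).

    A production version would warp stylized_prev with optical flow and
    apply an occlusion mask before taking the difference.
    """
    return np.mean((stylized_curr - stylized_prev) ** 2)
```

During training, this term is added to the content and style losses, trading a small amount of per-frame stylization quality for a stable, flicker-free video.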
Another extension of style transfer is color preservation. In some cases, we may want to transfer the brush strokes of an artist to an image but preserve the colors of the original palette. This can be done in a number of ways, including changing the input image representation from RGB to a different color space and applying style transfer to the luminance channel only, or by applying a color transfer algorithm to the final stylized image.
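The luminance-based approach can be sketched in a few lines of numpy, assuming float RGB images in [0, 1]. The RGB/YUV conversion coefficients below are the standard ITU-R BT.601 values; the function name is illustrative:

```python
import numpy as np

# ITU-R BT.601 RGB -> YUV conversion matrix (Y is luminance, U/V chrominance)
RGB2YUV = np.array([[0.299, 0.587, 0.114],
                    [-0.14713, -0.28886, 0.436],
                    [0.615, -0.51499, -0.10001]])
YUV2RGB = np.linalg.inv(RGB2YUV)

def preserve_color(original_rgb, stylized_rgb):
    """Keep the stylized luminance (brush strokes, shading) while restoring
    the original chrominance (color palette).

    Both images are (H, W, 3) float arrays in [0, 1].
    """
    orig_yuv = original_rgb @ RGB2YUV.T
    styl_yuv = stylized_rgb @ RGB2YUV.T
    # Luminance from the stylized image, chrominance from the original
    merged = np.concatenate([styl_yuv[..., :1], orig_yuv[..., 1:]], axis=-1)
    return np.clip(merged @ YUV2RGB.T, 0.0, 1.0)
```

This post-processing step works with any of the style transfer architectures discussed above, since it operates purely on the output image.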
Finally, while most style transfer use cases mentioned thus far have applied an artistic style to a photograph, it is possible to transfer other heuristics between two photorealistic images, such as weather or time of day. Photorealistic style transfer typically requires modifications to the encoder-decoder transfer network to reduce artifacting that’s introduced by deep convolution and upsampling layers.
If your use case requires that style transfer work in real-time, without internet connectivity, or on private data, you might be considering running your style transfer model directly on an edge device like a mobile phone or IoT board.
In those cases, you’ll need to choose specific model architectures to make sure everything runs smoothly on these lower power devices. Here are a few tips and tricks to ensure your models are ready for edge deployment:
To see just how small you can make these networks with good results, check out this post on creating a tiny style transfer model for mobile devices.
In this section, we’ll provide an overview of real-world use cases for style transfer. We’ve mentioned several of them in previous sections, but here we’ll dive a bit deeper and explore the impact this computer vision technique can have across industries.
Specifically, we’ll examine how style transfer can be used in the following areas:
One of the clearest applications of style transfer is in photo and video editing software. From sharing stylized selfies to augmenting user-generated music videos and beyond, the ability to add famous art styles to images and video clips promises to add unprecedented power to these kinds of creativity tools.
And given the flexibility and performance of current deep learning approaches, style transfer models can easily be embedded on edge devices—for instance, mobile phones—allowing for applications that can process and transform images and video in real-time. This means that professional-quality photo and video editing tools can become more widely accessible and easier to use than ever before.
There are already a number of incredible tools that employ style transfer as part of their toolkits. Here are a few fun projects, apps, and demos to play with:
From poetry to artwork to music, we’ve only just entered the era of AI creations. And style transfer is one of the computer vision techniques making this wave of AI-powered artwork possible.
Whether it’s artwork sold at a high-end auction or up-and-coming artists finding new ways to share their aesthetic with the world, style transfer promises to change the ways we think about art, what originality means, and how we present art in the real world.
We could also imagine style transfer being used to create reproducible, high-quality prints for office buildings, or for large-scale advertising campaigns. These are just a few possible ways in which style transfer could change how we view the commercial impacts of art.
At a press conference during the 2019 Game Developers Conference, Google introduced Stadia, its cloud-powered video game streaming service. And one of the primary features included in that demo was an in-game style transfer feature that automatically recomposes the virtual world with textures and color palettes from a potentially limitless range of art styles.
Google noted that efforts like these are intended to “empower the artist inside of every developer.” This implies that gaming, much like other applications of style transfer, will broaden the accessibility of artistic creation to those who we might not traditionally see as artists.
Much like gaming, where immersive virtual worlds represent the anchor of the user experience, virtual reality has also had its share of interest in exploring what’s possible with style transfer.
While the applications with style transfer in VR are still largely in the research phase, the possibilities are exciting and promising. Facebook touts the potential of style transfer to radically alter the ways VR developers tell visual stories through their applications, games, films, and more.
And there are some early demos that show how style transfer could help augment the immersiveness of worlds created for virtual reality. Here’s one example:
We hope the above overview was helpful in understanding the basics of style transfer and how it can be used in the real world. But as with all things, more answers lead to more questions.
This final section will provide a series of organized resources to help you take the next step in learning all there is to know about style transfer.
In the interest of keeping this list relatively accessible, we’ve curated our top resources for each of the following areas:
The benefits of using style transfer aren’t limited to applications that run on servers or in the cloud.
In fact, style transfer models can be made small and fast enough to run directly on mobile devices, opening up a range of possibilities, including powerful image and video editors, creativity tools, and more.
From brand loyalty, to user engagement and retention, and beyond, implementing style transfer on-device has the potential to delight users in new and lasting ways, all while reducing cloud costs and keeping user data private.
Fritz AI is the machine learning platform that makes it easy to teach devices how to see, hear, sense, and think. To learn more about how Fritz AI can help you build amazing style filters, powerful photo and video editors, and more, check out our Style Transfer API.
For more inspiration, check out our tutorials for creating masterpieces in real-time on iOS and for transforming a city square to a van Gogh painting on Android.