
Meet the Emmy-winning engineer whose algorithms are behind your Netflix binge

Al Bovik Portrait

Photo Credit: Jesse Petersen

Every time you hit play on a video, chances are you have Al Bovik to thank for its visual quality.

Bovik, professor and Provost’s Chair in Engineering in the Department of Electrical, Computer and Energy Engineering, has spent decades developing algorithms that now influence nearly 80% of internet and social media content.

At the center of his work is digital visual perception: using the neuroscience of human vision to make streamed video look as sharp and natural as possible. His work is used by familiar brands like Netflix, Amazon and YouTube.

Understanding not just how cameras capture patterns of light, Bovik explains, but how the brain interprets them is what drives his research.

“The question that really gripped me over time was: can we model mathematically how we see?” Bovik says. “That’s a very different and much harder problem.”

His achievements in visual perception processing have landed him two Emmys: a Primetime Emmy Engineering Award and a Technology and Engineering Emmy from the Academies of Television Arts and Sciences. They also earned him the IEEE Edison Medal, an honor he shares with Alexander Graham Bell, Nikola Tesla and Ray Dolby.

We sat down with Bovik to discuss his career, the neuroscience hiding behind your favorite TV show or movie, and why his proudest achievement isn’t just theories and algorithms.

For someone outside the field, how would you describe what digital processing is?

At its simplest, image processing is about manipulating visual information using computations. Digital processing involves inventing theories and algorithms to help make television and movies more efficient, faster and higher quality. What I do is more than that. It is modeling the visual parts of the brain mathematically, then using those models to create algorithms for better photography, TV shows and movies.
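To make that concrete, here is a minimal Python sketch of what “manipulating visual information using computations” can look like. The image and the two operations are invented for illustration; this is not code from Bovik’s lab.

```python
import numpy as np

# A tiny synthetic grayscale "image": a horizontal brightness gradient
# with a bright square in the middle (pixel values in [0, 1]).
img = np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
img[96:160, 96:160] = 1.0

# Manipulation 1: contrast stretch (a pointwise computation).
stretched = np.clip((img - img.mean()) * 1.5 + img.mean(), 0.0, 1.0)

# Manipulation 2: 3x3 box blur (a neighborhood computation),
# implemented by averaging shifted copies of the image.
blurred = sum(
    np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
) / 9.0

print(stretched.std() > img.std())  # True: contrast increased
print(blurred.std() < img.std())    # True: smoothing reduced detail
```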

What first drew you toward the field of digital processing?

I’m a deeply visual person. Whenever I travel, the first place I go is an art museum. If I go a week without seeing a movie, I go into withdrawal. I’m a visual, spatial thinker, and suddenly here was a field that lived at the intersection of mathematics and how we see the world. Then I took an image processing class from Thomas Huang, one of the inventors of image compression, and everything changed overnight. I knew immediately: This is what I want to do. I’ve never looked back.

What does the science of human vision reveal about how we see digital content?

We know that image processing happens in various brain centers, including the primary visual cortex, at the very back of the brain. Vision requires processing an enormous amount of raw information, compressing it into concise, efficient representations that the brain can use to recognize a car on the highway or track a bird in flight. We can model that mathematically and start exploring questions like why we look where we look, or where your gaze lands when you’re driving. The same holds true for videos: your eyes are directed to certain areas when viewing a particular scene.
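The interview doesn’t name a particular model, but one classic mathematical account of early processing in the visual cortex is the Gabor filter: a sinusoidal grating under a Gaussian window that behaves like an orientation-tuned receptive field. Below is a minimal illustrative sketch under that assumption, not research code from Bovik’s group.

```python
import numpy as np

def gabor_kernel(size=31, wavelength=8.0, theta=0.0, sigma=4.0):
    """2D Gabor filter: a sinusoid under a Gaussian window, a classic
    mathematical model of an orientation-tuned receptive field in the
    primary visual cortex."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # rotate the grating
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# A small bank of oriented filters, loosely analogous to populations of
# orientation-selective neurons: each responds most to its own angle.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]

# Inner product of each filter with a vertical edge: the response
# magnitude is largest for the vertically tuned (theta = 0) filter.
edge = np.zeros((31, 31))
edge[:, 16:] = 1.0
print([round(float(abs((k * edge).sum())), 3) for k in bank])
```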

What are the acclaimed algorithms you invented, the ones people don’t necessarily notice?

We created a variety of algorithms used throughout the streaming and social media industries. They are built on mathematical models of how the human brain perceives visual distortions, and they use those models to predict how a person will rate the visual quality of a picture or video. For example, they are widely used to control the compression of television and movies streamed worldwide. Compression is necessary because videos are huge and would not be practically streamable otherwise. One of them, called structural similarity (SSIM), allows the big streamers and social media platforms to compress content as much as possible, to the point just before noticeable distortions appear. Engineers at companies like Netflix, Meta Platforms, Amazon and YouTube use this technology.
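SSIM has a published closed form. The sketch below applies the standard formula globally over a whole image; the published algorithm and production implementations compute the same terms in small sliding windows and average the results.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM: compares luminance (means), contrast
    (variances) and structure (covariance) of two images."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
ref = rng.random((64, 64))  # stand-in for a pristine image
noisy = np.clip(ref + 0.1 * rng.normal(size=ref.shape), 0.0, 1.0)
print(ssim_global(ref, ref))    # 1.0: identical images
print(ssim_global(ref, noisy))  # < 1.0: distortion lowers the score
```

In a compression pipeline, an encoder can keep raising compression until a score like this drops to the threshold where viewers would start to notice.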

Can you walk us through what’s happening technically when someone presses play on Netflix?

Let’s say you’re watching Stranger Things. The moment you start a scene, up in the cloud, approximately 20 different versions of that scene have already been prepared, each compressed a different amount and each perceptually optimized using our algorithms. Some are also spatially downsampled: a 4K video might have versions encoded at 2K or even lower resolution.

Your device, whether it’s a phone or a TV, measures the available bandwidth, which changes constantly, especially if someone is on the move in a city with tall buildings, and requests whichever of those 20 versions best fits your current conditions. This happens scene by scene, continuously.

Here’s the part that surprises most people: You might think you’re watching 4K, but if your bandwidth is constrained, you might actually be receiving a heavily compressed 2K version that’s been decompressed and upsampled back to 4K on your TV. Visually, you can’t tell the difference because of our video quality algorithms.
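Here is a toy sketch of the selection logic described above. The ladder and every number are invented for illustration; a real player also weighs buffer level, screen size and codec support.

```python
# Hypothetical bitrate ladder: (label, resolution, bitrate in kbit/s),
# ordered best-first. Real services prepare ~20 perceptually optimized
# versions per scene; this toy uses 6.
LADDER = [
    ("4K-high", "3840x2160", 16000),
    ("4K-low",  "3840x2160", 10000),
    ("2K",      "2560x1440",  6000),
    ("1080p",   "1920x1080",  4000),
    ("720p",    "1280x720",   2500),
    ("480p",    "854x480",    1000),
]

def pick_version(measured_kbps: float, headroom: float = 0.8):
    """Request the highest-quality version whose bitrate fits the
    current bandwidth estimate, keeping headroom for fluctuations."""
    budget = measured_kbps * headroom
    for label, resolution, kbps in LADDER:
        if kbps <= budget:
            return label, resolution, kbps
    return LADDER[-1]  # fall back to the lowest rung

print(pick_version(25000))  # strong connection: a full 4K version
print(pick_version(5500))   # constrained: 1080p, upsampled on the TV
```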

Your algorithms also help determine how much video can be compressed before viewers notice a difference. How does that work?

Another algorithm we developed, called visual information fidelity, or VIF, predicts how a person will perceive the quality of a video after it has been compressed. It tells the Netflix video quality system the point where distortions may become visible. Netflix’s video streaming is built on these neuroscience principles, and I sometimes say that they have now become a visual neuroscience company.
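The published VIF is information-theoretic: it models wavelet subbands of natural images with a Gaussian scale mixture and asks how much of the reference image’s information survives distortion and “neural noise.” The single-band, pixel-domain toy below keeps only that core ratio idea; it is a loose illustration, not the actual algorithm.

```python
import numpy as np

def vif_toy(ref, dist, sigma_n2=0.1, eps=1e-12):
    """Toy VIF-like score: fraction of the reference's information a
    viewer could still extract from the distorted image, modeling the
    distortion as a gain g plus additive noise, with 'neural noise'
    variance sigma_n2 in the eye/brain."""
    var_ref = ref.var()
    cov = ((ref - ref.mean()) * (dist - dist.mean())).mean()
    g = cov / (var_ref + eps)                   # estimated distortion gain
    noise_var = max(dist.var() - g * cov, 0.0)  # estimated added noise
    info_dist = np.log2(1 + g**2 * var_ref / (noise_var + sigma_n2))
    info_ref = np.log2(1 + var_ref / sigma_n2)
    return info_dist / (info_ref + eps)

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
blurry = (ref + np.roll(ref, 1, axis=1)) / 2  # mild horizontal blur
print(vif_toy(ref, ref))     # ~1.0: nothing lost
print(vif_toy(ref, blurry))  # < 1.0: information lost to distortion
```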

Professor Al Bovik and two former PhD students at the 2015 Primetime Emmy Engineering Awards ceremony.

How did your first successful model, structural similarity, come about?

Almost by accident, honestly. My students and I were working on video compression, and we ran into a fundamental problem: How do you even measure whether your results are good? How does a human perceive the quality of a picture? Nobody had really solved that, and most researchers thought it was unsolvable. So we built our own model. We were amazed when the entire television industry noticed and adopted it. The streaming world discovered it early while they were wrestling with the question of how much to compress video before it starts looking distorted to a viewer. This was especially important since the new wireless/smartphone systems had very limited bandwidth. SSIM gave them a way to find that compression point and deliver perceptually compressed videos to everyone. Every photo uploaded to Facebook, Instagram, WhatsApp or Reels is now optimized using a model rooted in visual neuroscience. We had successfully introduced the principles of visual neuroscience throughout the internet.

You’ve also worked with Meta for nearly a decade on virtual and augmented reality. What does that world look like?

It’s one of the most exciting problems I’ve worked on. Imagine wearing advanced AR glasses here in Colorado while your colleague wears a similar pair in Paris. You can see each other in 3D, in real time, as if you’re in the same room. The challenge is that the display is an inch from your eye, so you need a far denser resolution, perhaps 8K or 16K, which means vastly more data to compress and transmit. Our approach is the avatar model: rather than sending a live 3D video feed, you build a photorealistic 3D model of yourself that is stored on your friend’s AR glasses, and you transmit only your facial movements, tracked by cameras and image processing in your own glasses, which requires far less bandwidth. The 3D avatar is animated in real time on the receiving end.
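A back-of-the-envelope comparison of the two approaches. Every number below is an illustrative assumption, not a figure from Meta or Bovik’s lab.

```python
# Option A: stream compressed video at headset-class resolution.
width, height, fps = 7680, 4320, 60   # hypothetical "8K" at 60 frames/s
bits_per_pixel = 0.1                  # assumed after heavy compression
video_bps = width * height * fps * bits_per_pixel

# Option B: the avatar model. The photorealistic 3D model already lives
# on the receiver's glasses, so only animation parameters are sent
# (e.g., facial-expression coefficients and head pose, tracked by the
# sender's own cameras).
n_params, bits_per_param = 256, 16    # hypothetical rig size
avatar_bps = n_params * bits_per_param * fps

print(f"video:  {video_bps / 1e6:9.1f} Mbit/s")
print(f"avatar: {avatar_bps / 1e6:9.3f} Mbit/s")
print(f"avatar needs ~{video_bps / avatar_bps:,.0f}x less bandwidth")
```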

What are you most proud of from your teaching career and your partnerships with some of the largest digital giants?

I ask myself, “Am I giving my students the best possible opportunities?” My students are not just programmers, and they’re not just video engineers. They’re also trained as visual psychologists and neuroscientists. The thing I’m most proud of is the success of my students. The Netflix video team is largely composed of students from our Laboratory for Image and Video Engineering (LIVE). What matters most to me are the people who came through this lab and went on to shape an industry. No fewer than six of my students have Emmy statuettes on their shelves at work or home. If I were to ask myself why I’m here at CU Boulder, that’s the answer, along with living in the Colorado mountains!

What’s an aspect that people may not realize about your work in image processing?

The internet now accounts for nearly 10% of global carbon emissions, and that share is growing fast. Our algorithms help reduce the volume of internet video data, which makes up about 80% of internet traffic, by nearly 25%. That works out to roughly a 20% cut in total internet traffic, since 25% of the 80% that is video is 20% of the whole. By reducing the amount of data moving through global networks, we are shaving off a meaningful fraction of that footprint, and the ecological impact is real.

Burning question: Do you have a favorite movie and show that has used your algorithm?

Pretty much any movie or TV show I watch will have been processed by these algorithms. These include British mystery shows like Broadchurch, Grace and Prime Suspect, which my wife and I watch all the time, and movies with great acting, cinematography and directing, like The Godfather, 2001: A Space Odyssey, Blade Runner, Spartacus, Gladiator and many more. This year, I especially liked Sinners and One Battle After Another.