
[Under Construction] Building a 'shot-based' serverless AV1 video encoder in Azure


Streaming video takes center stage

Streaming video has become the primary way many people consume content, and the number of consumers paying for streaming video services has skyrocketed in recent years. A recent survey by Deloitte concluded that, for the first time, more US consumers pay for streaming video services (69%) than for cable or satellite TV (65%). It is not surprising that streaming video has taken over, given the excellent, ad-free user experience it offers.

The key enabler of streaming video is internet bandwidth, because the quality of a streamed video is directly tied to the number of bits used to represent its pixels, i.e. its bitrate. A video with a higher resolution (a larger number of pixels) therefore requires a higher bitrate, which means the user needs more bandwidth to stream it smoothly. To make this business model work without requiring every user to sign up for the highest-speed internet plan, streaming video providers rely on video compression codecs such as H.264, VP9, and H.265. These codecs can deliver quality comparable to physical media (Blu-ray) when viewed at normal TV-to-couch distances, at a fraction of the bitrate.
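
To put rough numbers on it: uncompressed 1080p video (1920 × 1080 pixels, 24 bits per pixel, 24 frames per second) works out to roughly 1.2 Gbps, while a typical H.264 stream of the same content looks good at around 5 Mbps, a reduction of well over 200x.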

There’s a problem in paradise

The video compression space is always evolving because of the constraints imposed by available internet bandwidth, especially in mobile scenarios. The most widely used video codec today is AVC/H.264, developed jointly by the ITU-T and the Moving Picture Experts Group (MPEG) and covered by patent pools, so licensing fees can apply before a content creator distributes video in H.264 format. Its successor, H.265 (HEVC), was approved in 2013 and was expected to offer about 25% to 50% bitrate savings for the same quality. However, H.265 has not been able to capture the market the way H.264 did, largely because of its patent licensing fees, which can run as high as $25 million annually. YouTube is one of the most notable streaming properties to skip H.265 adoption and rely on Google's own VP9 codec instead. When one of the top video streaming sites doesn't adopt the latest and greatest, you know there's a problem!

A problem is just an opportunity!

In 2015, a group of internet, content, and browser companies recognized this problem and formed the Alliance for Open Media (AOMedia), which features some of the biggest names in the industry, including Amazon, Apple, ARM, Cisco, Facebook, Google, IBM, Intel, Microsoft, Netflix and NVIDIA. The consortium's first order of business was to deliver a state-of-the-art codec that is also royalty free. AV1, released in 2018, fulfills that vision and promises to deliver about 30% better compression than H.265! YouTube has already started testing videos with AV1 through its TestTube page, and Netflix has shown its support by calling AV1 "our primary next-gen codec".


Video codecs generations diagram by Tsahi Levent-Levi (source)


AV1 is based largely on Google's VP9 codec and incorporates tools and technologies from Mozilla's Daala, Cisco's Thor, and Google's VP10 codecs. In its current form, the reference AV1 encoder is quite slow compared to existing H.265 and H.264 encoders. But there are efforts underway by folks at Mozilla and Xiph to build, from scratch, an AV1 encoder focused squarely on speed: rav1e!

Shots, Shots, Shots!

Until the encoding speed of AV1 encoders compares more favorably with existing encoders, this proof of concept demonstrates one way to speed up the overall encoding task. It does this by splitting the input video into "shots", an approach inspired by Netflix's approach to parallelizing its video encoding process. "Shots", as Netflix describes them, are "portions of video with a relatively short duration, coming from the same camera under fairly constant lighting and environment conditions. It captures the same or similar visual content, for example, the face of an actor standing in front of a tree and — most important — it is uniform in its behavior when changing coding parameters."
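
As a made-up example, a two-minute trailer might break down into shots such as 0:00–0:07 (a wide establishing shot), 0:07–0:13 (a close-up of an actor), 0:13–0:21 (dialogue over the actor's shoulder), and so on. Each of those ranges can be encoded independently on its own machine or container, and the encoded pieces can then be stitched back together in order, which is exactly what this proof of concept does with Azure Container Instances.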

The solution

Strap in! This is going to be a long one!

TL;DR

To implement this solution, we need an algorithm that splits the input video into shots. Fortunately for us, Microsoft Video Indexer supports this scenario. Before getting started, we'll set up Video Indexer in our subscription. For the rest of the steps, here's a quick overview of what's going to happen:

"shot-based" serverless distributed AV1 video encoder in Azure

  1. User uploads an MP4 video file to Azure Blob Storage
  2. Because of the Azure Event Grid integration with Azure Blob Storage, a file upload event triggers a notification
  3. The event notification is consumed by the first Logic App, whose first step is to upload the video to the Microsoft Video Indexer service
  4. Once the video is indexed, we retrieve the video insights and store them in the “insights” Azure File share
  5. While the video indexing is happening, we also copy the video file from Azure Blob Storage to the “source” Azure File share where it can be accessed by container instances later
  6. When the indexing is complete, an “Indexing complete” notification is sent to trigger the second Logic App
  7. In the second Logic App, the first step is to retrieve the video insights saved earlier
  8. Next, we use an Azure Function to parse the shots data and create our container instance definitions, as well as the shot encoding commands for each container instance (a sketch of this parsing, and of the concat file from steps 11 and 12, appears after this list)
  9. Now we can use the Logic App-Container Instance connector to create container instances based on container instance definitions defined in the last step
  10. As the container instances finish their respective encoding jobs, they save the output video in the “shots” Azure File share
  11. Next, we trigger another Azure Function to iterate over the output files and create an ffmpeg concat file
  12. Once we have a concat file, we create another container instance with ffmpeg installed to run the concat operation
  13. The output of the previous container instance, i.e. all the encoded shot files combined into a single file, is saved in the “output” Azure File share
  14. The user can then download the encoded file from the “output” Azure File share
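
To make steps 8, 11, and 12 more concrete, here is a rough Python sketch of what the shot-parsing function could do. It is only a sketch under several assumptions: the file paths, the libaom-av1 settings, and especially the shape of the Video Indexer insights JSON (the shots/instances/start/end fields) are illustrative and may need adjusting to match what your Video Indexer account actually returns. In the real solution the concat file is built by a separate Azure Function after the shot encodes finish; it is folded in here only to show the format.

    import json

    # Hypothetical paths on the mounted Azure File shares (adjust to your setup).
    INSIGHTS_PATH = "/aci/insights/insights.json"
    SOURCE_VIDEO = "/aci/source/myvideo.mp4"

    with open(INSIGHTS_PATH) as f:
        insights = json.load(f)

    # Assumption: the insights JSON exposes shots as
    # videos[0].insights.shots[*].instances[0].start/end timecodes.
    shots = insights["videos"][0]["insights"]["shots"]

    encode_commands = []
    concat_lines = []
    for i, shot in enumerate(shots):
        start = shot["instances"][0]["start"]  # e.g. "0:00:00"
        end = shot["instances"][0]["end"]      # e.g. "0:00:05.2"
        out_file = f"/aci/shots/shot_{i:03d}.mp4"
        # One ffmpeg AV1 (libaom-av1) encode per shot; each command becomes the
        # "command" of one container instance in a container group definition.
        encode_commands.append([
            "ffmpeg", "-i", SOURCE_VIDEO, "-ss", start, "-to", end,
            "-c:v", "libaom-av1", "-crf", "30", "-b:v", "0", "-cpu-used", "4",
            "-c:a", "copy", out_file,
        ])
        concat_lines.append(f"file '{out_file}'")

    # The concat file consumed by the final ffmpeg container (steps 11 and 12):
    #   ffmpeg -f concat -safe 0 -i concat.txt -c copy /aci/output/encoded.mp4
    with open("/aci/shots/concat.txt", "w") as f:
        f.write("\n".join(concat_lines) + "\n")
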
User Experience

While building this solution, I wanted to keep the user experience simple. Hence a user needs to take only these steps:

  1. Upload an MP4 video file to a specified Azure Blob Storage Account
  2. Download the encoded file from the “output” Azure File share

Implementation Details

Set up Microsoft Video Indexer
  1. Start by going to https://vi.microsoft.com/en-us/ and logging in with your Azure account
  2. Once logged in, click “Create new account”

    [Screenshot: Azure Video Indexer create account]

  3. Once you’ve logged into your Azure subscription, fill in the details for the Video Indexer instance you’d like to create.

    [Screenshot: Connect Azure Video Indexer to Azure subscription]

  4. It can take a few minutes for the Video Indexer to connect to your subscription. Once that is done, copy the account id of your new account

    [Screenshot: Azure Video Indexer account ID]

  5. Now log in to https://api-portal.videoindexer.ai/developer with your Azure account and copy the Primary or Secondary key

  6. That's it! The Video Indexer instance is now set up in your subscription. (The sketch below shows how this account ID and key can be used to request an access token, which is what the Logic App connector will do for us later.)
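
For those curious what the Logic App's "Get Account Access Token" action does behind the scenes, here is a minimal Python sketch of requesting a token with the account ID and API key we just copied. The endpoint shown is the Video Indexer v2 auth API as of this writing; treat the URL, the "trial" location value, and the roughly one-hour token validity as assumptions to verify against the current API documentation.

    import requests  # third-party package: pip install requests

    LOCATION = "trial"                      # or the Azure region of your account, e.g. "westus2"
    ACCOUNT_ID = "<account-id>"             # copied in step 4
    API_KEY = "<primary-or-secondary-key>"  # copied in step 5

    # allowEdit=true matches the "Allow Edit = Yes" setting used in the Logic App later.
    url = (f"https://api.videoindexer.ai/Auth/{LOCATION}/Accounts/"
           f"{ACCOUNT_ID}/AccessToken?allowEdit=true")
    response = requests.get(url, headers={"Ocp-Apim-Subscription-Key": API_KEY})
    response.raise_for_status()
    access_token = response.json()  # the token comes back as a JSON-encoded string, valid for about an hour
    print(access_token)
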

Blob upload events
  1. Create a storage account. I named mine “serverlessencodermedia”
  2. In the storage account, create a container called “media” in the “Blobs” section. This is where the user will upload an .MP4 video file.
  3. In the “Files” section, add 4 new file shares

    a. insights – we’ll store the insights about indexed video here
    b. output – we’ll store the full encoded video here that the user can download
    c. shots – we’ll store the individual encoded shots video files here
    d. source – we’ll store the user uploaded video file here for access by the container instances

    [Screenshot]

  4. Once the storage account is created, open its “Events” section and use the “When a new blob is uploaded” quick start Logic App to get started.

    [Screenshot]

  5. The next screen shows the Azure Blob Storage and Azure Event Grid connections

    [Screenshot]

  6. First, create the connection for the storage account you just created

    [Screenshot]

  7. Next, sign in to Azure Event Grid with your current Azure subscription. Once you've done these steps, you should see the following screen with green status!

    [Screenshot]

  8. Hit continue and you should now land on the Logic Apps designer

  9. In the “When a resource event occurs” trigger:
    a. select Event Type Item of Microsoft.Storage.BlobCreated
    b. Add two new parameters: “Suffix Filter” with value “.mp4” and “Subscription Name” with any value you want

  10. In the “If true” section, delete all steps except “Compose”

  11. At this point, your Logic App should look like the screenshot below

    [Screenshot]

  12. Save the logic app with whatever name you choose. In this solution, I named it “video-indexer-logic-app”

Upload video to Microsoft Video Indexer
  1. After the “Compose” action, add a “Create SAS URI by path” action

    a. For the “Blob path”, choose the “Outputs” from the previous Compose action. You will have to click “See more” to see the output from the Compose action.
    b. Make sure you’re connected to the same Azure Blob Storage connection we defined earlier (storage-la-conn in this case)

    [Screenshot]

  2. Now add a “Get Account Access Token” action from the Video Indexer (V2) connector.

    [Screenshot]

    a. The first time you do this, you will need to enter the Video Indexer API Key we copied earlier and enter a name for this Logic App-Video Indexer connection
    [Screenshot]
    b. Once the connection is created, select the location you deployed your Video Indexer instance to earlier.
    c. Select the account Id we saved earlier
    d. Select “Yes” for “Allow Edit”

    [Screenshot]

  3. Now add an “Upload video and index” step and fill in the details as shown in the image below.

    [Screenshot]

For the Video Name field you can choose any name, or make it dynamic by using the expression tab to enter split(triggerBody()?['subject'], '/')?[6]. This expression reduces the uploaded video's URI to just the file name that was uploaded.

[Screenshot]
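
To make that expression concrete: for an upload of, say, myvideo.mp4 (a hypothetical file name) to the “media” container, the Event Grid BlobCreated event carries a subject of the form /blobServices/default/containers/media/blobs/myvideo.mp4. Splitting that string on '/' produces ["", "blobServices", "default", "containers", "media", "blobs", "myvideo.mp4"], so index 6 is the uploaded file name.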

Copy user video to “source” Azure File share
  1. Now we need to copy the source video file to the “source” Azure File share so that our encoding container instances can access it. For that, add a “Create container group” action and configure it as shown below.

    We're using a small wget container that will download the video from the SAS URI we generated earlier and copy it to the “source” Azure File share. Note that because we're using a minimal Docker image that doesn't ship CA certificates, we need to pass “--no-check-certificate” to wget to download from the HTTPS SAS URI of Azure Blob Storage.

    Note that I’m creating this container in a new resource group “encoding-containers-rg” to keep a dedicated resource group for creating container instances.

    [Screenshot]

    For the containers field, you can use the following JSON to configure it easily (the resulting wget command is shown, for reference, after these steps)

    [
        {
            "name": <select the "Video Id" output of the "Upload video and index" step>,
            "properties": {
                "image": "inutano/wget",
                "resources": {
                    "requests": {
                        "cpu": 1,
                        "memoryInGB": 0.5
                    }
                },
                "command": [
                    "wget",
                    "--no-check-certificate",
                    "-O",
                    "/aci/source/< enter into expression tab split(triggerBody()?['subject'], '/')?[6] >",
                        <Insert Web Url i.e. SAS Uri we generated earlier>
                ],
                "volumeMounts": [
                    {
                        "mountPath": "/aci/source/",
                        "name": "source",
                        "readOnly": false
                    }
                ]
            }
        }
    ]


  2. Next, add an “Until” action to check for the completion of the previous container instance. Before filling in the details of the “Until” action, add a “Delay” and a “Get properties of a container group” action as shown below.

    [Screenshot]

    Once this is done, you can fill in the details of the “Until” action as shown below. NOTE: a few different state variables show up in the dynamic content picker; choose the one highlighted in the image below. Also, in advanced mode, make sure the condition reads as follows to confirm you've selected the correct variable

    @equals(body('Get_properties_of_a_container_group')?['properties']?['instanceView']?['state'], 'Succeeded')

    [Screenshot]


  3. Now for some cleanup! Let’s add a “Delete container group” action

    [Screenshot]
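
For reference, with the JSON above the container ends up running a command along the lines of wget --no-check-certificate -O /aci/source/myvideo.mp4 "https://<storage-account>.blob.core.windows.net/media/myvideo.mp4?<sas-token>" (file name, storage account, and SAS token shown as placeholders), writing the download directly onto the mounted “source” file share. One more note: the “Until” loop keeps the Logic Apps defaults (60 iterations and a one-hour timeout at the time of writing), so for very large source videos you may want to increase the delay interval or raise the loop limits.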

First logic app created!

At the end of the above steps, your first logic app, “video-indexer-logic-app”, should look like the screenshot below. I chose to leave the “If false” condition empty; you could, for example, set up an email notification there if you want one.

[Screenshot]

Under Construction

The rest of the article is under construction; check back soon.
