This blog was written by Aditya Rathore, an Agora Superstar. The Agora Superstar program empowers developers around the world to share their passion and technical expertise, and to create innovative real-time communications apps and projects using Agora’s customizable SDKs.
Hello there, weary web surfer! Have you spent countless hours trying to find a way to combine multiple video streams? Or do you just want to add a cool new feature to your Agora-based video calling app? Or were you brought here thanks to my SEO skills? No matter what brought you here, I hope this tutorial provides a solution to your problem.
Introduction
In this tutorial, we explore a method to add a camera overlay to a screen-share feed and stream that as a single video track via the Agora SDK, which helps deliver a more seamless and reliable presentation experience in a video calling application.
Begin by creating an HTML file and sourcing the Agora Web SDK using the CDN link:
Then, create a JS file and initialize a variable to store the App ID generated from Agora console:
We need a simple form to get the channel and username with two buttons: a Stream button for the presenter and a Join button for attendees:
In addition, we need two divs where the streams will be displayed:
Note: transform: rotateY(180deg) is used to mirror (flip horizontally) the remote stream.
Basic Terms
Since we will be interfacing with the underlying browser API for getting media feeds, it’s important to distinguish between a couple of similar terms to avoid confusion:
Track / MediaStreamTrack — A wrapper object around a byte stream generated by either hardware or software which can contain, for example, video, audio, and screen share data
Stream / MediaStream — An object that can contain various tracks and a few event listeners and helper functions
We can now start working on initializing the Agora client.
If you are already familiar with the process and want to jump to the next section, here is the summary. We initialize the client in live mode and bind stream initialization to the buttons: audio only for the Stream button, and video and audio for the Join button. We also add the necessary Leave button bindings.
Let’s start by creating the Agora RTC client object with the mode set to live. You are free to use any codec you like, but for this project I will be using H264:
Since we will be working with streams in more than one function, we declare some global variables in which to store the initialized streams:
We initialize the client object and declare the error handling function, which will be reused multiple times in the project:
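The original snippets for client creation, the global variables, and initialization are not available. The sketch below assumes the callback-style Agora Web SDK 3.x API (`AgoraRTC.createClient`, `client.init(appId, onSuccess, onFailure)`). `AgoraRTC` is passed in as a parameter so the logic does not depend on a browser global. The names `createAndInitClient` and `streams` are hypothetical.

```javascript
// Assumes the callback-style Agora Web SDK 3.x API. AgoraRTC is passed in
// (window.AgoraRTC when the SDK is loaded from the CDN); names are illustrative.

// Shared failure callback, reused for every SDK call that can fail.
function handleFail(err) {
  console.error("Agora SDK error:", err);
}

// Hypothetical global slot for the stream(s) used across several functions.
const streams = { local: null };

function createAndInitClient(AgoraRTC, appId, onReady) {
  // Live mode separates hosts and audience; H.264 is this project's codec choice.
  const client = AgoraRTC.createClient({ mode: "live", codec: "h264" });
  // init(appId, onSuccess, onFailure)
  client.init(appId, () => onReady(client), handleFail);
  return client;
}
```

In the browser, you would call it as `createAndInitClient(window.AgoraRTC, appId, (client) => { /* bind buttons */ })`.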
Now we add the event listeners and accompanying handler functions to define a basic behavior for the app:
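The event wiring can be sketched as follows. This assumes Agora Web SDK 3.x event names and payloads, where each event object carries the remote `Stream` as `evt.stream`; the pattern mirrors Agora’s group video chat tutorial linked below. The function name and the `"remote-video"` container id are hypothetical.

```javascript
// Assumes Agora Web SDK 3.x events: "stream-added", "stream-subscribed",
// "stream-removed", each delivering the remote Stream as evt.stream.
// registerClientEvents and "remote-video" are hypothetical names.
function registerClientEvents(client, handleFail, remoteContainerId = "remote-video") {
  // A remote host published: subscribe to receive its media.
  client.on("stream-added", (evt) => {
    client.subscribe(evt.stream, handleFail);
  });

  // Subscription succeeded: render the remote stream into the page.
  client.on("stream-subscribed", (evt) => {
    evt.stream.play(remoteContainerId);
  });

  // The remote host stopped publishing: stop playback.
  client.on("stream-removed", (evt) => {
    evt.stream.stop();
  });
}
```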
Finally, we define the join and leave behaviors and bind them to the relevant buttons:
createStream() initializes a Stream object. This is in the context of the Agora SDK, so it differs slightly from the browser API, but logically it is the same entity with a similar function. When the Join button is clicked, the stream is initialized with video and audio tracks, because that button is intended for the audience. When the Stream button is clicked, the stream is initialized without a video track, because that is what we will be adding in the next section.
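The join and leave behavior described above can be sketched like this. The sketch assumes Agora Web SDK 3.x (`client.join(token, channel, uid, onSuccess, onFailure)`, `AgoraRTC.createStream(spec)`, `stream.init`, `stream.stop`, `stream.close`, `client.leave`). `streamSpec`, `joinAndCreateStream`, and `leaveChannel` are hypothetical names. Publishing and playback are left to the caller.

```javascript
// Assumes the callback-style Agora Web SDK 3.x API; AgoraRTC and client are
// passed in. All function names here are hypothetical.

// Stream button (presenter): audio only, since the composited video track is
// added later. Join button (attendee): camera video plus audio.
function streamSpec(uid, isPresenter) {
  return { streamID: uid, audio: true, video: !isPresenter, screen: false };
}

function joinAndCreateStream(AgoraRTC, client, channel, uid, isPresenter, handleFail, onReady) {
  // A null token only works for projects without an App Certificate.
  client.join(null, channel, uid, (joinedUid) => {
    const stream = AgoraRTC.createStream(streamSpec(joinedUid, isPresenter));
    // init() requests device access and opens the requested tracks.
    stream.init(() => onReady(stream), handleFail);
  }, handleFail);
}

// Leave button: stop playback, release devices, then leave the channel.
function leaveChannel(client, stream, handleFail) {
  if (stream) {
    stream.stop();
    stream.close();
  }
  client.leave(() => {}, handleFail);
}
```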
Note: Since the focus of this tutorial is on stream multiplexing, I will not be touching on the fundamentals. You can read https://www.agora.io/en/blog/building-a-group-video-chat-web-app/ to gain a better understanding of the basic Agora Web SDK workflow and what exactly the above code snippet is doing.
With our basic setup complete, we can move on to the fun part:
Implementing video multiplexing
To multiplex the two streams, we first have to obtain them. We declare a new function that will handle the multiplexing for us. Inside it, we call the necessary browser API functions to get the user’s camera and screen streams:
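The text names getUserVideo() and getUserScreen(), but their bodies are not shown. The versions below are assumptions built on the standard `getUserMedia()` and `getDisplayMedia()` calls. `mediaDevices` (which is `navigator.mediaDevices` in the browser) is passed in, and `getSourceStreams` is a hypothetical wrapper.

```javascript
// Hypothetical bodies for the helpers named in the text, built on the standard
// Media Capture APIs. mediaDevices is navigator.mediaDevices in the browser.
function getUserVideo(mediaDevices) {
  // Camera video only: the presenter's audio comes from the Agora stream.
  return mediaDevices.getUserMedia({ video: true, audio: false });
}

function getUserScreen(mediaDevices) {
  // Prompts the user to pick a screen, window, or tab to share.
  return mediaDevices.getDisplayMedia({ video: true });
}

async function getSourceStreams(mediaDevices) {
  // Each call resolves to a MediaStream. Awaiting them one after the other
  // means the screen request is only made once the camera request resolves.
  const cameraStream = await getUserVideo(mediaDevices);
  const screenStream = await getUserScreen(mediaDevices);
  return { cameraStream, screenStream };
}
```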
As their names suggest, the getUserVideo() and getUserScreen() functions give us the media streams containing the required video tracks. However, these streams are not something we can draw directly. The HTML5 canvas that we will use to merge the two streams cannot decode video streams itself: its drawImage() method only accepts image sources such as video elements. Hence we use two off-screen video elements to turn the streams into decoded video frames. These are initialized toward the beginning of the file as follows:
Let’s also initialize the canvas we will be using to merge the two videos. This will also be off-screen because we will be using the Agora SDK to display the final stream:
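The original element setup is not available. A minimal sketch of the three off-screen elements, assuming standard DOM APIs, follows. `doc` is the page’s `document`. The helper names, the `-10000px` offset, and the sizes are illustrative choices, not the author’s.

```javascript
// Sketch of the off-screen elements using standard DOM APIs. doc is the
// page's document; helper names and offsets are hypothetical.
function createHiddenVideo(doc) {
  const video = doc.createElement("video");
  video.muted = true;       // muted media is allowed to autoplay
  video.playsInline = true;
  // Fixed position outside the viewport: decoded but never visible.
  video.style.position = "fixed";
  video.style.left = "-10000px";
  video.style.top = "0px";
  doc.body.appendChild(video);
  return video;
}

function createHiddenCanvas(doc, width, height) {
  const canvas = doc.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  canvas.style.position = "fixed";
  canvas.style.left = "-10000px";
  canvas.style.top = "0px";
  doc.body.appendChild(canvas);
  return canvas;
}

// Feed a MediaStream into a video element; play() returns a Promise.
function attachStream(video, mediaStream) {
  video.srcObject = mediaStream;
  return video.play();
}
```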
At this point, you might wonder about the performance implications of having three off-screen elements. Making them off-screen does not magically remove the load. However, the practice is not wasteful. Most of the work is decoding the two video tracks, which would have to happen anyway to display them. Since the elements are off-screen with fixed positions, they don’t cause DOM repaints and effectively act as background video decoders. Some optimization is still possible, which we will touch on later in this tutorial.
Returning to our multiplexing function: now that we have the tracks, we can pass them to our video elements and define the necessary parameters. We can also draw a base shape on the canvas and append it to the DOM:
Yes, there are a lot of parameters. And yes, all of it is necessary. This diagram should help you understand what exactly is going on:
Besides the layout parameters, we have drawInterval. This is the time, in milliseconds, between calls to the drawVideo function, which draws each frame of the composite video and is covered in the next section. The interval is based on the higher frame rate (FPS) of the two sources. To get it, we simply compute the time per frame: 1000 / FPS milliseconds.
The scale factor determines what fraction of the screen width the camera circle’s radius takes up.
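The diagram that accompanied these parameters is not available, so the geometry below is an assumption. It places the camera circle in the bottom-right corner with a fixed margin and scales the camera frame to cover the circle’s bounding square. Only the drawInterval and radius formulas come from the text. `computeLayout` and its field names are hypothetical.

```javascript
// Sketch of the per-frame parameters. From the text: drawInterval is one
// frame period of the faster source, and radius = width * scaleFactor.
// Assumed: bottom-right placement with a margin, "cover" fit for the camera.
function computeLayout({ width, height, screenFps, cameraFps, scaleFactor, margin, cameraWidth, cameraHeight }) {
  // Milliseconds per frame of the faster of the two sources.
  const drawInterval = 1000 / Math.max(screenFps, cameraFps);

  // Circle radius as a fraction of the canvas (screen) width.
  const radius = width * scaleFactor;

  // Circle center, inset from the bottom-right corner.
  const cx = width - margin - radius;
  const cy = height - margin - radius;

  // Scale the camera frame so its shorter side spans the circle's diameter,
  // then center it on the circle (aspect ratio preserved).
  const scale = (2 * radius) / Math.min(cameraWidth, cameraHeight);
  const camW = cameraWidth * scale;
  const camH = cameraHeight * scale;
  const camX = cx - camW / 2;
  const camY = cy - camH / 2;

  return { drawInterval, radius, cx, cy, camX, camY, camW, camH };
}
```

In the browser, the source frame rates could be read from each video track’s `getSettings().frameRate`, where the browser reports it.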
Now let’s turn to the core of our implementation, which is one function to rule them all, the drawVideo function. Here’s how it works.
First, we take the current frame from the screen-share video element and draw it onto the canvas. We then save the context’s drawing state, which at this point has no clipping region.
Next, we draw an arc (circle) and use it as a clipping region for the camera frame. While the clip is active, drawing only affects pixels inside the circle. As a result, the camera frame appears as a circle, and the screen frame drawn in the previous step stays visible everywhere else.
We then restore the saved state. This does not change any pixels; it removes the clipping region so that the next screen frame can again be drawn across the whole canvas.
Here is the code for that:
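Since the original drawVideo snippet is missing, here is a sketch of the save/clip/restore sequence using the standard Canvas 2D API. `ctx` is a `CanvasRenderingContext2D`. `layout` is a hypothetical object holding the canvas size, circle geometry, and camera draw rectangle. The exact parameters the author used may differ.

```javascript
// Per-frame compositing with the Canvas 2D API. layout fields
// (width, height, cx, cy, radius, camX, camY, camW, camH) are hypothetical.
function drawVideo(ctx, screenVideo, cameraVideo, layout) {
  const { width, height, cx, cy, radius, camX, camY, camW, camH } = layout;

  // 1. Paint the current screen-share frame over the whole canvas.
  ctx.drawImage(screenVideo, 0, 0, width, height);

  // 2. Save the drawing state (no clip region yet).
  ctx.save();

  // 3. Clip to a circle and paint the camera frame; pixels outside the
  //    circle keep the screen frame from step 1.
  ctx.beginPath();
  ctx.arc(cx, cy, radius, 0, 2 * Math.PI);
  ctx.closePath();
  ctx.clip();
  ctx.drawImage(cameraVideo, camX, camY, camW, camH);

  // 4. Remove the clip so the next frame's screen draw covers the canvas.
  ctx.restore();
}
```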
Now we call this function from the StreamMultiplexer function at the calculated interval to draw frames. The browser API also offers a way to capture the canvas contents as a video stream: using canvas.captureStream(), we take the video track from the generated stream and pass it to the globalStream:
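The original wiring code is not available. This sketch assumes two APIs: the standard `HTMLCanvasElement.captureStream(frameRate)`, and Agora Web SDK 3.x’s `Stream.addTrack(track)` for attaching the composite track to the presenter’s audio-only stream. How the author actually attached the track may differ. `startMultiplexing` and its returned stop function are hypothetical.

```javascript
// Assumes canvas.captureStream(fps) (standard) and Agora Web SDK 3.x
// Stream.addTrack(track). draw is a per-frame callback such as a bound
// drawVideo; all parameter names are hypothetical.
function startMultiplexing(canvas, globalStream, draw, drawInterval, fps) {
  // Redraw the composite at the computed interval.
  const timer = setInterval(draw, drawInterval);

  // Turn the canvas into a live MediaStream and take its video track.
  const canvasStream = canvas.captureStream(fps);
  const [videoTrack] = canvasStream.getVideoTracks();

  // Attach the composite video to the stream that Agora publishes.
  globalStream.addTrack(videoTrack);

  // Stop function, e.g. for the Leave button.
  return () => {
    clearInterval(timer);
    videoTrack.stop();
  };
}
```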
Finally, we call the function inside the Stream button’s handler, after the client has been initialized:
To test, we start a web server. I will be using the live-server npm package (installed globally with npm install -g live-server), for which the commands are:
cd ./project-directory
live-server .
Once we have the site open, we can enter the channel name and username and click the Stream button, granting any requested permissions and selecting the screen we want to share. We should now see our stream with the webcam overlay. To confirm that the stream is being shared across the channel with the Agora SDK, we can duplicate the tab and click the Join button with a different username.
Conclusion
And that’s all it takes to build a presentation mode and integrate it with the Agora SDK. I feel it highlights the flexibility and freedom of use that come with the SDK. Some optimizations are still possible. For example, in a more app-like scenario, an OffscreenCanvas can be used in a worker to move rendering off the main thread, which might help with performance. For more, see https://developers.google.com/web/updates/2018/08/offscreen-canvas.
Offloading video streams to a canvas opens the door to all sorts of video manipulation. The same concepts can be used to add chroma keying or custom backgrounds to your video calling app. The possibilities are endless and your power unlimited!