
How to Combine Video Streams Using Agora Web SDK


This blog was written by Aditya Rathore, an Agora Superstar. The Agora Superstar program empowers developers around the world to share their passion and technical expertise, and create innovative real-time communications apps and projects using Agora’s customizable SDKs.


Hello there, weary web surfer! Have you spent countless hours trying to find a way to combine multiple video streams? Do you just want to add a cool new feature to your Agora-based video calling app? Or were you simply brought here by my SEO skills? No matter what brought you here, I hope this tutorial provides a solution to your problem.

Introduction

In this tutorial, we explore a method to add a camera overlay to a screen-share feed and stream that as a single video track via the Agora SDK, which helps deliver a more seamless and reliable presentation experience in a video calling application.

So what are we going to do?

Prerequisites

Project Setup

Begin by creating an HTML file and sourcing the Agora Web SDK using the CDN link:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Agora Web Presentation Mode Tutorial</title>
    <script src="https://cdn.agora.io/sdk/release/AgoraRTCSDK-3.4.0.js"></script>
    <script src="./agora-rtc.js"></script>
  </head>
  <body>
  </body>
</html>

Then, create a JS file and initialize a variable to store the App ID generated from the Agora Console:

const appId = "**********************"; // Your App ID here

Building a Basic User Interface

We need a simple form to get the channel and username with two buttons: a Stream button for the presenter and a Join button for attendees:

<div class="form">
  <span class="headingText"> Join or start a stream! </span>
  <input type="text" id="userName" placeholder="Username" value="JohnDoe123" />
  <input
    type="text"
    id="ChannelName"
    placeholder="ChannelName"
    value="DefaultChannel"
  />
  <button class="submitButton" id="stream">Stream</button>
  <button class="submitButton" id="join">Join</button>
  <br />
  <button
    class="submitButton"
    id="leave"
    style="background-color: #ff0000; width: 100px; margin-top: 10px"
  >
    Leave
  </button>
  <br />
</div>

In addition, we need two divs where the streams will be displayed:

<div class="stream-container">
  <div id="SelfStream" style="transform: rotateY(180deg)">
    <div id="labelText" style="transform: rotateY(180deg)">
      <span> Self Stream </span>
    </div>
  </div>
  <div id="remoteStream" class="remoteStream">
    <div id="labelText">
      <span> Remote Stream </span>
    </div>
  </div>
</div>

Note: transform: rotateY(180deg) is used to horizontally mirror the self stream, so your camera preview appears the way a mirror would.

Basic Terms

Since we will be interfacing with the underlying browser API for getting media feeds, it’s important to distinguish between a couple of similar terms to avoid confusion:

Track / MediaStreamTrack — A wrapper object around a byte stream generated by either hardware or software which can contain, for example, video, audio, and screen share data

Stream / MediaStream — An object that can contain various tracks and a few event listeners and helper functions

You can read more at https://developer.mozilla.org/en-US/docs/Web/API/MediaStream

Setting up AgoraRTC

We can now start working on initializing the Agora client.

For readers already familiar with the process who want to jump to the next section, in short: we initialize the client in live mode and bind the stream setups (audio only for the Stream button, video and audio for the Join button), along with the necessary Leave button bindings.

Let’s start by creating the Agora RTC client object with the mode set to live. You are free to use any codec you like, but for this project I will be using VP8:

let client = AgoraRTC.createClient({
  mode: "live",
  codec: "vp8",
});

Since we will be working with streams in more than one function, we initialize some global variables to store the initialized streams:

let userVideoStream;
let userScreenStream;
let globalStream;

We initialize the client object and declare the error-handling function, which will be reused multiple times in the project:

let handlefail = function (err) {
  window.alert("Something went wrong: " + err);
  console.log(err);
};
// Initializing the client
client.init(
  appId,
  () => console.log("AgoraRTC Client initialized"),
  handlefail
);

Now we add the event listeners and accompanying handler functions to define a basic behavior for the app:

// Container to which the remote streams are appended as children
let remoteContainer = document.getElementById("remoteStream");
client.on("stream-added", function (evt) {
  client.subscribe(evt.stream, handlefail);
});
client.on("stream-subscribed", function (evt) {
  let stream = evt.stream;
  addVideoStream(stream.getId());
  stream.play(stream.getId());
});
// Function to add/append video streams to remoteContainer
function addVideoStream(streamId) {
  let streamdiv = document.createElement("div");
  streamdiv.id = streamId;
  streamdiv.style.height = "380px";
  remoteContainer.appendChild(streamdiv);
}
// Function to remove a video stream; it receives the event object
function RemoveVideoStream(evt) {
  let stream = evt.stream;
  stream.stop();
  let remDiv = document.getElementById(stream.getId());
  remDiv.parentNode.removeChild(remDiv);
  console.log("Remote stream removed: " + stream.getId());
}
client.on("stream-removed", RemoveVideoStream);
client.on("peer-leave", RemoveVideoStream);

Finally, we define the join and leave behaviors and bind them to the relevant buttons:

document.getElementById("stream").onclick = function () {
  let channelName = document.getElementById("ChannelName").value;
  let userName = document.getElementById("userName").value;
  client.join(
    null,
    channelName,
    userName,
    () => {
      // Init stream without a video track
      var publicStream = AgoraRTC.createStream({
        video: false,
        audio: true,
        screen: false,
      });
      globalStream = publicStream;
      publicStream.init(function () {
        publicStream.play("SelfStream");
        client.publish(publicStream);
      });
    },
    handlefail
  );
};
document.getElementById("join").onclick = function () {
  let channelName = document.getElementById("ChannelName").value;
  let userName = document.getElementById("userName").value;
  client.join(
    null,
    channelName,
    userName,
    () => {
      var gameStream = AgoraRTC.createStream({
        video: true,
        audio: true,
        screen: false,
      });
      globalStream = gameStream;
      gameStream.init(function () {
        gameStream.play("SelfStream");
        client.publish(gameStream);
      });
      console.log(`App id: ${appId}\nChannel id: ${channelName}`);
    },
    handlefail
  );
};
document.getElementById("leave").onclick = function () {
  client.leave(function () {
    console.log("Channel left");
  }, handlefail);
};

createStream() initializes a Stream object. However, this is in the context of the Agora SDK, so it differs slightly from the browser API; logically, though, it is the same kind of entity serving a similar purpose. The media stream is initialized with video and audio tracks when the Join button is clicked, because that path is intended for the audience. When the Stream button is clicked, the stream is initialized without a video track, because that is what we will be adding in the next section.

Note: Since the focus of this tutorial is on stream multiplexing, I will not be touching on the fundamentals. You can read https://www.agora.io/en/blog/building-a-group-video-chat-web-app/ to gain a better understanding of the basic Agora Web SDK workflow and what exactly the above code snippet is doing.

With our basic setup complete, we can move on to the fun part:

Implementing Video Multiplexing

In order to multiplex the two streams, we first have to initialize them. We declare a new function that will handle the multiplexing for us. Inside it, we call the necessary browser API functions to get the user's camera and screen streams:

async function streamMultiplexer() {
  userVideoStream = await getUserVideo();
  userScreenStream = await getUserScreen();
  ...
}

getUserVideo() and getUserScreen(), as their names suggest, give us media streams containing the required video tracks. However, these tracks are raw data, and we need a way to decode them. The HTML5 canvas that we will be using to merge the two streams cannot decode video streams directly, so we will use two off-screen video elements to turn the data streams into drawable video. These are initialized toward the beginning of the file as follows:

// Camera video init (offscreen)
let cameraElement = document.createElement("video");
cameraElement.style =
  "opacity:0;position:fixed;z-index:-1;left:-100000;top:-100000;";
document.body.appendChild(cameraElement);
// Screen video init (offscreen)
let screenElement = document.createElement("video");
screenElement.style =
  "opacity:0;position:fixed;z-index:-1;left:-100000;top:-100000;";
document.body.appendChild(screenElement);

Let’s also initialize the canvas we will be using to merge the two videos. This will also be off-screen because we will be using the Agora SDK to display the final stream:

// Stream canvas init (offscreen , will be used to multiplex the streams)
let streamCanvas = document.createElement("canvas");
streamCanvas.height = 1080;
streamCanvas.width = 1920;
userCameraHeight = 960;
userCameraWidth = 540;
streamCanvas.style =
"opacity:0;position:fixed;z-index:-1;left:-100000;top:-100000;";
scaleFactor = 10;
let streamCanvasType = streamCanvas.getContext("2d");

At this point, you might wonder about the performance implications of having three off-screen elements. Making them off-screen does not magically remove the load. However, the practice is not redundant either: the majority of the workload is decoding the two video tracks, which would have to be done anyway to display them. And since the elements are off-screen with fixed positions, they don't trigger DOM repaints and effectively act as background video decoders. Some optimization can still be achieved, which we will touch on later in this tutorial.

Returning to our multiplexing function, now that we have the tracks we can pass them to our video elements and begin defining the necessary parameters. And we can draw a base shape on our canvas and append it to the DOM:

async function streamMultiplexer() {
  ...
  cameraElement.srcObject = userVideoStream;
  cameraElement.play();
  screenElement.srcObject = userScreenStream;
  screenElement.play();
  streamCanvas.height = userScreenStream
    .getVideoTracks()[0]
    .getSettings().height;
  streamCanvas.width = userScreenStream.getVideoTracks()[0].getSettings().width;
  videoFrameRate = userVideoStream.getVideoTracks()[0].getSettings().frameRate;
  camFrameRate = userScreenStream.getVideoTracks()[0].getSettings().frameRate;
  drawInterval =
    1000 / (camFrameRate > videoFrameRate ? camFrameRate : videoFrameRate);
  cameraCircleRadius = (streamCanvas.width * scaleFactor) / 100;
  offset = cameraCircleRadius / 5;
  userCameraHeight = cameraCircleRadius * 2;
  userCameraWidth =
    userCameraHeight *
    userVideoStream.getVideoTracks()[0].getSettings().aspectRatio;
  // Init base canvas (fill the canvas itself, not the stream object)
  document.body.appendChild(streamCanvas);
  streamCanvasType.fillRect(0, 0, streamCanvas.width, streamCanvas.height);
  ...
}

Yes, there are a lot of parameters. And yes, all of it is necessary. This diagram should help you understand what exactly is going on:

[Diagram: the canvas layout parameters]

Besides the sizing parameters, we have drawInterval: the time in milliseconds between calls to the drawVideo function (responsible for drawing each frame of the video, covered in the next section). We take the higher of the two frame rates, so to get the time interval we simply divide 1000 ms by that frames-per-second value.

The scale factor determines what percentage of the screen width the camera circle's radius should take.
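As a concrete check of the arithmetic (the input values below are illustrative, not from the source): for a 1920×1080 screen track, a 30 FPS camera, a 15 FPS screen share, and scaleFactor = 10:

```javascript
// Worked example of the overlay geometry and frame-timing math.
const canvasWidth = 1920;   // from the screen track settings
const camFrameRate = 15;    // screen share FPS
const videoFrameRate = 30;  // camera FPS
const scaleFactor = 10;     // circle radius takes 10% of canvas width
const aspectRatio = 16 / 9; // from the camera track settings

// Draw at the faster of the two frame rates
const drawInterval =
  1000 / (camFrameRate > videoFrameRate ? camFrameRate : videoFrameRate);
// Circle radius as a percentage of canvas width
const cameraCircleRadius = (canvasWidth * scaleFactor) / 100;
const offset = cameraCircleRadius / 5;
// The camera frame is sized so the circle fits inside it
const userCameraHeight = cameraCircleRadius * 2;
const userCameraWidth = userCameraHeight * aspectRatio;

console.log(drawInterval);       // ~33.3 ms between frames
console.log(cameraCircleRadius); // 192 px
console.log(offset);             // 38.4 px
console.log(userCameraHeight);   // 384 px
```

Notice that drawInterval follows the camera here, since 30 FPS is the higher rate.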

Now let’s turn to the core of our implementation, which is one function to rule them all, the drawVideo function. Here’s how it works.

First, we take a frame of the screen share from its video element and draw it on the canvas. We then save this state:

[Screenshot: the screen frame drawn on the canvas]

Next, we draw an arc (circle) and use it to clip the subsequent camera stream. This gives us a circular area of the camera stream and a blank canvas everywhere else due to the clip function:

[Screenshot: the clipped circular camera feed on an otherwise blank canvas]

We then restore the initially saved state to restore the screen feed everywhere except where there is already something being drawn — that is, the circle camera feed:

[Screenshot: the restored screen feed with the circular camera overlay]

Here is the code for that:

function drawVideo() {
  streamCanvasType.drawImage(
    screenElement,
    0,
    0,
    streamCanvas.width,
    streamCanvas.height
  );
  streamCanvasType.save();
  // Start a fresh path so the clip region doesn't accumulate across frames
  streamCanvasType.beginPath();
  streamCanvasType.arc(
    streamCanvas.width - cameraCircleRadius - offset,
    cameraCircleRadius + offset,
    cameraCircleRadius,
    0,
    2 * Math.PI,
    true
  );
  streamCanvasType.clip();
  streamCanvasType.drawImage(
    cameraElement,
    streamCanvas.width - userCameraWidth - offset,
    offset,
    userCameraWidth,
    userCameraHeight
  );
  streamCanvasType.restore();
}

Now, we call this function in the streamMultiplexer function at the calculated interval to draw frames. The browser API offers captureStream() to capture the canvas contents as a video stream; we then pass the video track from the generated stream into the globalStream:

async function streamMultiplexer() {
  ...
  setInterval(drawVideo, drawInterval);
  // Get a video stream from the canvas; captureStream() takes a frame
  // rate in FPS, so we convert the millisecond interval back
  mergedStream = streamCanvas.captureStream(1000 / drawInterval);
  tracks = mergedStream.getVideoTracks();
  // Add the canvas video track to the global stream
  globalStream.addTrack(tracks[0]);
}

Finally, we call the function after the client joins, inside the Stream button's click handler:

document.getElementById("stream").onclick = function () {
  ...
  client.join(
    null,
    channelName,
    userName,
    () => {
      // Init stream without a video track
      var publicStream = AgoraRTC.createStream({
        video: false,
        audio: true,
        screen: false,
      });
      globalStream = publicStream;
      publicStream.init(function () {
        publicStream.play("SelfStream");
        client.publish(publicStream);
      });
      // ----------------------------------------------
      streamMultiplexer();
      // ----------------------------------------------
    },
    handlefail
  );
};

Now let's see if it actually works.

Testing

To test, we start a web server. I will be using the live-server npm package, for which the commands are:

cd ./project-directory
live-server .

Once we have the site open, we can input the channel name and username and click the Stream button, granting any needed permissions and selecting which screen we want to share. We should now be able to see our stream with the webcam overlay. We can duplicate the tab, change the username, and click the Join button to confirm that the stream is successfully being shared across the channel with the Agora SDK.

Conclusion

And that's all it takes to build a presentation mode and integrate it into the Agora SDK. I feel it highlights the flexibility and freedom of use that come with the SDK. Some optimizations can still be achieved. For example, in a more web-app-like scenario, an OffscreenCanvas can be used in a worker to move rendering off the main thread, which might help with performance. For more, see https://developers.google.com/web/updates/2018/08/offscreen-canvas.

Offloading video streams to a canvas opens doors to all sorts of video manipulations. The same concepts can be used to add chroma keying or custom backgrounds to your video calling app. The possibilities are endless and your power unlimited!
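To give a taste of that, here is a hedged sketch of chroma keying written as a pure function over flat RGBA pixel data (the kind of array ctx.getImageData().data returns); in a real app you would run something like this on each frame inside drawVideo(). The function name and tolerance approach are illustrative, not from the source:

```javascript
// Chroma-key sketch: make pixels close to a key color fully transparent.
// `pixels` is a flat RGBA array (r, g, b, a, r, g, b, a, ...).
function chromaKey(pixels, keyR, keyG, keyB, tolerance) {
  const out = Uint8ClampedArray.from(pixels);
  for (let i = 0; i < out.length; i += 4) {
    const dr = out[i] - keyR;
    const dg = out[i + 1] - keyG;
    const db = out[i + 2] - keyB;
    // Euclidean distance from the key color in RGB space
    if (Math.sqrt(dr * dr + dg * dg + db * db) < tolerance) {
      out[i + 3] = 0; // zero alpha: the background shows through
    }
  }
  return out;
}

// Two pixels: pure green (keyed out) and pure red (kept)
const frame = Uint8ClampedArray.from([0, 255, 0, 255, 255, 0, 0, 255]);
const keyed = chromaKey(frame, 0, 255, 0, 100);
console.log(keyed[3], keyed[7]); // 0 255
```

Drawing the keyed frame over a background image on the stream canvas would give you virtual backgrounds with the exact same publish path we built above.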

Other Resources

  • See the Agora API Reference docs for more information on the Agora SDK
  • You can find a live demo of the project here
  • You can find source code for the demo project here

I also invite you to join the Agora Developer Slack community.
