Conversational AI is revolutionizing how people interact with artificial intelligence. Instead of trying to explain everything in a single text prompt, users can have natural, real-time voice conversations with AI agents. This opens exciting opportunities for more intuitive and efficient interactions.
Many developers have already invested significant time building custom LLM workflows for text-based agents. Agora’s Conversational AI Engine allows you to connect these existing workflows to an Agora channel, enabling real-time voice conversations without abandoning your current AI infrastructure.
In this guide, we’ll build a Python backend server that handles the connection between your users and Agora’s Conversational AI. By the end, you’ll have a production-ready backend that can power voice-based AI conversations for your applications.
Before getting started, make sure you have:
- Python 3 installed, along with pip
- An Agora account with an App ID and App Certificate
- Agora RESTful API credentials (Customer ID and Customer Secret) with the Conversational AI Engine enabled for your project
- An LLM endpoint URL and API key
- Credentials for a TTS vendor (Microsoft Azure TTS or ElevenLabs)
Let’s set up our Python server with FastAPI and Uvicorn. We’ll create a new project and install the necessary dependencies.
mkdir agora-convo-ai-server
cd agora-convo-ai-server
Next, let’s create a requirements.txt file for all the project dependencies.
touch requirements.txt
In requirements.txt, add the following dependencies:
fastapi
uvicorn
httpx
pydantic
agora-token-builder
python-dotenv
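If you like to keep project dependencies isolated, you can optionally create and activate a virtual environment first (this assumes Python 3 is already installed):
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate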
Next, install the dependencies by running the following command from the root of your project:
pip install -r requirements.txt
As we go through this guide, you’ll create new files in specific directories, so let’s set these up before we start.
In your project root directory, create the /routes and /class_types directories, and add a main.py file. Additionally, we’ll create a .env file for all the environment variables:
mkdir routes class_types
touch main.py
touch .env
Your project directory should now have a structure like this:
├── class_types
├── .env
├── main.py
├── requirements.txt
└── routes
Let’s implement our server’s entry point, including a basic health check endpoint.
For now we’ll create a basic FastAPI app and fill it in with more functionality as we progress through the guide. I’ve included comments throughout the code to help you understand what’s happening.
At a high level, we’re setting up a new FastAPI app with a simple router structure to handle requests, plus a /ping endpoint that we can use for health checks.
Add the following code to main.py:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize FastAPI app
app = FastAPI(
    title="Agora ConvoAI Python Server",
    description="Python implementation of Agora ConvoAI server",
    version="1.0.0"
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Health check endpoint
@app.get("/ping")
async def ping():
    return {"message": "pong"}

# Main entry point
if __name__ == "__main__":
    import uvicorn
    port = int(os.getenv("PORT", 3000))
    uvicorn.run(app, host="0.0.0.0", port=port)
Note: We load the PORT from the environment variables; it defaults to 3000 if not set in your .env file.
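For example, your .env can start with just the port; we’ll add the remaining variables as we work through the guide:
PORT=3000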
Let’s test our basic FastAPI app by running:
uvicorn main:app --reload --port 3000
Uvicorn defaults to port 8000, so we pass --port 3000 explicitly to match the rest of this guide. Once the server starts, visit http://localhost:3000/ping to verify it’s working.
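You can also hit the health check from the command line:
curl http://localhost:3000/ping
Expected response:
{"message": "pong"}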
The real power of our server comes from the Agora Conversational AI integration. Let’s get the boring stuff out of the way first and create the file for the types needed to work with Agora’s Conversational AI API:
touch class_types/agora_convo_ai_types.py
Add the following classes to class_types/agora_convo_ai_types.py:
from enum import Enum
from pydantic import BaseModel
from typing import List, Optional


class TTSVendor(str, Enum):
    MICROSOFT = "microsoft"
    ELEVENLABS = "elevenlabs"


class TTSConfig(BaseModel):
    vendor: TTSVendor
    params: dict


class AgentResponse(BaseModel):
    agent_id: str
    create_ts: int
    status: str
Now, let’s define the client request types.
Create class_types/client_request_types.py:
touch class_types/client_request_types.py
Add the following classes inside class_types/client_request_types.py:
from pydantic import BaseModel
from typing import List, Optional, Union


class InviteAgentRequest(BaseModel):
    requester_id: Union[str, int]
    channel_name: str
    rtc_codec: Optional[int] = None
    input_modalities: Optional[List[str]] = None
    output_modalities: Optional[List[str]] = None


class RemoveAgentRequest(BaseModel):
    agent_id: str
These new types give some insight into the parts we’ll be assembling in the next steps. We’ll take the client request, use it to build the start request, and send it to Agora’s Conversational AI Engine, which then adds the agent to the conversation.
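For example, a request to invite an agent will carry a JSON body shaped like InviteAgentRequest — something like this (the channel name here is just a placeholder):
{
  "requester_id": "1234",
  "channel_name": "my-channel",
  "input_modalities": ["text"],
  "output_modalities": ["text", "audio"]
}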
With our types defined, let’s implement the agent routes for inviting and removing agents from conversations.
Create the agent route:
touch routes/agent.py
Start by importing FastAPI, our new class_types, and the agora_token_builder library (we’ll need it to generate tokens for the agent). Then we’ll define the /agent route.
from fastapi import APIRouter, HTTPException, Query
from typing import Optional, List, Union
from class_types.agora_convo_ai_types import TTSVendor, TTSConfig, AgentResponse
from class_types.client_request_types import InviteAgentRequest, RemoveAgentRequest
from agora_token_builder import RtcTokenBuilder
import os
import httpx
from datetime import datetime
import random
import string
import base64
import time
router = APIRouter(prefix="/agent", tags=["agent"])
First, we’ll implement the /agent/invite endpoint. This route needs to handle several key tasks:
- Generating a secure RTC token for the agent
- Loading the TTS (Text-to-Speech) configuration
- Setting the AI’s behavior via system messages
- Sending the join request to Agora’s Conversational AI API
Add the following code to routes/agent.py:
@router.post("/invite", response_model=AgentResponse)
async def invite_agent(request: InviteAgentRequest):
    try:
        name = generate_unique_name()
        channel_name = request.channel_name or generate_channel_name()
        token = await _generate_token_(uid=os.getenv("AGENT_UID"), channel=channel_name)

        # Get TTS configuration
        tts_vendor = TTSVendor(os.getenv("TTS_VENDOR"))
        tts_config = get_tts_config(tts_vendor)

        # Prepare request body for Agora API
        request_body = {
            "name": name,
            "properties": {
                "channel": channel_name,
                "token": token,
                "agent_rtc_uid": os.getenv("AGENT_UID"),
                "remote_rtc_uids": [str(request.requester_id)],
                "enable_string_uid": isinstance(request.requester_id, str),
                "idle_timeout": 30,
                "asr": {
                    "language": "en-US",
                    "task": "conversation"
                },
                "llm": {
                    "url": os.getenv("LLM_URL"),
                    "api_key": os.getenv("LLM_TOKEN"),
                    "system_messages": [{
                        "role": "system",
                        "content": "You are a helpful assistant..."
                    }],
                    "greeting_message": "Hello! How can I assist you today?",
                    "failure_message": "Please wait a moment.",
                    "max_history": 10,
                    "params": {
                        "model": os.getenv("LLM_MODEL"),
                        "max_tokens": 1024,
                        "temperature": 0.7,
                        "top_p": 0.95
                    },
                    "input_modalities": request.input_modalities or os.getenv("INPUT_MODALITIES", "").split(","),
                    "output_modalities": request.output_modalities or os.getenv("OUTPUT_MODALITIES", "").split(",")
                },
                "tts": tts_config.dict(),
                "vad": {
                    "silence_duration_ms": 480,
                    "speech_duration_ms": 15000,
                    "threshold": 0.5,
                    "interrupt_duration_ms": 160,
                    "prefix_padding_ms": 300
                },
                "advanced_features": {
                    "enable_aivad": False,
                    "enable_bhvs": False
                }
            }
        }

        # Make API call to Agora
        async with httpx.AsyncClient() as client:
            credential = generate_credentials()
            response = await client.post(
                f"{os.getenv('AGORA_CONVO_AI_BASE_URL')}/{os.getenv('AGORA_APP_ID')}/join",
                json=request_body,
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {credential}"
                }
            )
            response.raise_for_status()
            return response.json()
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to start conversation: {str(e)}"
        )
Define all the necessary functions:
def generate_unique_name():
    channel_name_base = 'conversation'
    timestamp = int(time.time() * 1000)  # Current time in milliseconds
    random_string = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=6))  # Random string of length 6
    unique_name = f"{channel_name_base}-{timestamp}-{random_string}"
    return unique_name


def generate_credentials() -> str:
    customer_id = str(os.getenv("AGORA_CUSTOMER_ID"))
    customer_secret = str(os.getenv("AGORA_CUSTOMER_SECRET"))
    credentials = customer_id + ":" + customer_secret
    base64_credentials = base64.b64encode(credentials.encode("utf8"))
    credential = base64_credentials.decode("utf8")
    return credential


async def _generate_token_(
    uid: int = Query(0, description="User ID"),
    channel: str = Query(None, description="Channel name")
):
    # Validate environment variables
    if not os.getenv("AGORA_APP_ID") or not os.getenv("AGORA_APP_CERTIFICATE"):
        raise HTTPException(
            status_code=500,
            detail="Agora credentials are not set"
        )

    expiration_time = int(datetime.now().timestamp()) + 3600

    try:
        # Generate token using agora-token-builder
        token = RtcTokenBuilder.buildTokenWithUid(
            appId=os.getenv("AGORA_APP_ID"),
            appCertificate=os.getenv("AGORA_APP_CERTIFICATE"),
            channelName=channel,
            uid=uid,
            role=1,
            privilegeExpiredTs=expiration_time
        )
        return token
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to generate Agora token: {str(e)}"
        )


def generate_channel_name() -> str:
    timestamp = int(datetime.now().timestamp())
    random_str = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"conversation-{timestamp}-{random_str}"


def get_tts_config(vendor: TTSVendor) -> TTSConfig:
    if vendor == TTSVendor.MICROSOFT:
        required_vars = [
            "MICROSOFT_TTS_KEY", "MICROSOFT_TTS_REGION",
            "MICROSOFT_TTS_VOICE_NAME", "MICROSOFT_TTS_RATE",
            "MICROSOFT_TTS_VOLUME"
        ]
        if any(not os.getenv(var) for var in required_vars):
            raise ValueError("Missing Microsoft TTS environment variables")
        return TTSConfig(
            vendor=vendor,
            params={
                "key": os.getenv("MICROSOFT_TTS_KEY"),
                "region": os.getenv("MICROSOFT_TTS_REGION"),
                "voice_name": os.getenv("MICROSOFT_TTS_VOICE_NAME"),
                "rate": float(os.getenv("MICROSOFT_TTS_RATE", "1.0")),
                "volume": float(os.getenv("MICROSOFT_TTS_VOLUME", "1.0"))
            }
        )
    elif vendor == TTSVendor.ELEVENLABS:
        required_vars = ["ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID", "ELEVENLABS_MODEL_ID"]
        if any(not os.getenv(var) for var in required_vars):
            raise ValueError("Missing ElevenLabs environment variables")
        return TTSConfig(
            vendor=vendor,
            params={
                "key": os.getenv("ELEVENLABS_API_KEY"),
                "model_id": os.getenv("ELEVENLABS_MODEL_ID"),
                "voice_id": os.getenv("ELEVENLABS_VOICE_ID")
            }
        )
    raise ValueError(f"Unsupported TTS vendor: {vendor}")
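To make the TTS mapping concrete: with TTS_VENDOR=microsoft, get_tts_config produces a tts block along these lines (the key and region come from your .env, shown here as placeholders):
{
  "vendor": "microsoft",
  "params": {
    "key": "<MICROSOFT_TTS_KEY>",
    "region": "<MICROSOFT_TTS_REGION>",
    "voice_name": "en-US-AndrewMultilingualNeural",
    "rate": 1.0,
    "volume": 100.0
  }
}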
After the agent joins the conversation, we need a way to remove it. This is where the /agent/remove route comes in: it takes the agent_id and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.
Add the following code to the routes/agent.py file, just below the /invite route:
@router.post("/remove")
async def remove_agent(request: RemoveAgentRequest):
    try:
        async with httpx.AsyncClient() as client:
            credential = generate_credentials()
            response = await client.post(
                f"{os.getenv('AGORA_CONVO_AI_BASE_URL')}/{os.getenv('AGORA_APP_ID')}/agents/{request.agent_id}/leave",
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {credential}"
                }
            )
            response.raise_for_status()
            # Agora responds with a 200 on success; return a simple success flag to the client
            return {"success": True}
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to remove agent: {str(e)}"
        )
The agent router defines two key endpoints:

/agent/invite: Creates and adds an AI agent to a specified channel by:
- Generating a secure token for the agent
- Configuring TTS (Text-to-Speech) settings
- Setting the AI’s behavior via system messages
- Sending a request to Agora’s Conversational AI API

/agent/remove: Removes an AI agent from a conversation by:
- Taking the agent_id from the request
- Sending a leave request to Agora’s API

Note: The agent routes load a number of environment variables. Make sure to set these in your .env file. At the end of this guide, I've included a list of all the environment variables you'll need to set.
Let’s update our main.py file to register the agent routes:
# Previous imports remain the same
from routes import agent
# Previous code remains the same...
# Register routes
app.include_router(agent.router)
# Rest of the code remains the same...
Now we have the core Conversational AI functionality working! Let’s implement the token generation route, which will make it easier to test and integrate with frontend applications.
The goal of this guide is to build a stand-alone microservice that works with existing Agora client apps, so for completeness we’ll implement a token generation route.
Create a new file at routes/token.py:
touch routes/token.py
Explaining this code in depth is a bit outside the scope of this guide, but if you are new to tokens, I recommend checking out my guide Building a Token Server for Agora Applications.
One element of the token route worth highlighting: if a uid or channel name is not provided, the code uses 0 for the uid and generates a unique channel name. The channel name and UID are returned with every token.
Add the following code to the routes/token.py file:
from fastapi import APIRouter, HTTPException, Query
from pydantic import BaseModel
from agora_token_builder import RtcTokenBuilder
import os
from datetime import datetime
import random
import string

router = APIRouter(prefix="/token", tags=["token"])


class TokenResponse(BaseModel):
    token: str
    uid: str
    channel: str


def generate_channel_name() -> str:
    timestamp = int(datetime.now().timestamp())
    random_str = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"ai-conversation-{timestamp}-{random_str}"


# Empty sub-path so the final route is exactly /token (no trailing slash)
@router.get("", response_model=TokenResponse)
async def generate_token(
    uid: int = Query(0, description="User ID"),
    channel: str = Query(None, description="Channel name")
):
    # Validate environment variables
    if not os.getenv("AGORA_APP_ID") or not os.getenv("AGORA_APP_CERTIFICATE"):
        raise HTTPException(
            status_code=500,
            detail="Agora credentials are not set"
        )

    # Generate channel name if not provided
    channel_name = channel or generate_channel_name()
    expiration_time = int(datetime.now().timestamp()) + 3600

    try:
        # Generate token using agora-token-builder
        token = RtcTokenBuilder.buildTokenWithUid(
            appId=os.getenv("AGORA_APP_ID"),
            appCertificate=os.getenv("AGORA_APP_CERTIFICATE"),
            channelName=channel_name,
            uid=uid,
            role=1,
            privilegeExpiredTs=expiration_time
        )
        return TokenResponse(
            token=token,
            uid=str(uid),
            channel=channel_name
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to generate Agora token: {str(e)}"
        )
Now, update the main.py file to register the token routes:
# Previous imports remain the same
from routes import token
# Previous code remains the same...
# Register routes
# Previous agent routes remain the same
app.include_router(token.router)
# Rest of the code remains the same...
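For reference, after registering both routers, your complete main.py should look roughly like this:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
import os

from routes import agent, token

# Load environment variables
load_dotenv()

# Initialize FastAPI app
app = FastAPI(
    title="Agora ConvoAI Python Server",
    description="Python implementation of Agora ConvoAI server",
    version="1.0.0"
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register routes
app.include_router(agent.router)
app.include_router(token.router)

# Health check endpoint
@app.get("/ping")
async def ping():
    return {"message": "pong"}

# Main entry point
if __name__ == "__main__":
    import uvicorn
    port = int(os.getenv("PORT", 3000))
    uvicorn.run(app, host="0.0.0.0", port=port)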
Before we can test our endpoints, make sure you have a client-side app running. You can use any application that implements Agora’s Video SDK (web, mobile, or desktop). If you don’t have an app, you can use Agora’s Voice Demo; just make sure to make a token request before joining the channel.
Let’s test our server to make sure everything is working correctly. First, ensure your .env file is properly configured with all the necessary credentials.
Start the server in development mode:
uvicorn main:app --reload --port 3000
Note: There is a complete list of all required environment variables at the end of this guide.
If your server is running correctly, you should see output like:
INFO:     Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
Let’s test our API endpoints using curl:
curl http://localhost:3000/token
Expected response (your values will be different):
{
  "token": "007eJxTYBAxNdgrlvnEfm3o...",
  "uid": "0",
  "channel": "ai-conversation-1665481623456-abc123"
}
curl "http://localhost:3000/token?channel=test-channel&uid=1234"
curl -X POST http://localhost:3000/agent/invite \
-H "Content-Type: application/json" \
-d '{
"requester_id": "1234",
"channel_name": "YOUR_CHANNEL_NAME_FROM_PREVIOUS_STEP",
"input_modalities": ["text"],
"output_modalities": ["text", "audio"]
}'
Expected response (your values will be different):
{
  "agent_id": "agent-abc123",
  "create_ts": 1665481725000,
  "status": "RUNNING"
}
curl -X POST "http://localhost:3000/agent/remove" \
-H "Content-Type: application/json" \
-d '{
"agent_id": "agent-123"
}'
Expected response:
{
"success": true
}
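If you’d prefer to script these checks instead of running curl by hand, here’s a minimal end-to-end sketch using httpx (it assumes the server is running locally on port 3000 and that your .env is fully configured; test_flow.py is just a name chosen for illustration):
# test_flow.py — quick end-to-end check: token -> invite -> remove
import httpx

BASE_URL = "http://localhost:3000"

with httpx.Client(timeout=30.0) as client:
    # 1. Request a token for uid 1234; the server generates a channel name for us
    token_res = client.get(f"{BASE_URL}/token", params={"uid": 1234}).json()
    print("Token:", token_res)

    # 2. Invite an agent into that channel
    invite_res = client.post(
        f"{BASE_URL}/agent/invite",
        json={
            "requester_id": token_res["uid"],
            "channel_name": token_res["channel"],
            "input_modalities": ["text"],
            "output_modalities": ["text", "audio"],
        },
    ).json()
    print("Invite:", invite_res)

    # 3. Remove the agent again
    remove_res = client.post(
        f"{BASE_URL}/agent/remove",
        json={"agent_id": invite_res["agent_id"]},
    ).json()
    print("Remove:", remove_res)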
Agora’s Conversational AI Engine supports a number of customizations.
In the /agent/invite endpoint, modify the system message to customize the agent’s prompt:
"system_messages": [{
"role": "system",
"content": "You are a technical support specialist named Alex. Your responses should be friendly but concise, focused on helping users solve their technical problems. Use simple language but don't oversimplify technical concepts."
}],
You can also update the greeting_message to control the initial message the agent speaks into the channel, along with the failure_message it uses when the LLM doesn’t respond:
"llm": {
    "greeting_message": "Hello! How can I assist you today?",
    "failure_message": "Please wait a moment."
},
Choose the right voice for your application by exploring your TTS vendor’s voice library (Microsoft Azure TTS or ElevenLabs), then set the corresponding voice in your .env file.
Adjust VAD settings to optimize conversation flow:
"vad": {
    "silence_duration_ms": 600,    # How long to wait after silence to end the turn
    "speech_duration_ms": 10000,   # Maximum duration for a single speech segment
    "threshold": 0.6,              # Speech detection sensitivity
    "interrupt_duration_ms": 200,  # How quickly interruptions are detected
    "prefix_padding_ms": 400       # Audio padding at the beginning of speech
},
Here’s a complete list of environment variables for your .env file:
# Agora Configuration
AGORA_APP_ID=
AGORA_APP_CERTIFICATE=
AGORA_CUSTOMER_ID=
AGORA_CUSTOMER_SECRET=
AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects
AGENT_UID=
# LLM Configuration
LLM_MODEL=
LLM_URL=
LLM_TOKEN=
# Text-to-Speech Configuration
TTS_VENDOR=microsoft # Supported vendors: microsoft, elevenlabs
# Microsoft Azure TTS Configuration
MICROSOFT_TTS_KEY=
MICROSOFT_TTS_REGION=
MICROSOFT_TTS_VOICE_NAME=en-US-AndrewMultilingualNeural
MICROSOFT_TTS_RATE=1.0 # Range: 0.5 to 2.0
MICROSOFT_TTS_VOLUME=100.0 # Range: 0.0 to 100.0
# ElevenLabs TTS Configuration
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
ELEVENLABS_MODEL_ID=eleven_flash_v2_5
# Modalities Configuration
INPUT_MODALITIES=text
OUTPUT_MODALITIES=text,audio
# Server Configuration
PORT=3000
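Since the server reads these variables lazily at request time, a missing value only surfaces when an endpoint is hit. If you’d like to fail fast instead, here’s an optional sketch of a startup check you could drop into main.py right after load_dotenv() (the list below only covers the variables that are always required):
# Optional: fail fast on missing configuration
REQUIRED_VARS = [
    "AGORA_APP_ID", "AGORA_APP_CERTIFICATE",
    "AGORA_CUSTOMER_ID", "AGORA_CUSTOMER_SECRET",
    "AGORA_CONVO_AI_BASE_URL", "AGENT_UID",
    "LLM_URL", "LLM_TOKEN", "TTS_VENDOR",
]

missing = [var for var in REQUIRED_VARS if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")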
Congratulations! You’ve built a FastAPI server that integrates with Agora’s Conversational AI Engine. Take this microservice and integrate it with your existing Agora backends.
For more information about Agora’s Conversational AI Engine, check out the official documentation.
Happy building!