Channel Response Format Specification
Overview
This document describes how AI responses are formatted and rendered across different communication channels (REST API, WebSocket, WhatsApp, Microsoft Teams, and Voice/SIP). Each channel has unique capabilities and constraints that determine how content is transformed from the AI engine output to the final user-facing format.
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ LangGraph AI Engine (Channel Neutral) │
│ Returns: AIMessage with content │
└──────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Message Extraction Layer │
│ Extract last AIMessage from state │
│ Handle string/list content formats │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────┴──────────┐
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ WebSocket │ │ Other Channels │
│ Parser │ │ (Plain Text) │
└──────┬───────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ Structured │ │ Plain Text │
│ Content │ │ Response │
│ Blocks │ │ │
└──────┬───────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────────────────────────┐
│ Channel-Specific Delivery │
│ - WebSocket: JSON with cards │
│ - REST: Simple JSON response │
│ - Teams: Bot Framework Activity │
│ - WhatsApp: Meta Graph API │
│ - Voice: TTS Audio Stream │
└──────────────────────────────────────┘
Channel Capability Matrix
| Channel | Text | Markdown | Images | Cards | Buttons | Audio | Streaming | Typing Indicator |
|---|---|---|---|---|---|---|---|---|
| REST API | ✅ Plain | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ SSE | ❌ |
| WebSocket | ✅ Rich | ✅ Full | ✅ In Cards | ✅ Product | ✅ Actions | ❌ | ✅ Native | ✅ Duration |
| Teams | ✅ Plain | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ API |
| WhatsApp | ✅ Plain | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ Read Receipt |
| Voice/SIP | ✅ Spoken | ❌ | ❌ | ❌ | ❌ | ✅ TTS | ✅ Audio | ✅ State |
1. REST API Channel
Endpoint
POST /api/v1/chat
Input Format
{
"user_input": "What products do you have?",
"conversation_history": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi! How can I help?"}
]
}
Output Format
{
"response": "We have several products including Product A, Product B, and Product C."
}
Response Processing
File: /core/app/api/v1/chat.py (lines 308-320)
- Extract last `AIMessage` from LangGraph state
- Handle both string and list-based content
- Convert to plain text string
- Return simple JSON response
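The steps above can be sketched in a few lines. This is a minimal illustration rather than the code in chat.py; the `AIMessage` dataclass below is a stand-in for the LangChain class:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class AIMessage:
    # Minimal stand-in for langchain_core.messages.AIMessage
    content: Union[str, List]

def to_rest_response(message: AIMessage) -> dict:
    """Flatten string or list-based content into the plain JSON body."""
    content = message.content
    if isinstance(content, list):
        # List items may be plain strings or {"type": "text", "text": ...} dicts
        text = "".join(
            item.get("text", "") if isinstance(item, dict) else str(item)
            for item in content
        )
    else:
        text = str(content) if content else ""
    return {"response": text}
```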
Limitations
- No markdown rendering: Output is plain text only
- No structured content: Cards, images, buttons not supported
- No formatting: All content flattened to string
Use Cases
- Simple Q&A interactions
- Mobile apps with custom rendering
- Third-party integrations
- Webhook consumers
2. REST API Streaming Channel
Endpoint
POST /api/v1/chat/stream
Transport
Server-Sent Events (SSE) with text/event-stream content type
Output Format
data: {"chunk": "We have ", "done": false}
data: {"chunk": "several products ", "done": false}
data: {"chunk": "including Product A.", "done": false}
data: {"chunk": "", "done": true, "full_response": "We have several products including Product A.", "conversation_id": "uuid"}
Streaming Logic
File: /core/app/api/v1/chat.py (lines 499-541)
- Stream incremental chunks during graph execution
- Track accumulated content to avoid duplicates
- Send only new text portions
- Final message includes `done: true` and the full response
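The deduplication step can be sketched as a generator that compares each accumulated snapshot against what has already been sent. This is an illustrative sketch, assuming the graph emits progressively longer full texts, not the actual implementation in chat.py:

```python
import json
from typing import Iterable, Iterator

def sse_chunks(snapshots: Iterable[str], conversation_id: str) -> Iterator[str]:
    """Yield SSE 'data:' lines, sending only text not already delivered."""
    sent = ""
    for full in snapshots:
        if full.startswith(sent) and len(full) > len(sent):
            delta = full[len(sent):]  # only the new portion
            sent = full
            yield "data: " + json.dumps({"chunk": delta, "done": False}) + "\n\n"
    # Final event carries the complete response
    yield "data: " + json.dumps(
        {"chunk": "", "done": True, "full_response": sent,
         "conversation_id": conversation_id}
    ) + "\n\n"
```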
Benefits
- Low latency: User sees response as it's generated
- Better UX: Progressive loading instead of waiting
- Efficient: Incremental delivery reduces perceived wait time
Use Cases
- Real-time chat interfaces
- Long-running queries
- Interactive conversations
3. WebSocket Channel
Endpoint
WS /api/v1/agent-flows/execute-ws/{flow_id}
Input Format
{
"type": "user_message",
"content": "Show me your laptops"
}
Output Format
{
"response": "Here are our top laptops:\n\n1. **Dell XPS 13**\n \n Price: $999\n [Buy Now](url)",
"content": [
{
"type": "text",
"text": "Here are our top laptops:",
"markdown": true,
"typing_duration_ms": 1200
},
{
"type": "cards",
"layout": "carousel",
"cards": [
{
"id": "dell_xps_13",
"title": "Dell XPS 13",
"images": ["url"],
"description": "Premium ultrabook",
"Price": {"raw": "$999"},
"Stock": "In Stock",
"actions": [
{
"text": "Buy Now",
"link": "url",
"type": "button"
}
]
}
]
}
],
"execution_path": [],
"tools_called": [],
"node_outputs": {},
"turn_messages": [],
"done": true
}
Response Processing
File: /core/app/api/v1/agent_flows_ws.py (lines 155-203)
- Extract AI response text
- Parse markdown into structured content blocks:
  - Use `parse_markdown_to_content_blocks()` utility
  - Detect numbered lists as product cards
  - Extract images, prices, stock, and action buttons
- Send JSON with both raw text and parsed content
Structured Content Parsing
File: /core/app/utils/message_parser_optimized.py
Card Detection Indicators
- Images: Markdown image syntax (`![alt](url)`)
- Price: Contains `$`, `€`, `£`, `¥`, or "price:"
- Product ID: Contains "product id:" or "product_id:"
- Stock: Contains "stock:" or "in stock"
- Action Links: Markdown links `[text](url)`
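A rough sketch of these heuristics follows; the function name and regexes are illustrative, not taken from message_parser_optimized.py:

```python
import re

PRICE_RE = re.compile(r"[$€£¥]|price:", re.IGNORECASE)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")

def looks_like_card(item_text: str) -> bool:
    """Heuristic card detection mirroring the indicators listed above."""
    lowered = item_text.lower()
    has_image = bool(IMAGE_RE.search(item_text))
    has_price = bool(PRICE_RE.search(item_text))
    has_stock = "stock:" in lowered or "in stock" in lowered
    has_product_id = "product id:" in lowered or "product_id:" in lowered
    return has_image or has_price or has_stock or has_product_id
```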
Card Structure
{
"id": "unique_hash", # Generated from title
"title": "Product Name",
"images": ["url1", "url2"], # All images in item
"description": "...", # Non-metadata text
"Price": {"raw": "$999"}, # Detected price
"Stock": "50 units", # Stock info
"actions": [
{
"text": "Buy Now",
"link": "url",
"type": "button" # or "link"
}
]
}
Action Button Detection
ACTION_BUTTON_TEXTS = [
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now"
]
Links with these texts are classified as type: "button", others as type: "link".
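The classification rule can be expressed directly. Using a `frozenset` for the lookup echoes the optimization mentioned later in the performance notes, though the exact structure is an assumption:

```python
ACTION_BUTTON_TEXTS = frozenset([
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now",
])

def classify_link(text: str, url: str) -> dict:
    """Classify a markdown link as an action button or a plain link."""
    kind = "button" if text.strip().lower() in ACTION_BUTTON_TEXTS else "link"
    return {"text": text, "link": url, "type": kind}
```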
Content Block Types
Text Block
{
"type": "text",
"text": "Your message here",
"markdown": true,
"typing_duration_ms": 2000
}
Typing Duration: Calculated as 50ms * word_count for animation effect
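For example, the 50 ms-per-word rule amounts to:

```python
def typing_duration_ms(text: str) -> int:
    """50 ms per word, per the rule above."""
    return 50 * len(text.split())
```

A three-word message like "Your message here" yields 150 ms.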
Cards Block
{
"type": "cards",
"layout": "carousel",
"cards": [...]
}
Use Cases
- Interactive web chat widgets
- E-commerce product displays
- Rich media presentations
- Real-time collaborative apps
4. Microsoft Teams Channel
Webhook Endpoint
POST /api/v1/webhooks/channels/teams/messages
Adapter
File: /core/app/services/channels/teams_adapter.py
Uses Microsoft Bot Framework v3 API
Input Format
Bot Framework Activity:
{
"type": "message",
"id": "activity-id",
"from": {
"id": "user-aad-id",
"name": "John Doe"
},
"conversation": {
"id": "conversation-id"
},
"text": "What's the weather?"
}
Output Format
Plain text only (Bot Framework limitation):
{
"type": "message",
"text": "The weather is sunny today.",
"replyToId": "activity-id"
}
Response Processing
File: /core/app/api/v1/teams_channel.py (lines 356-421)
- Extract last `AIMessage` from LangGraph
- Handle list-based content by joining text parts
- Send plain text via Bot Framework API
- Include `replyToId` for threading
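A sketch of assembling the reply. The activity dict matches the output format above; the `reply_url` helper reflects the Bot Framework v3 REST path for replying to an activity, included as illustration rather than taken from the adapter:

```python
def build_reply_activity(text: str, reply_to_id: str) -> dict:
    """Assemble the plain-text reply activity shown in Output Format."""
    return {"type": "message", "text": text, "replyToId": reply_to_id}

def reply_url(service_url: str, conversation_id: str, activity_id: str) -> str:
    # Bot Framework v3 REST path for replying to an activity
    return (f"{service_url.rstrip('/')}/v3/conversations/"
            f"{conversation_id}/activities/{activity_id}")
```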
Adapter Methods
| Method | Purpose | Input |
|---|---|---|
send_message() | Send text message | service_url, conversation_id, text, reply_to_id |
send_typing_indicator() | Show "bot is typing" | service_url, conversation_id |
update_message() | Edit existing message | service_url, conversation_id, activity_id, text |
delete_message() | Remove message | service_url, conversation_id, activity_id |
Authentication
OAuth 2.0 with Bot Framework:
- Token URL: `https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token`
- Scope: `https://api.botframework.com/.default`
- Token Caching: 5-minute buffer before expiration
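The 5-minute buffer can be sketched as a small cache wrapper; `fetch_token` is a hypothetical callable returning a token and its lifetime in seconds:

```python
import time

class TokenCache:
    """Cache a bearer token, refreshing 5 minutes before expiry."""
    BUFFER_SECONDS = 300

    def __init__(self, fetch_token):
        self._fetch = fetch_token  # returns (token, expires_in_seconds)
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None) -> str:
        now = time.time() if now is None else now
        if self._token is None or now >= self._expires_at - self.BUFFER_SECONDS:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```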
Limitations
- Plain text only: No markdown, HTML, or rich formatting
- No images: Cannot display inline images
- No cards: Bot Framework cards not implemented
- No buttons: Action buttons not supported
Use Cases
- Enterprise chat integration
- Internal helpdesk bots
- Team collaboration tools
5. WhatsApp Channel
Webhook Endpoint
POST /api/v1/webhooks/whatsapp
Adapter
File: /core/app/services/channels/whatsapp/adapter.py
Uses Meta Graph API v18.0
Input Format
Meta webhook payload:
{
"entry": [{
"changes": [{
"value": {
"messages": [{
"from": "1234567890",
"id": "wamid.xxx",
"text": {"body": "Hello"},
"type": "text"
}]
}
}]
}]
}
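Pulling the user message out of this nested payload takes three levels of iteration. A sketch handling only text messages:

```python
def extract_whatsapp_messages(payload: dict) -> list:
    """Pull (sender, message_id, text) tuples out of a Meta webhook payload."""
    out = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                if msg.get("type") == "text":
                    out.append((msg["from"], msg["id"], msg["text"]["body"]))
    return out
```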
Output Format
Meta Graph API message:
{
"messaging_product": "whatsapp",
"recipient_type": "individual",
"to": "1234567890",
"type": "text",
"text": {
"preview_url": false,
"body": "Hello! How can I help you?"
}
}
Response Processing
File: /core/app/api/v1/twilio_whatsapp_webhook.py (lines 115-125)
- Extract last `AIMessage` from LangGraph
- Parse and format:
  - Use `parse_whatsapp_message()` from `whatsapp_message_parser.py`
  - Clean markdown, formatting headers, lists, and bold text
  - Split long text into multiple short message bubbles
  - Format links and remove images
- Send each bubble as a separate TwiML message
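The bubble-splitting step can be sketched as below. The 600-character limit is an assumption for illustration; the real logic lives in `parse_whatsapp_message()`:

```python
def split_into_bubbles(text: str, max_len: int = 600) -> list:
    """Split on paragraph boundaries first, then hard-wrap oversized chunks."""
    bubbles = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        while len(para) > max_len:
            cut = para.rfind(" ", 0, max_len)  # prefer a word boundary
            cut = cut if cut > 0 else max_len
            bubbles.append(para[:cut].strip())
            para = para[cut:].strip()
        bubbles.append(para)
    return bubbles
```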
Adapter Methods
| Method | Purpose | Input |
|---|---|---|
| send_message() | Send text to WhatsApp user | to (phone), text |
| send_read_receipt() | Mark message as read | message_id |
| resp.message(body) | Add message bubble to TwiML response | text body |
API Configuration
- Base URL: `https://graph.facebook.com/v18.0/{phone_number_id}`
- Auth: Bearer token from WhatsApp Business Account
- Rate Limits: Per Meta's Cloud API limits
Limitations
- Plain text only: No HTML or complex rich text (basic formatting sanitized)
- No images: Images are stripped from responses
- No cards: Structured content converted to text lists
- No buttons: Links provided as text
- Markdown stripped: Converted to plain text for mobile readability (e.g. `**bold**` -> bold)
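A minimal sketch of the stripping behavior described in these limitations; the actual parser may apply different rules:

```python
import re

def strip_markdown(text: str) -> str:
    """Reduce common markdown to plain text for WhatsApp readability."""
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)         # **bold** -> bold
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.M)   # drop heading markers
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)     # remove images
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1: \2", text)  # link -> text: url
    return text.strip()
```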
Use Cases
- Customer support
- Order notifications
- Appointment reminders
- Marketing campaigns
6. Voice/SIP Channel (LiveKit)
Architecture
File: /core/livekit/agent.py
Multi-stage voice pipeline:
- STT (Speech-to-Text): Deepgram or OpenAI Whisper
- AI Processing: LangGraph with voice-optimized prompts
- TTS (Text-to-Speech): OpenAI or Deepgram Aura
- Transcript Delivery: WebRTC data channel
Voice-Specific Prompt Engineering
Lines 249-293 in agent.py
CRITICAL RULES FOR VOICE CONVERSATION:
1. LANGUAGE MATCHING: Respond in the user's language
2. VOICE OUTPUT FORMAT:
- NO JSON objects or structured data
- Natural conversational language only
- Pretend you're on a phone call
3. BE CONCISE FOR VOICE:
- SHORT and CONVERSATIONAL
- NO numbered lists (1, 2, 3...)
- NO bullet points read aloud
- Summarize information naturally
- For products: mention 2-3 options briefly
4. ROUTING: Make routing decisions silently (don't mention agent names)
Input Format
Audio stream → STT → Text:
User: "What laptops do you have?"
Output Format
Text → TTS → Audio stream:
"We have several great laptops available.
Our top picks are the Dell XPS 13 for $999
and the MacBook Air for $1199.
Would you like more details on either?"
Response Processing
- LangGraph produces text response (same as other channels)
- Voice prompt removes all formatting:
- No JSON, markdown, or lists
- Converts to natural speech
- TTS synthesizes audio
- Transcripts sent to frontend via data channel
Transcript Format
{
"type": "transcript",
"role": "user",
"content": "What laptops do you have?",
"is_final": true,
"timestamp": "2024-01-15T10:30:00Z"
}
Transcript Handling
Lines 140-176 in agent.py
- Message deduplication: Cache last 5 seconds to prevent duplicates
- Interim vs. Final: Interim transcripts for live feedback, final for history
- WebRTC data channel: Sends transcripts to frontend in real-time
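The 5-second deduplication cache can be sketched as follows; the eviction strategy here is an assumption, illustrating the behavior rather than reproducing agent.py:

```python
import time

class TranscriptDeduper:
    """Drop transcripts already delivered within the last 5 seconds."""
    WINDOW_SECONDS = 5.0

    def __init__(self):
        self._seen = {}  # content -> timestamp of last delivery

    def should_send(self, content: str, now=None) -> bool:
        now = time.time() if now is None else now
        # Evict entries older than the window
        self._seen = {c: t for c, t in self._seen.items()
                      if now - t < self.WINDOW_SECONDS}
        if content in self._seen:
            return False
        self._seen[content] = now
        return True
```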
STT/TTS Configuration
Speech-to-Text Options
- Deepgram Nova 2: High accuracy, low latency
- OpenAI Whisper: Multi-language support
Text-to-Speech Options
- OpenAI TTS: Natural voices (alloy, echo, fable, etc.)
- Deepgram Aura: Fast streaming synthesis
Limitations
- Audio only: No visual content whatsoever
- No structured data: All JSON/cards removed
- Conversational constraints: Must sound natural when spoken
- Latency sensitive: Optimized for real-time interaction
Use Cases
- Phone support systems
- Voice assistants
- IVR (Interactive Voice Response)
- Accessibility applications
Message Extraction Utilities
Common Functions
File: /core/app/utils/message_utils.py
extract_response_text(messages)
Finds last AIMessage with content:
def extract_response_text(messages):
    for msg in reversed(messages):
        if isinstance(msg, AIMessage):
            if msg.content or not (hasattr(msg, "tool_calls") and msg.tool_calls):
                return extract_content_text(msg.content)
    return ""
extract_content_text(content)
Handles both string and list-based content:
def extract_content_text(content):
    if isinstance(content, list):
        text_parts = [
            str(item) if not isinstance(item, dict) else item.get("text", "")
            for item in content
        ]
        return "".join(text_parts)
    return str(content) if content else ""
extract_tool_calls(messages)
Extracts all tool invocations:
def extract_tool_calls(messages):
    tool_calls = []
    for msg in messages:
        if isinstance(msg, AIMessage) and hasattr(msg, "tool_calls"):
            for tc in msg.tool_calls:
                tool_calls.append({
                    "name": tc.get("name"),
                    "args": tc.get("args", {})
                })
    return tool_calls
Response Transformation Pipeline
Stage 1: LangGraph Output
All channels receive the same structured output from LangGraph:
final_state = {
    "messages": [
        HumanMessage(content="What products do you have?"),
        AIMessage(content="We have Product A, Product B, Product C")
    ],
    "current_node": "response_node",
    "execution_path": [...],
    "context": {...}
}
Stage 2: Message Extraction
Extract last AIMessage:
result_messages = final_state.get("messages", [])
for msg in reversed(result_messages):
    if isinstance(msg, AIMessage):
        if msg.content:
            final_message = msg
            break
Stage 3: Content Parsing (Channel-Specific)
WebSocket: Parse to Structured Blocks
from app.utils.message_parser_optimized import parse_markdown_to_content_blocks
parsed_content = parse_markdown_to_content_blocks(response_text)
Other Channels: Plain Text
response_text = extract_content_text(final_message.content)
Stage 4: Channel Delivery
REST API
return QueryResponse(response=response_text)
WebSocket
await websocket.send_json({
    "response": response_text,
    "content": parsed_content,
    "done": True
})
Teams
await teams_adapter.send_message(
    service_url=service_url,
    conversation_id=conversation_id,
    text=response_text,
    reply_to_id=activity_id
)
WhatsApp
await whatsapp_adapter.send_message(
    to=phone_number,
    text=response_text
)
Voice
# TTS handled by LiveKit session
# Transcript sent via data channel
await send_transcript("assistant", response_text, is_final=True)
Format Conversion Best Practices
1. Sanitization
File: /core/app/utils/sanitizers.py
Always sanitize user input before processing:
from app.utils.sanitizers import sanitize_prompt_input
clean_input = sanitize_prompt_input(user_input, strict=False)
2. Content Type Detection
Check message content type before extraction:
if isinstance(msg.content, list):
    # Handle list-based content
    pass
elif isinstance(msg.content, str):
    # Handle string content
    pass
3. Voice-Specific Formatting
For voice channels, inject special instructions:
if channel == "voice":
    system_message = """
    Respond in natural conversational language.
    NO numbered lists, bullet points, or structured data.
    Keep responses SHORT and CONVERSATIONAL.
    """
4. WebSocket Card Detection
Only parse markdown for WebSocket:
if channel == "websocket":
    parsed_content = parse_markdown_to_content_blocks(response_text)
else:
    parsed_content = None  # Other channels use plain text
Error Handling
Common Error Scenarios
1. Empty Response
if not response_text or response_text.strip() == "":
    response_text = "I apologize, but I couldn't generate a response."
2. Malformed Content
try:
    parsed_content = parse_markdown_to_content_blocks(response_text)
except Exception as e:
    logger.error(f"Failed to parse markdown: {e}")
    parsed_content = [{"type": "text", "text": response_text, "markdown": False}]
3. Channel Delivery Failure
try:
    await adapter.send_message(to=recipient, text=response_text)
except Exception as e:
    logger.error(f"Failed to send message: {e}")
    # Store in database for retry
    await save_failed_message(recipient, response_text, error=str(e))
Performance Considerations
1. WebSocket Parsing Overhead
Markdown parsing adds ~50-100ms latency:
- Impact: Minimal for most use cases
- Optimization: Cached regex patterns via `frozenset`
2. Voice Latency
Total voice pipeline: ~300-500ms
- STT: 100-200ms
- LLM: 100-200ms
- TTS: 100-200ms
- Optimization: Streaming TTS for faster TTFB
3. Teams/WhatsApp API Limits
External API rate limits:
- Teams: ~60 requests/minute per bot
- WhatsApp: Varies by tier (1K-10K/day)
- Mitigation: Queue messages, implement backoff
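The suggested backoff can be sketched as a delay generator with jitter; the base, cap, and retry count are illustrative defaults, not values from the codebase:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff with jitter for rate-limited channel APIs."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, ... capped
        yield delay * (0.5 + random.random() / 2)  # jitter in [0.5x, 1.0x]
```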
Testing Response Formats
Unit Tests
Test Message Extraction
def test_extract_response_text():
    messages = [
        HumanMessage(content="Hello"),
        AIMessage(content="Hi there!")
    ]
    result = extract_response_text(messages)
    assert result == "Hi there!"
Test Markdown Parsing
def test_parse_markdown_to_cards():
    markdown = """
    1. **Product A**

       Price: $99
       [Buy Now](url)
    """
    content = parse_markdown_to_content_blocks(markdown)
    assert content[0]["type"] == "cards"
    assert len(content[0]["cards"]) == 1
Integration Tests
Test WebSocket Response
async def test_websocket_structured_response():
    async with websocket_client("/agent-flows/execute-ws/flow-id") as ws:
        await ws.send_json({"type": "user_message", "content": "Show products"})
        response = await ws.receive_json()
        assert "content" in response
        assert isinstance(response["content"], list)
Test Teams Delivery
async def test_teams_send_message():
    response = await teams_adapter.send_message(
        service_url="https://smba.trafficmanager.net/...",
        conversation_id="conv-id",
        text="Test message"
    )
    assert response.status_code == 200
Future Enhancements
1. Adaptive Cards for Teams
Implement Microsoft Adaptive Cards for rich formatting:
{
"type": "AdaptiveCard",
"body": [
{"type": "TextBlock", "text": "Product A"},
{"type": "Image", "url": "..."}
],
"actions": [
{"type": "Action.OpenUrl", "url": "...", "title": "Buy"}
]
}
2. WhatsApp Interactive Messages
Add support for buttons and lists:
{
"type": "interactive",
"interactive": {
"type": "button",
"body": {"text": "Choose an option:"},
"action": {
"buttons": [
{"type": "reply", "reply": {"id": "1", "title": "Option 1"}}
]
}
}
}
3. Rich Media for REST API
Add optional structured output:
{
"response": "We have laptops",
"structured_content": {
"cards": [...]
}
}
4. Voice Emotion Detection
Analyze user tone and adjust response style:
if user_emotion == "frustrated":
    response_style = "empathetic_and_calm"
Summary
The chatbot implements channel-specific response formatting with:
- WebSocket: Rich structured content with markdown parsing and product cards
- REST API: Simple plain text with optional streaming
- Teams: Plain text via Bot Framework with typing indicators
- WhatsApp: Plain text via Meta Graph API
- Voice: Natural conversational speech with all formatting removed
All channels share the same AI engine (LangGraph) but transform output according to each platform's capabilities and constraints. WebSocket is the only channel with full structured content support, while voice channels use specialized prompt engineering to produce natural speech output.