Channel Response Format Specification

Overview

This document describes how AI responses are formatted and rendered across different communication channels (REST API, WebSocket, WhatsApp, Microsoft Teams, and Voice/SIP). Each channel has unique capabilities and constraints that determine how content is transformed from the AI engine output to the final user-facing format.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│          LangGraph AI Engine (Channel Neutral)          │
│             Returns: AIMessage with content             │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│                Message Extraction Layer                 │
│            Extract last AIMessage from state            │
│           Handle string/list content formats            │
└──────────────────┬──────────────────────────────────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
┌──────────────┐     ┌──────────────────┐
│  WebSocket   │     │  Other Channels  │
│   Parser     │     │   (Plain Text)   │
└──────┬───────┘     └────────┬─────────┘
       │                      │
       ▼                      ▼
┌──────────────┐     ┌──────────────────┐
│  Structured  │     │   Plain Text     │
│   Content    │     │    Response      │
│   Blocks     │     │                  │
└──────┬───────┘     └────────┬─────────┘
       │                      │
       ▼                      ▼
┌──────────────────────────────────────┐
│      Channel-Specific Delivery       │
│  - WebSocket: JSON with cards        │
│  - REST: Simple JSON response        │
│  - Teams: Bot Framework Activity     │
│  - WhatsApp: Meta Graph API          │
│  - Voice: TTS Audio Stream           │
└──────────────────────────────────────┘

Channel Capability Matrix

| Channel   | Text      | Markdown | Images      | Cards      | Buttons    | Audio  | Streaming | Typing Indicator |
|-----------|-----------|----------|-------------|------------|------------|--------|-----------|------------------|
| REST API  | ✅ Plain  | —        | —           | —          | —          | —      | ✅ SSE    | —                |
| WebSocket | ✅ Rich   | ✅ Full  | ✅ In Cards | ✅ Product | ✅ Actions | —      | ✅ Native | ✅ Duration      |
| Teams     | ✅ Plain  | —        | —           | —          | —          | —      | —         | ✅ API           |
| WhatsApp  | ✅ Plain  | —        | —           | —          | —          | —      | —         | ✅ Read Receipt  |
| Voice/SIP | ✅ Spoken | —        | —           | —          | —          | ✅ TTS | ✅ Audio  | ✅ State         |

1. REST API Channel

Endpoint

POST /api/v1/chat

Input Format

{
  "user_input": "What products do you have?",
  "conversation_history": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"}
  ]
}

Output Format

{
  "response": "We have several products including Product A, Product B, and Product C."
}

Response Processing

File: /core/app/api/v1/chat.py (lines 308-320)

  1. Extract last AIMessage from LangGraph state
  2. Handle both string and list-based content
  3. Convert to plain text string
  4. Return simple JSON response
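The four steps above can be sketched as follows. This is a simplified illustration, not the actual code in chat.py: the message shape (dicts with "role"/"content") stands in for the real AIMessage objects.

```python
# Simplified sketch of REST response processing. The dict-based message
# shape is illustrative; the real code works with LangChain AIMessage objects.

def extract_content_text(content) -> str:
    """Steps 2-3: flatten string or list-based content into a plain string."""
    if isinstance(content, list):
        return "".join(
            item.get("text", "") if isinstance(item, dict) else str(item)
            for item in content
        )
    return str(content) if content else ""

def build_rest_response(messages: list) -> dict:
    """Steps 1 and 4: find the last assistant message with content and wrap it."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content"):
            return {"response": extract_content_text(msg["content"])}
    return {"response": ""}
```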

Limitations

  • No markdown rendering: Output is plain text only
  • No structured content: Cards, images, buttons not supported
  • No formatting: All content flattened to string

Use Cases

  • Simple Q&A interactions
  • Mobile apps with custom rendering
  • Third-party integrations
  • Webhook consumers

2. REST API Streaming Channel

Endpoint

POST /api/v1/chat/stream

Transport

Server-Sent Events (SSE) with text/event-stream content type

Output Format

data: {"chunk": "We have ", "done": false}

data: {"chunk": "several products ", "done": false}

data: {"chunk": "including Product A.", "done": false}

data: {"chunk": "", "done": true, "full_response": "We have several products including Product A.", "conversation_id": "uuid"}

Streaming Logic

File: /core/app/api/v1/chat.py (lines 499-541)

  1. Stream incremental chunks during graph execution
  2. Track accumulated content to avoid duplicates
  3. Send only new text portions
  4. Final message includes done: true and full response
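The chunk-deduplication logic above can be sketched as a generator that compares each snapshot of the accumulating response against what has already been sent and emits only the new suffix. Function and parameter names are illustrative, not the actual implementation:

```python
import json

def sse_events(snapshots, conversation_id="uuid"):
    """Yield SSE 'data:' lines from successive snapshots of the growing
    response, sending only the text that is new in each one (steps 2-3)."""
    sent = ""
    for snapshot in snapshots:
        if snapshot.startswith(sent) and len(snapshot) > len(sent):
            chunk = snapshot[len(sent):]  # only the new suffix
            sent = snapshot
            yield "data: " + json.dumps({"chunk": chunk, "done": False}) + "\n\n"
    # step 4: final event carries done=true and the full response
    yield "data: " + json.dumps(
        {"chunk": "", "done": True, "full_response": sent,
         "conversation_id": conversation_id}
    ) + "\n\n"
```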

Benefits

  • Low latency: User sees response as it's generated
  • Better UX: Progressive loading instead of waiting
  • Efficient: Incremental delivery reduces perceived wait time

Use Cases

  • Real-time chat interfaces
  • Long-running queries
  • Interactive conversations

3. WebSocket Channel

Endpoint

WS /api/v1/agent-flows/execute-ws/{flow_id}

Input Format

{
  "type": "user_message",
  "content": "Show me your laptops"
}

Output Format

{
  "response": "Here are our top laptops:\n\n1. **Dell XPS 13**\n ![Dell XPS](url)\n Price: $999\n [Buy Now](url)",
  "content": [
    {
      "type": "text",
      "text": "Here are our top laptops:",
      "markdown": true,
      "typing_duration_ms": 1200
    },
    {
      "type": "cards",
      "layout": "carousel",
      "cards": [
        {
          "id": "dell_xps_13",
          "title": "Dell XPS 13",
          "images": ["url"],
          "description": "Premium ultrabook",
          "Price": {"raw": "$999"},
          "Stock": "In Stock",
          "actions": [
            {
              "text": "Buy Now",
              "link": "url",
              "type": "button"
            }
          ]
        }
      ]
    }
  ],
  "execution_path": [],
  "tools_called": [],
  "node_outputs": {},
  "turn_messages": [],
  "done": true
}

Response Processing

File: /core/app/api/v1/agent_flows_ws.py (lines 155-203)

  1. Extract AI response text
  2. Parse markdown into structured content blocks:
    • Use parse_markdown_to_content_blocks() utility
    • Detect numbered lists as product cards
    • Extract images, prices, stock, action buttons
  3. Send JSON with both raw text and parsed content

Structured Content Parsing

File: /core/app/utils/message_parser_optimized.py

Card Detection Indicators

  • Images: ![alt](url) syntax
  • Price: Contains $, €, £, ¥, or "price:"
  • Product ID: Contains "product id:" or "product_id:"
  • Stock: Contains "stock:" or "in stock"
  • Action Links: Markdown links [text](url)
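The indicators above can be combined into a simple predicate. This is an illustrative re-creation of the heuristics, not the actual code in message_parser_optimized.py:

```python
import re

# Illustrative card-detection heuristics mirroring the indicator list above.
PRICE_RE = re.compile(r"[$€£¥]|price:", re.IGNORECASE)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")  # markdown image syntax

def looks_like_card(item_text: str) -> bool:
    """A numbered-list item is treated as a product card if it carries
    any of the card indicators (image, price, product ID, or stock info)."""
    lowered = item_text.lower()
    return bool(
        IMAGE_RE.search(item_text)
        or PRICE_RE.search(item_text)
        or "product id:" in lowered
        or "product_id:" in lowered
        or "stock:" in lowered
        or "in stock" in lowered
    )
```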

Card Structure

{
  "id": "unique_hash",           # Generated from title
  "title": "Product Name",
  "images": ["url1", "url2"],    # All images in item
  "description": "...",          # Non-metadata text
  "Price": {"raw": "$999"},      # Detected price
  "Stock": "50 units",           # Stock info
  "actions": [
    {
      "text": "Buy Now",
      "link": "url",
      "type": "button"           # or "link"
    }
  ]
}

Action Button Detection

ACTION_BUTTON_TEXTS = [
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now"
]

Links with these texts are classified as type: "button", others as type: "link".
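A minimal sketch of that classification rule (the helper name is illustrative):

```python
ACTION_BUTTON_TEXTS = [
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now",
]

def classify_link(text: str, url: str) -> dict:
    """Classify a markdown link as an action button or a plain link,
    based on whether its text matches a known action phrase."""
    kind = "button" if text.strip().lower() in ACTION_BUTTON_TEXTS else "link"
    return {"text": text, "link": url, "type": kind}
```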

Content Block Types

Text Block

{
  "type": "text",
  "text": "Your message here",
  "markdown": true,
  "typing_duration_ms": 2000
}

Typing Duration: Calculated as 50ms * word_count for animation effect
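That formula is a one-liner; the function name is illustrative:

```python
def typing_duration_ms(text: str) -> int:
    """Typing-animation duration: 50 ms per word, per the formula above."""
    return 50 * len(text.split())
```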

Cards Block

{
  "type": "cards",
  "layout": "carousel",
  "cards": [...]
}

Use Cases

  • Interactive web chat widgets
  • E-commerce product displays
  • Rich media presentations
  • Real-time collaborative apps

4. Microsoft Teams Channel

Webhook Endpoint

POST /api/v1/webhooks/channels/teams/messages

Adapter

File: /core/app/services/channels/teams_adapter.py

Uses Microsoft Bot Framework v3 API

Input Format

Bot Framework Activity:

{
  "type": "message",
  "id": "activity-id",
  "from": {
    "id": "user-aad-id",
    "name": "John Doe"
  },
  "conversation": {
    "id": "conversation-id"
  },
  "text": "What's the weather?"
}

Output Format

Plain text only (this integration does not implement Bot Framework rich cards):

{
  "type": "message",
  "text": "The weather is sunny today.",
  "replyToId": "activity-id"
}

Response Processing

File: /core/app/api/v1/teams_channel.py (lines 356-421)

  1. Extract last AIMessage from LangGraph
  2. Handle list-based content by joining text parts
  3. Send plain text via Bot Framework API
  4. Include replyToId for threading

Adapter Methods

| Method                  | Purpose               | Input                                         |
|-------------------------|-----------------------|-----------------------------------------------|
| send_message()          | Send text message     | service_url, conversation_id, text, reply_to_id |
| send_typing_indicator() | Show "bot is typing"  | service_url, conversation_id                  |
| update_message()        | Edit existing message | service_url, conversation_id, activity_id, text |
| delete_message()        | Remove message        | service_url, conversation_id, activity_id     |

Authentication

OAuth 2.0 with Bot Framework:

  • Token URL: https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token
  • Scope: https://api.botframework.com/.default
  • Token Caching: 5-minute buffer before expiration
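The token-caching behavior can be sketched as below. This is an illustrative cache with the 5-minute buffer described above, not the adapter's actual implementation; the fetch function is injected so the sketch stays self-contained:

```python
import time

class BotFrameworkTokenCache:
    """Illustrative OAuth token cache: refresh when within 5 minutes of expiry."""

    BUFFER_SECONDS = 300  # the 5-minute buffer described above

    def __init__(self, fetch_token):
        # fetch_token() performs the client-credentials request and
        # returns (access_token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        now = time.time()
        if self._token is None or now >= self._expires_at - self.BUFFER_SECONDS:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```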

Limitations

  • Plain text only: No markdown, HTML, or rich formatting
  • No images: Cannot display inline images
  • No cards: Bot Framework cards not implemented
  • No buttons: Action buttons not supported

Use Cases

  • Enterprise chat integration
  • Internal helpdesk bots
  • Team collaboration tools

5. WhatsApp Channel

Webhook Endpoint

POST /api/v1/webhooks/whatsapp

Adapter

File: /core/app/services/channels/whatsapp/adapter.py

Uses Meta Graph API v18.0

Input Format

Meta webhook payload:

{
  "entry": [{
    "changes": [{
      "value": {
        "messages": [{
          "from": "1234567890",
          "id": "wamid.xxx",
          "text": {"body": "Hello"},
          "type": "text"
        }]
      }
    }]
  }]
}

Output Format

Meta Graph API message:

{
  "messaging_product": "whatsapp",
  "recipient_type": "individual",
  "to": "1234567890",
  "type": "text",
  "text": {
    "preview_url": false,
    "body": "Hello! How can I help you?"
  }
}

Response Processing

File: /core/app/api/v1/twilio_whatsapp_webhook.py (lines 115-125)

  1. Extract last AIMessage from LangGraph
  2. Parse and Format:
    • Use parse_whatsapp_message() from whatsapp_message_parser.py
    • Cleans markdown formatting (headers, lists, bold)
    • Splits long text into multiple short message bubbles
    • Formats links and removes images
  3. Send each bubble as a separate TwiML message
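The cleanup and bubble-splitting steps above can be sketched as follows. Both functions are illustrative re-creations of the behavior, not the actual code in whatsapp_message_parser.py; the 300-character bubble limit is an assumed value:

```python
import re

def clean_for_whatsapp(text: str) -> str:
    """Illustrative markdown cleanup: drop images, turn links into text,
    unwrap bold, and strip heading markers."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)            # remove images
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1: \2", text)  # links -> "text: url"
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)              # **bold** -> bold
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)      # strip headers
    return text.strip()

def split_into_bubbles(text: str, max_len: int = 300) -> list[str]:
    """Split long text into short message bubbles at paragraph boundaries."""
    bubbles, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_len:
            bubbles.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        bubbles.append(current)
    return bubbles
```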

Adapter Methods

| Method              | Purpose                              | Input            |
|---------------------|--------------------------------------|------------------|
| send_message()      | Send text to WhatsApp user           | to (phone), text |
| send_read_receipt() | Mark message as read                 | message_id       |
| resp.message(body)  | Add message bubble to TwiML response | text body        |

API Configuration

  • Base URL: https://graph.facebook.com/v18.0/{phone_number_id}
  • Auth: Bearer token from WhatsApp Business Account
  • Rate Limits: Per Meta's Cloud API limits

Limitations

  • Plain text only: No HTML or complex rich text (basic formatting sanitized)
  • No images: Images are stripped from responses
  • No cards: Structured content converted to text lists
  • No buttons: Links provided as text
  • Markdown stripped: Converted to plain text for mobile readability (e.g. **bold** -> bold)

Use Cases

  • Customer support
  • Order notifications
  • Appointment reminders
  • Marketing campaigns

6. Voice/SIP Channel (LiveKit)

Architecture

File: /core/livekit/agent.py

Multi-stage voice pipeline:

  1. STT (Speech-to-Text): Deepgram or OpenAI Whisper
  2. AI Processing: LangGraph with voice-optimized prompts
  3. TTS (Text-to-Speech): OpenAI or Deepgram Aura
  4. Transcript Delivery: WebRTC data channel

Voice-Specific Prompt Engineering

Lines 249-293 in agent.py

CRITICAL RULES FOR VOICE CONVERSATION:

1. LANGUAGE MATCHING: Respond in the user's language

2. VOICE OUTPUT FORMAT:
- NO JSON objects or structured data
- Natural conversational language only
- Pretend you're on a phone call

3. BE CONCISE FOR VOICE:
- SHORT and CONVERSATIONAL
- NO numbered lists (1, 2, 3...)
- NO bullet points read aloud
- Summarize information naturally
- For products: mention 2-3 options briefly

4. ROUTING: Make routing decisions silently (don't mention agent names)

Input Format

Audio stream → STT → Text:

User: "What laptops do you have?"

Output Format

Text → TTS → Audio stream:

"We have several great laptops available.
Our top picks are the Dell XPS 13 for $999
and the MacBook Air for $1199.
Would you like more details on either?"

Response Processing

  1. LangGraph produces text response (same as other channels)
  2. Voice prompt removes all formatting:
    • No JSON, markdown, or lists
    • Converts to natural speech
  3. TTS synthesizes audio
  4. Transcripts sent to frontend via data channel

Transcript Format

{
  "type": "transcript",
  "role": "user",
  "content": "What laptops do you have?",
  "is_final": true,
  "timestamp": "2024-01-15T10:30:00Z"
}

Transcript Handling

Lines 140-176 in agent.py

  • Message deduplication: Cache last 5 seconds to prevent duplicates
  • Interim vs. Final: Interim transcripts for live feedback, final for history
  • WebRTC data channel: Sends transcripts to frontend in real-time
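The deduplication cache can be sketched as below. This is an illustrative mirror of the 5-second window described above, not the actual code in agent.py; the injectable clock exists only to make the sketch testable:

```python
import time

class TranscriptDeduper:
    """Illustrative dedup cache: suppress transcripts repeated within 5 s."""

    WINDOW_SECONDS = 5.0

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen: dict[str, float] = {}  # content -> last-seen timestamp

    def should_send(self, content: str) -> bool:
        now = self._clock()
        # evict entries older than the window
        self._seen = {c: t for c, t in self._seen.items()
                      if now - t < self.WINDOW_SECONDS}
        if content in self._seen:
            return False  # duplicate within the window
        self._seen[content] = now
        return True
```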

STT/TTS Configuration

Speech-to-Text Options

  • Deepgram Nova 2: High accuracy, low latency
  • OpenAI Whisper: Multi-language support

Text-to-Speech Options

  • OpenAI TTS: Natural voices (alloy, echo, fable, etc.)
  • Deepgram Aura: Fast streaming synthesis

Limitations

  • Audio only: No visual content whatsoever
  • No structured data: All JSON/cards removed
  • Conversational constraints: Must sound natural when spoken
  • Latency sensitive: Optimized for real-time interaction

Use Cases

  • Phone support systems
  • Voice assistants
  • IVR (Interactive Voice Response)
  • Accessibility applications

Message Extraction Utilities

Common Functions

File: /core/app/utils/message_utils.py

extract_response_text(messages)

Finds last AIMessage with content:

for msg in reversed(messages):
    if isinstance(msg, AIMessage):
        if msg.content or not (hasattr(msg, "tool_calls") and msg.tool_calls):
            return extract_content_text(msg.content)
return ""

extract_content_text(content)

Handles both string and list-based content:

if isinstance(content, list):
    text_parts = [
        str(item) if not isinstance(item, dict) else item.get("text", "")
        for item in content
    ]
    return "".join(text_parts)
else:
    return str(content) if content else ""

extract_tool_calls(messages)

Extracts all tool invocations:

tool_calls = []
for msg in messages:
    if isinstance(msg, AIMessage) and hasattr(msg, "tool_calls"):
        for tc in msg.tool_calls:
            tool_calls.append({
                "name": tc.get("name"),
                "args": tc.get("args", {})
            })
return tool_calls

Response Transformation Pipeline

Stage 1: LangGraph Output

All channels receive the same structured output from LangGraph:

final_state = {
    "messages": [
        HumanMessage(content="What products do you have?"),
        AIMessage(content="We have Product A, Product B, Product C")
    ],
    "current_node": "response_node",
    "execution_path": [...],
    "context": {...}
}

Stage 2: Message Extraction

Extract last AIMessage:

result_messages = final_state.get("messages", [])
for msg in reversed(result_messages):
    if isinstance(msg, AIMessage):
        if msg.content:
            final_message = msg
            break

Stage 3: Content Parsing (Channel-Specific)

WebSocket: Parse to Structured Blocks

from app.utils.message_parser_optimized import parse_markdown_to_content_blocks
parsed_content = parse_markdown_to_content_blocks(response_text)

Other Channels: Plain Text

response_text = extract_content_text(final_message.content)

Stage 4: Channel Delivery

REST API

return QueryResponse(response=response_text)

WebSocket

await websocket.send_json({
    "response": response_text,
    "content": parsed_content,
    "done": True
})

Teams

await teams_adapter.send_message(
    service_url=service_url,
    conversation_id=conversation_id,
    text=response_text,
    reply_to_id=activity_id
)

WhatsApp

await whatsapp_adapter.send_message(
    to=phone_number,
    text=response_text
)

Voice

# TTS handled by LiveKit session
# Transcript sent via data channel
await send_transcript("assistant", response_text, is_final=True)

Format Conversion Best Practices

1. Sanitization

File: /core/app/utils/sanitizers.py

Always sanitize user input before processing:

from app.utils.sanitizers import sanitize_prompt_input

clean_input = sanitize_prompt_input(user_input, strict=False)

2. Content Type Detection

Check message content type before extraction:

if isinstance(msg.content, list):
    # Handle list-based content
    pass
elif isinstance(msg.content, str):
    # Handle string content
    pass

3. Voice-Specific Formatting

For voice channels, inject special instructions:

if channel == "voice":
    system_message = """
    Respond in natural conversational language.
    NO numbered lists, bullet points, or structured data.
    Keep responses SHORT and CONVERSATIONAL.
    """

4. WebSocket Card Detection

Only parse markdown for WebSocket:

if channel == "websocket":
    parsed_content = parse_markdown_to_content_blocks(response_text)
else:
    parsed_content = None  # Other channels use plain text

Error Handling

Common Error Scenarios

1. Empty Response

if not response_text or response_text.strip() == "":
    response_text = "I apologize, but I couldn't generate a response."

2. Malformed Content

try:
    parsed_content = parse_markdown_to_content_blocks(response_text)
except Exception as e:
    logger.error(f"Failed to parse markdown: {e}")
    parsed_content = [{"type": "text", "text": response_text, "markdown": False}]

3. Channel Delivery Failure

try:
    await adapter.send_message(to=recipient, text=response_text)
except Exception as e:
    logger.error(f"Failed to send message: {e}")
    # Store in database for retry
    await save_failed_message(recipient, response_text, error=str(e))

Performance Considerations

1. WebSocket Parsing Overhead

Markdown parsing adds ~50-100ms latency:

  • Impact: Minimal for most use cases
  • Optimization: Cached regex patterns via frozenset

2. Voice Latency

Total voice pipeline: ~300-500ms

  • STT: 100-200ms
  • LLM: 100-200ms
  • TTS: 100-200ms
  • Optimization: Streaming TTS for faster TTFB

3. Teams/WhatsApp API Limits

External API rate limits:

  • Teams: ~60 requests/minute per bot
  • WhatsApp: Varies by tier (1K-10K/day)
  • Mitigation: Queue messages, implement backoff
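The backoff half of that mitigation can be sketched as below. This is an illustrative retry helper, not code from the repository; the attempt count and base delay are assumed values, and the sleep function is injectable so the sketch stays testable:

```python
import time

def send_with_backoff(send, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Retry a channel-API call with exponential backoff.

    `send` is a zero-argument callable that raises on failure (e.g. a rate
    limit). Delays double each attempt: 0.5 s, 1 s, 2 s, ...
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller queue for later
            sleep(base_delay * (2 ** attempt))
```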

Testing Response Formats

Unit Tests

Test Message Extraction

def test_extract_response_text():
    messages = [
        HumanMessage(content="Hello"),
        AIMessage(content="Hi there!")
    ]
    result = extract_response_text(messages)
    assert result == "Hi there!"

Test Markdown Parsing

def test_parse_markdown_to_cards():
    markdown = """
    1. **Product A**
       ![Image](url)
       Price: $99
       [Buy Now](url)
    """
    content = parse_markdown_to_content_blocks(markdown)
    assert content[0]["type"] == "cards"
    assert len(content[0]["cards"]) == 1

Integration Tests

Test WebSocket Response

async def test_websocket_structured_response():
    async with websocket_client("/agent-flows/execute-ws/flow-id") as ws:
        await ws.send_json({"type": "user_message", "content": "Show products"})
        response = await ws.receive_json()
        assert "content" in response
        assert isinstance(response["content"], list)

Test Teams Delivery

async def test_teams_send_message():
    response = await teams_adapter.send_message(
        service_url="https://smba.trafficmanager.net/...",
        conversation_id="conv-id",
        text="Test message"
    )
    assert response.status_code == 200

Future Enhancements

1. Adaptive Cards for Teams

Implement Microsoft Adaptive Cards for rich formatting:

{
  "type": "AdaptiveCard",
  "body": [
    {"type": "TextBlock", "text": "Product A"},
    {"type": "Image", "url": "..."}
  ],
  "actions": [
    {"type": "Action.OpenUrl", "url": "...", "title": "Buy"}
  ]
}

2. WhatsApp Interactive Messages

Add support for buttons and lists:

{
  "type": "interactive",
  "interactive": {
    "type": "button",
    "body": {"text": "Choose an option:"},
    "action": {
      "buttons": [
        {"type": "reply", "reply": {"id": "1", "title": "Option 1"}}
      ]
    }
  }
}

3. Rich Media for REST API

Add optional structured output:

{
  "response": "We have laptops",
  "structured_content": {
    "cards": [...]
  }
}

4. Voice Emotion Detection

Analyze user tone and adjust response style:

if user_emotion == "frustrated":
    response_style = "empathetic_and_calm"

Summary

The chatbot implements channel-specific response formatting with:

  1. WebSocket: Rich structured content with markdown parsing and product cards
  2. REST API: Simple plain text with optional streaming
  3. Teams: Plain text via Bot Framework with typing indicators
  4. WhatsApp: Plain text via Meta Graph API
  5. Voice: Natural conversational speech with all formatting removed

All channels share the same AI engine (LangGraph) but transform output according to each platform's capabilities and constraints. WebSocket is the only channel with full structured content support, while voice channels use specialized prompt engineering to produce natural speech output.