Channel Response Format Specification

Overview

This document describes how AI responses are formatted and rendered across different communication channels (REST API, WebSocket, WhatsApp, Microsoft Teams, and Voice/SIP). Each channel has unique capabilities and constraints that determine how content is transformed from the AI engine output to the final user-facing format.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│          LangGraph AI Engine (Channel Neutral)          │
│             Returns: AIMessage with content             │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│                Message Extraction Layer                 │
│            Extract last AIMessage from state            │
│           Handle string/list content formats            │
└──────────────────┬──────────────────────────────────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
┌──────────────┐     ┌──────────────────┐
│  WebSocket   │     │  Other Channels  │
│   Parser     │     │   (Plain Text)   │
└──────┬───────┘     └────────┬─────────┘
       │                      │
       ▼                      ▼
┌──────────────┐     ┌──────────────────┐
│  Structured  │     │   Plain Text     │
│   Content    │     │    Response      │
│   Blocks     │     │                  │
└──────┬───────┘     └────────┬─────────┘
       │                      │
       ▼                      ▼
┌──────────────────────────────────────┐
│      Channel-Specific Delivery       │
│  - WebSocket: JSON with cards        │
│  - REST: Simple JSON response        │
│  - Teams: Bot Framework Activity     │
│  - WhatsApp: Meta Graph API          │
│  - Voice: TTS Audio Stream           │
└──────────────────────────────────────┘

Channel Capability Matrix

| Channel   | Text      | Markdown | Images      | Cards      | Buttons    | Audio  | Streaming | Typing Indicator |
|-----------|-----------|----------|-------------|------------|------------|--------|-----------|------------------|
| REST API  | ✅ Plain  | —        | —           | —          | —          | —      | ✅ SSE    | —                |
| WebSocket | ✅ Rich   | ✅ Full  | ✅ In Cards | ✅ Product | ✅ Actions | —      | ✅ Native | ✅ Duration      |
| Teams     | ✅ Plain  | —        | —           | —          | —          | —      | —         | ✅ API           |
| WhatsApp  | ✅ Plain  | —        | —           | —          | —          | —      | —         | ✅ Read Receipt  |
| Voice/SIP | ✅ Spoken | —        | —           | —          | —          | ✅ TTS | ✅ Audio  | ✅ State         |

1. REST API Channel

Endpoint

POST /api/v1/chat

Input Format

{
  "user_input": "What products do you have?",
  "conversation_history": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"}
  ]
}

Output Format

{
  "response": "We have several products including Product A, Product B, and Product C."
}

Response Processing

File: /core/app/api/v1/chat.py (lines 308-320)

  1. Extract last AIMessage from LangGraph state
  2. Handle both string and list-based content
  3. Convert to plain text string
  4. Return simple JSON response
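The four steps above can be sketched as follows. This is a simplified illustration, not the actual code in chat.py: the message shape (dicts with "role"/"content") stands in for the real AIMessage objects.

```python
# Simplified sketch of REST response processing. The dict-based message
# shape is illustrative; the real code works with LangChain AIMessage objects.

def extract_content_text(content) -> str:
    """Steps 2-3: flatten string or list-based content into a plain string."""
    if isinstance(content, list):
        return "".join(
            item.get("text", "") if isinstance(item, dict) else str(item)
            for item in content
        )
    return str(content) if content else ""

def build_rest_response(messages: list) -> dict:
    """Steps 1 and 4: find the last assistant message with content and wrap it."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content"):
            return {"response": extract_content_text(msg["content"])}
    return {"response": ""}
```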

Limitations

  • No markdown rendering: Output is plain text only
  • No structured content: Cards, images, buttons not supported
  • No formatting: All content flattened to string

Use Cases

  • Simple Q&A interactions
  • Mobile apps with custom rendering
  • Third-party integrations
  • Webhook consumers

2. REST API Streaming Channel

Endpoint

POST /api/v1/chat/stream

Transport

Server-Sent Events (SSE) with text/event-stream content type

Output Format

data: {"chunk": "We have ", "done": false}

data: {"chunk": "several products ", "done": false}

data: {"chunk": "including Product A.", "done": false}

data: {"chunk": "", "done": true, "full_response": "We have several products including Product A.", "conversation_id": "uuid"}

Streaming Logic

File: /core/app/api/v1/chat.py (lines 499-541)

  1. Stream incremental chunks during graph execution
  2. Track accumulated content to avoid duplicates
  3. Send only new text portions
  4. Final message includes done: true and full response
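The chunk-deduplication logic above can be sketched as a generator that compares each snapshot of the accumulating response against what has already been sent and emits only the new suffix. Function and parameter names are illustrative, not the actual implementation:

```python
import json

def sse_events(snapshots, conversation_id="uuid"):
    """Yield SSE 'data:' lines from successive snapshots of the growing
    response, sending only the text that is new in each one (steps 2-3)."""
    sent = ""
    for snapshot in snapshots:
        if snapshot.startswith(sent) and len(snapshot) > len(sent):
            chunk = snapshot[len(sent):]  # only the new suffix
            sent = snapshot
            yield "data: " + json.dumps({"chunk": chunk, "done": False}) + "\n\n"
    # step 4: final event carries done=true and the full response
    yield "data: " + json.dumps(
        {"chunk": "", "done": True, "full_response": sent,
         "conversation_id": conversation_id}
    ) + "\n\n"
```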

Benefits

  • Low latency: User sees response as it's generated
  • Better UX: Progressive loading instead of waiting
  • Efficient: Incremental delivery reduces perceived wait time

Use Cases

  • Real-time chat interfaces
  • Long-running queries
  • Interactive conversations

3. WebSocket Channel

Endpoint

WS /api/v1/agent-flows/execute-ws/{flow_id}

Input Format

{
  "type": "user_message",
  "content": "Show me your laptops"
}

Output Format

{
  "response": "Here are our top laptops:\n\n1. **Dell XPS 13**\n ![Dell XPS](url)\n Price: $999\n [Buy Now](url)",
  "content": [
    {
      "type": "text",
      "text": "Here are our top laptops:",
      "markdown": true,
      "typing_duration_ms": 1200
    },
    {
      "type": "cards",
      "layout": "carousel",
      "cards": [
        {
          "id": "dell_xps_13",
          "title": "Dell XPS 13",
          "images": ["url"],
          "description": "Premium ultrabook",
          "Price": {"raw": "$999"},
          "Stock": "In Stock",
          "actions": [
            {
              "text": "Buy Now",
              "link": "url",
              "type": "button"
            }
          ]
        }
      ]
    }
  ],
  "execution_path": [],
  "tools_called": [],
  "node_outputs": {},
  "turn_messages": [],
  "done": true
}

Response Processing

File: /core/app/api/v1/agent_flows_ws.py (lines 155-203)

  1. Extract AI response text
  2. Parse markdown into structured content blocks:
    • Use parse_markdown_to_content_blocks() utility
    • Detect numbered lists as product cards
    • Extract images, prices, stock, action buttons
  3. Send JSON with both raw text and parsed content

Structured Content Parsing

File: /core/app/utils/message_parser_optimized.py

Card Detection Indicators

  • Images: ![alt](url) syntax
  • Price: Contains $, €, £, ¥, or "price:"
  • Product ID: Contains "product id:" or "product_id:"
  • Stock: Contains "stock:" or "in stock"
  • Action Links: Markdown links [text](url)
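The indicators above can be combined into a simple predicate. This is an illustrative re-creation of the heuristics, not the actual code in message_parser_optimized.py:

```python
import re

# Illustrative card-detection heuristics mirroring the indicator list above.
PRICE_RE = re.compile(r"[$€£¥]|price:", re.IGNORECASE)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")  # markdown image syntax

def looks_like_card(item_text: str) -> bool:
    """A numbered-list item is treated as a product card if it carries
    any of the card indicators (image, price, product ID, or stock info)."""
    lowered = item_text.lower()
    return bool(
        IMAGE_RE.search(item_text)
        or PRICE_RE.search(item_text)
        or "product id:" in lowered
        or "product_id:" in lowered
        or "stock:" in lowered
        or "in stock" in lowered
    )
```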

Card Structure

{
  "id": "unique_hash",           # Generated from title
  "title": "Product Name",
  "images": ["url1", "url2"],    # All images in item
  "description": "...",          # Non-metadata text
  "Price": {"raw": "$999"},      # Detected price
  "Stock": "50 units",           # Stock info
  "actions": [
    {
      "text": "Buy Now",
      "link": "url",
      "type": "button"           # or "link"
    }
  ]
}

Action Button Detection

ACTION_BUTTON_TEXTS = [
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now"
]

Links with these texts are classified as type: "button", others as type: "link".
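A minimal sketch of that classification rule (the helper name is illustrative):

```python
ACTION_BUTTON_TEXTS = [
    "buy now", "buy", "purchase", "add to cart",
    "view product", "shop now", "get it now", "order now",
]

def classify_link(text: str, url: str) -> dict:
    """Classify a markdown link as an action button or a plain link,
    based on whether its text matches a known action phrase."""
    kind = "button" if text.strip().lower() in ACTION_BUTTON_TEXTS else "link"
    return {"text": text, "link": url, "type": kind}
```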

Content Block Types

Text Block

{
  "type": "text",
  "text": "Your message here",
  "markdown": true,
  "typing_duration_ms": 2000
}

Typing Duration: Calculated as 50ms * word_count for animation effect
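That formula is a one-liner; the function name is illustrative:

```python
def typing_duration_ms(text: str) -> int:
    """Typing-animation duration: 50 ms per word, per the formula above."""
    return 50 * len(text.split())
```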

Cards Block

{
  "type": "cards",
  "layout": "carousel",
  "cards": [...]
}

Use Cases

  • Interactive web chat widgets
  • E-commerce product displays
  • Rich media presentations
  • Real-time collaborative apps

4. Microsoft Teams Channel

Webhook Endpoint

POST /api/v1/webhooks/channels/teams/messages

Adapter

File: /core/app/services/channels/teams_adapter.py

Uses Microsoft Bot Framework v3 API

Input Format

Bot Framework Activity:

{
  "type": "message",
  "id": "activity-id",
  "from": {
    "id": "user-aad-id",
    "name": "John Doe"
  },
  "conversation": {
    "id": "conversation-id"
  },
  "text": "What's the weather?"
}

Output Format

Plain text only (this integration does not implement Bot Framework rich cards):

{
  "type": "message",
  "text": "The weather is sunny today.",
  "replyToId": "activity-id"
}

Response Processing

File: /core/app/api/v1/teams_channel.py (lines 356-421)

  1. Extract last AIMessage from LangGraph
  2. Handle list-based content by joining text parts
  3. Send plain text via Bot Framework API
  4. Include replyToId for threading

Adapter Methods

| Method                  | Purpose               | Input                                         |
|-------------------------|-----------------------|-----------------------------------------------|
| send_message()          | Send text message     | service_url, conversation_id, text, reply_to_id |
| send_typing_indicator() | Show "bot is typing"  | service_url, conversation_id                  |
| update_message()        | Edit existing message | service_url, conversation_id, activity_id, text |
| delete_message()        | Remove message        | service_url, conversation_id, activity_id     |

Authentication

OAuth 2.0 with Bot Framework:

  • Token URL: https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token
  • Scope: https://api.botframework.com/.default
  • Token Caching: 5-minute buffer before expiration
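The token-caching behavior can be sketched as below. This is an illustrative cache with the 5-minute buffer described above, not the adapter's actual implementation; the fetch function is injected so the sketch stays self-contained:

```python
import time

class BotFrameworkTokenCache:
    """Illustrative OAuth token cache: refresh when within 5 minutes of expiry."""

    BUFFER_SECONDS = 300  # the 5-minute buffer described above

    def __init__(self, fetch_token):
        # fetch_token() performs the client-credentials request and
        # returns (access_token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        now = time.time()
        if self._token is None or now >= self._expires_at - self.BUFFER_SECONDS:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```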

Limitations

  • Plain text only: No markdown, HTML, or rich formatting
  • No images: Cannot display inline images
  • No cards: Bot Framework cards not implemented
  • No buttons: Action buttons not supported

Use Cases

  • Enterprise chat integration
  • Internal helpdesk bots
  • Team collaboration tools

5. WhatsApp Channel

Webhook Endpoint

POST /api/v1/webhooks/whatsapp

Adapter

File: /core/app/services/channels/whatsapp/adapter.py

Uses Meta Graph API v18.0

Input Format

Meta webhook payload:

{
  "entry": [{
    "changes": [{
      "value": {
        "messages": [{
          "from": "1234567890",
          "id": "wamid.xxx",
          "text": {"body": "Hello"},
          "type": "text"
        }]
      }
    }]
  }]
}

Output Format

Meta Graph API message:

{
  "messaging_product": "whatsapp",
  "recipient_type": "individual",
  "to": "1234567890",
  "type": "text",
  "text": {
    "preview_url": false,
    "body": "Hello! How can I help you?"
  }
}

Response Processing

File: /core/app/api/v1/twilio_whatsapp_webhook.py (lines 115-125)

  1. Extract last AIMessage from LangGraph
  2. Parse and Format:
    • Use parse_whatsapp_message() from whatsapp_message_parser.py
    • Cleans markdown formatting (headers, lists, bold)
    • Splits long text into multiple short message bubbles
    • Formats links and removes images
  3. Send each bubble as a separate TwiML message
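The cleanup and bubble-splitting steps above can be sketched as follows. Both functions are illustrative re-creations of the behavior, not the actual code in whatsapp_message_parser.py; the 300-character bubble limit is an assumed value:

```python
import re

def clean_for_whatsapp(text: str) -> str:
    """Illustrative markdown cleanup: drop images, turn links into text,
    unwrap bold, and strip heading markers."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)            # remove images
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1: \2", text)  # links -> "text: url"
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)              # **bold** -> bold
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)      # strip headers
    return text.strip()

def split_into_bubbles(text: str, max_len: int = 300) -> list[str]:
    """Split long text into short message bubbles at paragraph boundaries."""
    bubbles, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_len:
            bubbles.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        bubbles.append(current)
    return bubbles
```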

Adapter Methods

| Method              | Purpose                              | Input            |
|---------------------|--------------------------------------|------------------|
| send_message()      | Send text to WhatsApp user           | to (phone), text |
| send_read_receipt() | Mark message as read                 | message_id       |
| resp.message(body)  | Add message bubble to TwiML response | text body        |

API Configuration

  • Base URL: https://graph.facebook.com/v18.0/{phone_number_id}
  • Auth: Bearer token from WhatsApp Business Account
  • Rate Limits: Per Meta's Cloud API limits

Limitations

  • Plain text only: No HTML or complex rich text (basic formatting sanitized)
  • No images: Images are stripped from responses
  • No cards: Structured content converted to text lists
  • No buttons: Links provided as text
  • Markdown stripped: Converted to plain text for mobile readability (e.g. **bold** -> bold)

Use Cases

  • Customer support
  • Order notifications
  • Appointment reminders
  • Marketing campaigns

6. Voice/SIP Channel (LiveKit)

Architecture

File: /core/livekit/agent.py

Multi-stage voice pipeline:

  1. STT (Speech-to-Text): Deepgram or OpenAI Whisper
  2. AI Processing: LangGraph with voice-optimized prompts
  3. TTS (Text-to-Speech): OpenAI or Deepgram Aura
  4. Transcript Delivery: WebRTC data channel

Voice-Specific Prompt Engineering

Lines 249-293 in agent.py

CRITICAL RULES FOR VOICE CONVERSATION:

1. LANGUAGE MATCHING: Respond in the user's language

2. VOICE OUTPUT FORMAT:
- NO JSON objects or structured data
- Natural conversational language only
- Pretend you're on a phone call

3. BE CONCISE FOR VOICE:
- SHORT and CONVERSATIONAL
- NO numbered lists (1, 2, 3...)
- NO bullet points read aloud
- Summarize information naturally
- For products: mention 2-3 options briefly

4. ROUTING: Make routing decisions silently (don't mention agent names)

Input Format

Audio stream → STT → Text:

User: "What laptops do you have?"

Output Format

Text → TTS → Audio stream:

"We have several great laptops available.
Our top picks are the Dell XPS 13 for $999
and the MacBook Air for $1199.
Would you like more details on either?"

Response Processing

  1. LangGraph produces text response (same as other channels)
  2. Voice prompt removes all formatting:
    • No JSON, markdown, or lists
    • Converts to natural speech
  3. TTS synthesizes audio
  4. Transcripts sent to frontend via data channel

Transcript Format

{
  "type": "transcript",
  "role": "user",
  "content": "What laptops do you have?",
  "is_final": true,
  "timestamp": "2024-01-15T10:30:00Z"
}

Transcript Handling

Lines 140-176 in agent.py

  • Message deduplication: Cache last 5 seconds to prevent duplicates
  • Interim vs. Final: Interim transcripts for live feedback, final for history
  • WebRTC data channel: Sends transcripts to frontend in real-time
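The deduplication cache can be sketched as below. This is an illustrative mirror of the 5-second window described above, not the actual code in agent.py; the injectable clock exists only to make the sketch testable:

```python
import time

class TranscriptDeduper:
    """Illustrative dedup cache: suppress transcripts repeated within 5 s."""

    WINDOW_SECONDS = 5.0

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen: dict[str, float] = {}  # content -> last-seen timestamp

    def should_send(self, content: str) -> bool:
        now = self._clock()
        # evict entries older than the window
        self._seen = {c: t for c, t in self._seen.items()
                      if now - t < self.WINDOW_SECONDS}
        if content in self._seen:
            return False  # duplicate within the window
        self._seen[content] = now
        return True
```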

STT/TTS Configuration

Speech-to-Text Options

  • Deepgram Nova 2: High accuracy, low latency
  • OpenAI Whisper: Multi-language support

Text-to-Speech Options

  • OpenAI TTS: Natural voices (alloy, echo, fable, etc.)
  • Deepgram Aura: Fast streaming synthesis

Limitations

  • Audio only: No visual content whatsoever
  • No structured data: All JSON/cards removed
  • Conversational constraints: Must sound natural when spoken
  • Latency sensitive: Optimized for real-time interaction

Use Cases

  • Phone support systems
  • Voice assistants
  • IVR (Interactive Voice Response)
  • Accessibility applications

Message Extraction Utilities

Common Functions

File: /core/app/utils/message_utils.py

extract_response_text(messages)

Finds last AIMessage with content:

for msg in reversed(messages):
    if isinstance(msg, AIMessage):
        if msg.content or not (hasattr(msg, "tool_calls") and msg.tool_calls):
            return extract_content_text(msg.content)
return ""

extract_content_text(content)

Handles both string and list-based content:

if isinstance(content, list):
    text_parts = [
        str(item) if not isinstance(item, dict) else item.get("text", "")
        for item in content
    ]
    return "".join(text_parts)
else:
    return str(content) if content else ""

extract_tool_calls(messages)

Extracts all tool invocations:

tool_calls = []
for msg in messages:
    if isinstance(msg, AIMessage) and hasattr(msg, "tool_calls"):
        for tc in msg.tool_calls:
            tool_calls.append({
                "name": tc.get("name"),
                "args": tc.get("args", {})
            })
return tool_calls

Response Transformation Pipeline

Stage 1: LangGraph Output

All channels receive the same structured output from LangGraph:

final_state = {
    "messages": [
        HumanMessage(content="What products do you have?"),
        AIMessage(content="We have Product A, Product B, Product C")
    ],
    "current_node": "response_node",
    "execution_path": [...],
    "context": {...}
}

Stage 2: Message Extraction

Extract last AIMessage:

result_messages = final_state.get("messages", [])
for msg in reversed(result_messages):
    if isinstance(msg, AIMessage):
        if msg.content:
            final_message = msg
            break

Stage 3: Content Parsing (Channel-Specific)

WebSocket: Parse to Structured Blocks

from app.utils.message_parser_optimized import parse_markdown_to_content_blocks
parsed_content = parse_markdown_to_content_blocks(response_text)

Other Channels: Plain Text

response_text = extract_content_text(final_message.content)

Stage 4: Channel Delivery

REST API

return QueryResponse(response=response_text)

WebSocket

await websocket.send_json({
    "response": response_text,
    "content": parsed_content,
    "done": True
})

Teams

await teams_adapter.send_message(
    service_url=service_url,
    conversation_id=conversation_id,
    text=response_text,
    reply_to_id=activity_id
)

WhatsApp

await whatsapp_adapter.send_message(
    to=phone_number,
    text=response_text
)

Voice

# TTS handled by LiveKit session
# Transcript sent via data channel
await send_transcript("assistant", response_text, is_final=True)

Format Conversion Best Practices

1. Sanitization

File: /core/app/utils/sanitizers.py

Always sanitize user input before processing:

from app.utils.sanitizers import sanitize_prompt_input

clean_input = sanitize_prompt_input(user_input, strict=False)

2. Content Type Detection

Check message content type before extraction:

if isinstance(msg.content, list):
    # Handle list-based content
    pass
elif isinstance(msg.content, str):
    # Handle string content
    pass

3. Voice-Specific Formatting

For voice channels, inject special instructions:

if channel == "voice":
    system_message = """
    Respond in natural conversational language.
    NO numbered lists, bullet points, or structured data.
    Keep responses SHORT and CONVERSATIONAL.
    """

4. WebSocket Card Detection

Only parse markdown for WebSocket:

if channel == "websocket":
    parsed_content = parse_markdown_to_content_blocks(response_text)
else:
    parsed_content = None  # Other channels use plain text

Error Handling

Common Error Scenarios

1. Empty Response

if not response_text or response_text.strip() == "":
    response_text = "I apologize, but I couldn't generate a response."

2. Malformed Content

try:
    parsed_content = parse_markdown_to_content_blocks(response_text)
except Exception as e:
    logger.error(f"Failed to parse markdown: {e}")
    parsed_content = [{"type": "text", "text": response_text, "markdown": False}]

3. Channel Delivery Failure

try:
    await adapter.send_message(to=recipient, text=response_text)
except Exception as e:
    logger.error(f"Failed to send message: {e}")
    # Store in database for retry
    await save_failed_message(recipient, response_text, error=str(e))

Performance Considerations

1. WebSocket Parsing Overhead

Markdown parsing adds ~50-100ms latency:

  • Impact: Minimal for most use cases
  • Optimization: Cached regex patterns via frozenset

2. Voice Latency

Total voice pipeline: ~300-500ms

  • STT: 100-200ms
  • LLM: 100-200ms
  • TTS: 100-200ms
  • Optimization: Streaming TTS for faster TTFB

3. Teams/WhatsApp API Limits

External API rate limits:

  • Teams: ~60 requests/minute per bot
  • WhatsApp: Varies by tier (1K-10K/day)
  • Mitigation: Queue messages, implement backoff
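The backoff half of that mitigation can be sketched as below. This is an illustrative retry helper, not code from the repository; the attempt count and base delay are assumed values, and the sleep function is injectable so the sketch stays testable:

```python
import time

def send_with_backoff(send, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Retry a channel-API call with exponential backoff.

    `send` is a zero-argument callable that raises on failure (e.g. a rate
    limit). Delays double each attempt: 0.5 s, 1 s, 2 s, ...
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller queue for later
            sleep(base_delay * (2 ** attempt))
```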

Testing Response Formats

Unit Tests

Test Message Extraction

def test_extract_response_text():
    messages = [
        HumanMessage(content="Hello"),
        AIMessage(content="Hi there!")
    ]
    result = extract_response_text(messages)
    assert result == "Hi there!"

Test Markdown Parsing

def test_parse_markdown_to_cards():
    markdown = """
    1. **Product A**
       ![Image](url)
       Price: $99
       [Buy Now](url)
    """
    content = parse_markdown_to_content_blocks(markdown)
    assert content[0]["type"] == "cards"
    assert len(content[0]["cards"]) == 1

Integration Tests

Test WebSocket Response

async def test_websocket_structured_response():
    async with websocket_client("/agent-flows/execute-ws/flow-id") as ws:
        await ws.send_json({"type": "user_message", "content": "Show products"})
        response = await ws.receive_json()
        assert "content" in response
        assert isinstance(response["content"], list)

Test Teams Delivery

async def test_teams_send_message():
    response = await teams_adapter.send_message(
        service_url="https://smba.trafficmanager.net/...",
        conversation_id="conv-id",
        text="Test message"
    )
    assert response.status_code == 200

Future Enhancements

1. Adaptive Cards for Teams

Implement Microsoft Adaptive Cards for rich formatting:

{
  "type": "AdaptiveCard",
  "body": [
    {"type": "TextBlock", "text": "Product A"},
    {"type": "Image", "url": "..."}
  ],
  "actions": [
    {"type": "Action.OpenUrl", "url": "...", "title": "Buy"}
  ]
}

2. WhatsApp Interactive Messages

Add support for buttons and lists:

{
  "type": "interactive",
  "interactive": {
    "type": "button",
    "body": {"text": "Choose an option:"},
    "action": {
      "buttons": [
        {"type": "reply", "reply": {"id": "1", "title": "Option 1"}}
      ]
    }
  }
}

3. Rich Media for REST API

Add optional structured output:

{
  "response": "We have laptops",
  "structured_content": {
    "cards": [...]
  }
}

4. Voice Emotion Detection

Analyze user tone and adjust response style:

if user_emotion == "frustrated":
    response_style = "empathetic_and_calm"

Summary

The chatbot implements channel-specific response formatting with:

  1. WebSocket: Rich structured content with markdown parsing and product cards
  2. REST API: Simple plain text with optional streaming
  3. Teams: Plain text via Bot Framework with typing indicators
  4. WhatsApp: Plain text via Meta Graph API
  5. Voice: Natural conversational speech with all formatting removed

All channels share the same AI engine (LangGraph) but transform output according to each platform's capabilities and constraints. WebSocket is the only channel with full structured content support, while voice channels use specialized prompt engineering to produce natural speech output.