Voice
- Realtime (speech-to-speech) and streaming (STT → agent → TTS), both with barge-in.
- Bilingual Arabic / English - wake-phrase + filler stripping, whole-utterance verbal interrupts.
- DTMF / IVR keypad input.
- Roster greeting by first name + a “thinking” expression while it works.
- Realtime delegation - consult (inline) and agent_task (background) hand work to your agent.
Vision
- look_at_screen over camera + screen-share frames.
- Continuous vision - the latest changed frame is pushed (~6 s) with no forced response.
- Scene-change ambient notices + retroactive lookback (16-keyframe ring, attributed).
- Per-call spend cap via maxVisionPerMinute to bound cost.
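One way to picture the per-minute vision cap is a sliding-window counter. The sketch below is illustrative only: the class name VisionBudget and its tryConsume method are hypothetical, and only the maxVisionPerMinute setting name comes from the list above. How the plugin actually enforces the cap is not documented here.

```typescript
// Hypothetical sliding-window limiter; not the plugin's actual implementation.
class VisionBudget {
  private readonly stamps: number[] = [];
  private readonly maxVisionPerMinute: number;

  constructor(maxVisionPerMinute: number) {
    this.maxVisionPerMinute = maxVisionPerMinute;
  }

  // Returns true and records the request if the budget allows another
  // vision call within the trailing 60 seconds; otherwise returns false.
  tryConsume(nowMs: number): boolean {
    const windowStart = nowMs - 60_000;
    // Drop timestamps that have aged out of the one-minute window.
    for (;;) {
      const oldest = this.stamps[0];
      if (oldest === undefined || oldest > windowStart) break;
      this.stamps.shift();
    }
    if (this.stamps.length >= this.maxVisionPerMinute) return false;
    this.stamps.push(nowMs);
    return true;
  }
}
```

A caller would check tryConsume(Date.now()) before each vision request and skip the request when it returns false.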
Avatar (CVI rendering cues)
- Expression cues - neutral / happy / sad / surprised + thinking.
- Viseme speech.marks lip-sync.
- show_to_caller → fullscreen or picture-in-picture image, captions, paced slideshow.
Group & meeting
- Per-participant attribution; speaks only when addressed (2+ humans → wake phrase + follow-up window; 1:1 always responds). Race-free on realtime (auto-response off in meetings).
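The addressing rule above (1:1 always responds; with two or more humans, respond only on a wake phrase or inside a follow-up window) can be sketched as a small decision function. Everything named here is an assumption made for illustration: the TurnContext shape, the wake phrase, and the 8-second follow-up window are not taken from the product.

```typescript
// Hypothetical inputs for one incoming utterance.
interface TurnContext {
  humanCount: number;               // humans currently in the call
  transcript: string;               // the utterance just heard
  lastBotReplyAtMs: number | null;  // when the bot last spoke, if ever
  nowMs: number;                    // current time
}

const WAKE_PHRASE = "hey assistant";  // assumed wake phrase
const FOLLOW_UP_WINDOW_MS = 8_000;    // assumed follow-up window

function shouldRespond(ctx: TurnContext): boolean {
  // 1:1 calls always get a response.
  if (ctx.humanCount <= 1) return true;
  // In a group, an explicit wake phrase addresses the bot.
  if (ctx.transcript.toLowerCase().includes(WAKE_PHRASE)) return true;
  // Otherwise respond only shortly after the bot's own last reply.
  return (
    ctx.lastBotReplyAtMs !== null &&
    ctx.nowMs - ctx.lastBotReplyAtMs <= FOLLOW_UP_WINDOW_MS
  );
}
```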
Outbound - “call me back”
- call_me_back places an outbound call via StandIn’s HMAC-signed, SSRF-guarded endpoint; greets on answer; correlates the pending result with a TTL. Voicemail fallback is handled by StandIn.
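Correlating a pending result with a TTL could look roughly like the store below. This is a sketch under stated assumptions: the PendingCallbacks class, the correlationId key, and the TTL value are hypothetical and say nothing about how StandIn or the plugin store this state.

```typescript
// Hypothetical TTL-bound store for results awaiting an outbound call.
class PendingCallbacks<T> {
  private readonly entries = new Map<string, { value: T; expiresAtMs: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Remember a result under a correlation id until the TTL elapses.
  put(correlationId: string, value: T, nowMs: number): void {
    this.entries.set(correlationId, { value, expiresAtMs: nowMs + this.ttlMs });
  }

  // Returns and removes the pending value, or undefined if it is
  // missing or has expired.
  take(correlationId: string, nowMs: number): T | undefined {
    const entry = this.entries.get(correlationId);
    if (entry === undefined) return undefined;
    this.entries.delete(correlationId);
    return nowMs < entry.expiresAtMs ? entry.value : undefined;
  }
}
```

When the callee answers, the bridge would call take with the id it attached to the outbound request and greet with the stored result if one is still valid.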
Chat & governance
- “Ask about this” message action · voice-message transcription (opt-in) · audit-log channel (opt-in, loop-guarded) · DLP outbound redaction (opt-in) on text, adaptive cards, and captions.
These live in the Teams messaging adapter, not the voice bridge - enable your runtime’s Teams
chat channel/plugin alongside the voice plugin to get them.
Meeting productivity
- End-of-meeting recap (opt-in) + on-demand post_meeting_minutes, posted to the Teams chat with per-speaker attribution. With SharePoint configured, minutes attach as a Word .docx file card.
Security
HMAC bridge
Every StandIn connection is HMAC-SHA256 signed with a replay guard and a ±60 s window. The
sharedSecret must byte-match the secret set in StandIn.
Caller allowlist
Restrict who the bot answers by AAD object id - closed by default when an allowlist is set.
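The allowlist check can be pictured as a single predicate. The function name and parameters below are hypothetical, and the behaviour when no allowlist is configured (accept every caller) is an assumption read from "closed by default when an allowlist is set", not a documented guarantee.

```typescript
// Hypothetical allowlist predicate keyed on AAD object id.
function isCallerAllowed(
  allowlist: readonly string[] | undefined,
  aadObjectId: string | undefined,
): boolean {
  // Assumption: with no allowlist configured, every caller is accepted.
  if (allowlist === undefined) return true;
  // With an allowlist set, callers without a known id are rejected.
  if (aadObjectId === undefined) return false;
  // Object ids are GUIDs, so compare case-insensitively.
  const id = aadObjectId.toLowerCase();
  return allowlist.some((entry) => entry.toLowerCase() === id);
}
```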
Recording gate
No media-derived data is processed until the call’s recording status has been signalled.
Your own bot identity
Bring your own Azure AD bot - your tenant owns it, with a per-bot HMAC secret never shared across tenants.
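For readers who want a concrete picture of the HMAC bridge check described under Security, here is a minimal sketch using Node's built-in crypto module. The message layout (timestamp, nonce, and body joined with dots), the hex encoding, and the in-memory nonce map are assumptions chosen for illustration; StandIn's actual wire format is not specified in this document. Only the HMAC-SHA256 algorithm, the replay guard, the ±60 s window, and the sharedSecret name come from the text.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const WINDOW_MS = 60_000;                      // the ±60 s window
const seenNonces = new Map<string, number>();  // nonce -> timestamp (replay guard)

// Assumed message layout: "<timestampMs>.<nonce>.<body>".
function sign(sharedSecret: string, timestampMs: number, nonce: string, body: string): string {
  return createHmac("sha256", sharedSecret)
    .update(`${timestampMs}.${nonce}.${body}`)
    .digest("hex");
}

function verify(
  sharedSecret: string,
  timestampMs: number,
  nonce: string,
  body: string,
  signatureHex: string,
  nowMs: number,
): boolean {
  // Reject timestamps outside the ±60 s window.
  if (Math.abs(nowMs - timestampMs) > WINDOW_MS) return false;
  // Forget nonces too old to pass the window check anyway.
  for (const [seen, ts] of seenNonces) {
    if (nowMs - ts > WINDOW_MS) seenNonces.delete(seen);
  }
  // Reject a nonce that has already been accepted (replay).
  if (seenNonces.has(nonce)) return false;
  const expected = Buffer.from(sign(sharedSecret, timestampMs, nonce, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // Compare in constant time; lengths must match first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;
  seenNonces.set(nonce, timestampMs);
  return true;
}
```

Because the secret is fed to the HMAC as raw bytes, any difference between the two sides (trailing whitespace, different encoding) yields a different signature, which is why the sharedSecret has to byte-match.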