Both plugins implement the same feature set over the shared CVI bridge.

Voice

  • Realtime (speech-to-speech) and streaming (STT → agent → TTS), both with barge-in.
  • Bilingual Arabic / English - wake-phrase + filler stripping, whole-utterance verbal interrupts.
  • DTMF / IVR keypad input.
  • Roster greeting by first name + a “thinking” expression while it works.
  • Realtime delegation - consult (inline) and agent_task (background) hand work to your agent.

Vision

  • look_at_screen over camera + screen-share frames.
  • Continuous vision - the latest changed frame is pushed (~6 s) with no forced response.
  • Scene-change ambient notices + retroactive lookback (16-keyframe ring, attributed).
  • Per-call spend cap via maxVisionPerMinute to bound cost.
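
As an illustration of how a per-minute cap such as maxVisionPerMinute can bound cost, here is a minimal sketch of a sliding-window limiter. The class name VisionRateCap, the allow method, and the sliding-window policy are assumptions made for this example; the plugins may enforce the cap differently.

```python
import time
from collections import deque


class VisionRateCap:
    """Hypothetical sliding-window cap, one instance per call."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock          # injectable so the cap can be tested
        self._stamps = deque()      # times of vision requests in the last 60 s

    def allow(self):
        """Return True and record the request if it fits under the cap."""
        now = self.clock()
        # Drop requests that have aged out of the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_per_minute:
            return False
        self._stamps.append(now)
        return True
```

A caller would check allow() before each vision request and skip the request when it returns False.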

Avatar (CVI rendering cues)

  • Expression cues - neutral / happy / sad / surprised + thinking.
  • Viseme speech.marks lip-sync.
  • show_to_caller → fullscreen or picture-in-picture image, captions, paced slideshow.

Group & meeting

  • Per-participant attribution; speaks only when addressed (2+ humans → wake phrase + follow-up window; 1:1 always responds). Race-free on realtime (auto-response off in meetings).
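
The addressing rule above can be sketched as a single decision function. The function name, the wake phrase "hey assistant", and the 8-second follow-up window are placeholder assumptions for this example, not the plugins' actual values.

```python
def should_respond(human_count, utterance, now, last_addressed_at,
                   wake_phrase="hey assistant", follow_up_window_s=8.0):
    """Decide whether the bot should answer an utterance.

    human_count       -- humans currently in the call
    now               -- current time in seconds
    last_addressed_at -- time the bot was last addressed, or None
    wake_phrase and follow_up_window_s are hypothetical defaults.
    """
    # 1:1 calls: always respond.
    if human_count <= 1:
        return True
    # Group calls: respond when the wake phrase is spoken...
    if wake_phrase in utterance.lower():
        return True
    # ...or when the utterance falls inside the follow-up window.
    return (last_addressed_at is not None
            and now - last_addressed_at <= follow_up_window_s)
```

In a group call the caller would set last_addressed_at each time the wake phrase is heard, so follow-up questions within the window do not need the phrase repeated.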

Outbound - “call me back”

  • call_me_back places an outbound call through StandIn’s HMAC-signed, SSRF-guarded endpoint; greets on answer; correlates the pending result under a TTL so stale requests expire. Voicemail fallback is handled by StandIn.
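
To illustrate what correlating a pending result with a TTL can look like, here is a minimal sketch. The PendingCallbacks class, the correlation id, and the 120-second default are assumptions for this example and are not taken from the plugins.

```python
import time


class PendingCallbacks:
    """Hypothetical store that matches answered calls to pending requests."""

    def __init__(self, ttl_s=120.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._pending = {}  # correlation id -> (expiry time, context)

    def register(self, correlation_id, context):
        """Remember a callback request until its TTL runs out."""
        self._pending[correlation_id] = (self.clock() + self.ttl_s, context)

    def resolve(self, correlation_id):
        """Return the stored context once, or None if unknown or expired."""
        entry = self._pending.pop(correlation_id, None)
        if entry is None:
            return None
        expires_at, context = entry
        if self.clock() > expires_at:
            return None
        return context
```

Popping the entry on resolve means each pending request can be matched at most once, and an answer that arrives after the TTL is ignored.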

Chat & governance

  • “Ask about this” message action · voice-message transcription (opt-in) · audit-log channel (opt-in, loop-guarded) · DLP outbound redaction (opt-in) on text, adaptive cards, and captions.
These live in the Teams messaging adapter, not the voice bridge - enable your runtime’s Teams chat channel/plugin alongside the voice plugin to get them.

Meeting productivity

  • End-of-meeting recap (opt-in) + on-demand post_meeting_minutes, posted to the Teams chat with per-speaker attribution. With SharePoint configured, minutes attach as a Word .docx file card.

Security

HMAC bridge

Every StandIn connection is signed with HMAC-SHA256 and checked against a replay guard within a ±60 s time window. The sharedSecret must match the secret set in StandIn byte for byte.
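
The general shape of such a check can be sketched as follows. The message layout (timestamp, nonce, body joined with dots), the nonce-based replay guard, and the function names are assumptions for this example; StandIn's actual wire format is not described here.

```python
import hashlib
import hmac
import time

WINDOW_S = 60  # accept timestamps within +/-60 s of the local clock


def sign(shared_secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    """HMAC-SHA256 over a hypothetical 'timestamp.nonce.body' layout."""
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(shared_secret, message, hashlib.sha256).hexdigest()


def verify(shared_secret, timestamp, nonce, body, signature,
           seen_nonces, now=None):
    """Return True only for a fresh, correctly signed, unseen request."""
    now = time.time() if now is None else now
    # Reject requests outside the time window.
    if abs(now - timestamp) > WINDOW_S:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    expected = sign(shared_secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False
    # Replay guard: each nonce is accepted once.
    if nonce in seen_nonces:
        return False
    seen_nonces.add(nonce)
    return True
```

Because the secret is used as raw bytes, a trailing newline or a different encoding on either side changes the signature, which is why the two secrets must match exactly. In a long-running process, nonces older than the window could be evicted from seen_nonces, since the timestamp check already rejects them.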

Caller allowlist

Restrict who the bot answers by AAD object ID. Once an allowlist is set, the bot is closed by default: callers not on the list are not answered.
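
A closed-by-default check can be sketched in a few lines. The function name is hypothetical, and two behaviors here are assumptions rather than documented facts: that an unset allowlist leaves the bot open, and that object IDs are compared case-insensitively.

```python
def is_caller_allowed(caller_object_id, allowlist):
    """Return True if the caller may be answered.

    allowlist -- iterable of AAD object IDs, or None/empty when not configured
    """
    # Assumption: with no allowlist configured, every caller is answered.
    if not allowlist:
        return True
    # With an allowlist set, a caller without an identity is rejected.
    if caller_object_id is None:
        return False
    # Assumption: GUID-style IDs are compared case-insensitively.
    normalized = {entry.strip().lower() for entry in allowlist}
    return caller_object_id.strip().lower() in normalized
```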

Recording gate

No media-derived data is processed until the call’s recording status has been signalled.

Your own bot identity

Bring your own Azure AD bot - your tenant owns it, with a per-bot HMAC secret never shared across tenants.