Features - Teams Voice Plugin (StandIn)

Both plugins implement the same feature set over the shared CVI bridge.

Voice

Realtime (speech-to-speech) and streaming (STT → agent → TTS), both with barge-in.
Bilingual Arabic / English - wake-phrase + filler stripping, whole-utterance verbal interrupts.
DTMF / IVR keypad input.
Roster greeting by first name + a “thinking” expression while it works.
Realtime delegation - consult (inline) and agent_task (background) hand work to your agent.

Vision

look_at_screen over camera + screen-share frames.
Continuous vision - the latest changed frame is pushed (~6 s) with no forced response.
Scene-change ambient notices + retroactive lookback (16-keyframe ring, attributed).
Per-call spend cap via maxVisionPerMinute to bound cost.
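A per-minute cap such as maxVisionPerMinute could be enforced with a sliding-window counter. The sketch below is only an illustration of that idea, not the plugin's implementation: the class name VisionBudget, the try_acquire method, and the injectable clock are all hypothetical.

```python
import time
from collections import deque


class VisionBudget:
    """Hypothetical sliding-window limiter: at most max_per_minute grants per 60 s."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # timestamps of granted vision requests

    def try_acquire(self):
        now = self.clock()
        # Drop grants that have aged out of the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_per_minute:
            return False            # over budget: skip this vision request
        self._stamps.append(now)
        return True
```

A caller would check try_acquire() before each vision request and skip the request when it returns False, which keeps the per-call cost bounded regardless of how often frames change.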

Avatar (CVI rendering cues)

Expression cues - neutral / happy / sad / surprised + thinking.
Viseme speech.marks lip-sync.
show_to_caller → fullscreen or picture-in-picture image, captions, paced slideshow.

Group & meeting

Per-participant attribution; speaks only when addressed (with 2+ humans it waits for the wake phrase and then honors a follow-up window; in a 1:1 call it always responds). In realtime mode this is race-free because auto-response is turned off in meetings.
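The addressing rule above can be read as a small decision function. This is a sketch under stated assumptions: the function name should_respond, its parameters, and the 10-second follow-up window are hypothetical, and the wake-phrase match is a plain substring check rather than whatever matching the plugin actually performs.

```python
def should_respond(human_count, utterance, wake_phrases,
                   last_addressed_at=None, now=0.0, follow_up_window_s=10.0):
    """Decide whether the bot should answer an utterance (hypothetical helper)."""
    # 1:1 call: always respond.
    if human_count <= 1:
        return True
    # Meeting with 2+ humans: respond when a wake phrase is present...
    text = utterance.lower()
    if any(phrase.lower() in text for phrase in wake_phrases):
        return True
    # ...or when the bot was addressed recently (follow-up window, assumed length).
    if last_addressed_at is not None:
        return 0.0 <= now - last_addressed_at <= follow_up_window_s
    return False
```

For example, should_respond(2, "and then?", ["hey assistant"], last_addressed_at=5.0, now=10.0) returns True because the utterance falls inside the follow-up window, while the same call with now=30.0 returns False.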

Outbound - “call me back”

call_me_back places an outbound call via StandIn’s HMAC-signed, SSRF-guarded endpoint, greets the caller on answer, and correlates the pending result with a TTL. Voicemail fallback is handled by StandIn.
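Correlating a pending result with a TTL might look like the following sketch. Everything here is an assumption made for illustration: the class PendingCallbacks, its register and resolve methods, the correlation id, and the 120-second default TTL are hypothetical and are not taken from the plugin.

```python
import time


class PendingCallbacks:
    """Hypothetical store that matches an answered call to its pending request."""

    def __init__(self, ttl_s=120.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._pending = {}  # correlation_id -> (expires_at, context)

    def register(self, correlation_id, context):
        # Remember what the callback is about until the TTL elapses.
        self._pending[correlation_id] = (self.clock() + self.ttl_s, context)

    def resolve(self, correlation_id):
        # Return the stored context once, or None if unknown or expired.
        entry = self._pending.pop(correlation_id, None)
        if entry is None:
            return None
        expires_at, context = entry
        return context if self.clock() < expires_at else None
```

Popping the entry on resolve means a result is delivered at most once, and an expired entry is discarded rather than delivered late.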

Chat & governance

“Ask about this” message action · voice-message transcription (opt-in) · audit-log channel (opt-in, loop-guarded) · DLP outbound redaction (opt-in) on text, adaptive cards, and captions.

These live in the Teams messaging adapter, not the voice bridge - enable your runtime’s Teams chat channel/plugin alongside the voice plugin to get them.

Meeting productivity

End-of-meeting recap (opt-in) + on-demand post_meeting_minutes, posted to the Teams chat with per-speaker attribution. With SharePoint configured, minutes attach as a Word .docx file card.

Security

HMAC bridge

Every StandIn connection is signed with HMAC-SHA256 and checked against a replay guard and a ±60 s timestamp window. The sharedSecret must match the secret set in StandIn byte for byte.
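To make the three checks concrete, here is a minimal sketch of HMAC-SHA256 verification with a ±60 s window and a nonce-based replay guard. The message layout (timestamp, nonce, body joined with dots), the use of a nonce, and the names sign and Verifier are assumptions for illustration; the actual wire format used by StandIn is not described here.

```python
import hashlib
import hmac
import time


def sign(shared_secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    # Assumed message layout: "<timestamp>.<nonce>." followed by the raw body.
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(shared_secret, message, hashlib.sha256).hexdigest()


class Verifier:
    """Hypothetical verifier: signature check, time window, replay guard."""

    def __init__(self, shared_secret: bytes, window_s: int = 60, clock=time.time):
        self.shared_secret = shared_secret
        self.window_s = window_s
        self.clock = clock
        self._seen = {}  # nonce -> timestamp of accepted requests

    def verify(self, timestamp: int, nonce: str, body: bytes, signature: str) -> bool:
        now = self.clock()
        # 1. Reject anything outside the allowed clock skew.
        if abs(now - timestamp) > self.window_s:
            return False
        # 2. Constant-time comparison of the expected and presented signatures.
        expected = sign(self.shared_secret, timestamp, nonce, body)
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return False
        # 3. Replay guard: forget nonces that can no longer pass step 1,
        #    then refuse any nonce that was already accepted.
        self._seen = {n: ts for n, ts in self._seen.items()
                      if abs(now - ts) <= self.window_s}
        if nonce in self._seen:
            return False
        self._seen[nonce] = timestamp
        return True
```

Because the secret is used as raw bytes, any difference in encoding or trailing whitespace between the two sides changes the digest, which is why the configured values have to agree exactly.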

Caller allowlist

Restrict who the bot answers by AAD object ID - once an allowlist is set, it is closed by default and callers not on the list are not answered.
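The closed-by-default behavior can be sketched as a small check. The function name is_caller_allowed is hypothetical, and two details are assumptions rather than documented behavior: that an unset allowlist means everyone is answered, and that object IDs are compared case-insensitively.

```python
def is_caller_allowed(caller_object_id, allowlist):
    """Hypothetical allowlist check keyed on the caller's AAD object ID."""
    # Assumption: with no allowlist configured, the bot answers everyone.
    if not allowlist:
        return True
    # With an allowlist set, a caller without a known identity is rejected.
    if not caller_object_id:
        return False
    # Assumption: GUID-style IDs are compared without regard to case.
    allowed = {entry.lower() for entry in allowlist}
    return caller_object_id.lower() in allowed
```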

Recording gate

No media-derived data is processed until the call’s recording status has been signalled.

Your own bot identity

Bring your own Azure AD bot - your tenant owns it, with a per-bot HMAC secret never shared across tenants.
