1. System Overview

OMANI-Therapist-Voice is a real-time, voice-first mental health chatbot for Omani Arabic speakers. The system provides culturally sensitive, therapeutic-grade conversations using speech processing and dual-model AI validation, with a focus on low latency, safety, and clinical effectiveness. It is built on an industry-standard tech stack, with LangChain as the LLM framework.


2. High-Level Architecture

Components:

Data Flow:

  1. User speaks into the frontend interface.
  2. Audio is sent to the backend for transcription.
  3. Backend converts the audio to WAV and transcribes it using Azure STT (Omani Arabic + English).
  4. Transcription and chat history are sent to the LLM microservice (LangChain) for response generation and safety validation.
  5. The LLM microservice:
  6. Backend synthesizes the reply as Omani Arabic speech using Azure TTS.
  7. Audio is streamed back to the frontend for playback.
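
The data flow above can be sketched as a single request handler on the backend. This is a minimal illustration, not the project's actual code: the class `VoicePipeline`, its field names, and the `handle_turn` method are all hypothetical. The external services (audio conversion, Azure STT, the LangChain microservice, Azure TTS) are injected as plain callables and replaced here by stubs so the example runs on its own. Streaming the audio back to the frontend (step 7) is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type alias: one chat-history entry is (role, text).
Turn = Tuple[str, str]


@dataclass
class VoicePipeline:
    """Illustrative orchestration of steps 3-6; every name here is an assumption."""

    to_wav: Callable[[bytes], bytes]                  # audio conversion (step 3)
    transcribe: Callable[[bytes], str]                # stands in for Azure STT (step 3)
    generate_reply: Callable[[str, List[Turn]], str]  # stands in for the LLM microservice (steps 4-5)
    synthesize: Callable[[str], bytes]                # stands in for Azure TTS (step 6)
    history: List[Turn] = field(default_factory=list)

    def handle_turn(self, raw_audio: bytes) -> bytes:
        wav = self.to_wav(raw_audio)
        text = self.transcribe(wav)
        # The microservice receives the transcription plus the prior history.
        reply = self.generate_reply(text, list(self.history))
        # Record both sides of the exchange for the next turn.
        self.history.append(("user", text))
        self.history.append(("assistant", reply))
        return self.synthesize(reply)


# Stub services so the sketch is runnable without any cloud credentials.
pipeline = VoicePipeline(
    to_wav=lambda audio: b"WAV" + audio,
    transcribe=lambda wav: f"transcript of {len(wav)} bytes",
    generate_reply=lambda text, history: f"reply to: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
audio_out = pipeline.handle_turn(b"\x00\x01")
```

Passing the services in as callables keeps the orchestration logic testable in isolation; a real deployment would swap the stubs for clients of the actual speech and LLM services.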

3. Detailed Component Design