Tiago Fortunato

WhatsApp AI Intake

The WhatsApp AI intake agent processes inbound messages, uses Groq for tool-calling, and facilitates appointment creation.

The WhatsApp AI Intake agent lets clients talk to a professional's scheduling assistant and book appointments directly through WhatsApp. It interprets natural language requests, checks the professional's availability, and creates bookings using a large language model (LLM) hosted by Groq together with a small set of specialized tools. By automating the booking conversation, it reduces manual scheduling work for professionals.

Overview

At its core, the WhatsApp AI Intake agent operates by receiving inbound messages via a dedicated webhook, processing them through an LLM, and executing specific actions based on the client's intent. The entire process follows a two-pass LLM pattern, similar to the general /api/ai/chat route:

  1. First Pass: The LLM analyzes the incoming message and the conversation history to determine if a specific tool needs to be invoked to fulfill the user's request.
  2. Second Pass: If a tool was called, its results are fed back to the LLM, which then generates a natural language response to the client, incorporating the tool's output.
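The two passes can be sketched as a small function that is independent of any particular LLM client. This is a minimal illustration, not the code in src/lib/ai-intake.ts: the types ChatMessage, LlmReply, ChatFn and the function runTwoPass are hypothetical names, and the chat argument stands in for the real Groq call.

```typescript
// Hypothetical minimal types; the real agent uses the groq-sdk chat types.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
  toolCallId?: string;
}

interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface LlmReply {
  content: string | null;
  toolCall?: ToolCall;
}

// Stand-in for the LLM client: takes the message list, returns one reply.
type ChatFn = (messages: ChatMessage[]) => Promise<LlmReply>;
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

async function runTwoPass(
  chat: ChatFn,
  tools: Map<string, ToolFn>,
  messages: ChatMessage[],
): Promise<string> {
  // First pass: the model either answers directly or requests a tool.
  const first = await chat(messages);
  if (!first.toolCall) return first.content ?? "";

  const tool = tools.get(first.toolCall.name);
  const result = tool
    ? await tool(first.toolCall.args)
    : { error: `unknown tool: ${first.toolCall.name}` };

  // Second pass: feed the tool result back so the model can phrase the reply.
  const second = await chat([
    ...messages,
    { role: "assistant", content: first.content ?? "" },
    {
      role: "tool",
      content: JSON.stringify(result),
      toolCallId: first.toolCall.id,
    },
  ]);
  return second.content ?? "";
}
```

When the first reply contains no tool call, the function returns after a single LLM call, so the second pass only costs latency when a tool was actually used.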

This architecture relies heavily on several database tables, including professionals for professional details, availability for scheduling rules, appointments for managing bookings, clients for client information, and notifications for alerting professionals. Conversation state and transactional context are managed using Redis, ensuring continuity across interactions. The system also integrates with WhatsApp's messaging capabilities for sending templated responses, such as msgBookingRequest, and uses email for professional notifications.

WhatsApp Webhook

The entry point for all inbound WhatsApp messages is the /api/whatsapp/webhook API route, which handles POST requests. The route's implementation is not covered here; its role is to receive messages from the WhatsApp platform and forward them to the AI intake logic. The webhook is subject to the whatsapp rate limit of 10 requests per minute, which limits abuse and protects the system under high load. For each message, the webhook extracts the message content, the sender's phone number, and the professional's identifier, then passes them to the handleIncomingMessage function in src/lib/ai-intake.ts.
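To make the 10 requests per minute figure concrete, here is a minimal fixed-window limiter. It is only a sketch: the source does not say how the whatsapp limit is implemented, what it is keyed on, or where its counters live, so the class FixedWindowLimiter, the in-memory Map, and the choice of key are all assumptions.

```typescript
interface WindowState {
  windowStart: number; // epoch milliseconds when the current window opened
  count: number; // requests seen in the current window
}

// In-memory fixed-window limiter (hypothetical; a shared store would be
// needed if the route runs on more than one instance).
class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.windows.get(key);
    if (!state || now - state.windowStart >= this.windowMs) {
      // No window yet, or the previous one expired: open a new window.
      this.windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}

// 10 requests per 60 seconds, matching the documented whatsapp limit.
const whatsappLimiter = new FixedWindowLimiter(10, 60000);
```

A webhook handler would call whatsappLimiter.allow(key) before doing any work and answer with HTTP 429 when it returns false.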

AI Intake Agent (src/lib/ai-intake.ts)

The src/lib/ai-intake.ts file encapsulates the core logic for the AI WhatsApp Intake Agent. It orchestrates the interaction between the incoming message, the LLM, and the backend services.

handleIncomingMessage

This asynchronous function is the central orchestrator. It takes the client's message, sender's phone, and the target professionalId as input.

  1. Professional and Conversation Loading: It first retrieves the professional details from the professionals table and loads the ongoing conversation state from Redis using getConversation. If no conversation exists, it attempts to identify the client from the clients table based on their phone number and initializes a new conversation state.
  2. Message History and Context: The user's message is appended to the conversation history. A systemPrompt is constructed using buildSystemPrompt, providing the LLM with its persona and rules. Crucially, if the inbound message is a reply to a recent transactional notification (e.g., a reminder or confirmation), the outboundContext (stored in Redis via setOutboundContext) is used to inject specific details about the referenced appointmentId into the LLM's system messages. This helps the LLM understand the context of replies like "I need to reschedule."
  3. Groq Interaction: The function then interacts with the Groq LLM using the groq-sdk library, specifically the llama-3.3-70b-versatile model.
    • First Call: The initial call to Groq includes the systemPrompt, client context, conversation history, and the TOOLS definitions. The tool_choice: "auto" setting allows the LLM to decide whether to invoke one of the defined functions.
    • Tool Execution: If the LLM decides to call a tool, the handleIncomingMessage function parses the tool call, executes the corresponding local function (e.g., getAvailableSlots, bookAppointment), and captures the result.
    • Second Call: The tool's output is then added back to the message history as a tool message, and a second call is made to the Groq LLM. This time, the LLM generates the final, human-readable response to the client, incorporating the results of the tool execution.
  4. Conversation Saving: After generating a reply, the updated conversation state, including the LLM's response, is saved back to Redis using saveConversation, ensuring continuity for future interactions.
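The tool execution step can be illustrated with a small dispatcher. In OpenAI-compatible chat APIs, which Groq follows, each tool call carries its arguments as a JSON-encoded string, so the string has to be parsed before a local function can run. The names RawToolCall, ToolHandler, parseToolArgs, dispatchToolCall and the error codes are hypothetical, and the real handleIncomingMessage may handle failures differently.

```typescript
// One entry of a `tool_calls` array: `arguments` is a JSON string.
interface RawToolCall {
  id: string;
  function: { name: string; arguments: string };
}

// The message appended to the history before the second LLM call.
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Returns the parsed argument object, or null if it is not a JSON object.
function parseToolArgs(raw: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw === "" ? "{}" : raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}

async function dispatchToolCall(
  call: RawToolCall,
  handlers: Map<string, ToolHandler>,
): Promise<ToolMessage> {
  const reply = (payload: unknown): ToolMessage => ({
    role: "tool",
    tool_call_id: call.id,
    content: JSON.stringify(payload),
  });

  const handler = handlers.get(call.function.name);
  if (!handler) return reply({ error: "UNKNOWN_TOOL" });

  const args = parseToolArgs(call.function.arguments);
  if (args === null) return reply({ error: "INVALID_ARGUMENTS" });

  try {
    return reply(await handler(args));
  } catch (err) {
    // Report the failure to the model instead of aborting the conversation.
    return reply({ error: err instanceof Error ? err.message : "TOOL_FAILED" });
  }
}
```

Returning errors as tool messages, rather than throwing, gives the second LLM call a chance to explain the problem to the client in natural language.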

Timezone Helpers

Brazil abolished DST in 2019, so São Paulo is consistently UTC-3. The saoPauloDate, formatSaoPauloTime, and formatSaoPauloDate functions ensure that all date and time operations, especially for appointment scheduling, are handled in the America/Sao_Paulo timezone rather than in the server's timezone or UTC. This avoids off-by-one-day errors: a 22:30 appointment in São Paulo falls at 01:30 UTC on the following calendar day.
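A possible shape for these helpers is sketched below. Only the three function names come from the source; their signatures, the "HH:MM" and "YYYY-MM-DD" string formats, and the internal saoPauloParts helper are assumptions made for illustration.

```typescript
const SAO_PAULO_TZ = "America/Sao_Paulo";

// Builds the UTC instant for a São Paulo wall-clock date ("YYYY-MM-DD")
// and time ("HH:MM"). Relies on the fixed UTC-3 offset, so it is only
// valid for dates after DST was abolished.
function saoPauloDate(dateStr: string, time: string): Date {
  return new Date(`${dateStr}T${time}:00-03:00`);
}

// Splits an instant into São Paulo calendar fields via Intl.
function saoPauloParts(date: Date): Record<string, string> {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: SAO_PAULO_TZ,
    hourCycle: "h23",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(date);
  const out: Record<string, string> = {};
  for (const part of parts) out[part.type] = part.value;
  return out;
}

// "HH:MM" in São Paulo local time.
function formatSaoPauloTime(date: Date): string {
  const p = saoPauloParts(date);
  return `${p["hour"]}:${p["minute"]}`;
}

// "YYYY-MM-DD" in São Paulo local time.
function formatSaoPauloDate(date: Date): string {
  const p = saoPauloParts(date);
  return `${p["year"]}-${p["month"]}-${p["day"]}`;
}
```

Formatting through Intl with an explicit timeZone keeps the result independent of the timezone the server process happens to run in.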

Tool Definitions (TOOLS)

The TOOLS array defines the functions that the Groq LLM can invoke. Each tool has a name, a description (in Portuguese, guiding the LLM on when to use it), and parameters schema. The agent defines three tools:

  • get_available_slots: Used to retrieve available appointment times for a specific date. The LLM is instructed to always use this before suggesting times.
  • book_appointment: Creates a new appointment. The LLM is instructed to use this only after confirming the date, time, and client's name.
  • get_professional_info: Fetches details about the professional, such as their name, profession, session duration, price, and general availability schedule.
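For orientation, a TOOLS array in the OpenAI-compatible function-calling format accepted by Groq could look like the sketch below. The three tool names come from the source; the Portuguese descriptions, the parameter names (date, time, client_name), and the ToolDefinition interface are illustrative assumptions, not the actual definitions.

```typescript
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
}

// Descriptions and parameter names below are illustrative assumptions.
const TOOLS: ToolDefinition[] = [
  {
    type: "function",
    function: {
      name: "get_available_slots",
      description:
        "Retorna os horários disponíveis em uma data. Use sempre antes de sugerir horários.",
      parameters: {
        type: "object",
        properties: {
          date: { type: "string", description: "Data no formato YYYY-MM-DD" },
        },
        required: ["date"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "book_appointment",
      description:
        "Cria um agendamento. Use somente após confirmar data, horário e nome do cliente.",
      parameters: {
        type: "object",
        properties: {
          date: { type: "string", description: "Data no formato YYYY-MM-DD" },
          time: { type: "string", description: "Horário no formato HH:MM" },
          client_name: { type: "string", description: "Nome do cliente" },
        },
        required: ["date", "time", "client_name"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "get_professional_info",
      description:
        "Retorna nome, profissão, duração da sessão, preço e horários de atendimento.",
      parameters: { type: "object", properties: {}, required: [] },
    },
  },
];
```

Note that the professional's identifier and the client's phone number are not tool parameters in this sketch: they are already known to the server, so the LLM never needs to supply them.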

Tool Implementations

Each tool definition corresponds to an asynchronous function that interacts with the database or other services:

  • getAvailableSlots(professionalId, dateStr): This function queries the professionals table for session duration and the availability table for the professional's weekly schedule. It then fetches existing appointments for the given date, excluding rejected or cancelled ones. It iterates through the professional's availability rules for the specified day, generating potential slots based on sessionDuration, and filters out any slots that conflict with existing appointments or are in the past.
  • bookAppointment(professionalId, dateStr, time, clientName, clientPhone): This is a critical function that handles the creation of a new appointment. It performs an atomic operation within a Drizzle ORM transaction with serializable isolation level. This isolation level is chosen to prevent race conditions where two clients might attempt to book the same slot simultaneously.
    1. It first checks for any conflicting appointments within the desired time slot. If a conflict is detected, it throws a SLOT_TAKEN error, which is caught and translated into a user-friendly message.
    2. It then either finds an existing client by phone number or creates a new entry in the clients table.
    3. Finally, it inserts the new appointment into the appointments table, setting its status to confirmed or pending_confirmation based on the professional's autoConfirm setting. After a successful booking, it triggers several notifications: an in-app notification for the professional, a WhatsApp message to the professional using msgBookingRequest, and an email to the professional if an email address is available.
  • getProfessionalInfo(professionalId): This function retrieves basic information about the professional from the professionals table and their availability rules, formatting the schedule for the LLM to present to the client.
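The slot generation described for getAvailableSlots can be sketched as a pure function with the database access removed. The types AvailabilityRule and BookedRange, the minutes-since-midnight representation, and the function generateSlots are hypothetical; the real function reads these inputs from the professionals, availability, and appointments tables.

```typescript
interface AvailabilityRule {
  startTime: string; // "HH:MM"
  endTime: string; // "HH:MM"
}

interface BookedRange {
  startMin: number; // minutes since midnight
  endMin: number;
}

function toMinutes(hhmm: string): number {
  const [h = 0, m = 0] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function toHhMm(total: number): string {
  const h = String(Math.floor(total / 60)).padStart(2, "0");
  const m = String(total % 60).padStart(2, "0");
  return `${h}:${m}`;
}

// nowMin: current time in minutes since midnight when the requested date
// is today, or -1 for a future date so that no slot counts as past.
function generateSlots(
  rules: AvailabilityRule[],
  sessionMinutes: number,
  booked: BookedRange[],
  nowMin: number,
): string[] {
  if (sessionMinutes <= 0) return [];
  const slots: string[] = [];
  for (const rule of rules) {
    const end = toMinutes(rule.endTime);
    for (
      let start = toMinutes(rule.startTime);
      start + sessionMinutes <= end;
      start += sessionMinutes
    ) {
      const slotEnd = start + sessionMinutes;
      // Two half-open ranges overlap when each starts before the other ends.
      const overlaps = booked.some((b) => start < b.endMin && b.startMin < slotEnd);
      const inPast = start <= nowMin;
      if (!overlaps && !inPast) slots.push(toHhMm(start));
    }
  }
  return slots;
}
```

For a 09:00 to 12:00 rule with 60-minute sessions and an existing 10:00 booking, this yields 09:00 and 11:00 for a future date.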

System Prompt Builder (buildSystemPrompt)

The buildSystemPrompt function constructs the initial instructions for the LLM. It defines the AI's persona as the scheduling assistant for a specific professional, sets the primary goal of helping clients schedule appointments, and outlines crucial rules: always respond in Brazilian Portuguese, never invent times, ask for missing information (date, client name), confirm appointments clearly, and politely redirect out-of-scope questions. It also explicitly forbids the LLM from identifying itself as an AI.

Conversation Management (src/lib/conversation.ts)

The src/lib/conversation.ts file manages the state of ongoing conversations using Redis.

  • ConversationState: This interface defines the structure for storing conversation data, including professionalId, clientPhone, clientName, a history of messages, and a context object to track pending booking details (pendingDate, pendingSlot, appointmentCreated).
  • Redis Storage: Functions like getConversation and saveConversation interact with Redis to retrieve and persist ConversationState objects. Conversations are stored with a TTL (Time To Live) of 30 minutes, ensuring that inactive conversations are automatically cleared. A MAX_MESSAGES limit of 20 messages is enforced to keep the conversation history within the LLM's context window and manage memory usage. Phone numbers are normalized using stripDigits to create consistent Redis keys (conv:{professionalId}:{phone}).
  • Client Activity Tracking: The markClientActive and isClientActive functions manage a client_active:{phone} key in Redis with an ACTIVE_TTL of 15 minutes. This mechanism is used by the cron job (/api/cron/reminders) to check if a client is currently engaged in a conversation with the AI. If a client is active, transactional reminders are deferred to avoid interrupting an ongoing scheduling dialogue.
  • Outbound Context: The setOutboundContext and getOutboundContext functions store information about the last transactional message sent to a client (e.g., a reminder or confirmation). This OutboundContext includes the professionalId and optionally an appointmentId, and has a TTL of 2 hours. When a client replies, the handleIncomingMessage function can retrieve this context, allowing the LLM to understand that the reply might be related to a specific, recent appointment.
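The key format, the message cap, and the TTL can be combined in a short sketch. The ConversationState fields, the conv:{professionalId}:{phone} key, MAX_MESSAGES, and the 30-minute TTL come from the source; the exact field types, the behavior of stripDigits (assumed here to keep only digits), and the helpers trimHistory and prepareForSave are assumptions.

```typescript
const MAX_MESSAGES = 20;
const CONVERSATION_TTL_SECONDS = 30 * 60;

interface ConversationMessage {
  role: "user" | "assistant";
  content: string;
}

interface ConversationState {
  professionalId: string;
  clientPhone: string;
  clientName?: string;
  history: ConversationMessage[];
  context: {
    pendingDate?: string;
    pendingSlot?: string;
    appointmentCreated?: boolean;
  };
}

// Assumed behavior: keep only digits so "+55 (11) 9..." and "5511 9..."
// map to the same Redis key.
function stripDigits(phone: string): string {
  return phone.replace(/\D/g, "");
}

function conversationKey(professionalId: string, phone: string): string {
  return `conv:${professionalId}:${stripDigits(phone)}`;
}

// Keeps only the most recent `max` entries.
function trimHistory<T>(history: T[], max: number = MAX_MESSAGES): T[] {
  return history.length > max ? history.slice(history.length - max) : history;
}

// Produces what a Redis SET with an expiry would need: key, value, TTL.
function prepareForSave(state: ConversationState): {
  key: string;
  value: string;
  ttlSeconds: number;
} {
  const trimmed: ConversationState = {
    ...state,
    history: trimHistory(state.history),
  };
  return {
    key: conversationKey(state.professionalId, state.clientPhone),
    value: JSON.stringify(trimmed),
    ttlSeconds: CONVERSATION_TTL_SECONDS,
  };
}
```

Because the TTL is written again on every save, a conversation in this sketch expires 30 minutes after its last message rather than 30 minutes after it started.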

Design Decisions

The design of the WhatsApp AI Intake agent reflects several deliberate choices to balance functionality, performance, and user experience:

  • Two-Pass LLM Architecture: The decision to use a two-pass approach for LLM interaction (first for tool selection, second for response generation) provides greater control and reliability. It separates the complex task of intent recognition and tool parameter extraction from the final response generation, leading to more accurate tool use and coherent replies.
  • Transactional Booking with SERIALIZABLE Isolation: The bookAppointment function runs inside a Drizzle transaction with the serializable isolation level, which is critical for data integrity. It prevents race conditions in which concurrent requests attempt to book the same appointment slot, so each slot is assigned at most once. The explicit handling of SLOT_TAKEN and SQLSTATE 40001 (the standard SQLSTATE code for a serialization failure) errors gives the user clear feedback when a slot is no longer available.
  • Redis for Ephemeral Conversation State: Employing Redis for conversation state management (ConversationState) is a pragmatic choice. Redis's in-memory nature offers low-latency access, which is essential for interactive chat applications. The use of TTL automatically prunes stale conversations, preventing unbounded memory growth, while MAX_MESSAGES keeps the LLM context manageable.
  • Explicit Tool Definitions and Descriptions: Providing the LLM with clear, descriptive tool definitions (including their purpose and parameters) is fundamental to enabling effective tool-calling. The descriptions in Portuguese directly guide the LLM's decision-making process, improving its ability to correctly identify when and how to use each function.
  • Dedicated Timezone Handling: The explicit timezone helper functions (saoPauloDate, etc.) pin all scheduling logic to America/Sao_Paulo. Because Brazil no longer observes DST, the offset is a constant UTC-3, and handling it in one place prevents common scheduling errors and ensures that appointments are recorded and displayed correctly.
  • Client Activity Key for Reminder Deferral: The client_active key in Redis is a thoughtful design choice to enhance user experience. By deferring automated reminders when a client is actively conversing with the AI, the system avoids sending potentially confusing or redundant messages, maintaining a smoother and more natural dialogue flow.
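The handling of SLOT_TAKEN and SQLSTATE 40001 mentioned above can be sketched as a small classifier applied to whatever the booking transaction throws. This is an assumption-heavy illustration: the function classifyBookingError, the UNKNOWN reason, the inspection of a wrapped cause, and the decision to report a serialization failure as SLOT_TAKEN (on the reasoning that a concurrent booking won the race) are not confirmed by the source.

```typescript
type BookingFailure = {
  success: false;
  reason: "SLOT_TAKEN" | "UNKNOWN";
};

// SQLSTATE code for a serialization failure.
const SERIALIZATION_FAILURE = "40001";

// Safely reads a string-valued field from an unknown value.
function readStringField(value: unknown, field: string): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const v = (value as Record<string, unknown>)[field];
  return typeof v === "string" ? v : undefined;
}

function classifyBookingError(err: unknown): BookingFailure {
  // Some drivers and ORMs wrap the database error, so also inspect `cause`.
  const cause =
    typeof err === "object" && err !== null
      ? (err as { cause?: unknown }).cause
      : undefined;

  for (const candidate of [err, cause]) {
    // Explicit conflict detected by the application-level check.
    if (readStringField(candidate, "message") === "SLOT_TAKEN") {
      return { success: false, reason: "SLOT_TAKEN" };
    }
    // The database aborted the transaction because of a concurrent writer.
    if (readStringField(candidate, "code") === SERIALIZATION_FAILURE) {
      return { success: false, reason: "SLOT_TAKEN" };
    }
  }
  return { success: false, reason: "UNKNOWN" };
}
```

An alternative design would retry the transaction once on 40001 before reporting a failure, since a serialization failure does not always mean the requested slot itself was taken.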

Potential Improvements

  1. Dynamic Tool Augmentation: The TOOLS array is statically defined. The get_professional_info tool already fetches some professional data. This could be expanded to dynamically augment tool descriptions or the system prompt with more specific details about the professional's services, specialties, or even custom booking policies. For example, if a professional specializes in "therapy for anxiety," this information could be injected to guide the LLM's responses and tool choices more effectively, making the agent feel more personalized.
  2. Structured Error Handling for LLM Interpretation: The getAvailableSlots and bookAppointment functions return simple error messages (e.g., { error: "Data inválida." } or { success: false, reason: "SLOT_TAKEN" }). While the LLM can parse these, providing more structured error objects (e.g., with specific error codes or enumerated types) could allow the LLM to generate more precise, context-aware, and helpful error messages to the client, rather than relying solely on natural language inference.
  3. Advanced Conversation Context Management: The ConversationState.context object currently tracks pendingDate, pendingSlot, and appointmentCreated. For more complex multi-turn dialogues, this context could be enriched to track multiple potential appointment options, client preferences (e.g., "morning appointments"), or explicit user intent states (e.g., "rescheduling_flow", "new_booking_flow"). This would allow the LLM to maintain a deeper understanding of the conversation's trajectory and handle more nuanced requests without losing track.
  4. Proactive Slot Suggestion: The get_available_slots tool requires a date parameter. The system prompt instructs the LLM to ask for a date if not provided. An enhancement could involve modifying get_available_slots or adding a new tool that, when called without a specific date, returns the next few available slots across different days. This would allow the AI to proactively suggest options like "I have openings next Tuesday at 10:00 AM or Wednesday at 2:00 PM" if the client expresses general interest in booking, reducing the number of turns required.
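As a rough illustration of the proactive slot suggestion idea, the search across days could look like the sketch below. Nothing here exists in the current code: findNextSlots, addDays, SlotLookup, and the default limits are hypothetical, and the lookup argument stands in for a per-date availability query such as getAvailableSlots.

```typescript
// Stand-in for a per-date availability query.
type SlotLookup = (dateStr: string) => string[];

interface SuggestedSlot {
  date: string; // "YYYY-MM-DD"
  time: string; // "HH:MM"
}

// Adds whole days to a "YYYY-MM-DD" string using UTC calendar arithmetic,
// which avoids any dependence on the server's local timezone.
function addDays(dateStr: string, days: number): string {
  const [y = 1970, m = 1, d = 1] = dateStr.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d + days)).toISOString().slice(0, 10);
}

// Scans forward from `fromDate` and collects the first `wanted` open slots,
// giving up after `maxDays` days.
function findNextSlots(
  lookup: SlotLookup,
  fromDate: string,
  wanted: number = 3,
  maxDays: number = 14,
): SuggestedSlot[] {
  const found: SuggestedSlot[] = [];
  for (let i = 0; i < maxDays && found.length < wanted; i++) {
    const date = addDays(fromDate, i);
    for (const time of lookup(date)) {
      found.push({ date, time });
      if (found.length === wanted) break;
    }
  }
  return found;
}
```

The maxDays bound matters because each lookup is a database query in practice; without it, a professional with no availability would trigger an unbounded scan.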

References

  • src/lib/ai-intake.ts
  • src/lib/conversation.ts
  • src/app/api/whatsapp/webhook
