Tiago Fortunato

WhatsApp Watchdog Cron

Daily health-check cron for Evolution API, ensuring the reliability of Odys's WhatsApp messaging integration.

Reliable WhatsApp delivery is central to Odys. This page describes the whatsapp-watchdog cron job, which checks the health of Odys's integration with the Evolution API, the service that powers Odys's WhatsApp messaging.

Overview

The whatsapp-watchdog cron job serves as a daily health-check mechanism for the Evolution API. Its primary role is to proactively monitor the external WhatsApp service, ensuring that Odys can consistently send and receive messages, which is vital for features like appointment reminders, booking confirmations, and direct client-professional communication.

This cron job runs once a day at 09:00 UTC. The schedule is defined in the vercel.json configuration file, in the entry for the /api/cron/whatsapp-watchdog path with the cron expression 0 9 * * *. A daily cadence balances monitoring frequency against resource consumption, providing a regular pulse check on a critical external dependency.
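To make the schedule concrete, here is a minimal sketch of the cron entry written as a typed TypeScript object that serializes to the shape expected under the "crons" key of vercel.json. The path and expression come from this page; the CronEntry interface and the nextDailyRunUtc helper are illustrative names, not part of Odys. Vercel evaluates cron expressions in UTC.

```typescript
// Shape of a Vercel cron entry: a route path plus a five-field cron expression.
interface CronEntry {
  path: string;
  schedule: string;
}

// The entry described above, as it would appear under "crons" in vercel.json.
const watchdogCron: CronEntry = {
  path: "/api/cron/whatsapp-watchdog",
  schedule: "0 9 * * *", // minute 0, hour 9, every day of month, every month, every weekday
};

// Illustrative helper (hypothetical, not in Odys): next UTC run time for a
// fixed daily schedule of the form "M H * * *". It ignores the last three fields.
function nextDailyRunUtc(schedule: string, now: Date): Date {
  const [minute, hour] = schedule.split(" ").map(Number);
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hour, minute),
  );
  // If today's slot has already passed, the next run is tomorrow.
  if (next.getTime() <= now.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}

console.log(JSON.stringify({ crons: [watchdogCron] }, null, 2));
console.log(
  nextDailyRunUtc(watchdogCron.schedule, new Date("2024-01-01T10:30:00Z")).toISOString(),
); // prints "2024-01-02T09:00:00.000Z"
```

The second log line shows the practical consequence of a daily schedule: a check that would have run at 09:00 UTC is not attempted again until 09:00 UTC the next day.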

The Evolution API underpins every WhatsApp flow in Odys. Odys uses 19 distinct WhatsApp message templates, ranging from msgBookingRequest and msgReminder24h to msgNewMessageToClient and msgPaymentConfirmedToPro. Delivering these templated messages to clients and professionals depends on the Evolution API being available, which is what the watchdog monitors.

Design decisions

The decision to implement a dedicated cron job for a daily health check on the Evolution API stems from the understanding that external services can experience outages or performance degradation. Relying solely on reactive error handling during message sending would mean that issues are only detected when a user attempts to send a message, leading to a poor user experience and potential loss of critical communications.
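As an illustration of what a proactive probe could look like, the sketch below asks the WhatsApp gateway for an instance's connection state and treats anything other than a successful "open" response as unhealthy. This is not the actual Odys handler: the function name probeEvolution, the URL path /instance/connectionState/{instance}, the apikey header, and the "open" state value are all assumptions that should be verified against the Evolution API version in use. The fetch function is injected so the probe can be exercised without network access.

```typescript
// Minimal view of a fetch-like function, so the probe can be tested offline.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ProbeResult {
  healthy: boolean;
  detail: string;
}

// Hypothetical probe. The endpoint path, header name, and "open" state are
// assumptions about the Evolution API, not confirmed by this page.
async function probeEvolution(
  fetchFn: FetchLike,
  baseUrl: string,
  instance: string,
  apiKey: string,
  timeoutMs = 5000,
): Promise<ProbeResult> {
  // Abort the request if the API does not answer within timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(`${baseUrl}/instance/connectionState/${instance}`, {
      headers: { apikey: apiKey },
      signal: controller.signal,
    });
    if (!res.ok) {
      return { healthy: false, detail: `HTTP ${res.status}` };
    }
    const body = (await res.json()) as { instance?: { state?: string } };
    const state = body.instance?.state ?? "unknown";
    return { healthy: state === "open", detail: `state=${state}` };
  } catch (err) {
    // Network errors and timeouts are reported as unhealthy rather than thrown.
    return { healthy: false, detail: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

In a real handler the global fetch could be passed as fetchFn. Returning a result object instead of throwing keeps the caller's logic simple: every outcome, including a timeout, becomes a value that can be logged or alerted on.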

By scheduling a daily watchdog, Odys adopts a proactive stance. The 0 9 * * * schedule was likely chosen to run early in the day, before peak usage hours, allowing for potential issues to be identified and addressed before they significantly impact users. This daily frequency is a trade-off: it's frequent enough to catch persistent problems within a 24-hour window, but not so frequent as to overwhelm the external API or Odys's own infrastructure with constant pings. A more frequent check might be overkill for an API that is generally stable, while a less frequent one could mean longer periods of undetected downtime.

The choice of a cron job, rather than a real-time monitoring system, reflects a design philosophy focused on simplicity and cost-effectiveness for this particular type of health check. It leverages existing infrastructure (Vercel's cron capabilities) without requiring a separate, more complex monitoring solution for this specific task.
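Because a Vercel cron job is just an HTTP request to a route, the handler is publicly reachable unless it checks who is calling. When a CRON_SECRET environment variable is configured, Vercel sends it on cron invocations as an Authorization: Bearer header. The sketch below shows one way a handler could verify that header and report the check result; the names isAuthorizedCron, handleWatchdog, and CheckResult are hypothetical, and the choice to answer 503 on an unhealthy result is an assumption, not something this page confirms about the Odys implementation.

```typescript
// Result of whatever health check the handler runs; the shape is illustrative.
interface CheckResult {
  healthy: boolean;
  detail: string;
}

// Vercel sends "Authorization: Bearer <CRON_SECRET>" on cron invocations
// when the CRON_SECRET environment variable is set for the project.
function isAuthorizedCron(request: Request, cronSecret: string | undefined): boolean {
  if (!cronSecret) return false; // fail closed when no secret is configured
  return request.headers.get("authorization") === `Bearer ${cronSecret}`;
}

// Hypothetical handler core: authorize, run the injected check, report the outcome.
async function handleWatchdog(
  request: Request,
  cronSecret: string | undefined,
  runCheck: () => Promise<CheckResult>,
): Promise<Response> {
  if (!isAuthorizedCron(request, cronSecret)) {
    return new Response("Unauthorized", { status: 401 });
  }
  const result = await runCheck();
  // Assumption: a non-2xx status makes a failed check stand out in request logs.
  return new Response(JSON.stringify(result), {
    status: result.healthy ? 200 : 503,
    headers: { "content-type": "application/json" },
  });
}
```

In a Next.js App Router project, a route file under src/app/api/cron/whatsapp-watchdog could export a GET function that calls handleWatchdog with process.env.CRON_SECRET and the real health check; that wiring is assumed here rather than taken from the Odys codebase.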

Potential improvements

While the whatsapp-watchdog cron job provides a foundational health check, there are several avenues for enhancement to make it more robust and informative:

  1. Implement comprehensive logging and alerting: Currently, the cron job's output or status isn't explicitly tied to an alerting system. A health check is most valuable when it actively notifies relevant stakeholders upon failure. Consider integrating with a logging service (e.g., Sentry, Datadog) to record the outcome of each check, and, crucially, trigger alerts (e.g., Slack, email, PagerDuty) if the Evolution API is deemed unhealthy. This would transform the check from a passive observation into an actionable monitoring tool. The cron job handler at /api/cron/whatsapp-watchdog should include logic to log its success or failure and dispatch alerts.

  2. Introduce specific API endpoint checks: Instead of a generic "health check," the watchdog could perform more targeted validations. For instance, it could attempt to retrieve a list of available WhatsApp templates or send a test message to a controlled number. This would verify not just the API's general availability, but also the functionality of specific critical paths. It would mean making more specific API calls inside the /api/cron/whatsapp-watchdog handler, potentially drawing on the 19 WhatsApp templates defined in Odys.

  3. Dynamic professional status updates: If the Evolution API fails consistently, professionals may be unable to use WhatsApp for their services. Upon detecting a prolonged outage, the watchdog could update the active status of the affected professionals in the professionals table or send them an internal notification. This would require the /api/cron/whatsapp-watchdog handler to interact with the database, specifically the professionals table, so that it reflects the operational status of the WhatsApp integration. Users whose services rely heavily on WhatsApp communication would then get a more direct feedback loop.
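The first and third improvements share a common core: record each outcome, count how many checks in a row have failed, and escalate once failures persist. The sketch below shows that logic under stated assumptions. All names (CheckOutcome, alertLevel, reportOutcome, PostJson) are hypothetical, the threshold of three consecutive failures is arbitrary, and where the history is stored is left open. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a text field.

```typescript
interface CheckOutcome {
  at: string; // ISO timestamp of the check
  healthy: boolean;
  detail: string;
}

type AlertLevel = "none" | "warn" | "critical";

// Count consecutive failures ending at the most recent check.
function consecutiveFailures(history: CheckOutcome[]): number {
  let count = 0;
  for (let i = history.length - 1; i >= 0 && !history[i].healthy; i--) {
    count++;
  }
  return count;
}

// A single failure warns; criticalAfter failures in a row count as a prolonged outage.
function alertLevel(history: CheckOutcome[], criticalAfter = 3): AlertLevel {
  const failures = consecutiveFailures(history);
  if (failures === 0) return "none";
  return failures >= criticalAfter ? "critical" : "warn";
}

// Slack incoming webhooks accept a JSON body with a "text" field.
function buildSlackPayload(level: AlertLevel, latest: CheckOutcome): { text: string } {
  return { text: `[whatsapp-watchdog] ${level}: ${latest.detail} (checked at ${latest.at})` };
}

// Injected sender, so the reporting logic can run without a real webhook.
type PostJson = (url: string, body: unknown) => Promise<void>;

async function reportOutcome(
  history: CheckOutcome[],
  webhookUrl: string,
  post: PostJson,
): Promise<AlertLevel> {
  if (history.length === 0) return "none";
  const latest = history[history.length - 1];
  const level = alertLevel(history);
  // Structured log line: one record per run, success or failure.
  console.log(JSON.stringify({ event: "whatsapp-watchdog", level, ...latest }));
  if (level !== "none") {
    await post(webhookUrl, buildSlackPayload(level, latest));
  }
  return level;
}
```

With a daily schedule, a "critical" level at three consecutive failures means the outage has spanned at least two days, which is the point where a follow-up action such as notifying professionals or updating the professionals table might be justified. A real sender could implement PostJson with fetch and a POST request.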

References

  • Cron job definition: vercel.json
  • Cron job handler path: src/app/api/cron/whatsapp-watchdog
