Scheduling tasks to run at particular times is a fairly common occurrence when developing web applications. Cleaning out old session data, generating reports, flushing queues, checking job statuses, and sending emails are just a few examples of tasks that either need to run at specific times or might not have another event to trigger them.
Cron works very well in these scenarios: a server monitors the current time and executes a task when the desired time arrives. It’s simple and gets the job done, until it comes time to scale.
A common scaling approach is to load balance traffic across multiple application servers. With each application server now running its own set of cron jobs, there’s an obvious problem: every job runs once per server. For example, the email you send to your users every day at 9AM is now sent by each application server.
There are several ways to tackle this situation, and each has its pros and cons:
AWS Elastic Beanstalk “leader_only” option
If you’re already using Elastic Beanstalk, this is the method that’s commonly attempted first.
This option instructs Elastic Beanstalk to only execute a particular command on one server at deployment time. Some developers utilize this to create a crontab on that machine which is responsible for all tasks that should only run on one server.
Unfortunately, there’s no guarantee that the machine chosen as the leader won’t be terminated during an autoscaling event. If it is, the crontab goes with it and those tasks stop running.
AWS Elastic Beanstalk Periodic Tasks
A recent update to Elastic Beanstalk supports the concept of periodic tasks through the worker tier. While this solution does not support running tasks directly on the application servers and launching a worker tier may be cost prohibitive, it may still be a viable alternative for some users.
Dedicated crond server
A more complicated setup involves creating a dedicated server outside of any autoscaling environment that’s responsible for running the cron daemon and instructing each server when to run each task via message queues or other means.
This approach provides more flexibility, but it is costly and the single external cron daemon requires careful supervision. Because everything depends on that one server, running critical tasks this way is not recommended.
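To make the idea concrete, here is a minimal sketch of the dispatch pattern. It uses Python's in-process `queue.Queue` as a stand-in for a real shared message queue, and the names `dispatch`, `poll`, and the `app-1` style server names are hypothetical, chosen only for illustration. The point it shows is that a message placed on the queue once is picked up by exactly one server.

```python
import queue

# Assumption: an in-process queue stands in for a shared message queue
# that the cron server and all application servers can reach.
task_queue = queue.Queue()


def dispatch(task_name):
    """Called by the single cron server when a task is due."""
    task_queue.put(task_name)


def poll(server_name, log):
    """Called by each application server; returns True if it picked up a task."""
    try:
        task_name = task_queue.get_nowait()
    except queue.Empty:
        return False  # nothing to do; another server already took the message
    log.append((server_name, task_name))
    return True


log = []
dispatch("send_daily_email")
results = [poll(name, log) for name in ("app-1", "app-2", "app-3")]
```

Only the first server to poll receives the message, so the task runs once even though three servers are listening. The cron server that calls `dispatch` is still a single machine, which is the weakness described above.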
Process lock
A fairly complex, but comprehensive approach involves running all cron jobs on all servers, but incorporating a locking mechanism for tasks that should only run on one server. The machine that acquires the lock the quickest is allowed to run the task, while the others would honor the lock and skip the task.
This offers a lot of flexibility and is fault tolerant, but requires software to support the locking (e.g. Redis or etcd) and can be complicated to manage.
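A rough sketch of the locking idea follows. To keep it self-contained, it uses a hypothetical `InMemoryLockStore` class in place of a real shared store. In a real deployment the acquire step has to be atomic across machines, which is what a command like Redis's `SET` with the `NX` and `EX` options provides. The `run_once` helper, the lock name, and the TTL value are all assumptions made for illustration.

```python
import time


class InMemoryLockStore:
    """Hypothetical stand-in for a shared store such as Redis or etcd."""

    def __init__(self):
        self._locks = {}  # lock name -> (owner, expiry timestamp)

    def acquire(self, name, owner, ttl_seconds, now=None):
        """Take the lock only if it is absent or expired; return True on success."""
        now = time.time() if now is None else now
        current = self._locks.get(name)
        if current is not None and current[1] > now:
            return False  # another server holds an unexpired lock
        self._locks[name] = (owner, now + ttl_seconds)
        return True


def run_once(store, task_name, server_name, task, ttl_seconds=300, now=None):
    """Run the task only if this server wins the lock for task_name."""
    if not store.acquire(task_name, server_name, ttl_seconds, now=now):
        return False  # honor the lock and skip the task
    task()
    return True


store = InMemoryLockStore()
sent = []
outcomes = [
    run_once(store, "daily_email", name, lambda n=name: sent.append(n), now=1000.0)
    for name in ("app-1", "app-2", "app-3")
]
```

Here every server attempts the task, but only `app-1` acquires the lock and runs it. The TTL matters: if the winning server dies mid-task, the lock eventually expires instead of blocking the job forever.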
Cronally
I’d like to introduce Cronally. Cronally eliminates complicated setups, relieves the risks of running a single server, and avoids the costs associated with a dedicated crond machine.
Here’s how it works: using our CLI, you create a cron job directly from the command line with a schedule in standard cron syntax. At each scheduled time, Cronally sends a message to an SNS topic of your choosing.
From there, SNS can trigger your task in any number of ways:
– A signed POST to an HTTP/HTTPS endpoint
– A message delivered to an SQS queue
– An email
– An AWS Lambda function
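As an illustration of the Lambda option, here is a minimal sketch of a Python handler subscribed to the SNS topic. The `Records[].Sns.Message` structure is how SNS events arrive in Lambda. The contents of the message are an assumption: this sketch guesses at a JSON body with a `job` field and falls back to treating the message as plain text. The `run_task` function is a hypothetical placeholder for your own work.

```python
import json


def run_task(job):
    """Hypothetical task body; replace with the real work."""
    return "ran {}".format(job.get("job", "unknown"))


def handler(event, context):
    """AWS Lambda entry point for SNS-triggered invocations."""
    results = []
    for record in event.get("Records", []):
        # SNS delivers the published message body as a string.
        message = record["Sns"]["Message"]
        try:
            job = json.loads(message)  # assumed payload shape: {"job": "..."}
        except ValueError:
            job = {"job": message}  # not JSON; treat the text as the job name
        if not isinstance(job, dict):
            job = {"job": str(job)}
        results.append(run_task(job))
    return results
```

Because SNS invokes the function once per published message, the task runs once no matter how many application servers are behind the load balancer.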
Cronally is completely managed through a simple command line interface, so you can create and manage cron jobs without leaving your shell.
Visit https://cronally.com to join the beta.