The latest DataStax Python Driver is a welcome addition to the Python clients that support CQL3 with Cassandra. One unfortunate pain point with this client (and possibly others) is how long it takes to connect to a cluster. Each connection requires the client to fetch the entire cluster's schema. Time-sensitive environments such as web requests therefore need special care so that these connections are made outside the requests themselves.
The novice approach (which we originally took as well) is to connect to the cluster on every request. Even with a single keyspace containing one column family, the initial connection took upwards of 400ms. That was against a node on localhost, with no real network latency involved. A 400ms baseline before a single query has even run is obviously a problem.
While trying to resolve this issue, we discovered that newer versions of uWSGI support a @postfork Python decorator (provided by the uwsgidecorators module). It runs a function after a worker process has forked. This means that when our uWSGI server starts up and creates its workers, each worker can establish its own connection at that point. The Cassandra client can then do its slow setup well before any requests are handled.
Here’s an example:
app.wsgi:
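```python
import web

from DataStore import Cassandra


class Index(object):
    def GET(self):
        # The session was created when this worker forked, so Cassandra
        # can be queried from here without any connection overhead.
        session = Cassandra.cassandra.getSession()
        rows = session.execute('SELECT col FROM columnfamily')
        # web.py expects a string body, so render the rows as text.
        return '\n'.join(str(row.col) for row in rows)


urls = (
    '/', 'Index',
)

app = web.application(urls, globals(), autoreload=True)
application = app.wsgifunc()
```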
DataStore/Cassandra.py:
```python
from cassandra.cluster import Cluster
from uwsgidecorators import postfork


class CassandraDatastore(object):
    def __init__(self, keyspace):
        self.cluster = None
        self.session = None
        self.keyspace = keyspace

    def __del__(self):
        # Only shut down a cluster that was actually created.
        if self.cluster is not None:
            self.cluster.shutdown()

    def createSession(self):
        self.cluster = Cluster(['localhost'])
        self.session = self.cluster.connect(self.keyspace)

    def getSession(self):
        return self.session


# Module-level handle, populated separately in each worker process.
cassandra = None


# This is executed when uWSGI forks a worker
@postfork
def initCassandra():
    global cassandra
    cassandra = CassandraDatastore('keyspace')
    cassandra.createSession()
```
This is only a rough example and could use some tweaking, but because each worker now connects once at startup instead of on every request, the roughly 400ms connection cost no longer shows up in our request times.
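One possible tweak is to make the session lazy and process-aware. That way it still works when the postfork hook never fires (for instance, under a development server that does not fork), and it rebuilds itself if it finds it is running in a different process. This is a minimal sketch. The PerProcessSession class and its factory argument are hypothetical and are not part of the DataStax driver or uWSGI.

```python
import os
import threading


class PerProcessSession(object):
    """Hypothetical helper: one lazily created session per process.

    `factory` is any zero-argument callable returning a session-like object.
    With the DataStax driver it might be (hypothetical wiring):
        lambda: Cluster(['localhost']).connect('keyspace')
    """

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._session = None
        self._pid = None  # PID of the process that created the session

    def get(self):
        pid = os.getpid()
        if self._session is None or self._pid != pid:
            with self._lock:
                # Re-check under the lock so concurrent threads build only once.
                if self._session is None or self._pid != pid:
                    self._session = self._factory()
                    self._pid = pid
        return self._session
```

A request handler would call get() instead of reading a module global. The @postfork hook can still call get() once to warm the session before the first request arrives.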