Posted by & filed under Cassandra, Python, Web development.

The latest Python DataStax Driver is a welcome addition to the Python clients that support CQL3 with Cassandra. One pain point with this client (and possibly others) is how long it takes to connect to a cluster. Each new connection requires the client to fetch the entire cluster's schema. As a result, time-sensitive environments such as web requests need special care to ensure that these connections are established outside the requests themselves.

The novice approach (which we also originally took) is to connect to the cluster on each request. Even with a single keyspace containing one column family, the initial connection took upwards of 400ms, and that was against a local node with no network latency involved. A 400ms baseline before a single query has even run is obviously going to be a problem.
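To put the 400ms figure in perspective, here is a minimal simulation (not a benchmark) that counts how many connections each approach makes. `connect_to_cluster` and `run_query` are hypothetical stand-ins for `Cluster(['localhost']).connect(keyspace)` and a real query. No Cassandra node is involved, and the cost per connection is simply assumed to be the 400ms we observed.

```python
# Simulation only: no Cassandra node is contacted.
# Assumed cost of one connection, taken from the ~400 ms we observed.
CONNECT_COST_MS = 400

connect_calls = 0

def connect_to_cluster():
    """Hypothetical stand-in for Cluster(['localhost']).connect(keyspace)."""
    global connect_calls
    connect_calls += 1
    return object()  # placeholder for a driver Session

def run_query(session):
    """Hypothetical stand-in for session.execute(...)."""
    return session

def handle_requests_naive(n_requests):
    """Novice approach: open a new connection inside every request."""
    for _ in range(n_requests):
        session = connect_to_cluster()
        run_query(session)

def handle_requests_shared(n_requests):
    """Connect once per worker process and reuse the session."""
    session = connect_to_cluster()
    for _ in range(n_requests):
        run_query(session)

def connect_overhead_ms(handler, n_requests):
    """Total connection overhead (in ms) that a handler incurs."""
    global connect_calls
    connect_calls = 0
    handler(n_requests)
    return connect_calls * CONNECT_COST_MS

print(connect_overhead_ms(handle_requests_naive, 1000))   # 400000
print(connect_overhead_ms(handle_requests_shared, 1000))  # 400
```

With 1,000 requests, the naive handler spends 400 seconds just connecting, while the shared session pays the cost once per worker.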

While trying to resolve this, we discovered that newer versions of uWSGI provide a @postfork decorator (in the uwsgidecorators module) that runs a function after each worker process has been forked. When the uWSGI server starts up and creates its workers, each worker can therefore establish its own connection at that point. The Cassandra client does its setup well before any requests are handled. Connecting after the fork, rather than once in the master process before forking, also matters because the driver's sockets and I/O threads do not carry over safely into forked children.
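@postfork is specific to uWSGI. To illustrate the same idea outside it, Python 3.7+ on Unix offers `os.register_at_fork(after_in_child=...)`, which runs a callback in the child after `os.fork()`. The sketch below uses it with a hypothetical `init_connection` that just records which process "owns" the connection. We have not checked whether uWSGI's own forking triggers these stdlib hooks, so under uWSGI we would still use @postfork.

```python
import os

# Hypothetical per-process "connection"; in the real app this would be
# CassandraDatastore('keyspace') followed by createSession().
connection_owner_pid = None

def init_connection():
    """Runs in each child right after os.fork(), like uWSGI's @postfork."""
    global connection_owner_pid
    connection_owner_pid = os.getpid()

# Python 3.7+, Unix only: run the hook in the child of every os.fork().
os.register_at_fork(after_in_child=init_connection)

def fork_worker():
    """Fork a child that reports which PID owns its connection."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the hook has already run
        os.close(read_fd)
        os.write(write_fd, str(connection_owner_pid).encode())
        os._exit(0)
    os.close(write_fd)
    reported = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, reported

child_pid, owner_pid = fork_worker()
print(child_pid == owner_pid)  # True: the child built its own connection
print(connection_owner_pid)    # None: the parent never connected
```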

Here’s an example:

app.wsgi:

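import web

from DataStore import Cassandra

class Index(object):
    def GET(self):
        # Cassandra can be queried from here without any connection
        # overhead: the session was created when this worker was forked
        session = Cassandra.cassandra.getSession()
        # CQL does not accept backtick-quoted identifiers
        rows = session.execute('SELECT col FROM columnfamily')
        # web.py expects a string (or iterable of strings) as the body
        return '\n'.join(str(row.col) for row in rows)

urls = (
    '/', Index
)
app = web.application(urls, globals(), autoreload=True)
application = app.wsgifunc()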

DataStore/Cassandra.py:

from cassandra.cluster import Cluster
from uwsgidecorators import postfork

cassandra = None

# This is executed in each worker after uWSGI forks it
@postfork
def initCassandra():
    global cassandra
    cassandra = CassandraDatastore('keyspace')
    cassandra.createSession()

class CassandraDatastore(object):
    def __init__(self, keyspace):
        self.cluster = None
        self.session = None
        self.keyspace = keyspace

    def __del__(self):
        # Only shut down if createSession() actually built a cluster
        if self.cluster is not None:
            self.cluster.shutdown()

    def createSession(self):
        self.cluster = Cluster(['localhost'])
        self.session = self.cluster.connect(self.keyspace)

    def getSession(self):
        return self.session


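This is only a rough example and could use some tweaking. Still, this approach reduced our request times dramatically: the roughly 400ms connection cost is now paid once per worker at startup instead of on every request.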
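One tweak we would consider is a sketch only, not something from our production setup. The session could be made lazy and tied to the process ID. That way the module still works when it isn't run under uWSGI (for example, on a local development server). A process that inherits a session from its parent would also reconnect instead of reusing it. `LazySession` and `fake_connect` are hypothetical names; `fake_connect` stands in for `Cluster(['localhost']).connect('keyspace')`.

```python
import os
import threading

class LazySession(object):
    """Create one session per process, on first use (hypothetical helper).

    `connect` is a zero-argument factory standing in for
    Cluster(['localhost']).connect('keyspace').
    """

    def __init__(self, connect):
        self._connect = connect
        self._session = None
        self._pid = None
        self._lock = threading.Lock()

    def get(self):
        pid = os.getpid()
        # Reconnect if we never connected, or if this process was forked
        # after the session was created (the PID no longer matches).
        if self._session is None or self._pid != pid:
            with self._lock:
                if self._session is None or self._pid != pid:
                    self._session = self._connect()
                    self._pid = pid
        return self._session

calls = []

def fake_connect():
    """Hypothetical connect function that records the calling PID."""
    calls.append(os.getpid())
    return object()

lazy = LazySession(fake_connect)
first = lazy.get()
second = lazy.get()
print(first is second)  # True: connected once in this process
print(len(calls))       # 1
```

Under uWSGI, the @postfork hook could simply call `get()`, so the connection is still made eagerly at worker startup. The lazy path then only matters outside uWSGI.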
