Query caching with MySQL Connector/Python

This blog post shows how to create a cursor class for MySQL Connector/Python which will allow you to cache queries. It will hold the query itself and the result in a global variable.

Note: this is a proof of concept and is only meant as a demonstration on how to extend MySQL Connector/Python.

Why query caching?

You are doing lots of queries that have the same result. It would be expensive to always run the same exact query. MySQL has already a query cache, and there is also memcached. But you like MySQL Connector/Python so much you’d like to do it yourself.

A cursor caching queries and their result

To demonstrate a simple implementation of a query cache, we inherit from an existing class: MySQLCursorBuffered. It will save the executed operation with their results in a ‘global’ variable. We call this cursor MySQLCursorQueryCache.

We take the buffered cursor because we’d like to save the result right away. Below you see we only changed two methods for MySQLCursorBuffered:

  • .execute(): it will now first check using an md5 checksum whether we executed the query before. If we did, we set the make the cached result active. If not, we simply executed.
  • ._handle_resultset(): called when .execute() did an operation which has a result set. The result we know save in the QUERY_CACHE global dict.
from hashlib import md5
import mysql.connector

QUERY_CACHE = dict()

class MySQLCursorQueryCache(mysql.connector.cursor.MySQLCursorBuffered):
    def execute(self, operation, params=None):
        self._qmd5 = md5(operation).digest()
        if QUERY_CACHE.has_key(self._qmd5):
            (self._rows, eof) = QUERY_CACHE[self._qmd5]
            self.rowcount = len(self._rows)
            self._handle_eof(eof)
            self._next_row = 0
        else:
            super(MySQLCursorQCache, self).execute(operation, params)
    
    def _handle_resultset(self):
        (self._rows, eof) = self.db().protocol.get_rows()
        self.rowcount = len(self._rows)
        self._handle_eof(eof)
        self._next_row = 0
        QUERY_CACHE[self._qmd5] = (self._rows, eof)
        try:
            self.db().unread_result = False
        except:
            pass
        self._qmd5 = None

The above code is a proof of concept, there is lots of room for improvement. For example, you need something to invalidate entries in the query cache.

How to use MySQLCursorQueryCache

def main():
    
    cnx = mysql.connector.connect(database='test')
    cur = cnx.cursor(cursor_class=MySQLCursorQueryCache)
    
    cur.execute("SELECT NOW()")
    print cur.fetchone()
    time.sleep(2)
    cur.execute("SELECT NOW()")
    print cur.fetchone()
    
    cur.close()
    cnx.close()

When you would use the default cursor, both executed SQL statements would produce a different result. The above produces the following output:

(datetime.datetime(2010, 11, 22, 21, 20, 4),)
(datetime.datetime(2010, 11, 22, 21, 20, 4),)

When you’d like to have some statements cached, and some not, just create a second cursor cursor_class=MySQLCursorBuffered (see Buffering results with MySQL Connector/Python).

Buffering results with MySQL Connector/Python

MySQL Connector/Python doesn’t buffer results by default. This means you have to fetch the rows when you issued a SELECT. This post describes how you can change this behavior.

Why buffering result sets?

Buffering or storing the result set on the client side is handy when you, for example, would like to use multiple cursors per connection and you’de like to traverse each one interleaved.

Keep in mind that with bigger result sets, the client side will use more memory. You just need to find out for yourself what’s best. When you know result sets are mostly small, you might opt to buffer.

MySQLdb by default buffers results and you need to use a different cursor to disable it. oursql does not buffer by default. This is good to know when you start using a different MySQL DB API for Python.

Use case: 1 connection, fetch from 2 cursors

You want to fetch data from two tables and process the data using one connection. If you do it without buffering, you would run into the following:

>>> cnx = mysql.connector.connect(database='test')
>>> cur1 = cnx.cursor()
>>> cur2 = cnx.cursor()
>>> cur1.execute("SELECT c1 FROM t1")
-1
>>> cur2.execute("SELECT c1 FROM t1")
..
mysql.connector.errors.InternalError: Unread result found.

MySQL Connector/Python offers two ways to turn buffering on or off. Either per connection or per cursor using the buffered argument set to True

Set buffering per connection

If you want all the cursors for a particular connection to be buffered, you can turn it on when connecting to MySQL setting the buffered-argument to True.

>>> import mysql.connector
>>> cnx = mysql.connector.connect(database='test',buffered=True)
>>> cur1 = cnx.cursor()
>>> cur1.__class__
<class 'mysql.connector.cursor.MySQLCursorBuffered'>
>>> cur2 = cnx.cursor()
>>> cur1.execute("SELECT c1 FROM t1")
3
>>> cur2.execute("SELECT c1 FROM t1")
3
>>> cur1.fetchone()
(u'Geert',)
>>> cur2.fetchone()

Set buffering per cursor

You can request a buffering cursor buffering from the connection object. Do this use the buffered-argument set to True:

>>> import mysql.connector
>>> cnx = mysql.connector.connect(database='test')
>>> <strong>cur1 = cnx.cursor(buffered=True)</strong>
>>> cur1.__class__
<class 'mysql.connector.cursor.MySQLCursorBuffered'>

Note: the above should work with MySQL Connector/Python v0.2 or greater