Query caching with MySQL Connector/Python

This blog post shows how to create a cursor class for MySQL Connector/Python which will allow you to cache queries. It will hold the query itself and the result in a global variable.

Note: this is a proof of concept and is only meant as a demonstration on how to extend MySQL Connector/Python.

Why query caching?

You are doing lots of queries that have the same result. It would be expensive to always run the same exact query. MySQL has already a query cache, and there is also memcached. But you like MySQL Connector/Python so much you’d like to do it yourself.

A cursor caching queries and their result

To demonstrate a simple implementation of a query cache, we inherit from an existing class: MySQLCursorBuffered. It will save the executed operation with their results in a ‘global’ variable. We call this cursor MySQLCursorQueryCache.

We take the buffered cursor because we’d like to save the result right away. Below you see we only changed two methods for MySQLCursorBuffered:

  • .execute(): it will now first check using an md5 checksum whether we executed the query before. If we did, we set the make the cached result active. If not, we simply executed.
  • ._handle_resultset(): called when .execute() did an operation which has a result set. The result we know save in the QUERY_CACHE global dict.
from hashlib import md5
import mysql.connector

QUERY_CACHE = dict()

class MySQLCursorQueryCache(mysql.connector.cursor.MySQLCursorBuffered):
    def execute(self, operation, params=None):
        self._qmd5 = md5(operation).digest()
        if QUERY_CACHE.has_key(self._qmd5):
            (self._rows, eof) = QUERY_CACHE[self._qmd5]
            self.rowcount = len(self._rows)
            self._handle_eof(eof)
            self._next_row = 0
        else:
            super(MySQLCursorQCache, self).execute(operation, params)

    def _handle_resultset(self):
        (self._rows, eof) = self.db().protocol.get_rows()
        self.rowcount = len(self._rows)
        self._handle_eof(eof)
        self._next_row = 0
        QUERY_CACHE[self._qmd5] = (self._rows, eof)
        try:
            self.db().unread_result = False
        except:
            pass
        self._qmd5 = None

The above code is a proof of concept, there is lots of room for improvement. For example, you need something to invalidate entries in the query cache.

How to use MySQLCursorQueryCache

def main():

    cnx = mysql.connector.connect(database='test')
    cur = cnx.cursor(cursor_class=MySQLCursorQueryCache)

    cur.execute("SELECT NOW()")
    print cur.fetchone()
    time.sleep(2)
    cur.execute("SELECT NOW()")
    print cur.fetchone()

    cur.close()
    cnx.close()

When you would use the default cursor, both executed SQL statements would produce a different result. The above produces the following output:

(datetime.datetime(2010, 11, 22, 21, 20, 4),)
(datetime.datetime(2010, 11, 22, 21, 20, 4),)

When you’d like to have some statements cached, and some not, just create a second cursor cursor_class=MySQLCursorBuffered (see Buffering results with MySQL Connector/Python).

Comments

Marius Gedminas
How about making it not return the same result when query params differ, for starters?
geert

@Marius Gedminas Good point Marius! You’ll need to store the statement after substituting the parameters. Or, you could add the params-argument to the operation string when hashing it:

self._qmd5 = md5(operation + repr(params)).digest()