Update Feb 6 2014: I have edited the example code a bit. Note that this has never been included in any GA release of Connector/Python.

Last week, my colleague Massimo and I discussed how to handle big result sets coming from MySQL in Python. The problem is that MySQL doesn’t support server-side cursors, so you need to select everything and then read it. You can do it either buffered or not. MySQL Connector/Python defaults to non-buffered, meaning that you need to fetch all rows after issuing a SELECT statement. You can also turn on the buffering, mimicking what MySQL for Python (MySQLdb) does.

For big result sets, it’s better to limit your search. You can do this using an integer primary key or some temporal field for example. Or you can use the LIMIT keyword. The latter solution is what is used in the MySQLCursorServerSide cursor-class. Using the SELECT it creates a temporary table from which the fetch-methods will get the information. It is something people have probably implemented in their applications, but I hope this new class will make it easier since it’s done transparently.

The code is not pushed yet, but expect it to be available in next release. Here is an example how you could use it. This code selects cities staring with Z, loops over the result getting the country (yes, this is a simple join made difficult):

    cnx = db.connect(user='root', db='world')
    cur = cnx.cursor()
    curCity = cnx.cursor(db.cursor.MySQLCursorServerSide)

    curCity.execute(
        "SELECT ID,Name,CountryCode FROM City "
        "WHERE NAME LIKE 'Z%' ORDER BY ID")

    query = "SELECT Code, Name FROM Country WHERE CODE = %s"
    for city in curCity:
        cur.execute(query, (city[2],))
        country = cur.fetchone()
        print("{0} ({1})".format(city[1], country[1]))

    cur.close()
    cnx.close()

I guess the main advantage is that you can use two or more cursor objects with the same connection without the need of buffering everything in Python. On the MySQL side, the temporary table could go to disk when to big. It’s maybe slower, but keeping big result sets in memory ain’t good either.

Comments are welcome!