MySQL Connector/Python 2.1.1 took a while to release and that was because we had to add some more packages which contains the optional C Extension. Note that this is still Alpha and we want you guys to report any problems and requests.
The Connector/Python C Extension was added because in certain situations, for example reading a huge result set, can take a long time with pure Python. That’s why we choose to interface with Connector/C (libmysqlclient).
Note: Pure Python is still default and it will be kept that way!
Installing Connector/Python 2.1 didn’t change much:
$ sudo python setup.py install
If you’d like the C Extension, you have to first install MySQL Connector/C or have the MySQL Server development packages available. Careful with mixing 32 and 64-bit: make sure Python matches your MySQL libraries. Connector/Python will try to detect the mismatch and notify you.
For example, on OS X with development tools installed, I would do the following:
$ virtualenv CPYENV
$ source CPYENV/bin/activate
$ tar xzf ~/Downloads/mysql-connector-c-6.1.5-osx10.7-x86_64.tar.gz
$ tar xzf ~/Downloads/mysql-connector-python-2.1.1.tar.gz
$ cd mysql-connector-2.1.1
$ python setup.py install --with-mysql-capi=../mysql-connector-c-6.1.5-osx10.7-x86_64
If all goes well, the above would have compiled and install the C Extension together with the pure Python code inside a virtual environment. Here is how you can check if the C Extension is available:
import mysql.connector
mysql.connector.HAVE_CEXT
# Output == True
If you want to see the speed improvements, you can load up the employees sample database and do the following in the Python interpreter:
import mysql.connector
cnx = mysql.connector.connect(user='root', database='employees')
cnxc = mysql.connector.connect(use_pure=False, user='root', database='employees')
cur = cnx.cursor()
q = "SELECT * FROM salaries"
s=time(); cur.execute(q); r=cur.fetchall(); print("%.2f" % (time()-s))
# Output ~= 65.57
cur = cnxc.cursor()
s=time(); cur.execute(q); r=cur.fetchall(); print("%.2f" % (time()-s))
# Output ~= 13.09
That’s 66 seconds vs. 13 seconds using the C Extension.
If that is not fast enough, and it is not, you can directly load the C Extension and use the wrapper around the MySQL C API (see manual). Here’s an example:
import _mysql_connector
cnx = _mysql_connector.MySQL()
cnx.connect(user='root', database='employees')
cnx.query("SELECT emp_no, last_name, hire_date FROM employees")
print(cnx.fetch_row())
cnx.free_result()
cnx.close()
# Output == (10001, 'Facello', datetime.date(1986, 6, 26))
It is a bit different than using mysql.connector, but notice that result coming from the C Extension is also converted to Python data types.
How fast is using _mysql_connector? Lets say we want the raw data, save the following to a Python script file and execute:
from time import time
import _mysql_connector
cnx = _mysql_connector.MySQL(raw=True)
cnx.connect(user='root', database='employees')
cnx.query("SELECT * FROM salaries")
s = time()
row = cnx.fetch_row()
while row:
row = cnx.fetch_row()
cnx.free_result()
print("All fetched in %.2fs" % (time() - s))
cnx.close()
# Output ~= All fetched in 2.25s
If you put it all together, and this is not scientific, just on my OS X MacBook, SELECT * FORM slaries
:
SELECT * FORM salaries |
Duration |
---|---|
Pure Python, PEP-249 | 66s |
Pure + C Extension, PEP-249 | 13s |
C Extension, direct, with conversion | 12s |
C Extension, direct, raw | 3s |
If you want to dump big sets of data, and you want to do it the Python way, you can use the C Extension to get it faster.
Yes, the C Extension works and compiles on Windows!
Comments
I think the examples used to illastrate the difference of the speed between 2 versions are not meaningfull the second is fast than the first maybe because the data is already in the buffer pool of MySQL (so actually the second fetch with Connector with C Extension just get data from memory and no need for physical I/Os)
I think you should restart the mysql server(so the buffer pool is empty) to run the second test
I actually ran the SELECT * FROM salaries first, then re-run while timing it. So the data was warmed up. Real benchmarking would evolve more work, but that I think it’s an exercise for the users to publish that :)
The point is: it is faster, and it will solve a few things for people who want to stick with Connector/Python and need to process huge result sets.