Using character sets with MySQL Connector/Python

Here are two small examples showing the wonderful world of character sets and unicode using MySQL Connector/Python (0.1.2-devel and up) in both Python v2.x and v3.1.

The following table will be used with default character set latin7, i.e. [ISO-8859-13](http://en.wikipedia.org/wiki/ISO/IEC_8859-13). Just setting it to [UTF-8](http://en.wikipedia.org/wiki/UTF-8) would be a bit boring!

CREATE TABLE `latin7test` (
  `c1` varchar(60) DEFAULT NULL
) DEFAULT CHARSET=latin7

Things to note for the code listed below are:

We’re using charset='latin7' as a connection option. This is important!
We set use_unicode=True so the results coming from MySQL are decoded to unicode. For testing, we disable this later.
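
To get a feel for what these two options amount to on the Python side, here is a minimal sketch using only the standard library (Python 3, no database needed). It is not how Connector/Python is implemented; `to_unicode` is a made-up helper, and it assumes Python's `iso8859_13` codec matches MySQL's latin7.

```python
# Python 3, standard library only; nothing here talks to MySQL.
LATIN7 = 'iso8859_13'  # assumption: Python's codec for MySQL's latin7

def to_unicode(raw, charset=LATIN7):
    """Hypothetical helper: roughly what use_unicode=True does to a value."""
    return raw.decode(charset)

raw = b'dzie\xf1 dobry!'   # latin7 bytes, as stored in the table
text = to_unicode(raw)     # unicode string, as with use_unicode=True

print(text)
# Encoding back with the same character set returns the original bytes.
print(text.encode(LATIN7) == raw)
```

With use_unicode=False you would simply get `raw` back, untouched.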

Python v2.x

Here is the code which inserts (Polish) latin7 text and selects it again from the table.

import mysql.connector as mysql

db = mysql.connect(user='root', db='test', buffered=True,
                   charset="latin7", use_unicode=True)

latin7 = [
    # Hello in Polish
    'dzie\xf1 dobry!',
    'cze\xfa\xe3!'
]

cur = db.cursor()
stmt = 'INSERT INTO latin7test VALUES (%(c1)s)'
cur.execute(stmt, { 'c1' : latin7[0] } )
stmt = 'INSERT INTO latin7test VALUES (%s)'
cur.execute(stmt, (latin7[1],) )
stmt = 'SELECT * FROM latin7test'
cur.execute(stmt)
rows = cur.fetchall()

print(rows)
db.set_unicode(False)
cur.execute(stmt)
rows = cur.fetchall()
print(rows)

cur.close()
db.close()

The result

[(u'dzie\u0144 dobry!',), (u'cze\u015b\u0107!',)]
[('dzie\xf1 dobry!',), ('cze\xfa\xe3!',)]

The above might look weird, but if you put this in a webpage with the proper encoding, or print the individual strings in a terminal which supports UTF-8 or latin7, it should look nice.
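
As a rough illustration of what "proper encoding" means here, the sketch below takes the unicode rows from the first result line and encodes each value as UTF-8, which is what a UTF-8 web page or terminal would expect to receive. It is written for Python 3 and uses a hard-coded `rows` list in place of a real query result.

```python
# Python 3 sketch; 'rows' stands in for the result of cur.fetchall().
rows = [('dzie\u0144 dobry!',), ('cze\u015b\u0107!',)]

# Unpack the single column of every row.
values = [c1 for (c1,) in rows]

# Bytes to send to a UTF-8 page or terminal.
utf8_values = [v.encode('utf-8') for v in values]

for v in utf8_values:
    print(v)
# The two non-ASCII characters in the second value take two bytes each.
print(len(values[1]), len(utf8_values[1]))
```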

Python v3.1

import mysql.connector as mysql

db = mysql.connect(user='root', db='test',
                   charset="latin7", use_unicode=True)

latin7 = [
    # Hello in Polish    
    b'dzie\xf1 dobry!',
    b'cze\xfa\xe3!'
]

cur = db.cursor()
stmt = 'INSERT INTO latin7test VALUES (%(c1)s)'
cur.execute(stmt, { 'c1' : latin7[0] } )
stmt = 'INSERT INTO latin7test VALUES (%s)'
cur.execute(stmt, (latin7[1],) )
stmt = 'SELECT * FROM latin7test'
cur.execute(stmt)
rows = cur.fetchall()

print(rows)
db.set_unicode(False)
cur.execute(stmt)
rows = cur.fetchall()
print(rows)

cur.close()
db.close()

The result

[('dzień dobry!',), ('cześć!',)]
[(b'dzie\xf1 dobry!',), (b'cze\xfa\xe3!',)]

The above looks nicer than the Python v2.x one. That’s because in Python v3.x every string is a unicode string. The second line shows the same data, but encoded in latin7 and returned as bytes objects since use_unicode is set to False.
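
The difference between the two result lines can be reproduced without a database. The following is only a sketch in Python 3, again assuming the standard `iso8859_13` codec corresponds to latin7; the variable names are made up.

```python
# Python 3 sketch: the same value as str (unicode) and as latin7 bytes.
as_bytes = b'cze\xfa\xe3!'               # what use_unicode=False returns
as_text = as_bytes.decode('iso8859_13')  # what use_unicode=True returns

# Collect the type name and length of each representation.
summary = [(type(v).__name__, len(v)) for v in (as_text, as_bytes)]
print(summary)

# str holds characters, bytes holds raw integers in the range 0-255.
print(as_text[3], as_bytes[3])
```

Both have length 6 here because latin7 is a single-byte character set: one byte per character.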
