Here are two small examples showing the wonderful world of character sets and Unicode using MySQL Connector/Python (0.1.2-devel and up) in both Python v2.x and v3.1.
The following table will be used with default character set latin7, i.e. [ISO-8859-13](http://en.wikipedia.org/wiki/ISO/IEC_8859-13). Just setting it to [UTF-8](http://en.wikipedia.org/wiki/UTF-8) would be a bit boring!
CREATE TABLE `latin7test` (
`c1` varchar(60) DEFAULT NULL
) DEFAULT CHARSET=latin7
Things to note for the code listed below are:
- We’re using charset='latin7' as a connection option. This is important!
- We set use_unicode=True so the results coming from MySQL are decoded to unicode. For testing, we disable this later.
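To make the effect of use_unicode a bit more concrete without needing a database, here is a minimal sketch in plain Python 3 of the conversion involved. It assumes that Python's iso8859_13 codec corresponds to MySQL's latin7, and the variable names are purely illustrative. It is not how the connector is implemented internally.

```python
# Raw bytes as they would be stored in a latin7 (ISO-8859-13) column.
raw = b'dzie\xf1 dobry!'

# Roughly what use_unicode=True hands back: the bytes decoded into a
# Unicode string. Assumption: Python's 'iso8859_13' codec matches
# MySQL's latin7 character set.
text = raw.decode('iso8859_13')

# ascii() shows the code points without depending on the terminal encoding.
print(ascii(text))  # 'dzie\u0144 dobry!'

# Encoding again gives back the original bytes, i.e. roughly what
# use_unicode=False returns.
round_trip = text.encode('iso8859_13')
print(round_trip == raw)  # True
```

The byte 0xf1 only means ń when it is read as latin7, which is why the charset connection option matters.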
Python v2.x
Here is the code that inserts (Polish) latin7 text and selects it again from the table.
# Assumed import; the original listing does not show one.
import mysql.connector as mysql

db = mysql.connect(user='root', db='test', buffered=True,
                   charset="latin7", use_unicode=True)
latin7 = [
# Hello in Polish
'dzie\xf1 dobry!',
'cze\xfa\xe3!'
]
cur = db.cursor()
stmt = 'INSERT INTO latin7test VALUES (%(c1)s)'
cur.execute(stmt, { 'c1' : latin7[0] } )
stmt = 'INSERT INTO latin7test VALUES (%s)'
cur.execute(stmt, (latin7[1],) )
stmt = 'SELECT * FROM latin7test'
cur.execute(stmt)
rows = cur.fetchall()
print(rows)
db.set_unicode(False)
cur.execute(stmt)
rows = cur.fetchall()
print(rows)
cur.close()
db.close()
The result
[(u'dzie\u0144 dobry!',), (u'cze\u015b\u0107!',)]
[('dzie\xf1 dobry!',), ('cze\xfa\xe3!',)]
The above might look weird, but if you put these strings in a web page with the proper encoding, or print them in a terminal that supports UTF-8 or latin7, they should look nice.
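The escapes in the output are only how Python displays the strings inside a list; the data itself is fine. As a rough sketch (plain Python 3, no database, assuming the iso8859_13 codec stands in for latin7), this is what preparing one of the values for a UTF-8 page or terminal could look like:

```python
# One of the latin7 byte strings from the second result row.
raw = b'cze\xfa\xe3!'

# Decode with the character set the data was stored in.
# Assumption: Python's 'iso8859_13' codec corresponds to MySQL's latin7.
text = raw.decode('iso8859_13')

# The escaped form is just a representation of the same string.
print(ascii(text))  # 'cze\u015b\u0107!'

# Re-encode for a UTF-8 web page or terminal.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # b'cze\xc5\x9b\xc4\x87!'
```

Each Polish character takes one byte in latin7 and two bytes in UTF-8, so the byte strings differ even though the text is the same.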
Python v3.1
# Assumed import; the original listing does not show one.
import mysql.connector as mysql

db = mysql.connect(user='root', db='test',
                   charset="latin7", use_unicode=True)
latin7 = [
# Hello in Polish
b'dzie\xf1 dobry!',
b'cze\xfa\xe3!'
]
cur = db.cursor()
stmt = 'INSERT INTO latin7test VALUES (%(c1)s)'
cur.execute(stmt, { 'c1' : latin7[0] } )
stmt = 'INSERT INTO latin7test VALUES (%s)'
cur.execute(stmt, (latin7[1],) )
stmt = 'SELECT * FROM latin7test'
cur.execute(stmt)
rows = cur.fetchall()
print(rows)
db.set_unicode(False)
cur.execute(stmt)
rows = cur.fetchall()
print(rows)
cur.close()
db.close()
The result
[('dzień dobry!',), ('cześć!',)]
[(b'dzie\xf1 dobry!',), (b'cze\xfa\xe3!',)]
The above looks nicer than the Python v2.x one. That’s because in Python v3.x every string is Unicode. The second line shows the same data, but encoded in latin7 and returned as bytes objects, since use_unicode is set to False.
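If you do work with use_unicode set to False, the bytes can still be decoded by hand. The following is a small sketch under the assumption that every bytes column is latin7; decode_rows is a hypothetical helper written for this post, not part of MySQL Connector/Python.

```python
# Rows as shown in the second line of the Python v3.1 result.
rows = [(b'dzie\xf1 dobry!',), (b'cze\xfa\xe3!',)]


def decode_rows(rows, encoding='iso8859_13'):
    """Hypothetical helper: decode every bytes column in every row.

    Assumes all bytes columns use the same encoding; other values
    (numbers, None, ...) are passed through unchanged.
    """
    return [
        tuple(col.decode(encoding) if isinstance(col, bytes) else col
              for col in row)
        for row in rows
    ]


decoded = decode_rows(rows)

# ascii() keeps the output independent of the terminal encoding.
print([ascii(col) for row in decoded for col in row])
```

The decoded rows compare equal to what the first query returned with use_unicode=True.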