To far fetched (pun inteded), some might think.. Below you’ll find a procedure to get a list of MySQL supported character sets and their collations. The output is Python and can be used to build a (big) tuple.
The problem is that character set IDs in MySQL have ‘gaps’. For example hebrew has ID 16, and the next character set in the COLLATIONS-table, tis620, has ID 18. Not a big problem, just a bit annoying. This list is hardcoded in MySQL Connector/Python and I needed something to easily maintain it.
Solutions were using text editing skills, parsing it in Python to produce a list, etc.. But why not a Stored Routine? The following stored procedure is looping over a cursor, calculating the number of rows the gap has, and inserts blanks as needed.
DROP PROCEDURE IF EXISTS python_getcharsets; DELIMITER // CREATE PROCEDURE python_getcharsets() BEGIN DECLARE i,diff,cid,done INT DEFAULT 0; DECLARE chname,coname VARCHAR(32); DECLARE cur CURSOR FOR SELECT ID,CHARACTER_SET_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; OPEN cur; loop_cur: LOOP FETCH cur INTO cid,chname,coname; IF done THEN LEAVE loop_cur; END IF; SET i = i + 1; SET diff = cid - i; WHILE diff DO SELECT "None,"; SET diff = diff - 1; SET i = i + 1; END WHILE; SELECT CONCAT('("',chname,'","',coname,'"), # ',cid); END LOOP loop_cur; CLOSE cur; END// DELIMITER ; CALL python_getcharsets(); DROP PROCEDURE IF EXISTS python_getcharsets;
You would save the above to a file called getcharsets.sql for example, and execute it like this:
shell> mysql -N test < getcharsets.sql .. ("cp1251","cp1251_bulgarian_ci"), # 14 ("latin1","latin1_danish_ci"), # 15 ("hebrew","hebrew_general_ci"), # 16 None, ("tis620","tis620_thai_ci"), # 18 ("euckr","euckr_korean_ci"), # 19 ..
I’ll spare you the complete output, but as you can see from above sample: the gap has been stuffed with a None-Python value. Taking this output, you’ll inserted it your code:
desc = ( None, ("big5","big5_chinese_ci"), # 1 ("latin2","latin2_czech_cs"), # 2 ("dec8","dec8_swedish_ci"), # 3 ("cp850","cp850_general_ci"), # 4 ..
Silly? Definitely basic stuff, but I got my code a bit faster, and cleaner!