To far fetched (pun inteded), some might think.. Below you’ll find a procedure to get a list of MySQL supported character sets and their collations. The output is Python and can be used to build a (big) tuple
.
The problem is that character set IDs in MySQL have ‘gaps’. For example hebrew
has ID 16, and the next character set in the COLLATIONS
-table, tis620
, has ID 18. Not a big problem, just a bit annoying. This list is hardcoded in MySQL Connector/Python and I needed something to easily maintain it.
Solutions were using text editing skills, parsing it in Python to produce a list, etc.. But why not a Stored Routine? The following stored procedure is looping over a cursor, calculating the number of rows the gap has, and inserts blanks as needed.
DROP PROCEDURE IF EXISTS python_getcharsets;
DELIMITER //
CREATE PROCEDURE python_getcharsets()
BEGIN
DECLARE i,diff,cid,done INT DEFAULT 0;
DECLARE chname,coname VARCHAR(32);
DECLARE cur CURSOR FOR SELECT ID,CHARACTER_SET_NAME,
COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
ORDER BY ID;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
loop_cur: LOOP
FETCH cur INTO cid,chname,coname;
IF done THEN LEAVE loop_cur; END IF;
SET i = i + 1; SET diff = cid - i;
WHILE diff DO
SELECT "None,";
SET diff = diff - 1; SET i = i + 1;
END WHILE;
SELECT CONCAT('("',chname,'","',coname,'"), # ',cid);
END LOOP loop_cur;
CLOSE cur;
END//
DELIMITER ;
CALL python_getcharsets();
DROP PROCEDURE IF EXISTS python_getcharsets;
You would save the above to a file called getcharsets.sql
for example, and execute it like this:
shell> mysql -N test < getcharsets.sql
..
("cp1251","cp1251_bulgarian_ci"), # 14
("latin1","latin1_danish_ci"), # 15
("hebrew","hebrew_general_ci"), # 16
None,
("tis620","tis620_thai_ci"), # 18
("euckr","euckr_korean_ci"), # 19
..
I’ll spare you the complete output, but as you can see from above sample: the gap has been stuffed with a None
-Python value. Taking this output, you’ll inserted it your code:
desc = (
None,
("big5","big5_chinese_ci"), # 1
("latin2","latin2_czech_cs"), # 2
("dec8","dec8_swedish_ci"), # 3
("cp850","cp850_general_ci"), # 4
..
Silly? Definitely basic stuff, but I got my code a bit faster, and cleaner!
Comments
, if( collations.id is null, 'None' , concat(
'(' , '“', character_set_name, '“' , ','
, '“', collation_name, '“'
, ')' )
) , if(ids.id=255, '', ',') , ' #', ids.id order by ids.id separator '' )
, '\n)' )from (
select (t0.id << 0) + (t1.id << 1)
+ (t2.id << 2) + (t3.id << 3)
+ (t4.id << 4) + (t5.id << 5)
+ (t6.id << 6) + (t7.id << 7) id
from (select 0 id union all select 1) t0 , (select 0 id union all select 1) t1 , (select 0 id union all select 1) t2 , (select 0 id union all select 1) t3
, (select 0 id union all select 1) t4 , (select 0 id union all select 1) t5 , (select 0 id union all select 1) t6 , (select 0 id union all select 1) t7
) idsleft join information_schema.collations
on ids.id = collations.id