To far fetched (pun inteded), some might think.. Below you’ll find a procedure to get a list of MySQL supported character sets and their collations. The output is Python and can be used to build a (big) tuple.

The problem is that character set IDs in MySQL have ‘gaps’. For example hebrew has ID 16, and the next character set in the COLLATIONS-table, tis620, has ID 18. Not a big problem, just a bit annoying. This list is hardcoded in MySQL Connector/Python and I needed something to easily maintain it.

Solutions were using text editing skills, parsing it in Python to produce a list, etc.. But why not a Stored Routine? The following stored procedure is looping over a cursor, calculating the number of rows the gap has, and inserts blanks as needed.

DROP PROCEDURE IF EXISTS python_getcharsets;
DELIMITER //
CREATE PROCEDURE python_getcharsets()
BEGIN
  DECLARE i,diff,cid,done INT DEFAULT 0;
  DECLARE chname,coname VARCHAR(32);
  DECLARE cur CURSOR FOR SELECT ID,CHARACTER_SET_NAME,
    COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
    ORDER BY ID;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  loop_cur: LOOP
    FETCH cur INTO cid,chname,coname;
    IF done THEN LEAVE loop_cur; END IF;
    SET i = i + 1; SET diff = cid - i;
    WHILE diff DO
      SELECT "None,";
      SET diff = diff - 1; SET i = i + 1;
    END WHILE;
    SELECT CONCAT('("',chname,'","',coname,'"), # ',cid);
  END LOOP loop_cur;
  CLOSE cur;
END//
DELIMITER ;

CALL python_getcharsets();
DROP PROCEDURE IF EXISTS python_getcharsets;

You would save the above to a file called getcharsets.sql for example, and execute it like this:

shell> mysql -N test < getcharsets.sql
..
("cp1251","cp1251_bulgarian_ci"), # 14
("latin1","latin1_danish_ci"), # 15
("hebrew","hebrew_general_ci"), # 16
None,
("tis620","tis620_thai_ci"), # 18
("euckr","euckr_korean_ci"), # 19
..

I’ll spare you the complete output, but as you can see from above sample: the gap has been stuffed with a None-Python value. Taking this output, you’ll inserted it your code:

desc = (
    None,
    ("big5","big5_chinese_ci"), # 1
    ("latin2","latin2_czech_cs"), # 2
    ("dec8","dec8_swedish_ci"), # 3
    ("cp850","cp850_general_ci"), # 4
..

Silly? Definitely basic stuff, but I got my code a bit faster, and cleaner!

Comments

Roland Bouman: Wed Feb 17 2010, 19:25

Nice, but what about this:select concat( 'desc = (' , group_concat( '\n '
, if( collations.id is null, 'None' , concat(
'(' , '“', character_set_name, '“' , ','
, '“', collation_name, '“'
, ')' )
) , if(ids.id=255, '', ',') , ' #', ids.id order by ids.id separator '' )
, '\n)' )from (
select (t0.id << 0) + (t1.id << 1)
+ (t2.id << 2) + (t3.id << 3)
+ (t4.id << 4) + (t5.id << 5)
+ (t6.id << 6) + (t7.id << 7) id
from (select 0 id union all select 1) t0 , (select 0 id union all select 1) t1 , (select 0 id union all select 1) t2 , (select 0 id union all select 1) t3
, (select 0 id union all select 1) t4 , (select 0 id union all select 1) t5 , (select 0 id union all select 1) t6 , (select 0 id union all select 1) t7
) idsleft join information_schema.collations
on ids.id = collations.id
Geert JM Vanderkelen: Wed Feb 17 2010, 19:39

@Roland: crazyness! But seems to work! :)
Roland Bouman: Thu Feb 18 2010, 00:50

Geert, here's some background to the craziness :) http://rpbouman.blogspot.com/2010/02/mysql-best-stored-routine-is-one-you.html