Wednesday, 30 September 2009
I thought I fixed this few days ago, but apparently I totally forgot to run the tests.

First time I have to go around a DeprecationWarning, and I thought it was cool to share. Maybe I'm totally doing it incorrectly, but at least I used a lambda! Whee!

../mysql/connector/protocol.py:21: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
try:
from hashlib import sha1
except:
import sha
sha1 = lambda s: sha.new(s)

print sha1('geert').digest()
Update 2009-10-01: I've changed the above code yet again to make it simpler and more conform to PEP 8 which says to use ImportError exception (thanks to Marius for this tip!):
try:
from hashlib import sha1
except ImportError:
from sha import new as sha1
Today we received a bug report saying that we shouldn't throw an exception but instead return what MySQLdb is returning. Bit research and MySQLdb is actually returning None for illegal dates: good!
There is now a fix (showing up soon) on Launchpad which will return dates as None where they are inserted as '0000-00-00'.
A few lines of Python:
..
data = [
(datetime.now().date(),datetime.now()),
('0000-00-00','0000-00-00 00:00:00'),
('1000-00-00','9999-00-00 00:00:00'),
]

for d in data:
stmt_insert = "INSERT INTO %s (c1,c2) VALUES (%%s,%%s)" % (tbl)
try:
cursor.execute(stmt_insert, d)
except (mysql.connector.errors.InterfaceError, TypeError) as e:
print "Failed inserting %s\nError: %s\n" % (d,e)

if cursor.warnings:
print cursor.warnings
..

The script outputs the following data, and notice also the warnings (SQL Mode set to NO_ZERO_IN_DATE,NO_ZERO_DATE):
[(u'Warning', 1265L, u"Data truncated for column 'c1' at row 1"),
(u'Warning', 1264L, u"Out of range value for column 'c2' at row 1")]
[(u'Warning', 1265L, u"Data truncated for column 'c1' at row 1"),
(u'Warning', 1264L, u"Out of range value for column 'c2' at row 1")]
(datetime.date(2009, 9, 30), datetime.datetime(2009, 9, 30, 15, 12, 23))
(None, None)
(None, None)
Another change we did today was returning a row as tuple, and rows as list of tuples.

Tip: use STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE SQL modes in MySQL 5.0 and later for new projects to make sure no illegal dates are inserted, or fix your application.
Monday, 28 September 2009
I didn't test MySQL Connector/Python on a Microsoft Windows machine, but after somebody did and filed a bug report, I couldn't resist launching VirtualBox.

Fixing the problem with socket.MSG_WAITALL not being available on Windows wasn't much work. However, adding support for MySQL's Named Pipe on Windows seems to be a bigger challenge.
I couldn't find docs for win32api anymore on Python's doc website for win32api, which is referenced everywhere, so I guess somebody needs to give a hint or write a patch.
For now, Windows users of Connector/Python should use TCP to connect to MySQL.

Thursday, 24 September 2009

Currently, MySQL Connector/Python is only available through Launchpad. Here's a small how-to for installing it using the Bazaar bzr client tool. All you need is a machine with Python installed (v2.3 or higher, but not v3.x), and.. well, that's it!

shell> bzr checkout lp:~mysql/myconnpy/main myconnpy
shell> cd myconnpy
shell> python setup.py install

Please check it out. It's not feature complete yet, and probably can use some code optimizations here and there. I'm looking forward to bug reports! Also, only works with MySQL 4.1 and above.

Here is a little script that shows how it works, save it in file test_myconn.py:


import mysql.connector

if __name__ == "__main__":
db = mysql.connector.Connect(host="localhost",
user="root",password="",database="test")
cursor = db.cursor()
cursor.execute("SHOW ENGINES")

for row in cursor.fetchall():
print row

cursor.close()
db.close()

Execute it like this and you should see the available storage engines:

shell> python test_myconn.py
[u'InnoDB', u'YES', u'Supports transactions, row-level locking, and foreign keys', u'YES', u'YES', u'YES']
[u'MRG_MYISAM', u'YES', u'Collection of identical MyISAM tables', u'NO', u'NO', u'NO']
[u'BLACKHOLE', u'YES', u'/dev/null storage engine (anything you write to it disappears)', u'NO', u'NO', u'NO']
[u'CSV', u'YES', u'CSV storage engine', u'NO', u'NO', u'NO']
[u'MEMORY', u'YES', u'Hash based, stored in memory, useful for temporary tables', u'NO', u'NO', u'NO']
[u'FEDERATED', u'NO', u'Federated MySQL storage engine', None, None, None]
[u'ARCHIVE', u'YES', u'Archive storage engine', u'NO', u'NO', u'NO']
[u'MyISAM', u'DEFAULT', u'Default engine as of MySQL 3.23 with great performance', u'NO', u'NO', u'NO']

Yesterday I gave Google's Picasa again a try. Importing my iPhoto library I ran into the now infamous error message: "corrupted AlbumData.xml". The Apple support website has some tips on how to fix this, but it's too vague. Luckily there are smart people out there and I stumbled upon a good way in the Picasa help forums to debug your AlbumData.xml using TextWrangler.

  1. Install TextWrangler if you haven't already.
  2. Open your iPhoto Library using Finder: right-click (Ctrl-Click) and select 'Show Package Contents'. Copy the AlbumData.xml file to your desktop.
  3. Open the AlbumData.xml from on your Desktop using TextWrangler.
  4. From the Text-menu, choose 'Zap Gremlins'.
  5. Deselect 'Non-ASCII Characters' and select 'Replace with: •'.
  6. Wait for TextWrangler to finish the zapping! Can take a while.
  7. When done, search (Cmd-f) for occurrence of the • (Option-8)
  8. Open iPhoto and fix wherever you find a dot in the copy of AlbumData.xml

If you are lucky, you'll find quickly the character that is bugging Picasa. I had only one of these Gremlins in the XML file.

So, I gave Picasa a try, but it failed importing captions of the pictures, only tags. I haven't found a fix/workaround for that, so it's a no-go for me still.

Wednesday, 23 September 2009

This has been already mentioned in on a few blogs, but I thought it would be good to post here too. Note: this is not using MacPorts!

To get MySQL and Python going on MacOS X 10.6 you need the following:


Install MySQL using the tar ball and make sure you get it up and running.

Compile MySQL-python (leave out setting the $PATH when it's already done):

shell> PATH="/usr/local/mysql/bin:$PATH"
shell> tar xzf MySQL-python-1.2.3c1.tar.gz
shell> cd MySQL-python-1.2.3c1
shell> ARCHFLAGS="-arch x86_64" /usr/bin/python setup.py build
shell> /usr/bin/python setup.py install

I'm giving the full path for python to make sure it does not use the MacPorts one.
EDIT 2009-09-24: If it can't find mysql_config, you did not set your path correctly, but you can update the setup_posix.py fine of MySQL-Python and change it to something like this (as seen on our forums):


mysql_config.path = "/usr/local/mysql-5.1.39-osx10.5-x86_64/bin/mysql_config"

Here a test script test_mysql.py:

import MySQLdb

if __name__ == "__main__":
db = MySQLdb.connect(host="localhost",
user="root",db="test")
cursor = db.cursor()
cursor.execute("SHOW ENGINES")

for row in cursor.fetchall():
print row

cursor.close()
db.close()

Run the above script like:


shell> /usr/bin/python test_mysql.py

It should output the available storage engines.

Monday, 21 September 2009
EDIT: I'm a bit late with upgrading to Snow Leopard (just a few days!) but MacPorts has a package available which should work with Mac OS X 10.6.

If you want to use MacPorts on Mac OS X 10.6 (Snow Leopard) and can't wait for the next release, you best get the latest from the SVN repository (posted on MacRumors):
  1. Get the latest XCode first: the one you installed on Leopard might not work. ./configure will fail not finding a decent compiler.
  2. Open up the Terminal
  3. shell> svn co http://svn.macports.org/repository/macports/trunk/base/ macports-base
  4. shell> cd macports-base
  5. shell> ./configure
  6. shell> make
  7. shell> sudo make install
  8. shell> sudo /opt/local/bin/port -v selfupdate
Note: 'shell>' denotes the prompt in the Terminal application.
Friday, 18 September 2009
Using MySQL Cluster 7.0, if you start the management node it will write a binary, cached version of the configuration. The default location is (currently) a bit tricky, if you compile yourself, you'll find it in: <BASEDIR>/mysql-cluster.
Make sure to specify everytime the --configdir option when starting the ndb_mgmd.
shell> ndb_mgmd -f config.ini --configdir=/path/to/cachedir
I'm using the same value as Datadir for the --configdir option.
An elegant solution it is not, IMHO. Maybe it's good to make your own ndb_mgmd_safe?
Thursday, 17 September 2009
This is a follow-up on my previous blog entry explaining when the LOST_EVENTS incident occurs.

The LOST_EVENTS incident is reported in the MySQL binary log in two cases. Each time the Slave will try to tell you why. Here is an example output of the SHOW SLAVE STATUS done on the Slave MySQL Cluster (bit edited to fit):
 mysql> SHOW SLAVE STATUS\G
Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occured on the master.
Message: mysqld startup
The message can be one of the following (MySQL Cluster 7.0.7):
  • mysqld startup: SQL Node (mysqld) acting as Master was started.
  • cluster disconnect: SQL Node acting as Master lost connection to its data nodes.
You can easily check for LOST_EVENTS in the binary logs on the Master using the following shell script:
cd /where/binlog/are/
for f in `ls binglog.*`;
do
echo $f
mysqlbinlog $f | grep LOST_EVENTS
done

.. it could have been the title for the next James Bond movie, but no.. It's much more exciting!

The LOST_EVENTS incident is reported each time you are starting a MySQL server which is binary logging and is participating as an SQL Node in MySQL Cluster setup. Slaves reading from these binary logs will stop their I/O Thread (the one getting information from the master) and report this incident as an error (see below for example).

It is very important that this incident is reported and is stopping the replication between MySQL Clusters. If not, you could end up with inconsistent data!

A world without LOST_EVENTS incidents

You have a MySQL Cluster running with two SQL Nodes. One mysqld is binary logging, getting all updates happening in the cluster. You have a Slave MySQL Cluster which has a mysqld reading from this Master and all is replicating just fine, until..

Somebody trips over the wire and the SQL Node doing binary logging goes down. *BANG*. Luckily this person's head didn't hit the rack and is still conscious enough for quickly putting back the power cord into the socket. The machine spins up and runs happily the SQL node. All is good. Well, not really.. Next day you found your mobile silent because your girlfriend had a headache and you missed an alarm. Few hours later you're getting angry reports from customers: they are missing data on the slave?

What the..?

If your Master SQL Node doing binary logging fails, and you still have other SQL Nodes doing updates on data in your Cluster, you might end up with the following situation:

On Master:


mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2 |
+----+----------+---------------------+
| 1 | 1 | 2009-09-17 12:50:39 |
| 2 | 2 | 2009-09-17 12:51:03 |
| 3 | 2 | 2009-09-17 12:51:06 |
| 4 | 1 | 2009-09-17 12:51:09 |
| 5 | 2 | 2009-09-17 13:01:46 | Missing on slave!
| 6 | 2 | 2009-09-17 13:01:47 | Missing on slave!
| 7 | 1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+

On Slave:


mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2 |
+----+----------+---------------------+
| 1 | 1 | 2009-09-17 12:50:39 |
| 2 | 2 | 2009-09-17 12:51:03 |
| 3 | 2 | 2009-09-17 12:51:06 |
| 4 | 1 | 2009-09-17 12:51:09 |
| 7 | 1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+

The above is actual output of what could happen when LOST_EVENTS incident is not written in the binary logs due to a bug.

The Master has 2 more rows, which were inserted on the 2nd SQL Node, while the SQL node doing binary logging was down. The replication picked up after the failing SQL Node came back, but now there is data missing: the Slave misses or lost events. What's in a name, huh!?

A world with LOST_EVENTS incidents

If your SQL Node doing binary logging fails, and restarts, it will report in its binary logs that there might be events lost: LOST_EVENTS incident. The Slave's I/O thread will stop and report an error. Good DBA's will of course monitor Slaves and they will have to act upon this.

Here an example of how the Slave looks like when it receives a LOST_EVENTS incident:


mysql> SHOW SLAVE STATUS\G
..
Slave_IO_Running: Yes
Slave_SQL_Running: No
..
Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occurred on the master..
..

Now, if you had only 1 SQL Node in the Master Cluster, you're done for. You'll have re-initialize your Slave Cluster because you can not guarantee the consistency of the data.

To prevent this, you have to have at least 2 SQL Nodes doing binary logging. This way, if one fails, you can switch to another replication channel. This switching over is explained in the MySQL Manual, and will be material for a next article!

Conclusion

If you got a LOST_EVENTS incident on the Slave, consider yourself (kind of) lucky! Because then you know there could be something wrong, and not find out later that your Slave is inconsistent with the Master.

Wednesday, 16 September 2009
What does the following mean?
WHEN: 1:00 pm Pacific Time

To most folks in the US it would mean: after lunch in the west.

To most in rest of the world: "Bah! They forgot there is a world outside their own, again!".

If you specify a time in your own time zone, also specify the corresponding UTC (aka GMT) time so people can easily calculate at what time the meeting is in theirs. Also adding the actual date is good right next to the time. Don't rely on the date of the email itself, or saying something like "Later today..".

The burden is on the writer, not the reader!

BTW, that example comes straight from an "internal only email". GRRrr.
Friday, 4 September 2009
EDIT: This is for those that haven't (yet) upgraded to Mac OS X 10.6 (Snow Leopard) where Java SE 6 is now default and comes with sources and documentation by default.

Mac OS X 10.5 comes with Java 5 and 6 (v1.5 and v1.6 respectively). By default, however, only the documentation and source for Java 5 is provided. If you want to develop for Java 6 you'll need to download the Java for Mac OS X 10.5 Update 5 Developer Documentation (Update 5 was at time of writing most current).
Where you get it? Log into Apple Developer Connection website (using your Apple ID or free registration) and go to Downloads > Java. That section should have always the most current update. Don't make the mistake going Documentation as it will have older updates.
Tuesday, 1 September 2009
In my previous post we discussed how to start MySQL Cluster 7.0 with two management nodes. A nice new feature is that the 2nd ndb_mgmd doesn't need the configuration file: it is fetching it from the other management node.
This article will describe how we start this cluster after a shut down, changing configuration and do a rolling restart.

The configuration
The config.ini stored on the first management node:

[NDBD DEFAULT]
Datadir=/data2/users/geert/cluster/master
NoOfReplicas=2
DataMemory=80M
IndexMemory=10M

[NDB_MGMD DEFAULT]
Datadir=/data2/users/geert/cluster/master

[NDB_MGMD]
Id=1
Hostname = machine-1

[NDB_MGMD]
Id=2
Hostname = machine-2

[NDBD]
Id=3
Hostname = machine-3

[NDBD]
Id=4
Hostname = machine-4

[API]
[API]
[API]
[API]


Starting MySQL Cluster 7.0 with 2 management nodes

We assume that the cluster from previous post was shut down and we need to restart it. Here are the instructions to do so.
We start both management node process the same way, without the options --initial or --reload!

# machine-1
shell> ndb_mgmd --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_1_config.bin.1'

# machine-2
shell> ndb_mgmd --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_2_config.bin.1'

Data nodes are started on machine-3 and machine-4:

# machine-3
shell> ndbd -c machine-1
..Configuration fetched from 'machine-1', generation: 1

# machine-4
shell> ndbd -c machine-2
..Configuration fetched from 'machine-2', generation: 1

That should bring your cluster back up, ready for some experiments!

Rolling restart after configuration change

Lets assume we want more memory to store data and index information. We change the following in the configuration file config.ini, which you find on the first management node, machine-1:

[NDBD DEFAULT]
DataMemory=160M
IndexMemory=20M

Save your new config.ini and kill the ndb_mgmd process on machine-1, followed by starting it again with the --reload option:

# machine-1
shell> killall ndb_mgmd
shell> ndb_mgmd -f config.ini --reload --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_1_config.bin.1'

The above output might be a bit confusing: we started with a changed config.ini but it said it loaded config from the previous cached version. This is normal. It needs to first read the old to know the changes from the new. The real magic is shown in the cluster log ndb_1_cluster.log on machine-1 (simplified for this blog post):

.. Detected change of config.ini on disk, will try to set it
when all ndb_mgmd(s) started. This is the actual diff:
[ndbd(DB)]
NodeId=3
-IndexMemory=10485760
+IndexMemory=20971520

[ndbd(DB)]
NodeId=4
-IndexMemory=10485760
+IndexMemory=20971520
..
Node 2 connected
Starting configuration change, generation: 1
Configuration 2 commited
Config change completed! New generation: 2

Notice that currently in MySQL Cluster 7.0.7 there is bug that when changing 2 parameters, only 1 will show up when the difference is shown in the logs.
On the second management node you'll find in ndb_2_cluster.log something like this:

..Node 2: Node 1 Connected
..
..Configuration 2 commited

Both management nodes have agreed on the same configuration, and both have it binary cached in files named like ndb_2_config.bin.*.
We continue now with restarting the data nodes, while connected to either management node do the following:

# machine-1 or machine-2
shell> ndb_mgm
ndb_mgm> ALL REPORT MEMORY USAGE
Node 3: Data usage is 0%(4 32K pages of total 2560)
Node 3: Index usage is 0%(8 8K pages of total 1312)
Node 4: Data usage is 0%(4 32K pages of total 2560)
Node 4: Index usage is 0%(8 8K pages of total 1312)
ndb_mgm> 3 RESTART
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
Node 3: Started (version 7.0.7)
ndb_mgm> 4 RESTART
Node 4: Node shutdown initiated
Node 4: Node shutdown completed, restarting, no start.
Node 4 is being restarted
Node 4: Data usage decreased to 0%(0 32K pages of total 5120)
Node 4: Started (version 7.0.7)
ndb_mgm> ALL REPORT MEMORY USAGE
Node 3: Data usage is 0%(6 32K pages of total 5120)
Node 3: Index usage is 0%(8 8K pages of total 2592)
Node 4: Data usage is 0%(6 32K pages of total 5120)
Node 4: Index usage is 0%(8 8K pages of total 2592)

The ALL REPORT MEMORY USAGE output shows that the configuration took effect and a rolling restart was succesful.