The LOST_EVENTS incident

.. it could have been the title for the next James Bond movie, but no.. It’s much more exciting!

The LOST_EVENTS incident is reported each time you are starting a MySQL server which is binary logging and is participating as an SQL Node in MySQL Cluster setup. Slaves reading from these binary logs will stop their I/O Thread (the one getting information from the master) and report this incident as an error (see below for example).

It is very important that this incident is reported and is stopping the replication between MySQL Clusters. If not, you could end up with inconsistent data!

A world without LOST_EVENTS incidents

You have a MySQL Cluster running with two SQL Nodes. One mysqld is binary logging, getting all updates happening in the cluster. You have a Slave MySQL Cluster which has a mysqld reading from this Master and all is replicating just fine, until..

Somebody trips over the wire and the SQL Node doing binary logging goes down. BANG. Luckily this person’s head didn’t hit the rack and is still conscious enough for quickly putting back the power cord into the socket. The machine spins up and runs happily the SQL node. All is good. Well, not really.. Next day you found your mobile silent because your girlfriend had a headache and you missed an alarm. Few hours later you’re getting angry reports from customers: they are missing data on the slave?

What the..?

If your Master SQL Node doing binary logging fails, and you still have other SQL Nodes doing updates on data in your Cluster, you might end up with the following situation:

On Master:

mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2                  |
+----+----------+---------------------+
|  1 |        1 | 2009-09-17 12:50:39 |
|  2 |        2 | 2009-09-17 12:51:03 |
|  3 |        2 | 2009-09-17 12:51:06 |
|  4 |        1 | 2009-09-17 12:51:09 |
|  5 |        2 | 2009-09-17 13:01:46 | **Missing on slave!**
|  6 |        2 | 2009-09-17 13:01:47 | **Missing on slave!**
|  7 |        1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+

On Slave:

mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2                  |
+----+----------+---------------------+
|  1 |        1 | 2009-09-17 12:50:39 |
|  2 |        2 | 2009-09-17 12:51:03 |
|  3 |        2 | 2009-09-17 12:51:06 |
|  4 |        1 | 2009-09-17 12:51:09 |
|  7 |        1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+

The above is actual output of what could happen when LOST_EVENTS incident is not written in the binary logs due to a bug

The Master has 2 more rows, which were inserted on the 2nd SQL Node, while the SQL node doing binary logging was down. The replication picked up after the failing SQL Node came back, but now there is data missing: the Slave misses or lost events. What’s in a name, huh!?

A world with LOST_EVENTS incidents

If your SQL Node doing binary logging fails, and restarts, it will report in its binary logs that there might be events lost: LOST_EVENTS incident. The Slave’s I/O thread will stop and report an error. Good DBA’s will of course monitor Slaves and they will have to act upon this.

Here an example of how the Slave looks like when it receives a LOST_EVENTS incident:

mysql> SHOW SLAVE STATUS\G
  ..
  Slave_IO_Running: Yes
  Slave_SQL_Running: No
  ..
  Last_Errno: 1590
  Last_Error: The incident LOST_EVENTS occurred on the master..
  ..

Now, if you had only 1 SQL Node in the Master Cluster, you’re done for. You’ll have re-initialize your Slave Cluster because you can not guarantee the consistency of the data.

To prevent this, you have to have at least 2 SQL Nodes doing binary logging. This way, if one fails, you can switch to another replication channel. This switching over is explained in the MySQL Manual, and will be material for a next article!

Conclusion

If you got a LOST_EVENTS incident on the Slave, consider yourself (kind of) lucky! Because then you know there could be something wrong, and not find out later that your Slave is inconsistent with the Master.

Comments

torlan
There is a solution?
Geert Vanderkelen
@torlan Depends on what the problem was. The LOST_EVENTS is only there to make sure that you are notified that the Slave might miss data. Why, that’s something you need to look for on the Master.