2014/04/23

mysql Crash-safe Replication

■A common request is to have replication crash-safe in the sense that the replication progress information always is in sync with what has actually been applied to the database, even in the event of a crash. Although transactions are not lost if the server crashes, it could require some tweaking to bring the slaves up again.
=>リアルの情報を覚えなぁきゃ。。

in the latest MySQL 5.6 milestone development release, the replication team has implemented crash-safety for the slave by adding the ability of committing the replication information together with the transaction (see Figure 1)
 Also, some fixes were done on the master to ensure that it recovers correctly
=>いくらtogotherを言っても、(時刻上)現実的は不可能だようね。あくまでもトランザクションの特性を利用する ACID
しかもマスター側の協力も必要


If you're familiar with replication, you know that the replication information is stored in two files: master.info and relay-log.info. The update of these files are arranged so that they are updated after the transaction had been applied. This means that if you have a crash between the transaction commit and the update of the files, the replication progress information would be wrong. In other words, a transaction cannot be lost this way, but there is a risk that a transaction could be applied yet another time. The usual way to avoid this is to have a primary key on all your tables. In that case, a repeated update of the table would cause the slave to stop, and you would have to use SQL_SLAVE_SKIP_COUNTER to skip the transaction and get the slave up and running again. This is better than losing a transaction, but it is nevertheless a nuisance. Removing the primary key to prevent the slave from stopping will only solve the problem partially: it means that the transaction would be applied twice, which would both place a burden on the application to handle dual entries and also require that the tables to be cleaned regularly. Both of these approches require either manual intervention or scripting support to handle.
=>ミクロの話ね。
あの2つのファイルの更新は後だから、トランザクションが無くなうことはなく、
二回を適用するされしまう。まずいけど、テーブル毎にPrimaryKeyがあれば、そのことを検知できる。
SQL_SLAVE_SKIP_COUNTER to skip the transaction はいい。
でも、本当にテーブル毎にPrimaryKeyがあるの?
そのトランザクションの本体を確認しかない。じゃないと適用される部分がある、安心できないね。。


■Crash-safe masters
Two problems related to crash-safe replication has been fixed in the master, both of which could cause some annoyance when the master recovered.

    If the master crashed when a binary log was rotated, it was possible that some orphan binlog files ended up in the binary log index file. This was fixed in 5.1 but is also a piece in the pussle of having crash-safe replication.
    Writing to the binary log is not an atomic operation, and if a crash occured while writing to the binary log, there were a possibility of a partial event at the end of the binary log.

    Now, the master recovers from this by truncating the binary log to the last known good position, removing the partially written transaction and rolling back the outstanding transactions in the storage engines.
=>当たり前のことが。。前の不足を修正した。別にアピールする




■Crash-safe slaves
The MySQL replication team decided to instead implement crash-safety by moving the replication progress information into system tables.
This is a more flexible solution and has several advantages compared to storing the positions in the InnoDB transaction log。

・If the replication information and data is stored in the same storage engine, it will allow both the data and the replication position to be updated as a single transaction, which means that it is crash-safe.
・If the replication information and data is stored in different storage engines, but both support XA, they can still be committed as a single transaction.
・The replication information is flushed to disk together with the transaction data. Hence writing the replication information directly to the InnoDB redo log does not offer a speed advantage, but does not prevent the user from reading the replication progress information easily.
・The tables can be read from a normal session using SQL commands, which also means that it can be incorporated into such things as stored procedures and stored functions.
ーーーーーーーーーーーーーーー
テーブル:
    slave_master_info
Master_id        
Number_of_lines     1    
Master_log_name     2     Master_Log_File
Master_log_pos     3     Read_Master_Log_Pos
Host     3     Master_Host
User_name     4     Master_User
User_password     5    
Port     6     Master_Port
Connect_retry     7     Connect_Retry
Enabled_ssl     8     Master_SSL_Allowed
Ssl_ca     9     Master_SSL_CA_File
Ssl_capath     10     Master_SSL_CA_Path
Ssl_cert     11     Master_SSL_Cert
Ssl_cipher     12     Master_SSL_Cipher
Ssl_key     13     Master_SSL_Key
Ssl_verify_servert_cert     14     Master_SSL_Verify_Server_Cert
Heartbeat     15    
Bind     16     Master_Bind
Ignored_server_ids     17     Replicate_Ignore_Server_Ids
Uuid     18     Master_UUID
Retry_count     19     Master_Retry_Count
   
   
    slave_relay_log_info
Master_id       
Number_of_lines    1   
Relay_log_name    2    Relay_Log_File
Relay_log_pos    3    Relay_Log_Pos
Master_log_name    4    Relay_Master_Log_File
Master_log_pos    5    Exec_Master_Log_Pos
Sql_delay    6    SQL_Delay
ーーーーーーーーーーーーーーー

■エンジン
The new tables are created at installation using the mysql_install_db script, as usual, and the default engine for these tables are are the same as for all system tables: MyISAM. As you know MyISAM is not very transactional, so it is necessary to set this to use InnoDB instead if you really want crash-safety. To change the engine for these tables you can just use a normal ALTER TABLE.

slave> ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
slave> ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;
Note that this works for these tables because they were designed to allow any storage engine to be used for them, but it does not mean that you can change the storage engine for other system tables and expect it to work.
==>ここはちょっと気になった?大丈夫?MyISAMのシステムテーブルは大丈夫?


■マスター側:
START TRANSACTION;
INSERT INTO articles(user, title, body)
      VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
COMMIT;
スレーブ側:

START TRANSACTION;
INSERT INTO articles(user, title, body)
      VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
UPDATE mysql.slave_relay_log_info
   SET Master_log_pos = @@Exec_Master_Log_Pos,
       Master_log_name = @@Relay_Master_Log_File,
       Relay_log_name = @@Relay_Log_File,
       Relay_log_pos = @@Relay_Log_Pos
COMMIT;


the updates of the slave_master_info table is not updated with each processed event. Depending on the value of sync_master_info there are a few alternatives
If sync_master_info = 0
    In this case, the slave_master_info table is just updated when the slave starts or stops (for any reason, including errors), if the relay log is rotated, or if you execute a CHANGE MASTER command.
If sync_master_info > 0
    Then the slave_master_info table will be updated every sync_master_info event.
   
■問題:
==>the UPDATE statement at the end of each transaction. This means
every transaction needs to lock the same row! This makes it impossible to run
two transactions in parallel on the slave.   
my suggestion it would make sense to include the event group database as the second component for the primary key; this way it is possible to obtain current position for given database/apply thread, as well as globally, depending on the replication method.