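Earlier we published a setup tutorial for btlike:
BTLike Golang crawler, LNMP panel, PHP front end: complete illustrated tutorial
When migrating the environment, however, the Elasticsearch data needs to be backed up, and since we do not have that Elasticsearch data on hand, the only option is to sync it from the database.
After searching through a lot of material online, the only workable route was a middleware plugin, and we settled on go-mysql-elasticsearch (written by a Chinese developer). The illustrated tutorial follows.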
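go-mysql-elasticsearch project page: https://github.com/siddontang/go-mysql-elasticsearch
This middleware is written in Golang, so a Golang environment must be prepared first.
We will not repeat the detailed installation and initialization steps here; if you are unsure, see the earlier btlike setup tutorial.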
Recovery method: https://www.jiloc.com/42711.html
Because of the special structure of Btlike's Elasticsearch index, go-mysql-elasticsearch cannot restore the data!
What follows is only a usage example of go-mysql-elasticsearch!
Installing the go-mysql-elasticsearch plugin
yum install go
go get github.com/tools/godep
go get github.com/siddontang/go-mysql-elasticsearch
cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
make
Configuring MariaDB / MySQL
Notes from the official documentation:
- binlog format must be row.
- binlog row image must be full for MySQL, you may lost some field data if you update PK data in MySQL with minimal or noblob binlog row image. MariaDB only supports full row image.
- Can not alter table format at runtime.
- MySQL table which will be synced must have a PK(primary key), multi columns PK is allowed now, e,g, if the PKs is (a, b), we will use “a:b” as the key. The PK data will be used as “id” in Elasticsearch.
- You should create the associated mappings in Elasticsearch first, I don’t think using the default mapping is a wise decision, you must know how to search accurately.
- mysqldump must exist in the same node with go-mysql-elasticsearch, if not, go-mysql-elasticsearch will try to sync binlog only.
- Don’t change too many rows at same time in one SQL.
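The composite-key note above can be illustrated with a tiny helper. This is only a sketch of the "a:b" joining rule quoted from the official notes; `pk_doc_id` is a hypothetical name, and how the tool treats values that themselves contain ":" is not covered here.

```shell
# Hypothetical helper: joins primary-key column values with ":" to show
# what the resulting Elasticsearch "id" would look like for a multi-column PK.
pk_doc_id() {
  local IFS=:
  printf '%s\n' "$*"
}

# Example (not executed here): pk_doc_id 12 34
```

A single-column key passes through unchanged, so `pk_doc_id 12` simply prints the one value.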
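Edit the database configuration file (default location: /etc/my.cnf) and make sure it contains the following settings:
- enable the binary log (log-bin)
- binlog_format must be row
- set server_id to 1001
- binlog-row-image must be FULL
The configuration fragment is: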
[mysqld]
log-bin=mysql-bin
binlog_format=row
server_id=1001
binlog-row-image=full
Remember to restart the MySQL service after changing the configuration:
/etc/init.d/mysql restart
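After the restart it can be worth confirming that the server actually picked up the settings. The sketch below assumes the values are fetched with the mysql client as tab-separated rows; `check_binlog_vars` is a hypothetical helper name, and on servers that do not expose `binlog_row_image` its MISSING line may be expected rather than an error.

```shell
# Hypothetical helper: reads "name<TAB>value" rows on stdin and reports any
# setting that differs from the requirements listed above. Prints OK and
# returns 0 when everything matches, otherwise returns 1.
check_binlog_vars() {
  awk -F'\t' '
    BEGIN {
      bad = 0
      want["log_bin"] = "ON"
      want["binlog_format"] = "ROW"
      want["binlog_row_image"] = "FULL"
    }
    ($1 in want) {
      seen[$1] = 1
      if (toupper($2) != want[$1]) {
        printf "MISMATCH %s=%s (expected %s)\n", $1, $2, want[$1]
        bad = 1
      }
    }
    END {
      for (name in want)
        if (!(name in seen)) { printf "MISSING %s\n", name; bad = 1 }
      if (!bad) print "OK"
      exit bad
    }'
}

# Example (not executed here):
# mysql -uroot -p -N -B -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin','binlog_format','binlog_row_image')" | check_binlog_vars
```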
Configuring the go-mysql-elasticsearch plugin:
vi etc/river.toml
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "database password"

# Elasticsearch address
es_addr = "Elasticsearch IP address:9200"

# Path to store data, like master.info, and dump MySQL data
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
# set to match server_id in my.cnf above (note: MySQL replication normally expects each replica ID to be unique)
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"

# MySQL data source
[[source]]
schema = "torrent"  # database name

# Only below tables will be synced into Elasticsearch.
# "test_river_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
# These are the tables that need to be indexed
tables = ["torrent[0-9]{1}","torrenta","torrentb","torrentc","torrentd","torrente","torrentf"]

# Below is for special rule mapping
[[rule]]
schema = "torrent"          # database name
table = "torrent[0-9]{1}"   # table name
index = "torrent"           # index name; keep it the same as the one the program created earlier

[[rule]]
schema = "torrent"
table = "torrenta"
index = "torrent"

[[rule]]
schema = "torrent"
table = "torrentb"
index = "torrent"

[[rule]]
schema = "torrent"
table = "torrentc"
index = "torrent"

[[rule]]
schema = "torrent"
table = "torrentd"
index = "torrent"

[[rule]]
schema = "torrent"
table = "torrente"
index = "torrent"

[[rule]]
schema = "torrent"
table = "torrentf"
index = "torrent"
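To see which table names a wildcard such as "torrent[0-9]{1}" would pick up, the pattern can be tried against a list of names before starting a sync. The sketch below assumes unanchored regular-expression matching; whether go-mysql-elasticsearch anchors the pattern is not stated in this tutorial, so both forms are shown. `match_tables` is a hypothetical helper.

```shell
# Hypothetical helper: prints the table names from stdin that match the
# given extended regular expression (unanchored unless the pattern itself
# contains ^ and $).
match_tables() {
  grep -E -- "$1"
}

# Examples (not executed here):
# printf '%s\n' torrent0 torrent9 torrenta torrent10 | match_tables 'torrent[0-9]{1}'
# printf '%s\n' torrent0 torrent9 torrenta torrent10 | match_tables '^torrent[0-9]{1}$'
```

With unanchored matching a name like torrent10 also matches, because it contains "torrent1"; the anchored form only accepts a single trailing digit. A real table list can be piped in from `mysql -N -B -e "SHOW TABLES" torrent`.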
Run the sync command:
cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
./bin/go-mysql-elasticsearch -config=./etc/river.toml
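Depending on the size of the database, this step can take a long time. Other crawler programs can be shut down while the sync runs.
The command can be run inside screen.
New to Screen? See: Linux Screen simple usage illustrated tutorial
From another terminal, run the following command to check progress: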
curl 127.0.0.1:12800/stat
server_current_binlog:(mysql-bin.000020, 343)
read_binlog:(mysql-bin.000018, 0)
insert_num:99397
update_num:0
delete_num:0
If the numbers above do not change, check the configuration options and files.
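Checking whether the counters move can be scripted by comparing two snapshots of the stat output. The sketch below assumes the "name:value" line format shown above; `stat_value` and `stat_advanced` are hypothetical helper names, and the 10-second interval in the example is arbitrary.

```shell
# Hypothetical helper: prints the value of one counter (e.g. insert_num)
# from stat output supplied on stdin.
stat_value() {
  awk -F: -v key="$1" '$1 == key { sub(/\r$/, "", $2); print $2 }'
}

# Hypothetical helper: succeeds if a numeric counter grew between snapshots.
# usage: stat_advanced FIELD BEFORE_TEXT AFTER_TEXT
stat_advanced() {
  local before after
  before=$(printf '%s\n' "$2" | stat_value "$1")
  after=$(printf '%s\n' "$3" | stat_value "$1")
  [ "${after:-0}" -gt "${before:-0}" ]
}

# Example (not executed here):
# a=$(curl -s 127.0.0.1:12800/stat); sleep 10; b=$(curl -s 127.0.0.1:12800/stat)
# stat_advanced insert_num "$a" "$b" && echo "still syncing" || echo "no new inserts"
```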