Analysis of two main schemes for SparkSQL nodes accessing dual-master metabases

Open-source Spark SQL does not provide high availability out of the box, yet high availability matters to users in real deployments. ZTE's big data platform DAP implements Spark SQL high availability in the corresponding ZDH.

Spark SQL high availability works as follows: when the two Spark SQL services come online, they register themselves with ZooKeeper. The JDBC URL used by the client specifies the ZooKeeper host list. When connecting, the client obtains the Spark SQL node information from the ZooKeeper cluster and then connects to that Spark SQL service node.
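As a rough illustration of the URL format, the sketch below assembles a ZooKeeper-discovery JDBC URL from a host list. The function name build_discovery_url is hypothetical. The port 2181 and the sparkThriftServer namespace are taken from the URL used in the test steps of this article and should be adjusted to the actual deployment.

```python
def build_discovery_url(zk_hosts, namespace="sparkThriftServer", zk_port=2181):
    """Build a hive2 JDBC URL that resolves the server through ZooKeeper.

    zk_hosts: list of ZooKeeper host names or IPv4 addresses, with or without a port.
    """
    quorum = []
    for host in zk_hosts:
        # Append the default port only when the caller did not give one.
        quorum.append(host if ":" in host else f"{host}:{zk_port}")
    if not quorum:
        raise ValueError("at least one ZooKeeper host is required")
    return (
        "jdbc:hive2://" + ",".join(quorum)
        + "/;serviceDiscoveryMode=zooKeeper"
        + f";zooKeeperNamespace={namespace}"
    )


# Single ZooKeeper host, as in the test steps of this article.
print(build_discovery_url(["10.43.183.121"]))
```

Additional ZooKeeper hosts are joined with commas before the first slash, so the client can still look up a Spark SQL node if one ZooKeeper host is unreachable.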

The dual-master setup for Spark SQL metadata is implemented mainly in MySQL. MySQL supports one-way, asynchronous replication, in which one server acts as the primary and one or more other servers act as slaves. The primary writes updates to its binary log files and maintains an index of those files to keep track of log rotation. When a slave connects, it receives any updates that have occurred since its last successful update, then blocks and waits for the primary to notify it of the next update.

In the actual project, two MySQL databases are installed on hosts in different locations. The two servers act as active and standby for each other: when one machine fails, the other can take over its applications. This requires the data in the two databases to be consistent in real time. Here, MySQL's replication feature is used to keep the two machines synchronized.

Implementation plan

Currently, two main options are considered for how SparkSQL nodes access the dual-master metabase:

The SparkSQL node connects directly to the MySQL node:

In this scheme, each SparkSQL node is connected to a single MySQL node. Changes made to the metabase by different SparkSQL nodes are synchronized between the MySQL nodes.

The SparkSQL node connects to the metabase through the MetaStore node:

In this scheme, each SparkSQL node is connected to multiple MetaStore nodes, and each MetaStore node is connected to its corresponding MySQL node. Changes made to the metabase by different SparkSQL nodes are synchronized between the MySQL nodes.

In both schemes, clients obtain the SparkSQL service in the same way, mainly through the following methods:

Beeline connection

Program access through the JDBC port

With Beeline, the client first obtains SparkSQL node information from the ZooKeeper cluster and then connects to the SparkSQL service node. If the selected SparkSQL node is unavailable, the client can reach the SparkSQL service by retrying the connection several times.

When a program connects to a SparkSQL node through the JDBC port and the connection fails with an exception, the program can reacquire the SparkSQL service by catching the exception in code and reconnecting.
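The catch-and-reconnect idea can be sketched as a small retry helper. This is only an illustration under stated assumptions: connect_with_retry and flaky_connect are hypothetical names, and the connect argument stands in for whatever driver call the program actually uses to open a connection through the ZooKeeper URL.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=1.0):
    """Call connect() until it succeeds or the attempts are used up.

    connect: zero-argument callable that returns a connection or raises.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as error:  # a real client would catch its driver's error type
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise ConnectionError(
        f"no SparkSQL node reachable after {attempts} attempts"
    ) from last_error


calls = {"count": 0}


def flaky_connect():
    # Stand-in for a driver call: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("node unavailable")
    return "connection"


print(connect_with_retry(flaky_connect, attempts=5, delay_seconds=0))
```

Because each new attempt goes through ZooKeeper again, a retry can land on a different SparkSQL node than the one that failed.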

The tests below verify the functional feasibility of the two schemes and their behavior under failure.

Test environment

MySQL: two hosts, 10.43.183.121 and 10.43.183.122

SparkSQL: two hosts, 10.43.183.121 and 10.43.183.122

Hive MetaStoreServer: two hosts, 10.43.183.121 and 10.43.183.122

Test scenarios

Scenario 1: High availability verification with SparkSQL nodes connecting directly to MySQL

Each SparkSQL node is connected directly to one MySQL node. Verify that metadata is synchronized successfully and that failover happens automatically when a MySQL node fails.
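One way to picture the configuration in this scenario is a per-node mapping from each SparkSQL host to the MySQL instance it uses for metadata. The sketch below renders a value for the Hive metastore property javax.jdo.option.ConnectionURL. The MySQL port 3306 is an assumption (MySQL's default), hiveomm is the database name that appears in the test steps, and the mapping and helper names are hypothetical.

```python
# Hypothetical mapping for Scenario 1: each SparkSQL host uses the MySQL
# instance on the same host for metadata.
SPARKSQL_TO_MYSQL = {
    "10.43.183.121": "10.43.183.121",
    "10.43.183.122": "10.43.183.122",
}


def metastore_jdbc_url(sparksql_host, database="hiveomm", mysql_port=3306):
    """Return the MySQL JDBC URL a given SparkSQL host would be configured with."""
    mysql_host = SPARKSQL_TO_MYSQL[sparksql_host]
    return f"jdbc:mysql://{mysql_host}:{mysql_port}/{database}"


for host in SPARKSQL_TO_MYSQL:
    # The value would go into javax.jdo.option.ConnectionURL on that host.
    print(host, "->", metastore_jdbc_url(host))
```

With a single URL per node, a SparkSQL node has no alternative metabase to fall back to, which is the situation the test steps exercise.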

The test steps are as follows:

1. Modify the configuration

The SparkSQL configuration is modified as follows:

On 10.43.183.121, the JDBC connection is configured to point to MySQL on 10.43.183.121.

On 10.43.183.122, the JDBC connection is configured to point to MySQL on 10.43.183.122.

2. Beeline connects to SparkSQL at 10.43.183.121.

3. Create a table named test, then query the tbls table in the hiveomm database on both MySQL nodes. The test record appears in both, which indicates that metadata synchronization succeeded.

4. Stop the MySQL that SparkSQL is currently connected to.

5. In the Beeline interface, run the "show tables" command. The query fails with an exception.

6. Close the Beeline connection and try to reconnect to SparkSQL on node 10.43.183.121 several times. Each attempt fails.

7. Connect to the SparkSQL service through the ZooKeeper URL with the Beeline command !connect jdbc:hive2://10.43.183.121:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkThriftServer and retry several times until the connection to the SparkSQL service succeeds. The test table is then visible with the "show tables" command.

8. Restart the MySQL node. Beeline can now reconnect to SparkSQL on node 10.43.183.121, and the "show tables" command returns the test table.

Test conclusion:

Metadata is synchronized between the MySQL nodes.

Stopping a MySQL node causes Beeline queries to fail.

Beeline cannot reconnect to the SparkSQL node whose MySQL node is stopped.

When Beeline connects to the SparkSQL service through the ZooKeeper URL, it reaches an available SparkSQL node after a number of attempts.

Scenario 2: High availability verification with SparkSQL nodes connecting to MySQL through HiveMetaStoreServer nodes

The MetaStoreServer nodes mainly provide fault tolerance when a MySQL node fails. Each MetaStoreServer node corresponds to one MySQL node, and each SparkSQL node is configured with multiple MetaStoreServer nodes. Verify that metadata is synchronized successfully and that failover happens automatically when a MySQL node fails.
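A minimal sketch of how several MetaStoreServer nodes might be listed for one SparkSQL node: Hive clients read a comma-separated list of metastore Thrift URIs from the hive.metastore.uris property. The port 9083 is an assumption (the usual metastore default) and the helper name metastore_uris is hypothetical; the source does not show the actual values used.

```python
def metastore_uris(hosts, port=9083):
    """Join metastore hosts into the comma-separated form used by hive.metastore.uris."""
    if not hosts:
        raise ValueError("at least one MetaStoreServer host is required")
    return ",".join(f"thrift://{host}:{port}" for host in hosts)


# Both MetaStoreServer hosts from the test environment.
print(metastore_uris(["10.43.183.121", "10.43.183.122"]))
```

Listing both URIs gives the SparkSQL node a second MetaStoreServer to try when the first one, or the MySQL node behind it, is unavailable.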

The test steps are as follows:

1. Modify the configuration
