正文

深入剖析hive的锁和事务机制

明哥的IT随笔  明哥的IT随笔  2022-12-05  576

关键词：

深入剖析 HIVE 的锁和事务机制

1 从调度系统对 HIVE 作业的调度策略聊起

大家知道，依托 HIVE 构建数据仓库时，我们一般会通过调度系统精心编排 HIVE SQL 作业，以避免多个作业并发写同一张表或同一个表分区；同时 AD HOC 的数据查询操作，在时间上一般也会尽量避开常规的 ETL 数据写操作，以避免对同一张表或同一个表分区的并发读写操作。

调度系统之所以对 HIVE SQL 作业采取上述调度策略，究其原因，是因为多作业并发读写同一个表或同一个表分区时，底层会因为 HIVE 的锁机制导致大量锁竞争和等待，此时作业运行效率极低，所以业务应用只能通过调度系统规避掉对同一张表或表分区的并发读写。

2. 一个因为调度系统配置不当导致的 HIVE 作业运行效率低下的线上例子

笔者最近排查的某证券客户的线上应用运行效率低下的原因，就是上述 HIVE SQL 作业调度策略不当，在此跟大家分享下。

问题现象：在进行 HIVE 作业的性能分析时，从 HIVE ON SPARK 作业的 SPARK WEB UI 上，我们观察到，某个 SQL 作业对应的一系列 SPARK 任务执行完毕后，大部分 EXECUTORs 都被动态释放归还给了 YARN，但是后续过了七八分钟又动态地申请了新的一批 EXECUTORS，以执行后续的 SQL 作业，这是不正常的，因为虽然我们开启了 SPARK ON YARN 的动态资源分配机制，但是前后两个 SQL 作业是在同一个 SESSION 会话中顺序提交的，理想状况下，SPARK EXECUTORS 不应该动态释放又重新申请，除非前后两个 SQL 的提交/执行间隔时长，大于参数 spark.dynamicAllocation.executorIdleTimeout 的值（该参数默认值 60s）
问题原因：经过排查应用日志和YARN 日志，发现前后两个 SQL 确实是顺序提交的,其提交间隔并没有大于60s，同时 YARN 集群中对应的资源队列，当时也是有充足的资源的；进一步排查 HIVESERVER2 日志，发现 HS2 虽然在上一个 SQL 执行完毕后，及时收到了客户端提交的新的 SQL 并成功完成了解析和编译，但真正生成 SPARK 任务并提交执行，却是在七八分钟之后！正是由于这七八分钟的时间间隔，触发了SPARK ON YARN 动态资源分配机制下的 executor 的动态释放和申请；
问题根本原因：仔细分析 HS2 中详细的日志，结合业务方编写的 HIVE SQL 业务代码，发现上述 SQL 的解析和对应 SPARK 作业的执行之间的七八分钟的时间间隔，正是由于 HIVE 表分区的锁竞争引起的: 业务方编写了多个 HIVE SQL 作业，这些 SQL 作业通过 insert into 插入到了同一张表的同一个分区，在配置调度系统时，这些 HIVE SQL 作业被配置在了不同的流程中并被并发地调度执行，所以此时这些被并发调度的多个 HIVE SQL 作业在底层会因为对同一个表分区的的锁竞争引起等待，在上述案列中达到了七八分钟，触发了 spark executor 的动态释放和申请；
问题解决方案：更改业务 SQL 和调度逻辑，不再多个任务并发写同一个表分区，比如先分别写数据到多个表例如 report_data1-5，最后再把数据 union 插入到 report_data 表.

3 HIVE 的锁和事务机制

3.1 HIVE 传统的锁机制

HIVE 传统的锁机制，概括如下：

非分区表：For a non-partitioned table:
- When the table is being read, a S lock is acquired; 读取操作会获取S锁
- whereas an X lock is acquired for all other operations (insert into the table, alter table of any kind etc.) 其它操作会获取X锁
分区表：For a partitioned table, the idea is as follows:
- A ‘S’ lock on table and relevant partition is acquired when a read is being performed; 读取操作或获取表和相关分区的S锁；
- For all other operations, an ‘X’ lock is taken on the partition. However, if the change is only applicable to the newer partitions, a ‘S’ lock is acquired on the table, whereas if the change is applicable to all partitions, a ‘X’ lock is acquired on the table. Thus, older partitions can be read and written into, while the newer partitions are being converted to RCFile. Whenever a partition is being locked in any mode, all its parents are locked in ‘S’ mode. 其它操作，会获取分区的X锁；（可能会获取表的S或X锁）
  
  如下官方链接，对此有详细的描述：
  https://cwiki.apache.org/confluence/display/Hive/Locking

3.2 HIVE 的 Transaction/Lock Manager

在传统的锁机制的基础上，为同时支持 HIVE ACID 事务表和非事务表，HIVE 推出了 transaction manager 和 lock manager:

A new logical entity called “transaction manager” was added which incorporated previous notion of “database/table/partition lock manager”,the transaction manager is now additionally responsible for managing of transactions locks;
the default DummyTxnManager emulates behavior of old Hive versions: has no transactions and uses hive.lock.manager property to create lock manager for tables, partitions and databases; 配置 transaction manager 为 DummyTxnManager，以提供对传统锁机制的支持，此时不支持事务；
A newly added DbTxnManager manages all locks/transactions in Hive metastore with DbLockManager (transactions and locks are durable in the face of server failure). This means that previous behavior of locking in ZooKeeper is not present anymore when transactions are enabled. 配置 transaction manager 为 DbTxnManager，以提供对事务的支持，此时底层通过 DbLockManager管理事务和锁；
To avoid clients dying and leaving transaction or locks dangling, a heartbeat is sent from lock holders and transaction initiators to the metastore on a regular basis. If a heartbeat is not received in the configured amount of time, the lock or transaction will be aborted.
Note that the lock manager used by DbTxnManager will acquire locks on all tables, even those without “transactional=true” property. 当使用 DbTxnManager时，读写操作对事务表和非事务表都会加锁；
By default, Insert operation into a non-transactional table will acquire an exclusive lock and thus block other inserts and reads. While technically correct, this is a departure from how Hive traditionally worked (i.e. w/o a lock manger). 默认情况下，对非事务表的写操作，会加 X 锁；
For backwards compatibility, hive.txn.strict.locking.mode (see table below) is provided which will make this lock manager acquire shared locks on insert operations on non-transactional tables. This restores previous semantics while still providing the benefit of a lock manager such as preventing table drop while it is being read. 配置hive.txn.strict.locking.mode为false后,对非事务表的写操作，会加 S 锁；
Note that for transactional tables, insert always acquires share locks since these tables implement MVCC architecture at the storage layer and are able to provide strong read consistency (Snapshot Isolation) even in presence of concurrent modification operations. 对事务表的 INSERT 操作，会获取 S 锁，因为底层存储层的 MVCC 机制确保了 SNAPSHOT ISOLATION，确保了并发度写情况下的强一致性；

4 事务和锁的相关配置参数

事务和锁主要的相关配置参数有：

hive.txn.manager：
- 配置为 org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager 以提供类似 Hive-0.13 版本之前的传统的锁机制，此时不提供对 transactions 事务的支持；
- 配置为 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 以提供对事务表的支持；
hive.lock.manager
- 配置为 org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager，基于 zk构建锁，提供类似 Hive-0.13 版本之前的传统的锁机制，此时不提供对 transactions 事务的支持；
- 配置为 org.apache.hadoop.hive.ql.lockmgr.DbLockManager，基于 metastore db构建锁，提供了对事务表的支持；
hive.support.concurrency: true/false, 需要配置为 TURE 才能支持并发读写和事务；
hive.txn.strict.locking.mode：true/false:
- In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock. 在 STRICT 模式下，非事务表获取的也是传统的 R/W读写锁，此时 INSERT 获取的时 X排他锁；
- In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of Hive 2.2.0). 在非严格模式下，INSERT 非事务表时会获取S锁，即允许多个作业并发写同一个非事务表；
需要说明的是，当 hive.txn.manager 配置为 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 时，不管 hive.lock.manager 配置为 org.apache.hadoop.hive.ql.lockmgr.DbLockManager还是 org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager，其底层实际使用的都是 org.apache.hadoop.hive.ql.lockmgr.DbLockManager，见源码：

事务和锁，spark 动态资源分配，以及HS2并发性，其它相关参数还有：

hive.txn.timeout
hive.txn.heartbeat.threadpool.size
spark.dynamicAllocation.enabled
spark.dynamicAllocation.initial/min/maxExecutors
spark.dynamicAllocation.executorAllocationRatio
spark.dynamicAllocation.schedulerBacklogTimeout
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
spark.dynamicAllocation.executorIdleTimeout
spark.dynamicAllocation.cachedExecutorIdleTimeout
hive.metastore.server.min.threads
hive.server2.thrift.min.worker.threads
hive.server2.thrift.max.worker.threads

参考链接：

HIVE-15774：Ensure DbLockManager backward compatibility for non-ACID resources
https://cwiki.apache.org/confluence/display/Hive/Locking
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-LockManager
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowLocks

深入剖析mysql事务和spring事务(代码片段)

本文分享一些关于Mysql如何解决多事务并发的问题和Spring源码是怎么控制事务以及一些事务失效的场景。分享内容Mysql事务隔离机制锁机制MVCC多版本并发控制隔离机制Spring事务应用和源码分析事务失效问题一、Mysql事务数据库的... 查看详情

悲观锁和乐观锁

...数据处于锁定状态。悲观锁的实现，往往依靠数据库提供的锁机制（也只有数据库层提供的锁机制才能真正保证数据访问的排他性，否则，即使在本系统中实现了加锁机制，也无法保证外部系统不会修改数据）。2、乐观锁... 查看详情

深入理解乐观锁与悲观锁(代码片段)

在数据库的锁机制中介绍过，数据库管理系统（DBMS）中的并发控制的任务是确保在多个事务同时存取数据库中同一数据时不破坏事务的隔离性和统一性以及数据库的统一性。乐观并发控制(乐观锁)和悲观并发控制（悲观锁）是并... 查看详情

深入理解乐观锁与悲观锁(代码片段)

java学习---java的锁和mysql的锁机制

【原文】https://www.toutiao.com/i6593861446428262916/Java和数据库的锁机制https://www.toutiao.com/i6593861446428262916/ 查看详情

innodb存储引擎的锁机制

...date或delete操作表级别的意向锁-IntentionLockInnoDB支持多粒度的锁定，允许行锁和表锁共存。通过意向锁来实现。比如，SELECT...LOCKINSHAREMODEsetsanISlockandSELECT...FORUPDATEsets 查看详情

悲观锁乐观锁和数据库的锁之间的关系

...先悲观锁和乐观锁是基于业务逻辑来讲的，他们和数据库的锁是不同的概念，数据库的锁是实现数据库事务的机制。乐观锁就在并发情况下，避免覆盖更新的一种机制，是程序员自己通过添加一个version字段实现，每次load数... 查看详情

mysql-innodb存储引擎的锁和事务

InnoDB存储引擎的关键特性：插入缓冲（性能提升）、两次写（可靠）、自适应哈希索引（查找效率）、异步IO(AIO，提高磁盘操作性能)、刷新邻接页（合并IO）。InnoDB采用一致性的非锁定读和... 查看详情

mysql_innodb的锁和事务模型(代码片段)

...库迁移到MySQL;想要进行MySQL性能调优;那么学习和掌握InnoDB的锁和事务模型就非常有用。文章目录14.7.1InnoDB中的锁共享锁与排他锁意向锁记录锁间隙锁临键锁插入意向锁自增锁空间索引使用的谓词锁14.7.2InnoDB事务模型14.7.2.1事务隔... 查看详情

sqlite剖析之锁和并发控制

在SQLite中，锁和并发控制机制都是由pager.c模块负责处理的，用于实现ACID(Atomic,Consistent,Isolated和Durable)特性。在含有数据修改的事务中，该模块将确保或者所有的数据修改全部提交，或者全部回滚。与此同时，该模... 查看详情

django的锁和事务(代码片段)

Django的锁和事务锁select_for_update(nowait=False,skip_locked=False)返回一个锁住行直到事务结束的查询集，如果数据库支持，它将生成一个SELECT...FORUPDATE语句。举个例子：entries=Entry.objects.select_for_update().filter(author=request.user)所有匹配的行... 查看详情

mysql数据库同步之悲观锁和乐观锁

...数据处于锁定状态。悲观锁的实现，往往依靠数据库提供的锁机制（也只有数据库层提供的锁机制才能真正保证数据访问的排他性，否则即使在本系统中实现了加锁机制，也无法保证外部系统不会修改数据）。一个典型的倚... 查看详情

mysql数据库同步之悲观锁和乐观锁

mysql-innodb存储引擎的锁和事务

继续深入数据库了解一下数据库的锁机制

我们在高并发的场景下，经常会在异常日志中看到“deadlock(死锁)”的错误信息。想了无数的解决方案，都没有能够最终的解决，到底是什么原因引起了死锁呢?要解决这个问题，我们就必须先了解透彻数据库都有哪些锁?他们的工... 查看详情

深入理解乐观锁与悲观锁

数据库事务的悲观锁和乐观锁

转载出处:http://www.hollischuang.com/archives/934在数据库的锁机制中介绍过，数据库管理系统（DBMS）中的并发控制的任务是确保在多个事务同时存取数据库中同一数据时不破坏事务的隔离性和统一性以及数据库的统一性。乐观并发控制... 查看详情