大数据问题排查系列-hive踩坑记

明哥的IT随笔 明哥的IT随笔     2022-12-26     595

关键词:

前言

大家好,我是明哥!

本片博文是“大数据线上问题排查系列”大类别之一,讲述前段时间我司某产品在某券商遇到的一个问题及解决方案,其背后涉及到 hive 的一个 BUG,在 hive 3.0 才修复。以下是正文。

问题现象

cdh6.2.1中,开启 kerberos 和 sentry 的hive 中,使用 dml 语句 insert overwrite 插入数据到分区表的新分区时,会报错。同时注意到:

  • 查看底层 hdfs 发现对应的目录和数据都有了;

  • 使用 show partitions xxx 查看不到对应的新分区;

  • 使用 select 语句也查不到该新分区下的数据;

  • 只要是插入数据到新的分区,不管使用静态分区还是动态分区模式,都会报错;

问题分析

为确定问题,首先尝试了不同场景,发现如下现象:

  • 当使用 insert overwrite 插入数据到分区表的已有分区时,不会报错;(不管该分区下有没有数据);

  • 使用 insert into 插入数据到分区表的新分区时,不会报错;

  • 对于非分区表,insert overwrite 和 insert into 都不会报错;

  • 当使用insert overwrite 插入数据到分区表的新分区报错后,可以使用 msck repair test0317 修复hive metastore中相关元数据,修复后 select, show partitions 等语句可以查询到分区和该分区的底层数据,没有问题;

为进一步分析问题,常看了相关日志,包括客户端日志,hiveserver2 日志和 hive metastore 日志分别如下所示。

客户端日志(beeline)

INFO  : Loading data to table apollo_ods_jzfix.test0317 partition (ptdate=1) from hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1/.hive-staging_hive_2021-03-17_15-09-13_232_1543365768355672834-7333/-ext-10000
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 2.56 sec   HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 2 seconds 560 msec
INFO  : Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event (state=08S01,code=1)

hive metastore 日志

2021-03-17 15:09:30,039 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM	ip=10.2.91.100	cmd=get_table : db=apollo_ods_jzfix tbl=test0317	
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-9-thread-122]: 121: get_partition : db=apollo_ods_jzfix tbl=test0317[1]
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM	ip=10.2.91.100	cmd=get_partition : db=apollo_ods_jzfix tbl=test0317[1]	
2021-03-17 15:09:30,053 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-122]: NoSuchObjectException(message:partition values=[1])
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003)
	at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
	at com.sun.proxy.$Proxy26.getPartition(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553)
	at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737)
	at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
	at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

2021-03-17 15:09:30,054 ERROR org.apache.thrift.ProcessFunction: [pool-9-thread-122]: Internal error processing fire_listener_event
org.apache.hadoop.hive.metastore.api.NoSuchObjectException: partition values=[1]
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy26.getPartition(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

hiveserver2 日志

	2021-03-17 15:09:30,016 INFO  org.apache.hadoop.hive.ql.exec.MoveTask: [HiveServer2-Background-Pool: Thread-183022]: Partition is: ptdate=1
2021-03-17 15:09:30,033 INFO  org.apache.hadoop.hive.common.FileUtils: [HiveServer2-Background-Pool: Thread-183022]: Creating directory if it doesn't exist: hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1
2021-03-17 15:09:30,057 WARN  org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: [HiveServer2-Background-Pool: Thread-183022]: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
	at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
	at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2021-03-17 15:09:31,063 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Closed a connection to metastore, current connections: 9231
2021-03-17 15:09:31,063 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Trying to connect to metastore with URI thrift://dev-dw-dn01:9083
2021-03-17 15:09:31,065 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Opened a connection to metastore, current connections: 9232
2021-03-17 15:09:31,065 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Connected to metastore.
2021-03-17 15:09:31,196 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_lis
tener_event2021-03-17 15:09:31,196 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: MapReduce Jobs Launched: 
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Stage-Stage-1: Map: 1   Cumulative CPU: 2.56 sec   HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Total MapReduce CPU Time Spent: 2 seconds 560 msec
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
2021-03-17 15:09:31,206 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-183022]: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
	at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2433) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	... 11 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
	at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]

问题临时解决方法

通过以上不同场景下的尝试和相关日志,可以确定这应该是某些配置项引起的 HIVE 的一个 BUG,但笔者当时并没有发现相关的JIRA。为了快速推进项目,笔者当时提出了以下 workaround:

  1. 业务测问题解决方法一:在 insert overwrite 后 (会报错),增加脚本执行命令 msck repair test0317 修复hive metastore 中相关元数据: 该方法的缺点是不太方便嵌入到调度流程中,因insert overwrite报错后,调度工具就不会继续调度执行后续流程了,无法做到业务代码不侵入;

  2. 业务测问题解决方法二:在执行正常业务流程前,添加一个前置流程,该前置流程的内容是提前创建好正常流程中需要的新分区,其内容实例如下:alter table test0317 add partition (ptdate=10); (ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec;

最终为快速推进项目,我们采用了上述方案2临时解决了问题。

问题解决方法

在有了临时解决方案,确保了项目能顺利推进的同时,为彻底了解背后技术细节,笔者也在 cloudera 社区发了帖子寻求支持,但社区的相应速度没有那么及时,前几日才收到回馈,截图如下:


JIRA截图

根据 cloudra 官方的回复,参考上述 jira 链接,并对照源码,确认这是 HIVE 的一个BUG,且在 HIVE 3.0 才修复完毕,其主要问题概括如下:

在 hive.metastore.dml.events 配置为 true时,向不存在的分区插入数据,比如向新分区 insert overwrite,或以动态分区模式插入数据到新分区时,hive原先的处理有问题,会因为 metastore 找不到分区而断开链接并报错。

所以,在上述业务侧可以采取的方案二之外,还可以采取如下解决方案:

  1. 配置 hive.metastore.dml.events 为false

  2. 或升级到hive3.0.

!关注不迷路~ 各种福利、资源定期分享!欢迎小伙伴们扫码添加明哥微信,后台加群交流学习。

大数据问题排查系列-tdh大数据平台中hive作业长时间无法执行结束

大数据问题排查系列-TDH大数据平台中HIVE作业长时间无法执行结束前言大家好,我是明哥!本片博文是“大数据问题排查系列”之一,讲述某星环TDH大数据平台中,研发同学提交的Hive作业在成功提交后,客户... 查看详情

大数据问题排查系列-大数据集群开启kerberos认证后hive作业执行失败(代码片段)

大数据问题排查系列-大数据集群开启kerberos认证后HIVE作业执行失败1前言大家好,我是明哥!本文是大数据问题排查系列的kerberos问题排查子序列博文之一,讲述大数据集群开启kerberos安全认证后,hive作业执行失败... 查看详情

大数据问题排查系列-tdh大数据平台中hive作业长时间无法执行结束

前言大家好,我是明哥!本片博文是“大数据问题排查系列”之一,讲述某星环TDH大数据平台中,研发同学提交的Hive作业在成功提交后,客户端长时间收不到任何结果信息也收不到任何报错信息问题的排查。... 查看详情

大数据问题排查系列-因hive中元数据与hdfs中实际的数据不一致引起的问题的修复(代码片段)

大数据问题排查系列-因HIVE中元数据与HDFS中实际的数据不一致引起的问题的修复前言大家好,我是明哥!本片博文是“大数据问题排查系列”之一,讲述某HIVESQL作业因为HIVE中的元数据与HDFS中实际的数据不一致引起的... 查看详情

nginx安装踩坑记

...llepel-release-yyumupdateyuminstallnginx-y一般情况下三条命令解决问题如果解决不了那么问题大了,如果解决不了那么问题大了,如果解决不了那么问题大了​下一步有问题找官网​​​​http://nginx.org/en/linux_packages.html​​​根据官网指... 查看详情

rabbitmq踩坑记(代码片段)

...//www.cnblogs.com/little-sheep/p/9934887.html)。在测试环境下没有问题,但上到生产环境后部分用户反映出现了丢消息的情况,针对这个问题我们进行了排查,发现,原本我们的系统是单机的,但用户在之前做了调整,在内外网服务器分... 查看详情

uwsgi踩坑记(代码片段)

...发请求时的协议要和uWSGI监听的协议一致。否则就会出现问题,因为是三者之间的通信,排查起来需要想清楚请求传递的次序:Nginx->uWSGI->app1.1uWSGI异常信息invalidrequestblocksize:21573(max4096)...skip如果按照下面的配置就会出现上述... 查看详情

flutter踩坑记:therearemultipleheroesthatsharethesametagwithinasubtree.(代码片段)

...一个点击回调的事件里做页面跳转,一跳转就是黑屏,这个问题折腾了许久,主要花在了排查问题上,排查了路由设置问题和类名以及页面结构等问题之后还是无果,最后拿控制台的一句话"Therearemultipleheroesthatsharethesametagwithinasubtree... 查看详情

springcloud踩坑记---使用feignclient远程调用服务404

...client就是无法调通,一直报404错误,排查过程如下:一、问题:  服务提供方定义的接口如下:/***黑白名单查询接口**@authorLiJunJun*@since2018/10/18*/@Component(va 查看详情

记一次spring事务配置踩坑记

...SpringBoot+MyBatisPlus)业务逻辑伪代码如下。理论上,插入数据t1后,xxService.getXxx()方法的查询条件会不满足,会查询不到数据。结果事与愿违,后一次的查询,居然查到了数据。voidsaveXxx(){ xxService.getXxx();//查到一条数据data1 ... 查看详情

rgb保存.bmp文件踩坑记

... 解决思路过程:    1. 截屏出来的数据是RGB的,写MP4文件需要经过RGB24->YUV420转换。怀疑RGB数据本身存在点问题。 2.尝试将RBG数据写bmp文件,写完的文件无论如何也打不开,为此我特意研究了BMP包头,... 查看详情

踩坑记

...     更加方便的是,PHP可以直接将图像数据流输出到浏览器。要想在PHP中使用图像处理功能,你需要连带gd库一起来编译PHP。  持续更新。。。。。。  查看详情

大数据问题排查系列-hdfsfilesystemapi的正确打开方式,你get了吗?(代码片段)

大数据问题排查系列-HDFSFileSystemAPI的正确打开方式,你GET了吗?前言大家好,我是明哥!本片博文是“大数据问题排查系列”之一,我们首先会聊聊一个问题的现象原因和解决方法,然后给出HDFSFileSystemAPI... 查看详情

logback日志输出踩坑记

...势,在我公司平台上日充电订单量达到四五十万单。问题描述近日在上午十点左右、夜里十一点左右,这两个时间段频繁接收日志监控系统的接口响应超时告警信息,过了这两个充电高峰期后, 查看详情

rsyslog磁盘辅助(disk-assisted)模式踩坑记

最近公司为方便tracing、排查,搞全链路日志收集,而我手上的10亿+pv的动态前端服务必然在考虑之列。之前呢。都是运维定制的收集方式:如上图,rsyslogpushkafka,优点嘛:稳定,肯定不会丢日志;报点也很明显:性能差,IO略高... 查看详情

实操|hive数据倾斜问题定位排查及解决(代码片段)

Hive数据倾斜怎么发现,怎么定位,怎么解决多数介绍数据倾斜的文章都是以大篇幅的理论为主,并没有给出具体的数据倾斜案例。当工作中遇到了倾斜问题,这些理论很难直接应用,导致我们面对倾斜时还是... 查看详情

大数据问题排查系列-开启kerberos后无法访问hiveserver2等服务的webui

大数据问题排查系列-开启kerberos后无法访问HIVESERVER2等服务的WEBUI1前言大家好,我是明哥!在博文“从技术视角看大数据行业的发展趋势”中,我们提到大数据的一个发展趋势是日益重视数据安全。在数据安全上,... 查看详情

spark踩坑记——数据库(hbase+mysql)转

...中对于计算产生结果的进行持久化时,我们往往需要操作数据库,去统计或者改变一些值。最近一个实时消费者处理任务,在使用sparkstreaming进行实时的数据流处理时,我需要将计算好的数据更新到hbase和mysql中,所以本文对spark... 查看详情