Hbase写操作偶尔有延迟

本文记录了一次HBase性能调优的过程,针对HMaster在进行表分裂时出现的内存溢出错误进行了深入分析,并提出了相应的解决方案。

一、现象

        1、hbase操作偶尔有延迟,写入数据时需要4000+ms。

        2、hbase master日志异常:

2021-09-06 08:30:43,140 INFO  [PEWorker-7] procedure.MasterProcedureScheduler: Took xlock for pid=9516, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=idm:t_user_behavior_api_log_202109, parent=1c9a9a43f5d1bc0581f66fa5ba39f17a, daughterA=81a2bf91af4ffc731e0e734b91be05e1, daughterB=3b7540c5ef2a5ec320e9376038120984
2021-09-06 08:30:43,305 INFO  [PEWorker-7] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=9517, ppid=9516, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, override=true, server=hbaseprodnode03.yummy.tech,16020,1630587697108}]
2021-09-06 08:30:43,347 INFO  [PEWorker-14] procedure.MasterProcedureScheduler: Took xlock for pid=9517, ppid=9516, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, override=true, server=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:43,388 INFO  [PEWorker-14] assignment.RegionStateStore: pid=9517 updating hbase:meta row=1c9a9a43f5d1bc0581f66fa5ba39f17a, regionState=CLOSING, regionLocation=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:43,390 INFO  [PEWorker-14] assignment.RegionTransitionProcedure: Dispatch pid=9517, ppid=9516, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, override=true, server=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:43,541 INFO  [RSProcedureDispatcher-pool3-t627] procedure.RSProcedureDispatcher: submitting region close {ENCODED => 1c9a9a43f5d1bc0581f66fa5ba39f17a, NAME => 'idm:t_user_behavior_api_log_202109,,1624989656062.1c9a9a43f5d1bc0581f66fa5ba39f17a.', STARTKEY => '', ENDKEY => ''} to hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:43,662 INFO  [PEWorker-15] assignment.RegionStateStore: pid=9517 updating hbase:meta row=1c9a9a43f5d1bc0581f66fa5ba39f17a, regionState=CLOSED
2021-09-06 08:30:43,830 INFO  [PEWorker-15] procedure2.ProcedureExecutor: Finished subprocedure pid=9517, resume processing parent pid=9516, state=RUNNABLE:SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS, locked=true; SplitTableRegionProcedure table=idm:t_user_behavior_api_log_202109, parent=1c9a9a43f5d1bc0581f66fa5ba39f17a, daughterA=81a2bf91af4ffc731e0e734b91be05e1, daughterB=3b7540c5ef2a5ec320e9376038120984
2021-09-06 08:30:43,830 INFO  [PEWorker-15] procedure2.ProcedureExecutor: Finished pid=9517, ppid=9516, state=SUCCESS; UnassignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, override=true, server=hbaseprodnode03.yummy.tech,16020,1630587697108 in 359msec, unfinishedSiblingCount=0
2021-09-06 08:30:43,928 INFO  [PEWorker-9] assignment.SplitTableRegionProcedure: pid=9516 splitting 3 storefiles, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, threads=3
2021-09-06 08:30:43,928 INFO  [PEWorker-9] util.ByteBufferArray: Allocating buffers total=16 GB, sizePerBuffer=4 MB, count=4096
2021-09-06 08:30:45,499 ERROR [PEWorker-9] util.ByteBufferArray: Buffer creation interrupted
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:96)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:74)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:86)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:384)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:262)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:655)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:685)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:218)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.splitStoreFiles(SplitTableRegionProcedure.java:653)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.createDaughterRegions(SplitTableRegionProcedure.java:574)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:267)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:92)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:958)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1836)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1596)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2141)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine$1.allocate(ByteBufferIOEngine.java:83)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:136)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:121)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2021-09-06 08:30:45,500 ERROR [PEWorker-9] hfile.CacheConfig: Can't instantiate bucket cache
java.io.IOException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:102)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:74)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:86)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:384)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:262)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:655)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:685)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:218)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.splitStoreFiles(SplitTableRegionProcedure.java:653)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.createDaughterRegions(SplitTableRegionProcedure.java:574)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:267)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:92)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:958)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1836)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1596)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2141)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:96)
	... 17 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine$1.allocate(ByteBufferIOEngine.java:83)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:136)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:121)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2021-09-06 08:30:45,500 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=9516, state=RUNNABLE:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS, locked=true; SplitTableRegionProcedure table=idm:t_user_behavior_api_log_202109, parent=1c9a9a43f5d1bc0581f66fa5ba39f17a, daughterA=81a2bf91af4ffc731e0e734b91be05e1, daughterB=3b7540c5ef2a5ec320e9376038120984
java.lang.RuntimeException: java.io.IOException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:659)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:685)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:218)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.splitStoreFiles(SplitTableRegionProcedure.java:653)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.createDaughterRegions(SplitTableRegionProcedure.java:574)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:267)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:92)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:958)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1836)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1596)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2141)
Caused by: java.io.IOException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:102)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:74)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:86)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:384)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:262)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:655)
	... 12 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:96)
	... 17 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine$1.allocate(ByteBufferIOEngine.java:83)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:136)
	at org.apache.hadoop.hbase.util.ByteBufferArray$BufferCreatorCallable.call(ByteBufferArray.java:121)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2021-09-06 08:30:45,838 INFO  [PEWorker-5] procedure.MasterProcedureScheduler: Waiting on xlock for pid=9518, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, target=hbaseprodnode03.yummy.tech,16020,1630587697108 held by pid=9516
2021-09-06 08:30:46,105 INFO  [PEWorker-9] procedure2.ProcedureExecutor: Rolled back pid=9516, state=ROLLEDBACK, exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime exception: pid=9516, state=RUNNABLE:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS, locked=true; SplitTableRegionProcedure table=idm:t_user_behavior_api_log_202109, parent=1c9a9a43f5d1bc0581f66fa5ba39f17a, daughterA=81a2bf91af4ffc731e0e734b91be05e1, daughterB=3b7540c5ef2a5ec320e9376038120984:java.lang.RuntimeException: java.io.IOException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory; SplitTableRegionProcedure table=idm:t_user_behavior_api_log_202109, parent=1c9a9a43f5d1bc0581f66fa5ba39f17a, daughterA=81a2bf91af4ffc731e0e734b91be05e1, daughterB=3b7540c5ef2a5ec320e9376038120984 exec-time=3.3590sec
2021-09-06 08:30:46,105 INFO  [PEWorker-10] procedure.MasterProcedureScheduler: Took xlock for pid=9518, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, target=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:46,146 INFO  [PEWorker-10] assignment.AssignProcedure: Starting pid=9518, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, target=hbaseprodnode03.yummy.tech,16020,1630587697108; rit=OFFLINE, location=hbaseprodnode03.yummy.tech,16020,1630587697108; forceNewPlan=false, retain=false target svr=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:46,297 INFO  [master/hbaseprodnode01:16000] balancer.BaseLoadBalancer: Reassigned 1 regions. 1 retained the pre-restart assignment. 
2021-09-06 08:30:46,297 INFO  [PEWorker-4] assignment.RegionStateStore: pid=9518 updating hbase:meta row=1c9a9a43f5d1bc0581f66fa5ba39f17a, regionState=OPENING, regionLocation=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:46,299 INFO  [PEWorker-4] assignment.RegionTransitionProcedure: Dispatch pid=9518, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; AssignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, target=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:46,571 INFO  [PEWorker-13] assignment.RegionStateStore: pid=9518 updating hbase:meta row=1c9a9a43f5d1bc0581f66fa5ba39f17a, regionState=OPEN, openSeqNum=1917411, regionLocation=hbaseprodnode03.yummy.tech,16020,1630587697108
2021-09-06 08:30:46,669 INFO  [PEWorker-13] procedure2.ProcedureExecutor: Finished pid=9518, state=SUCCESS; AssignProcedure table=idm:t_user_behavior_api_log_202109, region=1c9a9a43f5d1bc0581f66fa5ba39f17a, target=hbaseprodnode03.yummy.tech,16020,1630587697108 in 776msec

   

二、调查思路

        1、调查异常“java.lang.OutOfMemoryError: Direct buffer memory“,

        参考文章HBase Direct Memory OutOfMemory

        调整了-XX:MaxDirectMemorySize=20480m

        因为ambari里master和regionserver是同一个配置,所以第一次统一修改了这个值。

        hbase服务重启后,master服务异常终止,

        把master的配置还原,去掉-XX:MaxDirectMemorySize后,hmaster服务恢复正常。

        2、异常依然存在,查找hbase内存配置相关资料,都是讲region server如何设置内存,

        当前遇到的这个异常是hmaster报的,所以没有从网上找到有参考价值的内容。

        3、通过异常日志的堆栈,查看源码。

        SplitTableRegionProcedure.splitStoreFiles时,要获取缓存,缓存的大小是二级缓存配置的:hbase.bucketcache.size的大小,配置的是16G,master本身只有16G,所以申请时内存不够导致异常。

        升级master内存到32G以后,还需要配置-XX:MaxDirectMemorySize=18G。

        4、导致操作有延迟的原因就是当读写的表正在做split时,要先lock该表,等结束时unlock,在split有异常的情况下,会有3秒中左右的时间这个表是不能用的,跟延迟时间相符。

三、结论

        hbase.bucketcache.size不止是L2级缓存,在splitTableRegion时也会被HMaster当作缓存来用,所以HMaster的配置也不能太低(为了节省成本,HMaster机器的配置是普通机器,16G内存,而Region server都是高配64G内存)。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值