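When submitting a job to Flink, the job could fetch data from Oracle, run the computation, and call back normally. When it fetched data from HBase instead, the callback succeeded only once and then the task hung. In the end the problem was solved by changing the metaspace size and increasing -yjm and -ytm at startup.
The JobManager log is as follows: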
7:45:11.443 TKD [main-EventThread] ERROR o.a.f.s.c.o.a.c.ConnectionState - Authentication failed
17:45:20.998 TKD [flink-rest-server-netty-worker-thread-16] ERROR o.a.f.r.r.h.t.TaskManagerDetailsHandler - Unhandled exception.
org.apache.flink.runtime.resourcemanager.exceptions.UnknownTaskExecutorException: No TaskExecutor registered under container_e16_1593566817793_0659_01_000002.
at org.apache.flink.runtime.resourcemanager.ResourceManager.requestTaskManagerInfo(ResourceManager.java:532)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:279)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:194)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
17:46:48.520 TKD [flink-akka.actor.default-dispatcher-20] ERROR o.a.f.r.r.h.t.TaskManagerDetailsHandler - Unhandled exception.
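The following checks were made:
1. Checked whether the data source was the problem (ruled out);
2. Checked whether the callback was the problem (ruled out);
3. Checked whether an HBase data stream was left unclosed (ruled out);
4. Checked whether the HBase table was too large (ruled out);
5. Checked whether the algorithm was too complex (ruled out);
6. Checked whether the environment was the problem (ruled out);
7. Debugging with breakpoints showed no problem when HBase data was not fetched, but after many runs the same problem appeared anyway;
8. Checked whether it was a memory problem. Edit flink/bin/config.sh on the remote environment as shown, then inspect the JVM from the local machine with jconsole or another JMX client: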
DEFAULT_ENV_PID_DIR="/tmp" # Directory to store *.pid files to
DEFAULT_ENV_LOG_MAX=5 # Maximum number of old log files to keep
DEFAULT_ENV_JAVA_OPTS="" # Optional JVM args
DEFAULT_ENV_JAVA_OPTS_JM="" # Optional JVM args (JobManager)
DEFAULT_ENV_JAVA_OPTS_TM="-Djava.rmi.server.hostname=192.168.10.100 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=10099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" # Optional JVM args (TaskManager)
DEFAULT_ENV_JAVA_OPTS_HS="" # Optional JVM args (HistoryServer)
DEFAULT_ENV_SSH_OPTS="" # Optional SSH parameters running in cluster mode
DEFAULT_YARN_CONF_DIR="" # YARN Configuration Directory, if necessary
DEFAULT_HADOOP_CONF_DIR="" # Hadoop Configuration Directory, if necessary
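The same metaspace figures that jconsole shows over JMX can also be read from inside a JVM. The following is a minimal sketch, not part of the original job: the class name `MetaspaceProbe` is hypothetical, and it assumes a HotSpot JVM, where the memory pool is named "Metaspace".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the current metaspace usage of this JVM.
public class MetaspaceProbe {

    // Returns the usage of the pool named "Metaspace" (HotSpot naming),
    // or null if the running JVM exposes no such pool.
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    // Converts bytes to whole megabytes.
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No Metaspace pool found on this JVM");
            return;
        }
        // getMax() is -1 when no upper limit is configured.
        String max = usage.getMax() < 0 ? "unlimited" : toMb(usage.getMax()) + " MB";
        System.out.println("Metaspace used: " + toMb(usage.getUsed())
                + " MB, committed: " + toMb(usage.getCommitted())
                + " MB, max: " + max);
    }
}
```

If the used value keeps climbing toward the maximum while the job runs, that would be consistent with the metaspace shortage described in this post.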
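9. It turned out to be a memory problem, so I configured Flink's metaspace size (since JDK 8, the permanent generation has been replaced by metaspace):
Flink managed memory (used to cache intermediate results): taskmanager.memory.managed.size: 2048m(*)
TaskExecutor framework off-heap memory size: taskmanager.memory.framework.off-heap.size: 1024m(*)
TaskExecutor framework heap memory size: taskmanager.memory.framework.heap.size: 1024m
TaskExecutor JVM metaspace size: taskmanager.memory.jvm-metaspace.size: 2048m(*)
10. When starting the job, make sure the memory is not set too small. Mine is as follows: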
bin/flink run -m yarn-cluster -p 3 -c test.FlinkDemo -yjm 2048m -ytm 8096m /test/testFlink.jar
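As a rough sanity check on these numbers, the explicitly configured components can be added up and compared with the -ytm value. This is only a sketch: the class `MemoryBudget` is hypothetical, and it assumes that -ytm is treated as the TaskManager's total process memory, so that whatever is left over has to cover task heap, network memory, and JVM overhead.

```java
// Hypothetical helper: checks how much of the -ytm budget is left
// after the explicitly configured memory components are subtracted.
public class MemoryBudget {

    // Returns processMb minus the sum of all fixed components (in MB).
    static long remainingMb(long processMb, long... fixedMb) {
        long sum = 0;
        for (long mb : fixedMb) {
            sum += mb;
        }
        return processMb - sum;
    }

    public static void main(String[] args) {
        long ytmMb = 8096;            // from -ytm 8096m
        long managedMb = 2048;        // taskmanager.memory.managed.size
        long frameworkOffHeapMb = 1024; // taskmanager.memory.framework.off-heap.size
        long frameworkHeapMb = 1024;  // taskmanager.memory.framework.heap.size
        long metaspaceMb = 2048;      // taskmanager.memory.jvm-metaspace.size

        long rest = remainingMb(ytmMb, managedMb, frameworkOffHeapMb,
                frameworkHeapMb, metaspaceMb);
        System.out.println("Left for task heap, network, JVM overhead: " + rest + " MB");
        if (rest <= 0) {
            System.out.println("Configured components exceed the -ytm budget");
        }
    }
}
```

With the values from this post, the four fixed components add up to 6144 MB, leaving 1952 MB of the 8096 MB. This illustrates why -ytm cannot be too small once the metaspace and managed memory are raised.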

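Summary: in a Flink job, fetching data from Oracle, computing, and calling back worked normally, but with HBase as the data source the callback succeeded only once and the task then stopped making progress. After ruling out the data source, the callback, unclosed HBase streams, table size, algorithm complexity, and the environment, the cause was identified as insufficient memory, in particular insufficient metaspace (which replaced the permanent generation starting with JDK 8). The fix is to increase Flink's metaspace size and to make sure sufficiently large memory values are passed at startup.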




