To view run logs through the web UI, two services need to be running:
Hadoop's JobHistory server and Spark's history server.
Relevant configuration files:
etc/hadoop/mapred-site.xml
<!-- Address and web UI address of the JobHistory server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>spark-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>spark-master:19888</value>
</property>
yarn-site.xml
<!-- Whether to enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- URL of the log server, used by the worker nodes -->
<property>
<name>yarn.log.server.url</name>
<value>http://spark-master:19888/jobhistory/logs/</value>
</property>
<!-- Retention time for aggregated logs, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
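The retention value is expressed in seconds, so 86400 keeps aggregated logs for one day. A minimal sketch for working out the value for a longer period; the helper name days_to_seconds is hypothetical and not part of Hadoop:

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): convert a number of days into
# the seconds value expected by yarn.log-aggregation.retain-seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 1   # prints 86400, the value used in the config above
days_to_seconds 7   # prints 604800, i.e. one week
```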
spark-defaults.conf
spark.eventLog.enabled=true
spark.eventLog.compress=true
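# Store event logs on the local filesystem (a local path needs three slashes: file:///)
#spark.eventLog.dir=file:///usr/local/hadoop-2.7.6/logs/userlogs
#spark.history.fs.logDirectory=file:///usr/local/hadoop-2.7.6/logs/userlogs
# Store event logs on HDFS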
spark.eventLog.dir=hdfs://spark-master:9000/tmp/logs/root/logs
spark.history.fs.logDirectory=hdfs://spark-master:9000/tmp/logs/root/logs
spark.yarn.historyServer.address=spark-master:18080
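As far as I know, Spark does not create the event log directory on its own, so it should exist before applications write to it and before the history server starts. A minimal sketch, assuming the hdfs client is on the PATH and the directory is the one configured above; the function name ensure_event_log_dir is hypothetical:

```shell
#!/bin/sh
# Directory taken from spark.eventLog.dir in spark-defaults.conf.
EVENT_LOG_DIR="hdfs://spark-master:9000/tmp/logs/root/logs"

# Hypothetical helper: create the directory if it is missing.
# -p creates parent directories and does not fail if the path already exists.
ensure_event_log_dir() {
    hdfs dfs -mkdir -p "$1"
}

# Only run when the Hadoop client is actually installed on this machine.
if command -v hdfs >/dev/null 2>&1; then
    ensure_event_log_dir "$EVENT_LOG_DIR"
fi
```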
Startup
1. First, start Hadoop's JobHistory server
[root@spark-master hadoop-2.7.6]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.6/logs/mapred-root-historyserver-spark-master.out
2. Start Spark's history server
[root@spark-master spark-2.3.0]# sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.3.0/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-spark-master.out
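If the configuration is correct, once both services are up you can open port 18080 (Spark history server) and port 19888 (JobHistory server) in a browser.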
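Both servers can also be checked from the command line through their REST endpoints. A minimal sketch, assuming curl is installed and the hostname spark-master from the configuration above; the function names are hypothetical. The paths /api/v1/applications (Spark history server) and /ws/v1/history/info (JobHistory server) are the REST endpoints those services document:

```shell
#!/bin/sh
HOST="spark-master"

# Build the REST URL for each service (hypothetical helper names).
spark_history_url() { echo "http://$1:18080/api/v1/applications"; }
job_history_url()   { echo "http://$1:19888/ws/v1/history/info"; }

# Report whether a URL answers with a successful HTTP status within 5 seconds.
check_url() {
    if curl -sf --max-time 5 -o /dev/null "$1"; then
        echo "OK   $1"
    else
        echo "FAIL $1"
    fi
}
```

Running check_url "$(spark_history_url "$HOST")" and check_url "$(job_history_url "$HOST")" on a machine that can reach the master prints OK or FAIL for each service.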


Running a test example
Spark supports three run modes: local, standalone, and YARN.
The submit command differs slightly between them.
local
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[4] --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples_2.11-2.3.0.jar 1
standalone
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:6066 --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples_2.11-2.3.0.jar 1
YARN mode
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g examples/jars/spark-examples_2.11-2.3.0.jar 1
This article focuses on how to view logs when running Spark on YARN.
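In YARN cluster mode the driver runs inside a container, so its output goes to the container logs rather than to the console. With log aggregation enabled, those logs can be fetched with yarn logs -applicationId once the application has finished. A minimal sketch, assuming the application ID is picked out of the spark-submit console output; the helper names and the sample line are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: pull the first YARN application ID out of text on stdin.
extract_app_id() {
    grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Hypothetical helper: fetch the aggregated logs for one application.
# Requires the Hadoop yarn client on the PATH; not called in this sketch.
fetch_app_logs() {
    yarn logs -applicationId "$1"
}

# Made-up sample line in the style of spark-submit output, for illustration only.
sample="Submitted application application_1530000000000_0001"
app_id=$(echo "$sample" | extract_app_id)
echo "$app_id"
```

On the cluster, the spark-submit output could be saved to a file, passed through extract_app_id, and the resulting ID handed to fetch_app_logs.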




