详情参考:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
https://blog.csdn.net/gulugulu_gulu/article/details/105706090
https://blog.csdn.net/yoshubom/article/details/113845190
https://blog.csdn.net/weixin_52918377/article/details/117123969?spm=1001.2101.3001.6650.8&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-8-117123969-blog-119000138.pc_relevant_antiscanv3&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-8-117123969-blog-119000138.pc_relevant_antiscanv3&utm_relevant_index=12

Hive的版本和Spark的版本要匹配;

具体来说,你使用的Hive版本编译时候用的哪个版本的Spark,那么就需要使用相同版本的Spark,可以在Hive的pom.xml中查看spark.version来确定;

Hive root pom.xml’s defines what version of Spark it was built/tested with.

Spark使用的jar包,必须是没有集成Hive的;

也就是说,编译时候没有指定-Phive.

一般官方提供的编译好的Spark下载,都是集成了Hive的,因此这个需要另外编译。

Note that you must have a version of Spark which does not include the Hive jars. Meaning one which was not built with the Hive profile.

如果不注意版本问题,则会遇到各种错误,比如:

Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
Caused by: java.lang.ClassNotFoundException: org.apache.hive.spark.client.Job

编译很简单,下载spark-1.5.0的源码,使用命令:

mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.0 -DskipTests -Dscala-2.10 clean package

Hive3.1.2和Spark3.0.x版本冲突
https://blog.csdn.net/rfdjds/article/details/125389450

Logo

腾讯云面向开发者汇聚海量精品云计算使用和开发经验,营造开放的云计算技术生态圈。

更多推荐