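1 Spark Runtime Environment
Spark is written in Scala and runs on the JVM, so it needs Java 7 or later.
To use the Python API, you also need Python 2.6+ (on the 2.x line) or Python 3.4+.
Scala versions: Spark 1.6.2 pairs with Scala 2.10, and Spark 2.0.0 pairs with Scala 2.11.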
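You can check these requirements before installing. The sketch below is a hypothetical helper, not something Spark ships.
- It parses the banner that `java -version` prints into a major version, mapping the old `1.x` scheme to `x`.
- It checks a Python version against the 2.6+/3.4+ minimum.

The names `java_major_version` and `python_meets_minimum` are illustrative only.

```python
import re
import subprocess
import sys


def java_major_version(version_output):
    """Parse `java -version` output into a major version number.

    Old-style strings such as "1.8.0_152" map to 8; newer ones such as
    "11.0.2" map to 11. Returns None if no version string is found.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if m is None:
        return None
    first = int(m.group(1))
    if first == 1 and m.group(2) is not None:
        return int(m.group(2))  # "1.7" -> 7, "1.8" -> 8
    return first


def python_meets_minimum(major, minor):
    """True for Python 2.6+ on the 2.x line or 3.4+ on the 3.x line."""
    return (major == 2 and minor >= 6) or (major == 3 and minor >= 4)


if __name__ == "__main__":
    try:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.Popen(["java", "-version"], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, universal_newlines=True)
        _, err = proc.communicate()
        java = java_major_version(err)
    except OSError:  # no `java` executable on PATH
        java = None
    print("Java major version: %s (Spark needs 7+)" % java)
    major, minor = sys.version_info[0], sys.version_info[1]
    print("Python %d.%d OK for the Python API: %s"
          % (major, minor, python_meets_minimum(major, minor)))
```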
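2 Downloading Spark
Download URL:
Spark itself does not require Hadoop. If you already run a Hadoop cluster, download the package prebuilt for that Hadoop version.
Then extract the downloaded archive.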
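Extraction is usually a single `tar -zxf` command. For reference, here is the same step with Python's standard `tarfile` module, as a sketch under a few assumptions:
- The archive name `spark-2.0.0-bin-hadoop2.7.tgz` and the target `/opt` are guesses based on the install path in the pyspark traceback later in this post.
- `extract_spark` is a hypothetical helper name.

```python
import os
import tarfile


def extract_spark(archive_path, dest_dir):
    """Unpack a Spark .tgz release into dest_dir.

    Returns the path of the single top-level directory the archive
    contains (e.g. <dest_dir>/spark-2.0.0-bin-hadoop2.7), or dest_dir
    itself if the archive has several top-level entries.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tops = set(m.name.split("/")[0] for m in tar.getmembers())
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, "..", and unsafe links.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    if len(tops) == 1:
        return os.path.join(dest_dir, tops.pop())
    return dest_dir
```

For a standard release archive, `extract_spark("spark-2.0.0-bin-hadoop2.7.tgz", "/opt")` should return `/opt/spark-2.0.0-bin-hadoop2.7`.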
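3 Spark Directory Layout
bin contains the executables for interacting with Spark, such as the Spark shell.
core, streaming, python, ... (in the source release) contain the source code of the main components.
examples contains small Spark jobs that run on a single machine; you can study and run them.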
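To confirm that an extracted release has the `bin` launchers used in the next sections, you can run a small check like the one below. It is a sketch with two assumptions:
- A `SPARK_HOME` environment variable points at the extracted directory. If it is unset, the script falls back to `/opt/spark-2.0.0-bin-hadoop2.7`, the path from this post's pyspark traceback.
- `find_shells` is a hypothetical helper name.

```python
import os

# The two shell launchers this post uses; both live in bin/ of a release.
SHELLS = ("spark-shell", "pyspark")


def find_shells(spark_home):
    """Map each launcher name to its full path, or None if it is missing."""
    bin_dir = os.path.join(spark_home, "bin")
    found = {}
    for name in SHELLS:
        path = os.path.join(bin_dir, name)
        found[name] = path if os.path.isfile(path) else None
    return found


if __name__ == "__main__":
    # Assumed fallback path when SPARK_HOME is not set.
    home = os.environ.get("SPARK_HOME", "/opt/spark-2.0.0-bin-hadoop2.7")
    for name, path in sorted(find_shells(home).items()):
        print("%-12s %s" % (name, path or "not found"))
```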
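4 The Spark Shell
The Spark shell lets you work interactively with data distributed across a cluster.
Spark loads data into the memory of the cluster nodes, so many distributed operations can finish in seconds.
This makes iterative computation fast, and you can usually run ad hoc queries and analysis directly in the shell.
Spark provides both a Python shell and a Scala shell.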
5 Starting the Python Shell
[root@master bin]# ./pyspark
Python 2.7.2 (default, Jan 6 2018, 08:58:52)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux3
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/opt/spark-2.0.0-bin-hadoop2.7/python/pyspark/shell.py", line 28, in <module>
import py4j
zipimport.ZipImportError: can't decompress data; zlib not available
>>> exit();
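The pyspark shell above did not actually start. `shell.py` imports py4j from a zip archive bundled with Spark, and this interpreter cannot decompress it because the `zlib` module is missing.
A common cause is a Python compiled from source on a machine without the zlib development headers (`zlib-devel` on Red Hat-style systems). Either of these should clear the error:
- Install the headers and rebuild Python.
- Point pyspark at an interpreter that has zlib.

The check below is a diagnostic sketch, not part of Spark, and `zlib_status` is a hypothetical name. Run it with the same interpreter pyspark launches: by default `python` on the PATH, or the one named by `PYSPARK_PYTHON`.

```python
import sys


def zlib_status():
    """Return (ok, message) describing zlib support in this interpreter."""
    try:
        import zlib
    except ImportError:
        return False, "zlib module missing; zipimport cannot read compressed zips"
    # Round-trip a small payload to confirm compression really works.
    payload = b"import py4j"
    ok = zlib.decompress(zlib.compress(payload)) == payload
    return ok, "zlib %s available" % zlib.ZLIB_VERSION


if __name__ == "__main__":
    ok, message = zlib_status()
    print("interpreter: %s" % sys.executable)
    print(message)
```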
6 Starting the Scala Shell
[root@master bin]# ./spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
18/02/04 18:40:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/04 18:40:45 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.0.110:4040
Spark context available as 'sc' (master = local[*], app id = local-1517740844632).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
7 Hands-On Example
Create a small text file, load it into an RDD from the Scala shell, and inspect it:
[root@master ~]# cat helloSpark.txt
go to home hello java
so many to hello word kafka java
go to so
scala> val lines = sc.textFile("/root/helloSpark.txt")
lines: org.apache.spark.rdd.RDD[String] = /root/helloSpark.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> lines.count()
res0: Long = 3
scala> lines.first()
res1: String = go to home hello java
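`textFile` yields one RDD element per line of the file, so `count()` returns the number of lines and `first()` returns the first line.
As a plain-Python cross-check of the results above, here are the same semantics applied to the file's contents. This is a sketch, not Spark code:
- It assumes `\n` line endings.
- `count_and_first` is a hypothetical name.

```python
# Contents of /root/helloSpark.txt as shown by `cat` above.
SAMPLE = "go to home hello java\nso many to hello word kafka java\ngo to so\n"


def count_and_first(text):
    """Mimic RDD.count() and RDD.first() for a file read with sc.textFile."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not create an extra element
    if not lines:
        # RDD.first() on an empty RDD fails as well.
        raise ValueError("empty collection")
    return len(lines), lines[0]


if __name__ == "__main__":
    print(count_and_first(SAMPLE))  # (3, 'go to home hello java')
```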
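8 Changing the Log Level
To keep the shell's console output short, set the root logger to WARN in conf/log4j.properties: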
[root@master conf]# cat log4j.properties
log4j.rootCategory=WARN, console
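In a fresh Spark 2.0 download, the `conf` directory ships only `log4j.properties.template`, whose root category is `INFO`. Copy it to `log4j.properties` and change the level to `WARN` to get the setting shown above.
The sketch below automates that edit with the standard library. `set_root_level` and `write_log4j` are hypothetical helper names.
For the current shell session only, the `sc.setLogLevel(newLevel)` call mentioned in the spark-shell startup output has a similar effect.

```python
import os
import re

# Matches e.g. "log4j.rootCategory=INFO, console" and keeps the appender list.
ROOT_RE = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+(.*)$", re.MULTILINE)


def set_root_level(properties_text, level="WARN"):
    """Return properties_text with the rootCategory level replaced."""
    new_text, n = ROOT_RE.subn(lambda m: m.group(1) + level + m.group(2),
                               properties_text)
    if n == 0:
        raise ValueError("no log4j.rootCategory line found")
    return new_text


def write_log4j(conf_dir, level="WARN"):
    """Create (or overwrite) conf/log4j.properties from the template."""
    src = os.path.join(conf_dir, "log4j.properties.template")
    dst = os.path.join(conf_dir, "log4j.properties")
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(set_root_level(text, level))
    return dst
```

For example, `write_log4j("/opt/spark-2.0.0-bin-hadoop2.7/conf")` would write the WARN setting. Note that it overwrites any existing `log4j.properties`.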
