Calling spark-sql from a shell script

Latest recommended article published on 2023-07-31 09:26:33


Tags

#shell #spark

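This post shows how to invoke and run Spark SQL commands from a shell script. It covers packaging the shell script and the SQL files into a zip file and walks through the invocation process in detail.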
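As a rough illustration of the invocation described above, the sketch below assembles a `spark-sql` command line for a single SQL file. The helper name `build_spark_sql_cmd`, the variable names `start_dt` and `end_dt`, and the file name `demo.sql` are assumptions made for this example and do not come from the original script. Passing the dates with `--hivevar` is one possible approach, not necessarily the one the author uses.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): prints the spark-sql
# command line for one SQL file instead of running it.
# Arguments: queue, job name, driver memory, start date, end date, SQL file.
build_spark_sql_cmd() {
    local queue="$1" job_name="$2" driver_mem="$3"
    local start_dt="$4" end_dt="$5" sql_file="$6"
    # echo joins its arguments with single spaces, giving one command line.
    echo "spark-sql --queue ${queue} --name ${job_name}" \
         "--driver-memory ${driver_mem}" \
         "--hivevar start_dt=${start_dt} --hivevar end_dt=${end_dt}" \
         "-f ${sql_file}"
}

# Assumed example values; demo.sql is a placeholder file name.
build_spark_sql_cmd default spark_script 1G 2023-07-01 2023-07-31 demo.sql
```

Printing the command rather than executing it keeps the sketch usable on a machine without Spark. On a cluster, the printed line could be reviewed first and then executed.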

#!/bin/bash
#set -x

#########################
#author : robin
#version : v3.0
#
#########################
#$1 : start time for business circle
#$2 : end time for business circle
#$3 : start time of slowly changing dimension for SF organization
#$4 : spark parameter list, separated by ','
#spark parameter comment:
#1. value of queue name
#2. value of spark job name
#3. value of driver-memory
#4. value of num-executors
#5. value of executor-cores
#6. value of executor-memory
#7. value of spark.yarn.executor.memoryOverhead
#8. value of spark.sql.shuffle.partitions  
#             
#deprecated parameters:
#spark.storage.memoryFraction
#spark.shuffle.memoryFraction
#[As of Spark 1.6, execution and storage memory management are unified. All memory fractions used in the old model are now deprecated and no longer read. If you wish to use the old memory management, you may explicitly enable `spark.memory.useLegacyMode` (not recommended).]


#############################
# declare function
#############################

function gen_spark_cmd() {

#change parameter 4 into an array
ARR_SPARK_PARA=($(echo ${V_SPARK_PARA_LIST}|tr "," " "));

#count array length
ARR_PARA_LEN=${#ARR_SPARK_PARA[@]};

#set value of queue name
if [ -n "${ARR_SPARK_PARA[0]}" ]; then 
    V_QUEUE_NAME=${ARR_SPARK_PARA[0]};
else
    V_QUEUE_NAME=default;
fi

#set value of spark job name
if [ -n "${ARR_SPARK_PARA[1]}" ]; then 
    V_JOB_NAME=${ARR_SPARK_PARA[1]};
else
    V_JOB_NAME=spark_script;
fi
    
#set value of driver-memory
if [ -n "${ARR_SPARK_PARA[2]}" ]; then 
    V_DRIVER_MEM=${ARR_SPARK_PARA[2]};
else
    V_DRIVER_MEM=1G;
fi    

#set value of num-executors
if [ -n "${ARR_SPARK_PARA[3]}" ]; then

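The default-handling pattern in the visible part of the listing (use the supplied value if present, otherwise fall back to a fixed default) can also be written with the shell's `${var:-default}` expansion. The sketch below is a minimal alternative and not the author's code: the function name `parse_spark_params` is hypothetical, and it covers only the three parameters whose defaults appear in the listing (`default`, `spark_script`, `1G`).

```shell
#!/bin/bash
# Hypothetical helper: splits "queue,job_name,driver_mem[,...]" on commas
# and applies the defaults shown in the original listing.
parse_spark_params() {
    local list="$1"
    local queue job_name driver_mem rest
    # With IFS set to a comma, read keeps empty fields ("a,,b" yields an
    # empty second field); any remaining fields land in 'rest'.
    IFS=',' read -r queue job_name driver_mem rest <<EOF
${list}
EOF
    V_QUEUE_NAME="${queue:-default}"
    V_JOB_NAME="${job_name:-spark_script}"
    V_DRIVER_MEM="${driver_mem:-1G}"
}

# The job name is left empty here, so it falls back to its default.
parse_spark_params "root.etl,,2G"
echo "${V_QUEUE_NAME} ${V_JOB_NAME} ${V_DRIVER_MEM}"
# prints: root.etl spark_script 2G
```

One behavioral difference is worth noting. Converting commas to spaces with `tr` and relying on word splitting collapses empty fields, so `a,,b` becomes a two-element array and later values shift position. Splitting with `IFS=','` preserves the empty field, which lets a caller skip a middle parameter and still get its default.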