Steps:
- Create a component containing the Spark application that can be deployed on Kubernetes (i.e., a Docker container)
- Put yourself in a position to launch the Spark job
- Submit the Spark application to run on Kubernetes
- Run the Spark shell on Kubernetes
The detailed process involves a bit more than that. Here is an overview of what this guide covers:
Apache Spark jobs and deployment modes
The driver itself can run either outside or inside the environment that launches the job ("client mode" vs. "cluster mode").
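For reference, the two modes are selected with spark-submit's --deploy-mode flag; a minimal sketch (the API server address and the remaining options are placeholders, not taken from this guide):

# Cluster mode: the driver itself runs in a pod on the Kubernetes cluster
bin/spark-submit --master k8s://https://<api-server>:443 --deploy-mode cluster ...

# Client mode: the driver runs in the process that invoked spark-submit
bin/spark-submit --master k8s://https://<api-server>:443 --deploy-mode client ...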
Getting Apache Spark
wget http://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.6.tgz
tar xvzf spark-2.4.0-bin-hadoop2.6.tgz
cd spark-2.4.0-bin-hadoop2.6
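Before building the image, you can optionally confirm that the unpacked distribution works (assuming a local Java 8 installation):

# Print the Spark version to verify the download
bin/spark-submit --version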
Create the Spark Docker image
The commands below use the ab6539763/xiaozuoquan repository on Docker Hub.
[root@master spark-2.4.0-bin-hadoop2.6]# export DOCKER_IMAGE=ab6539763/xiaozuoquan:spark-2.4.0-hadoop-2.6
[root@master spark-2.4.0-bin-hadoop2.6]# docker build -t $DOCKER_IMAGE -f kubernetes/dockerfiles/spark/Dockerfile .
Sending build context to Docker daemon 256.2MB
Step 1/15 : FROM openjdk:8-alpine
---> a3562aa0b991
Step 2/15 : ARG spark_jars=jars
---> Using cache
---> dd976e5ec7c2
Step 3/15 : ARG img_path=kubernetes/dockerfiles
---> Using cache
---> 51ea0992c098
Step 4/15 : ARG k8s_tests=kubernetes/tests
---> Using cache
---> a82f513cb245
Step 5/15 : RUN set -ex && apk upgrade --no-cache && apk add --no-cache bash tini libc6-compat linux-pam && mkdir -p /opt/spark && mkdir -p /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd
---> Using cache
---> d23ff7efb7ef
Step 6/15 : COPY ${spark_jars} /opt/spark/jars
---> Using cache
---> 08989bbc0bc2
Step 7/15 : COPY bin /opt/spark/bin
---> Using cache
---> c190400b4bfe
Step 8/15 : COPY sbin /opt/spark/sbin
---> Using cache
---> 3afb65ed2ddc
Step 9/15 : COPY ${img_path}/spark/entrypoint.sh /opt/
---> Using cache
---> f8e4653fed2d
Step 10/15 : COPY examples /opt/spark/examples
---> Using cache
---> 55f649559fb4
Step 11/15 : COPY ${k8s_tests} /opt/spark/tests
---> Using cache
---> 7487005d56ce
Step 12/15 : COPY data /opt/spark/data
---> Using cache
---> 9999f9a6d47a
Step 13/15 : ENV SPARK_HOME /opt/spark
---> Using cache
---> c62b3ee36171
Step 14/15 : WORKDIR /opt/spark/work-dir
---> Using cache
---> 306e7af338b2
Step 15/15 : ENTRYPOINT [ "/opt/entrypoint.sh" ]
---> Using cache
---> 501ddb9b656c
Successfully built 501ddb9b656c
Successfully tagged ab6539763/xiaozuoquan:spark-2.4.0-hadoop-2.6
[root@master spark-2.4.0-bin-hadoop2.6]#
[root@master spark-2.4.0-bin-hadoop2.6]# docker push $DOCKER_IMAGE
The push refers to repository [docker.io/ab6539763/xiaozuoquan]
211eade71ef8: Pushed
5260b5bec617: Pushed
64d5f2d08a3a: Pushed
aee1e5aa69e9: Pushed
ff17e3b7f9d4: Pushed
b67f640657c6: Pushed
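If you want to double-check where the example jar lives inside the image (the spark-submit command further below references it), a quick look, assuming the $DOCKER_IMAGE tag built above:

# List the example jars baked into the image; the Dockerfile copies them under /opt/spark/examples
docker run --rm --entrypoint ls $DOCKER_IMAGE /opt/spark/examples/jars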
The Spark application runs under its own service account, spark-sa
[root@master ~]# kubectl create namespace spark1
namespace/spark1 created
[root@master ~]# kubectl create serviceaccount spark-sa -n spark1
serviceaccount/spark-sa created
[root@master ~]# kubectl create rolebinding spark-sa-rb --clusterrole=edit --serviceaccount=spark1:spark-sa -n spark1
rolebinding.rbac.authorization.k8s.io/spark-sa-rb created
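The same setup can also be kept declaratively; a rough YAML equivalent of the three kubectl create commands above, applied from a shell heredoc:

# Namespace, service account, and role binding for the Spark application
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: spark1
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-sa-rb
  namespace: spark1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: spark1
EOF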
Running an example job (cluster mode)
Set up the Kubernetes authentication environment
[root@master ~]# export NAMESPACE=spark1
[root@master ~]# export SA=spark-sa
[root@master ~]# export K8S_CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
[root@master ~]# export K8S_TOKEN=/var/run/secrets/kubernetes.io/serviceaccount/token
[root@master ~]#
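The spark-submit command below also references $DRIVER_NAME, which is not exported above; a minimal addition, with an arbitrary pod name chosen here as an assumption:

# Name of the driver pod, used by spark.kubernetes.driver.pod.name below
export DRIVER_NAME=sparkpi-1-driver

Note that the two /var/run/secrets/kubernetes.io/serviceaccount/ paths are where Kubernetes mounts a pod's service-account credentials, so they are only present when spark-submit itself runs inside a pod; when submitting from a node outside the cluster's pods, point K8S_CACERT and K8S_TOKEN at locally saved copies of the CA certificate and token instead.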
Run the Spark example job
bin/spark-submit --name sparkpi-1 \
--master k8s://https://kubernetes.default.svc.cluster.local:443 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.driver.pod.name=$DRIVER_NAME \
--conf spark.kubernetes.authenticate.submission.caCertFile=$K8S_CACERT \
--conf spark.kubernetes.authenticate.submission.oauthTokenFile=$K8S_TOKEN \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=$SA \
--conf spark.kubernetes.namespace=$NAMESPACE \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=$DOCKER_IMAGE \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar 1000
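When the job finishes, the Pi estimate appears in the driver pod's log; a quick way to check, using the variables set above:

# List the pods created by the job, then read the result from the driver log
kubectl get pods -n $NAMESPACE
kubectl logs $DRIVER_NAME -n $NAMESPACE | grep "Pi is roughly"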