
pxz79
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
.......
The output of the Spark Driver can be viewed by executing kubectl logs pod_name and
looking at the pod's logs. The pod's name is displayed near the top of the console output, as
shown in the preceding example. Execute the kubectl logs pod_name command and grep
the output, as shown below:
$ kubectl logs spark-pi-1519683406605-driver | grep "is roughly"
Pi is roughly 3.1351356756783786
Spark Executor pods are cleaned up and deleted after they finish running. Therefore, their output
is not accessible.
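The kubectl logs and grep steps above can be wrapped in a small shell function when only the numeric estimate is needed. This is a minimal sketch, not part of the product: the function name extract_pi is hypothetical, and it assumes the driver log contains a line of the form "Pi is roughly <value>", as in the example output.

```shell
# Hypothetical helper (not shipped with Spark or Kubernetes):
# reads driver log text on stdin and prints only the last field of any
# line containing "is roughly", which is the numeric Pi estimate.
extract_pi() {
  grep "is roughly" | awk '{print $NF}'
}

# Usage against a live cluster, with the pod name taken from the console output:
#   kubectl logs spark-pi-1519683406605-driver | extract_pi
```

Because the helper reads from standard input, it can also be run on a saved copy of the driver log.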
Running a PySpark Pi Example Job
A PySpark Pi example job is very similar to the Scala Spark Pi job, but the information is specified slightly
differently. Any JAR files should be provided via the --jars flag.
$ bin/spark-submit --conf spark.app.name=pyspark-pi \
--jars local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
local:///opt/spark/examples/src/main/python/pi.py
Execute the kubectl logs pod_name command and grep the output, as shown below:
# kubectl logs pyspark-pi-1519684161476-driver | grep "is roughly"
Pi is roughly 3.141600
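When a PySpark job needs more than one JAR file, the --jars flag of spark-submit takes them as a single comma-separated list. The sketch below shows one way to build that list in the shell. The function name join_jars and the JAR paths in the usage comment are hypothetical and only illustrate the idea.

```shell
# Hypothetical helper: joins its arguments with commas, producing a value
# suitable for the --jars flag of spark-submit.
join_jars() {
  local IFS=,          # "$*" joins arguments using the first character of IFS
  printf '%s\n' "$*"
}

# Usage (illustrative paths, not taken from the product):
#   bin/spark-submit --conf spark.app.name=pyspark-pi \
#     --jars "$(join_jars local:///opt/libs/a.jar local:///opt/libs/b.jar)" \
#     local:///opt/spark/examples/src/main/python/pi.py
```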
Using HDFS
The HADOOP_CONF_DIR parameter will automatically be set to the appropriate value for the current user during
Spark start up.
How to Run Jobs and Use the Resource Staging Server
Provide the locations of the Spark JAR and any files on the local file system. They are loaded into the Spark
Resource Staging Server so that the resources are available inside the Spark container.
$ bin/spark-submit --class TriangleCounts --conf spark.app.name=spark-triangles \
/home/users/builder/nid00006/workspace/socrates-cactus-spark-tests/target/scala-2.11/spark-tests_2.11-1.0.jar \
/user/builder/datasets/cactus-spark-triangles/small-triangles.txt
Check the results by running the kubectl logs command with the driver pod's name, as in the preceding examples.
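If the driver pod's name has scrolled out of the console output, it can be picked out of a pod listing. The sketch below is an assumption-laden illustration: the function name find_driver_pods is hypothetical, and it assumes driver pods are named <app name>-<timestamp>-driver, which is the pattern seen in the examples above (for instance spark-pi-1519683406605-driver).

```shell
# Hypothetical helper: reads a pod listing on stdin (one pod per line, name
# in the first column) and prints the names that start with "<app>-" and
# end with "-driver". The naming pattern is inferred from the examples.
find_driver_pods() {
  awk -v app="$1" 'index($1, app "-") == 1 && $1 ~ /-driver$/ { print $1 }'
}

# Usage against a live cluster:
#   kubectl get pods | find_driver_pods spark-triangles
#   kubectl logs <pod name printed above>
```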