Spark workers are not accepting any job Kubernetes-Docker-Spark

0 votes

I'm trying to create a distributed Spark cluster on Kubernetes. For this, I've created a Kubernetes cluster, and on top of it I'm trying to create a Spark cluster. My Dockerfile is:

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License
ARG BASE_CONTAINER=jupyter/scipy-notebook
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

USER root

# Spark dependencies
ENV SPARK_VERSION 2.3.2
ENV SPARK_HADOOP_PROFILE 2.7
ENV SPARK_SRC_URL https://www.apache.org/dist/spark/spark-$SPARK_VERSION/spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz
ENV SPARK_HOME=/opt/spark
ENV PATH $PATH:$SPARK_HOME/bin

RUN apt-get update && \
     apt-get install -y openjdk-8-jdk-headless \
     postgresql && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/

ENV PATH $PATH:$JAVA_HOME/bin

RUN wget ${SPARK_SRC_URL}

RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz

RUN mv spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE} /opt/spark

RUN rm -f spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz

USER $NB_UID
ENV POST_URL https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
RUN wget ${POST_URL}
RUN mv postgresql-42.2.5.jar $SPARK_HOME/jars
# Install pyarrow
RUN conda install --quiet -y 'pyarrow' && \
    conda install pyspark==2.3.2 && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER


USER root

ADD log4j.properties /opt/spark/conf/log4j.properties
ADD start-common.sh start-worker.sh start-master.sh /
ADD loop.sh $SPARK_HOME/bin/
ADD core-site.xml /opt/spark/conf/core-site.xml
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
RUN chmod +x $SPARK_HOME/bin/loop.sh
RUN chmod +x /start-master.sh
RUN chmod +x /start-common.sh
RUN chmod +x /start-worker.sh
ENV PATH $PATH:/opt/spark/bin/loop.sh

RUN apt-get update
RUN apt-get install curl -y

WORKDIR /

My master and worker YAML files are:

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      hostname: spark-master
      containers:
        - name: spark-master
          image: hrafiq/dockerhub:spark-jovyan-local
          command: ["sh", "/start-master.sh", "run"]
          imagePullPolicy: Always
          ports:
            - containerPort: 7077
              hostPort: 7077
            - containerPort: 8080
              hostPort: 8080
            - containerPort: 6066
              hostPort: 6066
            - containerPort: 7001
              hostPort: 7001
            - containerPort: 7002
              hostPort: 7002
            - containerPort: 7003
              hostPort: 7003
            - containerPort: 7004
              hostPort: 7004
            - containerPort: 7005
              hostPort: 7005
            - containerPort: 4040
              hostPort: 4040
          env:
            - name: SPARK_PUBLIC_DNS
              value: 192.168.1.254
            - name: SPARK_MASTER_IP
              value: 192.168.1.254

And the worker file:

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
spec:
  replicas: 2
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: hrafiq/dockerhub:spark-jovyan-local
          command: ["sh", "/start-worker.sh","run"]
          imagePullPolicy: Always
          ports:
            - containerPort: 8081
            - containerPort: 7012
            - containerPort: 7013
            - containerPort: 7014
            - containerPort: 5001
            - containerPort: 5003
            - containerPort: 8881

The workers get registered with the master, but they are still unable to execute any tasks: no cores are assigned to the executors and no job runs. This error is displayed:

"Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
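Before digging into the cluster config, it can help to confirm what state the pods and the Spark daemons are actually in. A minimal diagnostic sketch, assuming a setup like the one above (the pod name suffixes below are placeholders, not real names from this cluster):

```shell
# Are the master and both workers actually Running, and on which nodes/IPs?
kubectl get pods -o wide

# Did the workers register, and are there connection errors?
# (replace the xxxxx suffixes with your real pod names)
kubectl logs spark-master-controller-xxxxx
kubectl logs spark-worker-controller-xxxxx

# Then open the master UI on port 8080 and compare the cores/memory the
# workers advertise against what the submitted job is requesting.
```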

This is the Spark UI:

Feb 27, 2019 in Apache Spark by Hamza
Hey @Hamza, are your Spark slaves running?
You usually get this error when there aren't enough resources. Check the master UI for the registered workers and their resources, compare them with your spark-submit settings, and see what's missing.

Are you able to dynamically assign the cores?

Refer to the code below. In your application:

val sc = new SparkContext(new SparkConf())

Then pass the core count at submit time:

./bin/spark-submit <all your existing options> --conf spark.driver.cores=1
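To rule out a resource mismatch, the executor demands can also be capped explicitly so the request fits what the workers advertise. A hedged sketch for standalone mode; the master URL and application path are placeholders for this setup:

```shell
# Ask for no more than the workers offer (values here are illustrative).
./bin/spark-submit \
  --master spark://spark-master:7077 \
  --total-executor-cores 2 \
  --executor-memory 512m \
  --conf spark.driver.cores=1 \
  /path/to/your-app.py
```

If the job still shows zero assigned cores with requests this small, the problem is usually network reachability between driver, master, and workers rather than resources.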

1 answer to this question.

+1 vote

When Kubernetes picks a 10.*.*.*/16 network as its pod network, the jobs execute successfully. When it picks a 192.168.*.*/16 subnet as its pod network, jobs do not execute. The 192.168.*.* range is likely conflicting with the existing LAN.
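A quick way to check which pod CIDR the cluster ended up with, and to pin a non-conflicting one on a kubeadm-based cluster (a sketch; assumes kubeadm and that 10.244.0.0/16 does not overlap your LAN):

```shell
# Inspect the pod CIDR the controller manager is using.
kubectl cluster-info dump | grep -m1 -- --cluster-cidr

# When (re)creating the cluster, pick a range that does not overlap the
# 192.168.x.x LAN, e.g. the 10.244.0.0/16 default used by Flannel:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
```

The CNI plugin's own config (e.g. Flannel's or Calico's manifest) must be set to the same range.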

answered Mar 1, 2019 by Hamza
Yeah, that's a possibility.

The image you've attached is very unclear; could you add another, clearer one?
