Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test;

import java.io.IOException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class Dataframetest {

    public static void main(String[] args) throws IOException, InterruptedException {

        SparkSession sess = SparkSession.builder()
                .appName("dataframetest")
                .master("local[*]")
                .getOrCreate();

        // Storage credentials; the actual key/value pairs are omitted here
        sess.conf().set("<key>", "<value>");
        sess.sparkContext().hadoopConfiguration().set("<key>", "<value>");

        Gson gson = new GsonBuilder().setPrettyPrinting().create();

        // Comma-separated list of input paths
        String inputPaths = "abfss://folder1/testing.orc";
        String[] inputFiles = inputPaths.split(",");

        // Read the ORC files and tag each row with its source file name
        Dataset<Row> csvRead = sess.read().format("orc").load(inputFiles)
                .withColumn("dg_filename", org.apache.spark.sql.functions.input_file_name())
                .withColumn("dg_metadata", org.apache.spark.sql.functions.lit(null).cast(DataTypes.StringType));

        csvRead.show(1000, false);
    }
}


With this program we can successfully submit the job to the cluster, and it completes successfully. But I am not able to get the job status and group ID in the code. I need to get the job status within the program for internal use.

Can anyone please help me with this?

Jul 23, 2020 in Apache Spark by kamboj

Hi @Kamboj,

Are you facing any error along the way, or are you just not able to get the job status and group ID in the code?

Hi @Gitika,

I am not facing any error. I am just not sure how to get the job status, job ID, etc. in the code (using Spark with Databricks). I have also tried the code below in my program.

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
JavaSparkStatusTracker statusTracker = jsc.statusTracker();

int[] activeJobIds = statusTracker.getActiveJobIds();
for (int jobId : activeJobIds) {
    SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
    System.out.println("Job " + jobId + " status is " + jobInfo.status().name());
    System.out.println("Stages status:");

    for (int stageId : jobInfo.stageIds()) {
        SparkStageInfo stageInfo = statusTracker.getStageInfo(stageId);
        System.out.println("Stage id=" + stageId + "; name = " + stageInfo.name()
                + "; completed tasks: " + stageInfo.numCompletedTasks()
                + "; active tasks: " + stageInfo.numActiveTasks()
                + "; all tasks: " + stageInfo.numTasks()
                + "; submission time: " + stageInfo.submissionTime());
    }
}

However, the method statusTracker.getActiveJobIds() does not return any job IDs.

No success yet. Can anyone help me get the status of the Spark job?
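
One thing worth noting: getActiveJobIds() only reports jobs that are running at the instant it is called, and in the snippet above it runs on the driver thread while no action is in progress, so it finds nothing. Below is a minimal sketch of polling the tracker from a background thread while an action runs; it reuses sess and csvRead from the program above, and the variable names and polling interval are illustrative, not from the original code.

// Poll the status tracker from a background thread while an action is running
JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
JavaSparkStatusTracker statusTracker = jsc.statusTracker();

Thread monitor = new Thread(() -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            for (int jobId : statusTracker.getActiveJobIds()) {
                SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
                if (jobInfo != null) {
                    System.out.println("Job " + jobId + " status: " + jobInfo.status().name());
                }
            }
            Thread.sleep(500); // illustrative polling interval
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});
monitor.setDaemon(true);
monitor.start();

csvRead.show(1000, false); // the action whose jobs the tracker observes
monitor.interrupt();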

@Kamboj,

You can run Databricks Jobs CLI subcommands by appending them to databricks jobs, and job-run commands by appending them to databricks runs.

Bash
databricks jobs -h
databricks runs -h

Is there any option to get the job status by using Databricks or Spark classes and methods, as I have been trying in my code snippet? I am using the JavaSparkStatusTracker class but am not getting the job status.

I don't know whether it will work that way, but if your requirement is to find the job ID and status, then you can put the databricks command in a script and run it.

Thanks for the quick response, MD.

Could you please elaborate a little more on how I can use it and how it will help me get the status for a particular ID? I think it will return the results for all running IDs. What if I need the status for a particular job ID?
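
If the requirement is the status of one particular run, the legacy Databricks CLI has subcommands for that. For example (the run ID below is just a placeholder, and this assumes the CLI is installed and configured against your workspace):

Bash
databricks runs list
databricks runs get --run-id <run-id>

The output of databricks runs get includes the run's life-cycle and result state.
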
Hi,

You can track the status of jobs from inside the application by registering a SparkListener with SparkContext.addSparkListener(); the listener's callbacks run on the driver for every job the application triggers.
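
A minimal sketch of that approach is below; the listener body, the job-group name, and the sample action are illustrative, not taken from the original program.

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;
import org.apache.spark.sql.SparkSession;

public class JobStatusListenerExample {

    public static void main(String[] args) {
        SparkSession sess = SparkSession.builder()
                .appName("job-status-listener")
                .master("local[*]")
                .getOrCreate();

        // Register the listener before triggering any action
        sess.sparkContext().addSparkListener(new SparkListener() {
            @Override
            public void onJobStart(SparkListenerJobStart jobStart) {
                // The job group id (if one was set) travels in the job's local properties
                String group = jobStart.properties() == null
                        ? null
                        : jobStart.properties().getProperty("spark.jobGroup.id");
                System.out.println("Job " + jobStart.jobId() + " started, group = " + group);
            }

            @Override
            public void onJobEnd(SparkListenerJobEnd jobEnd) {
                System.out.println("Job " + jobEnd.jobId() + " finished: " + jobEnd.jobResult());
            }
        });

        // Tag subsequent work with a job group so the listener (and the status tracker) can identify it
        sess.sparkContext().setJobGroup("my-job-group", "sample job group", false);

        // Any action (count, show, ...) fires the listener callbacks
        sess.range(1000).count();

        sess.stop();
    }
}

Because the callbacks fire on the driver for every job in the application, this also covers the jobs launched by show() in the original program.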


