package com.dataguise.test;

import java.io.IOException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
public class Dataframetest {

    public static void main(String[] args) throws IOException, InterruptedException {
        SparkSession sess = SparkSession.builder()
                .appName("dataframetest")
                .master("local[*]")
                .getOrCreate();

        // Placeholders: substitute the real Spark and Hadoop configuration keys/values here.
        sess.conf().set("<spark-conf-key>", "<spark-conf-value>");
        sess.sparkContext().hadoopConfiguration().set("<hadoop-conf-key>", "<hadoop-conf-value>");

        Gson gson = new GsonBuilder().setPrettyPrinting().create();

        // Comma-separated list of input paths, each read as ORC.
        String inputPaths = "abfss://folder1/testing.orc";
        String[] inputFiles = inputPaths.split(",");

        Dataset<Row> orcRead = sess.read().format("orc").load(inputFiles)
                .withColumn("dg_filename", functions.input_file_name())
                .withColumn("dg_metadata", functions.lit(null).cast(DataTypes.StringType));
        orcRead.show(1000, false);
    }
}
With this program we are able to submit the job on the cluster and it completes successfully. However, I am not able to get the job status or the job group ID from inside the code, and I need the job status programmatically for internal use.
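Since the DataFrame work runs in the driver's own JVM, my understanding is that the status tracker should be able to report it once the jobs are tagged with a group ID. Below is a minimal sketch of what I am hoping for, not working code; the group name "dg-scan", the description, and the single count() action are placeholders I made up:

package com.dataguise.test;

import org.apache.spark.SparkJobInfo;
import org.apache.spark.SparkStageInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;
import org.apache.spark.sql.SparkSession;

public class StatusTrackerSketch {

    public static void main(String[] args) {
        SparkSession sess = SparkSession.builder()
                .appName("statustracker-sketch")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(sess.sparkContext());

        // Tag every job triggered below with a group ID so it can be looked up afterwards.
        jsc.setJobGroup("dg-scan", "dataframe scan for status tracking", false);
        sess.read().format("orc").load("abfss://folder1/testing.orc").count(); // placeholder action

        // In a real application this lookup would run on a separate thread while the action is in flight.
        JavaSparkStatusTracker tracker = jsc.statusTracker();
        for (int jobId : tracker.getJobIdsForGroup("dg-scan")) {
            SparkJobInfo job = tracker.getJobInfo(jobId);
            if (job == null) continue; // job info may already have been cleaned up
            System.out.println("job " + jobId + ": " + job.status());
            for (int stageId : job.stageIds()) {
                SparkStageInfo stage = tracker.getStageInfo(stageId);
                if (stage != null) {
                    System.out.println("  stage " + stageId + ": "
                            + stage.numCompletedTasks() + "/" + stage.numTasks() + " tasks complete");
                }
            }
        }
        sess.stop();
    }
}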
In our real setup the application is submitted through SparkLauncher rather than run with local[*], so getting the state from the launcher side would also work for us.
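A rough sketch of that launcher-side idea, again with made-up placeholders (the jar path, main class, master, and one-hour timeout are all assumptions, not values from our submission code):

package com.dataguise.test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherStatusSketch {

    public static void main(String[] args) throws Exception {
        CountDownLatch finished = new CountDownLatch(1);

        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/dataframetest.jar") // placeholder
                .setMainClass("com.dataguise.test.Dataframetest")
                .setMaster("yarn")                            // placeholder
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        // Fires on every state transition: CONNECTED, SUBMITTED, RUNNING, FINISHED, ...
                        System.out.println("state: " + h.getState() + ", appId: " + h.getAppId());
                        if (h.getState().isFinal()) {
                            finished.countDown();
                        }
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) {
                        // Fires when application info (e.g. the application ID) becomes available.
                    }
                });

        // Wait for the application to reach a terminal state (bounded so we never hang forever).
        finished.await(1, TimeUnit.HOURS);
        System.out.println("final state: " + handle.getState());
    }
}

Anyone, please help me with this.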