Trending questions in Apache Spark

0 votes
1 answer

How to create a distance vector in PySpark (Euclidean distance)

Hi@dani, You can find the euclidean distance using ...READ MORE

Oct 16, 2020 in Apache Spark by MD
• 95,460 points
4,941 views
0 votes
0 answers

What allows spark to periodically persist data about an application such that it can recover from failures? [closed]

What allows spark to periodically persist data ...READ MORE

Nov 26, 2020 in Apache Spark by ritu
• 960 points

closed Nov 26, 2020 by MD 3,086 views
0 votes
1 answer

4) Spark Streaming converts streaming data into DStreams. Which one of the given statements about DStreams is true?

Hi@ritu, Spark DStream (Discretized Stream) is the basic ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,460 points
2,922 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi@ritu, Spark's internal scheduler may truncate the lineage of the RDD graph if ...READ MORE

Nov 25, 2020 in Apache Spark by akhtar
• 38,260 points
2,810 views
0 votes
1 answer

Spark Core: How to fetch the max n rows of an RDD without using Rdd.max()

Hi@Prasant, If Spark Streaming is not supporting tuple, ...READ MORE

Dec 3, 2020 in Apache Spark by MD
• 95,460 points
2,324 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,460 points
3,501 views
0 votes
1 answer

Which one of the following commands is used to see the structure of the DataFrame?

Hi @Ritu If you want to see the ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
2,628 views
0 votes
1 answer

6) What allows Spark Streaming to provide fault tolerance for network sources of data?

Hi@ritu, Fault tolerance is the property that enables ...READ MORE

Dec 1, 2020 in Apache Spark by MD
• 95,460 points
2,543 views
0 votes
0 answers

foreachPartition called with Python functions does not output data to HDFS when executed in YARN mode

Hi, I need help with my code. Trying ...READ MORE

Jan 22, 2021 in Apache Spark by anonymous

edited 4 days ago 6 views
0 votes
1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...READ MORE

Oct 30, 2020 in Apache Spark by MD
• 95,460 points
3,517 views
0 votes
1 answer

How do you load this multiline data in spark as a single record?

Hi@Ruben, I think you can add an escape ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,460 points
2,355 views
0 votes
1 answer

What does the following code print?

error: expected class or object definition sc.parallelize (Array(1L, ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
2,123 views
0 votes
1 answer

16) What allows Spark to periodically persist data about an application such that it can recover from failures?

Hi@Edureka, Checkpointing is a process of truncating RDD ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
2,037 views
0 votes
1 answer

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi, @Ritu, According to the official documentation of Spark 1.2, ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,770 points
2,110 views
0 votes
1 answer

Which one of the following commands is used to start python-spark?

Hi@ritu, To start your python spark shell, you ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
1,585 views
0 votes
0 answers

17) From the given choices, identify the value returned by $"whatever".

17)from the given choices, identify the value ...READ MORE

Nov 25, 2020 in Apache Spark by ritu
• 960 points
1,673 views
0 votes
1 answer

In AWS, if a user wants to run Spark, on top of which one of the following can the user do it?

Hi@ritu, AWS has lots of services. For spark ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
1,561 views
0 votes
1 answer

From the code below, what is the most appropriate next step in the ML process?

Hi@ritu, The most appropriate step according to me ...READ MORE

Nov 25, 2020 in Apache Spark by MD
• 95,460 points
1,358 views
0 votes
1 answer

What does the below code print?

Option d) Run time error. READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
1,349 views
0 votes
1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi@akhtar, You can write the spark dataframe in ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,460 points
4,067 views
0 votes
0 answers

What does the below code print? [closed]

What does the below code print? val AgeDs ...READ MORE

Nov 25, 2020 in Apache Spark by ritu
• 960 points

closed Nov 25, 2020 by Gitika 1,257 views
0 votes
1 answer

What class is declared in the code below?

Option D: String class READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,770 points
1,139 views
0 votes
1 answer

13) Refer to the input and identify the output if the code below is run

Option c)  Run time error - A READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
1,176 views
0 votes
1 answer

How to read Avro Partition Data?

Hi@akhtar, When we try to retrieve the data ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,460 points
2,004 views
0 votes
1 answer

What is the output of the following code?

After executing your code, there is an ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
1,080 views
0 votes
1 answer

From the following graph code, which code snippet will return the number of flight routes?

Hey, @Ritu, I am getting error in your ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,770 points
1,077 views
0 votes
1 answer

What is the output of the following code?

error: expected class or object definition sc.parallelize(Array(1L,("SFO")),(2L,("ORD")),(3L,("DFW")))) ^ one error ...READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,770 points
1,003 views
+1 vote
1 answer

How to assign a column in Spark Dataframe (PySpark) as a Primary Key?

Spark does not have any concept of ...READ MORE

Jan 12, 2020 in Apache Spark by Sirish
• 160 points
14,634 views
0 votes
0 answers

What is the output of the following code? [closed]

What is the output of the following ...READ MORE

Nov 25, 2020 in Apache Spark by Edureka
• 200 points

closed Nov 26, 2020 by MD 875 views
0 votes
1 answer

Spark - how to solve the question below?

option d, Runtime error READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,770 points
862 views
0 votes
1 answer

2) What will be printed when the code below is executed?

Hi, @Ritu, List(5,100,10) is printed. The take method returns the first n elements in ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,770 points
860 views
0 votes
1 answer

Facing an issue while reading a TSV file in PySpark

Hi@khyati, You are getting this type of output ...READ MORE

Sep 28, 2020 in Apache Spark by MD
• 95,460 points
2,691 views
0 votes
1 answer

How to implement my clustering algorithm in PySpark (without using a ready-made library, for example k-means)?

Hi@dani, As you said you are a beginner ...READ MORE

Oct 14, 2020 in Apache Spark by MD
• 95,460 points
1,850 views
0 votes
1 answer

File not found exception while processing a Spark job in YARN cluster mode with a multinode Hadoop cluster

Hi@Ganendra, I am not sure what's the issue, ...READ MORE

Jul 30, 2020 in Apache Spark by MD
• 95,460 points
4,846 views
+1 vote
1 answer

How to convert pyspark Dataframe to pandas Dataframe?

Hi@akhtar, To convert pyspark dataframe into pandas dataframe, ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,460 points
8,305 views
0 votes
1 answer

Ranger KMS - Curl command

Hi@Shllpa, In general, we get the 401 status code ...READ MORE

Sep 29, 2020 in Apache Spark by MD
• 95,460 points
1,545 views
0 votes
2 answers

Error : split value is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...READ MORE

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy.
12,265 views
0 votes
1 answer

Py4JJavaError: An error occurred while calling o310.csv. : java.net.ConnectException: Call From master/192.168.56.101 to master:9000

Hi@akhtar, I think your HDFS cluster is not ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,460 points
7,667 views
0 votes
1 answer

I am not able to run the Apache Spark program on macOS

Hi@Srinath, It seems you didn't set Hadoop for ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,460 points
1,469 views
0 votes
1 answer

Unable to submit the Spark job in deployment mode - multinode cluster (using Ubuntu machines) with YARN master

Hi@Ganendra, As you said you launched a multinode cluster, ...READ MORE

Jul 29, 2020 in Apache Spark by MD
• 95,460 points
2,481 views
0 votes
1 answer

How can I get all executors' pending jobs and stages of a particular SparkSession?

Hi@Neha, You can find all the job status ...READ MORE

Aug 19, 2020 in Apache Spark by MD
• 95,460 points
1,317 views
0 votes
0 answers

Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test; import java.io.IOException; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import org.apache.spark.SparkContext; import org.apache.spark.SparkJobInfo; import ...READ MORE

Jul 23, 2020 in Apache Spark by kamboj
• 140 points

recategorized Jul 28, 2020 by Gitika 2,471 views
0 votes
1 answer

How to create a not-null column in a case class in Spark

Hi@Deepak, In your test class you passed empid ...READ MORE

May 14, 2020 in Apache Spark by MD
• 95,460 points
5,320 views
0 votes
1 answer

Spark: java.sql.SQLException: No suitable driver

The missing driver is the JDBC one ...READ MORE

Jul 24, 2019 in Apache Spark by John
17,891 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

Apr 7, 2020 in Apache Spark by MD
• 95,460 points
6,560 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less csv file ...READ MORE

Jul 14, 2019 in Apache Spark by Puneet
17,825 views