Trending questions in Apache Spark

0 votes

1 answer

How to create a distance vector in PySpark (Euclidean distance)

Hi@dani, You can find the euclidean distance using ...READ MORE

Oct 16, 2020 in Apache Spark by MD
• 95,460 points • 5,740 views

0 votes

1 answer

9) In Scala, which one of the following will print the top 10 resolutions to the console, assuming that sfpdDF is the DataFrame registered as a table named sfpd?

Hey, @Ritu, According to the question, the answer ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points • 3,692 views

0 votes

1 answer

4) Spark Streaming converts streaming data into DStreams. Which one of the given statements about DStreams is true?

Hi@ritu, Spark DStream (Discretized Stream) is the basic ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,460 points • 3,539 views

0 votes

0 answers

What allows Spark to periodically persist data about an application such that it can recover from failures? [closed]

What allows Spark to periodically persist data ...READ MORE

Nov 26, 2020 in Apache Spark by ritu
• 960 points
closed Nov 26, 2020 by MD • 3,443 views

+1 vote

1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,460 points • 4,232 views

0 votes

1 answer

Which one of the following commands is used to see the structure of the DataFrame?

Hi @Ritu If you want to see the ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 3,297 views

0 votes

1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi@ritu, Spark's internal scheduler may truncate the lineage of the RDD graph if ...READ MORE

Nov 25, 2020 in Apache Spark by akhtar
• 38,260 points • 3,281 views

0 votes

1 answer

Spark Core: How to fetch the max n rows of an RDD without using Rdd.max()

Hi@Prasant, If Spark Streaming is not supporting tuple, ...READ MORE

Dec 3, 2020 in Apache Spark by MD
• 95,460 points • 2,909 views

0 votes

1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...READ MORE

Oct 30, 2020 in Apache Spark by MD
• 95,460 points • 4,295 views

0 votes

1 answer

6) What allows Spark Streaming to provide fault tolerance for network sources of data?

Hi@ritu, Fault tolerance is the property that enables ...READ MORE

Dec 1, 2020 in Apache Spark by MD
• 95,460 points • 3,040 views

0 votes

0 answers

foreachPartition called with Python functions does not output data to HDFS when executed in YARN mode

Hi, I need help with my code. Trying ...READ MORE

Jan 22, 2021 in Apache Spark by anonymous

edited Mar 4 • 287 views

0 votes

1 answer

How do you load this multiline data in Spark as a single record?

Hi@Ruben, I think you can add an escape ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,460 points • 2,856 views

0 votes

0 answers

A Dataframe can be created from an existing RDD. You would create the Dataframe from the existing RDD by inferring schema using case classes in which one of the given classes? [closed]

A Dataframe can be created from an ...READ MORE

Nov 25, 2020 in Apache Spark by Edureka
• 200 points
closed Nov 26, 2020 by MD • 2,745 views

0 votes

1 answer

What does the following code print?

error: expected class or object definition sc.parallelize (Array(1L, ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 2,599 views

0 votes

1 answer

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi, @Ritu, According to the official documentation of Spark 1.2, ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points • 2,635 views

0 votes

1 answer

16) What allows Spark to periodically persist data about an application such that it can recover from failures?

Hi@Edureka, Checkpointing is a process of truncating RDD ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points • 2,536 views

+1 vote

1 answer

How to assign a column in Spark Dataframe (PySpark) as a Primary Key?

Spark does not have any concept of ...READ MORE

Jan 12, 2020 in Apache Spark by Sirish
• 160 points • 16,185 views

0 votes

1 answer

In AWS, if a user wants to run Spark, on top of which one of the following can they do it?

Hi@ritu, AWS has lots of services. For spark ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points • 2,136 views

0 votes

1 answer

Which one of the following commands is used to start python-spark?

Hi@ritu, To start your python spark shell, you ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points • 2,121 views

0 votes

0 answers

17) From the given choices, identify the value returned by $"whatever"?

17) From the given choices, identify the value ...READ MORE

Nov 25, 2020 in Apache Spark by ritu
• 960 points • 2,015 views

0 votes

1 answer

What does the below code print?

Option d) Run time error. READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 1,838 views

0 votes

1 answer

From the code below, what is the most appropriate next step in the ML process?

Hi@ritu, The most appropriate step according to me ...READ MORE

Nov 25, 2020 in Apache Spark by MD
• 95,460 points • 1,836 views

0 votes

1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi@akhtar, You can write the spark dataframe in ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,460 points • 4,577 views

0 votes

1 answer

13) Refer to the input and identify the output if the code below is run

Option c) Run time error - A READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 1,623 views

0 votes

1 answer

What is the output of the following code?

After executing your code, there is an ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 1,618 views

0 votes

1 answer

What class is declared in the code below?

Option D: String class READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,730 points • 1,537 views

0 votes

1 answer

What is the output of the following code?

error: expected class or object definition sc.parallelize(Array(1L,("SFO")),(2L,("ORD")),(3L,("DFW")))) ^ one error ...READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,730 points • 1,524 views

0 votes

1 answer

How to read Avro Partition Data?

Hi@akhtar, When we try to retrieve the data ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,460 points • 2,446 views

0 votes

0 answers

What does the below code print? [closed]

What does the below code print? val AgeDs ...READ MORE

Nov 25, 2020 in Apache Spark by ritu
• 960 points
closed Nov 25, 2020 by Gitika • 1,582 views

0 votes

1 answer

From the following graph code, which code snippet will return the number of flight routes?

Hey, @Ritu, I am getting an error in your ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,730 points • 1,474 views

0 votes

1 answer

2) What will be printed when the code below is executed?

Hi, @Ritu, List(5,100,10) is printed. The take method returns the first n elements in ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points • 1,335 views

0 votes

1 answer

Spark - how to solve the question below?

option d, Runtime error READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points • 1,293 views

0 votes

0 answers

What is the output of the following code? [closed]

What is the output of the following ...READ MORE

Nov 25, 2020 in Apache Spark by Edureka
• 200 points
closed Nov 26, 2020 by MD • 1,173 views

0 votes

1 answer

How to implement my own clustering algorithm in PySpark (without using a ready-made library implementation such as k-means)?

Hi@dani, As you said you are a beginner ...READ MORE

Oct 14, 2020 in Apache Spark by MD
• 95,460 points • 2,832 views

0 votes

1 answer

Facing an issue while reading a TSV file in PySpark

Hi@khyati, You are getting this type of output ...READ MORE

Sep 28, 2020 in Apache Spark by MD
• 95,460 points • 3,233 views

0 votes

1 answer

File not found exception while processing a Spark job in YARN cluster mode on a multinode Hadoop cluster

Hi@Ganendra, I am not sure what's the issue, ...READ MORE

Jul 30, 2020 in Apache Spark by MD
• 95,460 points • 5,760 views

+1 vote

1 answer

How to convert pyspark Dataframe to pandas Dataframe?

Hi@akhtar, To convert pyspark dataframe into pandas dataframe, ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,460 points • 8,773 views

0 votes

2 answers

Error : split value is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...READ MORE

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy.
• 13,315 views

0 votes

1 answer

Error: sql.out:Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 381610 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB)

This type of error tends to occur ...READ MORE

Apr 29, 2020 in Apache Spark by MD
• 95,460 points • 9,032 views

0 votes

1 answer

How to index a CSV file with no header? After converting the CSV to a DataFrame, I need to name the columns in order to normalize with MinMaxScaler.

Hi@Manas, You can read your dataset from CSV ...READ MORE

Sep 10, 2020 in Apache Spark by MD
• 95,460 points • 3,185 views

0 votes

1 answer

Py4JJavaError: An error occurred while calling o310.csv. : java.net.ConnectException: Call From master/192.168.56.101 to master:9000

Hi@akhtar, I think your HDFS cluster is not ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,460 points • 8,302 views

0 votes

1 answer

Ranger KMS - Curl command

Hi@Shllpa, In general, we get the 401 status code ...READ MORE

Sep 29, 2020 in Apache Spark by MD
• 95,460 points • 1,967 views

0 votes

1 answer

I am not able to run the Apache Spark program on macOS

Hi@Srinath, It seems you didn't set Hadoop for ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,460 points • 1,967 views

0 votes

1 answer

Unable to submit a Spark job in deployment mode on a multinode cluster (using Ubuntu machines) with YARN as master

Hi@Ganendra, As you said you launched a multinode cluster, ...READ MORE

Jul 29, 2020 in Apache Spark by MD
• 95,460 points • 3,532 views

0 votes

0 answers

Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test; import java.io.IOException; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import org.apache.spark.SparkContext; import org.apache.spark.SparkJobInfo; import ...READ MORE

Jul 23, 2020 in Apache Spark by kamboj
• 140 points
recategorized Jul 28, 2020 by Gitika • 3,508 views

0 votes

1 answer

How to create a non-null column in a case class in Spark

Hi@Deepak, In your test class you passed empid ...READ MORE

May 14, 2020 in Apache Spark by MD
• 95,460 points • 6,009 views

0 votes

1 answer

Spark: java.sql.SQLException: No suitable driver

The missing driver is the JDBC one ...READ MORE

Jul 24, 2019 in Apache Spark by John
• 18,698 views

0 votes

1 answer

How can I get all executors' pending jobs and stages for a particular SparkSession?

Hi@Neha, You can find all the job status ...READ MORE

Aug 19, 2020 in Apache Spark by MD
• 95,460 points • 1,753 views

0 votes

1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

Apr 7, 2020 in Apache Spark by MD
• 95,460 points • 7,151 views

0 votes

1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less csv file ...READ MORE

Jul 14, 2019 in Apache Spark by Puneet
• 18,342 views
