Can anyone explain what is RDD in Spark

+1 vote

I am new to Apache Spark and when I was going through RDD, it said RDD is an immutable distributed collection of objects. I am confused about this statement. Next, I read that RDD can be created by 2 ways: 1st is by loading from external data source & 2nd is by distributing object collection from the driver program.

Can anyone explain to me what is RDD? 

May 24, 2018 in Apache Spark by coder_jazz
2,694 views

1 answer to this question.

0 votes

RDD is a fundamental data structure of Spark. Each dataset in RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.

There are two ways to create RDDs − parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared file system, HDFS, HBase, or any data source offering a Hadoop Input Format.

answered May 24, 2018 by Shubham
• 13,490 points

Related Questions In Apache Spark

0 votes
1 answer

How is RDD in Spark different from Distributed Storage Management? Can anyone help me with this ?

Some of the key differences between an RDD and ...READ MORE

answered Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,520 views
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in spark Fold is a very powerful ...READ MORE

answered Aug 23, 2018 in Apache Spark by samarth295
• 2,220 points
12,655 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

answered Aug 28, 2018 in Apache Spark by shams
• 3,670 points
43,074 views
0 votes
1 answer

What is RDD in Apache spark?

Hi, RDD in spark stands for REsilient distributed ...READ MORE

answered Jul 1, 2019 in Apache Spark by Gitika
• 65,770 points
1,429 views
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 13,490 points
3,441 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points
13,542 views
0 votes
1 answer

Convert the given Spar rdd object to Spark DataFrame.

You can create a DataFrame from the ...READ MORE

answered Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
1,104 views
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

answered Jul 9, 2018 in Apache Spark by zombie
• 3,790 points
20,643 views
+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points
8,455 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

answered Jul 4, 2019 in Apache Spark by Dhara dhruve
6,110 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP