Join in RDD using keys

Hi Team,

How can I join two RDDs on their keys without converting them into DataFrames?

rdd_x=(k1, V_x)
rdd_y=(k1, V_y)

The result should look like this: (k1, (V_x, V_y))

Aug 2, 2019 in Apache Spark by Jishan
• 8,730 views

1 answer to this question.

Suppose you have two datasets: results (id, result) and students (name, id). You can join them on the common key id by mapping each one to a pair RDD keyed by id and then calling join, using the commands below in Spark.

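case class Result(roll_id: Int, result: String)
case class Student(name: String, roll_id: Int)

// Read the tab-separated input files; mention the complete path for each input dataset
val a = sc.textFile("file:///home/edureka/Desktop/all-files/datsets/f1").map(_.split("\t"))
val b = sc.textFile("file:///home/edureka/Desktop/all-files/datsets/f2").map(_.split("\t"))

// Key both RDDs by roll_id so that each becomes a pair RDD
val class_a = a.map(z => (z(0).toInt, Result(z(0).toInt, z(1))))
val class_b = b.map(z => (z(1).toInt, Student(z(0), z(1).toInt)))

// Inner join on the key; the result type is RDD[(Int, (Result, Student))]
val v_join = class_a.join(class_b)

// collect() brings the rows to the driver so they print in the shell, not on the executors
v_join.collect().foreach(println)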
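For the exact shape in the question, (k1, V_x) joined with (k1, V_y), here is a minimal sketch that needs no input files. It is only an illustration: the object name RddJoinSketch, the helper joinByKey, the sample keys and values, and the local[*] master are all assumptions made for the example, not part of the original datasets.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddJoinSketch {
  // Hypothetical helper: inner join of two pair RDDs on their shared key.
  def joinByKey(x: RDD[(String, String)],
                y: RDD[(String, String)]): RDD[(String, (String, String))] =
    x.join(y)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode for a quick try-out; in spark-shell, reuse the provided sc instead.
    val conf = new SparkConf().setAppName("rdd-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data in the (key, value) shape from the question.
      val rddX = sc.parallelize(Seq(("k1", "V_x"), ("k2", "only_in_x")))
      val rddY = sc.parallelize(Seq(("k1", "V_y"), ("k3", "only_in_y")))

      // Only k1 exists in both RDDs, so the inner join keeps just that key.
      joinByKey(rddX, rddY).collect().foreach(println) // prints (k1,(V_x,V_y))

      // leftOuterJoin keeps every key of rddX; missing right-hand values become None.
      rddX.leftOuterJoin(rddY).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

join is an inner join, so keys present in only one RDD are dropped. If you need to keep them, leftOuterJoin, rightOuterJoin and fullOuterJoin are available on pair RDDs and wrap the possibly missing side in an Option.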

answered Aug 2, 2019 by Trisha
