How to use RDD filter with other functions

0 votes
I am working with Spark RDDs. I know how to filter an RDD, for example val y = rdd.filter(e => e%2==0), but I do not know how to combine filter with another function such as Row.

In val rst = rdd.map(ab => Row(ab.a, ab.b)), I want to filter out ab.b > 0. I tried putting filter in several places, but none of them worked.

Can someone help?
Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points
9,680 views

2 answers to this question.

0 votes

I'm not sure what "filter out" means here: do you want to keep those entries, or do you want to get rid of them? If you want to drop all entries with ab.b > 0, note that RDD has no filterNot method (unlike Scala collections), so negate the predicate inside filter:

val result = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))

If you want to retain only the entries with ab.b > 0, then try

val result = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))

The underscore _ stands for the lambda's parameter, so the line above is simply the shorter form of

val result = rdd.filter(ab => ab.b > 0).map(ab => Row(ab.a, ab.b))
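
For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of both cases. The case class AB, the object name FilterThenMap, the helper names, and the sample data are assumptions made up for illustration; the question does not show the real element type of rdd. It runs on a local master and needs the spark-sql dependency on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical element type standing in for the question's `ab` records.
case class AB(a: String, b: Int)

object FilterThenMap {
  // Keep only the records with b > 0, then turn each one into a Row.
  def keepPositive(rdd: RDD[AB]): RDD[Row] =
    rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))

  // Drop the records with b > 0. RDD only exposes filter, so the
  // predicate is negated instead of calling a filterNot method.
  def dropPositive(rdd: RDD[AB]): RDD[Row] =
    rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `sc` already exists.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("filter-then-map")
      .getOrCreate()

    // Made-up sample data.
    val rdd = spark.sparkContext.parallelize(Seq(AB("x", 1), AB("y", -2), AB("z", 3)))

    keepPositive(rdd).collect().foreach(println) // the rows for x and z
    dropPositive(rdd).collect().foreach(println) // the row for y

    spark.stop()
  }
}
```

The filter goes before the map because the predicate needs the b field of the original record; filtering after the map would mean reading the value back out of the Row by position.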

Hope this will help.

answered Jul 5, 2018 by nitinrawat895
• 11,380 points
0 votes

// sc is the SparkContext that spark-shell provides
val x = sc.parallelize(1 to 10, 2)

// filter operation
val y = x.filter(e => e % 2 == 0)
y.collect
// res0: Array[Int] = Array(2, 4, 6, 8, 10)

// the same filter can be rewritten with Scala's shorter underscore syntax
// (named y2 so the snippet also compiles outside the REPL)
val y2 = x.filter(_ % 2 == 0)
y2.collect
// res1: Array[Int] = Array(2, 4, 6, 8, 10)
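
To connect this back to the question, the sketch below chains the same even-number filter with a map, and also shows the one-step alternative that uses RDD.collect with a partial function. The object name FilterChain, the helper names, and the multiply-by-ten step are assumptions chosen only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FilterChain {
  // filter followed by map: both are lazy transformations, so nothing
  // runs until an action such as collect() is called on the result.
  def evensTimesTen(x: RDD[Int]): RDD[Int] =
    x.filter(_ % 2 == 0).map(_ * 10)

  // Same result in one step: collect with a partial function keeps only
  // the elements that match the case and applies the body to them.
  def evensTimesTenOneStep(x: RDD[Int]): RDD[Int] =
    x.collect { case e if e % 2 == 0 => e * 10 }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `sc` already exists.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("filter-chain")
      .getOrCreate()

    val x = spark.sparkContext.parallelize(1 to 10, 2)

    println(evensTimesTen(x).collect().mkString(", "))        // 20, 40, 60, 80, 100
    println(evensTimesTenOneStep(x).collect().mkString(", ")) // 20, 40, 60, 80, 100

    spark.stop()
  }
}
```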

answered Aug 17, 2018 by zombie
• 3,790 points
