Spark Error: StackOverflowError - Exception in thread "main" java.lang.StackOverflowError at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply

0 votes

I have a Scala program that uses a REST API to fetch data batch by batch. I suspect that when the number of batches is high, the program throws this error.

This sample code reproduces the problem:

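import org.apache.spark.{SparkConf, SparkContext}

object UnionTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Union test").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val limit = 1000
    var rdd = sc.emptyRDD[Int]
    for (x <- 1 to limit) {
      // Each union wraps the previous RDD, so the lineage grows one level per batch
      val currentRdd = sc.parallelize(x to x + 3)
      rdd = rdd.union(currentRdd)
    }
    println(rdd.sum()) // the StackOverflowError is reported here when limit is large
    sc.stop()
  }
}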

Can anyone suggest a solution?

Jul 31, 2019 in Apache Spark by Disha
4,408 views

1 answer to this question.

0 votes

Hey,

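Each call to rdd.union wraps the previous result in a new UnionRDD, so after many batches the lineage is deeply nested, and evaluating it recurses until the stack overflows. Spark already provides SparkContext.union, which computes the union of multiple RDDs in a single step. You can use the lines below: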

val rdds = (1 to limit).map(x => sc.parallelize(x to x + 3)) // same batches as the loop in the question
val rdd = sc.union(rdds)

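Alternatively, you can combine the RDDs pairwise in a balanced tree, so the nesting depth grows logarithmically instead of linearly with the number of batches. Note that balancedReduce is not part of Spark or the Scala standard library; it is a helper you have to define yourself: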

val rdds = (1 to limit).map(x => sc.parallelize(x to x + 3)) // same batches as the loop in the question
val rdd = balancedReduce(rdds)(_ union _)
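
The balancedReduce helper used above is not defined in the answer. The following is a minimal sketch of one way to write it, assuming a plain divide-and-conquer approach; the object name BalancedReduceSketch and the exact signature are illustrative assumptions, not a Spark API. It has no Spark dependency, so the demo uses strings to make the shape of the reduction visible:

```scala
// A sketch only: balancedReduce and BalancedReduceSketch are hypothetical names,
// not part of Spark or the Scala standard library.
object BalancedReduceSketch {

  // Combine the elements pairwise in a balanced binary tree, so that the
  // nesting depth is about log2(n) rather than n as with a left-to-right fold.
  def balancedReduce[T](inputs: Seq[T])(f: (T, T) => T): T = {
    require(inputs.nonEmpty, "balancedReduce needs at least one element")
    if (inputs.size == 1) inputs.head
    else {
      // Split the input in half and reduce each half independently.
      val (left, right) = inputs.splitAt(inputs.size / 2)
      f(balancedReduce(left)(f), balancedReduce(right)(f))
    }
  }

  def main(args: Array[String]): Unit = {
    // The parentheses show how the elements are grouped.
    val shape = balancedReduce(Seq("a", "b", "c", "d"))((l, r) => s"($l $r)")
    println(shape) // prints "((a b) (c d))"
  }
}
```

With Spark on the classpath, the same helper would be called as in the answer: balancedReduce(rdds)(_ union _). For most cases sc.union(rdds) is the simpler choice, since it builds a single flat union in one call.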

answered Jul 31, 2019 by Gitika
• 65,770 points
