How to read more than one files in Apache Spark

Question

I have a directory in which there are multiple files

$ pwd
/home/user/student
$ ls
a.csv b.csv c.csv

I want to read all these files at once.

I am trying to do it like this:

val text = sc.wholeTextFiles("student")
text.collect()

But I am getting this error:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:591)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:243)
    at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:267)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1779)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:885)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:884)

How to solve this?

Omkar · Answer 1 · Dec 11, 2018

Try this:

val text = sc.wholeTextFiles("student/*")
text.collect()

answered Dec 11, 2018 by Omkar
• 69,180 points

How to read more than one files in Apache Spark

Your comment on this question:

1 answer to this question.

Your answer

Your comment on this answer:

Related Questions In Big Data Hadoop

Hey ,I am install Hadoop and on single node and run on one file in HDFS, but I want to run more of file I work upload files but I know to running all

How will the Fair Scheduler handle more than one Job?

Spark Scala: How to list all folders in directory

How to save Spark dataframe as dynamic partitioned table in Hive?

How do I get number of columns in each line from a delimited file??

Hadoop Mapreduce word count Program

hadoop.mapred vs hadoop.mapreduce?

hadoop fs -put command?

How to read HDFS and local files with the same code in Java?

How to read multiple files in hdfs?

Subscribe to our Newsletter, and get personalized recommendations.

TRENDING CERTIFICATION COURSES

TRENDING MASTERS COURSES

COMPANY

WORK WITH US

DOWNLOAD APP

CATEGORIES

CATEGORIES

TRENDING BLOG ARTICLES

TRENDING BLOG ARTICLES