PySpark is taking the default path

0 votes
from pyspark import SparkFiles
rdd=sc.textFile("emp/employees/part-m-00000")
rdd.map(lambda line: line.upper()).collect()

This code executes with no issues, but my file is present at
/user/edureka_536711/emp/employees/part-m-00000

I am not sure how the path /user/edureka_536711/ is being applied by default, while the code below is failing:

def get_hdfspath(filename):
    my_hdfs = "user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()

Can you help here?

Jul 16, 2019 in Apache Spark by Will
1,490 views

1 answer to this question.

0 votes

The HDFS home path for MyLab is /user/edureka_id, so that path is used by default even if you do not mention it. For example, if a text file abc.txt is present in that home directory, then /user/edureka_id/abc.txt and plain abc.txt refer to the same file.
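As a minimal sketch of this resolution rule (plain Python, no Spark needed, with a hypothetical lab id): a relative name is joined onto the HDFS home directory, so both spellings point at the same file.

```python
import os

# Assumed MyLab home directory (hypothetical id, for illustration only)
HDFS_HOME = "/user/edureka_536711"

def resolve(path):
    # HDFS resolves a relative path against the user's home directory,
    # much like a shell resolves one against the current directory
    return path if path.startswith("/") else os.path.join(HDFS_HOME, path)

# Both spellings point at the same file
assert resolve("abc.txt") == resolve("/user/edureka_536711/abc.txt")
```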

Regarding the code: it fails because my_hdfs is built without a leading slash, so the result is a relative path that HDFS resolves under your home directory again. A corrected version (assuming user_id holds your lab id, and that sample is meant to be the result of get_hdfspath):

import os

def get_hdfspath(filename):
    # leading slash makes the path absolute on HDFS
    my_hdfs = "/user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

sample = get_hdfspath("emp/employees/part-m-00000")
rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()


Hope this helps!

Thanks.

answered Jul 16, 2019 by Khushi
