foreachPartition called with Python functions does not output data to HDFS when executed in YARN mode

0 votes
Hi,

I need help with my code. Trying to understand the issue and fix here.

I am using foreachPartition to call a Python function, and the output is expected to be written to HDFS when the job runs in YARN mode. This works fine in local mode.
Any inputs on this will help.

import json
from pyspark import TaskContext

def getJSON(iterator):
    ctx = TaskContext()
    p_collection = []
    for data in iterator:
        jsondict = dict()
        class_json = populatejsondata(str(data["Pid"]), str(data["Rid"]), str(data["Mid"]),
                                      str(data["AM_id"]), str(data["H_data"]), str(data["CL_data"]))
        jsondict["class_data"] = class_json
        p_collection.append(jsondict)
    # NOTE: this path is on the local filesystem of whichever executor runs
    # the partition, not on HDFS
    with open('/tmp/p/class' + str(ctx.partitionId()) + '.json', 'w') as f:
        json.dump(p_collection, f)

# ----- calling the function
data_df = data.repartition('C_ID')
data_new = data_df.rdd
# foreachPartition is an action run for its side effects; it returns None
data_new.foreachPartition(getJSON)
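A likely cause, as the code comment above hints: `open('/tmp/p/...')` writes to the local disk of each YARN executor node, so the files never reach HDFS (in local mode driver and executor share one machine, which is why it appears to work there). One way around this is to have each partition *return* JSON strings and let Spark itself write them to HDFS. Below is a minimal sketch of that idea; the inline dict stands in for the question's `populatejsondata` call, and the `hdfs:///tmp/p/class_json` output path is a placeholder:

```python
import json

def partition_to_json(iterator):
    """Yield one JSON document per row in this partition.

    In the real job the inner dict would come from populatejsondata(...);
    here it is built inline so the sketch is self-contained.
    """
    for data in iterator:
        jsondict = {"class_data": {"Pid": str(data["Pid"]),
                                   "Rid": str(data["Rid"])}}
        yield json.dumps(jsondict)

# On the cluster the wiring would look like (not executed here):
# data_df = data.repartition("C_ID")
# data_df.rdd.mapPartitions(partition_to_json) \
#            .saveAsTextFile("hdfs:///tmp/p/class_json")
```

With `mapPartitions` + `saveAsTextFile`, Spark writes one part-file per partition directly into the HDFS directory, so no executor-local file handling is needed.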
Jan 22, 2021 in Apache Spark by anonymous

