How to tune Spark jobs to optimize performance

0 votes
Can anyone help me optimize a Spark job that is deployed on a YARN cluster?

I want to know what to change at the configuration level. What approaches can be taken to optimize Spark Streaming and Spark SQL jobs?
Apr 18, 2018 in Big Data Hadoop by Shubham
• 13,490 points
2,457 views

1 answer to this question.

0 votes
You need to know the cluster you are deploying the jobs on well. Here are some important approaches that will help you optimize your job.

First, find out the default block size configured on the cluster and the typical size of the files that will be stored there. This tells you whether the default block size needs to change.
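As a rough illustration of why block size and file size matter together, the sketch below estimates how many blocks a file occupies, which is roughly how many input partitions (and tasks) a read will produce. It assumes the HDFS default block size of 128 MiB (`dfs.blocksize`); the function name is hypothetical, and the real split count also depends on the file format and on Spark settings such as `spark.sql.files.maxPartitionBytes`.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_input_partitions(file_size_bytes, block_size_bytes=128 * MIB):
    """Approximate number of HDFS blocks (about one input partition each)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Even an empty or tiny file is read by at least one task.
    return max(1, math.ceil(file_size_bytes / block_size_bytes))


# Example: compare a 10 GiB file under two candidate block sizes.
partitions_128 = estimate_input_partitions(10 * GIB)
partitions_256 = estimate_input_partitions(10 * GIB, 256 * MIB)
```

If most of your files are far smaller than one block, you get many tiny tasks; if they are very large, a bigger block size reduces the task count.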

Also check the maximum memory limit configured for your executors, and check the number of vcores allocated to your cluster.
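One way to turn those memory and vcore limits into settings is the widely used rule of thumb sketched below. It is not from the answer above, just a starting point: reserve one core and 1 GB per node for the OS and Hadoop daemons, use about five cores per executor, leave one executor slot for the YARN application master, and budget about 10% (minimum 384 MiB) for off-heap overhead. The function name and the cluster numbers are hypothetical. On Spark versions before 2.3 the overhead key is `spark.yarn.executor.memoryOverhead`.

```python
def size_executors(nodes, vcores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing for a YARN cluster (a sketch, not a law)."""
    usable_cores = vcores_per_node - 1      # leave 1 core for OS/daemons
    usable_mem_gb = mem_gb_per_node - 1     # leave 1 GB for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough vcores per node for one executor")

    # Keep one slot free for the YARN application master.
    instances = max(1, executors_per_node * nodes - 1)

    # Split node memory between executors, then carve out the overhead.
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = int(container_gb / (1 + overhead_fraction))
    overhead_mb = max(384, int(heap_gb * 1024 * overhead_fraction))

    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_gb}g",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }


# Hypothetical cluster: 10 worker nodes, 16 vcores and 64 GB each.
conf = size_executors(nodes=10, vcores_per_node=16, mem_gb_per_node=64)
```

Whatever values you derive must still fit under the YARN container limits (`yarn.scheduler.maximum-allocation-mb` and `yarn.scheduler.maximum-allocation-vcores`), otherwise the executors will not be granted.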

For streaming jobs (Spark Streaming in your case), the rate of incoming data also needs to be checked and tuned.
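For the input rate, a minimal sketch follows. It assumes a DStream job reading from Kafka with the direct API, which the question does not state; for receiver-based sources the equivalent cap is `spark.streaming.receiver.maxRate`. The helper names and the numbers are hypothetical.

```python
def streaming_rate_conf(max_rate_per_partition, backpressure=True):
    """Settings that cap and adapt the ingest rate of a Kafka direct stream."""
    return {
        # Let Spark adjust the ingest rate based on recent batch timings.
        "spark.streaming.backpressure.enabled": str(backpressure).lower(),
        # Hard cap, in records per second per Kafka partition.
        "spark.streaming.kafka.maxRatePerPartition": str(max_rate_per_partition),
    }


def max_records_per_batch(max_rate_per_partition, kafka_partitions, batch_interval_s):
    """Upper bound on the records one micro-batch can contain under the cap."""
    return max_rate_per_partition * kafka_partitions * batch_interval_s


rate_conf = streaming_rate_conf(max_rate_per_partition=1000)
# 1000 records/s per partition, 6 partitions, 10-second batches.
batch_cap = max_records_per_batch(1000, kafka_partitions=6, batch_interval_s=10)
```

The cap is useful when a job restarts with a large backlog: it keeps the first batches from being far bigger than the executors can process within one batch interval.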

The garbage collector should also be tuned.
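As a hedged starting point for garbage collector tuning, the sketch below builds the `spark-submit` arguments that switch executors to G1 and turn on GC logging through `spark.executor.extraJavaOptions`. The flags shown are Java 8 style; newer JVMs use unified logging (`-Xlog:gc*`) instead of the `Print...` flags. The helper name is hypothetical, and whether G1 actually helps depends on your workload.

```python
def gc_submit_args(use_g1=True, log_gc=True):
    """Build spark-submit arguments for executor GC settings (Java 8 flags)."""
    opts = []
    if use_g1:
        opts.append("-XX:+UseG1GC")
    if log_gc:
        # GC logs end up in each executor's stdout, visible in the Spark UI.
        opts += ["-verbose:gc", "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]
    if not opts:
        return []
    return ["--conf", "spark.executor.extraJavaOptions=" + " ".join(opts)]


args = gc_submit_args()
```

Start with logging only, look at how often and how long collections run, and change the collector or heap layout only if the logs show a problem.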

I would also say that code-level optimizations are very important and should always be considered.

The most important part is improving your cluster performance through experience. To do this, first estimate the number of records you will be processing and the requirements of your application. Then tweak your configuration several times and check the throughput you get after each change.
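The tweak-and-measure loop can be kept honest with a small record of each run. The sketch below compares runs by throughput in records per second; the function names, run labels, and numbers are made up for illustration.

```python
def throughput(records, seconds):
    """Records processed per second for one run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return records / seconds


def best_run(runs):
    """Return the label of the run with the highest throughput.

    `runs` is a list of (label, records, seconds) tuples.
    """
    return max(runs, key=lambda run: throughput(run[1], run[2]))[0]


# Hypothetical measurements from two configurations on the same input.
runs = [
    ("baseline", 1_000_000, 200),
    ("more-executors", 1_000_000, 125),
]
winner = best_run(runs)
```

Change one setting at a time between runs and use the same input, otherwise you cannot tell which change produced the difference.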
answered Apr 18, 2018 by coldcode
• 2,090 points
