How to add third-party Java jars for use in PySpark

0 votes

For my project, I need some third-party database client libraries in Java. I want to access them through

java_gateway.py
E.g., to make the client class (not a JDBC driver!) available to the Python client via the Java gateway:

java_import(gateway.jvm, "org.mydatabase.MyDBClient")
It is not clear where to add the third-party libraries to the JVM classpath. I tried adding them to compute-classpath.sh, but that did not seem to work: I get

 Py4JError: Trying to call a package
Also, when comparing to Hive: the Hive jar files are NOT loaded via compute-classpath.sh, so that makes me suspicious. There seems to be some other mechanism setting up the JVM-side classpath.

Can someone help?

Thanks in advance!

Jul 4, 2018 in Apache Spark by Shubham
• 13,490 points
8,693 views

1 answer to this question.

0 votes

The Py4JError: Trying to call a package you are hitting usually means the class is not on the JVM classpath, so Py4J resolves the dotted name as a package instead of a class. Putting the jar on the classpath fixes it. You can add external jars as arguments when launching PySpark:

pyspark --jars file1.jar,file2.jar
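
The same flag works with spark-submit. If you prefer to set it in code, here is a minimal sketch, assuming Spark 2.x with SparkSession available; /path/to/mydb-client.jar is a placeholder for your actual jar:

from pyspark.sql import SparkSession

# spark.jars lists jars to make available to the application;
# they are shipped to the executors at startup
spark = (SparkSession.builder
    .config("spark.jars", "/path/to/mydb-client.jar")
    .getOrCreate())

Once the jar is on the JVM classpath, the import from your question should resolve as a class rather than a package. A sketch using PySpark's internal gateway handle (spark._jvm); org.mydatabase.MyDBClient and its no-argument constructor are taken from your example, not a real library:

from py4j.java_gateway import java_import

# with the jar on the classpath, the name resolves as a class
java_import(spark._jvm, "org.mydatabase.MyDBClient")
client = spark._jvm.org.mydatabase.MyDBClient()

Depending on the deploy mode and Spark version, you may also need to set spark.driver.extraClassPath (e.g. in spark-defaults.conf), since the driver's classpath is fixed when its JVM starts.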

Hope this helps!


Thanks.

answered Jul 4, 2018 by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz
