Trending questions in Apache Spark

0 votes
5 answers

How to change the SparkSession configuration in PySpark?

You aren't actually overwriting anything with this ...READ MORE

Dec 14, 2020 in Apache Spark by Gitika
• 65,770 points
126,509 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
93,026 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 89,446 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
77,352 views
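
The usual reason given for preferring reduceByKey is that it combines values per key inside each partition before the shuffle, while groupByKey sends every pair across the shuffle. The following is a minimal plain-Python sketch of that difference, not Spark code: `group_by_key_like`, `reduce_by_key_like`, and the sample `partitions` are hypothetical names used only for illustration, and the "shuffle" is modelled as a simple record count.

```python
from collections import defaultdict

def group_by_key_like(partitions):
    """Mimic groupByKey: every (key, value) pair crosses the shuffle unchanged."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)

def reduce_by_key_like(partitions, func):
    """Mimic reduceByKey: combine per key inside each partition, then merge after the shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)

# Two hypothetical partitions of (key, 1) pairs, as in a word count.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]
grouped, grouped_shuffled = group_by_key_like(partitions)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}
reduced, reduced_shuffled = reduce_by_key_like(partitions, lambda x, y: x + y)
print(sums_from_groups, grouped_shuffled)
print(reduced, reduced_shuffled)
```

Both approaches give the same per-key totals, but the reduceByKey-style version moves fewer records through the modelled shuffle. On a real cluster the saving depends on how often keys repeat within a partition.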
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, in Spark, the fill() function of the DataFrameNaFunctions class is used to replace ...READ MORE

Dec 15, 2020 in Apache Spark by MD
• 95,460 points
75,800 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
62,186 views
+5 votes
11 answers

Concatenate columns in apache spark dataframe

It's late, but this is how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
72,836 views
0 votes
0 answers

How to import pyspark in Jupyter Notebook

When I tried to import Pyspark I am getting ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited 4 days ago 123 views
0 votes
0 answers

How to import pyspark in Jupyter

I tried to import pyspark in jupyter ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited 4 days ago 93 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
46,001 views
0 votes
0 answers

How to read a nested avro file format in spark dataframe

The avro file format contains nested data. ...READ MORE

Nov 16, 2022 in Apache Spark by Devang

edited 4 days ago 93 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
38,996 views
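
The one-to-one versus one-to-many distinction can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: `map_like` and `flat_map_like` are hypothetical helpers that mimic the semantics of `RDD.map` and `RDD.flatMap` on an ordinary list.

```python
from itertools import chain

def map_like(func, items):
    """Mimic RDD.map: exactly one output element per input element."""
    return [func(item) for item in items]

def flat_map_like(func, items):
    """Mimic RDD.flatMap: func returns an iterable per element and the results are flattened."""
    return list(chain.from_iterable(func(item) for item in items))

lines = ["hello spark", "map vs flatMap"]
mapped = map_like(lambda line: line.split(" "), lines)
flat_mapped = flat_map_like(lambda line: line.split(" "), lines)
print(mapped)       # one list of words per input line
print(flat_mapped)  # a single flat list of words
```

With the same splitting function, the map-style call keeps one result per input line, while the flatMap-style call can return more (or fewer) elements than it received.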
0 votes
1 answer

1) Given an sfpd RDD, to create a pair RDD consisting of tuples of the form (Category, 1) in Scala, which of the following is used?

C would be an answer which shows ...READ MORE

Mar 30, 2023 in Apache Spark by anonymous

edited 3 days ago 6,565 views
0 votes
0 answers

How can I implement the CROSS APPLY function of T-SQL in PySpark?

How can I implement the CROSS APPLY function ...READ MORE

May 30, 2022 in Apache Spark by anonymous

edited 4 days ago 107 views
+1 vote
1 answer

Is there any efficient way of dealing with null values during concat functionality of pyspark.sql version 2.3.4?

When you concatenate any string with a ...READ MORE

Nov 6, 2019 in Apache Spark by Rishi
40,390 views
0 votes
0 answers

Pyspark: Aggregate and filtering code error

Hi guys, I am a beginner at pyspark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited 4 days ago 97 views
0 votes
0 answers

Pyspark: Finding top three countries with confirmed covid cases

Hi guys, I am a beginner at pyspark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited 4 days ago 83 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10, 2019 in Apache Spark by Tina
43,214 views
0 votes
0 answers

Scala / SparkSQL dataframes filter issue "data type mismatch"

My problem is I have a code ...READ MORE

Mar 24, 2022 in Apache Spark by Hamza

edited 4 days ago 11 views
0 votes
0 answers

Access value in arrays of structs spark scala

Hi, I have a dataset with the ...READ MORE

Mar 24, 2022 in Apache Spark by anonymous

edited 4 days ago 12 views
0 votes
1 answer

What will be printed when the below code is executed?

Option a) 443 READ MORE

Mar 8, 2023 in Apache Spark by anonymous

edited 3 days ago 2,541 views
0 votes
1 answer

What will be printed when the below code is executed?

List 5 100 10 READ MORE

Feb 7, 2023 in Apache Spark by Subbu

edited 3 days ago 1,593 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming Architecture?

C.  Data streams divided into batches > ...READ MORE

Jul 3, 2022 in Apache Spark by anonymous

edited 3 days ago 3,934 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...READ MORE

Jun 21, 2020 in Apache Spark by suvasish
22,917 views
0 votes
0 answers

Execute Spark.sql query within withColumn clause in Spark Scala

I have a dataframe which has one ...READ MORE

Sep 14, 2021 in Apache Spark by Pinksrider

edited 4 days ago 17 views
0 votes
1 answer

Error: No module named 'findspark'

Hi@akhtar, To import this module in your program, ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
20,738 views
0 votes
0 answers

AWS logs are not being written to CloudWatch after certain steps

I have an AWS job which reads ...READ MORE

Jul 30, 2021 in Apache Spark by Anjali

edited 4 days ago 11 views
0 votes
1 answer

What is the difference between persist() and cache() in apache spark?

Using the cache technique, we can save intermediate ...READ MORE

Dec 27, 2022 in Apache Spark by Deepthi

edited 3 days ago 3,785 views
0 votes
0 answers

Create Hive table using Dataframe getting error

Code: srcDF.write.mode(tblmode).saveAsTable(s"${dbName}.${tgtHiveTableName}") error: 21/06/04 22:11:45 ERROR pa.TrxNbrx: org.apache.spark.SparkException: ...READ MORE

Jun 5, 2021 in Apache Spark by Rajesh

edited 4 days ago 7 views
0 votes
0 answers

What parameters are required for a "windowed" operation such as reduceByKeyAndWindow?

a) Window length b) sliding interval c) Window Length ...READ MORE

Jun 4, 2021 in Apache Spark by anonymous

edited 4 days ago 7 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,350 points
43,429 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

.option("sep", delimeter) READ MORE

Oct 28, 2022 in Apache Spark by anonymous

edited 3 days ago 3,670 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

The stages which are running slow READ MORE

Apr 29, 2021 in Apache Spark by anonymous

edited 3 days ago 3,947 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
43,319 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi@akhtar, By default pyspark is not present in ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
15,780 views
0 votes
1 answer

How to select all columns with group by?

Try df.select(df("*")).groupBy("id").agg(sum("salary")) READ MORE

Sep 17, 2021 in Apache Spark by Parimi Pavan

edited 3 days ago 14,374 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi@akhtar, Here you are trying to read a ...READ MORE

Feb 3, 2020 in Apache Spark by MD
• 95,460 points
18,670 views
0 votes
0 answers

Real time Project challenges in Spark Data pipeline

Can anybody highlight some challenges they have ...READ MORE

Apr 6, 2021 in Apache Spark by anonymous

edited 4 days ago 12 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
26,616 views
0 votes
1 answer

Why Partitions are immutable in Spark?

Partitions use HDFS API. READ MORE

Aug 25, 2022 in Apache Spark by anonymous

edited 3 days ago 2,251 views
0 votes
2 answers

5) Using which one of the given choices will you create an RDD with specific partitioning?

Hi, @Ritu, option b for you, as Hash Partitioning ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,770 points
4,581 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi@Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
4,270 views