Spark revolves around the concept of a ...READ MORE
Spark has various persistence levels to store ...READ MORE
As parquet is a column based storage ...READ MORE
With mapPartitions() or foreachPartition(), you can only ...READ MORE
I can list some but there can ...READ MORE
Whenever a node goes down, Spark knows ...READ MORE
Just do the following: Edit your conf/log4j.properties file ...READ MORE
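The answer above refers to editing conf/log4j.properties; a minimal sketch of the usual change (assuming you want to quiet Spark's console logging) is:

```properties
# conf/log4j.properties
# Change the root logger level from the default INFO to ERROR
# so that only errors are printed to the console.
log4j.rootCategory=ERROR, console
```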
Spark SQL is capable of: Loading data from ...READ MORE
Spark is agnostic to the underlying cluster ...READ MORE
No, it doesn’t provide a storage layer, but ...READ MORE
There are two popular ways using which ...READ MORE
The full form of RDD is a ...READ MORE
Can you share the screenshots for the ...READ MORE
Spark 2.0+ Spark 2.0 provides native window functions ...READ MORE
In my opinion, start with a standalone ...READ MORE
In your log4j.properties file you need to ...READ MORE
Some of the key differences between an RDD and ...READ MORE
Let's first look at the mapper-side differences. Map ...READ MORE
SqlContext has a number of createDataFrame methods ...READ MORE
org.apache.spark.mllib is the old Spark API while ...READ MORE
There are a bunch of functions that ...READ MORE
Parquet is a columnar format supported by ...READ MORE
Mainly, we use SparkConf because we need ...READ MORE
You have to use the comparison operator ...READ MORE
Your error is with the version of ...READ MORE
I guess you need to provide this kafka.bootstrap.servers ...READ MORE
RDD is a fundamental data structure of ...READ MORE
Sliding Window controls transmission of data packets ...READ MORE
map(): Return a new distributed dataset formed by ...READ MORE
Here are some of the important features of ...READ MORE
Caching the tables puts the whole table ...READ MORE
Minimizing data transfers and avoiding shuffling helps ...READ MORE
There are two methods to persist the ...READ MORE
Spark provides a pipe() method on RDDs. ...READ MORE
Spark uses Akka basically for scheduling. All ...READ MORE
Spark is a framework for distributed data ...READ MORE
I would recommend you create & build ...READ MORE
A Spark driver (aka an application’s driver ...READ MORE
spark-submit \ --class org.apache.spark.examples.SparkPi \ --deploy-mode client \ --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...READ MORE
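Laid out on separate lines, the spark-submit invocation the answer abbreviates would look roughly like this (a sketch, assuming SPARK_MASTER_IP and SPARK_MASTER_PORT are already exported in your environment; the application JAR path is a placeholder, not part of the original answer):

```shell
# Submit the bundled SparkPi example to a standalone cluster master,
# running the driver locally (client deploy mode).
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --deploy-mode client \
  --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT \
  <path-to-spark-examples-jar>
```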
It's not the collect() that is slow. ...READ MORE
An RDD can be uncached using unpersist(). So use ...READ MORE
Either you have to create a Twitter4j.properties ...READ MORE
No, it is not mandatory, but there ...READ MORE
You can create a DataFrame from the ...READ MORE
You can use the following command. This ...READ MORE
Shark is a tool, developed for people ...READ MORE
sbin/start-master.sh : Starts a master instance on ...READ MORE
rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE
By default a partition is created for ...READ MORE
Parquet is a columnar format file supported ...READ MORE