Spark in-memory processing on a non-temporary table

Question

When I am not using a temporary table, I assume the data is written to a file in HDFS. Spark does in-memory processing, so this would differ from Spark's regular approach, since the data would now be read from the file for further processing. Is this assumption correct?

score 0 · Answer 1 · Jul 14, 2019

A temporary table is more like a session-scoped alias for a DataFrame: Spark does not persist any metadata for it in the metastore, which is why it is called temporary, and we always create a temp table from a DataFrame. Once the temp table's data is stored in a Hive table, it lives in the Hive warehouse, which is on HDFS, so that part of your assumption is absolutely correct. Whenever you read it back, the sqlContext object reads it from HDFS instead of from memory. Since you are only reading the data, with something like "select * from table_hive", there won't be much difference in processing time whether it is read from memory or from HDFS. If we measure the difference in microseconds we will find one, though, and in-memory processing takes the lead there.