ETL stands for extract, transform, and load.
A typical ETL pipeline starts with a data source, applies a transformation (for example to filter or clean the data), and ends in a data sink.
So in the case of Hadoop and Spark, an ETL flow can be described as follows:
1. Extract: data comes from various sources such as databases, Kafka, Twitter, etc.
2. Transform: to get meaningful insights, we filter or clean the data using Spark, MapReduce, Hive, Pig, etc.
3. Load: finally, after processing (transformation), the data is stored in a data sink such as HDFS, a Hive table, etc.
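For example, here is a minimal PySpark sketch of such a flow (the HDFS paths and column names are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("simple-etl").getOrCreate()

# Extract: read raw records from a source (here, a hypothetical CSV file on HDFS).
raw = spark.read.option("header", True).csv("hdfs:///data/raw/events.csv")

# Transform: drop incomplete rows and parse a timestamp column.
cleaned = (raw
           .filter(F.col("user_id").isNotNull())
           .withColumn("event_time", F.to_timestamp("event_time")))

# Load: write the result to a data sink (Parquet files on HDFS).
cleaned.write.mode("overwrite").parquet("hdfs:///data/clean/events")

spark.stop()
```

The same extract-filter-write pattern applies whether the source is a database, a Kafka topic, or a file, and whether the sink is HDFS, a Hive table, or something else.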
Hope this helps.