The following code that I wrote for one-hot encoding in Spark is not working and gives me errors such as value not found : value encoder. What I want to do is import CSV data to create a DataFrame, apply one-hot encoding, and create a new DataFrame with the encoded columns. The column that I want to encode is called SignalType, so where in the code below do I specify the column name?
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("myfile.csv")
for (line <- df.head(5)) {
  println(line)
}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
val df = spark.createDataFrame(Seq(
  (1, "1"),
  (2, "2"),
  (3, "3"),
  (4, "4"),
)).toDF("categoryIndex1", "categoryIndex2")
val encoder = new OneHotEncoderEstimator()
  .setInputCols(Array("categoryIndex1", "categoryIndex2"))
  .setOutputCols(Array("categoryVec1", "categoryVec2"))
val model = encoder.fit(df)
val encoded = model.transform(df)
encoded.show()
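For reference, here is a minimal sketch of what I think the end result could look like, assuming Spark 2.3 or 2.4 (where the class is named OneHotEncoderEstimator; in Spark 3.x it is named OneHotEncoder) and a spark-shell session. The sample rows and the output column names SignalTypeIndex and SignalTypeVec are made up for illustration; only SignalType comes from my data. The column name would go into StringIndexer's setInputCol, and the resulting index column would then feed the encoder.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{OneHotEncoderEstimator, StringIndexer}

val spark = SparkSession.builder().getOrCreate()

// Hypothetical stand-in rows; the DataFrame read from myfile.csv could be used instead.
val signals = spark.createDataFrame(Seq(
  (1, "A"),
  (2, "B"),
  (3, "A")
)).toDF("id", "SignalType")

// Step 1: map each distinct SignalType value to a numeric index.
val indexer = new StringIndexer()
  .setInputCol("SignalType")
  .setOutputCol("SignalTypeIndex") // hypothetical output column name
val indexed = indexer.fit(signals).transform(signals)

// Step 2: one-hot encode the numeric index column into a vector column.
val encoder = new OneHotEncoderEstimator()
  .setInputCols(Array("SignalTypeIndex"))
  .setOutputCols(Array("SignalTypeVec")) // hypothetical output column name
val encoded = encoder.fit(indexed).transform(indexed)

encoded.show()
```

The encoded DataFrame keeps the original columns and adds SignalTypeIndex and SignalTypeVec, so it can be used as the new DataFrame with the encoded column.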