from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([StructField("id_code", IntegerType()), StructField("description", StringType())])
df = spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv", schema=schema)
df.show()
+-------+-----------+
|id_code|description|
+-------+-----------+
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
+-------+-----------+
If I read it without applying any schema:
df = spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv", sep="/t")
df.show()
+----------------+
|             _c0|
+----------------+
| 0 Not Specified|
|         1 Modem|
|      2 LAN/Wifi|
|       3 Unknown|
|4 Mobile Carrier|
+----------------+
The data is not split into separate columns. Can anyone please help me with this? My sample file is a .tsv file and it contains the records below.
0 Specified
1 Modemwifi
2 LAN/Wifi
3 Unknown
4 Mobile user
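
One possible cause, judging from the two reads above, is the separator. The first read relies on the default comma, so each whole line lands in id_code and fails the integer parse. The second passes "/t" (a slash followed by t) rather than "\t" (a tab). Below is a minimal sketch, assuming the file really is tab-delimited and that spark is an existing SparkSession. The helper name read_connection_types and the constant TAB are hypothetical, introduced only for illustration.

```python
TAB = "\t"  # a real tab character; "/t" is the two characters '/' and 't'


def read_connection_types(spark, path):
    """Read a tab-separated file into the columns id_code and description.

    Hypothetical helper: `spark` is assumed to be an existing SparkSession.
    """
    # Imported inside the function so the plain-Python check below
    # can run even where PySpark is not installed.
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    schema = StructType([
        StructField("id_code", IntegerType()),
        StructField("description", StringType()),
    ])
    # Pass both the schema and the tab separator in the same read.
    return spark.read.csv(path, schema=schema, sep=TAB)


# Plain-Python illustration of the two separators on one sample record
# (assumed here to be tab-delimited).
sample_line = "2\tLAN/Wifi"
fields_with_tab = sample_line.split(TAB)        # id and description
fields_with_slash_t = sample_line.split("/t")   # "/t" never occurs, so one field
```

With a live session, this would be called as read_connection_types(spark, "C:/Users/HP/Downloads/connection_type.tsv").show(). If the nulls persist, the file may be delimited by spaces rather than tabs, which is worth checking in a text editor that shows whitespace.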