There are three complex types in Hive:
array: An ordered collection of elements. All elements in the array must be of the same type.
map: An unordered collection of key-value pairs. Keys must be of primitive types; values can be of any type.
struct: A collection of named fields, which may be of different types.
Examples of complex data types:
ARRAY:
$ cat >arrayfile
1,abc,40000,a$b$c,hyd
2,def,3000,d$f,bang
hive> create table tab7(id int,name string,sal bigint,sub array<string>,city string)
> row format delimited
> fields terminated by ','
> collection items terminated by '$';
hive> load data local inpath '/home/training/arrayfile' into table tab7;
hive> select sub[2] from tab7 where id=1;
hive> select sub[0] from tab7;
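For row id=1 the array sub holds the elements a, b, and c, so sub[2] returns 'c'; indexing past the end of an array returns NULL rather than raising an error. Hive also ships standard built-in collection functions such as size() and array_contains(), which can be used on tab7 like this (a sketch, assuming the table above has been loaded):
hive> select size(sub) from tab7;
hive> select array_contains(sub,'b') from tab7 where id=1;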
MAP:
$ cat >mapfile
1,abc,40000,a$b$c,pf#500$epf#200,hyd
2,def,3000,d$f,pf#500,bang
hive> create table tab10(id int,name string,sal bigint,sub array<string>,dud map<string,int>,city string)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '$'
    > map keys terminated by '#';
hive> load data local inpath '/home/training/mapfile' overwrite into table tab10;
hive> select dud["pf"] from tab10;
hive> select dud["pf"],dud["epf"] from tab10;
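Looking up a key that is absent from a row's map returns NULL, so dud["epf"] is NULL for id=2, which has only the pf entry. To inspect the whole map, Hive's built-in map_keys() and map_values() functions return the keys and values as arrays; a sketch against tab10:
hive> select map_keys(dud) from tab10;
hive> select map_values(dud) from tab10;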
STRUCT:
$ cat >structfile
1,abc,40000,a$b$c,pf#500$epf#200,hyd$ap$500001
2,def,3000,d$f,pf#500,bang$kar$600038
hive> create table tab11(id int,name string,sal bigint,sub array<string>,dud map<string,int>,addr struct<city:string,state:string,pin:bigint>)
> row format delimited
> fields terminated by ','
> collection items terminated by '$'
> map keys terminated by '#';
hive> load data local inpath '/home/training/structfile' into table tab11;
hive>select addr.city from tab11;
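Each field of a struct is accessed with dot notation, and struct fields can appear in WHERE clauses like ordinary columns. A sketch of a few more queries against tab11 (the literal 'ap' matches the sample row for id=1):
hive> select addr.city, addr.state, addr.pin from tab11;
hive> select name, addr.pin from tab11 where addr.state='ap';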
I have attached a script file (complex_table.script) containing a Hive create table command that defines a table with complex fields (a map, an array of strings, and a struct), along with a sample dataset (complex.dat) containing matching complex data.
In this script I have used a comma (,) to separate the fields, a colon (:) to separate the collection items, and a question mark (?) to separate the map keys.
First load the dataset (attached here) into HDFS, then go through the script and execute the commands one after another.
I have uploaded the dataset (only two rows) into HDFS, as shown in the screenshot below.
Then I created the table, loaded the data from HDFS, and executed the select query.
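The attached script is not reproduced here, but a create table statement using those three delimiters would follow the same pattern as the earlier examples; the table and column names below are assumptions for illustration, not taken from the script:
hive> create table complex_table(id int,name string,sub array<string>,dud map<string,int>,addr struct<city:string,state:string,pin:bigint>)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by ':'
    > map keys terminated by '?';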