Follow these steps:
A. Create Database
------------------
create database retail123;
B. Select Database
------------------
use retail123;
C. Create table for storing transactional records
-------------------------------------------------
create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;
D. Load the data into the table
-------------------------------
LOAD DATA LOCAL INPATH 'txns1.txt' OVERWRITE INTO TABLE txnrecords;
E. Describing metadata or schema of the table
---------------------------------------------
describe txnrecords;
F. Counting no of records
-------------------------
select count(*) from txnrecords;
G. Counting total spending by category of products
--------------------------------------------------
select category, sum(amount) from txnrecords group by category;
H. 10 customers
--------------------
select custno, sum(amount) from txnrecords group by custno limit 10;
I. Create partitioned table
---------------------------
create table txnrecsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE,
product STRING, city STRING, state STRING, spendby STRING)
partitioned by (category STRING)
clustered by (state) INTO 10 buckets
row format delimited
fields terminated by ','
stored as textfile;
J. Configure Hive to allow partitions
-------------------------------------
However, a query across all partitions could trigger an enormous MapReduce job if the table data and number of partitions are large. A highly suggested safety measure is putting Hive into strict mode, which prohibits queries of partitioned tables without a WHERE clause that filters on partitions. You can set the mode to nonstrict, as in the following session:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.enforce.bucketing=true;
K. Load data into partition table
----------------------------------
from txnrecords txn INSERT OVERWRITE TABLE txnrecsByCat PARTITION(category)
select txn.txnno, txn.txndate,txn.custno, txn.amount,txn.product,txn.city,txn.state,
txn.spendby, txn.category DISTRIBUTE BY category;
select * from txnrecsByCat;