PDF to CSV file format conversion

Question

I have a requirement to load multiple .pdf files from an HDFS location. I want to convert all of the PDF files to CSV format and load them into Hive tables. I searched on Google but could not find out how to write code that converts PDF to CSV. Can you please help me load all the PDF files and convert them to CSV? If possible, please share the code.


Hive cannot read PDF files directly, so convert them to CSV first using an external tool (several are available online). Then upload the resulting CSV files to HDFS and create a Hive table to hold the data. The example below creates a Hive table and loads a CSV file into it.

create table users_data (
  userid varchar(10),
  location varchar(100),
  age varchar(5)
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ("separatorChar" = "\;", "quoteChar" = "\"")
stored as textfile;

The following statement loads the CSV file into that table:

load data inpath 'BX-Users.csv' into table users_data;
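For the conversion step itself, the following is a minimal sketch in Python, not a full PDF parser. It assumes the text has already been extracted from each PDF by an external tool (for example poppler's `pdftotext -layout`), so that every table row sits on its own line with columns separated by two or more spaces. The function name `layout_text_to_csv` and the sample rows are hypothetical. The output uses `;` as the separator and `"` as the quote character to match the `serdeproperties` of the table above.

```python
import csv
import io
import re

# Assumption: columns in the extracted text are separated by 2+ whitespace chars,
# while values inside a column contain at most single spaces.
COLUMN_GAP = re.compile(r"\s{2,}")


def layout_text_to_csv(text: str, expected_columns: int, sep: str = ";") -> str:
    """Hypothetical helper: turn layout-preserved PDF text into CSV text.

    Lines that do not split into exactly `expected_columns` fields
    (page numbers, titles, footers) are skipped; blank lines are ignored.
    """
    out = io.StringIO()
    # delimiter/quotechar mirror "separatorChar" and "quoteChar" in the Hive DDL
    writer = csv.writer(out, delimiter=sep, quotechar='"', lineterminator="\n")
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        fields = COLUMN_GAP.split(stripped)
        if len(fields) == expected_columns:
            writer.writerow(fields)  # fields containing ';' or '"' get quoted
    return out.getvalue()


if __name__ == "__main__":
    sample = "\n".join([
        "Page 1",
        "User-ID   Location                   Age",
        "1         nyc, new york, usa         NULL",
        "2         stockton, california, usa  18",
    ])
    print(layout_text_to_csv(sample, expected_columns=3), end="")
```

Write the returned text to a .csv file and upload it with `hdfs dfs -put` before running the `load data` statement. A header line that splits into the right number of fields is kept as a row; drop it in the script, or set the Hive table property `skip.header.line.count` if your files always start with one.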


answered Jul 9, 2019 by Esha


Related Questions In Big Data Hadoop

How can we transfer a PDF file to HDFS?

How to change file format using Sqoop?

Is there a way to copy data from one Hadoop Distributed File System (HDFS) to another HDFS?

Copy file from HDFS to the local file system

Hadoop Mapreduce word count Program

hadoop.mapred vs hadoop.mapreduce?

hadoop fs -put command?

Hadoop dfs -ls command?

How to convert .txt file to Hadoop's sequence file format

How to run a jar file in hadoop?
