What is the best way to integrate SAS with Hadoop without losing the parallel processing capacity of Hadoop

0 votes
I am trying to understand the integration between SAS and Hadoop. From what I understand, SAS processes like proc sql can only work against a SAS data set, I cannot issue proc sql against a text file on a hadoop node. Is it correct?

If yes, then I need to uses some ETL jobs to first take the data out of HDFS and convert it to SAS tables. But if I do that, I will lose the parallel processing capabilties of Hadoop.

So what is the ideal way of integrating SAS and Hadoop and still use the parallel processing power of Hadoop?

I understand you can call a map reduce job from inside SAS, but can a map reduce job be written in SAS? I think not.
Oct 16, 2018 in Big Data Hadoop by Neha
• 6,300 points
2,309 views

1 answer to this question.

0 votes

One of the major pushes at SAS Global Forum 2015 was actually the new options for connections to Hadoop and Teradata. FEDSQL and DS2, new in SAS 9.4, exist in part specifically to enable SAS to better work with Hadoop. You can execute code directly in your Hadoop node, as well as do a lot more efficient processing in SAS directly.

Assuming you have the most recent release of SAS (9.4 TS1M3), you can look at the SAS Release Notes (Current as of 9/3/2015; in the future this will point to later versions). That includes information like the following:

In the second maintenance release for SAS 9.4, the SAS In-Database Code Accelerator for Hadoop runs the DS2 data program as well as the thread program inside the database. Several new functions have been added. The HTTP package enables you to construct an HTTP client to access web services and a new logger enables logging of HTTP traffic. A connection string parameter is available when instantiating an SQLSTMT package.

SAS FedSQL is a SAS proprietary implementation of the ANSI SQL:1999 core standard. It provides support for new data types and other ANSI 1999 core compliance features and proprietary extensions. FedSQL provides data access technology that brings a scalable, threaded, high-performance way to access, manage, and share relational data in multiple data sources. FedSQL is a vendor-neutral SQL dialect that accesses data from various data sources without submitting queries in the SQL dialect that is specific to the data source. In addition, a single FedSQL query can target data in several data sources and return a single result table. The FEDSQL procedure enables you to submit FedSQL language statements from a Base SAS session. The first maintenance release for SAS 9.4 adds support for Memory Data Store (MDS), SAP HANA, and SASHDAT data sources.

In the second maintenance release for SAS 9.4, SAS FedSQL supports Hive, HDMD, and PostgreSQL data sources. Data types can be converted to another data type. You can add DBMS-specific clauses to the end of the CREATE INDEX statement, and you can write a SASHDAT file in compressed format.

In the third maintenance release of SAS 9.4, FedSQL has added support for HAWQ and Impala distributions of Hadoop, enhanced support for Impala, new data types, and more.

Hadoop Support

The first maintenance release for SAS 9.4 enables you to use the SPD Engine to read, write, and update data in a Hadoop cluster through the HDFS. In addition, you can now use the HADOOP procedure to submit configuration properties to the Hadoop server.

In the second maintenance release for SAS 9.4, performance has been improved for the SPD Engine access to Hadoop. The SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS is available from the support.sas.com third-party site for Hadoop.

In the third maintenance release of SAS 9.4, access to data stored in HDFS is enhanced with a new distributed lock manager and therefore easier access to Hadoop clusters using Hadoop configuration files.

Beyond this, there is extensive documentation and papers written on the subject; documentation for the SAS Connector for Hadoop, for example.

answered Oct 16, 2018 by Frankie
• 9,830 points

Related Questions In Big Data Hadoop

0 votes
1 answer

I have to ingest in hadoop cluster large number of files for testing , what is the best way to do it?

Hi@sonali, It depends on what kind of testing ...READ MORE

answered Jul 8, 2020 in Big Data Hadoop by MD
• 95,460 points
1,297 views
0 votes
1 answer

Best way of starting & stopping the Hadoop daemons with command line

First way is to use start-all.sh & ...READ MORE

answered Apr 15, 2018 in Big Data Hadoop by Shubham
• 13,490 points
10,198 views
+1 vote
3 answers

What is the best way to merge multi-part HDFS files into single file?

1. In order to merge two or ...READ MORE

answered Jul 29, 2019 in Big Data Hadoop by Tina
31,221 views
0 votes
1 answer

What is the use of sequence file in Hadoop?

Sequence files are binary files containing serialized ...READ MORE

answered Apr 6, 2018 in Big Data Hadoop by Ashish
• 2,650 points
9,644 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,152 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,644 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
109,425 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
4,675 views
0 votes
1 answer

What is the best functional language to do Hadoop Map-Reduce?

down voteacceptedBoth Clojure and Haskell are definitely ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,830 points
934 views
0 votes
1 answer

What is the standard way to create files in your hdfs file-system?

Well, it's so easy. Just enter the below ...READ MORE

answered Sep 23, 2018 in Big Data Hadoop by Frankie
• 9,830 points
2,643 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP