1. Can Kx work with HDFS?
Yes. However, it is unlikely to be chosen as an approach. The reasons the analytics industry is moving away from HDFS as a construct for analytics apply to Kx as well. Read/write throughput and latency with HDFS are significantly worse than with embedded storage or a distributed object or file system, even when using the same volume of storage equipment. Some of the contributors to this performance degradation can be partly mitigated by layering HDFS over a traditional distributed file system, such as the Lustre, GPFS or MapR file systems. Note, however, that if the HDFS layer is implemented on top of another distributed file system, this raises the possibility of using that file system's potentially more efficient methods to read/write data into Kx directly, which makes the HDFS layer largely redundant.
2. Can Kx ingest data directly from HDFS sources?
Yes. This is a much more likely scenario for a sophisticated user of Kx's kdb+ database. Kdb+ has interfaces for a wide range of ingest sources and languages, including the ability to ingest HDFS files via the Hadoop utilities. For example, the output of `hadoop fs -cat` can be piped into a FIFO and consumed using q's named-pipe support.
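As a minimal sketch of that pattern (the HDFS path, table name and column schema here are purely illustrative assumptions), the Hadoop CLI streams a file into a FIFO and q consumes it in chunks via `.Q.fps`:

```q
/ stream an HDFS file into a FIFO (path and schema are illustrative)
system"mkfifo /tmp/hdfspipe";
system"hadoop fs -cat /data/trades.csv > /tmp/hdfspipe &";

/ target table; "SFJ" = symbol, float, long columns
trades:flip `sym`price`size!"SFJ"$\:();

/ .Q.fps reads the pipe in chunks, passing each chunk of lines to the handler
.Q.fps[{`trades insert ("SFJ";",")0:x}]`:/tmp/hdfspipe;
```

Note this sketch assumes a headerless CSV; a header line would need to be skipped in the first chunk.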
3. What about MapReduce with kdb+?
Use of the MapReduce model is inherent within kdb+. It manifests not only across a distributed networked architecture but also, efficiently, across shared memory when running many threads on one server.
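The shared-memory case can be sketched with `peach`, which maps a function over its arguments on q's secondary worker threads (start q with, e.g., `-s 4`):

```q
/ map-reduce style aggregation across threads
chunks:4 0N#til 1000000     / map inputs: split the data into 4 chunks
partials:(+/) peach chunks  / map: sum each chunk on its own worker thread
(+/) partials               / reduce: combine the partial sums -> 499999500000
```

The same pattern is applied automatically by kdb+ when aggregations such as `select avg price by sym from trade` run over a partitioned table.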
4. Can Kx work alongside Hive or Spark?
Yes. This is the best use case for Kx/Hadoop interoperation. For example, runtime data generated and stored in HBase or Spark can be interoperated with Kx through a number of public interfaces. The operating functions found within Kx are a superset of the functions offered in Spark. We envisage a batch ETL process extracting data from a Hive or HBase database into kdb+, followed by data analytics in q syntax. The performance and functionality of this will depend on the data model and the type of data being transformed.
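A hedged sketch of such an ETL flow, assuming the Hive side has produced a flat CSV extract (the file name, schema and query below are illustrative assumptions, not a fixed interface):

```q
/ e.g. produced by: hive -e "select sym,time,price,size from trades" > /tmp/extract.csv
/ "SPFJ" = symbol, timestamp, float, long; enlist"," = comma-delimited with header row
trade:("SPFJ";enlist",")0:`:/tmp/extract.csv

/ typical q analytics on the ingested batch
select vwap:size wavg price by sym from trade
```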
5. Can I port from kdb+ to one of the other toolsets in Hadoop?
Nothing prevents this, but you will almost certainly end up with a slower solution in terms of latency, throughput and query-time metrics. If that is acceptable to the user of the application, it could be considered. For any time-series or similarly structured data, the data could be exported and reimported. Note that the target system will lack some of the capabilities built into kdb+.
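The export step itself is straightforward; a minimal sketch (table contents and path are illustrative):

```q
t:([]sym:`a`b`c;price:1.1 2.2 3.3)
save `:/tmp/t.csv        / writes the global table t to /tmp/t.csv as CSV
`:/tmp/t.csv 0: csv 0: t / equivalent, with explicit control over formatting
```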
You can read more at the following link: https://bit.ly/2LDMLWC