An Introduction to Apache Kudu


Apache Kudu: Fast Analytics on Fast Data




Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily sized datasets.



Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing workloads.



Apache Kudu (incubating) is a new addition to the open source Hadoop ecosystem that fills the gap described above. It complements HDFS and HBase, offering a new option for achieving fast scans and fast random access from a single API.







Characteristics:



    Strong performance for both scan and random access to help customers simplify complex hybrid architectures

    High CPU efficiency in order to maximize the return on investment that our customers are making in modern processors

    High IO efficiency in order to leverage modern persistent storage

    The ability to update data in place, to avoid extraneous processing and data movement

    The ability to support active-active replicated clusters that span multiple data centers in geographically distant locations



http://getkudu.io/



http://vision.cloudera.com/wp-content/uploads/2015/09/kude4.png
