ADLS Query Acceleration

May 08, 2020


ADLS:

Azure Data Lake Storage (ADLS), as we all know, is an enterprise-scale big data lake that makes it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages.

Query Acceleration:

The ADLS team recently announced an exciting feature: you can run ANSI SQL-style queries against blobs stored in ADLS, currently from Java and .NET code. Until now, developers had to either download large data locally to perform quick data exploration, or mount it under a SQL layer such as Synapse or Databricks to query it. With Query Acceleration the filtering happens inside the storage service, so that additional data processing layer, and its cost, is no longer needed for this kind of exploration.

Here is the high-level flow of how a typical client application uses Query Acceleration to process data:

1. The client application requests file data by specifying predicates and column projections.
2. Query Acceleration parses the specified query and distributes the work of parsing and filtering the data.
3. Processors read the data from disk, parse it using the appropriate format, and then filter it by applying the specified predicates and column projections.
4. Query Acceleration combines the response shards and streams them back to the client application.
5. The client application receives and parses the streamed response. It doesn't need to filter any additional data and can apply the desired calculation or transformation directly.
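To make the steps above concrete, here is a minimal local simulation in plain Python. It does not call the Azure SDK or the real service; the sample rows, the shard split, and the helper names (`split_into_shards`, `parse_and_filter`, `run_query`) are hypothetical and only mirror the flow: split the file into shards, parse each shard, apply the predicate and column projection, then combine the filtered shards into one response for the client.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample CSV standing in for a blob (not real Airbnb data).
SAMPLE_CSV = """id,neighbourhood,room_type,price
1,Centrum-West,Entire home/apt,180
2,De Pijp,Private room,75
3,Centrum-West,Private room,95
4,Zuid,Entire home/apt,240
5,De Pijp,Entire home/apt,150
"""

def split_into_shards(text, n_shards):
    """Split the data rows into contiguous shards; the header is kept separately."""
    lines = text.strip().splitlines()
    header, data = lines[0], lines[1:]
    size = max(1, -(-len(data) // n_shards))  # ceiling division
    return header, [data[i:i + size] for i in range(0, len(data), size)]

def parse_and_filter(header, shard, predicate, columns):
    """Step 3: parse one shard as CSV, keep matching rows, project the columns."""
    reader = csv.DictReader(io.StringIO("\n".join([header] + shard)))
    return [[row[c] for c in columns] for row in reader if predicate(row)]

def run_query(text, predicate, columns, n_shards=2):
    """Steps 2-4: hand shards to workers, then combine results in shard order."""
    header, shards = split_into_shards(text, n_shards)
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda s: parse_and_filter(header, s, predicate, columns), shards)
        return [row for shard_rows in results for row in shard_rows]

# Steps 1 and 5: the client supplies a predicate and a projection and receives
# only the matching columns, roughly what a query such as
#   SELECT _2, _4 FROM BlobStorage WHERE _3 = 'Entire home/apt'
# would express against this file.
rows = run_query(SAMPLE_CSV,
                 predicate=lambda r: r["room_type"] == "Entire home/apt",
                 columns=["neighbourhood", "price"])
print(rows)
```

In the real service the shards are processed by storage-side processors rather than a local thread pool; the point of the sketch is only that filtering and projection happen before any data reaches the client.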

This feature is currently in public preview in the Canada Central and France Central regions. It supports CSV and JSON files from .NET and Java, with support for more file types and languages in the pipeline. Enroll here for the public preview.

I have personally tried this on a sample Airbnb dataset of Amsterdam rentals (~300 MB), and it's awesome. It doesn't currently support grouped aggregation such as GROUP BY, but it comes with a decent set of aggregate functions such as COUNT, AVG, MIN and MAX.
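Since GROUP BY is not available, one possible workaround (my own suggestion, not something from the Query Acceleration documentation) is to let the service do the filtering and projection and then group the much smaller streamed result on the client. The sketch below is plain Python over hypothetical rows shaped like the output of a two-column projection (neighbourhood, price); the helper name `group_stats` is illustrative. It computes COUNT, AVG, MIN and MAX per group.

```python
from collections import defaultdict

# Hypothetical projected rows (neighbourhood, price) as a client might receive them.
rows = [
    ("Centrum-West", "180"),
    ("De Pijp", "75"),
    ("Centrum-West", "95"),
    ("Zuid", "240"),
    ("De Pijp", "150"),
]

def group_stats(rows):
    """Client-side GROUP BY: COUNT, AVG, MIN and MAX of price per neighbourhood."""
    groups = defaultdict(list)
    for key, price in rows:
        groups[key].append(float(price))  # streamed CSV values arrive as text
    return {
        key: {
            "count": len(vals),
            "avg": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in groups.items()
    }

for key, stats in sorted(group_stats(rows).items()):
    print(key, stats)
```

This keeps the heavy scan on the storage side while the grouping, which only touches the already-filtered rows, stays cheap on the client.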

Link to my GitHub project: https://github.com/rockssk/adls-query-accelerator
Learn how to use Query Acceleration for Java and .NET.
Refer to the SQL Reference for the supported SQL syntax.


BigData & Advanced Analytics Tips&Tricks
