ADLS Query Acceleration
ADLS:
Azure Data Lake Storage (ADLS), as we all know, is an enterprise-scale big data lake store that makes it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages.
Query Acceleration:
The ADLS team recently announced an exciting feature: running ANSI-style SQL queries against blobs stored in ADLS, currently from Java and .NET code. Until now, developers had to either download large data sets locally to perform quick data exploration, or mount them behind a SQL layer such as Synapse or Databricks to query the data. With Query Acceleration, the cost of that additional data processing layer is saved.
Here is the high-level flow of a Query Acceleration request from a client application:
- The client application requests file data by specifying predicates and column projections.
- Query Acceleration parses the specified query and distributes work to parse and filter data.
- Processors read the data from disk, parse it by using the appropriate format, and then filter it by applying the specified predicates and column projections.
- Query Acceleration combines the response shards and streams them back to the client application.
- The client application receives and parses the streamed response. The application doesn't need to filter any additional data and can apply the desired calculation or transformation directly.
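The five steps above can be sketched as a small local simulation in Java. This is only an illustration under stated assumptions. The "blob" is an in-memory list of CSV lines with no quoted commas. The predicate and projection are passed as a Java `Predicate` and column indices rather than SQL text. `QueryAccelerationSim`, `processShard` and `query` are hypothetical names, not part of the Azure SDK. With the real service, the query is sent as SQL text through the SDK's query methods (for example `openQueryInputStream` on the Java SDK's `BlobClient`), and parsing and filtering happen on the service side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical local simulation of the Query Acceleration flow.
// NOT the Azure SDK API: it only mimics what the service does with a query.
class QueryAccelerationSim {

    // Step 3: one "processor" parses a shard of raw CSV lines, applies the
    // predicate (WHERE) and keeps only the projected columns (SELECT list).
    static List<String[]> processShard(List<String> lines, Predicate<String[]> where, int[] select) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");           // naive CSV parse: assumes no quoted commas
            if (where.test(fields)) {
                String[] projected = new String[select.length];
                for (int i = 0; i < select.length; i++) {
                    projected[i] = fields[select[i]];
                }
                out.add(projected);
            }
        }
        return out;
    }

    // Steps 1, 2 and 4: split the blob into shards, process them in parallel,
    // then combine the response shards, in order, into one response.
    static List<String[]> query(List<String> blobLines, int shardSize,
                                Predicate<String[]> where, int[] select) {
        int shards = (blobLines.size() + shardSize - 1) / shardSize;
        return IntStream.range(0, shards).parallel()
                .mapToObj(s -> processShard(
                        blobLines.subList(s * shardSize, Math.min((s + 1) * shardSize, blobLines.size())),
                        where, select))
                .flatMap(List::stream)                    // ordered stream: shard order is preserved
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows: id,neighbourhood,price (header row already stripped).
        List<String> blob = Arrays.asList(
                "1,Centrum-West,150", "2,De Pijp,95", "3,Centrum-West,210",
                "4,Westerpark,80", "5,De Pijp,120");
        // Roughly: SELECT price FROM BlobStorage WHERE neighbourhood = 'De Pijp'
        List<String[]> rows = query(blob, 2, f -> f[1].equals("De Pijp"), new int[]{2});
        // Step 5: the client only receives matching, projected rows and computes directly.
        double avg = rows.stream().mapToInt(r -> Integer.parseInt(r[0])).average().orElse(Double.NaN);
        System.out.println(rows.size() + " rows, average price " + avg);
    }
}
```

Running `main` returns only the two matching rows and only their price column, so the client can compute the average without filtering anything itself. The shard size of 2 is arbitrary. In the real service, how the work is distributed is up to Query Acceleration.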
I have personally tried this on a sample Airbnb dataset of Amsterdam rentals (~300 MB), and it works great. Although it doesn't currently support wider aggregations such as GROUP BY, it does come with a decent set of aggregate functions such as COUNT, AVG, MIN and MAX.
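Because GROUP BY isn't available, one possible workaround is to let Query Acceleration do the filtering and projection and then perform the grouping on the client over the streamed rows. The sketch below assumes the response has already been parsed into `String[]` rows, for example from a query along the lines of `SELECT neighbourhood, price FROM BlobStorage`. `ClientSideGroupBy` and `groupBy` are hypothetical names, and the rows are made up.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: GROUP BY performed on the client, over rows returned
// by a projection/filter-only query. Not part of any Azure SDK.
class ClientSideGroupBy {

    // Groups rows by column keyCol and computes count/min/max/avg of the
    // numeric column valueCol for each group, in a single pass.
    static Map<String, DoubleSummaryStatistics> groupBy(List<String[]> rows, int keyCol, int valueCol) {
        return rows.stream().collect(Collectors.groupingBy(
                (String[] r) -> r[keyCol],
                TreeMap::new,                             // sorted keys for stable output
                Collectors.summarizingDouble((String[] r) -> Double.parseDouble(r[valueCol]))));
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the parsed, streamed query response.
        List<String[]> rows = Arrays.asList(
                new String[]{"Centrum-West", "150"}, new String[]{"De Pijp", "95"},
                new String[]{"Centrum-West", "210"}, new String[]{"De Pijp", "120"});
        groupBy(rows, 0, 1).forEach((hood, s) -> System.out.printf(
                "%s: count=%d min=%.1f max=%.1f avg=%.1f%n",
                hood, s.getCount(), s.getMin(), s.getMax(), s.getAverage()));
    }
}
```

`Collectors.summarizingDouble` yields the same COUNT/MIN/MAX/AVG values per group that the service offers over the whole result. The trade-off is that every projected row still travels to the client, so this works best when the WHERE clause has already cut the data down.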
Link to my GitHub project: https://github.com/rockssk/adls-query-accelerator
Learn how to use Query Acceleration for Java and .NET.
For SQL usage, refer to the SQL Reference.