Azure Synapse Analytics reached general availability in December 2020. It combines a dedicated SQL pool (formerly SQL Data Warehouse), Apache Spark, and serverless SQL query over data lake storage in a single workspace.
The unified workspace
Azure Synapse Studio provides a single development environment for SQL analytics, Apache Spark (PySpark, Scala, Spark SQL, and .NET for Apache Spark), pipelines (built on the same engine as Azure Data Factory), and Power BI integration. The workspace model brings together capabilities that previously required several separate Azure services, such as SQL Data Warehouse, HDInsight, and Data Factory. Azure Machine Learning remains a separate service that Synapse integrates with rather than replaces.
Serverless SQL for ad-hoc analytics
The serverless SQL pool runs T-SQL queries against data in Azure Data Lake Storage (ADLS) without provisioning any SQL compute. The query engine reads files in formats such as Parquet and CSV directly from ADLS and executes queries with no pre-loading. For exploratory analytics and reporting on data lake data, serverless SQL provides a familiar SQL interface with no cluster management and pay-per-query pricing, billed by the amount of data each query processes.
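Queries of this kind are typically written with the T-SQL OPENROWSET function. The sketch below, in Python, only builds the query text. The storage account `examplelake`, the file path, and the helper name `openrowset_query` are hypothetical, and the resulting string would still need to be run from Synapse Studio or over a connection to the workspace's serverless SQL endpoint.

```python
def openrowset_query(bulk_url: str, file_format: str = "PARQUET", top: int = 100) -> str:
    """Build a T-SQL OPENROWSET query for a serverless SQL pool.

    Hypothetical helper for illustration; it does not connect to Azure.
    """
    fmt = file_format.upper()
    if fmt not in ("PARQUET", "CSV"):
        raise ValueError(f"unsupported format: {file_format}")
    if top <= 0:
        raise ValueError("top must be a positive integer")

    # Double any single quotes so the URL stays a valid T-SQL string literal.
    safe_url = bulk_url.replace("'", "''")

    options = [f"BULK '{safe_url}'", f"FORMAT = '{fmt}'"]
    if fmt == "CSV":
        # Assumes the CSV files have a header row; parser version 2.0
        # is required for the HEADER_ROW option.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]

    body = ",\n    ".join(options)
    return f"SELECT TOP {int(top)} *\nFROM OPENROWSET(\n    {body}\n) AS [rows];"


if __name__ == "__main__":
    # Hypothetical storage account and folder; the wildcard reads every
    # Parquet file in the folder.
    print(openrowset_query(
        "https://examplelake.dfs.core.windows.net/raw/sales/*.parquet",
        top=10,
    ))
```

Nothing is loaded or provisioned before the query runs: the files stay in the lake and only the data the query processes is billed.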
Dedicated SQL pool for large-scale warehouse workloads
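The dedicated SQL pool (formerly SQL Data Warehouse) uses massively parallel processing (MPP) across multiple compute nodes to run analytical queries on large datasets. Table data is sharded into 60 distributions that are mapped onto the compute nodes, so each query is executed as parallel pieces of work, one per distribution. For hash-distributed tables, choosing the right distribution key (the column whose hashed value determines which distribution a row lands in) is the primary performance tuning decision: a key with many distinct, evenly spread values keeps the distributions balanced, while a skewed key concentrates data and work on a few of them.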
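The effect of the distribution key can be illustrated with a small simulation that assigns rows to 60 buckets by hashing a key and then measures skew. This is a sketch only: the MD5-based hash is a stand-in, not the hash function the dedicated SQL pool actually uses, and `distribution_for` and `skew_ratio` are hypothetical helper names.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool always shards a table into 60 distributions


def distribution_for(key) -> int:
    """Map a key to one of the 60 distributions.

    MD5 is an assumed stand-in; the service's real hash function differs.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def skew_ratio(keys) -> float:
    """Rows in the fullest distribution divided by the ideal even share.

    1.0 means perfectly balanced; larger values mean more skew.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("need at least one key")
    counts = Counter(distribution_for(k) for k in keys)
    ideal = len(keys) / DISTRIBUTIONS
    return max(counts.values()) / ideal


if __name__ == "__main__":
    # High-cardinality candidate key: every row has a distinct value.
    order_ids = range(60_000)
    # Low-cardinality candidate key: only five distinct values.
    region_ids = [i % 5 for i in range(60_000)]

    print("unique key skew:", round(skew_ratio(order_ids), 2))
    print("five-value key skew:", round(skew_ratio(region_ids), 2))
```

With a unique key the ratio stays close to 1. A key with only five equally frequent values can occupy at most five distributions, so the fullest one holds at least 12 times its even share and the remaining distributions sit idle.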
The Spark integration
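Spark notebooks in Synapse share a metadata service with the serverless SQL pool: tables created in Spark and backed by Parquet, Delta Lake, or CSV files are automatically queryable from serverless SQL. The synchronisation runs from Spark to SQL, so objects created in a SQL pool do not appear in Spark in the same way. Spark is appropriate for large-scale data transformation, ML model training, and streaming analytics. Synapse Spark runs Microsoft's own Apache Spark runtime rather than the Azure Databricks runtime; it supports Delta Lake, and Spark pools can auto-scale between a configured minimum and maximum number of nodes based on load.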
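A minimal PySpark sketch of that flow follows, assuming it runs in a Synapse notebook where a `spark` session is already provided. The function name, the `abfss://` path, and the database and table names are hypothetical, and the returned T-SQL assumes the synchronised table is exposed under the `dbo` schema of a same-named database in serverless SQL.

```python
def register_parquet_as_table(spark, source_path: str, database: str, table: str) -> str:
    """Save Parquet data as a Spark table and return T-SQL to read it back.

    `spark` is expected to be a pyspark.sql.SparkSession, such as the one
    a Synapse notebook provides. All names passed in are illustrative.
    """
    # Create the Spark database if it does not exist yet.
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {database}")

    # Read the source files and store them as a Parquet-backed managed table,
    # which registers the table in the metastore.
    df = spark.read.parquet(source_path)
    df.write.mode("overwrite").format("parquet").saveAsTable(f"{database}.{table}")

    # Assumed shape of the matching serverless SQL query once the
    # metadata has synchronised.
    return f"SELECT TOP 10 * FROM [{database}].[dbo].[{table}];"


# Example call inside a Synapse notebook (hypothetical names):
# sql_text = register_parquet_as_table(
#     spark,
#     "abfss://raw@examplelake.dfs.core.windows.net/sales/",
#     "salesdb",
#     "orders",
# )
```

The returned statement is meant to be run against the serverless SQL pool, not in Spark; synchronisation is asynchronous, so the table may take a short while to appear there.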