Real-time Analytics

What is Virtual Warehouse?

A virtual warehouse can scale out on demand, providing resources such as CPU, memory, and storage.

Dec 6, 2021

What's a Virtual Warehouse? A virtual data warehouse refers to compute clusters powering modern data warehouses such as ByteHouse.

A virtual warehouse can scale out on demand. It can provide resources such as CPU, memory, and temporary storage. These resources are key in performing the following operations:

  • Executing SQL statements that require compute resources (e.g. retrieving rows from tables and views) for short and long-running queries.
  • Performing Data Manipulation Language (DML) operations, such as:
    • Updating rows in tables (DELETE , INSERT , UPDATE).
  • Loading data into tables
  • Unloading data from tables

You can register a free ByteHouse account here and try creating a virtual warehouse on ByteHouse.

Advantages of Virtual Warehouse

Virtual warehouses have the following advantages:

Read-write separation and resource separation. Separate Virtual Warehouses can be created to perform data-loading tasks, or run queries in real-time. Creating Virtual Warehouses also allow different departments or business lines to have clear separation of resources, avoiding potential impact on performances between different queries.

One-click scaling. Virtual Warehouses can easily scale up or down according to business requirements for cost-effective resource utilisation.

Access control. Virtual warehouses can also be granted to different permissions into various roles, thus users can achieve fine-grained access control.

Find out how to create and manage virtual data warehouse on Bytehouse here.

Enterprise warehouse and virtual warehouse

There are three models of data warehouses from the architecture point of view: enterprise warehouse, data mart, and virtual warehouse.

  • The enterprise warehouse gathers all of the data on topics that cover the whole business. It necessitates in-depth business modeling and might take years to develop and construct.
  • A data mart is a subset of a company's data that is useful to a certain group of people. Its application is limited to a few chosen functions, such as marketing data marts.
  • A virtual warehouse refers to a collection of practical database views. It is simple to set up, but it necessitates more capacity for operating database servers.

Data virtualisation vs. data warehouse

The key benefit of data virtualisation is that it allows us to construct a solution in a fraction of the time it takes to establish a data warehouse.

Data virtualization

Data virtualisation has alternative names such as physical data warehouse, data federation, virtual database, and decentralised data warehouse. Data virtualisation enables organizations to integrate data from various sources, keeping the data in-place so that you can generate reports and dashboards to create business value from the data. It is an alternative approach to building a data warehouse, where you collect data from various sources and store a copy of the data in a new data store.

Unlike a data warehouse, data virtualisation does not require data to be stored in a single database repository. It virtualises an organisation’s siloed data from multiple sources into a single, unified data view from which a variety of BI tools can draw insights. Data is accessed in its native or original form “as is” but appears as a unified data warehouse to users.

The birth of the relational data warehouse in the early 1990s drove an enormous amount of innovation and revolutionised the way businesses managed data. Making data available to users in one single database repository with a query interface (SQL) was a game-changer. The ability to draw insights from different physical systems created new opportunities to improve business operations and create value.

Difference between distributed and virtual data warehouse

Virtual data warehouse or data virtualisation refers to a layer that sits on top of existing databases and enables the user to query all of them as if they were one database (although they are logically and physically separated).

Distributed data warehouse refers to the physical architecture of a single database. The data in the warehouse is integrated across the enterprise, and an integrated view is used only at central location of the enterprise. The enterprise operates on a centralised business model.

Register a free ByteHouse account here and try creating a virtual warehouse.

About the author: Sudarsan is a customer solution architect at ByteDance; and, he is a CDMP (DAMA) certified data architect with prior experiences with National University of Singapore, Development Bank of Singapore, Standard Chartered Bank and BNP Paribas.

Related articles

ClickHouse

Exploring Cloud-Native ClickHouse

ClickHouse

Shared-Everything Architecture vs. Shared-Nothing Architecture

Real-time Analytics

Event Tracking in Real-Time Data Analytics Introduction