Services

Careers

Products

Contact

Schedule a Call

About

Services

Blogs

Careers

Products

Contact

Schedule a Call

Home
> Blogs
> Clickhouse
> Speeding up ClickHouse Queries: Materialized Views vs. Refreshable Views Explained

Speeding up ClickHouse Queries: Materialized Views vs. Refreshable Views Explained

Learn how materialized views and their refreshable counterparts simplify complex analytics by precomputing results, boosting query performance, and enabling near real-time decision-making.

Vignesh T K March 31, 2025

Subscribe for email updates

Introduction:

In the world of big data analytics, crunching through raw datasets repeatedly can feel like running in circles—slow, resource-heavy, and far from efficient. Enter Materialized Views (MVs) and Refreshable Materialized Views (RMVs), the game-changers that transform raw data into instant insights. Think of them as your fast-track bridge between massive data lakes and lightning-fast analytics, delivering results in record time with minimal effort.

What is Materialized View?

Materialized views shift heavy computation from query time to insert time, delivering lightning-fast SELECT queries. In ClickHouse, they act like real-time triggers, running queries on incoming data and storing the results in a target table. These precomputed results—whether aggregated, filtered, or transformed—reduce data size and simplify queries, making analytics incredibly efficient. Think of materialized views as constantly updating indexes, keeping your data ready for quick insights as soon as it lands!

How ClickHouse store precomputed query result?

When you create a materialized view, ClickHouse computes the query result and stores it on disk. When queried, it retrieves the precomputed data directly from storage, avoiding recalculations.

Materialized views can be created using SQL queries on one or more tables. They support updates through operations like INSERT, UPDATE, and DELETE. When the source table changes, the materialized view automatically updates to keep the results consistent.

Creating Materialized View in ClickHouse:

The votes table stores voting records, including details like vote type, creation date, and user ID.

CREATE TABLE votes
(
`Id` UInt32,
`PostId` Int32,
`VoteTypeId` UInt8,
`CreationDate` DateTime64(3, 'UTC'),
`UserId` Int32,
`BountyAmount` UInt8
)
ENGINE = MergeTree
ORDER BY (VoteTypeId, CreationDate, PostId)

To analyze the dataset, let's count the total number of records in the votes table:

SELECT count()
FROM votes
FINAL

┌─count()──┐
│ 238984011│
└──────────┘

To optimize vote analysis, we aggregate daily upvotes and downvotes:

SELECT toStartOfDay(CreationDate) AS day,
countIf(VoteTypeId = 2) AS UpVotes,
countIf(VoteTypeId = 3) AS DownVotes
FROM votes
GROUP BY day
ORDER BY day ASC
LIMIT 10

┌─────────────────day─┬─UpVotes─┬─DownVotes─┐
│ 2008-07-31 00:00:00 │ 6       │ 0         │
│ 2008-08-01 00:00:00 │ 182     │ 50        │
│ 2008-08-02 00:00:00 │ 436     │ 107       │
│ 2008-08-03 00:00:00 │ 564     │ 100       │
│ 2008-08-04 00:00:00 │ 1306    │ 259       │
│ 2008-08-05 00:00:00 │ 1368    │ 269       │
│ 2008-08-06 00:00:00 │ 1701    │ 211       │
│ 2008-08-07 00:00:00 │ 1544    │ 211       │
│ 2008-08-08 00:00:00 │ 1241    │ 212       │
│ 2008-08-09 00:00:00 │ 576     │ 46        │
└─────────────────────┴─────────┴───────────┘

10 rows in set. Elapsed: 0.133 sec. Processed 238.98 million rows, 2.15 GB (1.79 billion rows/s., 16.14 GB/s.)
Peak memory usage: 363.22 MiB.

While this query efficiently retrieves insights, executing it frequently on a large dataset is resource-intensive. To speed up analytics, we create a materialized view.

Creating a target table ,it will store precomputed daily vote counts.

CREATE TABLE up_down_votes_per_day
 ( 
   `Day` Date,
  `UpVotes` UInt32,
  `DownVotes` UInt32
)
ENGINE = SummingMergeTree
ORDER BY Day

To initialize the materialized view, we insert aggregated data:

INSERT INTO up_down_votes_per_day 
SELECT CAST(toStartOfDay(CreationDate), 'Date') AS Day,
       countIf(VoteTypeId = 2) AS UpVotes,
       countIf(VoteTypeId = 3) AS DownVotes
FROM votes
GROUP BY Day

Instead of scanning millions of rows, we can now query the precomputed table:

SELECT count() FROM up_down_votes_per_day

   ┌─count()─┐
 1.│    5723 │
   └─────────┘

SELECT 
   Day,
   UpVotes,
   DownVotes 
FROM up_down_votes_per_day FINAL 
ORDER BY Day ASC 
LIMIT 10

     ┌────────Day─┬─UpVotes─┬─DownVotes─┐
  1. │ 2008-07-31 │       6 │         0 │
  2. │ 2008-08-01 │     182 │        50 │
  3. │ 2008-08-02 │     436 │       107 │
  4. │ 2008-08-03 │     564 │       100 │
  5. │ 2008-08-04 │    1306 │       259 │
  6. │ 2008-08-05 │    1368 │       269 │
  7. │ 2008-08-06 │    1701 │       211 │
  8. │ 2008-08-07 │    1544 │       211 │
  9. │ 2008-08-08 │    1241 │       212 │
 10. │ 2008-08-09 │     576 │        46 │
     └────────────┴─────────┴───────────┘
10 rows in set. Elapsed: 0.006 sec. Processed 5.72 thousand rows, 57.23 KB (1.02 million rows/s., 10.19 MB/s.)
Peak memory usage: 65.70 KiB.

Performance Comparison:

Query	Rows Processed	Data Scanned	Execution Time	Memory Used	Storage Space
Direct Query on votes	238.98M rows	2.15 GB	0.491 sec	3.79 MiB	1.7GB
Query on Materialized View (up_down_votes_per_day)	5.72K rows	57.23 KB	0.006 sec	65.70 KiB	52.13KB

Refreshable Materialized View:

Refreshable Materialized Views (RMVs) provide a more flexible approach to data updates by allowing periodic refreshes instead of real-time updates triggered by new inserts. This makes them particularly useful when data needs to be recalculated at scheduled intervals rather than continuously, ensuring efficient resource utilization while maintaining up-to-date analytics.

Creating Refreshable Materialized View:

Imagine you have a table called orders, storing e-commerce transactions. You want to maintain a daily summary of total sales per product but refresh it every hour instead of updating in real-time.

First, let's define the table that will store the raw transaction data.

 CREATE TABLE orders
 (
     `order_id` UInt32,
     `product_id` UInt32,
     `order_date` DateTime,
     `amount` Float32
 )
 ENGINE = MergeTree
 ORDER BY order_date

The target table will store precomputed aggregated data for better query performance.

CREATE TABLE daily_sales (
    product_id UInt32,
    order_date DateTime,
    total_sales Float32,
) ENGINE = MergeTree()
ORDER BY (product_id, order_date);

Now, we create a refreshable materialized view that updates every hour:

CREATE MATERIALIZED VIEW daily_sales_mv  
REFRESH EVERY 1 HOUR  
TO daily_sales AS  
SELECT  
    product_id
    toDate(order_date) AS order_date,  
    SUM(amount) AS total_sales,  
    COUNT(*) AS total_orders  
FROM orders
GROUP BY product_id, order_date;

To begin synchronization, we manually insert the aggregated data into daily_sales:

INSERT INTO daily_sales
SELECT
    product_id,
    toDate(order_date) AS order_date,
    SUM(amount) AS total_sales,
    COUNT(*) AS total_orders
FROM orders 
GROUP BY product_id,order_date

After executing the above steps, you can now query daily_sales to see the precomputed results

SELECT * FROM orders

    ┌─order_id─┬─product_id─┬──────────order_date─┬─amount─┐
 1. │        1 │        101 │ 2024-02-05 10:15:00 │   25.5 │
 2. │        2 │        102 │ 2024-02-05 10:30:00 │     40 │
 3. │        3 │        103 │ 2024-02-05 11:00:00 │  15.75 │
 4. │        4 │        101 │ 2024-02-05 11:15:00 │  30.25 │
 5. │        5 │        102 │ 2024-02-05 12:00:00 │     50 │
 6. │        6 │        103 │ 2024-02-05 12:30:00 │     20 │
 7. │        7 │        101 │ 2024-02-05 13:00:00 │   27.5 │
 8. │        8 │        102 │ 2024-02-05 13:15:00 │     60 │
 9. │        9 │        103 │ 2024-02-05 14:00:00 │     18 │
10. │       10 │        101 │ 2024-02-05 14:30:00 │  35.75 │
    └──────────┴────────────┴─────────────────────┴────────┘
10 rows in set. Elapsed: 0.003 sec.

Now you can see the pre-computed data in the target table

SELECT * 
FROM daily_sales
   ┌─product_id─┬──────────order_date─┬─total_sales─┬─total_orders─┐
1. │        101 │ 2024-02-05 00:00:00 │         119 │            4 │
2. │        102 │ 2024-02-05 00:00:00 │         150 │            3 │
3. │        103 │ 2024-02-05 00:00:00 │       53.75 │            3 │
   └────────────┴─────────────────────┴─────────────┴──────────────┘
3 rows in set. Elapsed: 0.003 sec.

As new transactions arrive in the orders table, they won’t be reflected in daily_sales until the next refresh.

If needed, you can trigger a manual refresh:

SYSTEM REFRESH VIEW daily_sales_mv;

Once the refresh is complete, the latest aggregated sales data will be available in daily_sales.

SELECT *
FROM daily_sales

   ┌─product_id─┬──────────order_date─┬─total_sales─┬─total_orders─┐
1. │        101 │ 2024-02-05 00:00:00 │       151.5 │            5 │
2. │        102 │ 2024-02-05 00:00:00 │         225 │            4 │
3. │        103 │ 2024-02-05 00:00:00 │       76.55 │            4 │
4. │        104 │ 2024-02-05 00:00:00 │          45 │            1 │
5. │        105 │ 2024-02-05 00:00:00 │       55.25 │            1 │
   └────────────┴─────────────────────┴─────────────┴──────────────┘

5 rows in set. Elapsed: 0.003 sec.

Scenarios Where Each Type is More Beneficial:

Use Materialized View When:

Real-time updates are critical – Data should be updated immediately as new records are inserted.
Incremental aggregations are needed – Aggregates like SUM() or AVG() over a single table benefit from incremental updates.
High-volume streaming data is involved – The view is updated automatically without requiring manual refreshes.

Use Refreshable Materialized Views When:

Data doesn’t need real-time updates – Periodic refreshes are sufficient for reporting and analytics.
Complex joins or transformations are required – Some queries don’t support incremental updates, making a periodic full refresh necessary.
You want more control over refresh timing – Instead of updating continuously, the view updates at scheduled intervals or on demand.

Performance Benefits in Analytics:

How Materialized Views Reduce Query Time:

Queries run against precomputed results rather than recalculating them each time.
Ideal for frequent queries that involve expensive computations.

Reducing System Usage and Resource Usage:

Materialized Views: Optimize storage and CPU usage by processing only new inserts rather than recalculating the entire dataset.
Refreshable Materialized Views: Avoid excessive updates by refreshing only when needed, reducing unnecessary processing.

Comparison: Materialized View vs Refreshable Materialized View:

Feature	Materialized View	Refreshable Materialized View
Update Method	Real-time on insert	Manual or scheduled refresh
Best for	High-frequency queries, real-time analytics	Periodic reports, complex joins
Performance	Faster for incremental aggregations	Reduces system load for large datasets
Use Case	Continuous monitoring dashboards	Batch analytics, periodic data recomputation

Conclusion:

Choosing between Materialized Views and Refreshable Materialized Views depends on your specific analytics needs. If you require real-time updates and incremental processing, Materialized Views are the best option. However, if your queries involve complex joins, periodic recomputation, or require greater control over refresh timing, Refreshable Materialized Views provide the flexibility you need.

By leveraging these powerful features in ClickHouse, businesses can optimize query performance, reduce system load, and accelerate data analytics, ensuring fast, efficient, and scalable insights for modern data-driven applications.