Huge data blocks resides under single MDF file might cause a performance of the query and impact the application services. This blog help you to understand how we can the split the single MDF file into multiple data files.

Murali March 15, 2021

Summarize with AI: ChatGPT Google AI Perplexity Claude Grok

Subscribe for email updates

Summarize with AI: ChatGPT Google AI Perplexity Claude Grok

Introduction:-

Most of us do not think about using multiple files for the single database until data grows larger. Query performance starts to perform poorly on the PROD environment because the data blocks reside on a single MDF file. Let’s discuss what are all the techniques in SQL server to improve the performance by creating multiple data files and evenly distribute the data from old data files in this blog. There are two ways to achieve this,

Splitting the single data files eventually to multiple files.
Adding a filegroup by moving the clustered index.

Splitting the single data files eventually to multiple files:-

Step 1: Add required new empty data files to the file group.
Step 2: Disable the autogrowth on new files.
Step 3: Need to empty an existing single data file (original).
Step 4: Re-enable autogrowth.
Step 5: Shrink the original datafile to the same size of the new files.

The reason we disabled autogrowth on the five new files is to prevent the original file from getting too empty. In order to balance all the data files we need to disable autogrowth on the newly created files.

Process behind shrink command with empty option:-

Shrink command takes the hot data from the tail of the data file and transfers it into the newly added files.
It uses a proportional fill algorithm that will evenly distribute the data to the new files.So the database objects blocks are shared across the multiple data files.

Note: Number of additional files is calculating depending on what level we shrink the existing data file.

Originality of data files before the activity:-

-> A represents the Size of the data files

Originality of data files post the activity:-

-> A represents the Size of the data files

Performance during the activity:-

Disk Queue Length:-

Log Growth:-

During this operation we will be facing a high IO and unexpected heavy log growth.So overcome this log growth,we need to change the recovery model FULL to BULK-LOGGED where the servers in standalone setup.With this change,we can avoid the drastical log growth and we can’t change the recovery model where the servers in a high availability mode.

Benefits:-

This process is done online and it does not affect the live object.

Limitations:-

We might face a lot of I/O during the activity.
Log growth is high during the activity window.So it should cause the setup’s using always-on,database mirroring and log shipping because all of them need to be shipped.

Adding File-Group Method:-

Normally,data is stored in the leaf node of the clustered index.The File Group Method is nothing but the objects are moved to the new file group by rebuilding the clustered index on the new FG.

Existence of clustered index:-

Creating a FG with new data files.
Drop the clustered index on the existing default FG(Primary).
Rebuild the clustered index on the new FG which we created.
Shrink the old data files which reside in the default FG.

Non-Existence of clustered index:-

Creating a FG with new data files. .
Create the clustered index on the new FG.
Drop the clustered index on the new FG if not needed..
Shrink the old data files which reside in the default FG.

Benefits:-

Not a risky Task.
Process is done online.
Better for HA setup.

Limitations:-

If there is no clustered index,it’s a time taking process for larger tables.
It will give performance if it is in a separate physical drive.
Expect log growth.

Impact of log growth against HA and standalone environment with splitting method:-

Standalone env:-

If we change the recovery model from FULL to BULK-LOGGED,it will reduce the log growth,but we will lose the point-in-time recovery incase of any crashes/failures during the activity.So you should take the hot backup full/diff depending upon your needs without any delay.

HA env:-

Increases of log growth will affect the secondary setup because it’s working concept depends upon the Transaction Log.We are not able to change the recovery model from FULL to BULK-LOGGED because it will work only on FULL recovery model.

We will use adding a filegroup approach for both HA and standalone environments,it might cause log growth depending on recreation of index for the tables which are in larger size and for gaining the performance the new FG should be in separate physical drives.

Comparison between the methods:-

Conclusion:-

Splitting data files in a primary filegroup is a time taking online process with an expense of log growth depending on your database sizes and the addition of new FG method is not bit risky task when compared to the first method.So as we conclude,If we are in a standalone environment,we will go for the splitting method and in a high availability environment,we should go for adding filegroup methods to reduce the impact on the PROD environment.

Name *

Email *

Comment *

Data at Rest refers to inactive data stored in databases, while Data in Motion is actively transmitted or processed between systems in SQL Server.

1807 views

How to Optimize SQL Server Performance: Implementing Online DDL

SQL Server 2022 added the "Wait at Low Priority" option that allows index creation and alteration to wait for resources to become available before executing.

3264 views

Resumable Operations for ALTER TABLE Constraints in SQL Server 2022

Empower Your Database Management with SQL Server 2022's Resumable Operations

2520 views

SQL Server On Linux

SQL Server 2017 brings the best features of the Microsoft relational database engine to the enterprise Linux ecosystem, including SQL Server Agent, Azure Active Directory (Azure AD) authentication, best-in-class high availability/ disaster recovery, and unparalleled data security.

2763 views

Achieving High Availability Using Log Shipping