File systems abstract data stored on disk so it can be managed efficiently. They come in a variety of shapes and sizes, with different features and implementations. A file system defines the data structures that determine where a file is stored on the disk. Ideally, an entire file is stored contiguously, without fragmentation, which improves performance by allowing applications to read data sequentially in large chunks. When files are moved or renamed, their data stays in the same location, but deleting files, changing their size, and copying them can lead to fragmentation.
What Are Fragmentation and Defragmentation?
As mentioned above, file systems become fragmented over time as files are repeatedly written and deleted. When a file is deleted, the space it occupied on the disk is made available. As files continue to be written and deleted, particularly on busy and relatively full volumes, even a significant amount of free space ends up fragmented into small sections, leaving many holes scattered across the disk. File systems are designed to handle this situation, but the result is that new files are simply fragmented even further. For example, if a 1GB file has to be written and there is no contiguous 1GB region of free space to hold it, the file will be broken up into several pieces. As the files within the file system become more and more dispersed, the performance of the file system, and of the applications that use it, degrades.
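The mechanism above can be sketched with a toy first-fit allocator. This is a simplified illustration, not any real file system's allocation strategy, and the free-space list below is a made-up example: because free space is scattered into small holes, a file larger than any single hole gets split into fragments.

```python
def allocate(free_extents, size):
    """Place `size` blocks into free extents using first-fit.

    `free_extents` is a list of (start_block, length) holes.
    Returns the list of (start_block, length) pieces the file occupies.
    """
    pieces = []
    remaining = size
    for start, length in free_extents:
        if remaining == 0:
            break
        used = min(length, remaining)  # take as much of this hole as needed
        pieces.append((start, used))
        remaining -= used
    if remaining:
        raise ValueError("not enough free space")
    return pieces

# Free space is fragmented into three small holes, so a 10-block file
# ends up split into three separate fragments on the disk:
pieces = allocate([(0, 4), (20, 3), (50, 8)], 10)
print(pieces)  # [(0, 4), (20, 3), (50, 3)]
```

A real allocator is far more sophisticated, but the outcome is the same: once free space is fragmented, new files are fragmented too.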
The type of storage used does come into play here, but most disks support two access patterns: sequential I/O and random I/O. Sequential I/O is when the disk reads large chunks of data without seeking to other locations on the disk, whereas random I/O requires the disk to seek out data in various places. As expected, random reads are much slower than sequential reads, especially on spinning disks.
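The two access patterns can be demonstrated with nothing but the standard library. This minimal sketch reads the same file both ways; the bytes returned are identical, but the random pattern issues a seek before every read, which on a spinning disk translates into head movement:

```python
import os
import random
import tempfile

BLOCK = 4096   # read granularity for this demo
BLOCKS = 256   # 1 MiB test file

# Create a throwaway file full of random bytes to read back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * BLOCKS))
    path = f.name

def read_sequential(path):
    """Read the file front to back in fixed-size chunks (no seeking)."""
    data = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            data.append(chunk)
    return b"".join(data)

def read_random(path):
    """Read the same blocks in a shuffled order, seeking before each read."""
    order = list(range(BLOCKS))
    random.shuffle(order)
    data = [None] * BLOCKS
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * BLOCK)   # each seek may cost a disk head move
            data[i] = f.read(BLOCK)
    return b"".join(data)

seq_data = read_sequential(path)
rnd_data = read_random(path)
assert seq_data == rnd_data  # same bytes either way; only the access pattern differs
os.remove(path)
```

A fragmented file forces the second pattern on every application that reads it, even when the application asks for the file front to back.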
One well-known tactic for making PCs run faster is to run disk defragmentation, which scans the entire file system and consolidates the scattered pieces of fragmented files. The process can take quite a while to complete, especially on a large disk. Nonetheless, at the end it cleans up the mess left by fragmentation, leaving each file in its own sequential chunk and ultimately improving performance.
In the Cloud: EBS Snapshots, Frag, and Defrag
Defrag “resets” the snapshot state: The issue with block-level incremental snapshots, including EBS snapshots, is that they are not aware of what happens at the file system level; they simply capture changes on the disk. This is very efficient in most cases, because modified sectors of large files can be backed up without copying the entire file. However, when defragmentation is performed, the entire layout of the disk changes from the snapshot's point of view: the defrag process moves blocks around, modifying most of the blocks on the disk. As a result, the snapshot taken after a defrag will be close to a full snapshot rather than an incremental one. It is important to keep this in mind when estimating how long that snapshot will take, along with its storage costs.
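Why this happens can be shown with a toy model of changed-block tracking. This is not the actual EBS implementation, just a ten-block "disk" where the snapshot layer only sees which blocks differ from the previous snapshot. An in-place edit dirties one block; defragmentation relocates data and dirties half the disk even in this tiny example:

```python
def changed_blocks(prev, curr):
    """Indices of blocks whose contents differ between two disk states."""
    return [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]

disk = list("ABC__DE__F")  # fragmented layout: '_' marks free space
snapshot1 = disk.copy()

# Edit a single block in place: the next incremental captures one block.
disk[5] = "d"
after_edit = len(changed_blocks(snapshot1, disk))
print(after_edit)  # 1

# Defragment: pack all data to the front of the disk. The data hasn't
# changed, but its location has, so the snapshot layer sees those blocks
# as modified and the next "incremental" approaches a full snapshot.
snapshot2 = disk.copy()
data = [b for b in disk if b != "_"]
disk = data + ["_"] * (len(snapshot2) - len(data))
after_defrag = len(changed_blocks(snapshot2, disk))
print(after_defrag)  # 5 — half the blocks changed at once
```

On a real, heavily fragmented volume, a full defrag pass can touch the large majority of allocated blocks, which is why the following snapshot is priced and timed like a full one.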
Highly fragmented file systems also affect the performance and size of incremental snapshots, because snapshots track changes at a certain block granularity, not byte by byte. Suppose you have a 1GB file and 1MB is changed within it. If the file is stored in one chunk (the file system is not fragmented), the number of blocks captured by the next snapshot is roughly that 1MB divided by the block size. If the file system is highly fragmented, however, the same 1MB of writes can land in many different places on the disk, each touching a separate block, which produces a much larger increment.
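A back-of-the-envelope calculation makes the difference concrete. The 512 KiB chunk size and the 256-fragment layout below are assumptions chosen for illustration; the real snapshot chunk size is an implementation detail of the snapshot service:

```python
import math

CHUNK = 512 * 1024           # assumed snapshot chunk size, in bytes (illustration only)
changed_bytes = 1024 * 1024  # 1 MiB modified inside a 1 GB file

# Contiguous file: the modified region spans a minimal run of chunks.
contiguous_chunks = math.ceil(changed_bytes / CHUNK)
print(contiguous_chunks)  # 2 (one more if the change straddles a chunk boundary)

# Fragmented file: the same 1 MiB scattered as 256 separate 4 KiB pieces,
# each landing in a different chunk. Every touched chunk is captured whole.
fragments = 256
fragmented_bytes = fragments * CHUNK
print(fragmented_bytes // (1024 * 1024))  # 128 — MiB captured for a 1 MiB change
```

Under these assumptions the same logical change costs two orders of magnitude more snapshot storage on the fragmented volume.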
Whether to run a disk defrag is a judgment call. For critical applications, if a file system is highly fragmented, a defrag will at some point be beneficial and desired, bearing in mind its effect on subsequent snapshots. Databases generally allocate very large files and manage changes inside them, so the underlying file system is less affected by fragmentation, whereas file system and file server performance can be greatly affected.
Overall, it is important to be aware of the trade-offs of each action. If a file system is fragmented, a defrag is not necessarily an immediate necessity unless performance is considerably affected, and it is important to understand what running one means in terms of EBS snapshots. Conversely, if a disk is left highly fragmented, the incremental snapshots will be bigger, which is also a cost to be considered. Ultimately, understanding each option will ease the decision-making process and help maximize the efficiency of both defragmentation and snapshots.
N2W’s primary product, Cloud Protection Manager (CPM), is a pioneering enterprise-class data protection solution for the AWS cloud. CPM supports backup and recovery for AWS EC2 instances, EBS volumes and RDS databases. Additionally, it supports consistent backup of applications and includes the features listed below:
- Flexible backup policies and schedules
- Consistent backup of databases like SQL Server, Oracle, MySQL, MongoDB and more
- Complete instance recovery across AWS regions in seconds
- “Pull” and “Push” alerts and notifications