Defragmentation wars

The subject of disk fragmentation will almost always draw heated arguments but seldom gets treated in its entirety. I'm going to try to do that here, but will probably miss a point or two: this is an extraordinarily complex subject and honestly there aren't any easy answers.

The basic idea is this: if a file's disk blocks are contiguous, following one after another on the physical disk drive, the disk heads won't have to move very much (perhaps not even at all) when reading the file. As moving those heads obviously takes more time than not moving them, fragmentation is undesirable.

Yes, but that doesn't mean your disk needs defragging. The first thing you need to know is that many modern file systems defrag themselves "on the fly". For example, the HFS+ file system currently used in Mac OS X defrags some files automatically (from http://www.osxbook.com/book/bonus/chapter12/hfsdebug/fragmentation.html):


When a file is opened on an HFS+ volume, 
the following conditions are tested:

    * If the file is less than 20 MB in size
    * If the file is not already busy
    * If the file is not read-only
    * If the file has more than eight extents
    * If the system has been up for at least three minutes

If all of the above conditions are satisfied, the file is relocated -- 
it is defragmented on-the-fly.
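The checklist quoted above boils down to a single yes/no test. Here is a minimal sketch of it in Python. This is not Apple's code: the `FileState` fields, the function name, and the reading of "20 MB" as 20 * 1024 * 1024 bytes are all my own assumptions, made up for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted list; the exact byte count is an assumption.
MAX_SIZE_BYTES = 20 * 1024 * 1024   # "less than 20 MB"
MAX_EXTENTS = 8                     # must have MORE than eight extents
MIN_UPTIME_SECONDS = 3 * 60         # "up for at least three minutes"


@dataclass
class FileState:
    """Hypothetical snapshot of a file at open time."""
    size_bytes: int
    busy: bool
    read_only: bool
    extent_count: int


def should_relocate_on_open(f: FileState, uptime_seconds: float) -> bool:
    """True only if every one of the five quoted conditions holds."""
    return (
        f.size_bytes < MAX_SIZE_BYTES
        and not f.busy
        and not f.read_only
        and f.extent_count > MAX_EXTENTS
        and uptime_seconds >= MIN_UPTIME_SECONDS
    )


small_scattered = FileState(5 * 1024 * 1024, False, False, 12)
print(should_relocate_on_open(small_scattered, uptime_seconds=600))  # True
print(should_relocate_on_open(small_scattered, uptime_seconds=60))   # False
```

Notice how narrow the test is: a big file, or one with only a handful of extents, is left alone.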
 

Modern file systems tend to use a "worst fit" algorithm when deciding where to put a file initially. That is, rather than looking for the smallest block of free space that will fit the new file and jamming it in between other files, they look for a big unused chunk of space and put the new file far away from any other files. It certainly helps that we often have very large, mostly unused disks now! The post "Why doesn't Linux need defragmenting?" attempts to show that graphically. I have a little quibble with it: Linux has many possible filesystems and not all of them approach fragmentation the same way, but it can help you visualize the basic concept.
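To make the difference concrete, here is a toy comparison of the two placement policies. It is only a sketch under simple assumptions: free space is a list of (start_block, length) runs, the numbers are invented, and no real filesystem allocator is this naive.

```python
def choose_run(free_runs, size, policy):
    """Pick a free run for a new file of `size` blocks.

    free_runs: list of (start_block, length) tuples (hypothetical layout).
    policy: "best" picks the smallest run that fits, "worst" the largest.
    Returns (start_block, blocks_left_over_after_the_file), or None if nothing fits.
    """
    fitting = [(start, length) for start, length in free_runs if length >= size]
    if not fitting:
        return None
    pick = min if policy == "best" else max
    start, length = pick(fitting, key=lambda run: run[1])
    return start, length - size


free_runs = [(100, 12), (400, 5000), (9000, 10)]

print(choose_run(free_runs, 10, "best"))   # (9000, 0)
print(choose_run(free_runs, 10, "worst"))  # (400, 4990)
```

With best fit the file is wedged into a gap with no room left over, so the first time it grows it has to fragment. With worst fit it has thousands of free blocks right behind it.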

Multi-user vs. Single-user

Before we get too excited about the various techniques to avoid fragmentation, consider a Linux or Unix system supporting a few dozen programmers and web developers. Obviously they'll mostly be working on different files, so those disk heads are going to be jumping around from file to file constantly. Fragmentation is of less concern on such a system, so you'll often hear people arguing that fragmentation has nothing to do with multi-user systems.

But what about a large multi-user system where there's a big database and the users are all accessing that? Wouldn't that need to worry about fragmentation? Well, perhaps more so than the system where everybody is doing different things, but even so it's still likely that the users will be asking for different sections, so the disk heads will be fluttering about just the same.

Elevator seek

Well maybe not. Instead of just asking the drive to give you the block Joe wants, why not wait a little bit and see what Sarah and Jane might need? If you do that, and Jane's next requested block is close to where the heads are now, why not get that first? Indeed, that's a common optimization technique.
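Here is a rough sketch of that idea. It assumes block numbers are a fair stand-in for head position (which, as we'll see, isn't always true), and the request list is invented. The ordering is a simple one-sweep-up, one-sweep-down variant of the elevator idea, not any particular kernel's scheduler.

```python
def elevator_order(head, requests):
    """Serve everything at or above the head on the way up, then the rest on the way down."""
    ahead = sorted(b for b in requests if b >= head)
    behind = sorted((b for b in requests if b < head), reverse=True)
    return ahead + behind


def head_travel(head, order):
    """Total distance the head moves when serving blocks in the given order."""
    travel = 0
    for block in order:
        travel += abs(block - head)
        head = block
    return travel


head = 50
requests = [95, 10, 60, 90, 15, 70]   # arrival order: Joe, Sarah, Jane, ...

print(head_travel(head, requests))                        # 340 (first come, first served)
print(elevator_order(head, requests))                     # [60, 70, 90, 95, 15, 10]
print(head_travel(head, elevator_order(head, requests)))  # 130
```

Same six requests, well under half the head movement, and nobody's file got defragmented to achieve it.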

But wait: modern disk drives are pretty smart. Why couldn't they optimize requests the same way? In fact, they can. And they probably need to.

LBA and reassignment

Logical block addressing throws a new wrinkle into this. The block numbers the OS wants to access are translated into real physical addresses. On the face of it, that shouldn't necessarily affect fragmentation: logical block 1 is likely to still be physically right beside logical block 2, or at least close by. But not if there has been bad sector reassignment: in that case, the supposedly contiguous data may in fact have a chunk that has been moved far, far away. The data is contiguous as far as the OS knows, but in fact it may not be. If the disk drive itself implements elevator seeking, it might be able to work around that intelligently and soften the effect.

Note that disk caching also tends to destroy any value from OS-based anticipation and reordering: the drive heads may not have to move at all because the data you want is already in the drive's own on-board cache, but OS-based reordering may actually cause movement when none was necessary!

Partitioning

Partitioning can help with fragmentation. For example, if you put /tmp and other frequently used file systems in their own partitions, files coming and going won't have to compete for disk blocks with more stable files. The other way to look at that is that a heavily accessed file like that multi-user database mentioned above might benefit from its own separate partition and filesystem.

However..

Virtual File Systems

LVM screws that royally. Again, it's reasonable to assume that when first created, an LVM file system is likely to be built from contiguous blocks. But if it is extended, who knows where the new blocks came from?
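A toy model shows the problem. I'm assuming a logical volume can be described as a list of (first_physical_extent, count) segments. That is a simplification for illustration, not LVM's actual metadata format, and the extent numbers are invented.

```python
def physical_extents(segments):
    """Expand (first_physical_extent, count) segments into one physical extent per logical extent."""
    layout = []
    for first, count in segments:
        layout.extend(range(first, first + count))
    return layout


def discontinuities(layout):
    """Count the places where logically adjacent extents are not physically adjacent."""
    return sum(1 for a, b in zip(layout, layout[1:]) if b != a + 1)


as_created = [(1000, 4)]                 # one contiguous run
after_extend = as_created + [(52000, 2)] # extension carved from wherever space was free

print(physical_extents(after_extend))                   # [1000, 1001, 1002, 1003, 52000, 52001]
print(discontinuities(physical_extents(as_created)))    # 0
print(discontinuities(physical_extents(after_extend)))  # 1
```

The filesystem sitting on top still sees six extents in a row. A defragger working at that level has no idea one of its "neighbors" is on the other side of the disk.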

Let's just pause for a second: imagine an LVM filesystem on a disk capable of elevator seeks that also has a large on-board cache and is being used on a multi-user system with hundreds of users all charged with different responsibilities.. how much do you think you need to worry about fragmentation?

Back to single user

Ok, but MY system is single user. Well.. sort of. Actually, whether it's Linux or OS X or Windows XP, it isn't really single user anymore. There's a lot of "system" stuff constantly going on - logging, daemons checking their config files, downloading OS updates and patches.. it's not just you asking for disk blocks, is it?

I just ran "sar -d 5 30" on my Mac and walked away.. Here are the results, stripped of the lines where there was no disk activity at all:

New Disk: [disk1] IODeviceTree:[email protected][email protected],[email protected]:0
New Disk: [disk0] IODeviceTree:[email protected][email protected],[email protected][email protected][email protected]:0
13:05:08   device    r+w/s    blks/s
13:05:13   disk0        1         11
13:05:23   disk0        1          9
13:05:33   disk0        1          9
13:05:38   disk0        4         85
13:05:43   disk0        1          9
13:05:53   disk0        1          9
13:06:03   disk0        1          9
13:06:08   disk0        2         39
13:06:13   disk0        1         11
13:06:23   disk0        1          9
13:06:33   disk0        1          9
13:06:38   disk0      186       19165
13:06:43   disk0       52       13837
13:06:53   disk0        1          9
13:07:03   disk0        1         11
13:07:08   disk0        3         82
13:07:13   disk0        1          9
13:07:23   disk0        1          9
13:07:33   disk0        1          9
13:07:38   disk0        2         34
           disk1    IODeviceTree:[email protected][email protected],[email protected]:0
Average:   disk1         0         0
           disk0    IODeviceTree:[email protected][email protected],[email protected][email protected][email protected]:0
Average:   disk0         9      1113
 

I wasn't doing anything, but OS X found plenty of reason to access my disk, didn't it? If I had been reading files, my reads would have had to compete with system reads. What does that do to the carefully managed defragmentation mentioned above?
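For what it's worth, here is a quick bit of arithmetic on the disk0 blks/s column from the sar listing above. The values are copied from the non-idle lines shown. The only thing I've added is the idea of treating the two largest samples as "bursts", which is my own framing.

```python
# disk0 blks/s values copied from the sar listing (idle samples were already stripped)
blks_per_sec = [11, 9, 9, 85, 9, 9, 9, 39, 11, 9, 9,
                19165, 13837, 9, 11, 82, 9, 9, 9, 34]

total = sum(blks_per_sec)
two_largest = sum(sorted(blks_per_sec, reverse=True)[:2])
share = two_largest / total

print(total)          # 33374
print(two_largest)    # 33002
print(f"{share:.1%}") # 98.9%
```

So on this one idle run the background activity was a steady trickle plus a couple of big bursts. Either one can land in the middle of your nice sequential read.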

If Windows says I need it, I need it

The other thing I often hear is "What the heck - it only takes a few minutes and it can't hurt anything". Well, it can hurt: if the defrag gets interrupted midstream, that could hurt a lot. Defragging also obviously adds more wear and tear: you are reading and rewriting a lot of stuff and that does add up. But let's look at it from another slant: what file or files is the defragger worried about?

That is, are the files that cause the defrag utility to want to rearrange our whole disk anything that we are going to be accessing sequentially? The answer might be "yes", but it also might be "no". The defragger may see a system log file scattered willy-nilly here and there.. do we care? No, because we certainly are not reading that front to back on a daily basis and the OS itself is only adding to the end of it - it may be horribly fragmented, but that's never going to cause a disk head to move anywhere unusual.

Big database revisited

But no, the fragmented file is that big database we mentioned earlier - this time on a single user system. The stupid thing is all over your drive, here, there, everywhere.. surely a good defragging is in order?

Maybe. But often large databases use indexes.. your access probably involves reading that index (which may remain sitting in cache after the first read) and jumping out to particular sections of the big file from there - no sequential access at all. If that's the case (and it often is), the fragmentation of the big file is completely irrelevant: defragging it won't speed up a thing.

I'm sure I missed something..

As I said at the beginning, it's a big, complicated subject and I'm sure I've missed something. Feel free to add your thoughts to the comments.










10 comments




© Anthony Lawrence







Thu Apr 3 18:33:06 2008: 3936   rbailin


When the O/S relocates that mildly fragmented file from the crowded beginning of the disk to that big, empty space towards the end, performance suffers, because data transfer rates towards the end of the disk are much slower than at the beginning. For example, a Hitachi 7K200 notebook drive has a transfer rate of 67000KB/sec at 0GB, which drops to 34000KB/sec at 200GB according to the charts at Tom's Hardware.

--Bob



Thu Apr 3 18:54:22 2008: 3937   TonyLawrence

Ayup (told you I'd forget something).

Though I think most algorithms work out from the center.. splitting the difference?



Fri Apr 4 12:40:18 2008: 3940   anonymous


What about free space fragmentation?

You completely ignored this. As the free space on the disk becomes fragmented, even by files that are not fragmented themselves, write performance on the disk will drop lower and lower over time. If you don't have large enough chunks of free space to write a file, the file will have to fragment in order to be written to the drive.

To paraphrase Yoda: "Free Space Fragmentation leads to File Fragmentation and poor Write performance, File Fragmentation leads to poor Read performance, Poor performance leads to suffering...."

The very idea that disk fragmentation is not a drain on disk performance is kind of silly and just smells like denial. Until hard drives are replaced by some completely digital format, fragmentation is going to be a problem.



Fri Apr 4 14:10:53 2008: 3942   TonyLawrence

Well, I'm sorry, but as I have tried to explain above, it's much more complicated than your simple assertion states.

The fact is that under some circumstances, fragmentation of a specific file MAY impact performance, but an overall statement that fragmentation ALWAYS matters is just silly.



Fri Apr 4 18:32:46 2008: 3944   omega


Defragmenting Windows (NTFS or FAT) is necessary to maintain optimum disk performance. The thing is, fragmentation is quite infectious, i.e. a fragmented file (e.g. the page file) may induce fragmentation of other files if the fragments of the former prevent the latter from growing contiguously. Of course, this description is somewhat simplified and idealized, but you get the idea.

The question of whether defragmentation is 'worth it' and how much is required, is usually considered against the time and resources expended for the task. What if the time and resource expenditure was not a factor at all?

To reduce the time and manpower costs associated with obsolete scheduled or manual defragmentation of large numbers of machines, automatic defragmentation is becoming the norm with Windows. Intelligent automatic defragmenters work only on idle system resources, so there is no interference with other applications. It would be acceptable to say that Windows -users- don't have to waste time defragmenting anymore. Merely install a good defragmenter and it will do what it needs to do with the minimum of fuss and bother.



Fri Apr 4 19:37:19 2008: 3945   TonyLawrence

"Defragmenting windows (NTFS or FAT) is necessary to maintain optimum disk performance."

Performance at what?

If a file that is never accessed sequentially (as in examples above) is fragmented, explain how that affects performance? Likewise, if a multiuser system has access patterns that read hundreds of different files (programmer example above), how does fragmentation affect that?

It doesn't in either case.





Sat Apr 5 02:01:15 2008: 3946   BigDumbDinosaur


I'll betcha most everyone on UNIX or Linux will never see a measurable difference in performance pre- and post-defragging. The reality is that hard drives (SCSI ones, at least) have gotten so fast any access appears to be instantaneous. Also, as Tony pointed out, there's a lot of random access going on at regular intervals. Obviously, disk buffering can help, but I doubt that defragging most filesystems will make enough of a difference to be noticeable.

The only time fragmentation might become a real performance issue on most systems is during an end-to-end search of an indexed database. Even there, the internal workings of the database may result in a lot of random access operations. For example, using an ISAM file, a search may read end-to-end in primary collating sequence, which means that each read op fetches the next logical key from the index. An ISAM index is a B-tree, which guarantees that random disk access is going to occur, even when a sequential search is being performed. Also the record to which the key points could be anywhere in the data file. So a contiguous set of files would probably offer no performance benefit at all and might actually hurt performance.

Bottom line, as Tony said, it's a very complicated subject and there's no "right" or "wrong" answer. Disk fragmentation on one system may be an issue, and not an issue on a different system. It all depends on how the system is being used. BTW, I recently replaced one of the (SCSI) drives in our office file and print server. Putting back all the files resulted in what was initially (one would think) a completely defragmented filesystem. Did I see any difference in performance? None at all!



Sat Apr 5 16:34:49 2008: 3950   TonyLawrence

Ah, yes, the unexpected defragmentation from a system restore to a new disk :-)

Been there, done that. Even had sar reports to compare from before - absolutely no performance gain.

But I bet that we'll have more people insisting that defragging is a critical part of system maintenance.











Sun Apr 6 16:02:44 2008: 3976   BigDumbDinosaur


"But I bet that we'll have more people insisting that defragging is a critical part of system maintenance."

System maintenance? Does that include changing the oil and greasing the spindle bearings in the hard drive? <Grin> In the Windows world, maintenance means removing the endless amounts of spyware that accumulate, as well as getting rid of the virus du jour.



Sun Jan 4 20:50:24 2009: 5071   TonyLawrence

This isn't fragmentation but it sure is interesting:
(link)
