Freeing disk space with ">"

I wrote this up after a forum discussion in which several posters didn't really understand why ">" can free disk space when "rm" cannot. The basic problem is that if another process has a file open (for reading or writing, it doesn't matter), an "rm" does not free the disk blocks until the process or processes using the file quit (or at least close the file). That part seems to be well understood.

What is perhaps more difficult to understand is why a simple ">" CAN free up the bytes that "rm" cannot.

For those very new to Unix/Linux: in almost any shell but "csh", a ">" followed by a file name empties that file. That is, it truncates the file to zero bytes without changing its ownership or permissions. On more recent Linux machines, you may also have a "truncate" command (part of GNU coreutils) that does the same thing with "truncate -s 0 t".
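Shells typically implement "> t" by opening the file with the O_WRONLY, O_CREAT and O_TRUNC flags, writing nothing, and closing it. The sketch below does roughly the same from Perl with sysopen, using a hypothetical scratch file in a temporary directory, and checks that the size drops to zero while the inode number and permissions stay the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);
use File::Temp qw(tempdir);

# Hypothetical scratch file; any existing regular file would do.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/t";

# Create "t" with 20480 bytes of data and mode 0640.
open(my $fh, '>', $file) or die "open: $!";
print $fh "x" x 20480;
close($fh) or die "close: $!";
chmod(0640, $file) or die "chmod: $!";

my @before = stat($file) or die "stat: $!";

# Roughly what a shell does for "> t": open for writing with
# O_TRUNC (creating the file if missing), write nothing, close.
sysopen(my $tr, $file, O_WRONLY | O_CREAT | O_TRUNC) or die "sysopen: $!";
close($tr) or die "close: $!";

my @after = stat($file) or die "stat: $!";

printf "size:  %d -> %d bytes\n", $before[7], $after[7];
printf "inode: %d -> %d\n",       $before[1], $after[1];
printf "mode:  %04o -> %04o\n",   $before[2] & 07777, $after[2] & 07777;
```

The inode number is the point: the file is the same object on disk, just emptied, so any process that already has it open keeps a valid handle.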

For those NOT new to Unix or Linux, this article isn't meant for you. Unfortunately, it was linked to from some places frequented by more advanced users. Feel free to read it, of course, but it's not going to tell you anything you do not already know.

To show why ">" works, we need to write a little code. I'll use Perl, but if you don't grok Perl, don't worry; I'll explain it as we go along. I did this on a Mac, but you'd see the same thing on Linux or BSD.

Let's start with the "rm" issue. Our Perl code will just open a file and loop. We'll run that in one Terminal window and do everything else in another.

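#!/usr/bin/perl
# Open "t" for reading and hold it open; never actually read from it.
open(I, "<", "./t") or die "Cannot open t: $!";
while (1) {
 sleep 10;
 print "I'm still here\n";
}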

The script just opens "t" and then loops. It never reads or writes anything, but it does have "t" open while running.

The file "t" already exists before running this and is large enough that its absence shows up in "df". Here it is before the script runs and while it runs:

$ ls -l  t;df | head -2
-rw-r--r--  1 apl  apl  20480 Nov 18 08:50 t
Filesystem    512-blocks     Used Available Capacity  Mounted on
/dev/disk0s2   155629664 99862360  55255304    65%    /

# start the script in another window
$ ls -l t; df | head -2
-rw-r--r--  1 apl  apl  20480 Nov 18 10:11 t
Filesystem    512-blocks     Used Available Capacity  Mounted on
/dev/disk0s2   155629664 99862360  55255304    65%    /

If we now remove "t", nothing will change:

$ rm t   
$ ls -l t; df | head -2
ls: t: No such file or directory
Filesystem    512-blocks     Used Available Capacity  Mounted on
/dev/disk0s2   155629664 99862360  55255304    65%    /
 

When we interrupt the script, the disk space is reclaimed; the "Used" column drops by 40 512-byte blocks, which is exactly the 20480 bytes that "t" held:

$ ls -l t; df | head -2
ls: t: No such file or directory
Filesystem    512-blocks     Used Available Capacity  Mounted on
/dev/disk0s2   155629664 99862320  55255344    65%    /
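The same effect can be reproduced inside a single Perl process. This is a minimal sketch with a hypothetical scratch file: "unlink" is what "rm" does, and it only removes the name. The inode and its data blocks survive until the last open handle is closed, so the data is still readable and "stat" on the open handle reports a link count of zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/t";          # hypothetical scratch file

open(my $w, '>', $file) or die "open: $!";
print $w "x" x 20480;
close($w) or die "close: $!";

# Hold the file open, like the looping script does.
open(my $r, '<', $file) or die "open: $!";

unlink($file) or die "unlink: $!";   # what "rm t" does
my $name_gone = !-e $file;

# The name is gone, but the inode survives: stat it through the handle.
my @st = stat($r) or die "stat: $!";
my $links = $st[3];           # link count: 0 once no names remain
my $size  = $st[7];           # the data is still allocated

# The data can still be read through the open handle.
my $data = do { local $/; <$r> };
printf "name gone: %s, links: %d, size: %d, read back: %d bytes\n",
    ($name_gone ? "yes" : "no"), $links, $size, length($data);

close($r);                    # only now can the blocks be freed
```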
 

However, if we start the script again with a fresh 20480-byte "t" and empty the file with ">" instead of removing it, the disk space is reclaimed instantly:

$ ls -l t;df | head -2
-rw-r--r--  1 apl  apl  20480 Nov 18 09:20 t
Filesystem    512-blocks      Used  Available Capacity  Mounted on
/dev/disk0s2   155629664 100006784  55110880    65%    /
$ > t
$ ls -l t;df | head -2
-rw-r--r--  1 apl  apl  0 Nov 18 09:21 t
Filesystem    512-blocks      Used  Available Capacity  Mounted on
/dev/disk0s2   155629664 100006744  55110920    65%    /
 

I demonstrated similar code at the forum and quickly got back this comment:


The > trick will simply remove the data in the file and have no need for the os to clear up unused data so this might work unless the process remembers where in the file it is appending too and always does a seek.

Let's see if that's true. We'll need a different script:

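#!/usr/bin/perl
use IO::Handle;
# ">" here is Perl's open mode (create or empty "t" for writing),
# not the shell's redirect.
open(I, ">", "./t") or die "Cannot open t: $!";
I->autoflush(1);    # push each block to the file immediately

while (1) {
 print I "x" x 4096;    # write 4096 "x" characters
 print "Block written\n";
 sleep 10;
}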

This time, the script writes 4096 bytes (4096 "x" characters) on every loop. I'm not going to bother showing the listings and "df" output; the result is the same: the bytes are freed as soon as you do "> t".

But our doubting poster mentioned "seek". For those who don't know, a seek moves the reading or writing position in an open file to a specific offset. We can do that with Perl:

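#!/usr/bin/perl
use IO::Handle;
use Fcntl qw(SEEK_SET);
# Again, ">" is Perl's write mode here, not the shell's redirect.
open(I, ">", "./t") or die "Cannot open t: $!";
I->autoflush(1);
$x=0;

while (1) {
 $mypos=$x * 4096;
 seek(I, $mypos, SEEK_SET) or die "seek: $!";  # absolute position of block $x
 print I "x" x 4096;
 print "Block $x written\n";
 $x++;

 sleep 10;
}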

This doesn't do anything different from the previous script; it just does it another way. Instead of simply writing bytes, it explicitly positions itself before each write. If nothing else is happening to "t", we'd expect the same outcome.

What happens when we do "> t" while that puppy is running?

$ ls -l t;df | head -2
-rw-r--r--  1 apl  apl  20480 Nov 18 09:20 t
Filesystem    512-blocks      Used  Available Capacity  Mounted on
/dev/disk0s2   155629664 100006784  55110880    65%    /
$ > t
$ ls -l t;df | head -2
-rw-r--r--  1 apl  apl  0 Nov 18 08:36 t
Filesystem    512-blocks      Used Available Capacity  Mounted on
/dev/disk0s2   155629664 100006176  55111488    65%    /
 

Instant reclaim. But on the next write from the script, the file's size is right back up:

$ ls -l t;df | head -2
-rw-r--r--  1 apl  apl  20480 Nov 18 08:36 t
Filesystem    512-blocks      Used Available Capacity  Mounted on
/dev/disk0s2   155629664 100006216  55111448    65%    /
 

But - and this is the important part - notice that the available disk space did NOT go back down to what it was before the "> t"!

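What happens here is that the file drops to zero and the available space increases, but when the writer writes again, the file is instantly back to its large size. That's because the writer's position in the file didn't change: the kernel keeps a file offset for each open file, and truncating the file from another process doesn't reset it, so the next block lands at the old offset. The explicit "seek" makes that obvious, but a plain write (as in the previous script) behaves the same way; only a writer that opened the file for append (">>") would start again at the new end. Yet the available space is NOT back to what it was, and "od" shows why: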

$ od -c t
0000000   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0040000    x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x
*
 

If you had looked at "t" before the ">", it would have looked like this:

$ od -c t
0000000    x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x
*
 

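By default, "od" collapses a run of identical lines into a single "*"; since we are writing nothing but "x", there's no need to show more. After the ">", "od" shows "\0" (NUL) in every byte up to the script's next write. Those NUL bytes aren't really there: this is a "sparse" file, and the unwritten range (a "hole") occupies no disk blocks but reads back as zeros. It was created because the script's next write landed at its old offset, past the new end of the file that "> t" had emptied. (On a filesystem that doesn't support sparse files, the zeros are actually written to disk.)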
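Here is a minimal single-process sketch of that sequence, assuming a second handle opened with ">" is a fair stand-in for running "> t" in another window. The writer opens a hypothetical scratch file with plain ">" (not append), writes four 4096-byte blocks, the file is truncated, and the writer's next block lands at offset 16384 because its file position was never reset. The allocated block count it prints depends on whether the filesystem supports sparse files, so no particular value is assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/t";                   # hypothetical scratch file

open(my $w, '>', $file) or die "open: $!";
$w->autoflush(1);                      # each print goes straight to the file
print $w "x" x 4096 for 1 .. 4;        # blocks 0-3: 16384 bytes

# Stand-in for "> t" from another window: open with ">" and close.
open(my $t, '>', $file) or die "truncate: $!";
close($t);
my $size_after_truncate = -s $file || 0;

print $w "x" x 4096;                   # writer's offset is still 16384

my @st   = stat($file) or die "stat: $!";
my $size = $st[7];
open(my $r, '<', $file) or die "open: $!";
binmode($r);
my $data = do { local $/; <$r> };
my $hole_is_nul = (substr($data, 0, 16384) eq "\0" x 16384);

printf "after '> t': %d bytes; after next write: %d bytes\n",
    $size_after_truncate, $size;
printf "first 16384 bytes all NUL: %s; 512-byte blocks allocated: %d\n",
    ($hole_is_nul ? "yes" : "no"), $st[12];
```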

If you didn't understand this, I encourage you to play with these scripts on your own system. To avoid confusion, the system should be relatively "quiet" while you do this. I couldn't control that completely here, so some of the "df" figures differ from what you might expect because I had a few other things going on at the time. You should still be able to see that ">" really does reclaim space.

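The lesson is this: if another process has a file open, use ">" to reclaim the disk space. If the process keeps writing at its old position (through absolute seeks, or simply because it didn't open the file for append), "ls -l" may make it look as if the space hasn't been reclaimed, but on a filesystem that supports sparse files it has been; "ls -s" or "du" shows the blocks actually allocated.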
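For contrast, a writer that opens the file for append (">>" in Perl or the shell, O_APPEND underneath) has every write placed at the current end of the file, so after "> t" its next block starts again at offset 0 and no hole appears. A minimal sketch, again using a hypothetical scratch file and a second handle as the stand-in for "> t":

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/t";                   # hypothetical scratch file

open(my $w, '>>', $file) or die "open: $!";   # append mode (O_APPEND)
$w->autoflush(1);
print $w "x" x 4096 for 1 .. 4;        # 16384 bytes

open(my $t, '>', $file) or die "truncate: $!"; # stand-in for "> t"
close($t);

print $w "x" x 4096;                   # lands at the new end: offset 0
my $size = -s $file;

printf "size after truncate and one more write: %d bytes\n", $size;
```

So with an appending writer (which is how many log-writing daemons open their files), "ls -l" itself shows the file staying small after "> t".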

Overwriting binaries

By the way: generally speaking, Unix systems won't let you overwrite a binary that is currently running. That is enforced because of paging: not all of the binary is necessarily in memory, and other parts may need to be brought in from disk later, so to avoid a spectacular crash you aren't allowed to modify it (on Linux the error is typically "Text file busy"). However, the lock applies to the inode, not the name (names are really meaningless to the kernel), so an update that knows enough to move the older binary to another name (or just remove it outright) can slip in a replacement.

Try this as an experiment:

cp /bin/vi /tmp/t
/tmp/t /tmp/tt
 

Now switch to another window and try:

cp /bin/vi /tmp/t
 

You won't be able to.

But "rm /tmp/t" works, and if you switch back, you'll notice that nothing bad has happened to your editor: it still runs, because its data blocks won't be freed until you quit.
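The "move the old binary aside" trick works because the kernel tracks the running program by inode, not by name. The same principle can be shown with an ordinary file in Perl. This is a sketch under stated assumptions: the names are hypothetical, and a plain data file held open for reading stands in for a running binary. The replacement is built under a temporary name, rename() points the original name at the new inode, and the handle opened earlier still sees the old contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $bin = "$dir/prog";                 # hypothetical "installed binary"
my $new = "$dir/prog.new";             # replacement built beside it

open(my $fh, '>', $bin) or die "open: $!";
print $fh "old version\n";
close($fh) or die "close: $!";

# Stand-in for the running program: it holds the old inode open.
open(my $running, '<', $bin) or die "open: $!";
my $old_ino = (stat($running))[1];

open($fh, '>', $new) or die "open: $!";
print $fh "new version\n";
close($fh) or die "close: $!";

# rename() atomically points the name at the new inode.
rename($new, $bin) or die "rename: $!";

my $still_running = <$running>;        # read through the old handle
open(my $fresh, '<', $bin) or die "open: $!";
my $installed = <$fresh>;
my $new_ino = (stat($bin))[1];

print "old handle reads: $still_running";
print "name now reads:   $installed";
printf "inode changed: %s\n", ($old_ino != $new_ino ? "yes" : "no");
```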



Got something to add? Send me email.





© Anthony Lawrence







Fri Nov 20 07:09:31 2009: 7571   anonymous

boah, what an overly complicated approach this is when it is done with Perl. When you do this directly in a shell, a 'cat /dev/null 2>&1 > file' is enough to cut down a file to zero, while no filehandles are lost. This could in practice be used to reclaim the diskspace a large file (for instance a logfile, that is permanently opened by a daemonized process) is blocking, without having to restart the daemon.

Happy diskspace reclaiming



Fri Nov 20 08:10:17 2009: 7572   dutchkind

Interesting post! I don't think the anonymous commenter read well, the perl script was, I think, only to show the > t was working.



Fri Nov 20 11:03:39 2009: 7573   TonyLawrence

what an overly complicated approach

Another example of someone scanning a page at breakneck speed with no comprehension :-)







Fri Nov 20 13:11:10 2009: 7575   anonymous

To be fair to the other poster, this article is long-winded, needlessly complicated and doesn't really answer the question it poses at the start. I can easily imagine a beginner being baffled by this exposition.

">" doesn't "free disk space". It truncates files (in this instance). When you remove data from a file, the space it used is reclaimed, even if a program is reading/writing to the file. When you remove the file itself, it's not reclaimed until no programs are reading/writing. That's the difference, it's not hard and doesn't need perl scripts or hex dumps to demonstrate.

If you'd used the actual "truncate" command it would also have been much clearer, and then you could have pointed out at the end that an empty redirect effectively performs a truncation.



Fri Nov 20 13:28:22 2009: 7576   TonyLawrence

I'm not aware of any "truncate" command. There is a "truncate" library call - is that what you mean?

If so, are you suggesting that it would be better to write a C program for this explanation than to use ">" ????

Whether YOU think
"it's not hard and doesn't need perl scripts or hex dumps to demonstrate. " isn't the point. *I* had people arguing with me, insisting that truncating the file with ">" wouldn't help free disk space when the file was open for writing by another process.

Now - perhaps you are annoyed because you were sent here from the Developer section of Linux Today? I agree - this article never should have been put there. That's not my fault - I don't submit posts to LT. This article was written for people who DON'T understand what happens when a file is truncated.

If you take a look at the top left, you'll notice this is tagged as "Basics". That means it's an introductory article written for relative newbies who DO need examples, hex dumps, and long-winded explanations. Again - not my fault LT put a link to this in a more advanced section; I cannot control that.

Sheesh :-)










Fri Nov 20 13:54:36 2009: 7578   TonyLawrence

I almost wish other sites like that wouldn't pick up these posts.

Yeah, we pick up a few regulars now and then, but mostly it's people who scan quickly looking for something to bitch about. Usually Linux Today readers aren't like that, but when something gets mis-categorized like this (it was linked to from the Developers section), I suppose it's natural for people to get upset.













Fri Nov 20 14:06:16 2009: 7579   anonymous

"I'm not aware of any "truncate" command. There is a "truncate" library call - is that what you mean?"

No, I mean the truncate command. It's part of coreutils. I wouldn't comment on Perl scripts and then suggest writing in C....

As for being tagged with basics, that's not the point. I'm commenting on the comprehensibility of the material. It wasn't until my post that you even said that > "truncates" - you didn't explain what it was doing _at all_.

Worse, in your Perl script, you moved from opening "./t" to opening ">./t". On an article about > the shell redirector, and talking about running '> t', that's _really_ confusing.

I'm sorry you're not too happy receiving critiques of your posting, but if people are coming here from other sites I think they deserve a clearer explanation of what you're trying to get across. If you don't want comments don't have a comment box, and if you don't want other people linking to your articles then don't publish them. Otherwise, why not just try to learn from the experience? You could have written this in 1/3rd the space and made it 3 times clearer.



Fri Nov 20 14:30:36 2009: 7580   TonyLawrence

For one, you are being Linux centric. This article references Mac OS X and other Unix systems.

"truncate" is relatively new to coreutils - added in the last year or so. For example, I find no "truncate" on my older Ubuntu. Apt-get says it knows of no such thing and that coreutils is up to date. Obviously that isn't true, but at this point I don't think it's reasonable to depend on that even if we were just talking about Linux.

As to the rest, well, I disagree. The ">" in the Perl scripts have nothing to do with anything. If you find that confusing, well, I'm sorry. But you are entitled to your opinion.

I don't mind critiques. I mind stupidity like the first comment. Your opinion that this is confusing and too long is fine. Again, I don't agree, but I agree that you can feel that way.








Fri Nov 20 14:37:29 2009: 7581   TonyLawrence

I almost forgot:

If you feel you can do better (and I do NOT mean that in the sarcastic way it is often used), I wish you would. I'd be extremely happy to link to your article or to publish it here.

The intent of this site has always been to HELP people. I believe that there are always different ways to explain something and that one person will find one way more helpful than another. So, if you or anyone else wants to improve on this, please do.



Fri Nov 20 15:24:10 2009: 7582   anonymous

"If you feel you can do better (and I do NOT mean that in the sarcastic way it is often used), I wish you would."

I already did, that's why I commented here rather than writing an article on another site.

I don't imagine many beginners will follow your explanation, shell scripts or not, and even if they agree it works as you say I don't think they would understand why solely from this article. That's why I'm offering comments.

I'll also offer a corrected Perl script, just in case people do actually want to try this.

#!/usr/bin/perl
open(I,">./t");
$x=0;

while (1) {
$mypos=$x * 4096;
seek I, $mypos, 0;
print I "x" x 4096;
print "Block $x written\n";
$x++;

sleep 10;
}

You're welcome.






Fri Nov 20 15:38:07 2009: 7583   TonyLawrence

I'm just going to smile politely and say sure, whatever you say. Thanks for visiting and have a lovely day. It's Friday, it's raining here but balmy warm and I'm sure we both have better things to do.







Fri Nov 20 16:35:50 2009: 7584   BigDumbDInosaur

cat /dev/null 2>&1 > file

Did I just read a complaint about the article content being "too complicated?"

Also, who's that Tiony guy that's passing himself off as A.P. Lawrence? <Grin>



Fri Nov 20 16:41:03 2009: 7585   TonyLawrence

I saw that Tiony guy earlier today in the bathroom when I was shaving. He looks dangerous.



Fri Nov 20 16:52:59 2009: 7586   TonyLawrence

But seriously:

The guy complaining thinks I should have used "truncate". I disagree because that's relatively new and is Linux centric. But he's right that I should have fully explained that ">" truncates a file. I didn't think about that while writing this because of the context - I was writing for people who knew what ">" does but didn't think it would free up disk space if the file were in use.

I'm going to add a paragraph up there to correct that oversight. I already removed the gratuitous "}" that snuck itself into one Perl script earlier (which that same reader also noticed).

As to being overly long and using unnecessary examples, I make no apology. The people at that forum needed specific examples.







Fri Nov 20 20:19:32 2009: 7587   anonymous

I really doubt that either bash or csh uses ftruncate. Instead, they are probably using the O_TRUNC flag to open, which is part of POSIX and thus quite portable. From man 2 open:

O_TRUNC: If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified.






Fri Nov 20 20:23:14 2009: 7588   TonyLawrence

OK.

And your point is what?



Sat Nov 21 17:32:43 2009: 7597   anonymous

Some files should not be truncated though, but just removed, as they may still be in use (and I mean something reading from it, like a kernel mount on a loopback-mounted ISO file).



Sat Nov 21 17:46:54 2009: 7598   TonyLawrence

Some files should not be truncated though

Yes, good point - if mounted, you definitely would not want to truncate without unmounting.



Tue Dec 1 16:29:52 2009: 7704   maria

Thanks for sharing this


