Skip to main content

Linux: Fun with Big Files

Recently, I was playing with a 150G compressed XML file containing a Wikipedia dump. Trying to delete the file gave me a fun glimpse into some Linux behavior that I normally wouldn't notice.

I had a job running that was parsing the file. I hit Control-c to kill the job. Then I deleted the file. "rm" returned immediately. I thought to myself, wow, that was fast. Hmm, I'm guessing that all it had to do was unlink the file. I would have figured it would have taken longer to mark all the inodes as free.

I ran "df -h" to see if there was now 150G more free space on my drive. There was no new free space. Hmm, that's weird. I futzed around for a bit. I started cycling through my tabs in screen. I discovered that I had only killed the job that was tailing one of the files, not the actual job itself.

This reminded me that Linux uses reference counting for files. Even if you can't get to a file through the filesystem, a file might still exist because a program has an open file handle for it. That's how "tempfile.TemporaryFile" works.

I killed the job. I ran "df -h". I now saw a bunch of free space. For some reason, even though I hit Control-c, the job hasn't returned, and I haven't been given a new shell prompt. Hitting Control-c again doesn't help. In fact, I can't even hit Control-z to "kill -9 %1" the job. Normally, that always works. Hmm, that's weird.

I switched to another tab in screen. I ran "ps aux". I don't see the job. I switched back to my other tab. The shell is still frozen. Hmm, that's really weird.

I typed "df -h" over and over again. I can see free disk space slowly returning. After several minutes, I finally got a new shell prompt. I can now see 150G of new free disk space.

Here's what I think happened. When I hit Control-c, the program exited. The kernel removed the process from the process table. While doing this, it closed the open file handle to the 150G file. Next, it had to start freeing inodes. 150G is a lot of inodes to free. Hence, even though there was no entry in the process table (hence the program was not visible to "ps aux"), the process was still stuck in kernel mode freeing up inodes.

Linux is fun ;)

Comments

Anonymous said…
Given that this is reproducible, it might be fun to test your hypothesis with lsof.
jjinux said…
> Given that this is reproducible, it might be fun to test your hypothesis with lsof.

Interesting idea. I'm almost 100% certain that the program still had the file open even though you couldn't reference it via the filesystem. That just makes sense.

The one thing I'm not 100% certain of is whether the shell was frozen because the kernel was releasing inodes. My guess is that the file was deleted, and then the kernel started deleting files. Since the program was no longer visible via "ps aux", I'm guessing "lsof" would also not see the file.
Brandon L. Golm said…
just making this up here, but the shell was probably blocked on IO. Remember that the shell creates three pipes, forks, and dups those pipes over to 0,1,2, does a setgrp, then execs whatever process (order, completeness, and accuracy are approximate). But the whole time, the shell is reading from those pipes (and spewing it back at you).

So when the process is completely stuck in some kernel place that isn't supposed to take long, there's probably something funny that happens with select or whatever, so the shell's routines that normally don't block ... are blocked.

That's *my* guess.
Brandon L. Golm said…
and then jj had to get all smart and suggest the shell is waiting on waitpid(). Only he can tell us. :-)
jjinux said…
Hahaha. Nah, I'll just wait for Kelly Yancey to tell me. He'll reply with the actual code from FreeBSD's kernel that would explain the situation ;)
Kelly Yancey said…
Hahaha, sorry JJ, I'd love to help you but FreeBSD is lacking Linux's freeze-while-it-performs-IO feature.

But, as you say, a file remains open so long as any process retains a handle to it (even if there is no name associated with the file in the filesystem). This is a common problem with naive log-rotation scripts: they rename the log file without signaling to the logging process that it needs to reopen the log file. As such, the process continues to write to the "old" file and the "new" file remains 0 bytes in size. This is because the mv operation (and rm/unlink operation) only modify directory entries, they do not touch the inode. Incidentally, that is also why stat(2)'s mtime and atime fields don't reflect name changes to files...stat only returns information about the inode, not anything about the (possibly multiple) names referring to the file.

Since you asked, the relevant logic in the FreeBSD kernel starts with the vput() and vrele() routines in src/sys/kern/vfs_subr.c. These are two variants of the drop reference part of the kernel's file handle reference counting code. When the reference count drops to zero, these routines call vinactive() which, in turn, calls the filesystem-specific implementation of the vfs_inactive callback. For the default UFS filesystem, that callback is ufs_inactive() in src/sys/ufs/ffs/ufs_inode.c.

Anyway, I don't recall FreeBSD ever having long stalls while deleting files, but that really depends on the I/O scheduler. I'm not a Linux expert by any means, but perhaps you're experiencing this bug:
http://it.slashdot.org/article.pl?sid=09/01/15/049201

Enjoying the blog as always, JJ. Even if you do put me on the spot. :)

Kelly
jjinux said…
Haha, nice! ;)

Popular posts from this blog

Drawing Sierpinski's Triangle in Minecraft Using Python

In his keynote at PyCon, Eben Upton, the Executive Director of the Rasberry Pi Foundation, mentioned that not only has Minecraft been ported to the Rasberry Pi, but you can even control it with Python. Since four of my kids are avid Minecraft fans, I figured this might be a good time to teach them to program using Python. So I started yesterday with the goal of programming something cool for Minecraft and then showing it off at the San Francisco Python Meetup in the evening.

The first problem that I faced was that I didn't have a Rasberry Pi. You can't hack Minecraft by just installing the Minecraft client. Speaking of which, I didn't have the Minecraft client installed either ;) My kids always play it on their Nexus 7s. I found an open source Minecraft server called Bukkit that "provides the means to extend the popular Minecraft multiplayer server." Then I found a plugin called RaspberryJuice that implements a subset of the Minecraft Pi modding API for Bukkit s…

Apple: iPad and Emacs

Someone asked my boss's buddy Art Medlar if he was going to buy an iPad. He said, "I figure as soon as it runs Emacs, that will be the sign to buy." I think he was just trying to be funny, but his statement is actually fairly profound.

It's well known that submitting iPhone and iPad applications for sale on Apple's store is a huge pain--even if they're free and open source. Apple is acting as a gatekeeper for what is and isn't allowed on your device. I heard that Apple would never allow a scripting language to be installed on your iPad because it would allow end users to run code that they hadn't verified. (I don't have a reference for this, but if you do, please post it below.) Emacs is mostly written in Emacs Lisp. Per Apple's policy, I don't think it'll ever be possible to run Emacs on the iPad.

Emacs was written by Richard Stallman, and it practically defines the Free Software movement (in a manner of speaking at least). Stal…

Creating Windows 10 Boot Media for a Lenovo Thinkpad T410 Using Only a Mac and a Linux Machine

TL;DR: Giovanni and I struggled trying to get Windows 10 installed on the Lenovo Thinkpad T410. We struggled a lot trying to create the installation media because we only had a Mac and a Linux machine to work with. Everytime we tried to boot the USB thumb drive, it just showed us a blinking cursor. At the end, we finally realized that Windows 10 wasn't supported on this laptop :-/I've heard that it took Thomas Edison 100 tries to figure out the right material to use as a lightbulb filament. Well, I'm no Thomas Edison, but I thought it might be noteworthy to document our attempts at getting it to boot off a USB thumb drive:Download the ISO. Attempt 1: Use Etcher. Etcher says it doesn't work for Windows. Attempt 2: Use Boot Camp Assistant. It doesn't have that feature anymore. Attempt 3: Use Disk Utility on a Mac. Erase a USB thumb drive: Format: ExFAT Scheme: GUID Partition Map Mount the ISO. Copy everything from the I…