linux - Removing files takes too long

Short version: rm -rf mydir, with mydir (recursively) containing 2.5
million files, takes about 12 hours on a mostly idle
machine.

More information: Most of the files being deleted are hard links to files in
other directories (the directory being deleted is the oldest backup made by
rsnapshot, and the rm command is issued by rsnapshot itself). So it's mostly
directory entries being deleted; the file content itself isn't much, on the
order of some tens of GB.
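To check how much of the tree really is hard links, something like the following could be used. This is only a sketch: count_links is a made-up helper name, not something rsnapshot provides, and it assumes no file names contain newlines (each newline would be counted as an extra file).

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): report how many regular files
# under a directory exist, and how many of them have a link count above 1,
# i.e. are also reachable through at least one other directory entry.
count_links() {
    dir=$1
    # find -links +1 matches files whose hard link count is greater than 1.
    total=$(find "$dir" -type f -print | wc -l)
    shared=$(find "$dir" -type f -links +1 -print | wc -l)
    # $((...)) strips any padding that some wc implementations emit.
    echo "regular files: $((total)), with link count > 1: $((shared))"
}
```

Called as count_links mydir, a result where the second number is close to the first would support the assumption that rm is mostly unlinking directory entries rather than freeing data.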

I'm far from certain that btrfs is the culprit. I recall that backups were
also very slow before I started to use btrfs, but I'm not certain that the
slowness was in the deletion.

The machine is
an Intel Core i5 2.67 GHz with 4 GB RAM. It has two SATA disks: one has the OS and some
other stuff, and the backup disk is a 1 TB WDC WD1002FAEX-00Z3A0. The motherboard is an Asus
P7P55D.

Edit: The machine runs Debian wheezy with Linux 3.16.3-2~bpo70+1. This is
how the filesystem is mounted:

root@thames:~# mount|grep rsnapshot
/dev/sdb1 on /var/backups/rsnapshot type btrfs (rw,relatime,compress=zlib,space_cache)

Edit: Using rsync -a --delete /some/empty/dir mydir takes about 6 hours. A
significant improvement over rm -rf, but still too much, I think.
(Explanation of why rsync is faster than rm: "[M]ost filesystems store their
directory structures in a btree format, the order [in] which you delete files
is ... important. One needs to avoid rebalancing the btree when you perform
the unlink.... rsync -a --delete ... does deletions in-order")
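For anyone wanting to repeat the rsync experiment, here is a minimal sketch of the same trick wrapped in a function. The name empty_with_rsync and the use of mktemp for the empty directory are assumptions for illustration; it uses trailing slashes so that the contents of the empty directory are synced over the contents of the target. Note that -a also copies the empty directory's mode and timestamps onto the target, which should not matter if the target is about to be removed.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): empty a directory by syncing
# a freshly created empty directory over it with rsync --delete.
empty_with_rsync() {
    target=$1
    empty=$(mktemp -d) || return 1
    # Trailing slashes: sync the *contents* of $empty into $target, so every
    # entry in $target that is missing from $empty gets deleted.
    rsync -a --delete "$empty"/ "$target"/
    status=$?
    rmdir "$empty"
    return "$status"
}
```

The target directory itself is left in place but empty, so a full removal would be empty_with_rsync mydir && rmdir mydir.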

Edit:
I attached another disk which had 2.2 million files (recursively) in a directory, but on
XFS. Here are some comparative
results:

                    On the XFS disk   On the BTRFS disk
Cached reads[1]     10 GB/s           10 GB/s
Buffered reads[1]   80 MB/s           115 MB/s
Walk tree[2]        11 minutes        43 minutes
rm -rf mydir[3]     7 minutes         12 hours

[1] With hdparm -T /dev/sdX and hdparm -t /dev/sdX.
[2] Time taken to run find mydir -print|wc -l immediately after boot.
[3] On the XFS disk, this was soon after walking the tree with find. On the
BTRFS disk it is the old measurement (and I don't think it was with the tree
cached).
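The tree-walk measurement in [2] could be repeated with a small wrapper like the one below. It is only a sketch: time_walk is a made-up name, and the timing has one-second resolution because it relies on date +%s. To approximate the "immediately after boot" condition without rebooting, the Linux page, dentry and inode caches can be dropped first as root, as noted in the comment.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): walk a tree with find, count
# the entries, and report how many whole seconds the walk took.
#
# For a cold-cache run on Linux, drop the caches first (as root):
#   sync; echo 3 > /proc/sys/vm/drop_caches
time_walk() {
    dir=$1
    start=$(date +%s)
    # Counts every entry, including the top-level directory itself.
    count=$(find "$dir" -print | wc -l)
    end=$(date +%s)
    echo "entries: $((count)), seconds: $((end - start))"
}
```

Running time_walk mydir on both disks under the same cache conditions would make the XFS and BTRFS rows directly comparable, which the rm -rf row above is not.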

It appears to be a problem
with btrfs.
