Subject: bin/25857: du(1) is O(n^2) in number of multiply linked files
To: None <gnats-bugs@gnats.netbsd.org>
From: Darrin B. Jewell <dbj@netbsd.org>
List: netbsd-bugs
Date: 06/07/2004 09:32:11
>Number: 25857
>Category: bin
>Synopsis: du(1) is O(n^2) in number of multiply linked files
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 07 13:37:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Darrin B. Jewell
>Release: -current via cvs ~20040530T0549Z
>Organization:
>Environment:
>Description:
du(1) is O(n^2) in number of multiply linked files
The linkchk() routine in du.c is too simplistic.
Code inspection shows that it uses a linear algorithm for
incrementally realloc'ing its array:
if (nfiles == maxfiles && (files = realloc((char *)files,
(u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL)
err(1, "realloc");
Furthermore, it uses a linear search algorithm to find
if it has previously seen an inode:
for (fp = start + nfiles - 1; fp >= start; --fp)
if (ino == fp->inode && dev == fp->dev)
return (1);
>How-To-Repeat:
I have a 100gb filesystem using 4505606 inodes, almost all of which
have nlink > 1, due to a hard link snapshot scheme for backups.
I started a du -akx on this filesystem and observed that it was
still running 8 hours later, using 100% of cpu and barely accessing
the disk.
>Fix:
I suggest using open address hashing for inode lookups and an
exponential resize algorithm. This should make it take time O(n*logn)
in number of multiply linked inodes, while still using linearly
proportional memory. It should also make inode lookup constant time
for previously seen inodes.
Unfortunately, I don't have time to code that up today, so I'm filing
this bug report instead. I hope to get to it soon.
>Release-Note:
>Audit-Trail:
>Unformatted: