Monday, April 14, 2014

Sorting disk usage, Perl to the rescue.

My MacBook Air was nearly full after a lot of music editing (see the results at http://www.stmaryssingers.com/recordings.html). In the past I would run

du -s * | sort -rn

to identify the biggest users of disk space, giving, for example:
14104880        iTunes
683304  Noël Français
682592  sibs
221488  DancingDay
93664   MissaAlmePater
27480   ChristmasLullaby
17680   JesuJoy
13656   MissaBenedicamus
12256   MassInHonorOfSaintJoseph
8760    pdfs
3584    ChrissyCarols
2456    LookingAtTheStars
1352    BriggsMass
408     Bach-Jesu
264     40_The_First_Nowell.sib
64      Thou-knowest-Lord-Z-58b.pdf
24      StMS20140412
24      StMS20140322
24      StMS20140309
24      StMS20140208
24      StMS20131214
0       GarageBand


The numbers are in 512-byte blocks, and nowadays that's a lot of digits to decipher in a listing. So I started using the -h ('human-readable') option:

du -sh * | sort -rn

giving:
676K    BriggsMass
334M    Noël Français
333M    sibs
204K    Bach-Jesu
132K    40_The_First_Nowell.sib
108M    DancingDay
 46M    MissaAlmePater
 32K    Thou-knowest-Lord-Z-58b.pdf
 13M    ChristmasLullaby
 12K    StMS20140412
 12K    StMS20140322
 12K    StMS20140309
 12K    StMS20140208
 12K    StMS20131214
8.6M    JesuJoy
6.7M    MissaBenedicamus
6.7G    iTunes
6.0M    MassInHonorOfSaintJoseph
4.3M    pdfs
1.8M    ChrissyCarols
1.2M    LookingAtTheStars
  0B    GarageBand

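
As a sanity check on the two listings: du -s reported 14104880 blocks for iTunes, and du -h shows 6.7G. The short sketch below does that conversion by hand. It assumes a shell with 64-bit $(( )) arithmetic (bash and dash on a 64-bit system qualify) and uses awk only for the division.

```sh
#!/bin/sh
# Convert a `du -s` count of 512-byte blocks to bytes, then to gigabytes.
blocks=14104880                  # iTunes, taken from the `du -s * | sort -rn` listing
bytes=$((blocks * 512))          # 512 bytes per block
# du -h counts in powers of 1024, so divide by 1024^3 for the G figure
gib=$(awk -v b="$bytes" 'BEGIN { printf "%.1fG", b / (1024 * 1024 * 1024) }')
echo "$bytes bytes = $gib"       # prints: 7221698560 bytes = 6.7G
```

Dividing by 1000^3 instead would give 7.2G, so the match also confirms that du -h uses powers of 1024.
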
Unfortunately sort -n looks only at the leading number and ignores the unit suffix, so 676K ends up above 334M and 6.7G is buried among the megabytes. But Perl can sort it properly. It's a while since I used the Schwartzian Transform, but it seemed perfect for the task. I copied the Wiki example into a file, dusort.pl, made it executable, placed it in a directory on my PATH (~/bin in this case) and modified the regex extraction so that it sorts by unit suffix first and then by number, giving this:

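#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Rank of each du -h unit suffix, so that bigger units sort first.
my $size = {P => 6, T => 5, G => 4, M => 3, K => 2, B => 1};

# Schwartzian Transform, read from the bottom up:
print
  map { $_->[0] }                              # 3. undecorate: keep the original line
  sort {
  $size->{$b->[2]} <=> $size->{$a->[2]}        # 2. biggest unit first...
                  ||
        $b->[1] <=> $a->[1]                    #    ...then biggest number
  }
  map { [$_, /^([ \.0-9]{3,4})([PTGMKB])\t/] } # 1. decorate: [line, number, unit]
  <>;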
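
The same decorate-sort-undecorate idea also works without Perl. Below is a sketch of a hypothetical shell function, hsort (the name is mine, not part of the setup above), that uses awk to prefix each du -h line with an approximate size in bytes, sorts on that key with sort -rn, and strips the key again with cut. It assumes the suffixes are B, K, M, G, T and P in powers of 1024, and treats a size with no suffix as a plain number.

```sh
#!/bin/sh
# hsort (hypothetical helper): sort `du -h` style lines, largest first.
# Decorate each line with an approximate byte count, sort on it, undecorate.
hsort() {
  awk '{
    n = $1 + 0                          # leading number; awk ignores the suffix
    unit = substr($1, length($1), 1)    # last character, e.g. G in 6.7G
    p = index("BKMGTP", unit)           # 1 for B ... 6 for P, 0 if no suffix
    bytes = (p > 0) ? n * 1024 ^ (p - 1) : n
    printf "%.0f\t%s\n", bytes, $0      # decorate: key, tab, original line
  }' | sort -rn | cut -f2-              # sort on the key, then drop it
}

# Usage: du -sh * | hsort
```

Turning the size into one number means a single numeric comparison replaces the two-level unit-then-number comparison; the byte figures are only as precise as du's rounded output, which is enough for ranking.
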

Now when I run the pipeline:
du -sh * | dusort.pl
I get the result:
6.7G    iTunes
334M    Noël Français
333M    sibs
108M    DancingDay
 46M    MissaAlmePater
 13M    ChristmasLullaby
8.6M    JesuJoy
6.7M    MissaBenedicamus
6.0M    MassInHonorOfSaintJoseph
4.3M    pdfs
1.8M    ChrissyCarols
1.2M    LookingAtTheStars
676K    BriggsMass
204K    Bach-Jesu
132K    40_The_First_Nowell.sib
 32K    Thou-knowest-Lord-Z-58b.pdf
 12K    StMS20131214
 12K    StMS20140208
 12K    StMS20140309
 12K    StMS20140322
 12K    StMS20140412
  0B    GarageBand
Obviously the next thing to do is to make a shell alias:
alias dus='du -sh * | dusort.pl'
and now I even save a few keystrokes when pinpointing the Biggest (L)User.
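
An alias always measures the current directory. If you want to point it somewhere else, a shell function can take an optional directory argument. The sketch below is meant to be used instead of the alias (in bash, run unalias dus first, since an existing alias of the same name interferes with the function definition) and still assumes dusort.pl is executable and on your PATH.

```sh
# dus: like the alias, but takes an optional directory (default: the current one).
# The subshell cd keeps the listed names short, exactly as with the alias,
# and leaves your own working directory unchanged.
dus() {
  ( cd "${1:-.}" && du -sh * | dusort.pl )
}

# Usage: dus            # same as the alias
#        dus ~/Music    # rank the entries under ~/Music
```
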


