Topic: Avoiding duplicate files using ln
A bit of a wish list, I know, but does anyone know of a tool that can
analyse an NTFS filesystem, keep a single instance of each duplicate file
in a separate folder, and link the other instances to it using ln?
For example, consider the directory listings:
D:\notebook01\c_drive_backup\file01.doc
D:\notebook01\c_drive_backup\file02.doc
D:\notebook01\c_drive_backup\file13.doc
D:\notebook01\c_drive_backup\file16.doc
and:
D:\notebook02\c_drive_backup\file03.doc
D:\notebook02\c_drive_backup\file04.doc
D:\notebook02\c_drive_backup\file13.doc
D:\notebook02\c_drive_backup\file19.doc
If it were found that the file common to both lists (file13.doc) had
the same size and MD5 hash in each, we could create a folder named after
that hash, copy the file into it and link the originals to it with ln for
single instance storage. The other issue is to mop up 'orphans' in that
folder that are no longer referenced elsewhere in the filesystem.
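To make the idea concrete, here is a rough Python sketch of the scheme described above. It is not an existing tool, and several things in it are my own assumptions: the function names (md5_of, find_duplicates, link_duplicates, prune_orphans), the store layout of store/<md5>/<original name>, and the use of hard links via os.link in place of ln, which means the store must sit on the same volume as the backups. It also compares the files byte for byte before linking, since a matching size and MD5 is strong evidence but not proof. An 'orphan' is taken to be a stored file whose link count has dropped to 1.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, store):
    """Group files under root by (size, md5); keep groups of two or more."""
    store = os.path.abspath(store)
    by_size = defaultdict(list)
    for folder, dirs, names in os.walk(root):
        # Do not descend into the store itself if it lives under root.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(folder, d)) != store]
        for name in names:
            path = os.path.join(folder, name)
            if not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:  # a unique size cannot have a duplicate
            for path in paths:
                groups[(size, md5_of(path))].append(path)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def link_duplicates(root, store):
    """Replace duplicates with hard links to one instance under store/<md5>/."""
    linked = []
    for (_size, digest), paths in find_duplicates(root, store).items():
        bucket = os.path.join(store, digest)
        os.makedirs(bucket, exist_ok=True)
        master = os.path.join(bucket, os.path.basename(paths[0]))
        if not os.path.exists(master):
            os.link(paths[0], master)  # hard link, so no data is copied
        for path in paths:
            if os.path.samefile(path, master):
                continue  # already shares storage with the master
            if not filecmp.cmp(path, master, shallow=False):
                continue  # hash matched but bytes differ; leave it alone
            temp = path + ".dedup-tmp"  # hypothetical temporary name
            os.link(master, temp)
            os.replace(temp, path)  # swap the duplicate for the link
            linked.append(path)
    return linked


def prune_orphans(store):
    """Delete stored files that nothing outside the store links to any more."""
    removed = []
    for folder, _dirs, names in os.walk(store, topdown=False):
        for name in names:
            path = os.path.join(folder, name)
            if os.stat(path).st_nlink == 1:  # only the store references it
                os.remove(path)
                removed.append(path)
        if folder != store and not os.listdir(folder):
            os.rmdir(folder)  # drop hash folders that are now empty
    return removed
```

With the listings above, link_duplicates(r"D:\\", r"D:\\store") would leave both copies of file13.doc pointing at one stored instance, and prune_orphans(r"D:\\store") could be run later to clear stored files whose backups have all been deleted (D:\store is just an example location). Bear in mind that hard-linked files share their contents, timestamps and permissions, so editing one copy in place changes every copy.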
Obviously there exist filesystems like Sun Microsystems' ZFS which can
take care of this stuff for you at the block level, and indeed, you
can run NTFS over the top of ZFS, but the point here is to do it
cheaply.
Any advice appreciated.
Denys Williams