HostWeb Forums » Microsoft Server Operating Systems » microsoft.public.win2000.file_system » Avoiding duplicate files using ln

Topic: Avoiding duplicate files using ln


Submitted: 5/3/2008 12:37:08 AM

By: brigman
A bit of a wish list, I know, but does anyone know of a tool that can
analyse an NTFS filesystem, keep a single instance of any duplicated
file in a separate folder, and link the other instances to it using ln?

For example, consider the directory listings:

D:\notebook01\c_drive_backup\file01.doc
D:\notebook01\c_drive_backup\file02.doc
D:\notebook01\c_drive_backup\file13.doc
D:\notebook01\c_drive_backup\file16.doc

and:

D:\notebook02\c_drive_backup\file03.doc
D:\notebook02\c_drive_backup\file04.doc
D:\notebook02\c_drive_backup\file13.doc
D:\notebook02\c_drive_backup\file19.doc

If the file common to both lists (file13.doc) turned out to have the
same size and MD5 hash in both places, we could create a folder named
after that hash, copy the file into it, and replace the originals with
links to it made with ln, giving single instance storage. The other
issue is mopping up 'orphans': stored instances that are no longer
referenced elsewhere in the filesystem.
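
To make the idea concrete, here is a rough sketch of the scheme in
Python. It is only an illustration, not an existing tool: the function
names and the ".sis-tmp" suffix are made up for the example, and it
calls os.link directly instead of shelling out to ln. It also assumes
the store folder sits on the same volume as the scanned folders (hard
links cannot cross volumes) and outside them, and it creates the stored
instance as a hard link to the first copy found rather than copying it.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Map (size, md5) -> sorted paths, for keys with more than one path."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
    by_key = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, skip hashing
        for path in paths:
            by_key[(size, md5_of(path))].append(path)
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}


def link_duplicates(groups, store):
    """Keep one instance per hash under store/<md5>/ and hard link the rest."""
    for (_size, digest), paths in groups.items():
        target_dir = os.path.join(store, digest)
        os.makedirs(target_dir, exist_ok=True)
        master = os.path.join(target_dir, os.path.basename(paths[0]))
        if not os.path.exists(master):
            os.link(paths[0], master)  # stored instance shares the data
        for path in paths:
            if os.path.samefile(path, master):
                continue  # already linked to the stored instance
            tmp = path + ".sis-tmp"  # hypothetical temporary name
            os.link(master, tmp)
            os.replace(tmp, path)  # swap the duplicate for the link


def find_orphans(store):
    """Stored instances whose link count is 1: nothing else references them."""
    orphans = []
    for dirpath, _dirs, files in os.walk(store):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_nlink == 1:
                orphans.append(path)
    return orphans
```

Usage would be something like
link_duplicates(find_duplicates([r"D:\notebook01", r"D:\notebook02"]), r"D:\store"),
followed by find_orphans(r"D:\store") once old backups have been deleted.
Two caveats: hard-linked names share one set of data, so editing one
copy changes them all, and a byte-for-byte comparison before linking
would be safer than trusting size plus MD5 alone.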

Obviously there exist filesystems like Sun Microsystems' ZFS which can
take care of this stuff for you at the block level, and indeed, you
can run NTFS over the top of ZFS, but the point here is to do it
cheaply.

Any advice appreciated.

Denys Williams
