Page last updated:
2002 Aug 07 20:29

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] shell script challenge

On Wed, Aug 07, 2002 at 12:20:14PM -0700, Chris McKenzie wrote:
> I'm using cygwin and I was given the request by my boss to remove all
> duplicate files from the server
> the server is on the x: drive of the windows machine which means that
> cygwin saw it as /cygwin/x
> I forget exactly what command I ran to get checksums.txt
> but it is in the format
> <checksum> *x:<filename>
> The challenge is to find the duplicate checksums and print the file name
> of those checksums.  This is tricky because the directories contain spaces

md5 is a better way to go than simple checksums. Checksums are usually
16-bit or 32-bit; an md5 sum is 128 bits. Although it will take longer to
process, md5 sums make it far less likely that two different files end up
with the same value.

More importantly, the program "md5sum" will output results in a format
like the one you have:


gnulinux@deb:~$ md5sum funny.txt
ad739a7ca6402db9e8f73a544602137d  funny.txt

I am certainly no programmer, so I can't give you a shell program, but
here's a one-liner that will check the directory "/cygdrive/c/foo" and
output the names of duplicate files:

find /cygdrive/c/foo -type f -exec md5sum '{}' ';' | sort | uniq -D -w 33 | cut -c 35-

NOTE: This probably will _not_ work for directory and file names that
contain strange characters (e.g. single quotes, double quotes, carriage
returns).
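
For names with spaces, here is a rough sketch of an alternative that groups by checksum with awk instead of relying on uniq and cut. It assumes the listing is in md5sum's usual layout (32 hex digits, a space, a one-character flag that is either a space or "*", then the file name) and that no name contains a newline or backslash. The function name print_dupes is made up for this example.

```shell
# print_dupes: read md5sum-style lines on stdin and print the names of
# files whose checksum appears more than once, grouped by checksum.
# Assumes: 32 hex digits, one space, one flag char (' ' or '*'), then name.
print_dupes() {
    LC_ALL=C sort | awk '
        {
            sum  = substr($0, 1, 32)   # the checksum
            name = substr($0, 35)      # everything after the two separator chars
            if (sum == prev) {
                if (!shown) { print prevname; shown = 1 }   # first file of the group
                print name
            } else {
                shown = 0
            }
            prev = sum; prevname = name
        }'
}
```

It could be fed from the find command above (`find /cygdrive/c/foo -type f -exec md5sum '{}' ';' | print_dupes`) or, if the existing checksums.txt really is in the "<checksum> *x:<filename>" layout, from that file directly (`print_dupes < checksums.txt`).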

There are MANY ways to find duplicate files, and this is just one method
that will work under Linux and under Cygwin on windoze. I tried it with
both and, like most things, it works better with Linux.

I suggest checking out "man md5sum", and in particular the "-c" option.
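
As a small sketch of what "-c" does: it reads a list previously written by md5sum and re-checks each file against its recorded sum. The scratch directory and the file names below are throwaway names invented for the demonstration.

```shell
# Demonstrate md5sum -c in a scratch directory (names are made up).
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/funny.txt"

# Record the checksum, then verify the file against the recorded list.
( cd "$demo_dir" && md5sum funny.txt > sums.md5 && md5sum -c sums.md5 )

# After the file changes, the same check reports a failure and exits non-zero.
printf 'changed\n' > "$demo_dir/funny.txt"
( cd "$demo_dir" && md5sum -c sums.md5 ) || echo "funny.txt no longer matches"
```

Because the check exits non-zero on any mismatch, it can be used in a script to confirm that a copy is still identical before the original is deleted.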
