linux-users-group-of-davis
Page last updated: 2002 Aug 08 23:24

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] shell script challenge

I'm using Cygwin, and my boss asked me to remove all duplicate files from
the server. The server is the X: drive of the Windows machine, which Cygwin
sees as /cygdrive/x. I forget exactly which command I ran to get
checksums.txt, but it is in the format

<checksum> *x:<filename>
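Because Windows filenames cannot contain "*", the first " *" on each line is an unambiguous boundary between the checksum and the name, however many spaces the path contains. A minimal POSIX sh sketch, assuming the list was written by something like md5sum in binary mode (-b), which prints "<checksum> *<name>"; the function name split_line and the sample line are hypothetical:

```shell
#!/bin/sh
# Split one "<checksum> *<filename>" line at the first " *".
# Assumption: the separator is " *" (md5sum binary-mode style). Windows
# filenames cannot contain "*", so the split is unambiguous.
split_line() {
    line=$1
    sum=${line%% \**}   # strip the longest suffix that starts with " *"
    name=${line#* \*}   # strip the shortest prefix that ends with " *"
    printf '%s\t%s\n' "$sum" "$name"
}

# Hypothetical sample line; the path contains spaces.
split_line 'd41d8cd98f00b204e9800998ecf8427e *x:My Documents/old copy.txt'
```

Since the name is taken as everything after the separator, spaces inside directory names never turn into extra fields.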

The challenge is to find the duplicate checksums and print the file names
that go with them. This is tricky because the directory names contain
spaces, which gawk, sed, etc. treat as field separators. Even if I change
the IFS to * and then use gawk to print *x:<fname> <checksum>, sort wouldn't
know how to deal with it, which would make uniq useless (I think). If I do
it the other way, <checksum> *x:<filename>, sort will work fine but uniq
will fail because the filename is part of the line. If I exclude the
filename with gawk '{ print $1 }', then sort and uniq work fine but I no
longer have the filename. So every combination I can think of fails. Does
anyone know how I can find only the duplicate checksums and their
associated file names?
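If GNU uniq is available (Cygwin's coreutils package is GNU coreutils), it can be told to compare only the start of each line, so the filename no longer gets in the way. A sketch assuming 32-character MD5 checksums at the start of every line; find_dupes is a hypothetical helper name:

```shell
#!/bin/sh
# Print every line whose checksum (first 32 chars, the length of an MD5
# hex digest) also starts at least one other line.
# Assumes GNU uniq: -w N compares only the first N characters, and
# --all-repeated=separate prints every member of each duplicate group,
# with a blank line between groups.
# LC_ALL=C gives plain byte ordering, so equal checksums sort together.
find_dupes() {
    LC_ALL=C sort | LC_ALL=C uniq -w 32 --all-repeated=separate
}

# Usage: find_dupes < checksums.txt
```

Sorting the whole line keeps the filename attached while sorting; only the uniq comparison is limited to the checksum.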

**I realize that with a l... but the problem is that there are 4,575
duplicate checksums, counted using:
awk '{ print $1 }' checksums.txt | sort | uniq -d | wc -l
and 46,340 files on the server, so that seems like it would take an
awfully long time. Any suggestions?
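A loop that rescans the whole 46,340-line list once per duplicate checksum would make about 4,575 passes over the file. An awk sketch that instead reads checksums.txt twice, counting checksums on the first pass and printing matching lines on the second, so the work grows roughly linearly with the number of files. The helper names dupe_files and extra_copies are hypothetical, and the filename is assumed to be everything after the first " *":

```shell
#!/bin/sh
# dupe_files FILE: print "<checksum><TAB><filename>" for every line whose
# checksum occurs more than once.
# Pass 1 (NR == FNR, first copy of FILE): count each checksum ($1).
# Pass 2 (second copy of FILE): print lines whose count is above 1.
dupe_files() {
    awk 'NR == FNR { count[$1]++; next }
         count[$1] > 1 { print $1 "\t" substr($0, index($0, " *") + 2) }' "$1" "$1"
}

# extra_copies FILE: print the filename of every line whose checksum was
# already seen earlier in FILE, i.e. all copies except the first one.
extra_copies() {
    awk 'seen[$1]++ { print substr($0, index($0, " *") + 2) }' "$1"
}

# Usage: dupe_files checksums.txt
#        extra_copies checksums.txt
```

extra_copies keeps the first file listed for each checksum and names the rest, which is one way to build a list of deletion candidates to review before removing anything.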

	Christopher J. McKenzie


LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617

LUGOD is a 501(c)(7) non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
