LUGOD: Linux Users' Group of Davis
 
Page last updated:
2002 Aug 08 23:24

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] shell script challenge



I'm using Cygwin, and my boss has asked me to remove all duplicate files
from the server. The server is the x: drive of the Windows machine, which
means that Cygwin sees it as /cygwin/x.
I forget exactly what command I ran to get checksums.txt,
but it is in the format

<checksum> *x:<filename>

The challenge is to find the duplicate checksums and print the file names
that go with them. This is tricky because the directory names contain
spaces, which gawk, sed, etc. see as field separators. Even if I change the
field separator to * and then use gawk to print "*x:<fname> <checksum>",
sort wouldn't know how to deal with it, which would make uniq useless (I
think). If I do it the other way, "<checksum> *x:<filename>", sort will
work fine but uniq will fail because the filename makes every line
different. If I strip the filename with gawk '{ print $1 }', then sort and
uniq will work fine but I won't have a filename. So all the combinations I
can think of fail. Does anyone know how I can find only the duplicate
checksums and the file names associated with them?
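One possible way around the "uniq fails because the filename is there" problem, offered as a sketch rather than a tested answer for this server: GNU uniq can be told to compare only the first N characters of each line (-w) and to print every line of a repeated group (--all-repeated). The function name list_duplicates is made up for illustration, and the sketch assumes GNU sort and uniq (which Cygwin normally ships) and that every checksum in checksums.txt has the same length.

```shell
# Hypothetical helper: print every line whose checksum occurs more than once.
# Assumes GNU sort/uniq and equal-length checksums at the start of each line.
list_duplicates() {
    file=$1
    # Measure the checksum width from the first line instead of hard-coding it.
    width=$(head -n 1 "$file" | awk '{ print length($1) }')
    # Sort so identical checksums are adjacent, then compare only the first
    # $width characters; --all-repeated=separate prints every member of each
    # duplicate group, with a blank line between groups.
    LC_ALL=C sort "$file" | uniq -w "$width" --all-repeated=separate
}
```

Usage would be something like: list_duplicates checksums.txt > duplicates.txt. Because uniq never splits the line into fields, spaces in directory names should not matter here.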

**I realize that with a l[...] but the problem is that there are 4,575 duplicate
checksums, counted using:
cat checksums.txt | awk '{ print $1 }' | sort | uniq -d | wc -l
and 46,340 files on the server, so it seems like it would take an awfully
long time. Any suggestions?
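If the slow approach in question is one scan of checksums.txt per duplicate checksum, a single pass with awk may be enough instead. The following is only a sketch: group_by_checksum is a made-up name, and it assumes each line starts with the checksum followed by at least one space, as in the format shown earlier. The file name is taken as everything after the checksum, so embedded spaces are kept intact.

```shell
# Hypothetical helper: group file names by checksum in one pass over the list.
group_by_checksum() {
    awk '
    {
        sum = $1
        # Everything after the checksum and the spaces that follow it is the
        # name, including the "*x:" prefix and any embedded spaces.
        name = $0
        sub(/^[^ ]+ +/, "", name)
        count[sum]++
        names[sum] = names[sum] "\n  " name
    }
    END {
        # Print only checksums seen more than once, each followed by its
        # file names, indented, one per line.
        for (sum in count)
            if (count[sum] > 1)
                print sum names[sum]
    }' "$1"
}
```

Usage would be something like: group_by_checksum checksums.txt. With tens of thousands of lines the whole table should fit comfortably in memory, though the order in which the groups are printed is not defined.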

Sincerely,
	Christopher J. McKenzie

	cjm@ucdavis.edu
	mckenzie@cs.ucdavis.edu
	H: (818) 991-7724
	C: (818) 429-3772
	1815 Mesa Ridge Ave
	Westlake Village, CA 91362

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


