Linux Users' Group of Davis
Page last updated:
2002 Aug 08 16:31

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] shell script challenge - Now MD5sum errata

Charles Polisher writes:
 > Micah Cowan wrote:
 > > GNU Linux writes:
 > >  > Found a very interesting page on md5sum. It's:
 > >  > 
 > >  > http://hills.ccsf.org/~jharri01/project.html
 > >  > 
 > >  > "So why does MD5 seem so secure? Because 128 bits allows you to have
 > >  > 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 different
 > >  > possible MD5 codes"
 > >  > 
 > >  > Lots of good reading for insomniacs.
 > > 
 > > It still shouldn't be relied upon, however, that two identical MD5
 > > checksums are sufficient evidence that the corresponding files are
 > > identical; I've heard more than one person claim to have encountered
 > > identical MD5 sums for different files, and it's certainly not
 > > impossible, just improbable.
 >   I'm dubious ;^)
 >   <H. Lector voice>
 >     The voices tell you they've seen MD5's collide; do they
 >     tell you other things, Micah? 
 >   </H. Lector voice>

Bruce Schneier, in "Applied Cryptography", points out that collisions
have been intentionally produced by den Boer and Bosselaers, though
the attack is said to have no practical impact on security uses.

 > > But it's a helluva lot better than running diff from one file to every
 > > other file - a factorial-time operation! :)
 >   And now, a quibble:
 >   Actually, a comparison of the entire file would be
 >   no different than a comparison of the md5sum. From a
 >   Big O standpoint, it's just a constant factor. If 
 >   comparing md5's isn't factorial, neither should a full
 >   diff. If there were a ton of files and the lengths were
 >   spread out, the comparisons could be further reduced 
 >   by sorting the list by file size, then comparing only
 >   among groups of the same size.
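The size-first strategy in the quibble above can be sketched as a pipeline
(a rough sketch assuming GNU find, awk, and xargs, and newline-free
filenames; the directory argument is hypothetical):

```shell
# Only files with equal byte counts can possibly be identical, so
# checksum just those groups. Assumes GNU find/xargs.
dir=${1:-.}
find "$dir" -type f -printf '%s\t%p\n' | sort -n |
awk -F'\t' '
    $1 == prev { if (!dup) print pline; print $2; dup = 1 }
    $1 != prev { dup = 0 }
               { prev = $1; pline = $2 }' |
xargs -r -d '\n' md5sum
```

Sorting numerically on the size column groups equal-length files together,
and the awk pass emits only members of groups of two or more, so md5sum
runs on the minimum candidate set.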

You're right - I'd been assuming MD5s to be sorted, and
the diffs not to be.

In that case, both operations have equivalent complexity in both their
best and worst cases:

In the best case, where the sort key (file length or MD5) is unique
for identical files, the complexity is equal to the complexity of the
sort algorithm, hopefully O(N log N).

In the worst case, where the same sort key is frequently found for
different files, the complexity will be quadratic, due to the many
pairwise comparisons needed within each group that shares a key.

However, collisions of MD5s are much less likely than collisions in
file lengths, so the diffs-comparison is much more likely to approach
its worst case, and the MD5s-comparison its best. So, practically
speaking, they are likely to be close to O(N²) and O(N log N),
respectively: the group size in the quadratic term for the
MD5s-comparison should usually be 1, rendering that term
effectively linear.
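Concretely, the sorted-MD5s approach is the usual duplicate-finding
one-liner (GNU uniq's -w and --all-repeated options assumed): after an
O(N log N) sort, identical files sit on adjacent lines, so one linear pass
finds every duplicate group.

```shell
# Sort the checksums so duplicates become adjacent, then let uniq
# compare only the 32-hex-digit digest (-w32) and print every member
# of each duplicate group, groups separated by blank lines.
md5sum ./* | sort | uniq -w32 --all-repeated=separate
```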

Thanks, all, for pointing out my major brain-farts. I've only just had
my coffee :)


vox-tech mailing list


LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
