Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

And there's no clever trick you can do using eg hash functions or compression to shortcut the process? Of course the proofs that hash functions have collisions and there is no universal compression algorithm use the pigeonhole principle..


The shortcuts all revolve around the cases when the strings are different.

If we just do a zipper compare and the first pair of bits happens to differ, we are done, having looked at only two bits.

Similarly, we compute hash functions that sample only a small number of bits out of each string, and the hash codes don't match, we are done.

The worst case occurs when the strings turn out to be identical. We cannot make the correct verdict that the strings are identical without examining every bit; any bit we don't look at could be the same or different, and so we cannot announce a decision.


I wrote as my comment kind of as a rhetorical question, but perhaps in retrospect the connection between the string comparison problem the pigeonhole principle is not so surprising in the end.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: