Archived posting to the Leica Users Group, 2006/04/02
Larry,

Firstly an observation... There isn't an algorithm that will capture **all** that the eye/brain recognises as being "similar", not least because "similar" is so vague a term. There are some damn smart ones, though! The most accurate, if time-consuming, way is a manual search.

....

The following may help you. I am going to assume that all your pics are B&W; if they are colour, there are some neat variants of this which may improve your hit rate. This algorithm creates a representation of each image, plus a simple measure of the amount of information in it, and uses the combination to select which images to compare.

lrzeitlin@optonline.net wrote:
...
> I'm sure you are right but I'd hoped there was an easier way. There are
> over 6,000 individual photographs
...
> Scan them all in!
> 2. Divide each image into about 2500 cells, say a 50 x 50 matrix.

I would make the cells smaller, say 10 x 10, but the call is up to you. You are going to want to preserve as much information as possible, and only you know how much detail there is in the pictures. If in doubt, make the cells smaller rather than larger.

> This might be easier using only a 2 bit gray scale.

In each cell, count the white pixels and the black pixels, so that each cell (block) is now represented by two numbers. Working left to right and then down the image, build one continuous string of values for the image by appending each block's pair to the previous one, so that every cell appears as a pair (black, white), or vice versa. ALSO keep a running total of the black values and of the white values over all the blocks, and stick this pair of totals at the head of each image-string when you finish. We now have 6,000 image-strings representing the pictures. These represent the image AND the amount of information contained in it.

ASSUMPTION: an approximately equal amount of information in two images makes them worth comparing.
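A minimal Python sketch of the encoding just described, assuming each scan has already been reduced to two levels and loaded as a list of pixel rows (1 = black, 0 = white). The function name `image_string` and its `cell` parameter (the side length of a cell in pixels) are hypothetical choices for illustration, not part of any library.

```python
from typing import List, Tuple

Image = List[List[int]]   # pixel rows; 1 = black, 0 = white (assumed input format)
Pair = Tuple[int, int]    # (black, white) pixel counts for one cell


def image_string(img: Image, cell: int = 10) -> Tuple[Pair, List[Pair]]:
    """Encode a two-level image as (header, body).

    body:   one (black, white) pair per cell, read left to right, then down.
    header: running totals of black and white over all cells.
    Cells on the right and bottom edges are smaller when the image size
    is not a multiple of `cell` (cell side length in pixels).
    """
    height, width = len(img), len(img[0])
    body: List[Pair] = []
    total_black = total_white = 0
    for top in range(0, height, cell):            # move down the image...
        for left in range(0, width, cell):        # ...working left to right
            black = white = 0
            for row in img[top:top + cell]:
                for px in row[left:left + cell]:
                    if px:
                        black += 1
                    else:
                        white += 1
            body.append((black, white))
            total_black += black                  # running totals for the header
            total_white += white
    return (total_black, total_white), body


if __name__ == "__main__":
    demo = [
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    print(image_string(demo, cell=2))
    # prints ((7, 9), [(3, 1), (0, 4), (0, 4), (4, 0)])
```

The body is kept as a list of pairs rather than one concatenated text string so that cell boundaries stay unambiguous; with 10 x 10 pixel cells each count runs from 0 to 100, so a text version would need fixed-width fields.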
That assumption is also why I suggested smaller cells: they give more detailed information.

1. Start with image 1 as the reference image against which the others are compared.
2. Compare its header pair with the header pair of every other image. If both numbers come within a certain percentage of those of the reference image, that image goes into the set to be compared. Note that you don't have to use the same tolerance for black and white.
3. Having got your set, now check the string "bodies". If two images are identical, their strings will match. For each image in the set: COMPARE! You don't have to match every element of the strings. It suffices to take a sample of, say, 50 groups of elements, each maybe 4 or 5 blocks long, and compare those. Statistically, for matching images they should all agree; again, set tolerances. If they don't match, try shifting the first block a bit to see if you can get a match, maybe even "down a few rows": the scans may be slightly out of register.
4. Repeat for the second image, and so on.

Hope that this helps. If you want to take this further, I suggest that we take it off-list.

Peter Dzwig
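The comparison steps 1 to 4 in the post can be sketched in Python as follows, assuming every image was encoded on the same grid (`cols` cells per row) into a header pair of (black, white) totals and a body list of per-cell (black, white) pairs. The function names, the default tolerances, and the rule that two cells agree when their black counts differ by at most 10% of the cell's pixels are hypothetical choices for illustration, not part of the post.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]            # (black, white) pixel counts for one cell
Entry = Tuple[Pair, List[Pair]]   # (header totals, body of cell pairs)


def headers_close(ref: Pair, other: Pair, tol_black: float, tol_white: float) -> bool:
    """Step 2: both totals lie within a fraction of the reference image's totals."""
    return (abs(other[0] - ref[0]) <= tol_black * ref[0]
            and abs(other[1] - ref[1]) <= tol_white * ref[1])


def cells_close(a: Pair, b: Pair, tol: float) -> bool:
    """Black counts may differ by at most `tol` of the cell's pixel count."""
    return abs(a[0] - b[0]) <= tol * (a[0] + a[1])


def bodies_match(ref: Sequence[Pair], other: Sequence[Pair], cols: int,
                 n_groups: int = 50, group_len: int = 5, tol: float = 0.1,
                 max_shift: int = 1, seed: int = 0) -> bool:
    """Step 3: compare sampled runs of cells, trying small shifts in case
    the two scans are slightly out of register."""
    if len(ref) != len(other):
        return False                     # assumes all scans share one grid size
    rows = len(ref) // cols
    run = min(group_len, cols)
    rng = random.Random(seed)            # fixed seed: repeatable samples
    groups = [(rng.randrange(rows), rng.randrange(cols - run + 1))
              for _ in range(n_groups)]
    for dy in range(-max_shift, max_shift + 1):       # "down a few rows"
        for dx in range(-max_shift, max_shift + 1):   # "shifting ... a bit"
            compared, ok = 0, True
            for r, c in groups:
                for k in range(run):
                    r2, c2 = r + dy, c + k + dx
                    if not (0 <= r2 < rows and 0 <= c2 < cols):
                        continue                      # shifted cell is off the grid
                    compared += 1
                    if not cells_close(ref[r * cols + c + k], other[r2 * cols + c2], tol):
                        ok = False
                        break
                if not ok:
                    break
            if ok and compared:
                return True
    return False


def find_similar(entries: Dict[str, Entry], cols: int,
                 tol_black: float = 0.05, tol_white: float = 0.05) -> List[Tuple[str, str]]:
    """Steps 1 and 4: take each image in turn as the reference image."""
    names = list(entries)
    matches = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:          # each pair is checked once
            (head_a, body_a), (head_b, body_b) = entries[a], entries[b]
            if (headers_close(head_a, head_b, tol_black, tol_white)
                    and bodies_match(body_a, body_b, cols)):
                matches.append((a, b))
    return matches
```

Checking every pair of 6,000 images means roughly 18 million header comparisons, but those are cheap; only pairs that pass the header filter go on to the sampled body comparison, which is where the time goes.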