Archived posting to the Leica Users Group, 2006/03/31
<<Date: Fri, 31 Mar 2006 15:53:09 -0800
From: Brian Reid <reid@mejac.palo-alto.ca.us>
Subject: Re: [Leica] Re: Photo comparison software
To: Leica Users Group <lug@leica-users.org>
Message-ID: <10989B25DFF39D348E4B0EE5@scarborough.isc.org>
Content-Type: text/plain; charset=us-ascii; format=flowed

This is a variation of a classic computer science problem. It's hard, and there's no software outside international government spy agencies that can do it.

The only way to make it tractable is to define a classification scheme for the pictures and sort them into similarity groups. The scheme doesn't matter; you can do it by color, by whether or not it contains a chimney, by whether or not it contains a person, or by how much of the paint is peeling.

Once you've broken down the "several thousand" pictures into clusters (groups whose contents are similar according to your primary criterion), pick one of those clusters and repeat the process. If the cluster consists of all photographs that have a chimney at the left side, or all photographs that show shark teeth, find sub-categories that let you further divide the cluster into sub-clusters. Keep doing this until you get groups that have fewer than about 50 pictures in them. Then compare by hand; the sub-sub-clusters will be small enough that you won't have any trouble finding similar pictures.

I've done this 3 or 4 times in my life; this process works.>>

Brian,

I'm sure you are right, but I'd hoped there was an easier way. There are over 6,000 individual photographs, and it will take me months to classify them and then sort through the individual groups. I've conceptualized an easier way, but my programming skills aren't good enough to implement it. Here is the conceptual scheme:

1. Copy and standardize the pictures as screen-sized gray scale images. This can be done fairly easily by batch processing in GraphicConverter.

2. Divide each image into about 2500 cells, say a 50 x 50 matrix. Compute the average brightness of each cell. This might be easier using only a two-level (1-bit, black-and-white) scale, in which case the brightness calculation could be made by simply counting the number of black pixels in each cell.

3. Perform a product moment correlation between each image and all the other images using the digitized cell scores. I know that this means over 18 million correlations (roughly 6,000 x 5,999 / 2 pairs), but what the hell. It's a computer and it can work all weekend without complaining.

4. Identify pairs of photos where the correlation is higher than some arbitrary cutoff, say greater than 0.9. A correlation of 0.9 means the two sets of cell scores share about 81% of their variance (0.9 squared), so the pictures are probably the same or at least quite similar.

5. Visually compare the originals of the high correlation images to see if they are indeed identical.

If anyone has any ideas on how to improve the process, please let me know. I'll try to get some of my expert programming colleagues to see if they can make it work. Any idea will be appreciated.

I'm sure that many of the LUG members have a related problem in trying to organize their digital shoe box of image files. I know that I have a couple of thousand unclassified photos on my Mac right now that I've always promised myself I would sort through eventually.

Larry Z
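
For anyone who wants to try this, steps 2 through 4 of the scheme above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested tool: it assumes Python with NumPy, it assumes step 1 has already produced each picture as a 2-D array of gray values at least 50 pixels on each side, and the function names `cell_scores` and `similar_pairs` are made up for illustration.

```python
import numpy as np

GRID = 50      # cells per side, as in step 2 (50 x 50 = 2500 cells)
CUTOFF = 0.9   # arbitrary correlation cutoff from step 4


def cell_scores(gray, grid=GRID):
    """Step 2: average brightness of each cell, as a flat vector of grid*grid scores.

    Assumes `gray` is a 2-D array at least `grid` pixels on each side.
    """
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    # Crop a few edge pixels so both dimensions divide evenly into the grid.
    h, w = h - h % grid, w - w % grid
    blocks = gray[:h, :w].reshape(grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(1, 3)).ravel()


def similar_pairs(images, cutoff=CUTOFF, grid=GRID):
    """Steps 3 and 4: correlate every image with every other, keep pairs above cutoff.

    Returns a list of (index_a, index_b, correlation) tuples with index_a < index_b.
    """
    scores = np.stack([cell_scores(img, grid) for img in images])
    # np.corrcoef treats each row as one variable, so this is the full
    # image-by-image product moment (Pearson) correlation matrix.
    corr = np.corrcoef(scores)
    # Look only above the diagonal so each pair is reported once.
    i, j = np.triu_indices(len(images), k=1)
    # A perfectly uniform image yields NaN here, which never passes the cutoff.
    keep = corr[i, j] > cutoff
    return [(int(a), int(b), float(corr[a, b])) for a, b in zip(i[keep], j[keep])]
```

Calling `similar_pairs(list_of_arrays)` would return the candidate pairs to check by eye in step 5. For about 6,000 pictures the full correlation matrix is 6,000 x 6,000 numbers, which should fit in memory on an ordinary machine, though loading the pictures themselves is left out of this sketch.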