Archived posting to the Leica Users Group, 1997/09/04
Someone requested that a statistician work out the confidence intervals for Fernando's M6 problem survey. I'm not a statistician, but I play one on TV. Actually, I often do this kind of thing as part of my job, so I guess I can do a little analysis on Fernando's numbers for the group.

Before starting, let me say, as I and several other people have said before, that Fernando's statistical methods are unsound. They are bad, bad, bad. First, he has surveyed a group that may not be representative. Yes, it is possible that LUG members are not representative of most M6 owners with respect to camera problems -- for example, we could conceivably be too enthusiastic, causing dealers to take advantage of us by selling us the few broken M6's they have, saving the majority of working ones for people who are less easily gulled. Second, he has invited his survey participants to select themselves, based on how interested they are in his survey. This will produce biased results.

This is not to say that Fernando is bad. I like Fernando, and I like his messages to the LUG. I generally think he's wonderful. But statistics... tricky stuff. Best left to professionals.

Okay, some error bounds on Fernando's numbers. The usual method for small sample sizes makes use of Student's t-distribution, and it assumes that we have a random sample, because correcting for bias in the sample would be hard without more information. We do not have a random sample, so the confidence intervals will be wrong. They will be wrong only because of the bias problem described in the first part of this message; if we actually had a random sample, the confidence intervals would be correct.

First, the mean: 6/26 (about 23%) is the estimate of the probability that an M6 has a problem.
The 90% confidence interval is (0.087, 0.37), so that we would say with 90% confidence that the probability that a randomly-selected M6 has a problem is somewhere between about 9% and 37% -- if Fernando's sample is unbiased, which I claim it is not. People who are upset about something tend to select themselves.

A similar calculation can be done for only the 95-96 M6's, where the sample size is much smaller. Here the interval is (0.205, 0.718) -- we would say (if we believed the sample to be unbiased) with 90% confidence that the probability that an M6 made in 95-96 had a problem was between about 21% and 72%. But, once again, because of the non-random sample, this interval is probably wrong.

To demonstrate the absurdity of trying to estimate anything from Fernando's survey data, I would point out that although the M6 has been in production for 13 years, half of all M6's in his survey were made in 1995 and later. If we used the logic I've seen advanced by lots of LUG members in this discussion, we would say that half of all M6's in the world were made in 95 and 96, which is clearly absurd. If we were to assume that Fernando's sample is random, we would conclude with 90% certainty that the probability that a randomly selected M6 was made in 95-96 is between 33% and 67%, but our certainty would obviously be misplaced.

So, in conclusion, if you want to do this kind of thing, you need to study up. Buy a book on conducting surveys and another book on statistical methods, and read them and pay attention. Otherwise you produce misleading results that could be quoted widely, increasing the amount of misinformation in the world.

-Patrick
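
For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the t-based interval described in the message. It is not necessarily the exact procedure used above. It assumes each camera is treated as a 0/1 observation, that the sample standard deviation uses the n-1 denominator, and that the two-sided 90% critical values come from a standard t-table. The 6-of-13 count for the 95-96 cameras is not stated in the message; it is inferred from the quoted interval and the remark that half of the surveyed M6's were made in 1995 and later. The function name and table are made up for illustration.

```python
import math

# Two-sided 90% critical values of Student's t from a standard t-table,
# keyed by degrees of freedom (n - 1). Only the two sample sizes used
# in the message are included.
T_CRIT_90 = {12: 1.782, 25: 1.708}


def t_interval_90(count, n):
    """90% t-based confidence interval for a proportion.

    Treats each camera as a 0/1 observation and assumes a random
    sample, which the message argues this survey is not.
    """
    p = count / n
    # Sample standard deviation of 0/1 data, n-1 denominator.
    s = math.sqrt(p * (1 - p) * n / (n - 1))
    margin = T_CRIT_90[n - 1] * s / math.sqrt(n)
    return p - margin, p + margin


cases = [
    ("any M6 has a problem", 6, 26),
    ("95-96 M6 has a problem (count inferred)", 6, 13),
    ("M6 was made in 95-96", 13, 26),
]
for label, count, n in cases:
    lo, hi = t_interval_90(count, n)
    print(f"{label}: {count}/{n} -> ({lo:.3f}, {hi:.3f})")
```

Under those assumptions the three intervals come out close to the ones quoted in the message. None of this repairs the self-selection problem; it only reproduces the arithmetic.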