HUNTING THE EDINBURGH BUG
-------------------------

The problem is that in Edinburgh I cannot classify using the full 30-feature set, while in Passau it is handled with no problems. In all other circumstances we appear to get similar classifier behaviour in the two laboratories. The problem with the Edinburgh code can be explained by one of:

  (1) differences in computing the classifier parameters;
  (2) differences in applying the classifier parameters.

I have investigated both.

(1) COMPARISON OF EDINBURGH AND PASSAU COMPUTATION OF MEANS AND COVARIANCES
---------------------------------------------------------------------------

Starting with features obtained from Passau, I computed the "a"-set 30-feature classifier with the Edinburgh program, except that I used the matrix inversion routines from Passau. The numerical data -- means, covariances, inverted covariances -- were compared with files provided by Ilse, with the following results.

MEANS are not identical, but only 14 out of 720 differ, with a MAXIMUM RELATIVE DIFFERENCE of less than 1:1000000. Comment - I use "doubles" everywhere. Could it be that in Passau they use "float"? Alternatively, are the differences in the means due to a different compiler or CPU (mine were computed using the Sun compiler on a SPARC-10)?

COVARIANCES are commonly different, with a MAXIMUM RELATIVE DIFFERENCE of 700:1000000, though usually very much smaller. I accumulate covariances differently from Passau: I accumulate the sum of feature-pair products and then subtract N times the product of the feature means (a commonly used rearrangement of the standard formula). Have I lost precision by doing it in this fashion? (A small numerical sketch of the two accumulation orders appears after section (2) below.)

INVERTED COVARIANCES are commonly different, with a MAXIMUM RELATIVE DIFFERENCE of 140000:1000000 - i.e. 14%, which is relatively huge. Remember, I am using the Passau matrix inversion method. I expect that the explanation of the large differences in the INVERTED COVARIANCES is that the 30-feature covariance matrices are known to be almost singular, which I suspect effectively magnifies any differences in the original COVARIANCES.

Directly comparing the Edinburgh and Passau matrix inversion routines applied to the 30-feature data, there were 14 different values (out of 21600). Again the relative differences were of the order of 1:1000000, so we can discount this as a source of the classification problems observed in the past.

(2) CLASSIFICATION USING EDINBURGH AND PASSAU TRAINED CLASSIFIER PARAMETERS
---------------------------------------------------------------------------

I used "greedy" (context-free) Mahalanobis distance classification only, as it shows the problems fully while avoiding the need to get involved with RC3 and TA.

"Greedy" Mahalanobis distance classification of the Edinburgh Cph B set, with all 30 features and a classifier trained on the Passau version of the Cph A data, gives an error rate of more than 50% when the training is done by the Edinburgh program. Surprisingly, when using the Passau-trained means and covariances supplied by Ilse (i.e. using the Passau classifier training program), the error rate was similar. In each case many raw Mahalanobis distances (D**2) were found to be negative.

****CONCLUSIONS**** - the Edinburgh problem is NOT the result of the differences between the two training methods. The small differences in parameter values observed in (1) above are irrelevant. If there is a bug, it must therefore lie in the test (classification) part of the system.
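Before going on to chase the negative distances, here is the numerical sketch promised in section (1) of the two orders of accumulating a covariance. It is a hypothetical stand-alone C program, not code from either laboratory, and the data are invented to put a large common offset under a small spread - the regime in which the one-pass rearrangement is known to suffer from cancellation.

/* covdemo.c - the two orders of accumulating a covariance compared.
 * Hypothetical stand-alone sketch, not code from either laboratory;
 * the invented data put a large common offset (1e6) under a small
 * spread (1e-3), so the true covariance is of order 1e-6.
 */
#include <stdio.h>

#define N 10000

int main(void)
{
    static double x[N], y[N];
    double sx = 0.0, sy = 0.0, sxy = 0.0, cen = 0.0, mx, my;
    int i;

    for (i = 0; i < N; i++) {
        x[i] = 1.0e6 + 0.001 * (i % 7);
        y[i] = 1.0e6 + 0.001 * ((3 * i) % 7);
    }

    /* One-pass rearrangement: (sum(x*y) - N*mean(x)*mean(y)) / N.
     * sum(x*y) is ~1e16, where a double's rounding step is ~2, yet
     * everything above ~1e-2 cancels in the subtraction, so the
     * result can easily have no correct digits at all. */
    for (i = 0; i < N; i++) {
        sx  += x[i];
        sy  += y[i];
        sxy += x[i] * y[i];
    }
    mx = sx / N;
    my = sy / N;
    printf("one-pass: % .9e\n", (sxy - N * mx * my) / N);

    /* Two-pass form: centre first, then accumulate the products.
     * The big offsets cancel term by term, before any large sums
     * build up, so several correct figures survive. */
    for (i = 0; i < N; i++)
        cen += (x[i] - mx) * (y[i] - my);
    printf("two-pass: % .9e\n", cen / N);

    return 0;
}

With doubles and this spread-to-offset ratio (1e-9) the one-pass result typically retains no correct digits while the two-pass result keeps several; with "float" the one-pass form collapses at far milder ratios. Whether this matters for our data depends on how large the feature offsets are relative to their spreads, so it may or may not account for the 700:1000000 differences reported above.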
(3) CHASING THE NEGATIVE D**2 VALUES
------------------------------------

Mostly (but not always) the negative D**2 values occurred for class 10. The negative values of D**2 were typically in the range -10 to -1000, while positive values were typically in the range +10 to +10000. Since Ilse claimed never to have seen this effect, I suspected a program bug. So, in the trained classifier parameters, classes 10 and 11 were swapped, in order to check for a random store-corruption bug in the classification program. However, the results were completely compatible with the earlier ones: the frequent negative D**2 values now appeared for class 11, and were identical to those previously obtained for class 10. In other words, the negative distances follow the parameter values, not their position in store, so store corruption is ruled out.

Repeating the experiment (but using Edinburgh data) with 28 features, including those most highly correlated (which are 0, 1, 7, 27), led to similar results; as did a 5-feature experiment (using just 0, 1, 3, 7, 27), and even a 3-feature set (using 0, 1, 7). Most other feature sets tested were OK, including the 29-feature set omitting feature 1.

****CONCLUSION**** - the problem only occurs when the linearly dependent features 0, 1 and 7 are all included.
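To see how such a dependence can drive D**2 = (x - mean)' * inv(C) * (x - mean) negative, consider what explicit inversion does to an (almost) singular covariance. The sketch below is hypothetical C, not our code; it assumes, purely for illustration, the dependence f2 = f0 + f1 as a stand-in for whatever relation actually holds among features 0, 1 and 7. In exact arithmetic the covariance is singular; in floating point the inversion normally "succeeds", returning enormous, rounding-determined entries, and nothing in this route keeps the quadratic form non-negative.

/* negd2.c - sketch of how D**2 can go negative when the features
 * are linearly dependent. Hypothetical code, not from either
 * laboratory; the dependence f2 = f0 + f1 is invented.
 */
#include <stdio.h>
#include <math.h>

#define M 3        /* number of features */
#define N 50       /* training samples   */

/* Gauss-Jordan inversion with partial pivoting; returns 0 on
 * success. No conditioning check: it happily "inverts" a matrix
 * that is singular up to rounding error. */
static int invert(double a[M][M], double b[M][M])
{
    double aug[M][2 * M], t, f;
    int i, j, k, p;

    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++) {
            aug[i][j]     = a[i][j];
            aug[i][M + j] = (i == j) ? 1.0 : 0.0;
        }
    for (k = 0; k < M; k++) {
        p = k;                               /* pick largest pivot */
        for (i = k + 1; i < M; i++)
            if (fabs(aug[i][k]) > fabs(aug[p][k])) p = i;
        if (aug[p][k] == 0.0) return 1;      /* exactly singular */
        for (j = 0; j < 2 * M; j++) {
            t = aug[k][j]; aug[k][j] = aug[p][j]; aug[p][j] = t;
        }
        for (i = 0; i < M; i++) {            /* clear column k */
            if (i == k) continue;
            f = aug[i][k] / aug[k][k];
            for (j = 0; j < 2 * M; j++) aug[i][j] -= f * aug[k][j];
        }
    }
    for (i = 0; i < M; i++)                  /* normalise diagonal */
        for (j = 0; j < M; j++) b[i][j] = aug[i][M + j] / aug[i][i];
    return 0;
}

int main(void)
{
    static double feat[N][M];
    double mean[M], cov[M][M], cinv[M][M], d[M], d2;
    int i, j, n;

    /* Training set in which feature 2 is exactly the sum of
     * features 0 and 1 (the assumed dependence). */
    for (n = 0; n < N; n++) {
        feat[n][0] = (n % 5) + 0.1 * (n % 3);
        feat[n][1] = ((3 * n) % 7) - 0.2 * (n % 4);
        feat[n][2] = feat[n][0] + feat[n][1];
    }

    for (j = 0; j < M; j++) {                /* means */
        mean[j] = 0.0;
        for (n = 0; n < N; n++) mean[j] += feat[n][j];
        mean[j] /= N;
    }
    for (i = 0; i < M; i++)                  /* covariances, two-pass */
        for (j = 0; j < M; j++) {
            cov[i][j] = 0.0;
            for (n = 0; n < N; n++)
                cov[i][j] += (feat[n][i] - mean[i]) * (feat[n][j] - mean[j]);
            cov[i][j] /= N;
        }

    /* cov is singular in exact arithmetic; rounding makes it merely
     * near-singular, so the inversion normally "succeeds". */
    if (invert(cov, cinv)) {
        printf("exactly singular - inversion refused\n");
        return 1;
    }

    /* D**2 for a test point displaced off the dependent hyperplane.
     * The quadratic form is dominated by the reciprocal of the tiny,
     * rounding-determined smallest eigenvalue - its SIGN included. */
    d[0] = 1.0 - mean[0];
    d[1] = 1.0 - mean[1];
    d[2] = 2.5 - mean[2];
    d2 = 0.0;
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++) d2 += d[i] * cinv[i][j] * d[j];
    printf("D**2 = %g (sign and size are rounding noise)\n", d2);
    return 0;
}

The printed D**2 may come out hugely positive or hugely negative depending on compiler, CPU and rounding, which may be why the effect appears in one laboratory and not the other. A Cholesky factorisation would be safer here: it fails with a non-positive pivot on a singular or indefinite matrix (thereby flagging the dependence), and when it succeeds, computing D**2 as the squared norm of the forward-substitution solution makes negative values impossible by construction.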