HUNTING THE EDINBURGH BUG
-------------------------

The problem is that in Edinburgh I cannot classify using the full 30-feature set, while in Passau it is handled with no problems. In all other circumstances we appear to get similar classifier behaviour in the two laboratories. The problem with the Edinburgh code can be explained by one of:

  (1) differences in computing the classifier parameters;
  (2) differences in applying the classifier parameters.

I have investigated both.

(1) COMPARISON OF EDINBURGH AND PASSAU COMPUTATION OF MEANS AND COVARIANCES
---------------------------------------------------------------------------

Starting with features obtained from Passau, I computed the "a"-set 30-feature classifier with the Edinburgh program, except that I used the matrix inversion routines from Passau. The numerical data -- means, covariances, inverted covariances -- were compared with files provided by Ilse, with the following results.

MEANS are not identical, but only 14 out of 720 differ, with a MAXIMUM RELATIVE DIFFERENCE of less than 1:1000000. Comment - I use "doubles" everywhere. Could it be that in Passau they use "float"? Alternatively, are the differences in the means due to a different compiler or CPU (mine were computed using the Sun compiler on a SPARC-10)?

COVARIANCES are commonly different, with a MAXIMUM RELATIVE DIFFERENCE of 700:1000000, though usually very much smaller. I accumulate covariances differently from Passau: I accumulate the sum of feature-pair products and then subtract N times the product of the feature means (a commonly used rearrangement of the standard formula). Have I lost precision by doing it in this fashion? (A small numerical sketch of the two accumulation orders appears after section (2) below.)

INVERTED COVARIANCES are commonly different, with a MAXIMUM RELATIVE DIFFERENCE of 140000:1000000 - i.e. 14%, which is relatively huge. Remember, I am using the Passau matrix inversion method. I expect that the explanation of the large differences in the INVERTED COVARIANCES is that the 30-feature covariance matrices are known to be almost singular, which I suspect effectively magnifies any differences in the original COVARIANCES.

Directly comparing the Edinburgh and Passau matrix inversion routines applied to the 30-feature data, there were 14 different values (out of 21600). Again the relative differences were of the order of 1:1000000, so we can discount this as a source of the classification problems observed in the past.

(2) CLASSIFICATION USING EDINBURGH AND PASSAU TRAINED CLASSIFIER PARAMETERS
---------------------------------------------------------------------------

I used "greedy" (context-free) Mahalanobis distance classification only, as it shows the problems fully while avoiding the need to get involved with RC3 and TA.

"Greedy" Mahalanobis distance classification of the Edinburgh Cph B set, with all 30 features and a classifier trained on the Passau version of the Cph A data, gives an error rate of more than 50% when the training is done by the Edinburgh program. Surprisingly, when using the Passau-trained means and covariances supplied by Ilse (i.e. using the Passau classifier training program), the error rate was similar. In each case many raw Mahalanobis distances (D**2) were found to be negative.

****CONCLUSIONS**** - the Edinburgh problem is NOT the result of the differences between the two training methods. The small differences in parameter values observed in (1) above are irrelevant. If there is a bug, it must therefore lie in the test (classification) part of the system.
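Before going on to chase the negative distances, here is the numerical sketch promised in section (1) of the two orders of accumulating a covariance. It is a hypothetical stand-alone C program, not code from either laboratory, and the data are invented to put a large common offset under a small spread - the regime in which the one-pass rearrangement is known to suffer from cancellation.

/* covdemo.c - the two orders of accumulating a covariance compared.
 * Hypothetical stand-alone sketch, not code from either laboratory;
 * the invented data put a large common offset (1e6) under a small
 * spread (1e-3), so the true covariance is of order 1e-6.
 */
#include <stdio.h>

#define N 10000

int main(void)
{
    static double x[N], y[N];
    double sx = 0.0, sy = 0.0, sxy = 0.0, cen = 0.0, mx, my;
    int i;

    for (i = 0; i < N; i++) {
        x[i] = 1.0e6 + 0.001 * (i % 7);
        y[i] = 1.0e6 + 0.001 * ((3 * i) % 7);
    }

    /* One-pass rearrangement: (sum(x*y) - N*mean(x)*mean(y)) / N.
     * sum(x*y) is ~1e16, where a double's rounding step is ~2, yet
     * everything above ~1e-2 cancels in the subtraction, so the
     * result can easily have no correct digits at all. */
    for (i = 0; i < N; i++) {
        sx  += x[i];
        sy  += y[i];
        sxy += x[i] * y[i];
    }
    mx = sx / N;
    my = sy / N;
    printf("one-pass: % .9e\n", (sxy - N * mx * my) / N);

    /* Two-pass form: centre first, then accumulate the products.
     * The big offsets cancel term by term, before any large sums
     * build up, so several correct figures survive. */
    for (i = 0; i < N; i++)
        cen += (x[i] - mx) * (y[i] - my);
    printf("two-pass: % .9e\n", cen / N);

    return 0;
}

With doubles and this spread-to-offset ratio (1e-9) the one-pass result typically retains no correct digits while the two-pass result keeps several; with "float" the one-pass form collapses at far milder ratios. Whether this matters for our data depends on how large the feature offsets are relative to their spreads, so it may or may not account for the 700:1000000 differences reported above.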
(3) CHASING THE NEGATIVE D**2 VALUES
------------------------------------

Mostly (but not always) the negative D**2 values occurred for class 10. The negative values of D**2 were typically in the range -10 to -1000, while positive values were typically in the range +10 to +10000. Since Ilse claimed never to have seen this effect, I suspected a program bug. So, in the trained classifier parameters, classes 10 and 11 were swapped, in order to check for a random store-corruption bug in the classification program. However, the results were completely compatible with the earlier ones: the frequent negative D**2 values now appeared for class 11, and were identical to those previously obtained for class 10. In other words, the negative distances follow the parameter values, not their position in store, so store corruption is ruled out.

Repeating the experiment (but using Edinburgh data) with 28 features, including those most highly correlated (which are 0, 1, 7, 27), led to similar results; as did a 5-feature experiment (using just 0, 1, 3, 7, 27), and even a 3-feature set (using 0, 1, 7). Most other feature sets tested were OK, including the 29-feature set omitting feature 1.

****CONCLUSION**** - the problem only occurs when the linearly dependent features 0, 1 and 7 are all included.
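To see how such a dependence can drive D**2 = (x - mean)' * inv(C) * (x - mean) negative, consider what explicit inversion does to an (almost) singular covariance. The sketch below is hypothetical C, not our code; it assumes, purely for illustration, the dependence f2 = f0 + f1 as a stand-in for whatever relation actually holds among features 0, 1 and 7. In exact arithmetic the covariance is singular; in floating point the inversion normally "succeeds", returning enormous, rounding-determined entries, and nothing in this route keeps the quadratic form non-negative.

/* negd2.c - sketch of how D**2 can go negative when the features
 * are linearly dependent. Hypothetical code, not from either
 * laboratory; the dependence f2 = f0 + f1 is invented.
 */
#include <stdio.h>
#include <math.h>

#define M 3        /* number of features */
#define N 50       /* training samples   */

/* Gauss-Jordan inversion with partial pivoting; returns 0 on
 * success. No conditioning check: it happily "inverts" a matrix
 * that is singular up to rounding error. */
static int invert(double a[M][M], double b[M][M])
{
    double aug[M][2 * M], t, f;
    int i, j, k, p;

    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++) {
            aug[i][j]     = a[i][j];
            aug[i][M + j] = (i == j) ? 1.0 : 0.0;
        }
    for (k = 0; k < M; k++) {
        p = k;                               /* pick largest pivot */
        for (i = k + 1; i < M; i++)
            if (fabs(aug[i][k]) > fabs(aug[p][k])) p = i;
        if (aug[p][k] == 0.0) return 1;      /* exactly singular */
        for (j = 0; j < 2 * M; j++) {
            t = aug[k][j]; aug[k][j] = aug[p][j]; aug[p][j] = t;
        }
        for (i = 0; i < M; i++) {            /* clear column k */
            if (i == k) continue;
            f = aug[i][k] / aug[k][k];
            for (j = 0; j < 2 * M; j++) aug[i][j] -= f * aug[k][j];
        }
    }
    for (i = 0; i < M; i++)                  /* normalise diagonal */
        for (j = 0; j < M; j++) b[i][j] = aug[i][M + j] / aug[i][i];
    return 0;
}

int main(void)
{
    static double feat[N][M];
    double mean[M], cov[M][M], cinv[M][M], d[M], d2;
    int i, j, n;

    /* Training set in which feature 2 is exactly the sum of
     * features 0 and 1 (the assumed dependence). */
    for (n = 0; n < N; n++) {
        feat[n][0] = (n % 5) + 0.1 * (n % 3);
        feat[n][1] = ((3 * n) % 7) - 0.2 * (n % 4);
        feat[n][2] = feat[n][0] + feat[n][1];
    }

    for (j = 0; j < M; j++) {                /* means */
        mean[j] = 0.0;
        for (n = 0; n < N; n++) mean[j] += feat[n][j];
        mean[j] /= N;
    }
    for (i = 0; i < M; i++)                  /* covariances, two-pass */
        for (j = 0; j < M; j++) {
            cov[i][j] = 0.0;
            for (n = 0; n < N; n++)
                cov[i][j] += (feat[n][i] - mean[i]) * (feat[n][j] - mean[j]);
            cov[i][j] /= N;
        }

    /* cov is singular in exact arithmetic; rounding makes it merely
     * near-singular, so the inversion normally "succeeds". */
    if (invert(cov, cinv)) {
        printf("exactly singular - inversion refused\n");
        return 1;
    }

    /* D**2 for a test point displaced off the dependent hyperplane.
     * The quadratic form is dominated by the reciprocal of the tiny,
     * rounding-determined smallest eigenvalue - its SIGN included. */
    d[0] = 1.0 - mean[0];
    d[1] = 1.0 - mean[1];
    d[2] = 2.5 - mean[2];
    d2 = 0.0;
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++) d2 += d[i] * cinv[i][j] * d[j];
    printf("D**2 = %g (sign and size are rounding noise)\n", d2);
    return 0;
}

The printed D**2 may come out hugely positive or hugely negative depending on compiler, CPU and rounding, which may be why the effect appears in one laboratory and not the other. A Cholesky factorisation would be safer here: it fails with a non-positive pivot on a singular or indefinite matrix (thereby flagging the dependence), and when it succeeds, computing D**2 as the squared norm of the forward-substitution solution makes negative values impossible by construction.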