Ordination (placing samples relative to continuous scales) vs. classification (placing samples into discontinuous categories)
Classification
There is a large number of contrasting algorithms available for the classification of samples.
One contrast is between hierarchical and reticulate classification.
A hierarchical classification is one that can be represented by means of a dendrogram: placing a sample within a class at a low level of the dendrogram automatically places it within the higher-level classes that contain it. Reticulate classifications do not have this property. Hierarchical classifications are more informative and more often used; if you have a biological background you will already be used to them! The relationships between the files making up the pages of a web site are essentially a reticulate classification, each unit being identified by its closest links.
A monothetic classification allocates items into classes according to their values for a single variable, in contrast to polythetic classifications which use many (usually all) variables.
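The difference between the two can be sketched for samples described by species abundances. This is a hypothetical illustration: `monothetic_split` divides samples on the presence or absence of one chosen species, while `polythetic_assign` uses every species at once by comparing a sample with two group centroids (squared Euclidean distance is an assumption made purely for the example).

```python
def monothetic_split(samples, species_index):
    """Split samples using a single variable: presence of one species."""
    with_species = [s for s in samples if s[species_index] > 0]
    without_species = [s for s in samples if s[species_index] == 0]
    return with_species, without_species


def polythetic_assign(sample, centroid_a, centroid_b):
    """Assign a sample to the nearer of two group centroids, using all species."""
    dist_a = sum((x - c) ** 2 for x, c in zip(sample, centroid_a))
    dist_b = sum((x - c) ** 2 for x, c in zip(sample, centroid_b))
    return "A" if dist_a <= dist_b else "B"


samples = [[3, 0, 1], [0, 2, 4], [1, 1, 0]]
print(monothetic_split(samples, 0))  # ([[3, 0, 1], [1, 1, 0]], [[0, 2, 4]])
print(polythetic_assign([1, 1, 0], [3, 0, 1], [0, 2, 4]))  # A
```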
There is also a contrast between agglomerative (i.e. lumping) and divisive (i.e. splitting) approaches. Agglomerative methods start with individual items and group them together in a series of steps; divisive methods start with the whole data set and progressively split it up to form the groups at lower levels of the dendrogram. The divisive approach is preferred because it uses more of the information in the data set: each split is based on the overall structure of all the samples being divided, whereas the early steps of an agglomerative method are based only on similarities between individual items.
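As an illustration of the agglomerative (lumping) approach, here is a minimal sketch. The text does not name a particular method, so single linkage on Euclidean distance is an assumption chosen for simplicity, and `agglomerate` is a hypothetical name. Every item starts in its own group and the two closest groups are merged at each step until one group remains; a divisive method such as TWINSPAN works in the opposite direction, starting from the single group containing everything.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns the merges in order as (members, distance) pairs; read in
    order, they are the successive levels of the dendrogram.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        # delete the higher index first so the lower index stays valid
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges


for members, d in agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)]):
    print(members, round(d, 3))
# [0, 1] 1.0
# [2, 3] 2.0
# [0, 1, 2, 3] 6.403
```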
Examples of different classification algorithms:
TWINSPAN
TWINSPAN stands for Two-Way INdicator SPecies ANalysis.
It is based on Reciprocal Averaging ordination (RA) and is best envisaged in terms of samples characterised by species' abundances.
RA can be summarised thus: the samples are first placed in some initial order (for example according to the abundances of the various species). Each species is then given a weight equal to the average position of the samples it occurs in, weighted by its abundance in each, and each sample's score is recalculated as the abundance-weighted average of the weights of the species it contains. The samples are then placed in order according to the recalculated scores, the species weights are recalculated, then the sample scores are recalculated again, and so on in an iterative process. Finally, this settles down (converges), with the samples in the best order according to their species composition and the species in the best order according to their occurrence in the samples.
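The iterative process can be sketched in a few lines of plain Python. This is a minimal sketch of the first RA axis, assuming a samples-by-species abundance table in which every sample and every species has a non-zero total; `reciprocal_averaging` and its parameters are hypothetical names. One detail the description above leaves implicit is added here: the sample scores are centred and rescaled on each pass, because otherwise the iteration drifts towards the trivial solution in which every sample has the same score.

```python
def reciprocal_averaging(data, max_iter=500, tol=1e-12):
    """First axis of reciprocal averaging (RA).

    data[i][j] is the abundance of species j in sample i.
    Returns (sample_scores, species_scores).
    """
    n_samples, n_species = len(data), len(data[0])
    row_tot = [sum(row) for row in data]
    col_tot = [sum(data[i][j] for i in range(n_samples)) for j in range(n_species)]
    grand = sum(row_tot)

    def species_scores(x):
        # each species: abundance-weighted average of the scores of its samples
        return [sum(data[i][j] * x[i] for i in range(n_samples)) / col_tot[j]
                for j in range(n_species)]

    # arbitrary starting scores: the samples in their input order
    x = [float(i) for i in range(n_samples)]
    for _ in range(max_iter):
        y = species_scores(x)
        # each sample: abundance-weighted average of the scores of its species
        new_x = [sum(data[i][j] * y[j] for j in range(n_species)) / row_tot[i]
                 for i in range(n_samples)]
        # centre and standardise, weighting by sample totals
        mean = sum(w * v for w, v in zip(row_tot, new_x)) / grand
        new_x = [v - mean for v in new_x]
        sd = (sum(w * v * v for w, v in zip(row_tot, new_x)) / grand) ** 0.5
        new_x = [v / sd for v in new_x]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x, species_scores(x)


# toy data: species replace one another along a gradient of six samples
data = [[5, 1, 0, 0], [3, 3, 0, 0], [1, 4, 2, 0],
        [0, 2, 4, 1], [0, 0, 3, 4], [0, 0, 1, 5]]
sample_scores, species_scores = reciprocal_averaging(data)
print([round(s, 2) for s in sample_scores])
```

For toy data like this, the sample scores come out in gradient order and the species scores follow the same sequence. The calculation is the same as the first axis of correspondence analysis, which is the name RA is also known by.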
Steps in TWINSPAN
1. Ordinate the samples by RA.
2. Find the best place (the "centre of gravity") at which to split the data set into two.
3. Identify the species showing the greatest difference in occurrence between the two sides (+ve and -ve) of the split; these are termed indicator species.
4. Use these species to carry out a "refined ordination" and verify the best split.
5. Calculate indicator scores for the samples (adding +1 for each +ve indicator species present and -1 for each -ve indicator species present).
Steps 1 to 5 are then repeated for each of the subgroups, working down the dendrogram until the required number of classes is obtained.
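Steps 2, 3 and 5 for a single division can be sketched as follows. This is a simplified illustration, not the TWINSPAN program itself: it assumes presence/absence data and sample scores already obtained from an RA ordination (step 1), splits at the plain mean of those scores as a stand-in for the centre of gravity, ranks species by the difference in their frequency of occurrence on the two sides, and omits the refined ordination of step 4. The function `split_and_score` and the parameter `n_indicators` are hypothetical.

```python
def split_and_score(presence, scores, n_indicators=1):
    """One simplified TWINSPAN-style division.

    presence[i][j] is 1 if species j occurs in sample i, else 0.
    scores[i] is sample i's position on the RA axis.
    Assumes both sides of the split end up non-empty.
    """
    n = len(presence)
    centre = sum(scores) / n
    positive = [s > centre for s in scores]  # step 2: split at the centre
    n_pos = sum(positive)
    n_neg = n - n_pos

    # step 3: difference in frequency of occurrence, +ve side minus -ve side
    diffs = []
    for j in range(len(presence[0])):
        f_pos = sum(presence[i][j] for i in range(n) if positive[i]) / n_pos
        f_neg = sum(presence[i][j] for i in range(n) if not positive[i]) / n_neg
        diffs.append((f_pos - f_neg, j))
    diffs.sort()
    neg_ind = [j for d, j in diffs[:n_indicators] if d < 0]
    pos_ind = [j for d, j in diffs[::-1][:n_indicators] if d > 0]

    # step 5: +1 per +ve indicator present, -1 per -ve indicator present
    ind_scores = [sum(presence[i][j] for j in pos_ind)
                  - sum(presence[i][j] for j in neg_ind) for i in range(n)]
    return positive, pos_ind, neg_ind, ind_scores


presence = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
            [0, 0, 1, 1], [0, 1, 1, 1]]
print(split_and_score(presence, [-2, -1, 0, 1, 2]))
# ([False, False, False, True, True], [3], [0], [-1, -1, -1, 1, 1])
```

Here species 3 occurs only on the +ve side and species 0 only on the -ve side, so they become the indicators, and the indicator scores reproduce the split. Applying the same function to each subgroup would give the repetition down the dendrogram described above.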
The splits between classes can be described in terms of (a) how "good" they are, i.e. how different the resulting groups are, and (b) their indicator species.
Contrast with Ordination
A. The samples are placed into discrete categories (i.e. the classes or end-groups) rather than in sequence along a continuous axis. Note, however, that TWINSPAN produces an ordered classification: a clear sequence to the classes is usually evident (following the first axis of reciprocal averaging).
B. An ordination is restricted to the data set on which it was performed. A classification, in contrast, can to some extent be applied to new samples: a classification such as TWINSPAN usually generates a key that can be used to place additional samples within the defined classes (as long as their species composition is not too different from that of the analysed data set).
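Using such a key on a new sample can be sketched as a walk down the dendrogram, computing an indicator score at each division. The key format here is entirely hypothetical (a dictionary from class label to positive indicators, negative indicators and a threshold, with binary path labels) and is not the output format of any TWINSPAN program; the species names are placeholders.

```python
def classify(sample, key, label="*"):
    """Place a new sample in a class by following a divisive key.

    sample: set of species present in the new sample.
    key: hypothetical mapping from class label to
         (positive_indicators, negative_indicators, threshold);
         labels not in the key are end-groups.
    """
    while label in key:
        pos, neg, threshold = key[label]
        score = len(sample & pos) - len(sample & neg)
        # assumption: scores at or above the threshold go to the +ve ("1") side
        label += "1" if score >= threshold else "0"
    return label


key = {
    "*": ({"sp_C", "sp_D"}, {"sp_A"}, 1),
    "*0": ({"sp_B"}, set(), 1),
}
print(classify({"sp_A", "sp_B"}, key))  # *01
print(classify({"sp_C", "sp_D"}, key))  # *1
```

The walk always ends in some end-group, which is why the caveat about species composition matters: the key itself cannot tell whether a new sample is too different from the analysed data set to belong to any of the classes.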