ISOCLUS

Name	Type	Length	Value range
Input: Input raster channel(s) *	Raster port	1 - 16	-1024 - 1024
Output: Output raster channel	Raster port	0 - 1
Mask: Area Mask	Bitmap port	0 - 4	Xoffset, Yoffset, Xsize, Ysize
OutputSig: Output signature segment	SIG port	0 - 1
Number of Clusters Desired	Integer	0 - 1	1 - 255 Default: 16
Maximum Number of Clusters	Integer	0 - 1	1 - 255 Default: 16
Minimum Number of Clusters	Integer	0 - 1	1 - 255 Default: 16
Seed File	String	0 -
Maximum Number of Iterations	Integer	0 - 1	1 - 10000 Default: 20
Movement Threshold	Float	0 - 1	0.0 - 1.0 Default: 0.01
Minimum Sample Threshold	Integer	0 - 1	0 - 255 Default: 5
Standard Deviation	Float	1 - 1	0.0 - Default: 10.0
Lumping Parameter	Float	0 - 1	0.0 - Default: 1.0
Maximum Number of Lumping Pairs	Integer	0 - 1	0 - 255 Default: 5
Background Gray-Level Value	Float	0 - 1
Report	String	0 - 192	See parameter description

Parameter descriptions

Input: Input raster channel(s)

Specifies the input channels to use for the clustering operation.

Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.

Output: Output raster channel

Specifies the output channel to receive the clustering results.

If this parameter is not specified, results will not be saved into a channel.

To generate a signature segment, this parameter must be specified.

The output channel can be the same as the input channel (DBIC). Only the area under the area mask (MASK) is written to the output.

Mask: Area Mask

Optionally specifies the bitmap that defines the area to be processed within the input raster. If this parameter is not specified, the entire layer is used by default. For a bitmap mask, you must specify the bitmap segment that you want to use. All of the pixels within the specified segment, having a pixel value of 1, define the area to be processed.

Only the area under the mask is written to the output.

OutputSig: Output signature segment

Specifies the segment to receive the output cluster signature(s).

If this parameter is specified, the output channel must also be specified.

Number of Clusters Desired

Specifies the number of clusters (classes) desired. Note that this is only an estimate; the final number of clusters may vary. You can, however, limit the variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) cluster parameters. Because cluster 256 is reserved for discarded clusters, the range of this parameter is 1 <= x < 255. The default value is 16.

Maximum Number of Clusters

Specifies the maximum number of clusters allowed. This parameter limits the total number of clusters allowed during splitting.

Acceptable values are 1 <= x <= 255; the default value is 16.

Minimum Number of Clusters

Specifies the minimum number of clusters allowed. This parameter limits the total number of clusters allowed during lumping.

Acceptable values are 1 <= x <= 255; the default value is 16.

Seed File

Specifies the text file used to read in initial seeds. If this parameter is not specified, seeds are generated diagonally along the n-dimensional histogram.

To specify the seeds for 4 channels and 6 clusters in a text file, use the following format:

    1   1   1   1            | 1st seed position in channel 1,2,3,4
    5   3   5   9            | 2nd seed position in channel 1,2,3,4
   40  43  20  10            | 3rd seed position in channel 1,2,3,4
  100 101 140  50            | 4th seed position in channel 1,2,3,4
  150 155 200 175            | 5th seed position in channel 1,2,3,4
  240 200 195 140            | 6th seed position in channel 1,2,3,4
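
As an illustration of the layout above, the following minimal Python sketch parses seed text of this shape into one vector per cluster. The function name `parse_seeds` is hypothetical, and treating everything after `|` as an annotation is an assumption of this example; the source does not say whether ISOCLUS accepts such annotations in a real seed file.

```python
def parse_seeds(text, num_channels):
    """Parse whitespace-separated seed rows; one row per cluster seed."""
    seeds = []
    for line in text.splitlines():
        # Assumption: anything after '|' is an annotation, not data.
        data = line.split("|", 1)[0].split()
        if not data:
            continue  # skip blank lines
        if len(data) != num_channels:
            raise ValueError(
                f"expected {num_channels} values per seed, got {len(data)}"
            )
        seeds.append([float(v) for v in data])
    return seeds


example = """\
    1   1   1   1
    5   3   5   9
   40  43  20  10
  100 101 140  50
  150 155 200 175
  240 200 195 140
"""
seeds = parse_seeds(example, num_channels=4)  # 6 seeds, 4 channels each
```

Checking the row length against the channel count catches a seed file that was written for a different number of input channels.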

Maximum Number of Iterations

Specifies the maximum number of iterations for calculating the cluster mean positions.

Movement Threshold

Specifies the movement threshold, in percentage of cluster means.

If the movement of all cluster means is less than the value specified for this parameter, the convergence has been completed. For example, for all cluster means, the following situation causes the execution to terminate:

  New cluster mean position - Old cluster mean position
  ----------------------------------------------------- < MOVETHRS
              Old cluster mean position
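
The test above can be sketched in Python as follows. The source does not state how the movement of a multi-channel mean is measured, so this example assumes the Euclidean distance between the old and new mean divided by the Euclidean norm of the old mean; `has_converged` is a hypothetical name.

```python
import math


def has_converged(old_means, new_means, movethrs=0.01):
    """True when every cluster mean moved by less than movethrs (relative)."""
    for old, new in zip(old_means, new_means):
        shift = math.dist(new, old)   # how far the mean moved
        scale = math.hypot(*old)      # assumption: size of the old mean
        if scale == 0.0:
            # Relative movement is undefined at the origin; any shift counts.
            if shift > 0.0:
                return False
            continue
        if shift / scale >= movethrs:
            return False
    return True
```

With the default threshold of 0.01, a mean at (100, 100) that moves to (100.5, 100) counts as converged, while a move to (105, 100) does not.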

Minimum Sample Threshold

Specifies the minimum sample threshold. The number of samples in a cluster domain is compared to the value of this parameter. When the number of samples in a cluster is less than the specified minimum threshold, the cluster is discarded and the total number of clusters is reduced by 1.

For a cluster:

if the standard deviation is greater than the specified standard deviation (STDV), and
if the average distance of samples from the cluster center is greater than the overall average distance of the samples from their respective cluster centers, and
if the total number of clusters is less than the specified maximum number of clusters (MAXCLUS), and
if the number of samples in the cluster is greater than 2(SAMPRM + 1), or
the number of clusters is less than or equal to NUMCLUS / 2

then splitting of the cluster will occur.
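
The conditions above can be read as a single predicate. The sketch below is one possible reading, not the module's code: it assumes the grouping used in the splitting step of the Algorithm section, where the standard deviation test must hold together with either the distance and sample-count tests or the "too few clusters" test, and it applies the MAXCLUS limit in every case. The function and argument names are hypothetical.

```python
def should_split(vjmax, dj, d_overall, nj, nc, *, stdv=10.0, samprm=5,
                 numclus=16, maxclus=16):
    """Return True if a cluster meets the splitting conditions listed above.

    vjmax     largest per-channel standard deviation of the cluster
    dj        average distance of the cluster's samples from its center
    d_overall overall average distance of samples from their centers
    nj        number of samples in the cluster
    nc        current number of clusters
    """
    if vjmax <= stdv:      # spread must exceed STDV
        return False
    if nc >= maxclus:      # splitting may not exceed MAXCLUS
        return False
    large_and_loose = dj > d_overall and nj > 2 * (samprm + 1)
    too_few_clusters = nc <= numclus / 2
    # Assumption: either branch is enough once the two tests above pass.
    return large_and_loose or too_few_clusters
```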

Standard Deviation

Specifies the standard deviation. If a cluster has a standard deviation greater than that specified in this parameter, splitting may occur. The default value of 10.0 is reasonable.

Lumping Parameter

Specifies the lumping parameter.

If the distance between two cluster centers is less than LUMP, the total number of clusters is greater than MINCLUS, and the number of lumped pairs is less than MAXPAIR, lumping of clusters will occur.

Maximum Number of Lumping Pairs

Specifies the maximum number of pairs of cluster centers that can be lumped during each iteration.
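
To show how LUMP, MINCLUS, and the maximum number of lumping pairs interact, here is a small Python sketch that selects which pairs of cluster centers would be lumped in one iteration. It assumes Euclidean distance between centers and follows the ordering described in the Algorithm section (the closest pairs first, each center lumped at most once); `select_lump_pairs` is a hypothetical helper, not part of ISOCLUS.

```python
import math
from itertools import combinations


def select_lump_pairs(centers, lump=1.0, maxpair=5, minclus=16):
    """Pick index pairs (i, j) of cluster centers that qualify for lumping."""
    dists = (
        (math.dist(centers[i], centers[j]), i, j)
        for i, j in combinations(range(len(centers)), 2)
    )
    # Keep only pairs closer than LUMP, then the MAXPAIR smallest of those.
    close = sorted(d for d in dists if d[0] < lump)[:maxpair]

    pairs, used, remaining = [], set(), len(centers)
    for _, i, j in close:
        if remaining <= minclus:
            break          # lumping may not go below MINCLUS
        if i in used or j in used:
            continue       # a center can be lumped only once per iteration
        pairs.append((i, j))
        used.update((i, j))
        remaining -= 1     # each merge removes one cluster
    return pairs


centers = [[0.0, 0.0], [0.5, 0.0], [0.6, 0.0], [10.0, 10.0]]
pairs = select_lump_pairs(centers, lump=1.0, maxpair=5, minclus=1)
```

In this example three pairs are closer than LUMP, but only the closest one, centers 1 and 2, is lumped, because both remaining candidate pairs reuse a center that has already been merged.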

Background Gray-Level Value

Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the specified value will be assigned class 0 (null class).

Report

Specifies where to direct the generated report.

Available options are:

LOG: generates a report on the terminal (default)
<modulename.RPT>: appends a report to the module's RPT file
DISK: generates a report on file "IMPRPT.LST"

Details

The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sample means. Unlike K-means, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures which have been incorporated into an interactive scheme. Up to 16 image channels can be analyzed, and up to 255 clusters (classes) can be found.

Due to the large amount of memory required, ISOCLUS samples a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image, ISOCLUS samples every other pixel during calculation of cluster means. When writing results to an output channel, however, all pixels are classified.

The MASK parameter specifies the area within the input channel to be processed. Only the area under mask will be classified; the rest of the image will not be processed.

If a mask is not specified, the entire image is processed.

It is quite common for satellite images to have a lot of black-filled areas (with zero gray levels) that should not be included in the classification. To solve this problem, you can first run THR by setting the minimum and maximum threshold values to 1 and 255, respectively. A bitmap mask is thus created only on the image area. You can then specify this bitmap for the MASK parameter.
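
To illustrate the effect of that thresholding step (not the THR module's actual interface), here is a minimal pure-Python sketch. `threshold_mask` and the nested-list image are hypothetical stand-ins for the module and a raster channel.

```python
def threshold_mask(image, tmin=1, tmax=255):
    """Return a 0/1 bitmap marking pixels whose value lies in [tmin, tmax]."""
    return [[1 if tmin <= v <= tmax else 0 for v in row] for row in image]


image = [
    [0, 0, 12, 40],    # zeros are black fill outside the scene
    [0, 7, 255, 3],
]
mask = threshold_mask(image)
# mask == [[0, 0, 1, 1], [0, 1, 1, 1]]
```

Only pixels where the mask is 1 would then take part in clustering.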

You can specify the number of clusters desired through the NUMCLUS parameter. Note that this specification only provides an estimate; the final number of clusters may vary. You may limit this variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) number of clusters. MAXCLUS limits the total number of clusters allowed during splitting, and MINCLUS limits the minimum number of clusters allowed during lumping.

The initial seed values can be entered in a text file and specified by the SEEDFILE parameter. If this parameter is not specified, seeds will be generated diagonally along the n-dimensional histogram.

Define the maximum number of iterations allowed (MAXITER), and the movement threshold (MOVETHRS). The execution terminates when the number of iterations reaches MAXITER, or when the movement of all cluster means is less than MOVETHRS. For example, for all cluster means, the following situation causes the execution to terminate:

 New cluster mean position - Old cluster mean position  
 ----------------------------------------------------- < MOVETHRS
              Old cluster mean position

The result of the clustering is a theme map directed to a specified image channel (DBOC). If the output channel is not specified, the results are not saved to a channel. A theme map encodes each cluster with a unique gray-level value equal to the cluster number; for example, cluster 1 is assigned a gray-level value of 1 and cluster 2 a gray-level value of 2. Gray level 0 represents unclassified pixels. If the theme map is later directed to the display, a pseudocolor table should be loaded so that each cluster is represented by a different color.

ISOCLUS allows you to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background value will be assigned class 0 (null class).

Algorithm

ISOCLUS is based (with minor modifications) on the ISODATA method described in the following publication:

Tou, Julius T. and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.

ISOCLUS performs unsupervised clustering using the ISODATA method on image data for up to 255 clusters (classes) and 16 channels. The output is a theme map directed to a database image channel.

The ISOCLUS algorithm is described as follows:

Before the algorithm is executed, it is necessary to specify a set of Nc initial cluster centers, Z1, Z2, ..., ZNc. This set, which need not be equal in number to the desired cluster centers, can be formed by selecting samples from the given set of data.

For a set of N samples, {X1, X2, ..., XN}, ISODATA consists of the following principal steps:

1. Specify the process parameters:
   - NUMCLUS: number of cluster centers desired
   - SAMPRM: a parameter against which the number of samples in a cluster domain is compared
   - STDV: standard deviation parameter
   - LUMP: lumping parameter
   - MAXPAIR: maximum number of pairs of cluster centers which can be lumped
   - MAXITER: maximum number of iterations allowed

2. Distribute the N samples among the present cluster centers, using the following relation:
```
X in Sj, if ||X-Zj|| < ||X-Zi||,
```
i=1,2,...,Nc; i not equal to j, for all X in the sample set. In this notation, Sj represents the subset of samples assigned to cluster center Zj.

3. Discard sample subsets with fewer than SAMPRM members; that is, if for any j, Nj < SAMPRM, then discard Sj and reduce Nc by 1.

4. Update each cluster center Zj, j=1,2,...,Nc, by setting it equal to the sample mean of its corresponding set Sj; that is:
```
          Zj = 1/Nj  SUM X,  j=1,2,...,Nc
                    X in Sj
```
where Nj is the number of samples in Sj. If samples were deleted in step 3, go to step 2.

5. Compute the average distance Dj of samples in the cluster domain Sj from their corresponding cluster center, using the following relation:
```
          Dj = 1/Nj  SUM ||X-Zj||, j=1,2,...,Nc
                    X in Sj
```

6. Compute the overall average distance of the samples from their respective cluster centers, using the following relation:
```
                      Nc
           Do = 1/N  SUM Nj Dj
                     j=1
```

7. If this is the last iteration, set LUMP = 0 and go to step 11; if Nc is less than or equal to NUMCLUS / 2, go to step 8; if this is an even-numbered iteration, or if Nc is greater than or equal to 2(NUMCLUS), go to step 11; otherwise, continue.

8. Find the standard deviation vector Vj = (V1j, V2j, ..., Vnj)' for each sample subset, using the following relation:
```
        Vij = sqrt( 1/Nj SUM (Xik-Zij)**2) i=1,2,...,n;
                                           j=1,2,...,Nc
```
where n is the sample dimensionality, Xik is the ith component of the kth sample in Sj, Zij is the ith component of Zj, and Nj is the number of samples in Sj. Each component of Vj represents the standard deviation of the samples in Sj along a principal coordinate axis.

9. Find the maximum component of each Vj, j=1,2,...,Nc, and denote it by Vjmax.

10. If for any Vjmax, j=1,2,...,Nc, there is Vjmax > STDV and (a) Dj > Do and Nj > 2(SAMPRM+1) or (b) Nc is less than or equal to NUMCLUS/2, then split Zj into two new cluster centers Zj+ and Zj-, delete Zj, and increase Nc by 1. Cluster center Zj+ is formed by adding a given quantity Gj to the component of Zj which corresponds to the maximum component of Vj; Zj- is formed by subtracting Gj from the same component of Zj. One way of specifying Gj is to let it be equal to some fraction of Vjmax. That is, Gj = k Vjmax, where k is greater than 0 and less than or equal to 1. In this example, k=0.5. If splitting took place in this step, go to step 2; otherwise, continue.

11. Compute the pairwise distances Dij between all cluster centers:
```
Dij=||Zi-Zj||, i = 1,2,...,Nc-1; j=i+1,...,Nc
```

12. Compare the distance Dij against the parameter LUMP. Arrange the L smallest distances, which are less than LUMP, in ascending order:
```
[Di1j1, Di2j2, ..., DiLjL]
```
where Di1j1 < Di2j2 < ... < DiLjL and L is the maximum number of pairs of cluster centers which can be lumped together. The lumping process is discussed in the next step.

13. With each distance Diljl there is an associated pair of cluster centers, Zil and Zjl. Starting with the smallest of these distances, perform a pairwise lumping operation according to the following rule:

For l = 1,2,...,L, if neither Zil nor Zjl has been used in lumping in this iteration, merge these two cluster centers using the following relation:
```
Zl* = [1/(Nil+Njl)] * [ Nil(Zil) + Njl(Zjl)]
```
Delete Zil and Zjl and reduce Nc by 1.

It is noted that only pairwise lumping is allowed and that a lumped cluster center is obtained by weighting each old center by the number of samples in its domain. Experimental evidence indicates that more complex lumping can produce unsatisfactory results. It is also important to note that, since a cluster center can be lumped only once, this step will not always result in L lumped centers.

14. If this is the last iteration, the algorithm terminates. Otherwise, go to step 1 if any of the process parameters require changing, or go to step 2 if the parameters are to remain the same for the next iteration.
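
The assignment and update steps of the procedure above (distribute samples, discard small subsets, recompute centers, then compute the per-cluster and overall average distances) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the ISOCLUS implementation: distances are Euclidean, samples from a discarded subset are reassigned to the remaining centers on the next pass, and `assign_and_update` is a hypothetical name.

```python
import math


def assign_and_update(samples, centers, samprm=5):
    """Assign samples, drop small clusters, update centers, measure spread."""
    while True:
        # Assign every sample to its nearest cluster center.
        subsets = [[] for _ in centers]
        for x in samples:
            j = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
            subsets[j].append(x)
        # Discard subsets with fewer than SAMPRM members.
        kept = [s for s in subsets if len(s) >= samprm]
        if not kept:
            raise ValueError("every cluster fell below SAMPRM")
        # Each surviving center becomes the mean of its subset.
        centers = [[sum(col) / len(s) for col in zip(*s)] for s in kept]
        if len(kept) == len(subsets):
            break  # nothing was discarded; otherwise reassign and repeat
    # Average distance Dj of each subset's samples from its center.
    d = [sum(math.dist(x, z) for x in s) / len(s) for s, z in zip(kept, centers)]
    # Overall average distance Do, weighted by subset size Nj.
    n = sum(len(s) for s in kept)
    d_overall = sum(len(s) * dj for s, dj in zip(kept, d)) / n
    return centers, kept, d, d_overall


samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers, subsets, d, d_overall = assign_and_update(
    samples, [[1.0, 1.0], [9.0, 1.0]], samprm=1
)
# centers == [[0.0, 1.0], [10.0, 1.0]]; d == [1.0, 1.0]; d_overall == 1.0
```

The loop always ends because each repeat removes at least one center. The returned Dj and Do values are the inputs the splitting test needs.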

Environments	PYTHON :: EASI :: MODELER
Batch Mode	Yes