ISOCLUS

Isodata clustering


Environments: PYTHON :: EASI :: MODELER

Description


Performs unsupervised clustering using the ISODATA method on image data for up to 255 clusters (classes) and 16 channels. The output is a theme map directed to a database image channel.

Parameters


Name      Type     Caption                                    Length    Value range                     Default
FILE *    String   Input file name                            1 - 192
DBIC *    Integer  Input raster channel(s)                    1 - 16    -1024 - 1024
DBOC      Integer  Output raster channel                      0 - 1
MASK      Integer  Area Mask                                  0 - 4     Xoffset, Yoffset, Xsize, Ysize
NUMCLUS   Integer  Number of clusters                         0 - 1     1 - 255                         16
MAXCLUS   Integer  Maximum number of clusters                 0 - 1     1 - 255                         16
MINCLUS   Integer  Minimum number of clusters                 0 - 1     1 - 255                         16
SEEDFILE  String   Initial seed text file                     0 - 192
MAXITER   Integer  Maximum number of iterations               0 - 1     1 - 10000                       20
MOVETHRS  Float    Movement threshold                         0 - 1     0.0 - 1.0                       0.01
SIGGEN    String   Generate signatures                        0 - 3     YES | NO                        NO
SAMPRM    Integer  Minimum sample threshold                   0 - 1     0 - 255                         5
STDV *    Float    Standard deviation                         1 - 1     0.0 -                           10.0
LUMP      Float    Lumping parameter                          0 - 1     0.0 -                           1.0
MAXPAIR   Integer  Maximum number of lumping pairs            0 - 1     0 - 255                         5
BACKVAL   Float    Background gray-level value                0 - 1
NSAM      Integer  Number of image points used in the sample  0 - 1     1 -                             262144
REPORT    String   Report mode                                0 - 192

* Required parameter

Parameter descriptions

FILE

Specifies the name of the PCIDSK image file on which to perform clustering.

DBIC

Specifies the input channels to use for the clustering operation.

Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.
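
The range shorthand can be illustrated with a short Python sketch. The helper name expand_channels is hypothetical and is not part of ISOCLUS; it only mirrors the expansion rule described above, under the assumption that a negative value always closes a range that starts at the value just before it.

```python
def expand_channels(spec):
    """Expand a DBIC-style list in which a negative value closes a range.

    Example: [1, -4, 10] becomes [1, 2, 3, 4, 10].
    """
    expanded = []
    for value in spec:
        if value < 0:
            if not expanded:
                raise ValueError("a range end needs a preceding start value")
            start = expanded[-1]
            end = -value
            if end <= start:
                raise ValueError("a range end must be larger than its start")
            # The start value is already in the list; add the rest of the range.
            expanded.extend(range(start + 1, end + 1))
        else:
            expanded.append(value)
    # The DBIC description states that duplicate channels are not allowed.
    if len(set(expanded)) != len(expanded):
        raise ValueError("duplicate channels are not allowed")
    return expanded


print(expand_channels([1, -4, 10]))  # [1, 2, 3, 4, 10]
```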

DBOC

Specifies the output channel to receive the clustering results.

If this parameter is not specified, results will not be saved into a channel.

To generate a signature segment, this parameter must be specified.

The output channel can be the same as the input channel (DBIC). Only the area under the area mask (MASK) is written to the output.

MASK

Optionally specifies the window or bitmap that defines the area to be processed within the input raster. If this parameter is not specified, the entire channel is processed.

A window mask is specified as follows:
MASK=Xoffset, Yoffset, Xsize, Ysize

Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of lines that define the window height.

For a bitmap mask, you can specify the bitmap segment number from the input file that you want to use. All the pixels within the specified segment having a pixel value of 1 define the area to be processed.

Only the area under the mask is written to the output.

NUMCLUS

Specifies the number of clusters (classes) desired. Note that this is only an estimate; the final number of clusters may vary. You can, however, limit the variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) cluster parameters. Because cluster 256 is reserved for discarded clusters, the range of this parameter is 1 <= x < 255. The default value is 16.

MAXCLUS

Specifies the maximum number of clusters allowed. This parameter limits the total number of clusters allowed during splitting.

Acceptable values are 1 <= x <= 255; the default value is 16.

MINCLUS

Specifies the minimum number of clusters allowed. This parameter limits the total number of clusters allowed during lumping.

Acceptable values are 1 <= x <= 255; the default value is 16.

SEEDFILE

Specifies the text file used to read in initial seeds. If this parameter is not specified, seeds are generated diagonally along the n-dimensional histogram.

To specify the seeds for 4 channels and 6 clusters in a text file, use the following format:

    1   1   1   1            | 1st seed position in channel 1,2,3,4
    5   3   5   9            | 2nd seed position in channel 1,2,3,4
   40  43  20  10            | 3rd seed position in channel 1,2,3,4
  100 101 140  50            | 4th seed position in channel 1,2,3,4
  150 155 200 175            | 5th seed position in channel 1,2,3,4
  240 200 195 140            | 6th seed position in channel 1,2,3,4
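
As a rough illustration of this layout, the following Python sketch reads seed text with one seed per line and one value per channel. The function parse_seed_text is hypothetical, and treating everything after a "|" as an annotation is an assumption made here for the listing above; it is not a statement about how ISOCLUS itself reads the file.

```python
def parse_seed_text(text, n_channels):
    """Parse seed positions: one seed per line, one value per channel."""
    seeds = []
    for line in text.splitlines():
        # Assumption: anything after "|" is an annotation, not seed data.
        values = line.split("|", 1)[0].split()
        if not values:
            continue  # skip blank lines
        if len(values) != n_channels:
            raise ValueError(
                "expected %d values per seed, got %d" % (n_channels, len(values))
            )
        seeds.append([float(v) for v in values])
    return seeds


sample = """
    1   1   1   1            | 1st seed position in channel 1,2,3,4
    5   3   5   9            | 2nd seed position in channel 1,2,3,4
   40  43  20  10            | 3rd seed position in channel 1,2,3,4
  100 101 140  50            | 4th seed position in channel 1,2,3,4
  150 155 200 175            | 5th seed position in channel 1,2,3,4
  240 200 195 140            | 6th seed position in channel 1,2,3,4
"""
seeds = parse_seed_text(sample, n_channels=4)
print(len(seeds))  # 6
```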

MAXITER

Specifies the maximum number of iterations for calculating the cluster mean positions.

MOVETHRS

Specifies the movement threshold, expressed as a fraction of each cluster mean position (the default of 0.01 corresponds to 1%).

If the relative movement of every cluster mean is less than the value specified for this parameter, the clustering is considered to have converged. That is, execution terminates when the following holds for all cluster means:

  New cluster mean position - Old cluster mean position
  ----------------------------------------------------- < MOVETHRS
              Old cluster mean position
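
The test above can be sketched in Python as follows. The function has_converged is hypothetical, and measuring both the movement and the old position with a Euclidean norm is an assumption; the text does not say how ISOCLUS reduces a multi-channel mean to a single number.

```python
import math


def _norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vector))


def has_converged(old_means, new_means, movethrs=0.01):
    """True when every cluster mean moved by less than movethrs (relative)."""
    for old, new in zip(old_means, new_means):
        moved = _norm([n - o for n, o in zip(new, old)])
        scale = _norm(old)
        if scale == 0.0:
            # Relative movement is undefined at the origin; any movement
            # at all is treated as "not converged" in this sketch.
            if moved > 0.0:
                return False
            continue
        if moved / scale >= movethrs:
            return False
    return True


print(has_converged([[100.0, 100.0]], [[100.5, 100.0]]))  # True
print(has_converged([[100.0, 100.0]], [[110.0, 100.0]]))  # False
```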

SIGGEN

Specifies whether to generate the signature for each cluster. These signature segments can be input to MLC (Maximum Likelihood Classification) to classify other images.

Available options are:

  • YES: generate a signature for each cluster
  • NO: do not generate signatures (default)

If this parameter is set to YES, DBOC must be specified.

SAMPRM

Specifies the minimum sample threshold. The number of samples in a cluster domain is compared to the value of this parameter. When the number of samples in a cluster is less than the specified minimum threshold, the cluster is discarded and the total number of clusters is reduced by 1.

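SAMPRM also limits splitting. As described in step 10 of the algorithm, a cluster whose largest standard deviation exceeds STDV and whose average sample distance exceeds the overall average is split only if it contains more than 2(SAMPRM+1) samples.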
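
A minimal Python sketch of the discard rule follows. The helper clusters_to_discard is hypothetical; the sample counts are taken from iteration 1 of the example output later on this page, where cluster 5 holds only 3 samples and is discarded with SAMPRM set to 5.

```python
def clusters_to_discard(sample_counts, samprm=5):
    """Return the cluster numbers holding fewer than samprm samples."""
    return sorted(
        cluster for cluster, count in sample_counts.items() if count < samprm
    )


# Sample counts from iteration 1 of the example output.
counts = {1: 73507, 2: 187513, 3: 1093, 4: 28, 5: 3}
discarded = clusters_to_discard(counts, samprm=5)
print(discarded)                     # [5]
print(len(counts) - len(discarded))  # 4 clusters remain
```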

STDV

Specifies the standard deviation. If a cluster has a standard deviation greater than that specified in this parameter, splitting may occur. The default value of 10.0 is reasonable.
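
The role of STDV in splitting can be sketched in Python, following step 10 of the Algorithm section. The function names should_split and split_center are hypothetical, and k=0.5 is the fraction quoted in step 10; this is an illustration of the rule as written there, not the ISOCLUS source.

```python
def should_split(vjmax, dj, d_overall, nj, nc, stdv=10.0, samprm=5, numclus=16):
    """Apply the splitting test of step 10 to one cluster.

    vjmax: largest per-channel standard deviation of the cluster
    dj: average distance of the cluster's samples from its center
    d_overall: overall average distance over all clusters
    nj: number of samples in the cluster
    nc: current number of clusters
    """
    if vjmax <= stdv:
        return False
    spread_out = dj > d_overall and nj > 2 * (samprm + 1)
    too_few_clusters = nc <= numclus / 2
    return spread_out or too_few_clusters


def split_center(center, stdevs, k=0.5):
    """Split a center along the channel with the largest standard deviation."""
    axis = max(range(len(stdevs)), key=lambda i: stdevs[i])
    offset = k * stdevs[axis]
    plus = list(center)
    minus = list(center)
    plus[axis] += offset
    minus[axis] -= offset
    return plus, minus


print(split_center([10.0, 20.0], [2.0, 8.0]))  # ([10.0, 24.0], [10.0, 16.0])
```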

LUMP

Specifies the lumping parameter.

If the distance between two cluster centers is less than LUMP, the total number of clusters is greater than MINCLUS, and the number of lumped pairs is less than MAXPAIR, lumping of clusters will occur.

MAXPAIR

Specifies the maximum number of pairs of cluster centers that can be lumped during each iteration.
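
The interaction of LUMP, MAXPAIR, and MINCLUS can be sketched in Python as shown below, following steps 11 to 13 of the Algorithm section. The function lump_clusters is hypothetical, Euclidean distance is assumed, and merged centers are appended after the untouched ones purely for convenience.

```python
import math


def lump_clusters(centers, counts, lump=1.0, maxpair=5, minclus=1):
    """Merge close cluster centers pairwise, weighting by sample count."""
    n = len(centers)
    pairs = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            )
            if d < lump:
                pairs.append((d, i, j))
    pairs.sort()            # smallest distances first
    pairs = pairs[:maxpair]  # at most MAXPAIR candidate pairs

    used = set()
    merged_centers, merged_counts = [], []
    remaining = n
    for _, i, j in pairs:
        # A center may be lumped only once, and never below MINCLUS clusters.
        if i in used or j in used or remaining <= minclus:
            continue
        total = counts[i] + counts[j]
        merged_centers.append(
            [(counts[i] * a + counts[j] * b) / total
             for a, b in zip(centers[i], centers[j])]
        )
        merged_counts.append(total)
        used.update((i, j))
        remaining -= 1

    kept = [idx for idx in range(n) if idx not in used]
    new_centers = [centers[idx] for idx in kept] + merged_centers
    new_counts = [counts[idx] for idx in kept] + merged_counts
    return new_centers, new_counts


result = lump_clusters([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]], [1, 3, 5])
print(result)  # ([[10.0, 10.0], [0.375, 0.0]], [5, 4])
```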

BACKVAL

Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the specified value will be assigned class 0 (null class).

NSAM

Specifies the number of samples to collect for the iterative clustering.

If this parameter is not specified, it defaults to 262144. If the specified value is larger than the total number of pixels in the image, all the pixels in the image will be used.

The time to compute each iteration is proportional to the number of samples used; using more than the default number of samples may significantly slow down the clustering process. Also, because all the samples are stored in memory, a large NSAM value increases memory requirements. With 262144 samples and five bands of 8-bit input data, approximately 1.3 MB of memory is required; with NSAM set to 2000000, approximately 10 MB is required.
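
The memory figures quoted above follow from simple arithmetic, reproduced in the Python sketch below. The function sample_memory_bytes is hypothetical and counts only the raw sample values (one byte per 8-bit value); any additional working memory used by ISOCLUS is not modeled.

```python
def sample_memory_bytes(nsam, n_channels, bytes_per_value=1):
    """Bytes needed to hold nsam samples of n_channels values each."""
    return nsam * n_channels * bytes_per_value


print(sample_memory_bytes(262144, 5))   # 1310720 bytes, about 1.3 MB
print(sample_memory_bytes(2000000, 5))  # 10000000 bytes, 10 MB
```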

REPORT

Specifies where to direct the generated report.

Available options are:


Details

The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sample means. Unlike the latter algorithm, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures which have been incorporated into an interactive scheme. Up to 16 image channels can be analyzed, and 255 clusters (classes) found.
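
The part that ISODATA shares with K-means, assigning each sample to its nearest center and then replacing each center with the mean of its samples, can be sketched in Python as follows. The function names are hypothetical, Euclidean distance is assumed, and a center with no samples is simply left where it is in this sketch (ISOCLUS would instead apply the SAMPRM discard rule).

```python
def nearest_center(sample, centers):
    """Index of the center closest to the sample (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((x - z) ** 2 for x, z in zip(sample, centers[j])),
    )


def update_centers(samples, centers):
    """One assignment and mean-update pass over the samples."""
    groups = [[] for _ in centers]
    for sample in samples:
        groups[nearest_center(sample, centers)].append(sample)
    new_centers = []
    for group, old in zip(groups, centers):
        if group:
            # Per-channel mean of the samples assigned to this center.
            new_centers.append([sum(col) / len(group) for col in zip(*group)])
        else:
            new_centers.append(list(old))
    return new_centers, groups


samples = [[0, 0], [2, 0], [10, 10], [12, 10]]
new_centers, groups = update_centers(samples, [[1, 1], [9, 9]])
print(new_centers)  # [[1.0, 0.0], [11.0, 10.0]]
```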

Due to the large amount of memory required, ISOCLUS samples a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image, ISOCLUS samples every other pixel during calculation of cluster means. When writing results to an output channel, however, all pixels are classified.

The MASK parameter specifies the area within the input channel to be processed. Only the area under mask will be classified; the rest of the image will not be processed.

If a single MASK value is specified, this refers to a bitmap segment that defines the area to be classified. When four values are specified, these define the x,y offsets and x,y dimensions of a rectangular window within the image to be classified.

If a mask is not specified, the entire image is processed.

It is quite common for satellite images to have large black-filled areas (with zero gray levels) that should not be included in the classification. To exclude them, first run THR with the minimum and maximum threshold values set to 1 and 255, respectively. This creates a bitmap mask that covers only the valid image area. You can then specify this bitmap for the MASK parameter.
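
The effect of such a threshold can be pictured with a small Python sketch. The function valid_data_bitmap is hypothetical and operates on a plain list of rows; it only illustrates which pixels end up under the mask and does not represent the THR interface.

```python
def valid_data_bitmap(image, low=1, high=255):
    """Return a 0/1 bitmap marking pixels whose value lies in [low, high]."""
    return [
        [1 if low <= value <= high else 0 for value in row]
        for row in image
    ]


image = [
    [0, 12, 40],
    [0, 255, 0],
]
bitmap = valid_data_bitmap(image)
print(bitmap)  # [[0, 1, 1], [0, 1, 0]]
```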

You can specify the number of clusters desired through the NUMCLUS parameter. Note that this specification only provides an estimate; the final number of clusters may vary. You may limit this variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) number of clusters. MAXCLUS limits the total number of clusters allowed during splitting, and MINCLUS limits the minimum number of clusters allowed during lumping.

The initial seed values can be entered in a text file and specified by the SEEDFILE parameter. If this parameter is not specified, seeds will be generated diagonally along the n-dimensional histogram.

Define the maximum number of iterations allowed (MAXITER), and the movement threshold (MOVETHRS). The execution terminates when the number of iterations reaches MAXITER, or when the movement of all cluster means is less than MOVETHRS. For example, for all cluster means, the following situation causes the execution to terminate:

 New cluster mean position - Old cluster mean position  
 ----------------------------------------------------- < MOVETHRS
              Old cluster mean position

The result of the clustering is a theme map directed to a specified image channel (DBOC). If the output channel is not specified, the results will not be saved to a channel. A theme map encodes each cluster with a unique gray-level value. The cluster number is represented by the gray level; for example, cluster 1 is assigned a gray-level value of 1, and cluster 2 is assigned a gray-level value of 2. Gray level 0 represents unclassified pixels. Because the cluster gray levels are small, consecutive values, a pseudocolor table should be loaded if the theme map is later directed to the display, so that each cluster is represented by a different color.

ISOCLUS provides the option of generating signatures for each cluster through the SIGGEN parameter. If SIGGEN is YES, a signature for each cluster is generated. These signatures can then be used with MLC (Maximum Likelihood Classifier) to classify other images.

ISOCLUS allows you to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background value will be assigned class 0 (null class).


Example

The following is an example of finding clusters from 5 channels in the file irvine.pix. The desired number of clusters is 5, and the maximum and minimum allowable numbers of clusters are 20 and 5, respectively.

EASI>file     = "irvine.pix"   ! input file
EASI>dbic     = 1,2,3,4,5      ! input channels
EASI>dboc     = 9              ! output channel
EASI>mask     =                ! process entire channel
EASI>numclus  = 5              ! request 5 clusters
EASI>maxclus  = 20             ! at most 20 clusters
EASI>minclus  = 5              ! at least 5 clusters
EASI>seedfile =                ! automatically generate seeds
EASI>maxiter  = 20             ! no more than 20 iterations
EASI>movethrs = 0.01           ! stop when all means move less than this
EASI>siggen   = "NO"           ! no signature generation
EASI>samprm   = 5              ! discard clusters with fewer than 5 samples
EASI>stdv     = 10.0           ! standard deviation threshold for splitting
EASI>lump     = 1.0            ! distance threshold for lumping
EASI>maxpair  = 5              ! no more than 5 cluster center pairs
                               ! lumped in one iteration
EASI>backval  =                ! no background value
EASI>nsam     =                ! default number of samples

EASI>RUN ISOCLUS

The following is an example of the output:

  Iteration :  1
 
  No. of Clusters :  5
 
  Cluster   Samples   Max. Stdv.  Mean Position:
 
  (  1)     73507     9.51611       55.81042    19.86229    19.91672  
                                    29.82999    13.69804
  (  2)    187513     9.03195       67.61027    27.55517    32.50297   
                                    43.59904    30.56694
  (  3)      1093    14.98393      120.00732    57.25526    78.13449
                                    66.09241    58.46386
  (  4)        28    23.78014      207.92857   103.00000   144.28572   
                                   115.46429    71.10714
  (  5)         3     1.69967      238.33333   119.00000   163.66667   
                                   129.00000    71.66666
 Cluster No. Discarded :   5
 Total No. of Clusters discarded :   1

 Recalculating Clusters Mean Values

 Iteration : ...
 Iteration : ...

 Iteration : 20

 No. of Clusters : 10

 Cluster   Samples   Max. Stdv.  Mean Position:

 (  1)     44672     7.13140       54.77601    19.13615    18.82732    
                                   24.60888    11.74371
 (  2)     84972     5.76638       59.93691    22.69998    24.52145    
                                   38.33545    21.61785
 (  3)     66118     5.16872       66.85352    27.10802    32.22654    
                                   40.91793    32.69448
 (  4)      3373     8.07777       96.96590    43.43404    56.30388    
                                   49.36911    39.12481
 (  5)     17970     6.01282       79.61875    33.60000    41.91302    
                                   40.81953    31.33311
 (  6)       781    14.94864      125.76312    59.08835    80.00000    
                                   66.81434    50.97439
 (  7)     25571     8.84357       64.53916    26.48536    29.13046    
                                   58.64683    25.93798
 (  8)      2164     8.68565       98.43207    47.03928    64.37061    
                                   57.31747    59.35536
 (  9)     16485     6.48150       74.65708    32.60097    42.35007    
                                   48.62518    45.63634
 ( 10)        35    26.02109      200.60001    99.22857   138.28572   
                                  110.91428    69.05714


 FINAL RESULTS :

 NO. OF CLUSTERS : 10

 CLUSTER   PIXELS       MEAN POSITION :

 (  1)     42309        54.6381     19.0508     18.6878     24.1532
                        11.5138
 (  2)     83602        59.7712     22.5730     24.3250     37.9481
                        21.2744
 (  3)     24671        64.1729     26.2388     28.6585     58.8584
                        25.1137
 (  4)     67492        66.4786     26.9082     31.8410     41.0427
                        32.3277
 (  5)     18472        73.8437     32.0540     41.4255     48.4454
                        44.7910
 (  6)     18409        79.1224     33.3069     41.4644     40.5139
                        31.0331
 (  7)      4004        95.0092     42.5310     55.1276     48.7040
                        39.3821
 (  8)      2258        97.3866     46.4632     63.6063     57.0106
                        59.1103
 (  9)       888       124.4178     58.3705     78.8570     65.9673
                        50.7140
 ( 10)        39       202.7692    100.2564    139.4103    111.5385
                        68.9231
        --------
          262144

Algorithm

ISOCLUS is based (with minor modifications) on the ISODATA method described in the following publication:

Tou, Julius T. and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.

ISOCLUS performs unsupervised clustering using the ISODATA method on image data for up to 255 clusters (classes) and 16 channels. The output is a theme map directed to a database image channel.

The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sample means. Unlike the latter algorithm, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures that have been incorporated into an interactive scheme. Image data from up to 16 specified channels of the input file can be read in, and up to 255 clusters can be found.

Due to the large amount of memory required, ISOCLUS samples a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image, ISOCLUS samples every other pixel during calculation of cluster means. However, when writing results to an output channel, all pixels are classified.

The ISOCLUS algorithm is described as follows:

Before the algorithm is executed, it is necessary to specify a set of Nc initial cluster centers, Z1, Z2, ..., ZNc. This set, which need not necessarily be equal in number to the desired cluster centers, can be formed by selecting samples from the given set of data.

For a set of N samples, {X1, X2, ..., XN}, ISODATA consists of the following principal steps:
  1. Specify the process parameters:
    • NUMCLUS: number of cluster centers desired
    • SAMPRM: a parameter against which the number of samples in a cluster domain is compared
    • STDV: Standard Deviation parameter
    • LUMP: Lumping Parameter
    • MAXPAIR: Maximum number of pairs of cluster centers which can be lumped
    • MAXITER: Maximum number of iterations allowed
  2. Distribute the N samples among the present cluster centers, using the following relation:

    X in Sj, if ||X-Zj|| < ||X-Zi||,

    i=1,2,...,Nc; i not equal to j, for all X in the sample set. In this notation, Sj represents the subset of samples assigned to cluster center Zj.

  3. Discard sample subsets with fewer than SAMPRM members; that is:

    if for any j, Nj < SAMPRM, then discard Sj and reduce Nc by 1.
    
  4. Update each cluster center Zj, j=1,2,...,Nc, by setting it equal to the sample mean of its corresponding set Sj; that is:

              Zj = 1/Nj  SUM X,  j=1,2,...,Nc
                       X in Sj
    

    where Nj is the number of samples in Sj. If samples were deleted in step 3, go to step 2.

  5. Compute the average distance Dj of samples in the cluster domain Sj from their corresponding cluster center, using the following relation:

              Dj = 1/Nj  SUM ||X-Zj||,  j=1,2,...,Nc
                       X in Sj
    
  6. Compute the overall average distance of the samples from their respective cluster centers, using the following relation:

                          Nc
               Do = 1/N  SUM Nj Dj
                         j=1
    
  7. If this is the last iteration, set LUMP = 0 and go to step 11; if Nc is less than or equal to NUMCLUS / 2, go to step 8; if this is an even-numbered iteration, or if Nc is greater than or equal to 2 x NUMCLUS, go to step 11; otherwise, continue.

  8. Find the standard deviation vector Vj = (V1j, V2j,.. Vnj)' for each sample subset, using the following relation:

            Vij = sqrt( 1/Nj  SUM  (Xik-Zij)**2 ),  i=1,2,...,n;
                           Xk in Sj                 j=1,2,...,Nc
    

    where n is the sample dimensionality, Xik is the ith component of the kth sample in Sj, Zij is the ith component of Zj, and Nj is the number of samples in Sj. Each component of Vj represents the standard deviation of the samples in Sj along a principal coordinate axis.

  9. Find the maximum component of each Vj, j=1,2,..Nc, and denote it by Vjmax.

  10. If for any Vjmax, j=1,2,...,Nc, there is Vjmax > STDV and (a) Dj > Do and Nj > 2(SAMPRM+1) or (b) Nc is less than or equal to NUMCLUS/2, then split Zj into two new cluster centers Zj+ and Zj-, delete Zj, and increase Nc by 1. Cluster center Zj+ is formed by adding a given quantity Gj to the component of Zj which corresponds to the maximum component of Vj; Zj- is formed by subtracting Gj from the same component of Zj. One way of specifying Gj is to let it be equal to some fraction of Vjmax. That is, Gj = k Vjmax, where k is greater than 0 and less than or equal to 1. In this example, k=0.5. If splitting took place in this step, go to Step 2; otherwise continue.

  11. Compute the pairwise distances Dij between all cluster centers.

    Dij=||Zi-Zj||, i = 1,2,..,Nc-1; j=i+1,...,Nc
    
  12. Compare the distance Dij against the parameter LUMP. Arrange the L smallest distances, which are less than LUMP, in ascending order:

    [Di1j1, Di2j2, ...,DiLjL]
    

    where Di1j1 < Di2j2 < ... < DiLjL and L is the maximum number of pairs of cluster centers that can be lumped together (MAXPAIR). The lumping process is discussed in the next step.

  13. With each distance Diljl there is an associated pair of cluster centers, Zil and Zjl. Starting with the smallest of these distances, perform a pairwise lumping operation according to the following rule:

    For l = 1,2,....,L, if neither Zil nor Zjl has been used in lumping in this iteration, merge these two cluster centers using the following relation:

    Zl* = [1/(Nil+Njl)] * [ Nil(Zil) + Njl(Zjl)]
    

    Delete Zil and Zjl and reduce Nc by 1.

    It is noted that only pairwise lumping is allowed and that a lumped cluster center is obtained by weighting each old center by the number of samples in its domain. Experimental evidence indicates that more complex lumping can produce unsatisfactory results. It is also important to note that, since a cluster center can be lumped only once, this step will not always result in L lumped centers.

  14. If this is the last iteration, the algorithm terminates. Otherwise go to Step 1 if any of the process parameters require changing, or go to Step 2 if the parameters are to remain the same for the next iteration.
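
The statistics used in steps 5, 6, 8, and 9 can be sketched in Python as follows. The function cluster_statistics is hypothetical; it assumes Euclidean distance, that every cluster holds at least one sample (smaller clusters having been discarded in step 3), and that the samples have already been assigned to centers as in step 2.

```python
import math


def cluster_statistics(groups, centers):
    """Compute Dj, Do, and Vjmax for already-assigned samples.

    groups[j] is the list of samples assigned to centers[j].
    Returns (avg_dist, overall_avg_dist, max_stdev).
    """
    avg_dist = []   # Dj: average distance of samples from their center
    max_stdev = []  # Vjmax: largest per-channel standard deviation
    for samples, z in zip(groups, centers):
        nj = len(samples)
        dj = sum(
            math.sqrt(sum((x - c) ** 2 for x, c in zip(s, z))) for s in samples
        ) / nj
        vj = [
            math.sqrt(sum((s[i] - z[i]) ** 2 for s in samples) / nj)
            for i in range(len(z))
        ]
        avg_dist.append(dj)
        max_stdev.append(max(vj))
    n_total = sum(len(g) for g in groups)
    # Do: average of the Dj values, weighted by cluster size.
    overall = sum(len(g) * d for g, d in zip(groups, avg_dist)) / n_total
    return avg_dist, overall, max_stdev


groups = [[[0, 0], [2, 0]], [[10, 10], [10, 14]]]
centers = [[1, 0], [10, 12]]
avg_dist, overall, max_stdev = cluster_statistics(groups, centers)
print(avg_dist, overall, max_stdev)
```

With these values in hand, the splitting test of step 10 compares each Vjmax with STDV and each Dj with Do.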


References

ISOCLUS is based (with minor modifications) on the ISODATA method described in the following publication:

Tou, Julius T. and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.

© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.