KCLUS

K-Means Clustering


Environments: PYTHON :: EASI :: MODELER
Batch Mode: Yes

Description


Performs unsupervised clustering of image data using the K-means (Minimum Distance) method, for up to 255 clusters (classes) and 16 channels. The output is a theme map written to an image channel.

Parameters


Name                                  Type         Length   Value range
Input: Input raster channels *        Raster port  1 - 16   -1024 - 1024
Output: Output raster channel         Raster port  0 - 1
Mask: Area mask                       Bitmap port  0 - 4    Xoffset, Yoffset, Xsize, Ysize
OutputSig: Output signature segment   SIG port     0 - 1
Number of Cluster Centers             Integer      0 - 1    1 - 255 (Default: 16)
Seed File                             String       0 -
Maximum Number of Iterations          Integer      0 - 1    1 - 10000 (Default: 20)
Movement Threshold                    Float        0 - 1    0.0 - 1.0 (Default: 0.01)
Background Gray-Level Value           Float        0 - 1    0 -
Number of Pixel Values to Sample      Integer      0 - 1    1 - (Default: 262144)
Report                                String       0 - 192  See parameter description

* Required parameter

Parameter descriptions

Input: Input raster channels

Specifies the input channels to use for the clustering operation.

Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.

Output: Output raster channel

Specifies the output channel to receive the clustering results.

If this parameter is not specified, results will not be saved into a channel.

To generate a signature segment, this parameter must be specified.

The output channel can be the same as the input channel (DBIC). Only the area under the area mask (MASK) is written to the output.

Mask: Area mask

Optionally specifies the bitmap that defines the area to be processed within the input raster. If this parameter is not specified, the entire layer is used by default. For a bitmap mask, you must specify the bitmap segment that you want to use. All of the pixels within the specified segment, having a pixel value of 1, define the area to be processed.

Only the area under the mask is written to the output.

OutputSig: Output signature segment

Specifies the segment to receive the output cluster signature(s).

If this parameter is specified, the output channel must also be specified.

Number of Cluster Centers

Specifies the number of clusters (classes) desired. The default value is 16.

Seed File

Specifies the text file used to read in initial seeds. If this parameter is not specified, seeds are generated diagonally along the n-dimensional histogram.

Maximum Number of Iterations

Specifies the maximum number of iterations for calculating the cluster mean positions.

Movement Threshold

Specifies the movement threshold, expressed as a fraction of the cluster mean position (for example, 0.01 corresponds to 1 percent).

If the relative movement of every cluster mean is less than the value specified for this parameter, the clustering is considered to have converged. That is, execution terminates when the following condition holds for all cluster means:

  New cluster mean position - Old cluster mean position
  ----------------------------------------------------- < MOVETHRS
              Old cluster mean position
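
The condition above can be sketched in Python as follows. This is only an illustration, not the KCLUS implementation: the function name has_converged is hypothetical, and applying the test per channel, taking the absolute value of the movement, and guarding against a zero-valued old mean are all assumptions that the formula does not spell out.

```python
def has_converged(old_means, new_means, movethrs=0.01):
    """Return True when every cluster mean moved by less than movethrs.

    old_means and new_means are lists of cluster means, each mean being a
    list with one value per channel.
    """
    for old, new in zip(old_means, new_means):
        for o, n in zip(old, new):
            # Assumption: divide by 1.0 when the old mean is exactly zero.
            denom = abs(o) if o != 0 else 1.0
            if abs(n - o) / denom >= movethrs:
                return False
    return True


small_move = has_converged([[100.0, 50.0]], [[100.5, 50.2]])  # 0.5 % and 0.4 %
large_move = has_converged([[100.0, 50.0]], [[102.0, 50.0]])  # 2 % in channel 1
```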

Background Gray-Level Value

Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the specified value will be assigned class 0 (null class).

Number of Pixel Values to Sample

Specifies the number of samples to collect for the iterative clustering.

If this parameter is not specified, it defaults to 262144. If the specified value is larger than the total number of pixels in the image, all the pixels in the image will be used.

The time to compute each iteration is proportional to the number of samples used; using more than the default number of samples may significantly slow down the clustering process. Also, because all the samples are stored in memory, a large NSAM value increases memory requirements. With 262144 samples and five bands of 8-bit input data, approximately 1.3 MB of memory is required; with NSAM set to 2000000, approximately 10 MB is required.
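
The memory figures quoted above follow from multiplying the number of samples by the number of bands and the size of each value. The short Python sketch below reproduces that arithmetic; the function name sample_memory_bytes is hypothetical, and the estimate covers only the stored samples, not any other working memory KCLUS may use.

```python
def sample_memory_bytes(nsam, bands, bytes_per_value=1):
    """Approximate bytes needed to hold nsam samples of `bands` channels."""
    return nsam * bands * bytes_per_value


# Five bands of 8-bit (1-byte) data, as in the text above.
default_mb = sample_memory_bytes(262144, 5) / 1e6     # about 1.3 MB
large_mb = sample_memory_bytes(2_000_000, 5) / 1e6    # 10 MB
```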

Report

Specifies where to direct the generated report.

Available options are:


Details

KCLUS performs unsupervised clustering of image data using the K-means (Minimum Distance) method. Up to 16 image channels can be analyzed, and up to 255 clusters (classes) can be found. The output is a theme map written to an image channel.

Because of the large amount of memory that would otherwise be required, KCLUS samples a subset of the image data to generate a histogram when calculating the cluster means. The sampling rate depends on the size of the image. For example, for a 1024x1024 image, KCLUS samples every other pixel when calculating the cluster means. When writing results to an output channel, however, all pixels are classified.
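
The 1024x1024 example is consistent with the default sample count: 1024 x 1024 = 1048576 pixels, and taking every other pixel in both directions leaves 262144. The Python sketch below shows one way such a sampling step could be derived. It assumes a uniform grid with the same step along rows and columns, which the documentation does not confirm, and the function name sampling_stride is hypothetical.

```python
import math


def sampling_stride(width, height, nsam=262144):
    """Step between sampled pixels so that roughly nsam samples are taken."""
    total = width * height
    if nsam >= total:
        return 1  # small image: every pixel is used
    # Assumption: same step in x and y, so the sample count shrinks by step**2.
    return math.ceil(math.sqrt(total / nsam))


stride_1024 = sampling_stride(1024, 1024)  # every other pixel
stride_512 = sampling_stride(512, 512)     # every pixel
```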

The MASK parameter specifies the area within the input channel to be processed. Only the area under the mask is classified; the rest of the image is not processed.

If a mask is not specified, the entire image is processed.

It is quite common for satellite images to have large black-filled areas (gray level 0) that should not be included in the classification. To exclude them, first run THR with the minimum and maximum threshold values set to 1 and 255, respectively. This creates a bitmap mask that covers only the image area. You can then specify this bitmap for the MASK parameter.
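
To make the effect of that thresholding step concrete, the Python sketch below builds the same kind of mask for a small 8-bit band held as nested lists. It does not call THR; the function name threshold_mask is hypothetical and only illustrates which pixels end up with a mask value of 1.

```python
def threshold_mask(band, tmin=1, tmax=255):
    """Return a 0/1 mask: 1 where tmin <= gray level <= tmax, else 0."""
    return [[1 if tmin <= value <= tmax else 0 for value in row] for row in band]


band = [
    [0, 12, 255],
    [0, 0, 7],
]
mask = threshold_mask(band)  # black-filled (0) pixels are excluded
```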

You can specify the number of clusters desired through the NUMCLUS parameter, using any value between 1 and 255. The initial seed values can be entered in a text file specified by the SEEDFILE parameter. If no seed file is provided, the function will generate seeds diagonally along the n-dimensional histogram.
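
The documentation does not describe how the diagonal seeds are computed. One plausible reading is that the seeds are spaced evenly along the line from the minimum to the maximum gray level of each channel. The Python sketch below illustrates that interpretation only; the function name diagonal_seeds and the choice of interval midpoints are assumptions, not the KCLUS implementation.

```python
def diagonal_seeds(channel_min, channel_max, numclus):
    """Place numclus seeds evenly along the diagonal of the data range.

    channel_min and channel_max hold one value per channel.
    """
    seeds = []
    for k in range(numclus):
        # Assumption: seeds sit at the midpoints of numclus equal intervals.
        t = (k + 0.5) / numclus
        seeds.append([lo + t * (hi - lo) for lo, hi in zip(channel_min, channel_max)])
    return seeds


seeds = diagonal_seeds([0, 0], [100, 200], 2)
```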

The text file containing the initial seeds for 4 channels and 6 clusters has the following format:

      1   1   1   1            | 1st seed, channels 1,2,3,4
      5   3   5   9            | 2nd seed, channels 1,2,3,4
     40  43  20  10            | 3rd seed, channels 1,2,3,4
    100 101 140  50            | 4th seed, channels 1,2,3,4
    150 155 200 175            | 5th seed, channels 1,2,3,4
    240 200 195 140            | 6th seed, channels 1,2,3,4

In the example above, the numbers represent gray-level values. The values 5,3,5,9 represent the second seed gray-level values in channels 1, 2, 3, and 4 respectively.
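
A file in this layout can be read with a few lines of Python, for example to check a seed file before passing it to SEEDFILE. The sketch below is an illustration only: the function name parse_seed_text is hypothetical, and it assumes that anything after a "|" character is an annotation to be ignored, since the documentation does not state whether such annotations are allowed in the actual file.

```python
def parse_seed_text(text, n_channels):
    """Parse seed rows (one seed per line, one value per channel)."""
    seeds = []
    for line in text.splitlines():
        # Assumption: text after "|" is a comment and not part of the seed.
        values = line.split("|", 1)[0].split()
        if not values:
            continue  # skip blank lines
        if len(values) != n_channels:
            raise ValueError(f"expected {n_channels} values, got {len(values)}")
        seeds.append([float(v) for v in values])
    return seeds


example = """
      1   1   1   1            | 1st seed, channels 1,2,3,4
      5   3   5   9            | 2nd seed, channels 1,2,3,4
     40  43  20  10            | 3rd seed, channels 1,2,3,4
"""
parsed = parse_seed_text(example, 4)
```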

Define the maximum number of iterations allowed (MAXITER), and the movement threshold (MOVETHRS). The execution terminates when the number of iterations reaches MAXITER, or when the movement of all cluster means is less than MOVETHRS.
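
The interaction of the two stopping criteria can be sketched with a minimal K-means loop in plain Python. This is a generic illustration under stated assumptions, not the KCLUS code: the function names are hypothetical, distances are squared Euclidean, the movement test is applied per channel with a guard for zero-valued means, and an empty cluster simply keeps its previous mean.

```python
def nearest_cluster(pixel, means):
    """Index of the mean closest to pixel (squared Euclidean distance)."""
    return min(
        range(len(means)),
        key=lambda k: sum((p - m) ** 2 for p, m in zip(pixel, means[k])),
    )


def kmeans(samples, seeds, maxiter=20, movethrs=0.01):
    """Return (cluster means, iterations used)."""
    means = [list(seed) for seed in seeds]
    iteration = 0
    for iteration in range(1, maxiter + 1):
        sums = [[0.0] * len(means[0]) for _ in means]
        counts = [0] * len(means)
        for pixel in samples:  # assign each sample to its nearest mean
            k = nearest_cluster(pixel, means)
            counts[k] += 1
            for c, value in enumerate(pixel):
                sums[k][c] += value
        new_means = [
            [s / counts[k] for s in sums[k]] if counts[k] else means[k]
            for k in range(len(means))
        ]
        # Largest relative movement over all clusters and channels.
        moved = max(
            abs(n - o) / (abs(o) if o else 1.0)
            for old, new in zip(means, new_means)
            for o, n in zip(old, new)
        )
        means = new_means
        if moved < movethrs:  # converged before reaching maxiter
            break
    return means, iteration


samples = [[0, 0], [2, 2], [100, 100], [102, 102]]
means, used = kmeans(samples, seeds=[[0, 0], [100, 100]])
```

In this small example the means settle after the first update, so the loop stops on the movement test in the second iteration rather than running to the iteration limit.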

The result of the clustering is a theme map written to the specified image channel (DBOC). If the output channel is not specified, the results are not saved to a channel. A theme map encodes each cluster with a unique gray-level value equal to the cluster number; for example, cluster 1 is assigned a gray-level value of 1, and cluster 2 is assigned a gray-level value of 2. Gray level 0 represents unclassified pixels. If the theme map is later directed to the display, a pseudocolor table should be loaded so that each cluster is represented by a different color.

KCLUS allows you to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background value will be assigned class 0 (null class).
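
The theme-map encoding and the background rule can be sketched together in Python. The function names below are hypothetical, and the sketch assumes that a pixel is treated as background only when every channel equals the background value; the documentation does not say how multi-channel pixels are tested.

```python
def nearest_cluster(pixel, means):
    """Index of the mean closest to pixel (squared Euclidean distance)."""
    return min(
        range(len(means)),
        key=lambda k: sum((p - m) ** 2 for p, m in zip(pixel, means[k])),
    )


def theme_map(pixels, means, backval=None):
    """Label each pixel with its cluster number (1-based); 0 is the null class."""
    labels = []
    for pixel in pixels:
        # Assumption: background means all channels equal backval.
        if backval is not None and all(value == backval for value in pixel):
            labels.append(0)
        else:
            labels.append(nearest_cluster(pixel, means) + 1)
    return labels


cluster_means = [[1, 1], [100, 100]]
labels = theme_map([[0, 0], [1, 2], [99, 100]], cluster_means, backval=0)
```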

KCLUS generates a report of the current cluster mean values and sample counts after each iteration.


References

Julius T. Tou and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.

© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.