KCLUS

K-Means Clustering


Environments: PYTHON :: EASI :: MODELER

Description


Performs unsupervised clustering on image data using the K-means (Minimum Distance) method, for up to 255 clusters (classes) and 16 channels. The output is a theme map directed to an image channel.

Parameters


Name      Type     Caption                           Length   Value range                      Default
FILE *    String   Input file name                   1 - 192
DBIC *    Integer  Input raster channels             1 - 16   -1024 - 1024
DBOC      Integer  Output raster channel             0 - 1
MASK      Integer  Area mask                         0 - 4    Xoffset, Yoffset, Xsize, Ysize
NUMCLUS   Integer  Number of cluster centers         0 - 1    1 - 255                          16
SEEDFILE  String   Seed file for initial centers     0 - 192
MAXITER   Integer  Maximum number of iterations      0 - 1    1 - 10000                        20
MOVETHRS  Float    Movement threshold                0 - 1    0.0 - 1.0                        0.01
SIGGEN    String   Generate signatures               0 - 3    YES | NO                         NO
BACKVAL   Float    Background gray-level value       0 - 1    0 -
NSAM      Integer  Number of pixel values to sample  0 - 1    1 -                              262144
REPORT    String   Report mode                       0 - 192

* Required parameter

Parameter descriptions

FILE

Specifies the name of the PCIDSK image file on which to perform clustering.

DBIC

Specifies the input channels to use for the clustering operation.

Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.
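The range shorthand can be illustrated with a short Python sketch. The function name expand_channel_list is hypothetical and not part of KCLUS; the sketch only mirrors the expansion rule described above, where a negative value closes a range that starts at the preceding value.

```python
def expand_channel_list(values):
    """Expand negative entries into ranges, e.g. [1, -4, 10] -> [1, 2, 3, 4, 10].

    Hypothetical helper: it mimics the documented shorthand, not KCLUS internals.
    """
    expanded = []
    for value in values:
        if value < 0:
            if not expanded:
                raise ValueError("a range end needs a preceding start value")
            start, end = expanded[-1], -value
            if end <= start:
                raise ValueError(f"range end {end} must exceed start {start}")
            # Fill in every channel after the start, up to and including the end.
            expanded.extend(range(start + 1, end + 1))
        else:
            expanded.append(value)
    return expanded


print(expand_channel_list([1, -4, 10]))  # [1, 2, 3, 4, 10]
```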

DBOC

Specifies the output channel to receive the clustering results.

If this parameter is not specified, results will not be saved into a channel.

To generate a signature segment, this parameter must be specified.

The output channel can be the same as the input channel (DBIC). Only the area under the area mask (MASK) is written to the output.

MASK

Optionally specifies the window or bitmap that defines the area to be processed within the input raster. If this parameter is not specified, the entire channel is processed.

A window mask is specified as follows:
MASK=Xoffset, Yoffset, Xsize, Ysize

Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of lines that define the window height.

For a bitmap mask, you can specify the bitmap segment number from the input file that you want to use. All the pixels within the specified segment having a pixel value of 1 define the area to be processed.

Only the area under the mask is written to the output.

NUMCLUS

Specifies the number of clusters (classes) desired. The default value is 16.

SEEDFILE

Specifies the text file used to read in initial seeds. If this parameter is not specified, seeds are generated diagonally along the n-dimensional histogram.
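The exact seeding rule is not documented here. As a rough illustration only, one plausible reading of "diagonally" is that seeds are spaced evenly along the line joining the minimum and maximum gray levels of every channel. The function diagonal_seeds and the even spacing are assumptions, not the KCLUS implementation.

```python
def diagonal_seeds(channel_mins, channel_maxs, numclus):
    """Place numclus seeds evenly along the min-to-max diagonal of the data range.

    Assumption: even spacing at the centre of numclus equal intervals. KCLUS may
    use a different rule (for example, one weighted by the histogram counts).
    """
    seeds = []
    for k in range(numclus):
        t = (k + 0.5) / numclus  # fractional position along the diagonal
        seeds.append([lo + t * (hi - lo) for lo, hi in zip(channel_mins, channel_maxs)])
    return seeds


# Two channels with ranges 0-100 and 0-200, two clusters.
print(diagonal_seeds([0, 0], [100, 200], 2))  # [[25.0, 50.0], [75.0, 150.0]]
```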

MAXITER

Specifies the maximum number of iterations for calculating the cluster mean positions.

MOVETHRS

Specifies the movement threshold, expressed as a fraction of the cluster mean position (for example, 0.01 corresponds to 1%).

Clustering is considered to have converged when the movement of every cluster mean is less than the value specified for this parameter. That is, execution terminates when the following condition holds for all cluster means:

  New cluster mean position - Old cluster mean position
  ----------------------------------------------------- < MOVETHRS
              Old cluster mean position
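A minimal Python sketch of this test follows. It assumes that "position" means the mean vector across all channels and that movement is measured with the Euclidean norm; the document does not say whether KCLUS compares whole vectors or individual channels, and the names relative_movement and has_converged are hypothetical.

```python
import math


def relative_movement(old_mean, new_mean):
    """Return |new - old| / |old| for one cluster mean (Euclidean norms, assumed)."""
    shift = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_mean, new_mean)))
    old_norm = math.sqrt(sum(o ** 2 for o in old_mean))
    if old_norm == 0.0:
        # Avoid dividing by zero: an unmoved mean counts as converged.
        return 0.0 if shift == 0.0 else math.inf
    return shift / old_norm


def has_converged(old_means, new_means, movethrs=0.01):
    """True only when every cluster mean moved by less than movethrs."""
    return all(relative_movement(o, n) < movethrs for o, n in zip(old_means, new_means))


print(has_converged([[100.0, 100.0]], [[100.5, 100.0]]))  # True
print(has_converged([[100.0, 100.0]], [[110.0, 100.0]]))  # False
```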

SIGGEN

Specifies whether to generate the signature for each cluster. These signature segments can be input to MLC (Maximum Likelihood Classification) to classify other images.

Available options are:

YES: generate a signature for each cluster
NO: do not generate signatures (default)

If this parameter is set to YES, DBOC must be specified.

BACKVAL

Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the specified value will be assigned class 0 (null class).

NSAM

Specifies the number of samples to collect for the iterative clustering.

If this parameter is not specified, it defaults to 262144. If the specified value is larger than the total number of pixels in the image, all the pixels in the image will be used.

The time to compute each iteration is proportional to the number of samples used, so using more than the default number of samples may significantly slow down the clustering process. Also, because all the samples are stored in memory, a large NSAM value increases memory requirements. With 262144 samples and five bands of 8-bit input data, approximately 1.3 MB of memory is required; with NSAM set to 2000000, approximately 10 MB is required.
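The figures above follow from simple arithmetic, sketched here in Python. The sketch assumes one byte per 8-bit sample value per band and ignores any bookkeeping overhead; sample_memory_bytes is a hypothetical name.

```python
def sample_memory_bytes(nsam, channels, bytes_per_value=1):
    """Estimate the memory needed to hold nsam samples of `channels` bands.

    Assumption: 8-bit data takes one byte per value and nothing else is stored.
    """
    return nsam * channels * bytes_per_value


for nsam in (262144, 2000000):
    size = sample_memory_bytes(nsam, channels=5)
    print(f"NSAM={nsam}: {size} bytes (~{size / 1e6:.1f} MB)")
# NSAM=262144: 1310720 bytes (~1.3 MB)
# NSAM=2000000: 10000000 bytes (~10.0 MB)
```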

REPORT

Specifies where to direct the generated report.

Available options are:


Details

KCLUS performs unsupervised clustering of image data using the K-means (Minimum Distance) method. Up to 16 image channels can be analyzed, and up to 255 clusters (classes) can be found. The output is a theme map directed to an image channel.

Because of the large amount of memory that would otherwise be required, KCLUS samples a subset of the image data to generate a histogram when calculating the cluster means. The sampling rate depends on the amount of image data. For example, for a 1024x1024 image, KCLUS samples every other pixel when calculating the cluster means. When results are written to an output channel, however, all pixels are classified.
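As an illustration, the 1024x1024 case is consistent with the default NSAM of 262144 if "every other pixel" is read as every other pixel in both directions. The sketch below assumes regular grid subsampling, which this document does not confirm; sampling_step is a hypothetical name.

```python
import math


def sampling_step(width, height, nsam=262144):
    """Return a grid step so that roughly nsam pixels are sampled.

    Assumption: samples are taken on a regular grid with the same step in x
    and y. The actual KCLUS sampling scheme may differ.
    """
    total = width * height
    if nsam >= total:
        return 1  # the image is small enough to use every pixel
    return math.ceil(math.sqrt(total / nsam))


print(sampling_step(1024, 1024))  # 2
print(sampling_step(512, 512))    # 1
```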

The MASK parameter specifies the area within the input channel to be processed. Only the area under the mask is classified; the rest of the image is not processed.

If a single MASK value is specified, this refers to a bitmap segment that defines the area to be classified. When four values are specified, these define the x,y offsets and x,y dimensions of a rectangular window within the image to be classified.

If a mask is not specified, the entire image is processed.
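The three MASK forms can be summarized in a small Python sketch. The function interpret_mask and the dictionary it returns are hypothetical; the sketch only encodes the rule that no value means the full image, one value means a bitmap segment, and four values mean a window.

```python
def interpret_mask(mask):
    """Classify a MASK value list as full image, bitmap segment, or window.

    Hypothetical helper. Window bounds are returned as half-open pixel and
    line ranges (start, stop), which is a convention chosen for this sketch.
    """
    if not mask:
        return {"kind": "full"}
    if len(mask) == 1:
        return {"kind": "bitmap", "segment": mask[0]}
    if len(mask) == 4:
        xoffset, yoffset, xsize, ysize = mask
        return {
            "kind": "window",
            "columns": (xoffset, xoffset + xsize),
            "rows": (yoffset, yoffset + ysize),
        }
    raise ValueError("MASK takes 0, 1, or 4 values")


print(interpret_mask([100, 50, 256, 128]))
# {'kind': 'window', 'columns': (100, 356), 'rows': (50, 178)}
```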

Satellite images commonly contain large black-filled areas (with zero gray levels) that should not be included in the classification. To exclude them, first run THR with the minimum and maximum threshold values set to 1 and 255, respectively. This creates a bitmap mask that covers only the image area. You can then specify this bitmap for the MASK parameter.
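The effect of that thresholding step can be pictured with a tiny Python sketch. This is not the THR program; threshold_bitmap is a hypothetical stand-in that marks pixels whose gray level lies between 1 and 255 on a single 8-bit channel.

```python
def threshold_bitmap(image, tmin=1, tmax=255):
    """Return a bitmap with 1 where tmin <= value <= tmax and 0 elsewhere.

    Illustration only: it mimics the idea of thresholding one channel to
    separate image data from zero-filled background.
    """
    return [[1 if tmin <= value <= tmax else 0 for value in row] for row in image]


channel = [
    [0, 0, 12],
    [0, 87, 255],
]
print(threshold_bitmap(channel))  # [[0, 0, 1], [0, 1, 1]]
```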

You can specify the number of clusters desired through the NUMCLUS parameter, using any value between 1 and 255. The initial seed values can be entered in a text file specified by the SEEDFILE parameter. If no seed file is provided, the function will generate seeds diagonally along the n-dimensional histogram.

The text file containing the initial seeds for 4 channels and 6 clusters has the following format:

      1   1   1   1            | 1st seed, channels 1,2,3,4
      5   3   5   9            | 2nd seed, channels 1,2,3,4
     40  43  20  10            | 3rd seed, channels 1,2,3,4
    100 101 140  50            | 4th seed, channels 1,2,3,4
    150 155 200 175            | 5th seed, channels 1,2,3,4
    240 200 195 140            | 6th seed, channels 1,2,3,4

In the example above, the numbers represent gray-level values. The values 5,3,5,9 represent the second seed gray-level values in channels 1, 2, 3, and 4 respectively.
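A seed file in this layout could be read with a few lines of Python. The sketch assumes one seed per line with whitespace-separated values, and it treats anything after a "|" as annotation, since it is unclear whether the annotations shown above belong in the file. The name parse_seed_text is hypothetical.

```python
def parse_seed_text(text, n_channels):
    """Parse seed lines into a list of per-channel gray-level lists.

    Assumptions: one seed per line, whitespace-separated values, and text
    after '|' is an annotation to be ignored.
    """
    seeds = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        values = line.split("|", 1)[0].split()
        if not values:
            continue  # skip blank lines
        if len(values) != n_channels:
            raise ValueError(f"line {line_no}: expected {n_channels} values, got {len(values)}")
        seeds.append([float(v) for v in values])
    return seeds


SEED_TEXT = """\
  1   1   1   1
  5   3   5   9
 40  43  20  10
100 101 140  50
150 155 200 175
240 200 195 140
"""

seeds = parse_seed_text(SEED_TEXT, n_channels=4)
print(len(seeds), seeds[1])  # 6 [5.0, 3.0, 5.0, 9.0]
```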

Use MAXITER to set the maximum number of iterations allowed and MOVETHRS to set the movement threshold. Execution terminates when the number of iterations reaches MAXITER, or when the movement of all cluster means is less than MOVETHRS.
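The two stopping rules can be seen together in a textbook K-means loop, sketched below in plain Python. This is a generic illustration, not the KCLUS source: the Euclidean distance, the handling of empty clusters, and the names kmeans and sq_dist are all assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, seeds, maxiter=20, movethrs=0.01):
    """Generic K-means: stop at maxiter or when all means move less than movethrs."""
    means = [list(seed) for seed in seeds]
    iteration = 0
    for iteration in range(1, maxiter + 1):
        # Assignment step: attach each sample to its nearest mean.
        groups = [[] for _ in means]
        for sample in samples:
            k = min(range(len(means)), key=lambda i: sq_dist(sample, means[i]))
            groups[k].append(sample)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        new_means = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for old, members in zip(means, groups)
        ]
        # Relative movement of each mean (norm of the shift over norm of the old mean).
        origin = [0.0] * len(means[0])
        moved = max(
            math.sqrt(sq_dist(old, new)) / (math.sqrt(sq_dist(old, origin)) or 1.0)
            for old, new in zip(means, new_means)
        )
        means = new_means
        if moved < movethrs:
            break
    return means, iteration


samples = [(10, 10), (12, 10), (11, 12), (200, 200), (202, 198), (198, 202)]
means, iterations = kmeans(samples, seeds=[(0, 0), (255, 255)])
print(iterations, means[1])  # 2 [200.0, 200.0]
```

In this toy run the means stop moving after the second pass, so the loop exits on the movement test well before MAXITER is reached.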

The result of the clustering is a theme map directed to a specified image channel (DBOC). If the output channel is not specified, the results are not saved to a channel. A theme map encodes each cluster with a unique gray-level value equal to the cluster number; for example, cluster 1 is assigned a gray-level value of 1 and cluster 2 is assigned a gray-level value of 2. Gray level 0 represents unclassified pixels. If the theme map is later directed to the display, a pseudocolor table should be loaded so that each cluster is represented by a different color.
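The encoding can be illustrated with a small Python sketch that labels each pixel with the number of its nearest cluster mean. The minimum-distance rule matches the method named in this document, but the function names and the use of squared Euclidean distance are assumptions made for the example.

```python
def nearest_cluster(pixel, means):
    """Return the 0-based index of the closest cluster mean."""
    return min(
        range(len(means)),
        key=lambda k: sum((p - m) ** 2 for p, m in zip(pixel, means[k])),
    )


def theme_map(image, means):
    """Label every pixel with its cluster number 1..N; 0 stays reserved."""
    return [[nearest_cluster(pixel, means) + 1 for pixel in row] for row in image]


# Two clusters in a two-channel image of 2 x 2 pixels (made-up values).
means = [(56.0, 20.0), (120.0, 57.0)]
image = [
    [(55, 19), (118, 60)],
    [(60, 25), (130, 50)],
]
print(theme_map(image, means))  # [[1, 2], [1, 2]]
```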

KCLUS provides the option of generating signatures for each cluster through the SIGGEN parameter. If SIGGEN is YES, a signature is generated for each cluster. These signatures can then be used with MLC (Maximum Likelihood Classification) to classify other images.

KCLUS allows you to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background value will be assigned class 0 (null class).

KCLUS generates a report of the current cluster mean values and sample counts after each iteration.


Example

Find five clusters from five channels of the file 'irvine.pix'.

EASI>file     = "irvine.pix"
EASI>dbic     = 1,2,3,4,5     ! input channels
EASI>dboc     = 9             ! output channel
EASI>mask     =               ! process entire image
EASI>numclus  = 5             ! requested number of clusters
EASI>seedfile =               ! automatically generate seeds
EASI>maxiter  = 20            ! no more than 20 iterations
EASI>movethrs = 0.01          ! movement threshold
EASI>siggen   =               ! do not generate signatures
EASI>backval  =               ! no background value
EASI>nsam     =               ! default number of sample points
EASI>report   = "term"

EASI>RUN KCLUS

The following shows the output of the example:

Iteration : 1

 No. of Clusters :  5

 Cluster   Samples   Mean Position :

 (  1)     73507       55.81042    19.86229    19.91672    29.82999
                       13.69804
 (  2)    187513       67.61027    27.55517    32.50297    43.59904
                       30.56694
 (  3)      1093      120.00732    57.25526    78.13449    66.09241
                       58.46386
 (  4)        28      207.92857   103.00000   144.28572   115.46429
                       71.10714
 (  5)         3      238.33333   119.00000   163.66667   129.00000
                       71.66666
 Iteration : ....
 Iteration : ....

Iteration : 20

 No. of Clusters :  5

 Cluster   Samples   Mean Position :

 (  1)     71512       56.03448    19.94805    20.10361    28.69195
                       13.90687
 (  2)    113316       62.51694    24.58107    27.36789    43.10883
                       25.68433
 (  3)     65801       72.15125    30.23051    37.38433    44.27776
                       36.01833
 (  4)      9958       89.60956    40.50070    53.23549    49.50381
                       44.58837
 (  5)      1557      119.91137    56.73282    76.92101    64.70135
                       55.04303

 __________________________________________________________________

 FINAL RESULTS :

 NO. OF CLUSTERS :  5

 CLUSTER   PIXELS     MEAN POSITION :
 (  1)     70108       55.98217    19.90998    20.04543    28.51610
                       13.80305
 (  2)    111994       62.36456    24.47869    27.19808    42.98307
                       25.44976
 (  3)     66732       71.64445    29.95717    36.93435    44.30622
                       35.79205
 (  4)     11424       88.10084    39.54902    51.73206    48.56434
                       43.33421
 (  5)      1886      117.32185    55.44327    75.09173    63.34253
                       54.74974
        ________
          262144

References

Julius T. Tou and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.

© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.