NGCLUS2

Multi-bit Narendra-Goldberg clustering


Environments: PYTHON :: EASI :: MODELER

Description


Produces non-parametric multi-dimensional clustering, based on the algorithm developed by Narendra and Goldberg. Up to 16 multi-bit input channels can be used. The output is a theme map directed to an image channel.

Note: NGCLUS2 can take a long time (20-25 minutes) to write results to the database after module execution has completed.

Parameters


ngclus2(file, dbic, dboc, mask, clthrs, samprm, smooth, siggen, res)

Name     Type        Caption                         Length   Value range
FILE*    str         Input file name                 1 -
DBIC*    List[int]   Input raster channel(s)         1 - 16   -16 -
DBOC     List[int]   Output raster channel           0 - 1
MASK     List[int]   Area mask                       0 - 4    Xoffset, Yoffset, Xsize, Ysize
CLTHRS   List[int]   Cluster neighbor threshold      0 - 1    0 - 255
SAMPRM   List[int]   Minimum sample threshold        0 - 1    0 -
                                                              Default: 5
SMOOTH   List[int]   Histogram smoothing threshold   0 - 1    1 -
                                                              Default: 5
SIGGEN   str         Generate signatures             0 - 3    YES | NO
                                                              Default: NO
RES      List[int]   Resolution                      0 - 1    0 -
                                                              Default: 1

* Required parameter

Parameter descriptions

FILE

Specifies the name of the PCIDSK image file for which a histogram file will be generated.

DBIC

Specifies the input image channels for which a histogram file will be generated.

Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.

DBOC

Specifies the output channel to receive the clustering results. If this parameter is not specified, the clustering results will not be saved.

The output channel can be the same as the input channel.

If a mask is specified, only the area under the mask is processed.

MASK

Specifies the window or bitmap that defines the area to be processed within the input raster.

If a single value is specified, that value represents the number of the bitmap segment in the input file. Only the pixels under the bitmap are processed; the rest of the image remains unchanged.

If four values are specified, they define the x,y offsets and x,y dimensions of a rectangular window identifying the area to process. Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of lines that define the window height.

If no value is specified, the entire channel is processed.
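
As a minimal sketch of how the four-value form selects pixels, the following pure-Python snippet crops a rectangular window from a small raster stored as a list of rows. The helper name window and the sample raster are hypothetical, and zero-based offsets are assumed for illustration; this is not PCI code.

```python
def window(raster, xoffset, yoffset, xsize, ysize):
    """Return the ysize-by-xsize sub-raster whose upper-left pixel is (xoffset, yoffset)."""
    return [row[xoffset:xoffset + xsize] for row in raster[yoffset:yoffset + ysize]]

# Hypothetical 4-line by 5-pixel raster; each value encodes its (line, pixel) position.
raster = [[10 * y + x for x in range(5)] for y in range(4)]

# Analogous to mask = [1, 2, 3, 2]: start at pixel 1, line 2; 3 pixels wide, 2 lines high.
sub = window(raster, 1, 2, 3, 2)
print(sub)  # [[21, 22, 23], [31, 32, 33]]
```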

CLTHRS

Specifies a cluster threshold distance in gray levels.

Two vectors are considered neighbors when the difference between the vectors in each channel is less than the value specified in this parameter. The default is the difference between the maximum and minimum gray-level values in all channels, divided by 64.
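
The neighbor rule and the default can be sketched as follows. The function names are hypothetical, and whether NGCLUS2 rounds the default before using it is not stated here, so the sketch leaves it as a plain quotient.

```python
def default_clthrs(gray_min, gray_max):
    """Default threshold as described above: (max - min) over all channels, divided by 64."""
    return (gray_max - gray_min) / 64

def are_neighbors(u, v, clthrs):
    """True when the per-channel difference is less than clthrs in every channel."""
    return all(abs(a - b) < clthrs for a, b in zip(u, v))

clthrs = default_clthrs(0, 255)  # 255 / 64 = 3.984375

# Per-channel differences are 2, 3, 1: all below the threshold.
print(are_neighbors((100, 50, 20), (102, 53, 19), clthrs))  # True
# Per-channel differences are 2, 4, 1: the second channel exceeds the threshold.
print(are_neighbors((100, 50, 20), (102, 54, 19), clthrs))  # False
```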

SAMPRM

Specifies the minimum number of samples allowed in a cluster, allowing you to eliminate clusters with very few samples.

If the number of samples in a cluster is less than the value specified in this parameter, each sample inside the cluster will be merged into a neighboring cluster.

SMOOTH

Specifies the histogram smoothing threshold.

If the histogram value of a vector is less than the value specified in this parameter, the value will be replaced by the average histogram value of the vector and its neighbors.
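
A minimal one-dimensional sketch of this rule is shown below. It assumes that the neighbors of a bin are the immediately adjacent bins and that averages are taken from the unsmoothed counts; the neighborhood that NGCLUS2 actually uses in a multi-channel histogram is not specified here, and the function name is hypothetical.

```python
def adaptive_smooth(hist, smooth):
    """Replace each count below `smooth` with the mean of itself and its adjacent bins."""
    out = list(hist)
    for i, count in enumerate(hist):
        if count < smooth:
            neighborhood = hist[max(i - 1, 0):i + 2]  # the bin plus its left/right neighbors
            out[i] = sum(neighborhood) / len(neighborhood)
    return out

hist = [0, 9, 2, 40, 38, 1, 0]
smoothed = adaptive_smooth(hist, 5)
print(smoothed)  # [4.5, 9, 17.0, 40, 38, 13.0, 0.5]
```

Counts of 5 or more (9, 40, 38) are left untouched, so only the sparse parts of the histogram change.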

SIGGEN

Specifies whether to generate a signature for each cluster. The signatures can be used as input to MLC (Maximum Likelihood Classification) to classify other images.

Available options are:

YES: generate a signature for each cluster
NO: do not generate signatures (default)

A maximum of 1000 signatures can be created; signatures are not generated for class values greater than 1000.

RES

Specifies the scaling power used to scale down the gray-level value of each pixel during histogram generation.

Each pixel's gray-level value is divided by 2 to the power of Res. For example, if the input gray-level value is 256 and Res is set to 2, the resulting gray-level value will be 64.
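
The arithmetic can be checked with a short sketch. The function name is hypothetical, and the use of integer division (discarding any remainder) is an assumption.

```python
def scale_gray_level(value, res):
    """Divide a gray level by 2**res, as described for the RES parameter."""
    return value // (2 ** res)  # integer division is an assumption

print(scale_gray_level(256, 2))  # 64
print(scale_gray_level(256, 1))  # 128
```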


Details

NGCLUS2 is a generalized version of NGCLUS, which implements the Narendra and Goldberg algorithm. NGCLUS uses a hash table, which limits it to four 8-bit input channels. NGCLUS2 does not use a hash table, so it accepts multi-bit data and more input channels, at the expense of computation time. It is intended mainly for users who need Narendra and Goldberg's algorithm for more than four channels or for multi-bit data. A maximum of 1000 classes can be generated.

NGCLUS2 provides an alternative for unsupervised classification. It is based on the algorithm developed by P.M. Narendra and M. Goldberg. The clustering algorithm operates on the histogram and isolates the vectors into clusters that are unimodal in the histogram, with the boundaries between clusters running through the valleys in the histogram. This is a reasonable way to characterize the clusters, which can be of any shape. The number of clusters need not be specified a priori, and the algorithm is noniterative.
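
The idea of unimodal clusters separated by valleys can be illustrated with a heavily simplified one-dimensional sketch: each nonempty bin steps to the highest-count bin within reach, and all bins that climb to the same peak form one cluster. This is only an illustration of the principle, not the NGCLUS2 implementation and not a faithful reproduction of the published algorithm, which works on multi-dimensional vectors. The function name, the reach parameter, and the tie-breaking rule are assumptions.

```python
def cluster_histogram(hist, reach=1):
    """Label each nonempty bin of a 1-D histogram with the peak it climbs to."""
    n = len(hist)
    parent = list(range(n))
    for i in range(n):
        if hist[i] == 0:
            continue
        lo, hi = max(i - reach, 0), min(i + reach, n - 1)
        # Highest-count bin within reach; ties go to the lowest index.
        best = max(range(lo, hi + 1), key=lambda j: (hist[j], -j))
        if best != i:
            parent[i] = best  # step uphill (or sideways along a plateau)
    labels, peaks = [0] * n, {}
    for i in range(n):
        if hist[i] == 0:
            continue  # empty bins keep label 0
        j = i
        while parent[j] != j:  # follow the links up to the peak
            j = parent[j]
        labels[i] = peaks.setdefault(j, len(peaks) + 1)
    return labels

hist = [1, 4, 9, 3, 2, 7, 7, 2]
labels = cluster_histogram(hist)
print(labels)  # [1, 1, 1, 1, 2, 2, 2, 2]
```

The boundary between the two clusters falls between the bins with counts 3 and 2, that is, in the valley between the two modes. No cluster count is supplied, and each bin is visited a fixed number of times.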

Histogram generation is the first step in the histogram clustering procedure. Histogram clustering uses one of several non-parametric histogram-based algorithms for unsupervised classification of image data.

NGCLUS2 creates a new histogram based on image data stored in up to 16 image channels (DBIC) in a specified input file.

The SMOOTH (Smoothing) parameter can be used to smooth the histogram by averaging the histogram values over the neighborhood of each vector. The histogram value of each vector is replaced by the new smoothed value. The smoothing is adaptive: only vectors with histogram values less than SMOOTH are smoothed, so the histogram tends to be smoothed only over the low-density areas prone to noise.

The MASK parameter specifies the area within the input channel that will be processed. Only the area under the mask will be classified; the rest of the image will not be processed.

If a single MASK value is specified, this value refers to a bitmap segment, which defines the area to be classified. If four values are specified, these define the x,y offsets and x,y dimensions of a rectangular window within the image to be classified.

If no mask is specified, the entire image is processed by default.

It is common for satellite images to have large black-filled areas (with zero gray levels) that should not be included in the classification. To exclude them, first run THR with the TVAL (Threshold Value) minimum and maximum values set to 1 and 255, respectively. This creates a bitmap mask that covers only the image area. You can then use this bitmap as input to the MASK parameter.
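
The effect of such a threshold can be sketched in pure Python. This snippet does not call THR; the function name threshold_mask and the sample channel are hypothetical and only mimic the idea of marking pixels whose gray level lies between 1 and 255.

```python
def threshold_mask(channel, tmin=1, tmax=255):
    """Return a bitmap (1 = inside [tmin, tmax], 0 = outside) for a raster given as rows."""
    return [[1 if tmin <= v <= tmax else 0 for v in row] for row in channel]

# Hypothetical 8-bit channel with black fill (0) around the image data.
channel = [
    [0, 0, 12, 200],
    [0, 37, 255, 0],
]
bitmap = threshold_mask(channel)
print(bitmap)  # [[0, 0, 1, 1], [0, 1, 1, 0]]
```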

The result of the clustering is a theme map directed to a specified output image channel (DBOC). A theme map encodes each cluster with a unique gray level. For example, cluster 1 is assigned gray level 1, and cluster 2 is assigned gray level 2. Gray level 0 represents unclassified pixels. Therefore, if the theme map is later directed to the display, a pseudocolor table should be loaded so that each cluster is represented by a different color. If no output image channel is specified, the clustering results will not be saved.

NGCLUS2 generates a report of the total number of clusters and samples.
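
Because the theme map stores the cluster number as the gray level, the same totals can be recomputed from the output channel. The sketch below assumes the theme map has already been read into a list of rows; the function name and the sample data are hypothetical.

```python
from collections import Counter

def cluster_report(theme_map):
    """Count samples per cluster; gray level 0 is reported separately as unclassified."""
    counts = Counter(v for row in theme_map for v in row)
    unclassified = counts.pop(0, 0)
    return dict(sorted(counts.items())), unclassified

# Hypothetical 2-line theme map with three clusters and one unclassified pixel.
theme_map = [
    [1, 1, 2, 0],
    [3, 2, 2, 1],
]
clusters, unclassified = cluster_report(theme_map)
print(len(clusters), clusters, unclassified)  # 3 {1: 3, 2: 3, 3: 1} 1
```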

For more information about Narendra & Goldberg's algorithm, see NGCLUS and the paper cited in References.


Example

This example generates clusters based on four input channels of the demo file 'irvine.pix'.

from pci.ngclus2 import ngclus2

file = "IRVINE.PIX"
dbic = [1, 2, 3, 4]   # input channels
dboc = [7]            # output channel
mask = []             # process entire image
clthrs = [1]          # neighbor threshold: per-channel difference must be less than 1
samprm = [10]         # at least 10 samples per cluster
smooth = [3]          # smooth histogram values less than 3
siggen = "NO"         # no signature generation
res = [1]             # divide each gray level by 2

ngclus2(file, dbic, dboc, mask, clthrs, samprm, smooth, siggen, res)

This example produces the following sample report.

RESULTS
-------
 
 Final Results :
 No. of Clusters : 72
 Cluster   Samples  :
 (  1)      6259
 (  2)       973
 (  3)      2603
 (  4)      9016
 (  5)     16281
 (  6)     26444
 (  7)        39
 (  8)    102557
 (  9)        19
 ( 10)       690
 ( 11)      1198
 ( 12)     12435
 ( 13)      3213
 ( 14)       786
 ( 15)     24540
 ( 16)       771
 ( 17)     20122
 ( 18)       259
 ( 19)      3268
 ( 20)       198
 ( 21)       980
 ( 22)       194
 ( 23)      2179
 ( 24)      2189
 ( 25)      1249
 ( 26)       924
 ( 27)       595
 ( 28)       254
 ( 29)       834
 ( 30)       294
 ( 31)       182
 ( 32)       606
 ( 33)      2854
 ( 34)      1744
 ( 35)       233
 ( 36)      1606
 ( 37)      1412
 ( 38)       157
 ( 39)       349
 ( 40)        23
 ( 41)        58
 ( 42)      3909
 ( 43)        23
 ( 44)        36
 ( 45)      1052
 ( 46)        53
 ( 47)       408
 ( 48)      1146
 ( 49)       401
 ( 50)        41
 ( 51)       111
 ( 52)        35
 ( 53)       586
 ( 54)        69
 ( 55)       164
 ( 56)        75
 ( 57)       372
 ( 58)       125
 ( 59)        69
 ( 60)       259
 ( 61)       369
 ( 62)       556
 ( 63)       200
 ( 64)       101
 ( 65)       184
 ( 66)        50
 ( 67)       312
 ( 68)       360
 ( 69)       166
 ( 70)       192
 ( 71)        54
 ( 72)        79
        --------
          262144

 Unclassified :          0

References

Narendra, P.M., and M. Goldberg. 1977. "A Non-parametric clustering scheme for Landsat". Pattern Recognition, vol. 9, pp. 207-215.

© PCI Geomatics Enterprises, Inc.®, 2024. All rights reserved.