FUZCLUS

Description

FUZCLUS performs unsupervised clustering of image data using the Fuzzy K-means method, for up to 255 clusters (classes) and 16 channels. The output is a theme map written to a database image channel.

Parameters

fuzclus(file, dbic, dboc, mask, numclus, seedfile, maxiter, movethrs, siggen, backval, nsam)

Name	Type	Caption	Length	Value range
FILE *	str	Input file name	1 -
DBIC *	List[int]	Input raster channel(s)	1 - 16
DBOC	List[int]	Output raster channel	0 - 1
MASK	List[int]	Area mask	0 - 4
NUMCLUS	List[int]	Number of cluster centers	0 - 1	1 - 255 Default: 16
SEEDFILE	str	Seed file for initial centers	0 -
MAXITER	List[int]	Maximum number of iterations	0 - 1	0 - 10000 Default: 20
MOVETHRS	List[float]	Movement threshold	0 - 1	0.0 - 1.0 Default: 0.01
SIGGEN	str	Generate Signatures	0 - 3	YES | NO Default: NO
BACKVAL	List[float]	Background gray-level value	0 - 1	0.0 -
NSAM	List[int]	Number of pixel values to sample	0 - 1	0 - Default: 262144

* Required parameter

Parameter descriptions

FILE

Specifies the name of the PCIDSK image file for which to perform clustering.

DBIC

Specifies the image channels that will be used to perform clustering.

Up to 16 input channels may be specified. Input channels can be any combination of 8-bit, 16-bit, and 32-bit real channels. Duplicate channels are not allowed.

DBOC

Specifies the output channel to receive the clustering results.

If no value is specified, results will not be saved to a channel.

For signature generation (SIGGEN), DBOC must be specified.

DBOC may be the same as one of the input channels (DBIC); that channel is then overwritten with the clustering results. If a MASK is specified, only the area under the mask is written to the output channel.

MASK

Specifies the window or bitmap that defines the area of the input raster to be processed.

If a single value is specified, that value represents the segment number of the bitmap segment in the input file. Only the pixels under the bitmap are processed; the rest of the image remains unchanged.

If four values are specified, they define the X,Y offsets and X,Y dimensions of a rectangular window identifying the area to process. Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of pixels that define the window height.
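To make the window convention concrete, the following NumPy sketch extracts the pixels covered by a four-value MASK. It assumes the image is held in memory as an array indexed (row, column, channel), that Y selects rows and X selects columns, and that offsets are 0-based from the upper-left pixel; window_pixels is a hypothetical helper, not part of the PCI API.

```python
import numpy as np


def window_pixels(image, xoff, yoff, xsize, ysize):
    """Return the pixels inside a rectangular MASK window.

    Assumed conventions (illustrative only): image is (rows, cols, channels),
    Y indexes rows, X indexes columns, offsets are 0-based from the upper left.
    """
    return image[yoff:yoff + ysize, xoff:xoff + xsize, :]


img = np.arange(6 * 8 * 2).reshape(6, 8, 2)   # 6 rows, 8 columns, 2 channels
win = window_pixels(img, xoff=2, yoff=1, xsize=4, ysize=3)
print(win.shape)  # (3, 4, 2)
```

Note that Xsize maps to the number of columns and Ysize to the number of rows, so the array shape lists Ysize first.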

If no value is specified, the entire channel is processed.

NUMCLUS

Specifies the number of clusters (classes) to find. Up to 255 clusters may be specified; the default value is 16.

SEEDFILE

Specifies the text file from which to read initial seeds. If no filename is given, seeds will be generated diagonally along the n-dimensional histogram.
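How the diagonal seeds are placed is not specified here. One plausible reading, sketched below purely as an assumption, spaces NUMCLUS seeds evenly along the line from the per-channel minimum to the per-channel maximum of the sampled data; diagonal_seeds is a hypothetical helper, not the seeding routine FUZCLUS actually uses.

```python
import numpy as np


def diagonal_seeds(samples, numclus):
    """Place numclus seeds evenly along the feature-space diagonal.

    Hypothetical interpretation: the diagonal runs from the per-channel
    minimum to the per-channel maximum of samples, an (N, channels) array.
    """
    lo = samples.min(axis=0)
    hi = samples.max(axis=0)
    # Fractions along the diagonal; a single seed goes to the midpoint.
    t = np.array([0.5]) if numclus == 1 else np.linspace(0.0, 1.0, numclus)
    return lo + t[:, None] * (hi - lo)


samples = np.array([[0.0, 10.0], [100.0, 50.0], [40.0, 20.0]])
seeds = diagonal_seeds(samples, 3)
print(seeds.tolist())  # [[0.0, 10.0], [50.0, 30.0], [100.0, 50.0]]
```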

MAXITER

Specifies the maximum number of iterations in calculating the cluster mean positions. The default maximum is 20 iterations.

MOVETHRS

Specifies the movement threshold for the relative change in cluster centroids.

If every cluster centroid moves by less than MOVETHRS between iterations, the clustering is considered to have converged and iteration stops.

The default movement threshold is 0.01.
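A convergence test of this kind might look like the sketch below. The exact definition of "relative change" is not given, so the sketch assumes the Euclidean distance a centroid moved divided by the Euclidean norm of its previous position; has_converged is a hypothetical name.

```python
import numpy as np


def has_converged(old_centroids, new_centroids, movethrs=0.01):
    """True if every centroid's relative movement is below movethrs.

    Assumption: relative movement of centroid i is
    ||V_new_i - V_old_i|| / ||V_old_i|| (Euclidean norms).
    """
    old = np.asarray(old_centroids, dtype=float)
    new = np.asarray(new_centroids, dtype=float)
    shift = np.linalg.norm(new - old, axis=1)
    scale = np.maximum(np.linalg.norm(old, axis=1), 1e-12)  # avoid dividing by zero
    return bool(np.all(shift / scale < movethrs))


print(has_converged([[100.0, 100.0]], [[100.5, 100.0]]))  # True
print(has_converged([[10.0, 0.0]], [[11.0, 0.0]]))        # False
```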

SIGGEN

Specifies whether to generate a signature segment for each cluster.

Signatures may be used as input for MLC (Maximum Likelihood Classification) to classify other images.

If this parameter is set to 'YES', DBOC must also be specified.

BACKVAL

Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the given gray-level value will be assigned class 0 (null class).

NSAM

Specifies the number of samples to collect to perform the iterative clustering. The default value is 262144. If the specified number is larger than the total number of pixels in the image, all the pixels in the image will be used.

The time required for each iteration is proportional to the number of samples, so using many more samples than the default can significantly increase processing time. Because all the samples are stored in memory, a high NSAM value may also require a large amount of memory. For example, with 262144 samples and five bands of 8-bit input data (one byte per value), FUZCLUS would require approximately 1.3 MB of memory (262144 × 5 bytes); with NSAM set to 2000000, it would need about 10 MB.
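The memory figures above follow from storing one value per channel for each sample. The sketch below reproduces them, assuming one byte per 8-bit value and ignoring any other working memory; sample_memory_bytes is a hypothetical helper.

```python
def sample_memory_bytes(nsam, nchannels, bytes_per_value=1):
    """Approximate size of the sample buffer: one value per channel per sample.

    Assumes values are stored at their native size (1 byte for 8-bit data)
    and ignores cluster statistics and other bookkeeping.
    """
    return nsam * nchannels * bytes_per_value


print(sample_memory_bytes(262144, 5) / 1e6)   # 1.31072 (about 1.3 MB)
print(sample_memory_bytes(2000000, 5) / 1e6)  # 10.0 (about 10 MB)
```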

Details

FUZCLUS performs unsupervised clustering using the Fuzzy K-means method to classify image data into different clusters. Up to 16 image channels can be analyzed, and up to 255 clusters (classes) can be found.

FUZCLUS reads image data from the file specified by the FILE parameter. Input channels are specified using the DBIC (Input raster channels) parameter.

The MASK parameter specifies the area within the input channel to process. Only the area under the mask is read; the rest of the image is not used.

If a single MASK value is specified, this value points to a bitmap segment, which defines the area to be processed. When four MASK values are specified, these values define the X,Y offsets and X,Y dimensions of a rectangular window within the image to process.

If MASK is not specified, the entire image is sampled.

Satellite images commonly contain black-filled areas (gray level 0) that should not be included in the classification. To exclude them, first run THR with the TVAL minimum and maximum values set to 1 and 255, respectively. This creates a bitmap that covers only the valid (non-zero) image area, which can then be used as the Area Mask (MASK) for FUZCLUS.
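Conceptually, the bitmap produced by THR with TVAL = 1, 255 marks every pixel whose gray level lies in that range. The NumPy sketch below illustrates the idea for a single 8-bit channel; it is not the THR interface, and valid_area_mask is a hypothetical name.

```python
import numpy as np


def valid_area_mask(channel, tmin=1, tmax=255):
    """Boolean mask of pixels whose gray level lies in [tmin, tmax].

    Mirrors the idea of THR with TVAL = 1, 255 on an 8-bit channel:
    zero-valued (black fill) pixels fall outside the mask.
    """
    return (channel >= tmin) & (channel <= tmax)


band = np.array([[0, 0, 12], [40, 0, 255]], dtype=np.uint8)
mask = valid_area_mask(band)
print(mask.sum())  # 3
```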

Users may specify the desired number of clusters (NUMCLUS); acceptable values are between 1 and 255. The initial seed values can be entered in a text file and specified using the SEEDFILE parameter. If no file name is specified for the Seed File (SEEDFILE) parameter, seeds will be generated diagonally along the n-dimensional histogram.

A seed file containing the initial seeds for 4 channels and 6 clusters would have the following format:

        1   1   1   1            | 1st seed, channels 1,2,3,4
        5   3   5   9            | 2nd seed, channels 1,2,3,4
       40  43  20  10            | 3rd seed, channels 1,2,3,4
      100 101 140  50            | 4th seed, channels 1,2,3,4
      150 155 200 175            | 5th seed, channels 1,2,3,4
      240 200 195 140            | 6th seed, channels 1,2,3,4

In the example above, the numbers represent gray-level values. The values 5,3,5,9 represent the second seed gray-level values in channels 1, 2, 3, and 4, respectively.
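A reader for this layout could be sketched as follows. It assumes one seed per line with one gray level per channel, and treats any text after a "|" as an annotation; whether the actual SEEDFILE parser accepts such annotations is not stated, so read_seeds is a hypothetical helper.

```python
import io


def read_seeds(stream, nchannels):
    """Read initial seeds: one line per seed, one gray level per channel.

    Assumption: text after '|' is an annotation and blank lines are skipped;
    the real SEEDFILE parser may be stricter.
    """
    seeds = []
    for line in stream:
        values = line.split("|", 1)[0].split()
        if not values:
            continue
        if len(values) != nchannels:
            raise ValueError(f"expected {nchannels} values, got {len(values)}: {line!r}")
        seeds.append([float(v) for v in values])
    return seeds


text = """  1   1   1   1   | 1st seed, channels 1,2,3,4
  5   3   5   9   | 2nd seed, channels 1,2,3,4
"""
seeds = read_seeds(io.StringIO(text), nchannels=4)
print(seeds[1])  # [5.0, 3.0, 5.0, 9.0]
```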

The maximum number of iterations is set by the Maximum Iterations (MAXITER) parameter, and the threshold used to test for convergence by the Movement Threshold (MOVETHRS) parameter.

The result of the clustering is a theme map directed to a specified output image channel (DBOC). If an output channel is not specified, results will not be saved to a channel.

A theme map encodes each cluster with a unique gray level equal to the cluster number; for example, cluster 1 is assigned gray level 1, and cluster 2 is assigned gray level 2. Gray level 0 represents unclassified pixels. Because consecutive cluster numbers have nearly identical gray levels, a pseudocolor table should be loaded when the theme map is displayed, so that each cluster is shown in a different color.

FUZCLUS allows the user to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background gray-level value will be assigned class 0 (null class).
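The theme-map encoding and the BACKVAL rule can be illustrated with a short NumPy sketch. It assumes 0-based cluster labels from the clustering step and treats a pixel as background when every input channel equals BACKVAL (whether one channel or all channels are tested is not stated). Background pixels would also be left out of the clustering samples; the sketch covers only the final encoding, and theme_map is a hypothetical name.

```python
import numpy as np


def theme_map(labels, pixels, backval=None):
    """Encode 0-based cluster labels as theme-map gray levels.

    Cluster k (0-based) becomes gray level k + 1. If backval is given,
    pixels equal to backval in every channel become 0 (null class).
    labels: (rows, cols); pixels: (rows, cols, channels).
    """
    out = labels.astype(np.uint8) + 1
    if backval is not None:
        out[np.all(pixels == backval, axis=-1)] = 0
    return out


labels = np.array([[0, 1], [2, 0]])
pixels = np.array([[[0, 0], [5, 6]], [[7, 8], [9, 9]]])
print(theme_map(labels, pixels, backval=0).tolist())  # [[0, 2], [3, 1]]
```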

FUZCLUS generates a report of the current cluster mean values and sample counts after each iteration.

Example

The following example finds 5 clusters from 5 channels of the file 'irvine.pix'.

from pci.fuzclus import fuzclus

file     = "irvine.pix"
dbic     = [1, 2, 3, 4, 5]   # input channels
dboc     = [9]               # output channel
mask     = []                # no mask: process the entire image
numclus  = [5]               # requested number of clusters
seedfile = ""                # automatically generate seeds
maxiter  = [20]              # no more than 20 iterations
movethrs = [0.01]            # movement threshold for convergence
siggen   = ""                # do not generate signatures
backval  = []                # no background value to be ignored
nsam     = []                # use the default of 262144 samples

fuzclus(file, dbic, dboc, mask, numclus, seedfile, maxiter, movethrs, siggen, backval, nsam)

The example produces the following output:

   Iteration :  1
  
   No. of Clusters :  5
  
   Cluster   Samples   Mean Position :
  
   (  1)     73507       55.81042    19.86229    19.91672    29.82999
                         13.69804
   (  2)    187513       67.61027    27.55517    32.50297    43.59904
                         30.56694
   (  3)      1093      120.00732    57.25526    78.13449    66.09241
                         58.46386
   (  4)        28      207.92857   103.00000   144.28572   115.46429
                         71.10714
   (  5)         3      238.33333   119.00000   163.66667   129.00000
                         71.66666
   Iteration : ....
   Iteration : ....
  
  Iteration : 20 
  
   No. of Clusters :  5
  
   Cluster   Samples   Mean Position :
  
   (  1)     71512       56.03448    19.94805    20.10361    28.69195
                         13.90687
   (  2)    113316       62.51694    24.58107    27.36789    43.10883
                         25.68433
   (  3)     65801       72.15125    30.23051    37.38433    44.27776
                         36.01833
   (  4)      9958       89.60956    40.50070    53.23549    49.50381
                         44.58837
   (  5)      1557      119.91137    56.73282    76.92101    64.70135
                         55.04303
  
    +---------------------------------------------------------------+
  
   FINAL RESULTS :
  
   NO. OF CLUSTERS :  5
  
   CLUSTER   PIXELS     MEAN POSITION :
   (  1)     70108       55.98217    19.90998    20.04543    28.51610
                         13.80305
   (  2)    111994       62.36456    24.47869    27.19808    42.98307
                         25.44976
   (  3)     66732       71.64445    29.95717    36.93435    44.30622
                         35.79205
   (  4)     11424       88.10084    39.54902    51.73206    48.56434
                         43.33421
   (  5)      1886      117.32185    55.44327    75.09173    63.34253
                         54.74974
           +------+
            262144

Algorithm

The fuzzy K-means algorithm is based on minimizing the following objective function with respect to U, a fuzzy K-partition of the data set (memberships u_{ij} between 0 and 1 that sum to 1 over the K clusters for each data point), and V, a set of K prototypes (cluster centroids):

$$J_q(U,V) = \sum_{j=1}^{N} \sum_{i=1}^{K} u_{ij}^{q}\, d^2(X_j, V_i)$$
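As a worked check of the definition, the objective can be evaluated directly with NumPy. The sketch below stores memberships as a K x N array indexed [i, j] to match u_{ij} and uses the Euclidean distance; objective is a hypothetical name.

```python
import numpy as np


def objective(X, V, U, q=2.0):
    """J_q(U, V) = sum_j sum_i u_ij^q * d^2(X_j, V_i), Euclidean d.

    X: (N, C) feature vectors, V: (K, C) centroids, U: (K, N) memberships.
    """
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (K, N) squared distances
    return float((U ** q * d2).sum())


X = np.array([[0.0, 0.0], [2.0, 0.0]])
V = np.array([[0.0, 0.0], [2.0, 0.0]])
crisp = np.array([[1.0, 0.0], [0.0, 1.0]])  # each point fully in its own cluster
half = np.full((2, 2), 0.5)                 # each point split evenly
print(objective(X, V, crisp))  # 0.0
print(objective(X, V, half))   # 2.0
```

The fuzzy partition pays a cost for the far centroid that the crisp partition avoids, which is why memberships concentrate on nearby centroids as J_q is minimized.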

where:

X_j is the j-th feature vector
V_i is the centroid of the i-th cluster
u_{ij} is the degree of membership of X_j in the i-th cluster
d(X_j, V_i) is a distance metric between X_j and V_i (this implementation uses the Euclidean distance)
N is the number of data points
K is the number of clusters
q (q > 1) is the weighting exponent for u_{ij}; it controls the "fuzziness" of the resulting clusters.

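In this implementation, q is fixed at 2, a value that appears to be reasonable for most applications. It also allows a fast implementation, because the exponent 1/(q-1) in the membership formula then equals 1, so no fractional powers need to be computed.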
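For q = 2, the membership formula below reduces to a normalized inverse squared distance, which the NumPy sketch here computes. memberships_q2 is a hypothetical name, and the eps floor used when a point coincides exactly with a centroid is an assumption, because that case is not described here.

```python
import numpy as np


def memberships_q2(X, V, eps=1e-12):
    """Memberships for q = 2: u_ij = (1/d2_ij) / sum_k (1/d2_kj).

    X: (N, C) samples, V: (K, C) centroids; returns a (K, N) array whose
    columns sum to 1. eps avoids 1/0 when a sample sits on a centroid.
    """
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (K, N)
    inv = 1.0 / np.maximum(d2, eps)
    return inv / inv.sum(axis=0, keepdims=True)


U = memberships_q2(np.array([[1.0, 0.0]]), np.array([[0.0, 0.0], [3.0, 0.0]]))
print(U[:, 0].round(3).tolist())  # [0.8, 0.2]
```

Here the sample is at squared distances 1 and 4 from the two centroids, so its memberships are proportional to 1 and 1/4.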

Fuzzy partitioning is carried out through an iterative optimization:

1. Choose initial cluster centroids (seeds) V_i.

2. Compute the degree of membership of every feature vector in every cluster:

$$u_{ij} = \frac{\left( 1 / d^2(X_j, V_i) \right)^{1/(q-1)}}{\sum_{k=1}^{K} \left( 1 / d^2(X_j, V_k) \right)^{1/(q-1)}}$$

3. Compute the new centroids V_i^new:

$$V_i^{\mathrm{new}} = \frac{\sum_{j=1}^{N} u_{ij}^{q}\, X_j}{\sum_{j=1}^{N} u_{ij}^{q}}$$

4. If the movement of the centroids (relative change) is less than the threshold MOVETHRS, stop the iteration; otherwise, return to step 2. The algorithm also terminates when the maximum number of iterations (MAXITER) is reached.

Finally, each data point X_j is assigned to the cluster i with the largest membership, that is, the cluster for which u_{ij} >= u_{kj} for all k = 1, ..., K.

The cluster centroids Vi will be saved in a signature segment, if requested.
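Putting the steps together, the iteration can be sketched in NumPy as below. This is an illustrative reimplementation under stated assumptions, not PCI's code: it works on an in-memory (N, C) sample array, uses Euclidean relative movement for the MOVETHRS test, and floors squared distances at a small eps; fuzzy_kmeans is a hypothetical name. The final assignment picks the smallest distance, which selects the same cluster as the largest membership because u_{ij} decreases as d(X_j, V_i) grows.

```python
import numpy as np


def fuzzy_kmeans(X, seeds, maxiter=20, movethrs=0.01, q=2.0, eps=1e-12):
    """Illustrative fuzzy K-means returning (centroids, 0-based labels).

    X: (N, C) samples; seeds: (K, C) initial centroids (step 1).
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(seeds, dtype=float).copy()

    def sq_dist(cent):
        # (K, N) squared Euclidean distances, floored at eps to avoid 1/0.
        return np.maximum(((X[None, :, :] - cent[:, None, :]) ** 2).sum(axis=2), eps)

    for _ in range(maxiter):
        # Step 2: degree of membership of every sample in every cluster.
        w = (1.0 / sq_dist(V)) ** (1.0 / (q - 1.0))
        U = w / w.sum(axis=0, keepdims=True)  # each column sums to 1
        # Step 3: new centroids as u^q-weighted means of the samples.
        Uq = U ** q
        V_new = (Uq @ X) / Uq.sum(axis=1, keepdims=True)
        # Step 4: stop once every centroid's relative movement is below movethrs.
        shift = np.linalg.norm(V_new - V, axis=1)
        scale = np.maximum(np.linalg.norm(V, axis=1), eps)
        V = V_new
        if np.all(shift / scale < movethrs):
            break

    # Final assignment: largest membership is equivalent to smallest distance.
    labels = np.argmin(sq_dist(V), axis=0)
    return V, labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = fuzzy_kmeans(X, seeds=[[0, 0], [10, 10]])
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

The returned centroids correspond to the Vi described above; adding 1 to each label gives the theme-map gray level.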

For more details about the Fuzzy K-means method, see the References section.

References

J.C. Bezdek, "Fuzzy mathematics in pattern classification", Ph.D. dissertation, Cornell Univ., Ithaca, NY, 1973.

Environments	PYTHON :: EASI :: MODELER