| Environments | PYTHON :: EASI :: MODELER |
| Quick links | Description :: Parameters :: Parameter descriptions :: Details :: Example :: Algorithm :: References :: Related |
fuzclus(file, dbic, dboc, mask, numclus, seedfile, maxiter, movethrs, siggen, backval, nsam)
| Name | Type | Caption | Length | Value range |
|---|---|---|---|---|
| FILE * | str | Input file name | 1 - | |
| DBIC * | List[int] | Input raster channel(s) | 1 - 16 | |
| DBOC | List[int] | Output raster channel | 0 - 1 | |
| MASK | List[int] | Area mask | 0 - 4 | |
| NUMCLUS | List[int] | Number of cluster centers | 0 - 1 | 1 - 255 Default: 16 |
| SEEDFILE | str | Seed file for initial centers | 0 - | |
| MAXITER | List[int] | Maximum number of iterations | 0 - 1 | 0 - 10000 Default: 20 |
| MOVETHRS | List[float] | Movement threshold | 0 - 1 | 0.0 - 1.0 Default: 0.01 |
| SIGGEN | str | Generate Signatures | 0 - 3 | YES, NO Default: NO |
| BACKVAL | List[float] | Background gray-level value | 0 - 1 | 0.0 - |
| NSAM | List[int] | Number of pixel values to sample | 0 - 1 | 0 - Default: 262144 |
FILE
Specifies the name of the PCIDSK image file for which to perform clustering.
DBIC
Specifies the image channels that will be used to perform clustering.
Up to 16 input channels may be specified. Input channels can be any combination of 8-bit, 16-bit, and 32-bit real channels. Duplicate channels are not allowed.
DBOC
Specifies the output channel to receive the clustering results.
If no value is specified, results will not be saved to a channel.
For signature generation (SIGGEN), DBOC must be specified.
DBOC can be equal to DBIC. If a MASK is specified, only the area under the mask is written to the output channel.
MASK
Specifies the window or bitmap that defines the area of the input raster to be processed.
If a single value is specified, that value is the segment number of the bitmap in the input file. Only the pixels under the bitmap are processed; the rest of the image remains unchanged.
If four values are specified, they define the X,Y offsets and X,Y dimensions of a rectangular window identifying the area to process. Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of pixels that define the window height.
If no value is specified, the entire channel is processed.
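The four-value window form can be pictured as an array slice. The following is a minimal sketch, assuming the image is held in a NumPy array indexed as (line, pixel); `window_slice` is a hypothetical helper for illustration and is not part of the PCI API.

```python
import numpy as np

def window_slice(mask):
    """Turn a four-value MASK [xoffset, yoffset, xsize, ysize] into array slices."""
    xoff, yoff, xsize, ysize = mask
    # Rows correspond to Y (lines), columns to X (pixels).
    return slice(yoff, yoff + ysize), slice(xoff, xoff + xsize)

image = np.arange(6 * 8).reshape(6, 8)   # 6 lines by 8 pixels of dummy data
rows, cols = window_slice([2, 1, 3, 4])  # starts at X=2, Y=1; 3 wide, 4 high
window = image[rows, cols]
print(window.shape)  # (4, 3)
```

Note that the X values come first in MASK, while array indexing puts the line (Y) first.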
NUMCLUS
Specifies the number of clusters (classes) to find. Up to 255 clusters may be specified; the default value is 16.
SEEDFILE
Specifies the text file from which to read initial seeds. If no filename is given, seeds will be generated diagonally along the n-dimensional histogram.
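This page does not spell out how seeds are placed "diagonally along the n-dimensional histogram". One plausible reading is that the seeds are spaced evenly along the line from the per-channel minimum to the per-channel maximum. The sketch below illustrates that reading only; `diagonal_seeds` is a hypothetical function, and the actual FUZCLUS placement may differ.

```python
import numpy as np

def diagonal_seeds(samples, numclus):
    """Place numclus seeds evenly along the min-to-max diagonal of the data.

    samples: array of shape (N, channels). This is an assumed interpretation,
    not the documented FUZCLUS behaviour.
    """
    lo = samples.min(axis=0).astype(float)
    hi = samples.max(axis=0).astype(float)
    # One fraction per cluster, centered in equal-width intervals of (0, 1).
    t = (np.arange(numclus) + 0.5) / numclus
    return lo + t[:, None] * (hi - lo)

samples = np.array([[0, 10], [100, 50], [40, 30]])
seeds = diagonal_seeds(samples, 4)
print(seeds)
```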
MAXITER
Specifies the maximum number of iterations in calculating the cluster mean positions. The default maximum is 20 iterations.
MOVETHRS
Specifies the movement threshold for the relative change in cluster centroids.
If the movement of all cluster centroids is less than MOVETHRS, the program has converged.
The default movement threshold is 0.01.
SIGGEN
Specifies whether to generate a signature segment for each cluster.
Signatures may be used as input for MLC (Maximum Likelihood Classification) to classify other images.
If this parameter is set to 'YES', DBOC must also be specified.
BACKVAL
Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the given gray-level value will be assigned class 0 (null class).
NSAM
Specifies the number of samples to collect to perform the iterative clustering. The default value is 262144. If the specified number is larger than the total number of pixels in the image, all the pixels in the image will be used.
The time required to compute each iteration is proportional to the number of samples used, so using far more than the default number of samples may significantly increase processing time. Also, because all the samples are stored in memory, a high NSAM value may require a large amount of memory. For example, with 262144 samples and five bands of 8-bit input data, FUZCLUS would require approximately 1.3 MB of memory; with NSAM set to 2000000, it would need about 10 MB.
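The memory figures above follow from multiplying the number of samples by the bytes per sample. A small sketch of that arithmetic, assuming only the raw sample values are counted (any working storage the program needs is ignored); `sample_memory_bytes` is a hypothetical helper:

```python
def sample_memory_bytes(nsam, bytes_per_channel):
    """Bytes needed to hold nsam samples, given each input channel's size in bytes."""
    return nsam * sum(bytes_per_channel)

five_8bit = [1, 1, 1, 1, 1]  # five bands of 8-bit data, 1 byte each
default_bytes = sample_memory_bytes(262144, five_8bit)
large_bytes = sample_memory_bytes(2_000_000, five_8bit)
print(default_bytes / 1e6, large_bytes / 1e6)  # 1.31072 10.0
```

The same function gives an estimate for mixed inputs, for example `[1, 2, 4]` for one 8-bit, one 16-bit, and one 32-bit real channel.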
FUZCLUS performs unsupervised clustering using the Fuzzy K-means method to classify image data into different clusters. Up to 16 image channels can be analyzed, and 255 clusters (classes) found.
FUZCLUS reads in image data from a file specified by the FILE parameter. Input channels are specified using the DBIC (Input Image Layers) parameter.
The MASK parameter specifies the area within the input channel to process. Only the area under the mask is read; the rest of the image is not used.
If a single MASK value is specified, this value points to a bitmap segment, which defines the area to be processed. When four MASK values are specified, these values define the X,Y offsets and X,Y dimensions of a rectangular window within the image to process.
If MASK is not specified, the entire image is sampled.
It is common for satellite images to have several black-filled areas (containing no image data) that should not be included in the classification. To exclude them, the user can first run THR with the TVAL minimum and maximum values set to 1 and 255, respectively. This creates a bitmap mask that covers only the image area. The user can then specify this bitmap as the Area Mask (MASK) for FUZCLUS.
Users may specify the desired number of clusters (NUMCLUS); acceptable values are between 1 and 255. The initial seed values can be entered in a text file and specified using the SEEDFILE parameter. If no file name is specified for the Seed File (SEEDFILE) parameter, seeds will be generated diagonally along the n-dimensional histogram.
A seed file containing the initial seeds for 4 channels and 6 clusters would have the following format:
1 1 1 1 | 1st seed, channels 1,2,3,4
5 3 5 9 | 2nd seed, channels 1,2,3,4
40 43 20 10 | 3rd seed, channels 1,2,3,4
100 101 140 50 | 4th seed, channels 1,2,3,4
150 155 200 175 | 5th seed, channels 1,2,3,4
240 200 195 140 | 6th seed, channels 1,2,3,4
In the example above, the numbers represent gray-level values. The values 5,3,5,9 represent the second seed gray-level values in channels 1, 2, 3, and 4, respectively.
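A seed file in this layout can be produced and checked with a few lines of Python. This is a sketch under assumptions: it writes only the numeric gray levels, since this page does not say whether the `|` annotations shown above are accepted in a real seed file, and `write_seed_file` and `read_seed_file` are hypothetical helpers rather than PCI functions.

```python
import os
import tempfile

def write_seed_file(path, seeds):
    """Write one seed per line, with gray levels separated by spaces."""
    with open(path, "w") as f:
        for seed in seeds:
            f.write(" ".join(str(v) for v in seed) + "\n")

def read_seed_file(path):
    """Read seeds back, ignoring anything after a '|' annotation."""
    seeds = []
    with open(path) as f:
        for line in f:
            values = line.split("|", 1)[0].split()
            if values:
                seeds.append([float(v) for v in values])
    return seeds

seeds = [
    [1, 1, 1, 1],
    [5, 3, 5, 9],
    [40, 43, 20, 10],
    [100, 101, 140, 50],
    [150, 155, 200, 175],
    [240, 200, 195, 140],
]
path = os.path.join(tempfile.mkdtemp(), "seeds.txt")
write_seed_file(path, seeds)
loaded = read_seed_file(path)
print(len(loaded), "seeds,", len(loaded[0]), "channels each")
```

The resulting path would then be passed as SEEDFILE, with NUMCLUS set to 6 and four channels listed in DBIC.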
The maximum number of iterations allowed in FUZCLUS is defined through the Maximum Iterations (MAXITER) parameter, and the Movement Threshold through the MOVETHRS parameter.
The result of the clustering is a theme map directed to a specified output image channel (DBOC). If an output channel is not specified, results will not be saved to a channel.
A theme map encodes each cluster with a unique gray level equal to its cluster number; for example, cluster 1 is assigned gray level 1, and cluster 2 is assigned gray level 2. Gray level 0 represents unclassified pixels. Because adjacent gray levels are hard to tell apart visually, a pseudocolor table should be loaded when the theme map is displayed so that each cluster is represented by a different color.
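To illustrate what a pseudocolor table does, the sketch below builds a lookup table with one distinct color per cluster and applies it to a small dummy theme map. It assumes the theme map has been read into a NumPy array; the evenly spaced hues and the `pseudocolor_table` name are illustrative choices, not the table that PCI software loads.

```python
import colorsys
import numpy as np

def pseudocolor_table(numclus):
    """Build a (256, 3) uint8 lookup table; index 0 (unclassified) stays black."""
    table = np.zeros((256, 3), dtype=np.uint8)
    for c in range(1, numclus + 1):
        # Evenly spaced hues give each cluster a different fully saturated color.
        r, g, b = colorsys.hsv_to_rgb((c - 1) / numclus, 1.0, 1.0)
        table[c] = (round(r * 255), round(g * 255), round(b * 255))
    return table

theme = np.array([[0, 1, 2], [2, 5, 1]], dtype=np.uint8)  # dummy theme map
table = pseudocolor_table(5)
rgb = table[theme]  # one RGB triple per pixel
print(rgb.shape)    # (2, 3, 3)
```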
FUZCLUS allows the user to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background gray-level value will be assigned class 0 (null class).
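The effect of BACKVAL on the output can be sketched as a simple masking step. This page does not state how a multichannel pixel is compared with the background value; the sketch assumes a pixel counts as background only when every input channel equals BACKVAL. `apply_background` is a hypothetical helper.

```python
import numpy as np

def apply_background(labels, channels, backval):
    """Set labels to 0 where every input channel equals backval (assumed rule)."""
    is_background = np.all(channels == backval, axis=0)
    return np.where(is_background, 0, labels)

# Dummy data: 2 channels, 2 lines, 2 pixels.
channels = np.array([[[0, 12], [0, 0]],
                     [[0, 40], [7, 0]]])
labels = np.array([[3, 1], [2, 3]])
result = apply_background(labels, channels, backval=0)
print(result.tolist())  # [[0, 1], [2, 0]]
```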
FUZCLUS generates a report of the current cluster mean values and sample counts after each iteration.
The following example finds 5 clusters from 5 channels of the file 'irvine.pix'.
from pci.fuzclus import fuzclus

file = "irvine.pix"
dbic = [1, 2, 3, 4, 5]  # input channels
dboc = [9]              # output channel
mask = []               # process the entire image
numclus = [5]           # requested number of clusters
seedfile = ""           # automatically generate seeds
maxiter = [20]          # no more than 20 iterations
movethrs = [0.01]       # movement threshold
siggen = ""             # do not generate signatures
backval = []            # no background value to be ignored
nsam = []               # use default 262144

fuzclus(file, dbic, dboc, mask, numclus, seedfile, maxiter, movethrs, siggen, backval, nsam)
The example produces the following output:
Iteration : 1
No. of Clusters : 5
Cluster Samples Mean Position :
( 1) 73507 55.81042 19.86229 19.91672 29.82999
13.69804
( 2) 187513 67.61027 27.55517 32.50297 43.59904
30.56694
( 3) 1093 120.00732 57.25526 78.13449 66.09241
58.46386
( 4) 28 207.92857 103.00000 144.28572 115.46429
71.10714
( 5) 3 238.33333 119.00000 163.66667 129.00000
71.66666
Iteration : ....
Iteration : ....
Iteration : 20
No. of Clusters : 5
Cluster Samples Mean Position :
( 1) 71512 56.03448 19.94805 20.10361 28.69195
13.90687
( 2) 113316 62.51694 24.58107 27.36789 43.10883
25.68433
( 3) 65801 72.15125 30.23051 37.38433 44.27776
36.01833
( 4) 9958 89.60956 40.50070 53.23549 49.50381
44.58837
( 5) 1557 119.91137 56.73282 76.92101 64.70135
55.04303
+---------------------------------------------------------------+
FINAL RESULTS :
NO. OF CLUSTERS : 5
CLUSTER PIXELS MEAN POSITION :
( 1) 70108 55.98217 19.90998 20.04543 28.51610
13.80305
( 2) 111994 62.36456 24.47869 27.19808 42.98307
25.44976
( 3) 66732 71.64445 29.95717 36.93435 44.30622
35.79205
( 4) 11424 88.10084 39.54902 51.73206 48.56434
43.33421
( 5) 1886 117.32185 55.44327 75.09173 63.34253
54.74974
+------+
262144
The fuzzy K-means algorithm is based on the minimization of the following objective function, with respect to U, a fuzzy K-partition of the data set, and to V, a set of K prototypes:
$$J_q(U,V) = \sum_{j=1}^{N} \sum_{i=1}^{K} u_{ij}^{q}\, d^2(X_j, V_i)$$
In this implementation, a fixed value of q=2 is chosen, which appears to be reasonable for most applications. It also allows a fast implementation.
1. Choose initial cluster centroids (seeds) $V_i$.
2. Compute the degree of membership of all feature vectors in all the clusters:

   $$u_{ij} = \frac{\left(1/d^2(X_j,V_i)\right)^{1/(q-1)}}{\sum_{k=1}^{K} \left(1/d^2(X_j,V_k)\right)^{1/(q-1)}}$$

3. Compute new centroids $V_i^{\text{new}}$:

   $$V_i^{\text{new}} = \frac{\sum_{j=1}^{N} u_{ij}^{q}\, X_j}{\sum_{j=1}^{N} u_{ij}^{q}}$$

4. When the movement of centroids (relative changes) is less than a predetermined threshold (MOVETHRS), stop the iteration; otherwise go to step 2. The algorithm will also terminate when a maximum number of iterations is reached.
5. Finally, a data point $X_j$ is assigned to cluster $i$ if the fuzzy membership $u_{ij} \ge u_{kj}$ for all clusters $k$.

The cluster centroids $V_i$ will be saved in a signature segment, if requested.
For more details about the Fuzzy K-means method, see the References section.
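The steps above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the FUZCLUS source: d is taken to be the Euclidean distance, the relative movement of a centroid is assumed to be its displacement divided by its previous norm, and a small `eps` is added to avoid division by zero. The name `fuzzy_kmeans` is hypothetical. With q=2 the exponent 1/(q-1) equals 1, so the memberships reduce to normalized inverse squared distances.

```python
import numpy as np

def fuzzy_kmeans(X, V, maxiter=20, movethrs=0.01, q=2.0):
    """Fuzzy K-means on samples X (N, channels) from initial centroids V (K, channels).

    Returns (centroids, labels), with labels numbered from 1. Requires q > 1.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    eps = 1e-12  # guards against division by zero
    for _ in range(maxiter):
        # Squared distances d^2(Xj, Vi), shape (K, N).
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
        # Step 2: memberships u_ij, normalized over the clusters for each sample.
        inv = (1.0 / d2) ** (1.0 / (q - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # Step 3: new centroids as u^q-weighted means of the samples.
        W = U ** q
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 4: relative movement of each centroid (assumed definition).
        move = np.linalg.norm(V_new - V, axis=1) / (np.linalg.norm(V, axis=1) + eps)
        V = V_new
        if np.all(move < movethrs):
            break
    # Step 5: the highest membership belongs to the nearest centroid.
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(d2, axis=0) + 1
    return V, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(20, 2, (50, 2)), rng.normal(200, 2, (50, 2))])
V, labels = fuzzy_kmeans(X, V=[[0, 0], [255, 255]])
print(np.round(V, 1), np.bincount(labels)[1:])
```

The distance array has shape (K, N, channels) before summation, which is acceptable for a demonstration but would need chunking for sample counts near the NSAM default.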
J.C. Bezdek, "Fuzzy mathematics in pattern classification", Ph.D. dissertation, Cornell Univ., Ithaca, NY, 1973.
© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.