| Environments | PYTHON :: EASI :: MODELER |
isoclus(file, dbic, dboc, mask, numclus, maxclus, minclus, seedfile, maxiter, movethrs, siggen, samprm, stdv, lump, maxpair, backval, nsam)
| Name | Type | Caption | Length | Value range |
|---|---|---|---|---|
| FILE * | str | Input file name | 1 - | |
| DBIC * | List[int] | Input raster channel(s) | 1 - 16 | -1024 - 1024 |
| DBOC | List[int] | Output raster channel | 0 - 1 | |
| MASK | List[int] | Area Mask | 0 - 4 | Xoffset, Yoffset, Xsize, Ysize |
| NUMCLUS | List[int] | Number of clusters | 0 - 1 | 1 - 255 Default: 16 |
| MAXCLUS | List[int] | Maximum number of clusters | 0 - 1 | 1 - 255 Default: 16 |
| MINCLUS | List[int] | Minimum number of clusters | 0 - 1 | 1 - 255 Default: 16 |
| SEEDFILE | str | Initial seed text file | 0 - | |
| MAXITER | List[int] | Maximum number of iterations | 0 - 1 | 1 - 10000 Default: 20 |
| MOVETHRS | List[float] | Movement threshold | 0 - 1 | 0.0 - 1.0 Default: 0.01 |
| SIGGEN | str | Generate signatures | 0 - 3 | YES, NO Default: NO |
| SAMPRM | List[int] | Minimum sample threshold | 0 - 1 | 0 - 255 Default: 5 |
| STDV * | List[float] | Standard deviation | 1 - 1 | 0.0 - Default: 10.0 |
| LUMP | List[float] | Lumping parameter | 0 - 1 | 0.0 - Default: 1.0 |
| MAXPAIR | List[int] | Maximum number of lumping pairs | 0 - 1 | 0 - 255 Default: 5 |
| BACKVAL | List[float] | Background gray-level value | 0 - 1 | |
| NSAM | List[int] | Number of image points used in the sample | 0 - 1 | 1 - Default: 262144 |
FILE
Specifies the name of the PCIDSK image file on which to perform clustering.
DBIC
Specifies the input channels to use for the clustering operation.
Up to 16 input channels may be specified. Input channels can be a combination of 8-bit, 16-bit, or 32-bit real. Duplicate channels are not allowed.
DBOC
Specifies the output channel to receive the clustering results.
If this parameter is not specified, results will not be saved into a channel.
To generate a signature segment, this parameter must be specified.
The output channel can be the same as the input channel (DBIC). Only the area under the area mask (MASK) is written to the output.
MASK
Optionally specifies the window or bitmap that defines the area to be processed within the input raster. If this parameter is not specified, the entire channel is processed.
MASK=Xoffset, Yoffset, Xsize, Ysize
Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of lines that define the window height.
For a bitmap mask, you can specify the bitmap segment number from the input file that you want to use. All the pixels within the specified segment having a pixel value of 1 define the area to be processed.
Only the area under the mask is written to the output.
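The three forms of MASK can be told apart by the length of the list. The following is a minimal sketch, not part of the ISOCLUS API: the helper `mask_kind` and the sample values are hypothetical and only illustrate how an empty list, a single bitmap segment number, and a four-value window are interpreted.

```python
def mask_kind(mask):
    """Classify a MASK value by its length (hypothetical helper)."""
    if len(mask) == 0:
        return "entire channel"
    if len(mask) == 1:
        return "bitmap segment"
    if len(mask) == 4:
        return "window"
    raise ValueError("MASK takes 0, 1, or 4 values")

mask_full = []                      # process the entire channel
mask_bitmap = [7]                   # bitmap segment 7 (example segment number)
mask_window = [100, 200, 512, 256]  # Xoffset, Yoffset, Xsize, Ysize

for mask in (mask_full, mask_bitmap, mask_window):
    print(mask, "->", mask_kind(mask))
```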
NUMCLUS
Specifies the number of clusters (classes) desired. Note that this is only an estimate; the final number of clusters may vary. You can, however, limit the variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) cluster parameters. Because cluster 256 is reserved for discarded clusters, the range of this parameter is 1 <= x < 255. The default value is 16.
MAXCLUS
Specifies the maximum number of clusters allowed. This parameter limits the total number of clusters allowed during splitting.
Acceptable values are 1 <= x <= 255; the default value is 16.
MINCLUS
Specifies the minimum number of clusters allowed. This parameter limits the total number of clusters allowed during lumping.
Acceptable values are 1 <= x <= 255; the default value is 16.
SEEDFILE
Specifies the text file used to read in initial seeds. If this parameter is not specified, seeds are generated diagonally along the n-dimensional histogram.
To specify the seeds for 4 channels and 6 clusters in a text file, use the following format:
1 1 1 1 | 1st seed position in channel 1,2,3,4
5 3 5 9 | 2nd seed position in channel 1,2,3,4
40 43 20 10 | 3rd seed position in channel 1,2,3,4
100 101 140 50 | 4th seed position in channel 1,2,3,4
150 155 200 175 | 5th seed position in channel 1,2,3,4
240 200 195 140 | 6th seed position in channel 1,2,3,4
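A seed file in this layout can be written with a few lines of Python. The sketch below is illustrative only: the helper name `write_seed_file` is hypothetical, and it assumes the file holds one whitespace-separated row per seed with one value per input channel, with the `|` annotations shown above being explanatory rather than part of the file.

```python
import os
import tempfile

def write_seed_file(path, seeds, num_channels):
    """Write one seed per line, one value per channel (hypothetical helper)."""
    lines = []
    for index, seed in enumerate(seeds, start=1):
        if len(seed) != num_channels:
            raise ValueError(f"seed {index} has {len(seed)} values, expected {num_channels}")
        lines.append(" ".join(str(value) for value in seed))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)

# The six seeds for four channels listed above.
seeds = [
    (1, 1, 1, 1),
    (5, 3, 5, 9),
    (40, 43, 20, 10),
    (100, 101, 140, 50),
    (150, 155, 200, 175),
    (240, 200, 195, 140),
]

seed_path = os.path.join(tempfile.mkdtemp(), "seeds.txt")
written = write_seed_file(seed_path, seeds, num_channels=4)
print(written, "seeds written to", seed_path)
```

The resulting path would then be passed as the SEEDFILE value.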
MAXITER
Specifies the maximum number of iterations for calculating the cluster mean positions.
MOVETHRS
Specifies the movement threshold, expressed as a fraction of the cluster mean position; the default of 0.01 corresponds to a movement of 1%.
If the movement of all cluster means is less than the value specified for this parameter, the convergence has been completed. For example, for all cluster means, the following situation causes the execution to terminate:
New cluster mean position - Old cluster mean position
----------------------------------------------------- < MOVETHRS
Old cluster mean position
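The termination test can be sketched as follows. This is an illustration under stated assumptions, not the ISOCLUS implementation: the function name `has_converged` is hypothetical, the ratio is evaluated per channel using absolute values, and a mean component of zero is treated as moved whenever it changes at all.

```python
def has_converged(old_means, new_means, movethrs=0.01):
    """Return True when every cluster mean moved by less than movethrs (relative)."""
    for old_mean, new_mean in zip(old_means, new_means):
        for old_value, new_value in zip(old_mean, new_mean):
            if old_value == 0:
                # Relative movement is undefined; count any change as movement.
                if new_value != old_value:
                    return False
                continue
            if abs(new_value - old_value) / abs(old_value) >= movethrs:
                return False
    return True

old = [[55.0, 20.0], [120.0, 57.0]]
small_move = [[55.2, 20.1], [120.5, 57.2]]   # every component moves by under 1%
large_move = [[56.0, 20.0], [120.0, 57.0]]   # first component moves by about 1.8%

print(has_converged(old, small_move))  # True
print(has_converged(old, large_move))  # False
```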
SIGGEN
Specifies whether to generate the signature for each cluster. These signature segments can be input to MLC (Maximum Likelihood Classification) to classify other images.
If this parameter is set to YES, DBOC must be specified.
SAMPRM
Specifies the minimum sample threshold. The number of samples in a cluster domain is compared to the value of this parameter. When the number of samples in a cluster is less than the specified minimum threshold, the cluster is discarded and the total number of clusters is reduced by 1.
STDV
Specifies the standard deviation. If a cluster has a standard deviation greater than that specified in this parameter, splitting may occur. The default value of 10.0 is reasonable.
LUMP
Specifies the lumping parameter.
If the distance between two cluster centers is less than LUMP, the total number of clusters is greater than MINCLUS, and the number of lumped pairs is less than MAXPAIR, lumping of clusters will occur.
MAXPAIR
Specifies the maximum number of pairs of cluster centers that can be lumped during each iteration.
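How LUMP, MINCLUS, and MAXPAIR interact can be sketched in a few lines. The function below is a hypothetical illustration, not the ISOCLUS source: it assumes Euclidean distance between cluster centers, keeps the MAXPAIR smallest distances below LUMP, and lets each center take part in at most one merge per iteration.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two cluster centers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_lump_pairs(centers, lump, maxpair, minclus):
    """Return index pairs of centers to merge in one iteration (hypothetical helper)."""
    close = []
    for i, j in combinations(range(len(centers)), 2):
        d = euclidean(centers[i], centers[j])
        if d < lump:
            close.append((d, i, j))
    close.sort()  # smallest distances first

    used, pairs = set(), []
    remaining = len(centers)
    for _, i, j in close[:maxpair]:
        if remaining <= minclus:
            break  # merging further would drop below MINCLUS
        if i in used or j in used:
            continue  # a center can be lumped only once per iteration
        pairs.append((i, j))
        used.update((i, j))
        remaining -= 1
    return pairs

centers = [(0, 0), (0.5, 0), (0.9, 0), (10, 10), (10.2, 10)]
print(select_lump_pairs(centers, lump=1.0, maxpair=5, minclus=1))
```

Because a center can be used only once, fewer than MAXPAIR merges may result even when more pairs lie within LUMP of each other.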
BACKVAL
Optionally specifies a background gray-level value to be ignored during classification. If this parameter is specified, pixels with the specified value will be assigned class 0 (null class).
NSAM
Specifies the number of samples to collect on which to perform the iterative clustering.
If this parameter is not specified, it defaults to 262144. If the specified value is larger than the total number of pixels in the image, all the pixels in the image will be used.
The time to compute each iteration is proportional to the number of samples used; using more than the default number of samples may significantly slow down the clustering process. Also, because all the samples are stored in memory, a large NSAM value may lead to higher memory requirements. With 262144 samples and five bands of 8-bit input data, approximately 1.3 MB of memory would be required; with NSAM set to 2000000, approximately 10 MB would be required.
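The memory figures above follow from multiplying the number of samples by the number of bands and the size of each value. A minimal sketch, assuming only the raw sample values are counted (no bookkeeping overhead); the function name `sample_memory_bytes` is hypothetical:

```python
def sample_memory_bytes(nsam, num_channels, bytes_per_value=1):
    """Approximate bytes needed to hold the raw samples in memory."""
    return nsam * num_channels * bytes_per_value

default_bytes = sample_memory_bytes(262144, 5)   # 1,310,720 bytes, about 1.3 MB
large_bytes = sample_memory_bytes(2000000, 5)    # 10,000,000 bytes, about 10 MB

print(default_bytes / 1e6, "MB")
print(large_bytes / 1e6, "MB")
```

For 16-bit or 32-bit real input, `bytes_per_value` would be 2 or 4, and the estimate scales accordingly.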
The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sample means. Unlike the latter algorithm, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures which have been incorporated into an interactive scheme. Up to 16 image channels can be analyzed, and up to 255 clusters (classes) found.
Due to the large amount of memory required, ISOCLUS samples a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image, ISOCLUS samples every other pixel during calculation of cluster means. When writing results to an output channel, however, all pixels are classified.
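One plausible reading of the 1024x1024 example is that the sampling interval is chosen so that the number of sampled pixels does not exceed NSAM. The sketch below rests entirely on that assumption; the function `sampling_stride` is hypothetical and is not taken from the ISOCLUS source.

```python
import math

def sampling_stride(width, height, nsam=262144):
    """Pixel interval (applied along both axes) that keeps the sample count within nsam."""
    total = width * height
    if total <= nsam:
        return 1  # the whole image fits in the sample budget
    return math.ceil(math.sqrt(total / nsam))

print(sampling_stride(1024, 1024))  # 2, that is, every other pixel
print(sampling_stride(512, 512))    # 1, every pixel is used
```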
The MASK parameter specifies the area within the input channel to be processed. Only the area under the mask is classified; the rest of the image is not processed.
If a single MASK value is specified, this refers to a bitmap segment that defines the area to be classified. When four values are specified, these define the x,y offsets and x,y dimensions of a rectangular window within the image to be classified.
If a mask is not specified, the entire image is processed.
Satellite images commonly contain large black-filled areas (gray level zero) that should not be included in the classification. To exclude them, first run THR with the minimum and maximum threshold values set to 1 and 255, respectively. This creates a bitmap mask that covers only the image area. You can then specify this bitmap for the MASK parameter.
You can specify the number of clusters desired through the NUMCLUS parameter. Note that this specification only provides an estimate; the final number of clusters may vary. You may limit this variation by setting the maximum (MAXCLUS) and minimum (MINCLUS) number of clusters. MAXCLUS limits the total number of clusters allowed during splitting, and MINCLUS limits the minimum number of clusters allowed during lumping.
The initial seed values can be entered in a text file and specified by the SEEDFILE parameter. If this parameter is not specified, seeds will be generated diagonally along the n-dimensional histogram.
Define the maximum number of iterations allowed (MAXITER), and the movement threshold (MOVETHRS). The execution terminates when the number of iterations reaches MAXITER, or when the movement of all cluster means is less than MOVETHRS. For example, for all cluster means, the following situation causes the execution to terminate:
New cluster mean position - Old cluster mean position
----------------------------------------------------- < MOVETHRS
Old cluster mean position
The result of the clustering is a theme map directed to a specified image channel (DBOC). If the output channel is not specified, the results will not be saved to a channel. A theme map encodes each cluster with a unique gray-level value. The cluster number is represented by the gray level; for example, cluster 1 is assigned a gray-level value of 1, and cluster 2 is assigned a gray-level value of 2. Gray level 0 represents unclassified pixels. If the theme map is later directed to the display, a pseudocolor table should be loaded so that each cluster is represented by a different color.
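Because the gray level of each pixel in the theme map is its cluster number, per-cluster pixel counts can be obtained by tallying gray levels. A minimal sketch, assuming the theme map has already been read into rows of integers by some other means; the helper `cluster_pixel_counts` and the sample data are hypothetical:

```python
from collections import Counter

def cluster_pixel_counts(theme_map):
    """Count pixels per gray level; level 0 is the null (unclassified) class."""
    counts = Counter(value for row in theme_map for value in row)
    return dict(sorted(counts.items()))

# A tiny made-up theme map: 0 = unclassified, 1 and 2 = cluster numbers.
theme_map = [
    [0, 1, 1],
    [2, 2, 2],
]

print(cluster_pixel_counts(theme_map))
```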
ISOCLUS provides the option of generating signatures for each cluster through the SIGGEN parameter. If SIGGEN is YES, a signature for each cluster is generated. These signatures can then be input to MLC (Maximum Likelihood Classification) to classify other images.
ISOCLUS allows you to specify a background gray-level value (BACKVAL) to be ignored during classification. If this value is specified, pixels with the defined background value will be assigned class 0 (null class).
The following is an example of finding clusters from 5 channels in the file irvine.pix. The desired number of clusters is 5, and the maximum and minimum allowable number of clusters are 20 and 5, respectively.
from pci.isoclus import isoclus
file = "irvine.pix" # input file
dbic = [1,2,3,4,5] # input channels
dboc = [9] # output channel
mask = [] # process entire channel
numclus = [5] # request 5 clusters
maxclus = [20] # at most 20 clusters
minclus = [5] # at least 5 clusters
seedfile = "" # automatically generate seeds
maxiter = [20] # no more than 20 iterations
movethrs = [0.01]
siggen = "NO" # no signature generation
samprm = [5]
stdv = [10.0]
lump = [1.0]
maxpair = [5] # no more than 5 cluster center pairs
# lumped in one iteration
backval = [] # no background value
nsam = [] # default number of samples
isoclus(file, dbic, dboc, mask, numclus, maxclus, minclus, seedfile, maxiter,
        movethrs, siggen, samprm, stdv, lump, maxpair, backval, nsam)
The following is an example of the output:
Iteration : 1
No. of Clusters : 5
Cluster Samples Max. Stdv. Mean Position:
( 1) 73507 9.51611 55.81042 19.86229 19.91672
29.82999 13.69804
( 2) 187513 9.03195 67.61027 27.55517 32.50297
43.59904 30.56694
( 3) 1093 14.98393 120.00732 57.25526 78.13449
66.09241 58.46386
( 4) 28 23.78014 207.92857 103.00000 144.28572
115.46429 71.10714
( 5) 3 1.69967 238.33333 119.00000 163.66667
129.00000 71.66666
Cluster No. Discarded : 5
Total No. of Clusters discarded : 1
Recalculating Clusters Mean Values
Iteration : ...
Iteration : ...
Iteration : 20
No. of Clusters : 10
Cluster Samples Max. Stdv. Mean Position:
( 1) 44672 7.13140 54.77601 19.13615 18.82732
24.60888 11.74371
( 2) 84972 5.76638 59.93691 22.69998 24.52145
38.33545 21.61785
( 3) 66118 5.16872 66.85352 27.10802 32.22654
40.91793 32.69448
( 4) 3373 8.07777 96.96590 43.43404 56.30388
49.36911 39.12481
( 5) 17970 6.01282 79.61875 33.60000 41.91302
40.81953 31.33311
( 6) 781 14.94864 125.76312 59.08835 80.00000
66.81434 50.97439
( 7) 25571 8.84357 64.53916 26.48536 29.13046
58.64683 25.93798
( 8) 2164 8.68565 98.43207 47.03928 64.37061
57.31747 59.35536
( 9) 16485 6.48150 74.65708 32.60097 42.35007
48.62518 45.63634
( 10) 35 26.02109 200.60001 99.22857 138.28572
110.91428 69.05714
FINAL RESULTS :
NO. OF CLUSTERS : 10
CLUSTER PIXELS MEAN POSITION :
( 1) 42309 54.6381 19.0508 18.6878 24.1532
11.5138
( 2) 83602 59.7712 22.5730 24.3250 37.9481
21.2744
( 3) 24671 64.1729 26.2388 28.6585 58.8584
25.1137
( 4) 67492 66.4786 26.9082 31.8410 41.0427
32.3277
( 5) 18472 73.8437 32.0540 41.4255 48.4454
44.7910
( 6) 18409 79.1224 33.3069 41.4644 40.5139
31.0331
( 7) 4004 95.0092 42.5310 55.1276 48.7040
39.3821
( 8) 2258 97.3866 46.4632 63.6063 57.0106
59.1103
( 9) 888 124.4178 58.3705 78.8570 65.9673
50.7140
( 10) 39 202.7692 100.2564 139.4103 111.5385
68.9231
--------
262144
ISOCLUS is based (with minor modifications) on the ISODATA method described in the following publication:
Tou, Julius T. and Rafael C. Gonzalez. 1974. Pattern Recognition Principles. Addison-Wesley Publishing Co.
ISOCLUS performs unsupervised clustering using the ISODATA method on image data for up to 255 clusters (classes) and 16 channels. The output is a theme map directed to a database image channel.
The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sample means. Unlike the latter algorithm, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures that have been incorporated into an interactive scheme. Image data from up to 16 specified channels in a specified file can be read in, and up to 255 clusters can be found.
Due to the large amount of memory required, ISOCLUS samples a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image, ISOCLUS samples every other pixel during calculation of cluster means. However, when writing results to an output channel, all pixels are classified.
The ISOCLUS algorithm is described as follows:
Step 1. Before the algorithm is executed, it is necessary to specify a set of $N_c$ initial cluster centers, $Z_1, Z_2, \ldots, Z_{N_c}$. This set, which need not be equal in number to the desired cluster centers, can be formed by selecting samples from the given set of data.

Step 2. Distribute the $N$ samples among the present cluster centers, using the following relation:

$$X \in S_j \quad \text{if} \quad \|X - Z_j\| < \|X - Z_i\|, \qquad i = 1, 2, \ldots, N_c;\ i \neq j$$

for all $X$ in the sample set. In this notation, $S_j$ represents the subset of samples assigned to cluster center $Z_j$.

Step 3. Discard sample subsets with fewer than SAMPRM members; that is, if for any $j$, $N_j <$ SAMPRM, then discard $S_j$ and reduce $N_c$ by 1.

Step 4. Update each cluster center $Z_j$, $j = 1, 2, \ldots, N_c$, by setting it equal to the sample mean of its corresponding set $S_j$; that is:

$$Z_j = \frac{1}{N_j} \sum_{X \in S_j} X, \qquad j = 1, 2, \ldots, N_c$$

where $N_j$ is the number of samples in $S_j$. If samples were deleted in step 3, go to step 2.

Step 5. Compute the average distance $D_j$ of samples in the cluster domain $S_j$ from their corresponding cluster center, using the following relation:

$$D_j = \frac{1}{N_j} \sum_{X \in S_j} \|X - Z_j\|, \qquad j = 1, 2, \ldots, N_c$$

Step 6. Compute the overall average distance of the samples from their respective cluster centers, using the following relation:

$$D_o = \frac{1}{N} \sum_{j=1}^{N_c} N_j D_j$$

Step 7. If this is the last iteration, set LUMP = 0 and go to step 11; if $N_c$ is less than or equal to NUMCLUS / 2, go to step 8; if this is an even-numbered iteration, or if $N_c$ is greater than or equal to $2 \times$ NUMCLUS, go to step 11; otherwise, continue.

Step 8. Find the standard deviation vector $V_j = (V_{1j}, V_{2j}, \ldots, V_{nj})'$ for each sample subset, using the following relation:

$$V_{ij} = \sqrt{\frac{1}{N_j} \sum_{k=1}^{N_j} (X_{ik} - Z_{ij})^2}, \qquad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N_c$$

where $n$ is the sample dimensionality, $X_{ik}$ is the $i$th component of the $k$th sample in $S_j$, $Z_{ij}$ is the $i$th component of $Z_j$, and $N_j$ is the number of samples in $S_j$. Each component of $V_j$ represents the standard deviation of the samples in $S_j$ along a principal coordinate axis.

Step 9. Find the maximum component of each $V_j$, $j = 1, 2, \ldots, N_c$, and denote it by $V_{j\max}$.

Step 10. If for any $V_{j\max}$, $j = 1, 2, \ldots, N_c$, there is $V_{j\max} >$ STDV and (a) $D_j > D_o$ and $N_j > 2(\text{SAMPRM} + 1)$ or (b) $N_c$ is less than or equal to NUMCLUS / 2, then split $Z_j$ into two new cluster centers $Z_j^+$ and $Z_j^-$, delete $Z_j$, and increase $N_c$ by 1. Cluster center $Z_j^+$ is formed by adding a given quantity $G_j$ to the component of $Z_j$ which corresponds to the maximum component of $V_j$; $Z_j^-$ is formed by subtracting $G_j$ from the same component of $Z_j$. One way of specifying $G_j$ is to let it be equal to some fraction of $V_{j\max}$. That is, $G_j = k V_{j\max}$, where $k$ is greater than 0 and less than or equal to 1. In this example, $k = 0.5$. If splitting took place in this step, go to step 2; otherwise continue.

Step 11. Compute the pairwise distances $D_{ij}$ between all cluster centers:

$$D_{ij} = \|Z_i - Z_j\|, \qquad i = 1, 2, \ldots, N_c - 1;\ j = i + 1, \ldots, N_c$$

Step 12. Compare the distances $D_{ij}$ against the parameter LUMP. Arrange the $L$ smallest distances, which are less than LUMP, in ascending order:

$$[D_{i_1 j_1}, D_{i_2 j_2}, \ldots, D_{i_L j_L}]$$

where $D_{i_1 j_1} < D_{i_2 j_2} < \cdots < D_{i_L j_L}$ and $L$ is the maximum number of pairs of cluster centers which can be lumped together. The lumping process is discussed in the next step.

Step 13. With each distance $D_{i_l j_l}$ there is an associated pair of cluster centers, $Z_{i_l}$ and $Z_{j_l}$. Starting with the smallest of these distances, perform a pairwise lumping operation according to the following rule:

For $l = 1, 2, \ldots, L$, if neither $Z_{i_l}$ nor $Z_{j_l}$ has been used in lumping in this iteration, merge these two cluster centers using the following relation:

$$Z_l^* = \frac{1}{N_{i_l} + N_{j_l}} \left[ N_{i_l} Z_{i_l} + N_{j_l} Z_{j_l} \right]$$

Delete $Z_{i_l}$ and $Z_{j_l}$ and reduce $N_c$ by 1.

Note that only pairwise lumping is allowed and that a lumped cluster center is obtained by weighting each old center by the number of samples in its domain. Experimental evidence indicates that more complex lumping can produce unsatisfactory results. It is also important to note that, since a cluster center can be lumped only once, this step will not always result in $L$ lumped centers.

Step 14. If this is the last iteration, the algorithm terminates. Otherwise go to step 1 if any of the process parameters require changing, or go to step 2 if the parameters are to remain the same for the next iteration.
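The core operations of the procedure above (assignment, discarding, and center update; the standard deviation vector; splitting; and pairwise lumping) can be sketched in plain Python. This is a simplified illustration, not the ISOCLUS implementation: all function names are hypothetical, samples are plain tuples, Euclidean distance is assumed, and the iteration control that decides when to split or lump is omitted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(x, centers):
    """Index of the center closest to sample x."""
    return min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))

def mean(points):
    """Component-wise mean of a non-empty list of samples."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]

def assign_discard_update(samples, centers, samprm):
    """Assign samples, drop subsets smaller than samprm, recompute centers."""
    min_size = max(samprm, 1)  # an empty subset has no mean, so always drop it
    while True:
        subsets = [[] for _ in centers]
        for x in samples:
            subsets[nearest(x, centers)].append(x)
        kept = [s for s in subsets if len(s) >= min_size]
        if not kept:
            raise ValueError("every cluster was discarded; lower samprm")
        centers = [mean(s) for s in kept]
        if len(kept) == len(subsets):  # nothing discarded, so stop reassigning
            return centers, kept

def stdv_vector(subset, center):
    """Per-channel standard deviation of a subset about its center."""
    n = len(subset)
    return [math.sqrt(sum((x[i] - center[i]) ** 2 for x in subset) / n)
            for i in range(len(center))]

def split(center, stdv, k=0.5):
    """Offset the center by +/- k * (largest stdv) along the widest channel."""
    i = max(range(len(stdv)), key=lambda c: stdv[c])
    plus, minus = list(center), list(center)
    plus[i] += k * stdv[i]
    minus[i] -= k * stdv[i]
    return plus, minus

def lump_pair(center_i, n_i, center_j, n_j):
    """Merge two centers, weighting each by its sample count."""
    return [(n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(center_i, center_j)]

samples = [(0, 0), (0, 2), (10, 10), (10, 12), (50, 50)]
centers, subsets = assign_discard_update(
    samples, [(0, 0), (10, 10), (50, 50)], samprm=2)
print(centers)
```

In the usage lines, the third initial center attracts a single sample, so its subset is discarded for having fewer than `samprm` members and that sample is reassigned on the next pass.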
© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.