KNN

Description

KNN performs supervised classification using the K-Nearest Neighbor method under either resubstitution or independent classification paradigms.

Parameters

Name	Type	Caption	Length	Value range
FILE *	String	Input file name	1 - 192
DBSA *	Integer	Input sub-area channel	1 -
DBIC *	Integer	Input channel to be classified	1 -
DBBS *	Integer	Input class bitmap segments	1 - 48
DBOC *	Integer	Output theme map channel	1 - 1
MASK	Integer	Area mask (window or bitmap)	0 - 4	Xoffset, Yoffset, Xsize, Ysize
KVALUE	Integer	Number of nearest neighbors	0 - 1	1 - Default: 5
MAXSAM	Integer	Maximum number of samples per class	0 - 1	1 - Default: 200
REPORT	String	Report mode	0 - 192	TERM, DISK, OFF, <filename> Default: TERM
MONITOR	String	Monitor mode	0 - 3	ON, OFF Default: ON

* Required parameter

Parameter descriptions

FILE

Specifies the name of the PCIDISK file that contains the training set image channels, the channels to be classified, and the class bitmap segments.

DBSA

Specifies the channel(s) that supply the training set data. These may be either classified channels created by one of the classification functions (for example, ISOCLUS) or multispectral image channels.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.
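The range notation can be illustrated with a short Python sketch. This is only an assumption about how the expansion behaves, based on the {1,-4,10} example above; expand_ranges is a hypothetical helper, not part of KNN.

```python
def expand_ranges(values):
    """Expand a list such as [1, -4, 10] into [1, 2, 3, 4, 10].

    Hypothetical helper: a negative value -n is read as "continue the
    range from the previous value up to and including n".
    """
    expanded = []
    for value in values:
        if value < 0:
            if not expanded:
                raise ValueError("a range end must follow a start value")
            start, end = expanded[-1], -value
            if end <= start:
                raise ValueError("range end must exceed range start")
            # The start value is already in the list; append the rest.
            expanded.extend(range(start + 1, end + 1))
        else:
            expanded.append(value)
    return expanded


print(expand_ranges([1, -4, 10]))
```

Under the same assumption, the list 9,11,-16 expands to segments 9, 11, 12, 13, 14, 15, and 16.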

DBIC

Specifies the channel(s) to be classified. This parameter must specify the same number of channels as the input sub-area channels (DBSA/InputB).

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.

DBBS

Specifies the bitmap (type 101) segments containing training sites to use in the classification.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.

DBOC

Specifies the channel to receive the resulting theme map. Only one output channel may be specified. The theme map will contain as many theme classes as there are DBBS (InputBitmap) values.

MASK

Specifies the window or bitmap that defines the area to be processed within the input raster.

If a single value is specified, that value is the segment number of a bitmap in the input file. Only the pixels under the bitmap are processed; the rest of the image remains unchanged.

If four values are specified, they define the x,y offsets and x,y dimensions of a rectangular window identifying the area to process. Xoffset, Yoffset define the upper-left starting pixel coordinates of the window. Xsize is the number of pixels that define the window width. Ysize is the number of lines that define the window height.

If no value is specified, the entire channel is processed.
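The three forms of MASK can be summarized in a small Python sketch. The function resolve_mask and its return values are hypothetical and only restate the rules above; they do not describe how KNN is implemented.

```python
def resolve_mask(mask, image_width, image_height):
    """Interpret a MASK value list (hypothetical helper).

    Returns ("window", xoffset, yoffset, xsize, ysize) for a rectangular
    area, or ("bitmap", segment_number) for a bitmap mask.
    """
    if len(mask) == 0:
        # No value: process the entire channel.
        return ("window", 0, 0, image_width, image_height)
    if len(mask) == 1:
        # One value: a bitmap segment defines the area to process.
        return ("bitmap", mask[0])
    if len(mask) == 4:
        xoffset, yoffset, xsize, ysize = mask
        return ("window", xoffset, yoffset, xsize, ysize)
    raise ValueError("MASK takes 0, 1, or 4 values")


kind, xoffset, yoffset, xsize, ysize = resolve_mask([10, 20, 100, 100], 512, 512)
print(kind, xsize * ysize)  # window size in pixels
```

A window of 100 pixels by 100 lines covers 10000 pixels.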

KVALUE

Specifies the number of neighbors (k) to be used. A k value between 1 and 10 is usually effective; the default value is 5. The value of this parameter must be a positive integer.

MAXSAM

Specifies the maximum number of samples per training class. The default value is 200.

REPORT

Specifies where to direct the generated report.

Available options are:

TERM: generates a report on the terminal (default)
DISK: generates a report on file "IMPRPT.LST"
OFF: turns off report generation
<filename>: appends a report to the specified file

MONITOR

The program progress can be monitored by printing the percentage of processing completed. A system parameter, MONITOR, controls this activity.

Available options are:

ON: Turns the monitor ON (default)
OFF: Turns the monitor OFF (recommended if running in batch or background mode)

Details

KNN performs non-parametric supervised classification using the K-Nearest Neighbor (k-NN) algorithm. Both training and unclassified data sets must be provided as image channels and not as class signature segments.

The training set is created by reading in all image data from the input sub-area channels contained in the specified class bitmap segments. Each bitmap corresponds to one class, which is labeled using the bitmap segment number.

Samples from the unclassified input channels (DBIC) that lie within the area specified by MASK (InputBitmapMask) are classified. For each unclassified sample, KNN computes the Euclidean distance between its feature vector and the feature vector of every training sample, then finds the labels of the k closest training samples (k is specified by KVALUE). The unclassified sample is assigned to the class that occurs most often among those k labels. In the event of a tie, the algorithm chooses the tied class whose training sample is nearest to the unclassified sample. Typical k values range from 1 to 10, with larger values needed for noisy or high-dimensional data.
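The decision rule described above can be sketched in plain Python. This is a minimal illustration, not the KNN program's code: knn_classify and its data layout are hypothetical, and the tie-break is one reading of the rule (the tied class with the nearest training sample wins).

```python
import math
from collections import Counter


def knn_classify(sample, training, k):
    """Label one feature vector by k-nearest-neighbor voting.

    training is a list of (feature_vector, label) pairs; labels here are
    bitmap segment numbers. Hypothetical sketch of the rule in the text.
    """
    # Euclidean distance to every training sample, nearest first.
    neighbors = sorted(
        ((math.dist(sample, features), label) for features, label in training),
        key=lambda pair: pair[0],
    )[:k]

    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    tied = {label for label, count in votes.items() if count == best}

    # Neighbors are ordered by distance, so the first label that belongs
    # to a tied class is the one with the nearest training sample.
    for _, label in neighbors:
        if label in tied:
            return label


training = [((0, 0), 9), ((1, 0), 9), ((5, 5), 11), ((6, 5), 11)]
print(knn_classify((0.5, 0), training, k=3))
```

With k = 3, two of the three nearest training samples carry label 9, so the sample is assigned to class 9.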

It is possible to use the same data for both training and unclassified sets. This is considered classification by resubstitution. The sample being classified is automatically excluded from the list of potential k-NNs during resubstitution.
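A minimal Python sketch of resubstitution follows. It assumes the exclusion is done by sample position in the list; resubstitution_labels is a hypothetical name and the data layout is illustrative only.

```python
import math
from collections import Counter


def resubstitution_labels(samples, k):
    """Classify every training sample against the other training samples.

    samples is a list of (feature_vector, label) pairs. The sample being
    classified is left out of its own neighbor search (hypothetical sketch).
    """
    predicted = []
    for i, (features, _) in enumerate(samples):
        # Exclude the sample itself; otherwise it would always be its
        # own nearest neighbor at distance zero.
        others = [s for j, s in enumerate(samples) if j != i]
        nearest = sorted(others, key=lambda s: math.dist(features, s[0]))[:k]
        votes = Counter(label for _, label in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


samples = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(resubstitution_labels(samples, k=1))
```

Without the exclusion, every sample would match itself and a k = 1 resubstitution test would report no errors regardless of how well the classes separate.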

The k-NN classifier can require a large amount of computation, because each unclassified pixel is compared to every training pixel. Take care when creating the class bitmaps so that each one is representative of its cover class. To limit the computation, you can also specify a maximum population size for each class training set with the MAXSAM (Maximum Number of Samples per Class) parameter. If MAXSAM is not specified, a default value of 200 is used.
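The effect of MAXSAM on the workload can be sketched as follows. The text does not say how KNN selects samples when a class exceeds MAXSAM, so the evenly spaced selection in cap_samples is purely an assumption; both function names are hypothetical.

```python
def cap_samples(samples, maxsam=200):
    """Limit one class training set to at most maxsam samples.

    Assumption: samples are picked at evenly spaced positions. The actual
    selection method used by KNN is not documented here.
    """
    if len(samples) <= maxsam:
        return list(samples)
    step = len(samples) / maxsam
    return [samples[int(i * step)] for i in range(maxsam)]


def max_comparisons(unclassified_pixels, class_count, maxsam):
    """Upper bound on the number of distance computations."""
    return unclassified_pixels * class_count * maxsam


capped = cap_samples(list(range(1000)), maxsam=200)
print(len(capped))
print(max_comparisons(10000, 7, 220))
```

For a 100 by 100 window, 7 classes, and MAXSAM = 220, the bound is 10000 x 7 x 220 = 15,400,000 distance computations.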

The error rate of the k-NN classifier has been shown to approach the Bayes optimal error (the lowest error rate any classifier can achieve) asymptotically, that is, as the number of training samples grows and k is chosen appropriately. This property holds for both parametric and non-parametric class-conditional probability density functions. In addition, the k-NN classifier does not require global dimensionality reduction of the training feature space to produce accurate and precise results. For specific guidance on designing a k-NN classifier, especially the choice of k and MAXSAM, refer to texts such as Fukunaga (1990).

Example

Label a portion of irvine.pix using the k-NN classifier. The top-left corner of the region to be labeled is at pixel 10, line 20. The region is a rectangular window of 100 pixels by 100 lines. Use Landsat MSS channels 2 and 4 as features for the classifier.

Use the bitmaps in segments 9, 11, 12, 13, 14, 15, and 16 to indicate training sub-areas for the classifier. Set the value of k to 2 and limit the number of samples per class to 220. Save the output theme map to channel 8.

EASI>FILE    =   "irvine.pix"    ! input file name
EASI>DBSA    =   2,4             ! image channels for training
EASI>DBIC    =   2,4             ! image channels to be classified
EASI>DBBS    =   9,11,-16        ! training set classes
EASI>DBOC    =   8               ! channel for output theme map
EASI>MASK    =   10,20,100,100   ! window (mask area) to be classified
EASI>KVALUE  =   2               ! number of nearest neighbors (k)
EASI>MAXSAM  =   220             ! maximum training class size
EASI>REPORT  =   ""              ! send report to terminal

EASI>RUN KNN

The following report appears upon completion, where:

Seg: segment number of class bitmap
Name: name of class bitmap segment
Code: identification value (code) of class bitmap (pixel value used to encode theme map)
Pixels: number of pixels in class
%Image: percentage of image covered by class

irvine.pix    [11 Channels     512P 512L] 

Seg   Name    Code    Pixels    %Image

9    Water1    0         553      5.53
11   Urban     1        2342     23.42
12   Range     2        4033     40.33
13   Crop1     3        2532     25.32
14   Crop2     4         169      1.69
15   Crop3     5         129      1.29
16   Forest    6         242      2.42

     Total             10000    100.00
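As a check on the report, the %Image column can be reproduced from the Pixels column. The short Python sketch below assumes that %Image is the class pixel count divided by the number of pixels in the classified window (10000 here); the variable names are illustrative.

```python
# Pixel counts per class bitmap segment, copied from the report above.
pixels = {9: 553, 11: 2342, 12: 4033, 13: 2532, 14: 169, 15: 129, 16: 242}

total = sum(pixels.values())

# Percentage of the classified window covered by each class.
percent = {seg: round(100 * count / total, 2) for seg, count in pixels.items()}

print(total)
for seg, value in percent.items():
    print(seg, value)
```

The total equals the 100 by 100 window size given in MASK.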

Note: The user could have used a bitmap segment to specify the region to be classified. The bitmap could be created by first using DCP to trace its outline on a graphic plane while the image is displayed. The graphic plane could then be saved as a bitmap segment using VIB. The use of a bitmap segment as a mask is necessary when classifying a non-rectangular region.

References

K. Fukunaga (1990). Introduction to Statistical Pattern Recognition. Academic Press, Boston.

Environments	PYTHON :: EASI :: MODELER