Environments: PYTHON, EASI
Parameters
oartclass (filv, dbvs, trnmodel, filo, dbov, ftype, classfld)
Name | Type | Caption | Length | Value range |
---|---|---|---|---|
FILV* | str | Input vector file name | 1 - | |
DBVS* | List[int] | Segment number of vector layer | 1 - 1 | 1 - |
TRNMODEL* | str | XML file containing trained model | 1 - | |
FILO* | str | Output vector file name | 1 - | |
DBOV | List[int] | Output vector segment | 0 - 1 | |
FTYPE | str | Output file type | 0 - 3 | PIX, SHP (Default: PIX) |
CLASSFLD | str | Output field name of classes | 0 - | Default: Class |
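The constraints in the table can also be read as a pre-flight check. The sketch below is a hypothetical helper, `check_oartclass_args`, which is not part of the PCI API. It only restates the documented rules: the required parameters, exactly one DBVS segment, at most one DBOV segment, the .xml extension of the trained model, and the PIX and Class defaults. It assumes FTYPE values are given in uppercase, as they appear in the table.

```python
from typing import List, Optional

# Hypothetical pre-flight check that restates the documented OARTCLASS constraints.
# It does not call the PCI API; the helper name is illustrative only.
VALID_FTYPES = ("PIX", "SHP")


def check_oartclass_args(filv: str, dbvs: List[int], trnmodel: str, filo: str,
                         dbov: Optional[List[int]] = None, ftype: str = "",
                         classfld: str = "") -> dict:
    """Return the arguments with documented defaults filled in, or raise ValueError."""
    dbov = [] if dbov is None else dbov
    # FILV, TRNMODEL and FILO are required strings (length 1 or more).
    for name, value in (("FILV", filv), ("TRNMODEL", trnmodel), ("FILO", filo)):
        if not value:
            raise ValueError(f"{name} is required")
    # DBVS is required and holds exactly one segment number, 1 or greater.
    if len(dbvs) != 1 or dbvs[0] < 1:
        raise ValueError("DBVS must hold exactly one segment number >= 1")
    # DBOV is optional and holds at most one segment number.
    if len(dbov) > 1:
        raise ValueError("DBOV accepts at most one segment number")
    # The trained model written by OARTTRAIN must use the .xml extension.
    if not trnmodel.endswith(".xml"):
        raise ValueError("TRNMODEL must have the .xml extension")
    # Blank FTYPE and CLASSFLD fall back to the documented defaults.
    ftype = ftype or "PIX"
    if ftype not in VALID_FTYPES:
        raise ValueError(f"FTYPE must be one of {VALID_FTYPES}")
    return {"filv": filv, "dbvs": dbvs, "trnmodel": trnmodel, "filo": filo,
            "dbov": dbov, "ftype": ftype, "classfld": classfld or "Class"}
```

For example, passing `ftype=""` and `classfld=""` returns `"PIX"` and `"Class"`, which mirrors the defaults in the table.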
Parameter descriptions
FILV
The name of the file containing the segmentation vector layer to classify.
DBVS
The segment number of the vector layer that contains the vector polygons with attributes to classify.
TRNMODEL
The file created by running OARTTRAIN that contains the trained Random Trees (RT) model.
The file name extension must be .xml.
FILO
The name of the file to which to write the classification result.
If the output file does not exist, a new file is created. If the output file exists, it is modified. The output file can also be the same as the input vector file, provided the file is writable.
DBOV
The segment number of the vector layer of the output file to which to write the results of the classification.
When the output file does not exist, DBOV is ignored and must be left blank (the default); the results are written to the new file.
When the output file exists, and DBOV is specified, it must be the number of an existing vector segment that will be overwritten. When the output file exists, and DBOV is not specified, a new vector segment will be appended to the output file.
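The DBOV rules amount to a three-way decision. The following is a minimal sketch, assuming you already know whether FILO exists and which vector segments it holds; the helper `resolve_dbov` and its arguments are hypothetical. Because the text says DBOV must be left blank for a new file, this sketch treats a non-blank value as an error rather than silently ignoring it.

```python
from typing import List


def resolve_dbov(output_exists: bool, dbov: List[int],
                 existing_vector_segments: List[int]) -> str:
    """Describe where the classification result is written (hypothetical helper)."""
    if not output_exists:
        # New output file: DBOV must be blank; results go into the new file.
        if dbov:
            raise ValueError("leave DBOV blank when the output file does not exist")
        return "create new file"
    if dbov:
        # Existing file with DBOV: it must name an existing vector segment, which is overwritten.
        if dbov[0] not in existing_vector_segments:
            raise ValueError(f"segment {dbov[0]} is not an existing vector segment")
        return f"overwrite segment {dbov[0]}"
    # Existing file without DBOV: a new vector segment is appended.
    return "append new segment"
```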
FTYPE
The format of the output file: PIX (PCIDSK) or SHP (shapefile).
The default is PIX.
CLASSFLD
The field name of the output vector segment attribute table that contains the class names.
In the output file, a class name will be written to each object of the segmentation vector layer specified as input.
Return value
Returns: Segment number
Type: PCI_INT
Details
A typical workflow starts by running the OASEG algorithm to segment your image into a series of object polygons. Next, you would calculate a set of attributes (statistical, geometrical, textural, and so on) by running the OACALCATT algorithm. When you are working with SAR data, you would instead use OASEGSAR and OACALCATTSAR. You can then, in Focus Object Analyst, manually collect or import training samples for some land-cover or land-use classes; alternatively, use OAGTIMPORT for this task. The training samples are stored in a field of the segmentation attribute table with a default name of Training.
You can create the list of attributes by running OAFLDNMEXP. Alternatively, the list can be read directly from the table of segmentation attributes using field metadata that was created by OACALCATT or OACALCATTSAR.
Typically, you specify as input the segmentation vector layer from OASEG or OASEGSAR.
The training (OARTTRAIN) and classification (OARTCLASS) steps are distinct to allow you to reuse a trained Random Trees model for other segmentations, provided that the list of attributes is the same for all segmentations and calculated from similar images; that is, from the same sensor and in the same acquisition mode.
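Because a trained model can only be reused when the attribute list matches, it can help to compare the fields used for training with those available in the new segmentation before running OARTCLASS. The sketch below is hypothetical: it assumes both attribute lists are available as lists of field names, and the field names are invented. It conservatively also reports whether the order matches, since this page does not say whether order matters.

```python
from typing import List, Sequence, Tuple


def compare_attribute_lists(trained: Sequence[str],
                            target: Sequence[str]) -> Tuple[List[str], List[str], bool]:
    """Return (missing, extra, same_order) for a target segmentation's attribute fields."""
    missing = [f for f in trained if f not in target]  # needed by the model but absent
    extra = [f for f in target if f not in trained]    # present but unused by the model
    same_order = list(trained) == list(target)         # stricter, order-sensitive check
    return missing, extra, same_order


# Hypothetical field names for a model trained on one image and a new segmentation.
print(compare_attribute_lists(["Mean_B1", "Area"], ["Area", "Mean_B2"]))
```

An empty `missing` list is the minimum requirement; a model should only be reused when both lists are empty.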
A single decision tree is easy to conceptualize but typically suffers from high variance, which makes it uncompetitive in terms of accuracy.
One way to overcome this limitation is to produce many variants of a single decision tree, each trained on a different subset of the same training set, in the context of randomization-based ensemble methods (Breiman, 2001). Random Forest Trees (RFT) is a machine learning algorithm based on decision trees. Random Trees (RT) belong to a class of machine learning algorithms that perform ensemble classification. The term ensemble refers to a method that makes predictions by averaging over the predictions of several independent base models.
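With bagging, each tree is trained on a bootstrap sample: a set the same size as the training set, drawn with replacement. The following is a minimal pure-Python sketch of that sampling step, not the OpenCV code. The out-of-bag helper shows that each tree leaves out roughly 1/e (about 37%) of the training samples, which is what makes the trees differ.

```python
import random
from typing import List, Sequence


def bootstrap_sample(n_samples: int, rng: random.Random) -> List[int]:
    """Indices of one bootstrap sample: n draws with replacement from range(n)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag(sample: Sequence[int], n_samples: int) -> List[int]:
    """Training indices this tree never saw (roughly 1/e of them for large n)."""
    drawn = set(sample)
    return [i for i in range(n_samples) if i not in drawn]


rng = random.Random(42)
n = 1000
samples = [bootstrap_sample(n, rng) for _ in range(3)]  # one sample per tree
for s in samples:
    # sample size, distinct samples used, samples left out of this tree
    print(len(s), len(set(s)), len(out_of_bag(s, n)))
```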
The Random Forest algorithm, hereafter called Random Trees for trademark reasons, was originally conceived by Breiman (2001) "as a method of combining several CART style decision trees using bagging [...] Since its introduction by Breiman (2001) the random forests framework has been extremely successful as a general-purpose classification and regression method" (Denil et al., 2014).
The fundamental principle of ensemble methods based on randomization “is to introduce random perturbations into the learning procedure in order to produce several different models from a single learning set L and then to combine the predictions of those models to form the prediction of the ensemble” (Louppe, 2014). In other words, "significant improvements in classification accuracy have resulted from growing an ensemble of trees and letting them vote for the most popular class. In order to grow these ensembles, often random vectors are generated that govern the growth of each tree in the ensemble" (Breiman, 2001).
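Once each tree has predicted a class for an object, the ensemble prediction is the most popular class. The following is a minimal sketch of that vote, assuming each tree's prediction is available as a class label; the class names are hypothetical. Ties are broken by first occurrence here, which is an arbitrary choice of this sketch and not necessarily what OpenCV does.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def ensemble_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Most popular class among the per-tree predictions for one object."""
    # most_common(1) -> [(label, count)]; on a tie, the label encountered first wins.
    return Counter(tree_predictions).most_common(1)[0][0]


def classify_objects(votes_per_object: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Apply the vote to every segmentation object (one row of tree outputs per object)."""
    return [ensemble_vote(votes) for votes in votes_per_object]


# Hypothetical outputs of five trees for two objects.
print(classify_objects([["Forest", "Water", "Forest", "Urban", "Forest"],
                        ["Water", "Water", "Urban", "Urban", "Water"]]))
```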
"There are three main choices to be made when constructing a random tree. These are (1) the method for splitting the leafs, (2) the type of predictor to use in each leaf, and (3) the method for injecting randomness into the trees" (Denil et al., 2014). A common technique for introducing randomness into a tree "is to build each tree using a bootstrapped or sub-sampled data set. In this way, each tree in the forest is trained on slightly different data, which introduces differences between the trees" (Denil et al., 2014). Randomization can also occur by randomizing "the choice of the best split at a given node... experiments show however that when noise is important, Bagging usually yield better results" (Louppe, 2014).
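The two sources of randomness quoted above can be sketched as sampling functions. The first draws a sub-sampled training set without replacement (the bootstrap variant draws with replacement). The second restricts a node to a random subset of attributes, the per-node random attribute selection used in Breiman's Random Forests, which is one common way to randomize the choice of split. The square-root default and the attribute names are assumptions of this sketch, not documented OARTTRAIN settings.

```python
import math
import random
from typing import List, Sequence


def subsample(n_samples: int, fraction: float, rng: random.Random) -> List[int]:
    """Sub-sampled training set: a fraction of the indices drawn without replacement."""
    return rng.sample(range(n_samples), max(1, int(n_samples * fraction)))


def candidate_attributes(names: Sequence[str], rng: random.Random, k: int = 0) -> List[str]:
    """Random subset of attributes a node may split on; k=0 uses the sqrt(n) heuristic."""
    k = k or max(1, int(math.sqrt(len(names))))
    return rng.sample(list(names), k)


rng = random.Random(7)
attrs = [f"attr_{i}" for i in range(9)]  # hypothetical attribute names
print(subsample(10, 0.5, rng))           # 5 distinct training indices
print(candidate_attributes(attrs, rng))  # 3 attributes for this node
```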
When optimizing a Random Trees model, “special care must be taken so that the resulting model is neither too simple nor too complex. In the former case, the model is indeed said to underfit the data, i.e., to be not flexible enough the capture the structure between X and Y. In the latter case, the model is said to overfit the data, i.e., to be too flexible and to capture isolated structures (i.e., noise) that are specific to the learning set" (Louppe, 2014).
In general, the Random Trees classifier, unlike the Support Vector Machine (SVM), can handle a mix of categorical and numerical variables. Random Trees is also less sensitive to data scaling, whereas SVM often requires the data to be normalized before training and classification. However, SVM is reported to perform better when the training set is small or unbalanced. The Random Trees classifier is computationally less intensive than SVM and works better and faster with large training sets.
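The insensitivity to scaling follows from how a tree splits. A threshold test such as `value <= t` depends only on the order of the values of one attribute, so any strictly increasing rescaling yields the same candidate partitions. An SVM, by contrast, works with distances or dot products across attributes, which change when one attribute is rescaled. The following small sketch uses invented attribute values and checks this for a linear rescaling.

```python
from typing import FrozenSet, Sequence, Set


def threshold_partitions(values: Sequence[float]) -> Set[FrozenSet[int]]:
    """Every left-hand index set that a split 'value <= t' on one attribute can produce."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parts = set()
    for cut in range(1, len(order)):
        # A threshold can only fall between two distinct values.
        if values[order[cut - 1]] < values[order[cut]]:
            parts.add(frozenset(order[:cut]))
    return parts


raw = [0.12, 0.50, 0.31, 0.90, 0.05]       # hypothetical attribute values
rescaled = [1000.0 * v + 5.0 for v in raw]  # e.g. different units plus an offset
print(threshold_partitions(raw) == threshold_partitions(rescaled))  # True
```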
Many versions of the Random Trees algorithm exist. Object Analyst uses the OpenCV implementation, which uses the Gini impurity index to determine a good split point for each node of a classification tree, and uses the minimum number of samples, the maximum tree depth, and the accuracy of the trees as stopping criteria. An in-depth review of popular Random Trees implementations is provided in Section 5.4.2 of Louppe (2014).
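The Gini impurity of a node is 1 minus the sum of p_k squared, where p_k is the fraction of the node's samples that belong to class k. A pure node scores 0. The following is a minimal pure-Python sketch of choosing a split threshold on a single attribute with this index. It includes two of the node-level stopping criteria named above (maximum depth and minimum number of samples), plus the trivial pure-node case. It is not the OpenCV implementation, and it does not model the forest-level accuracy criterion. The default limits and class names are invented for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of the class labels reaching a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[str], depth: int = 0,
               max_depth: int = 5, min_samples: int = 2) -> Optional[Tuple[float, float]]:
    """(threshold, weighted child Gini) of the best split on one attribute, or None to stop."""
    n = len(labels)
    # Stopping criteria: depth limit reached, too few samples, or node already pure.
    if depth >= max_depth or n < min_samples or gini(labels) == 0.0:
        return None
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Children's impurities weighted by their share of the node's samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


vals = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labs = ["Water", "Water", "Water", "Forest", "Forest", "Forest"]
print(best_split(vals, labs))  # (6.5, 0.0)
```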
Example
```python
from pci.oartclass import oartclass

filv = "l7_ms_seg25_0.5_0.5.pix"
dbvs = [2]
trnmodel = "rt_model_1.xml"  # random trees (RT) model created by OARTTRAIN
filo = "l7_ms_seg25_0.5_0.5.pix"  # same as filv
dbov = []  # a new segment will be created
ftype = ""  # blank: defaults to PIX
classfld = "rt1_class"

oartclass(filv, dbvs, trnmodel, filo, dbov, ftype, classfld)
```
© PCI Geomatics Enterprises, Inc.®, 2024. All rights reserved.