VECCLEAN

Vector Cleaning


Environments: PYTHON :: EASI :: MODELER

Description


VECCLEAN corrects common digitizing errors in line work. It is also the first step in producing clean, topologically correct data. The cleaned output can take the form of lines, topological lines, or topological polygons.

Parameters


Name         Type     Caption                  Length    Value range
FILI *       String   Input file name          1 - 192
FILO         String   Output file name         0 - 192
DBVS1 *      Integer  Input layer              1 - 1
DBVS2        Integer  Reference layer          0 - 1
WEEDTOL      Float    Weed tolerance           0 - 48    0 -
                                                         Default: 0
SHOOTTOL     Float    Shoot tolerance          0 - 48    0 -
                                                         Default: 0
MERGETOL     Float    Merge tolerance          0 - 48    0 -
                                                         Default: 0
BREAKINTER   String   Break at intersections   0 - 5     TRUE, FALSE
                                                         Default: TRUE
REMNODES     String   Remove pseudo nodes      0 - 5     TRUE, FALSE
                                                         Default: FALSE
OUTLAYER     String   Output layer type        0 - 20    LINES, TOPOLOGICAL LINES, TOPOLOGICAL POLYGONS
                                                         Default: LINES

* Required parameter

Parameter descriptions

FILI

Specifies the name of the PCIDSK file containing the input lines to be cleaned.

FILO

Specifies the name of the PCIDSK file that will store the output cleaned lines.

If FILO is not specified, FILI is used for the output cleaned lines. If FILO does not exist, a new file is created.

DBVS1

Specifies the vector segment that contains the input lines to be cleaned. The input lines may be in an unstructured, line, or whole polygon layer. Points and topological polygons are invalid input for this parameter.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.
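
As an illustration of the range notation, the following Python sketch expands a list in the way described above. The function name expand_segment_list is hypothetical and is not part of VECCLEAN; the sketch only mirrors the documented rule that a negative value closes a range begun by the preceding value.

```python
def expand_segment_list(values):
    """Expand a list in which a negative value closes a range.

    Hypothetical helper for illustration only: [1, -4, 10] -> [1, 2, 3, 4, 10].
    """
    expanded = []
    for value in values:
        if value < 0:
            if not expanded:
                raise ValueError("a range needs a starting value before the negative one")
            # Continue from the value after the last one emitted, up to |value|.
            expanded.extend(range(expanded[-1] + 1, -value + 1))
        else:
            expanded.append(value)
    return expanded


print(expand_segment_list([1, -4, 10]))  # [1, 2, 3, 4, 10]
```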

DBVS2

Specifies the optional input reference layer. The reference layer must have the same projection and datum as the input layer. It is used for under/over shoot corrections: shapes in the reference layer are used to clip or extend shapes from the input layer. Elements in the reference layer are not modified. For example, this can be used to extend or clip lines in an input layer to a boundary reference layer.

Ranges of channels or segments can be specified with negative values. For example, {1,-4,10} is internally expanded to {1,2,3,4,10}. When you are not specifying a range in this way, only 48 numbers can be specified explicitly.

WEEDTOL

Specifies the tolerance to be used by the Weed process. Tolerance is specified in map units.

If this parameter is set to 0, the Weed process will not be used.

SHOOTTOL

Specifies the tolerance to be used by the Under/Over Shoot process. Tolerance is expressed in map units.

If this parameter is set to 0, the Shoot process will not be used.

MERGETOL

Specifies the tolerance to be used by the Merge process. Tolerance is specified in map units.

If this parameter is set to 0, the Merge process will not be used.

BREAKINTER

Specifies whether lines are broken where they cross or form a "T" intersection. This parameter is optional only when OUTLAYER is LINES; broken intersections are mandatory for topological output.
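
To make the idea of breaking at intersections concrete, here is a minimal Python sketch for two straight, two-point lines. It is not VECCLEAN's implementation: the function names are hypothetical, points are plain (x, y) tuples, and parallel or collinear overlaps are simply left unbroken.

```python
def segment_intersection(p1, p2, q1, q2, eps=1e-12):
    """Return the point where segment p1-p2 meets segment q1-q2, or None.

    Parallel and collinear segments are reported as None in this sketch.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if abs(denom) < eps:
        return None
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # relative position along p1-p2
    u = (qpx * ry - qpy * rx) / denom  # relative position along q1-q2
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def break_at_intersection(a, b):
    """Split two-point lines a and b at their intersection, if there is one."""
    point = segment_intersection(a[0], a[1], b[0], b[1])
    if point is None:
        return [a, b]
    pieces = []
    for start, end in (a, b):
        for piece in ((start, point), (point, end)):
            if piece[0] != piece[1]:  # drop the zero-length piece at a "T"
                pieces.append(piece)
    return pieces


crossing = break_at_intersection(((0, 0), (2, 2)), ((0, 2), (2, 0)))
tee = break_at_intersection(((0, 0), (2, 0)), ((1, 0), (1, 1)))
print(len(crossing), len(tee))  # 4 3
```

A crossing yields four lines and a "T" yields three, which is why every intersection becomes a node shared by the resulting lines.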

REMNODES

Specifies whether lines with common start/end points should be joined into one line. This join process is not based on attribute values; therefore, if this parameter is set to TRUE, all pseudo nodes are removed regardless of attribute values.

OUTLAYER

Specifies the type of output required. Supported values are LINES (the default), TOPOLOGICAL LINES, and TOPOLOGICAL POLYGONS.

For the Topological Polygons output type to be clean, the VECCLEAN-processed lines must have no remaining digitizing errors (that is, under/over shoots). Be sure to view the log file for the results of building the topological polygons.


Details

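VECCLEAN can perform the following clean-up procedures on the input line work, each controlled by the corresponding parameter: Weed (WEEDTOL), Under/Over Shoot (SHOOTTOL), Merge (MERGETOL), breaking lines at intersections (BREAKINTER), and removing pseudo nodes (REMNODES).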
VECCLEAN automatically performs the following processes:

VECCLEAN can drastically change the input lines depending on the size of the tolerances. For best results, the tolerances should be carefully selected. Too large a tolerance can have undesired effects on lines that weren't errors (for example, removing them or merging two lines into one).

The input layer can be an unstructured layer. If the layer contains a mixture of lines, whole polygons, and points, then the lines and whole polygons will be processed and the points will be ignored. This will be reported in the log file.

Processed under/over shoots result in "T" intersections. The new intersection results in the z-value of all three new nodes getting a linear interpolation of the z-values based on the vertices on either side of the intersection. The line that is being snapped to acts as the reference line to provide the linear interpolation.
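
The interpolation described above can be sketched in Python as follows. This is an illustration, not VECCLEAN's code: the function name interpolate_z is hypothetical, vertices are (x, y, z) tuples, and the sketch assumes the weight is the horizontal distance along the reference segment between the two vertices on either side of the new intersection.

```python
import math


def interpolate_z(v_before, v_after, x, y):
    """Linearly interpolate z at (x, y) on the reference segment v_before-v_after.

    (x, y) is assumed to lie on the segment between the two vertices.
    """
    seg_len = math.hypot(v_after[0] - v_before[0], v_after[1] - v_before[1])
    if seg_len == 0.0:
        return v_before[2]  # degenerate segment: both vertices coincide
    fraction = math.hypot(x - v_before[0], y - v_before[1]) / seg_len
    return v_before[2] + fraction * (v_after[2] - v_before[2])


# A node created one quarter of the way along a segment rising from z=10 to z=20.
print(interpolate_z((0.0, 0.0, 10.0), (4.0, 0.0, 20.0), 1.0, 0.0))  # 12.5
```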

Because the Merge process has two or more lines to merge together, there is no reference line for the interpolation. Therefore, the z-value is interpolated from the merged line segments.


Example

Read vector segment 25 from irvine.pix and store the cleaned vectors in a new file called oirvine.pix. No tolerances are specified, so the Weed, Shoot, and Merge processes are not run. Pseudo nodes are not removed, but the Break at Intersection option is used.

EASI>FILI       = "irvine.pix"
EASI>FILO       = "oirvine.pix"         ! Create a new file called 'oirvine.pix'
EASI>DBVS1      = 25                    ! Vector segment 25 contains the lines
EASI>DBVS2      =                       ! No reference layer is used
EASI>WEEDTOL    =                       ! No weed process
EASI>SHOOTTOL   =                       ! No shoot process
EASI>MERGETOL   =                       ! No merge process
EASI>BREAKINTER =                       ! Defaults to TRUE
EASI>REMNODES   =                       ! Defaults to FALSE
EASI>OUTLAYER   =                       ! Defaults to LINES
EASI>run vecclean

Algorithm

In a general sense, a weeding (filtering) algorithm is used to reduce the number of points (nodes) on a vector element without greatly affecting its general aspect.

The Douglas-Poiker weeding algorithm (also widely known as the Douglas-Peucker algorithm) works on a non-topological layer. Its running time depends on the physical layout of the line: lines with a lot of variation between points take longer to simplify than lines with the same number of points but less variation.

The parameters to the algorithm are:
The algorithm has the following variable:

The algorithm works in phases. The first step is to determine an anchor point and a floater. In the example below, at stage B, the anchor is p1 and the floater is p16. Notice how the anchor point remains stable through stages B to E while the floater moves. This is not to say that the anchor never moves; it only moves at a slower rate than the floater.


              p4       
   A        *  * *             *  *
        *         *                    *p16
       *p1        *  *p8      *
                      *        
                       * 
                        *
                           *p12


              p4       
   B        *  * *             *  *
       ___________*_______________________      
        *                              * p16
       *p1        *  *p8      *             ______
       _______________*______|____________  ______t
                       *     | d
                         *   | 
                             * p12


Phase 1

The line in the figure above requires 14+10+2+1 = 27 calculations of the value d (the perpendicular distance from a point to a corridor) before a single point is eliminated.

Stage 1.B: At the start, the anchor is one end of the line and the floater is the other end. In the case of a closed polygon, the anchor is still the first point, while the floater starts at the next-to-last point. The floater and the anchor establish the corridor direction, while the tolerance, t, establishes the corridor width. Once the initial corridor is established, the point furthest from the center of the corridor is determined. For stage B of the diagram above, that point is p12.

When the maximal point (p12) is determined, it is placed on the stack. This maximal point is called the maximum perpendicular bisector.

Determining the maximal point requires n-2 calculations of d (and the same number of comparisons). There is one comparison to determine whether the maximal point falls within the corridor. If the maximal point falls within the corridor, then every point between the anchor and the floater is eliminated.
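
The search for the maximal point can be sketched in Python as below. The names perpendicular_distance and maximal_point are hypothetical, points are (x, y) tuples, and d is taken here as the perpendicular distance to the straight line through the anchor and the floater. Whether a point counts as inside the corridor when d is at most t or at most t/2 is not stated in this section, so the comparison against the tolerance is left to the caller.

```python
import math


def perpendicular_distance(point, anchor, floater):
    """Distance d from point to the line through anchor and floater."""
    ax, ay = anchor
    fx, fy = floater
    px, py = point
    length = math.hypot(fx - ax, fy - ay)
    if length == 0.0:
        # Anchor and floater coincide: fall back to plain distance.
        return math.hypot(px - ax, py - ay)
    return abs((fx - ax) * (py - ay) - (fy - ay) * (px - ax)) / length


def maximal_point(points, anchor_idx, floater_idx):
    """Return (index, d) for the point strictly between anchor and floater
    with the largest d, or (None, -1.0) if there are no such points."""
    best_idx, best_d = None, -1.0
    for i in range(anchor_idx + 1, floater_idx):  # n - 2 evaluations of d
        d = perpendicular_distance(points[i], points[anchor_idx], points[floater_idx])
        if d > best_d:
            best_idx, best_d = i, d
    return best_idx, best_d


line = [(0, 0), (1, 1), (2, 3), (3, 1), (4, 0)]
print(maximal_point(line, 0, 4))  # (2, 3.0)
```

For the five-point line above, three values of d are computed (n - 2 with n = 5), and the single follow-up comparison of the result against the tolerance decides whether the intervening points are eliminated.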

At this stage, the maximal point is not within the corridor, so processing continues with stage C.

Stage 1.C: The anchor is still the same as in stage B, but the floater has now become p12, the previous maximal point. Now the maximal point between p1 and p12 is evaluated. It turns out to be p5. Determining that p5 is the maximal point requires 12 - 2 = 10 steps.

Since the distance at p5 still exceeds the tolerance, p5 is placed on the stack.

Stage 1.D: The floater point is p5, while the maximal distance is at p3. Again the maximal point is not within the tolerance so p3 is placed on the stack. The number of steps needed to determine that p3 is the maximal point is 2.

Stage 1.E: Finally, in the last stage of phase 1, a maximal point, p2, falls within the tolerance. p2 is thus eliminated from the line. If there were more points between the floater and the anchor at this stage, all of those points would be eliminated as well.

Phase 2

Each phase is determined by the movement of the anchor to a new position.

Now that p2 has been eliminated, the anchor can be moved. The anchor is moved to the last position of the floater, p3, and the floater is moved to the point on the top of the stack, which should be p5.

The maximal point has been found and it falls within the tolerance. Thus all points between p3 and p5 must be eliminated. Point p4 is deleted and p3 is connected with p5.

Phase 3

Point p4 has been deleted, so the anchor is now moved to p5. Point p12 is at the top of the stack, and the floater is moved to it. The maximal point is found, and it does not fall within the tolerance, so it is stored on the stack and the floater is moved to it.

The floater has moved to the maximal point found in C. Now a new maximal point must be found. The new one falls within the tolerance corridor, so every point between the anchor and the floater must be deleted.

The anchor has moved to a new position, so a new phase begins. The maximal point here is outside of the tolerance corridor, so it is pushed onto the stack.

This is a degenerate case; the anchor and the floater are directly linked, with no intervening points. Thus the anchor is moved to the floater, and the floater is moved to the next point on the stack.
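
The phases above can be put together in a short Python sketch that uses an explicit stack of floater positions. It is a sketch under stated assumptions rather than VECCLEAN's implementation: the names weed and perpendicular_distance are hypothetical, points are (x, y) tuples on an open line (the closed-polygon start rule is not handled), and a point is treated as inside the corridor when its distance d does not exceed the tolerance.

```python
import math


def perpendicular_distance(point, anchor, floater):
    """Distance d from point to the line through anchor and floater."""
    ax, ay = anchor
    fx, fy = floater
    px, py = point
    length = math.hypot(fx - ax, fy - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((fx - ax) * (py - ay) - (fy - ay) * (px - ax)) / length


def weed(points, tolerance):
    """Simplify an open line with the anchor/floater scheme described above."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    anchor = 0
    stack = [len(points) - 1]  # top of the stack is the current floater
    while stack:
        floater = stack[-1]
        # Find the maximal point strictly between the anchor and the floater.
        best_idx, best_d = None, -1.0
        for i in range(anchor + 1, floater):
            d = perpendicular_distance(points[i], points[anchor], points[floater])
            if d > best_d:
                best_idx, best_d = i, d
        if best_idx is not None and best_d > tolerance:
            # Outside the corridor: the maximal point becomes the new floater.
            stack.append(best_idx)
        else:
            # Inside the corridor, or the degenerate case with no intervening
            # points: drop everything in between, keep the floater, and move
            # the anchor to it. The next stack entry becomes the floater.
            kept.append(points[floater])
            anchor = stack.pop()
    return kept


line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0), (5, 0.1), (6, 0)]
print(weed(line, 0.5))  # [(0, 0), (2, 0), (3, 3), (4, 0), (6, 0)]
print(weed(line, 10))   # [(0, 0), (6, 0)]
```

The second call shows the effect of an overly large tolerance noted in the Details section: the peak at (3, 3) is weeded out along with the small wiggles.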

© PCI Geomatics Enterprises, Inc.®, 2026. All rights reserved.