3.3.9. pci.api.finder module

A framework for finding, filtering, and handling files from various input sources.

New in version 2018.

3.3.9.1. Examples

The following example searches the folder ‘/data’ for files with the extension ‘pix’ or ‘tif’, excluding files whose names start with ‘tmp’ or ‘bak’:

from pci.api.finder import MaskedFileExistsFilter
from pci.api.finder import ListBuildingFilenameHandler
from pci.api.finder import HandlingFilteredFilenameFinder

# ...
inclusion_masks = ['*.pix', '*.tif']
exclusion_masks = ['tmp*', 'bak*']

# Construct a filter that accepts files with extension 'pix' or 'tif'
# but excludes files that start with 'tmp' or 'bak'.
# The second argument (to_upper=True) makes the matching case-insensitive.
file_filter = MaskedFileExistsFilter(inclusion_masks, True, exclusion_masks)
# Construct a handler that builds a list from the accepted files
handler = ListBuildingFilenameHandler()
# Initialize the finder with the filter and handler (recursive=False)
finder = HandlingFilteredFilenameFinder(file_filter, handler, False)
# Find files in the folder '/data'. Every file found is passed to the filter,
# and only accepted files are passed to the handler
finder.find('/data')
print(handler.get_found_files())
print(handler.get_found_filenames())

If the folder ‘/data’ contains the files ‘strip_1.pix’, ‘Mosaic.Pix’, ‘THUMBNAIL.TIF’, and ‘tmpimage.pix’, the output is as follows. The mixed-case names match because to_upper is True:

[('/data', 'strip_1.pix'), ('/data', 'Mosaic.Pix'), ('/data', 'THUMBNAIL.TIF')]
['/data/strip_1.pix', '/data/Mosaic.Pix', '/data/THUMBNAIL.TIF']

3.3.9.2. Filename finders

class pci.api.finder.AbstractFilenameFinder(quiet=False)

Bases: object

Abstract class for finding files and folders.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

find(source)

Find all of the files in the given input source, source.

This implementation always raises a NotImplementedError.

class pci.api.finder.HandlingFilteredFilenameFinder(file_filter, accepted_file_handler, recursive=False, rejected_file_handler=None, ignored_file_handler=None, quiet=False, meta_dict_prototype={})

Bases: AbstractFilenameFinder

Class that finds files within an input source. It accepts or rejects files based upon an AbstractFilenameFilter. It sends accepted files to an AbstractFilenameHandler, rejected files to a separate AbstractFilenameHandler, and ignored files to another handler.

Ignored files and rejected files are subtly different. A rejected file is one you likely want to highlight to the user. An ignored file is something you usually don’t care about and don’t need to point out.

Initialize HandlingFilteredFilenameFinder with file_filter, which may be None; in that case, all files are accepted. Accepted files are handled by accepted_file_handler, rejected files by rejected_file_handler, and ignored files by ignored_file_handler. If recursive is True, the input source is searched recursively.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

The parameter meta_dict_prototype is an object that behaves like a dictionary. It can be a dictionary or any object that has the methods __getitem__ and __setitem__. A copy of this object is used to pass information between the filter and the handlers, so that the original object is not modified.

For each file found a new copy of meta_dict_prototype is passed to the filter via the meta_dict parameter of AbstractFilenameFilter.accept(). The same copy is then passed to the AbstractFilenameHandler.handle() method of the appropriate handler.
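The per-file flow described above can be sketched without the library. This is an illustration only, not the module's implementation: TiffFilter, Recorder, and process_file are hypothetical names, the handlers are looked up in a plain dictionary, and the use of a shallow copy.copy() is an assumption (the documentation says only that a copy is made).

```python
import copy


class TiffFilter:
    """Hypothetical filter: accept .tif, reject other names, ignore hidden files."""

    def accept(self, folder, filename, meta_dict):
        meta_dict["checked"] = True          # information passed on to the handler
        if filename.startswith("."):
            return None                      # ignored
        return filename.lower().endswith(".tif")


class Recorder:
    """Hypothetical handler that records what it receives."""

    def __init__(self):
        self.seen = []

    def handle(self, folder, filename, meta_dict):
        self.seen.append((filename, meta_dict))


def process_file(folder, filename, file_filter, handlers, meta_dict_prototype):
    # Each file gets its own copy, so the prototype is never modified.
    # A shallow copy is an assumption made for this sketch.
    meta_dict = copy.copy(meta_dict_prototype)
    # No filter means every file is accepted.
    result = True if file_filter is None else file_filter.accept(folder, filename, meta_dict)
    if result is None:
        outcome = "ignored"
    elif result:
        outcome = "accepted"
    else:
        outcome = "rejected"
    # The same copy that the filter filled is handed to the chosen handler.
    handler = handlers.get(outcome)
    if handler is not None:
        handler.handle(folder, filename, meta_dict)
    return outcome


prototype = {"project": "demo"}
accepted = Recorder()
outcomes = [
    process_file("/data", name, TiffFilter(), {"accepted": accepted}, prototype)
    for name in ("a.tif", "b.txt", ".cache")
]
```

In this sketch only ‘a.tif’ reaches the accepted handler, and the prototype dictionary is left unchanged after all three files have been processed.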

find(source)

Iterate over the files in source and handle the files matching the criteria tested by the file filter. Files that match the file filter’s criteria are passed to the registered file handler. Recursive search is done only if the source supports recursive search and this object is constructed with recursive=True.

The parameter source is either an absolute directory name, as a string or pathlib.PurePath, or an instance of InputSourceCollection.

This method calls HandlingFilteredFilenameFinder.terminate() to ensure that things are shut down correctly.

find_no_terminate(source)

Iterate over the files in source and handle the files matching the criteria tested by the file filter. Files that match the file filter’s criteria are passed to the registered file handler. Recursive search is done only if the source supports recursive search and this object is constructed with recursive=True.

The parameter source is either an absolute directory name, as a string or pathlib.PurePath, or an instance of InputSourceCollection.

This method does NOT call HandlingFilteredFilenameFinder.terminate(). Any clean up or resource deallocation should be done by the caller.
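One plausible use of find_no_terminate() is to search several sources with the same finder and terminate once at the end. The helper below is a hypothetical sketch: find_in_all is not part of the module, and it relies only on the documented find_no_terminate() and terminate() methods of whatever finder object is passed in.

```python
def find_in_all(finder, sources):
    """Search every source with one finder, then terminate exactly once."""
    try:
        for source in sources:
            # Does not shut anything down, so the finder can be reused.
            finder.find_no_terminate(source)
    finally:
        # The caller is responsible for cleanup, even if a search fails.
        finder.terminate()
```

The try/finally block ensures that terminate() still runs if one of the searches raises an exception.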

terminate()

Terminates the finder. This is called when all files have been handled. It allows you to close open files or connections.

It also terminates the handlers and filters registered.

set_filter(file_filter)

Set the filter to file_filter. The filter will determine if a file is accepted, rejected, or ignored. If file_filter is None then all files are accepted.

set_accepted_handler(file_handler)

Set the accepted handler to file_handler, which handles files that are accepted by the filter. A file is accepted when the filter returns True or when there is no filter.

set_rejected_handler(rejected_file_handler)

Set the rejected handler to rejected_file_handler, which handles files that are rejected by the filter. A file is rejected when the filter returns False.

set_ignored_handler(ignored_file_handler)

Set the ignored handler to ignored_file_handler, which handles files that are ignored by the filter. A file is ignored when the filter returns None.

set_recursive(recursive)

Search is recursive if recursive is True.

3.3.9.3. Filename filters

class pci.api.finder.AbstractFilenameFilter(quiet=False)

Bases: object

Abstract class that determines whether a file is acceptable. Any criteria can be used to determine whether the file is acceptable.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

accept(folder, filename, meta_dict)

Determine whether the given path (the folder and filename combination) is acceptable. If meta_dict is a dictionary, it will be filled by this filter. The names and types of objects added to the dictionary are dependent upon the filter implementation. This dictionary is passed to all filename handlers so that you can pass information between the filters and handlers.

There are three possible return values:

  1. True: means the file is accepted.

  2. False: means the file is rejected.

  3. None: means the file is ignored.

This implementation always raises a NotImplementedError.
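A custom filter overrides accept() and returns one of the three values listed above. The following is a minimal sketch under stated assumptions: MinimumSizeFilter and the ‘size_bytes’ key are hypothetical, and the stand-in base class exists only so the example runs where the library is not installed (it mimics the documented constructor signature and nothing else).

```python
import os

try:
    from pci.api.finder import AbstractFilenameFilter
except ImportError:
    # Stand-in used only when the library is unavailable (assumption:
    # the real base class accepts the same quiet argument).
    class AbstractFilenameFilter:
        def __init__(self, quiet=False):
            self.quiet = quiet


class MinimumSizeFilter(AbstractFilenameFilter):
    """Hypothetical filter that accepts files of at least min_bytes bytes."""

    def __init__(self, min_bytes, quiet=False):
        super().__init__(quiet)
        self.min_bytes = min_bytes

    def accept(self, folder, filename, meta_dict):
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            return None                      # not a regular file: ignore it
        size = os.path.getsize(path)
        if meta_dict is not None:
            meta_dict["size_bytes"] = size   # hypothetical key for handlers to read
        return size >= self.min_bytes        # True: accepted, False: rejected
```

Returning None for anything that is not a regular file keeps such entries out of the rejected handler, which matches the distinction between ignored and rejected files.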

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

You should avoid opening and closing files in the filter because it could cause a problem if you have a chain of filters where more than one filter opens/closes the same file.

This implementation does nothing.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

class pci.api.finder.ChainedFilenameFilter(filter_chain, quiet=False)

Bases: AbstractFilenameFilter

Chains a series of file filters. A file must be accepted by all filters to be accepted by the chain.

filter_chain specifies a list of filters and filters are tested in order. If filter_chain is empty or None then this chain filter always returns True. If any element of this list is None, that element is ignored.
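The chaining rules can be illustrated with a short sketch. It is not the module's implementation: chain_accept and Fixed are hypothetical names, and the way a mix of False and None results is combined (rejection taking priority over ignoring) is an assumption that this documentation does not confirm.

```python
class Fixed:
    """Hypothetical filter that always returns the same result."""

    def __init__(self, result):
        self.result = result

    def accept(self, folder, filename, meta_dict):
        return self.result


def chain_accept(filter_chain, folder, filename, meta_dict):
    # Filters are tested in order; None entries in the chain are skipped.
    results = [
        f.accept(folder, filename, meta_dict)
        for f in (filter_chain or [])
        if f is not None
    ]
    # Assumption: a rejection outranks an ignore when both occur.
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    # An empty or None chain has no results and therefore accepts.
    return True
```

With this sketch, chain_accept([], ...) and chain_accept(None, ...) both return True, which mirrors the documented behavior for an empty chain.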

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

set_filters(filter_chain)

filter_chain specifies a list of filters and filters are tested in order. If filter_chain is empty or None then this chain filter always returns True. If any element of this list is None, that element is ignored.

accept(folder, filename, meta_dict)

Determine whether the given path (the folder and filename combination) is acceptable. If meta_dict is a dictionary, it may be filled by this filter.

There are three possible return values:

  1. True: means the file is accepted.

  2. False: means the file is rejected.

  3. None: means the file is ignored.

This implementation calls accept on all filters in the chain.

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

This implementation calls terminate on all filters in the chain.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

class pci.api.finder.FileExistsFilter(quiet=False)

Bases: AbstractFilenameFilter

A filter that tests whether the file exists on disk or not.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

accept(folder, filename, meta_dict)

Determine whether the given path (the folder and filename combination) is acceptable. If meta_dict is a dictionary, it may be filled by this filter.

There are three possible return values:

  1. True: means the file is accepted.

  2. False: means the file is rejected.

  3. None: means the file is ignored.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

You should avoid opening and closing files in the filter because it could cause a problem if you have a chain of filters where more than one filter opens/closes the same file.

This implementation does nothing.

class pci.api.finder.MaskFilter(filename_masks, to_upper, ignore_rejected=False, quiet=False)

Bases: AbstractFilenameFilter

A filter that tests whether the filename is matched by one of the masks in a list of masks. Masks only check the filename part, not the folder. Masking may be case-sensitive or case-insensitive.

The filename_masks can be a single mask or list of masks. This mask is used to test against filename in accept(). The Python library function fnmatch.fnmatch() is used to determine the match. For optimal performance the masks should be in order of most likely to least likely. If to_upper is set to True then file names and masks are converted to upper case during matching to perform a case-insensitive compare. If ignore_rejected is True then accept() will return None to indicate that files should be ignored rather than rejected.
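The matching rules above can be approximated with the standard library. This is a sketch, not the module's code: mask_accept is a hypothetical function, and it uses fnmatch.fnmatchcase() instead of fnmatch.fnmatch() so that the result does not depend on the operating system's own case rules.

```python
import fnmatch


def mask_accept(filename, filename_masks, to_upper, ignore_rejected=False):
    # A single mask is treated as a one-element list.
    if isinstance(filename_masks, str):
        filename_masks = [filename_masks]
    if to_upper:
        # Upper-casing both sides gives a case-insensitive comparison.
        filename = filename.upper()
        filename_masks = [mask.upper() for mask in filename_masks]
    # any() stops at the first match, which is why the most likely
    # masks are best placed first.
    if any(fnmatch.fnmatchcase(filename, mask) for mask in filename_masks):
        return True
    return None if ignore_rejected else False


case_insensitive = mask_accept("Mosaic.Pix", ["*.pix", "*.tif"], True)
case_sensitive = mask_accept("Mosaic.Pix", ["*.pix", "*.tif"], False)
ignored = mask_accept("notes.txt", "*.pix", True, ignore_rejected=True)
```

Here case_insensitive is True, case_sensitive is False because ‘.Pix’ differs from ‘.pix’ in case, and ignored is None because ignore_rejected is set.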

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

accept(folder, filename, meta_dict)

Determine whether the given filename matches the filename mask.

There are three possible return values:

  1. True: means the filename mask matched the filename.

  2. False: means the filename mask did not match the filename.

  3. None: means the filename mask did not match the filename and ignore_rejected is set.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

You should avoid opening and closing files in the filter because it could cause a problem if you have a chain of filters where more than one filter opens/closes the same file.

This implementation does nothing.

class pci.api.finder.ExclusionMaskFilter(exclusion_filename_masks, to_upper, ignore_rejected=False, quiet=False)

Bases: AbstractFilenameFilter

A filter that tests whether the filename matches one of the exclusion masks. Masks only check the filename part, not the folder. Masking may be case-sensitive or case-insensitive.

The exclusion_filename_masks can be a single mask or list of masks. This mask is used to test against filename in accept(). The filename is accepted only if the file name does not match any of the exclusion masks. The Python library function fnmatch.fnmatch() is used to determine the match. For optimal performance the masks should be in order of most likely to least likely. If to_upper is set to True then file names and masks are converted to upper case during matching to perform a case-insensitive compare. If ignore_rejected is True then accept() will return None to indicate that files should be ignored rather than rejected.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

accept(folder, filename, meta_dict)

Determine whether the given filename does not match any exclusion filename mask.

There are three possible return values:

  1. True: means the exclusion filename mask did not match the filename.

  2. False: means the exclusion filename mask matched the filename.

  3. None: means the exclusion filename mask matched the filename and ignore_rejected is set.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

You should avoid opening and closing files in the filter because it could cause a problem if you have a chain of filters where more than one filter opens/closes the same file.

This implementation does nothing.

class pci.api.finder.MaskedFileExistsFilter(inclusion_filename_masks, to_upper, exclusion_filename_masks=None, quiet=False)

Bases: ChainedFilenameFilter

A filter that tests whether the file exists and whether its filename passes the masks. A file is accepted only if its name matches one of the inclusion masks and none of the exclusion masks.

The parameter inclusion_filename_masks may be an instance of MaskFilter; otherwise, an instance of MaskFilter is created from inclusion_filename_masks, to_upper, and quiet.

The parameter exclusion_filename_masks may be an instance of ExclusionMaskFilter; otherwise, an instance of ExclusionMaskFilter is created from exclusion_filename_masks, to_upper, and quiet.
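The combined effect of the inclusion and exclusion masks can be sketched with the standard library, using the file names from the example at the start of this section. This is an approximation only: matches_any and masked_accept are hypothetical, the comparison is always case-insensitive (as with to_upper=True), and the file-existence test is left out.

```python
import fnmatch


def matches_any(filename, masks):
    # Upper-case both sides, as to_upper=True is documented to do.
    name = filename.upper()
    return any(fnmatch.fnmatchcase(name, mask.upper()) for mask in masks)


def masked_accept(filename, inclusion_masks, exclusion_masks=None):
    included = matches_any(filename, inclusion_masks)
    # With no exclusion masks, nothing is excluded.
    excluded = bool(exclusion_masks) and matches_any(filename, exclusion_masks)
    return included and not excluded


names = ["strip_1.pix", "Mosaic.Pix", "THUMBNAIL.TIF", "tmpimage.pix"]
kept = [n for n in names if masked_accept(n, ["*.pix", "*.tif"], ["tmp*", "bak*"])]
```

The list kept contains the first three names; ‘tmpimage.pix’ matches an inclusion mask but is removed by the ‘tmp*’ exclusion mask.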

accept(folder, filename, meta_dict)

Determine whether the given path (the folder and filename combination) is acceptable. If meta_dict is a dictionary, it may be filled by this filter.

There are three possible return values:

  1. True: means the file is accepted.

  2. False: means the file is rejected.

  3. None: means the file is ignored.

This implementation calls accept on all filters in the chain.

set_filters(filter_chain)

filter_chain specifies a list of filters and filters are tested in order. If filter_chain is empty or None then this chain filter always returns True. If any element of this list is None, that element is ignored.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

This implementation calls terminate on all filters in the chain.

class pci.api.finder.ExclusionMaskedFileExistsFilter(exclusion_filename_masks, to_upper, quiet=False)

Bases: ChainedFilenameFilter

A filter that tests whether the file exists and whether its filename matches one of the exclusion masks in a list of masks; matching files are not accepted. Masks only check the filename part, not the folder. Masking may be case-sensitive or case-insensitive.

Creates a filter by chaining FileExistsFilter and ExclusionMaskFilter. The parameters exclusion_filename_masks and to_upper are used to create the ExclusionMaskFilter.

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

accept(folder, filename, meta_dict)

Determine whether the given path (the folder and filename combination) is acceptable. If meta_dict is a dictionary, it may be filled by this filter.

There are three possible return values:

  1. True: means the file is accepted.

  2. False: means the file is rejected.

  3. None: means the file is ignored.

This implementation calls accept on all filters in the chain.

set_filters(filter_chain)

filter_chain specifies a list of filters and filters are tested in order. If filter_chain is empty or None then this chain filter always returns True. If any element of this list is None, that element is ignored.

set_quiet(quiet)

When quiet is False, information is logged using log.info(); otherwise, it is logged using log.debug().

terminate()

Terminates the filter. This function is called when all files have been handled. It allows you to close open files or connections.

This implementation calls terminate on all filters in the chain.

3.3.9.4. Filename handlers

class pci.api.finder.AbstractFilenameHandler

Bases: object

Abstract class for handling files found by finders.

handle(folder, filename, meta_dict)

Handle the file filename in folder. The dictionary meta_dict has some extra information supplied by filters. The information contained in meta_dict and the way this information is handled is implementation dependent.

This implementation always raises a NotImplementedError.

terminate()

Terminate the handler. This function is called when all filenames have been handled. It allows you to close open files or connections.

This implementation does nothing.

class pci.api.finder.ListBuildingFilenameHandler

Bases: AbstractFilenameHandler

A handler that builds a list of (folder, filename) tuple pairs handled. Each pair handled is added to a list. Retrieve the list from the get_found_files() or get_found_filenames() methods.

handle(folder, filename, meta_dict)

Handles the given filename and folder. It adds the folder and filename, as a pair, to a list of found files. Use the method get_found_files() to get the list.

get_found_files()

Get the list of files. Each entry in the list is a two-element tuple where tuple[0] is the folder and tuple[1] is the filename.

get_found_filenames()

Get the list of files. Each entry in the list is a fully formed path and it is never None.

get_meta_dicts()

Get the list of meta_dict instances. There is one entry in the list for each entry in the list returned by get_found_files() or the list returned by get_found_filenames(). An entry in the meta_dict list may be None.
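Because the three lists are parallel, they can be walked together with zip(). The sketch below uses made-up sample values in place of real handler output: the ‘bands’ key is hypothetical, and joining each pair with posixpath.join() is an assumption based on the output shown in the example at the start of this section.

```python
import posixpath

# Sample values standing in for get_found_files() and get_meta_dicts().
found_files = [("/data", "strip_1.pix"), ("/data", "Mosaic.Pix")]
meta_dicts = [{"bands": 3}, None]        # an entry may be None

# Assumed relationship to get_found_filenames(): folder and filename joined.
found_filenames = [posixpath.join(folder, name) for folder, name in found_files]

summary = []
for path, meta in zip(found_filenames, meta_dicts):
    # Guard against None before reading from a meta_dict entry.
    summary.append((path, {} if meta is None else meta))
```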

clear_found_files()

Clear the list of files that have been found.

terminate()

Terminate the handler. This function is called when all filenames have been handled. It allows you to close open files or connections.

This implementation does nothing.

class pci.api.finder.MFileCreatorHandler(mfilename, relativize=True, header=None, sort=False, key_func=None)

Bases: ListBuildingFilenameHandler

Creates an MFILE from the found files. The MFILE can be used as an input to Geomatica algorithms that have an mfile parameter. This implementation is simple in that it only adds file names to the MFILE. No other parameters are written to the MFILE.

Initialize with mfilename as the output MFILE.

If relativize is True, the filenames written to the MFILE are relative to the MFILE name; otherwise, absolute paths are written to the MFILE. Relativization is done with pci.make_relative_path().

If header is set, it is written at the beginning of the created MFILE. If sort is True, the filenames written to the MFILE are sorted using key_func (see the Python documentation for the built-in sorted()).
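The effect of relativize, header, sort, and key_func can be sketched with the standard library. This is an illustration under assumptions, not the handler's implementation: build_mfile_lines is hypothetical, posixpath.relpath() stands in for pci.make_relative_path(), paths are taken relative to the folder that contains the MFILE, and sorting is applied before relativization.

```python
import posixpath


def build_mfile_lines(paths, mfilename, relativize=True, header=None,
                      sort=False, key_func=None):
    if sort:
        # Same semantics as the built-in sorted(); key_func may be None.
        paths = sorted(paths, key=key_func)
    base = posixpath.dirname(mfilename)
    # The header, when given, comes first.
    lines = [] if header is None else [header]
    for path in paths:
        # Stand-in for pci.make_relative_path() (assumption).
        lines.append(posixpath.relpath(path, base) if relativize else path)
    return lines


lines = build_mfile_lines(
    ["/data/in/b.pix", "/data/in/A.pix"],
    "/data/list.txt",
    sort=True,
    key_func=str.lower,      # case-insensitive ordering
)
```

With key_func=str.lower, ‘A.pix’ sorts before ‘b.pix’, so lines holds ‘in/A.pix’ followed by ‘in/b.pix’.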

get_mfile_name()

Get the name of the MFILE that is created by this class.

handle(folder, filename, meta_dict)

Handles the given filename and folder. The first call to this method is handled by _handle_first() and the rest by _handle_others().

get_mfile_entries()

Get the list of entries that will be written to the MFILE.

get_first_filename()

Returns the path to the first file.

set_header(header)

terminate()

Terminates the handler. This is called when all filenames have been handled.

_handle_first(folder, filename, meta_dict)

Handle the first file. This implementation opens the MFILE and writes the path of the first file represented by folder and filename. If the MFILE exists it is overwritten.

clear_found_files()

Clear the list of files that have been found.

get_found_filenames()

Get the list of files. Each entry in the list is a fully formed path and it is never None.

get_found_files()

Get the list of files. Each entry in the list is a two-element tuple where tuple[0] is the folder and tuple[1] is the filename.

get_meta_dicts()

Get the list of meta_dict instances. There is one entry in the list for each entry in the list returned by get_found_files() or the list returned by get_found_filenames(). An entry in the meta_dict list may be None.

_handle_others(folder, filename, meta_dict)

Handles the rest of the files (excluding the first). This implementation writes the path of the file represented by folder and filename.

_determine_mfile_line_content(folder, filename, meta_dict)

This implementation returns either an absolute path, which is the full path of the file represented by folder and filename, or a path relative to the output MFILE itself. Relativization is done with pci.make_relative_path().