pyxx.files.File

class pyxx.files.File(path: str | Path | None = None)

Bases: object

Base class for processing files of any type (text or binary)

This class represents an arbitrary file, which may or may not exist on disk. After creating an instance of this class, it is possible to perform operations such as calculating file hashes and tracking whether the file has been modified.

Methods

__init__([path])

Define an arbitrary file

clear_file_hashes()

Clears any stored file hashes

compute_file_hashes([hash_functions, store])

Computes hashes of the file specified by the path attribute

has_changed()

Returns whether the file specified by the path attribute has changed since the last time file hashes were computed

set_read_metadata([path])

Configures metadata related to a file to be read from disk

store_file_hashes([hash_functions])

Computes and stores hashes of the file specified by the path attribute

track_new_file(path[, hash_functions])

Shortcut for simultaneously modifying the path attribute and storing file hashes

Attributes

hashes

A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute

path

Path describing the location of the file on the disk

__init__(path: str | Path | None = None)

Define an arbitrary file

Creates an object that represents and can be used to process a file of any type (text or binary).

Parameters:

path (str or pathlib.Path, optional) – Path describing the location in the file system of the file that the object is to represent (default is None)

property hashes: Dict[str, str]

A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute

property path: Path | None

Path describing the location of the file on the disk

Assigning a value to this attribute (regardless of whether it matches the current value or is a different path) saves the value as a pathlib.Path and automatically clears any saved file hashes.
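The two property behaviors described above (hashes returns a copy, and assigning path always clears stored hashes) can be illustrated with a small stand-in class. This is a minimal sketch, not the pyxx implementation: PathSketch, its private attributes, and the handling of None are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Dict, Optional, Union


class PathSketch:
    """Hypothetical stand-in mimicking the documented property behavior."""

    def __init__(self) -> None:
        self._path: Optional[Path] = None
        self._hashes: Dict[str, str] = {}

    @property
    def path(self) -> Optional[Path]:
        return self._path

    @path.setter
    def path(self, value: Union[str, Path, None]) -> None:
        # Assumption: None is stored unchanged, anything else becomes a Path
        self._path = None if value is None else Path(value)
        # Stored hashes are cleared on every assignment, even to the same path
        self._hashes.clear()

    @property
    def hashes(self) -> Dict[str, str]:
        # Return a copy so callers cannot modify the stored dictionary
        return dict(self._hashes)


tracked = PathSketch()
tracked.path = 'report.txt'              # stored as a pathlib.Path
tracked._hashes['md5'] = 'placeholder'   # pretend a hash was stored earlier

snapshot = tracked.hashes                # a copy of the stored dictionary
snapshot['md5'] = 'edited'               # does not affect the stored value

stored_before = tracked.hashes
tracked.path = 'report.txt'              # same value, hashes are still cleared
stored_after = tracked.hashes
```

Returning a copy keeps the stored hashes consistent with the file they were computed from, since callers cannot edit them by accident.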

clear_file_hashes() → None

Clears any stored file hashes

compute_file_hashes(hash_functions: tuple | str = ('md5', 'sha256'), store: bool = False) → Dict[str, str]

Computes hashes of the file specified by the path attribute

Computes and returns the hashes of the file specified by the path attribute, with the option to populate the hashes dictionary with their values.

Parameters:
  • hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

  • store (bool, optional) – Whether to store the computed hashes in the hashes dictionary (default is False)

Returns:

A dictionary containing the file hashes specified by hash_functions

Return type:

dict

See also

pyxx.files.compute_file_hash

Function used to compute file hashes

Notes

Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().
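To show what computing several hashes of one file involves, the sketch below uses only hashlib from the standard library. It is a stand-in under stated assumptions, not the code behind compute_file_hashes() or pyxx.files.compute_file_hash: the function name compute_hashes, the chunk size, and the single-pass strategy are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Tuple, Union


def compute_hashes(
    path: Path,
    hash_functions: Union[Tuple[str, ...], str] = ('md5', 'sha256'),
) -> Dict[str, str]:
    """Hypothetical helper: hex digests of a file, keyed by hash name."""
    # Accept a single hash name as well as a tuple of names
    if isinstance(hash_functions, str):
        hash_functions = (hash_functions,)

    hashers = {name: hashlib.new(name) for name in hash_functions}

    # Read in chunks so large files are not loaded into memory at once
    with open(path, 'rb') as file:
        for chunk in iter(lambda: file.read(65536), b''):
            for hasher in hashers.values():
                hasher.update(chunk)

    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_bytes(b'hello')

    digests = compute_hashes(sample)          # default: md5 and sha256
    single = compute_hashes(sample, 'sha1')   # a single name as a string
```

With the class itself, the corresponding call would be compute_file_hashes() on a File instance whose path attribute is already set, passing store=True to also populate the hashes dictionary.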

has_changed() → bool

Returns whether the file specified by the path attribute has changed since the last time file hashes were computed

Returns:

Whether the file has changed since the last time file hashes were computed

Return type:

bool
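The idea behind hash-based change detection can be sketched with the standard library alone. The documentation does not state how has_changed() compares hashes, so the approach below (recompute a digest and compare it with the stored one) is an assumption, and sha256_of is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hypothetical helper: SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    tracked = Path(tmp) / 'data.bin'
    tracked.write_bytes(b'version 1')

    # Analogous to storing file hashes for later comparison
    stored = {'sha256': sha256_of(tracked)}

    # No modification yet: the recomputed digest matches the stored one
    changed_before = sha256_of(tracked) != stored['sha256']

    # After the content is rewritten, the digests differ
    tracked.write_bytes(b'version 2')
    changed_after = sha256_of(tracked) != stored['sha256']
```

Comparing content hashes detects changes to the bytes of the file, independent of timestamps.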

set_read_metadata(path: str | Path | None = None) → None

Configures metadata related to a file to be read from disk

This method performs several pre-processing steps to prepare to read a file from the disk:

  1. Sets the path attribute. If the path argument was provided, the attribute is set to this value; otherwise, the existing value stored in the path attribute is used (or an error is raised if it is not defined).

  2. Verifies that the file specified by the path attribute exists.

  3. Stores the hashes for the file.

It is advised that this method be called prior to reading any file.

Parameters:

path (str or pathlib.Path, optional) – Location of the file in the file system (default is None)

Raises:
  • AttributeError – If both the path argument and the existing path attribute are None

  • FileNotFoundError – If the file specified by path (after completing Step 1 above) does not exist
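The three steps and the two documented exceptions can be followed in a short stand-in. This is a sketch under assumptions, not the pyxx source: FileSketch is hypothetical, the error messages are invented, and hashing the whole file in one read with the default md5 and sha256 functions is an illustrative simplification.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Optional, Union


class FileSketch:
    """Hypothetical stand-in that mirrors the three documented steps."""

    def __init__(self, path: Union[str, Path, None] = None) -> None:
        self.path: Optional[Path] = None if path is None else Path(path)
        self.hashes: Dict[str, str] = {}

    def set_read_metadata(self, path: Union[str, Path, None] = None) -> None:
        # Step 1: prefer the argument, otherwise fall back to the stored path
        if path is not None:
            self.path = Path(path)
        if self.path is None:
            raise AttributeError('No path was provided or previously set')

        # Step 2: verify that the file exists
        if not self.path.is_file():
            raise FileNotFoundError(f'File "{self.path}" does not exist')

        # Step 3: store hashes of the file
        data = self.path.read_bytes()
        self.hashes = {
            name: hashlib.new(name, data).hexdigest()
            for name in ('md5', 'sha256')
        }


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'input.txt'
    existing.write_bytes(b'payload')

    reader = FileSketch()
    reader.set_read_metadata(existing)
    stored_hashes = dict(reader.hashes)
    missing_path = Path(tmp) / 'missing.txt'
```

Calling the method before reading means the stored hashes describe the file as it was when it was read, which later change checks can rely on.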

store_file_hashes(hash_functions: tuple | str = ('md5', 'sha256')) → None

Computes and stores hashes of the file specified by the path attribute

Computes the given hashes of the file specified by the path attribute and populates the hashes dictionary with their values.

Parameters:

hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

See also

pyxx.files.compute_file_hash

Function used to compute file hashes

track_new_file

Use this method if you want to store file hashes but the path attribute isn’t yet defined

Notes

Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().

track_new_file(path: str | Path, hash_functions: tuple | str = ('md5', 'sha256')) → None

Shortcut for simultaneously modifying the path attribute and storing file hashes

This method functions as a “shortcut,” both modifying the path attribute and storing an optionally user-specified list of file hashes in the hashes attribute. If a File instance is tracking a given file and the user wants to switch to tracking another file, this method provides a convenient way to do so with a single line of code.

Parameters:
  • path (str or pathlib.Path) – File that the object is to represent

  • hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

See also

pyxx.files.compute_file_hash

Function used to compute file hashes
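Switching from one tracked file to another in a single call can be sketched as follows. TrackerSketch is a hypothetical stand-in, not the pyxx class; it assumes the shortcut sets the path first and then stores freshly computed hashes, replacing any previous ones.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Optional, Tuple, Union


class TrackerSketch:
    """Hypothetical stand-in for the path-plus-hashes shortcut."""

    def __init__(self) -> None:
        self.path: Optional[Path] = None
        self.hashes: Dict[str, str] = {}

    def track_new_file(
        self,
        path: Union[str, Path],
        hash_functions: Union[Tuple[str, ...], str] = ('md5', 'sha256'),
    ) -> None:
        if isinstance(hash_functions, str):
            hash_functions = (hash_functions,)
        # Set the path, then replace stored hashes with those of the new file
        self.path = Path(path)
        data = self.path.read_bytes()
        self.hashes = {
            name: hashlib.new(name, data).hexdigest()
            for name in hash_functions
        }


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / 'first.txt'
    second = Path(tmp) / 'second.txt'
    first.write_bytes(b'alpha')
    second.write_bytes(b'beta')

    tracker = TrackerSketch()
    tracker.track_new_file(first)                # default md5 and sha256
    first_hashes = dict(tracker.hashes)

    tracker.track_new_file(second, 'sha256')     # switch files in one call
    second_hashes = dict(tracker.hashes)
    tracked_name = tracker.path.name
```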