pyxx.files.TextFile
- class pyxx.files.TextFile(path: str | Path | None = None, comment_chars: Tuple[str, ...] | str | None = None)
Bases: File
Base class for processing text files
This class can be used to represent text files (that is, files whose content is a series of ASCII-based characters and that can be opened and read with an editor such as Notepad). It provides the capability to read and write text files and to perform processing operations such as removing commented lines.
Attributes
comment_chars – A tuple of all characters considered to denote comments
contents – A reference to a list containing the (potentially modified) file content of each line of the file
line_ending – The character(s) used to denote the end of lines in the text file
raw_contents – A copy of the raw file content
trailing_newline – Whether the original file had a newline at the end of the file
Methods
__init__([path, comment_chars]) – Define a text file
clean_contents([remove_comments, ...]) – Clean contents in-place
overwrite([prologue, epilogue, line_ending]) – Write data in contents to the file specified by path
parse() – Parses the data in contents and stores it in class attributes
read([path, parse]) – Read file from disk
set_contents(contents, trailing_newline[, ...]) – Add data to the contents list
update_contents() – Updates the contents list based on object attributes
write(output_file[, write_mode, ...]) – Write file to disk
Inherited Attributes
hashes – A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute
path – Path describing the location of the file on the disk
Inherited Methods
clear_file_hashes() – Clears any stored file hashes
compute_file_hashes([hash_functions, store]) – Computes hashes of the file specified by the path attribute
has_changed() – Returns whether the file specified by the path attribute has changed since the last time file hashes were computed
set_read_metadata([path]) – Configures metadata related to file to be read from disk
store_file_hashes([hash_functions]) – Computes and stores hashes of the file specified by the path attribute
track_new_file(path[, hash_functions]) – Shortcut for simultaneously modifying the path attribute and storing file hashes
- __init__(path: str | Path | None = None, comment_chars: Tuple[str, ...] | str | None = None) → None
Define a text file
Creates an object that represents and can be used to process a text file.
- Parameters:
path (str or pathlib.Path, optional) – Location of the text file in the file system (default is None)
comment_chars (tuple or str, optional) – Character(s) considered to represent comments in the text file (default is None, which considers no characters to denote comments in the file)
Notes
Passing an empty string ('') or empty tuple (()) as the comment_chars argument is equivalent to passing None (or not providing this argument); in all these cases, the file will be considered to have no characters denoting comments.
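The equivalence described in the note can be illustrated with a small sketch. The helper name `normalize_comment_chars` is hypothetical and is not part of pyxx; it only shows one way that `None`, `''`, and `()` could all collapse to the same "no comments" value, while a single string becomes a one-element tuple.

```python
from typing import Optional, Tuple, Union


def normalize_comment_chars(
    comment_chars: Union[Tuple[str, ...], str, None],
) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper: map every "no comments" input to None."""
    # None, '' and () are all falsy, so one check covers the three cases
    if not comment_chars:
        return None
    # A single string is wrapped in a one-element tuple
    if isinstance(comment_chars, str):
        return (comment_chars,)
    return tuple(comment_chars)


print(normalize_comment_chars(''))            # None
print(normalize_comment_chars(()))            # None
print(normalize_comment_chars('#'))           # ('#',)
print(normalize_comment_chars(('#', '//')))   # ('#', '//')
```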
- property comment_chars: Tuple[str, ...] | None
A tuple of all characters considered to denote comments
- property contents: List[str]
A reference to a list containing the (potentially modified) file content of each line of the file
Warning
This attribute returns the list by reference. This means that if you set a variable equal to this reference, then editing this variable will edit the contents attribute (e.g., if you set my_content = MyTextFile.contents, then editing my_content will change the content stored in MyTextFile).
Notes
To set the contents attribute, do not assign to it directly (i.e., don’t use code similar to MyTextFile.contents = ['line1', 'line2', 'line3']). Instead, use the set_contents() method, as it offers greater control over whether the contents are passed by reference or value.
- property line_ending: str | Tuple[str, ...]
The character(s) used to denote the end of lines in the text file
This property only applies to files that were read using the read() method. Lines in text files can be terminated with '\n' (LF), '\r\n' (CRLF), '\r' (CR), or a combination of these (potentially with different line endings on different lines).
After reading a file, this property stores either a string containing the line ending used on every line of the file, or a tuple containing all line endings encountered throughout the file.
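As an illustration of the string-or-tuple result described above, the following sketch classifies the line endings in a block of text. The helper `detect_line_endings` is hypothetical and is not the pyxx implementation; in particular, the order of the tuple and the empty-tuple result for text without any line ending are assumptions of this sketch.

```python
import re
from typing import List, Tuple, Union


def detect_line_endings(text: str) -> Union[str, Tuple[str, ...]]:
    """Hypothetical helper: summarize the line endings found in ``text``."""
    # '\r\n' comes first in the alternation so CRLF is not split into CR + LF
    found: List[str] = re.findall(r'\r\n|\r|\n', text)
    # Deduplicate while keeping first-seen order
    unique = tuple(dict.fromkeys(found))
    # One ending throughout -> a plain string; otherwise -> a tuple
    return unique[0] if len(unique) == 1 else unique


print(repr(detect_line_endings('a\nb\n')))        # '\n'
print(repr(detect_line_endings('a\r\nb\nc\r')))   # ('\r\n', '\n', '\r')
```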
- property raw_contents: List[str] | None
A copy of the raw file content
If the file was read using the read() method, this attribute stores the original, unaltered contents of each line of the input file, and it returns a copy of this list of lines. If the file was not read with the read() method, this attribute stores a value of None.
- property trailing_newline: bool
Whether the original file had a newline at the end of the file
- clean_contents(remove_comments: bool = False, skip_full_line_comments: bool = False, strip: bool = False, concat_lines: bool = False, remove_blank_lines: bool = False) → None
Clean contents in-place
Cleans contents (removing comments, blank lines, etc.) based on user-defined rules. Modifications are made in-place (i.e., the resulting content is stored in contents).
- Parameters:
remove_comments (bool, optional) – Whether to remove comments from file (default is False)
skip_full_line_comments (bool, optional) – Whether to skip removing comments where the comment is the only text on a line. Only applies if remove_comments is True (default is False)
strip (bool, optional) – Whether to strip leading and trailing whitespace from each line (default is False)
concat_lines (bool, optional) – Whether to concatenate lines ending with a backslash with the following line (default is False)
remove_blank_lines (bool, optional) – Whether to remove lines that contain no content after other cleaning operations have completed (default is False)
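To make the cleaning options more concrete, here is a minimal sketch of how such rules could be applied to a list of lines. The function `clean_lines` is a hypothetical stand-in, not the pyxx code: the order in which the steps run is an assumption, `skip_full_line_comments` is omitted for brevity, and the default comment character `'#'` is chosen only for the example.

```python
from typing import List, Tuple


def clean_lines(
    lines: List[str],
    comment_chars: Tuple[str, ...] = ('#',),
    remove_comments: bool = False,
    strip: bool = False,
    concat_lines: bool = False,
    remove_blank_lines: bool = False,
) -> None:
    """Hypothetical stand-in that edits ``lines`` in place."""
    cleaned: List[str] = []
    pending = ''  # text carried over from a line that ended with a backslash
    for line in lines:
        if remove_comments:
            # Cut the line at the earliest comment marker, if any
            cuts = [line.find(c) for c in comment_chars if c in line]
            if cuts:
                line = line[:min(cuts)]
        if strip:
            line = line.strip()
        if concat_lines and line.endswith('\\'):
            # Drop the backslash and join with the next line
            pending += line[:-1]
            continue
        cleaned.append(pending + line)
        pending = ''
    if pending:
        cleaned.append(pending)
    if remove_blank_lines:
        cleaned = [line for line in cleaned if line.strip()]
    # Slice assignment keeps the same list object, so callers see the change
    lines[:] = cleaned


lines = ['x = 1  # set x', '', '  y = \\', '  2  ', '# only a comment']
clean_lines(lines, remove_comments=True, strip=True,
            concat_lines=True, remove_blank_lines=True)
print(lines)  # ['x = 1', 'y = 2']
```

Because every option defaults to `False`, calling the function with no flags leaves the list unchanged, which mirrors the defaults shown in the method signature.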
- overwrite(prologue: str = '', epilogue: str | None = None, line_ending: str = '\n') → None
Write data in contents to the file specified by path
Writes the lines of content in the contents attribute to the (previously-defined) file specified by the path attribute, suppressing warnings before overwriting the file. This is useful for cases when the file contents are manually populated and it is desired to “dump” them to a file. This method is also useful if a file’s contents need to be updated periodically based on the results of another process.
- Parameters:
prologue (str, optional) – Content written at beginning of file (default is '')
epilogue (str, optional) – Content written at end of file (default is to use the value of the line_ending argument if trailing_newline is True and '' otherwise)
line_ending (str, optional) – String written at the end of each line when writing file content (default is '\n')
- parse() → None
Parses the data in contents and stores it in class attributes
This method by default does nothing. However, it is intended that subclasses of TextFile should override this method and define file-specific behavior in this method for extracting data from the file and storing it in custom object attributes.
For example, if defining a CSV parser, the parse() method might parse data from the file and store it as a NumPy array.
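The override pattern described above might look like the following sketch. To keep the example runnable without pyxx or NumPy installed, it uses a hypothetical stand-in base class (`TextFileStandIn`) and stores the parsed values in nested lists rather than an array; the `CsvFile` class and its `data` attribute are likewise illustrative names, not part of the library.

```python
from typing import List


class TextFileStandIn:
    """Minimal stand-in for TextFile (not the pyxx class)."""

    def __init__(self) -> None:
        self.contents: List[str] = []

    def parse(self) -> None:
        """Does nothing by default, mirroring the documented behavior."""


class CsvFile(TextFileStandIn):
    """Hypothetical subclass that extracts numeric rows from CSV lines."""

    def __init__(self) -> None:
        super().__init__()
        self.data: List[List[float]] = []

    def parse(self) -> None:
        # Split each non-empty line on commas and convert the fields to floats
        self.data = [
            [float(field) for field in line.split(',')]
            for line in self.contents
            if line.strip()
        ]


csv_file = CsvFile()
csv_file.contents.extend(['1,2,3', '4.5,5,6'])
csv_file.parse()
print(csv_file.data)  # [[1.0, 2.0, 3.0], [4.5, 5.0, 6.0]]
```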
- clear_file_hashes() → None
Clears any stored file hashes
- compute_file_hashes(hash_functions: tuple | str = ('md5', 'sha256'), store: bool = False) → Dict[str, str]
Computes hashes of the file specified by the path attribute
Computes and returns the hashes of the file specified by the path attribute, with the option to populate the hashes dictionary with their values.
- Parameters:
hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')
store (bool, optional) – Whether to store the computed hashes in the hashes dictionary (default is False)
- Returns:
A dictionary containing the file hashes specified by hash_functions
- Return type:
dict
See also
pyxx.files.compute_file_hash – Function used to compute file hashes
Notes
Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().
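For readers unfamiliar with `hashlib`, the sketch below shows one way to compute several hashes of a file and return them as a dictionary keyed by hash name. The helper `hash_file` is hypothetical and is not the pyxx implementation (the library delegates to `pyxx.files.compute_file_hash`, whose internals are not shown here); the chunk size is an arbitrary choice.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Tuple, Union


def hash_file(
    path: Union[str, Path],
    hash_functions: Union[Tuple[str, ...], str] = ('md5', 'sha256'),
) -> Dict[str, str]:
    """Hypothetical helper returning {hash name: hex digest} for a file."""
    # Accept a single hash name as well as a tuple of names
    if isinstance(hash_functions, str):
        hash_functions = (hash_functions,)
    hashers = {name: hashlib.new(name) for name in hash_functions}
    with open(path, 'rb') as file:
        # Read in chunks so large files need not fit in memory
        for chunk in iter(lambda: file.read(65536), b''):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / 'sample.txt'
    sample.write_bytes(b'hello\n')
    hashes = hash_file(sample)

print(sorted(hashes))  # ['md5', 'sha256']
```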
- has_changed() → bool
Returns whether the file specified by the path attribute has changed since the last time file hashes were computed
- Returns:
Whether file has changed since the last time file hashes were computed
- Return type:
bool
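The idea behind hash-based change detection can be sketched as follows: store a digest, then later recompute it and compare. This is only an illustration under the assumption that a change is detected by comparing digests; the helper `sha256_of` and the variable names are hypothetical, and the library may compare several hashes rather than one.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    tracked = Path(tmp_dir) / 'tracked.txt'
    tracked.write_text('first version\n')

    # Digest saved when the file is first tracked
    stored = sha256_of(tracked)
    changed_before_edit = sha256_of(tracked) != stored

    # Modify the file, then compare against the stored digest again
    tracked.write_text('second version\n')
    changed_after_edit = sha256_of(tracked) != stored

print(changed_before_edit, changed_after_edit)  # False True
```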
- property hashes: Dict[str, str]
A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute
- property path: Path | None
Path describing the location of the file on the disk
Assigning a value to this attribute (regardless of whether it matches the current value or is a different path) will save the value as a pathlib.Path and will automatically clear any saved file hashes.
- read(path: str | Path | None = None, parse: bool = True) → None
Read file from disk
Calling this method reads the file specified by the path attribute from the disk, populating contents and raw_contents. Additionally, the file hashes stored in the hashes attribute are updated (to make it easier to check if the file has been modified later).
- Parameters:
path (str or pathlib.Path, optional) – Location of the text file in the file system (default is None)
parse (bool, optional) – Whether to call the parse() method after reading the file (default is True)
- set_read_metadata(path: str | Path | None = None) → None
Configures metadata related to file to be read from disk
This method performs several pre-processing steps to prepare to read a file from the disk:
1. Sets the path attribute. If the path argument was provided, the attribute is set to this value; otherwise, the existing value stored in the path attribute is used (or an error is thrown if not defined).
2. Verifies that the file specified by the path attribute exists.
3. Stores the hashes for the file.
It is advised that this method be called prior to reading any file.
- Parameters:
path (str or pathlib.Path, optional) – Location of the file in the file system (default is None)
- Raises:
- store_file_hashes(hash_functions: tuple | str = ('md5', 'sha256')) → None
Computes and stores hashes of the file specified by the path attribute
Computes given hashes of the file specified by the path attribute and populates the hashes dictionary with their values.
- Parameters:
hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')
See also
pyxx.files.compute_file_hash – Function used to compute file hashes
track_new_file – Use this method if you want to store file hashes but the path attribute isn’t yet defined
Notes
Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().
- track_new_file(path: str | Path, hash_functions: tuple | str = ('md5', 'sha256')) → None
Shortcut for simultaneously modifying the path attribute and storing file hashes
This method functions as a “shortcut,” both modifying the path attribute and storing an optionally user-specified list of file hashes in the hashes attribute. The intention is that if a File instance is tracking a given file and the user wants to switch to tracking another file, this method provides a convenient way to do so with a single line of code.
- Parameters:
path (str or pathlib.Path) – File that the object is to represent
hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')
See also
pyxx.files.compute_file_hash – Function used to compute file hashes
- set_contents(contents: List[str], trailing_newline: bool, pass_by_reference: bool = False) → None
Add data to the contents list
Allows users to manually fill the contents list with user-defined content. The input list must be a list of strings, and the user can optionally choose whether to pass the input by reference or value.
- Parameters:
contents (list) – List of strings which are to be assigned to the contents list
trailing_newline (bool) – Whether the contents being added represent a file with a trailing newline (because the file wasn’t read, the object has no way to determine whether the file has a trailing newline, so users must provide this information)
pass_by_reference (bool, optional) – Whether to pass the contents argument by reference (default is False)
Notes
If passing contents by reference, this means that if subsequent changes are made to the original contents object, they will be reflected in the contents attribute. If passing by value, then a copy of the contents argument will be made, so changing the object outside the class instance will not affect the contents attribute.
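The difference between the two modes comes down to whether the object keeps the caller's list or a copy of it. The sketch below demonstrates this with a hypothetical stand-in class (`ContentsHolder`, not the pyxx class); it omits the `trailing_newline` argument for brevity and assumes a shallow copy is what "by value" means, which is sufficient for a list of immutable strings.

```python
from typing import List


class ContentsHolder:
    """Hypothetical stand-in (not the pyxx class) for the copy semantics."""

    def __init__(self) -> None:
        self._contents: List[str] = []

    def set_contents(self, contents: List[str],
                     pass_by_reference: bool = False) -> None:
        # By reference: keep the caller's list object itself.
        # By value: keep a shallow copy of the list.
        self._contents = contents if pass_by_reference else list(contents)

    @property
    def contents(self) -> List[str]:
        return self._contents


lines = ['line1', 'line2']

by_value = ContentsHolder()
by_value.set_contents(lines)

by_reference = ContentsHolder()
by_reference.set_contents(lines, pass_by_reference=True)

# Edit the original list after handing it over
lines.append('line3')
print(by_value.contents)      # ['line1', 'line2']
print(by_reference.contents)  # ['line1', 'line2', 'line3']
```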
- update_contents() → None
Updates the contents list based on object attributes
This method by default does nothing. However, it is intended that subclasses of TextFile should override this method and define file-specific behavior in this method for converting custom object attributes to lines of text in the file, and storing these data in contents.
For example, if defining a CSV parser, the class might have an attribute that stores numerical data in a NumPy array, and the update_contents() method might convert the data in this array to comma-separated strings and store them in contents.
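A minimal sketch of such an override is shown below. It uses a hypothetical stand-in class (`CsvFileStandIn`) rather than a real TextFile subclass, and nested lists rather than a NumPy array, so that it runs without third-party packages; replacing the lines through slice assignment is a design choice of this sketch, made so that existing references to the list remain valid.

```python
from typing import List


class CsvFileStandIn:
    """Hypothetical stand-in (not a pyxx class) holding numeric rows."""

    def __init__(self, data: List[List[float]]) -> None:
        self.data = data
        self.contents: List[str] = []

    def update_contents(self) -> None:
        # Render each row as one comma-separated line, replacing the old
        # lines in place
        self.contents[:] = [
            ','.join(str(value) for value in row) for row in self.data
        ]


csv_file = CsvFileStandIn([[1.0, 2.5], [3.0, 4.0]])
csv_file.update_contents()
print(csv_file.contents)  # ['1.0,2.5', '3.0,4.0']
```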
- write(output_file: str | Path, write_mode: str = 'w', warn_before_overwrite: bool = True, prologue: str = '', epilogue: str | None = None, line_ending: str = '\n', update_contents: bool = True) → None
Write file to disk
Calling this method writes the file contents stored in contents to the disk.
- Parameters:
output_file (str or pathlib.Path) – Output file to which to write content
write_mode (str, optional) – Any mode (such as 'w' or 'a') for the built-in open() function for writing files (default is 'w')
warn_before_overwrite (bool, optional) – Whether to throw an error if output_file already exists (default is True)
prologue (str, optional) – Content written at beginning of file (default is '')
epilogue (str, optional) – Content written at end of file (default is to use the value of the line_ending argument if trailing_newline is True and '' otherwise)
line_ending (str, optional) – String written at the end of each line when writing file content (default is '\n')
update_contents (bool, optional) – Whether to call the update_contents() method before writing the file (default is True)
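How prologue, epilogue, and line_ending might combine into the written text can be sketched as follows. The helper `render_file_text` is hypothetical and is not the pyxx implementation. It assumes that line_ending is placed between lines and that the final line's ending comes from the epilogue, which is one reading of the documented epilogue default.

```python
from typing import List, Optional


def render_file_text(
    contents: List[str],
    trailing_newline: bool,
    prologue: str = '',
    epilogue: Optional[str] = None,
    line_ending: str = '\n',
) -> str:
    """Hypothetical helper: build the text that would be written to disk."""
    if epilogue is None:
        # Documented default: end with line_ending only if the file is
        # flagged as having a trailing newline
        epilogue = line_ending if trailing_newline else ''
    # Assumption: line_ending separates lines; the final line's ending is
    # supplied by the epilogue
    return prologue + line_ending.join(contents) + epilogue


print(repr(render_file_text(['a', 'b'], trailing_newline=True)))
# 'a\nb\n'
print(repr(render_file_text(['a', 'b'], trailing_newline=False,
                            prologue='# header\r\n', line_ending='\r\n')))
# '# header\r\na\r\nb'
```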