Rigidity Documentation¶
Rigidity is a simple wrapper to Python’s built-in csv module that allows for validation and correction of data being read/written to/from CSV files.
With Rigidity, you can easily construct validation and correction rulesets to be applied automatically while preserving the csv interface. In other words, you can easily upgrade old code to better adhere to new output styles, or allow new code to better parse old files.
Installing¶
Installing via PyPI¶
Rigidity is listed on the Python Package Index. So, you may install it via pip:
pip install rigidity
Note that Rigidity only supports Python 3, so you may need to modify your pip command if your default Python version differs.
Installing from Source¶
If you have downloaded one of the source tarballs via Github, you may install it as follows:
tar -xzf rigidity-1.3.0.tar.gz
cd rigidity-1.3.0
sudo python3 setup.py install
Installing from Git¶
If you want to install the development version, you may clone our git repository:
git clone https://github.com/austinhartzheim/rigidity.git
cd rigidity
sudo python3 setup.py install1
Examples¶
This page includes examples of how to use Rigidity in your own projects.
Correcting Capitalization¶
Some spreadsheet providers insist on capitalizing all data. But, readability can be greatly enhanced by capitalizing words correctly.
Take the following CSV file as an example:
TITLE,AUTHOR
BRAVE NEW WORLD,ALDOUS HUXLEY
NINETEEN EIGHTY-FOUR,GEORGE ORWELL
It would be much more readable in the following form, and could even be included directly on a public-facing website:
Title,Author
Brave New World,Aldous Huxley
Nineteen Eighty-Four,George Orwell
Rigidity’s CapitalizeWords rule allows for selective capitalization of certain letters. By default, it capitalizes the characters following whitespace. But, we need to capitalize words following hyphens as well (in the case of Nineteen Eighty-Four). Here is how we do it:
import csv
import rigidity
reader = csv.reader(open('data.csv'))
rules = [
[rigidity.rules.Lower(), # Convert to lower-case first
rigidity.rules.CapitalizeWords(' -')], # Selectively capitalize
[rigidity.rules.Lower(), # Do the same for the author
rigidity.rules.CapitalizeWords(' -')]
]
r = rigidity.Rigidity(reader, rules)
for row in r:
print(', '.join(row))
The CapitalizeWords rule only performs selective capitalization. So, we need to use the Lower rule to convert the entire string to lower-case first. We also tell the rule to capitalize all letters immediately following a space character or a hyphen, which allows us to correctly capitalize “Nineteen Eighty-Four.”
UPC Validation¶
The following example demonstrates how to validate that a UPC-A code is correct by using the check digit. An additional test is also performed to ensure that the UPC is unique (which prevents accidental duplicates of what should be a unique identifier):
import rigidity
rules = [
[rigidity.rules.UpcA(strict=True),
rigidity.rules.Unique()]
]
r = rigidity.Rigidity(reader, rules)
for row in r:
print(row[0])
This example assumes that there is only one column in the CSV file - the column with the UPC code.
Activating strict on the UpcA rule causes the check bit of the UPC to be validated. If the digit is not valid, an error is raised. This can be deactivated to prevent check digit verification.
Creating Rules¶
This page details how you can create your own rules for use in Rigidity.
A Simple Example¶
Rigidty rules are all contained in classes subclassing Rule
. The simplest example has only an apply() method that validates or modifies data.
For example, a simple rule to check integers might look like this:
class Integer(rigidity.rules.Rule):
def apply(self, value):
return int(value)
When this rule is used, it attempts to convert the data passed in the value parameter to an integer. If it is successful, it returns the integer representation of value. If it fails, the ValueError from the cast propagates, preventing the invalid data from entering or exiting the program.
As a side effect, because the integer representation is returned, all future checks will then be operating on an integer value, regardless of the original type of value. This can be useful to automatically convert data as it enters the program rather than having to handle the casting logic later.
Dropping Rows¶
It is not always desirable to raise an error when invalid data is discovered. Sometimes the appropriate action is to ignore the offending row. Rigidity rules can cause a row to be ignored by raising the DropRow
exception.
The following code can be used to verify inventory data, preventing the inclusion of any products that are out of stock:
class Inventory(rigidity.rules.Rule):
def apply(self, value):
if isinstance(value, str):
value = int(str)
if not isinstance(value, int):
raise ValueError('Inventory was not an integer value.')
if value < 1:
raise rigidity.errors.DropRow()
return value
If we use this rule to validate this CSV file:
Product,Inventory
T-Shirt,12
Pants,4
Shorts,0
Shoes,-1
Gloves,3
We will get the following list of items that are in stock at the store:
Product,Inventory
T-Shirt,12
Pants,4
Gloves,3
Additionally, if any invalid data is located in the inventory column, an error will be raised to prevent other data from entering the CSV file.
Bidirectional Validation¶
Sometimes it is necessary to validate data differently depending on whether it is being read or written. This is why the Rule
class supports both the read()
and write()
methods. Implementing these methods in your rules can allow for greater flexibility of rulesets because the same rules can be used for both reading and writing data.
An example use of this functionality is the built-in Bytes
rule. This rule assumes that the data being read is raw binary data that is best represented as a Python bytes object:
class Bytes(Rule):
'''
When reading data, encode it as a bytes object using the given
encoding. When writing data, decode it using the given encoding.
'''
def __init__(self, encoding='utf8'):
self.encoding = encoding
def read(self, value):
return value.encode(self.encoding)
def write(self, value):
return value.decode(self.encoding)
When the data is read from a CSV file, the read()
method is called, which encodes the data using the selected encoding type and returns it. When it is time to write the data back into a CSV file, the write()
method is called to decode the data using the specified encoding scheme and return the value.
This rule could not be implemented as a unidirectional rule because the csv module would not know how to decode the bytes object.
Code Documentation¶
Errors¶
This submodule contains exception classes that are used by Rigidity to handle different actions from the rule classes.
-
exception
rigidity.errors.
DropRow
[source]¶ Bases:
rigidity.errors.RigidityException
When a rule raises this error, the row that is being processed is dropped from the output.
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
Rigidity¶
This module contains the wrapper class that can be used to adapt your CSV parsing code to use rigidity.
Rigidity is a simple wrapper to the built-in csv module that allows for validation and correction of data being read/written from/to CSV files.
This module allows you to easily construct validation and correction rulesets to be applied automatically while preserving the csv interface. This allows you to easily upgrade old software to use new, strict rules.
-
class
rigidity.
Rigidity
(csvobj, rules=[], display=0)[source]¶ Bases:
object
A wrapper for CSV readers and writers that allows
-
DISPLAY_NONE
= 0¶ Do not display output at all.
-
DISPLAY_SIMPLE
= 1¶ Display simple warnings when ValueError is raised by a rule.
-
skip
()[source]¶ Return a row, skipping validation. This is useful when you want to skip validation of header information.
-
validate
(row)[source]¶ Warning
This method is deprecated and will be removed in a future release; it is included only to support old code. It will not produce consistent results with bi-directional rules. You should use
validate_read()
orvalidate_write()
instead.Validate that the row conforms with the specified rules, correcting invalid rows where the rule is able to do so.
If the row is valid or can be made valid through corrections, this method will return a row that can be written to the CSV file. If the row is invalid and cannot be corrected, then this method will raise an exception.
Parameters: row – a row object that can be passed to a CSVWriter’s writerow() method.
-
validate_read
(row)[source]¶ Validate that the row conforms with the specified rules, correcting invalid rows where the rule is able to do so.
If the row is valid or can be made valid through corrections, this method will return a row that can be written to the CSV file. If the row is invalid and cannot be corrected, then this method will raise an exception.
Parameters: row – a row object that can be returned from CSVReader’s readrow() method.
-
validate_write
(row)[source]¶ Validate that the row conforms with the specified rules, correcting invalid rows where the rule is able to do so.
If the row is valid or can be made valid through corrections, this method will return a row that can be written to the CSV file. If the row is invalid and cannot be corrected, then this method will raise an exception.
Parameters: row – a row object that can be passed to a CSVWriter’s __next__() method.
-
writeheader
()[source]¶ Plain pass-through to the given CSV object. It is assumed that header information is already valid when the CSV object is constructed.
-
writerow
(row)[source]¶ Validate and correct the data provided in row and raise an exception if the validation or correction fails. Then, write the row to the CSV file.
-
writerows
(rows)[source]¶ Validate and correct the data provided in every row and raise an exception if the validation or correction fails.
Note
Behavior in the case that the data is invalid and cannot be repaired is undefined. For example, the implementation may choose to write all valid rows up until the error, or it may choose to only conduct the write operation after all rows have been verified. Do not depend on the presence or absence of any of the rows in rows in the event that an exception occurs.
-
Rules¶
This submodule contains the built-in rules that are used for filtering and modifying data.
-
class
rigidity.rules.
Boolean
(allow_null=False, action=1, default=None)[source]¶ Bases:
rigidity.rules.Rule
Cast a string as a boolean value.
-
ACTION_DEFAULT
= 2¶ When invalid data is encountered, return a set defaut value.
-
ACTION_DROPROW
= 3¶ When invalid data is encountered, drop the row.
-
ACTION_ERROR
= 1¶ When invalid data is encountered, raise an exception.
-
__init__
(allow_null=False, action=1, default=None)[source]¶ Parameters: action – take the behavior indicated by ACTION_ERROR, ACTION_DEFAULT, or ACTION_DROPROW.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Bytes
(encoding='utf8')[source]¶ Bases:
rigidity.rules.Rule
When reading data, encode it as a bytes object using the given encoding. When writing data, decode it using the given encoding.
-
read
(value)[source]¶ When reading data, it is validated with this method. By default, this method calls the apply() method of this class. However, you may override this method to achieve different behavior when reading and writing.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
write
(value)[source]¶ When writing data, it is validated with this method. By default, this method calls the apply() method of this class. However, you may override this method to achieve different behavior when reading and writing.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
CapitalizeWords
(seperators=' tnr', cap_first=True)[source]¶ Bases:
rigidity.rules.Rule
Capitalize words in a string. By default, words are detected by searching for space, tab, new line, and carriage return characters. You may override this setting.
Also, by default, the first character is capitalized automatically.
-
__init__
(seperators=' \t\n\r', cap_first=True)[source]¶ Parameters: - seperators (str) – capitalize any character following a character in this string.
- cap_first (bool) – automatically capitalize the first character in the string.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Cary
(action=1, default=None)[source]¶ Bases:
rigidity.rules.Rule
Cary values into subsequent rows lacking values in their column.
-
ACTION_DEFAULT
= 2¶ Until a value is encountered, use a default value to fill empty cells.
-
ACTION_DROPROW
= 3¶ When an empty cell is encountered and no other value is available to fill the cell, drop the row.
-
ACTION_ERROR
= 1¶ When an empty cell is encountered and no previous fill value is available, throw an error.
-
__init__
(action=1, default=None)[source]¶ Parameters: action – take the behavior indicated by ACTION_ERROR, ACTION_DEFAULT, or ACTION_DROPROW.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Contains
(string)[source]¶ Bases:
rigidity.rules.Rule
Check that a string field value contains the string (or all strings in a list of strings) passed as a parameter to this rule.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Drop
[source]¶ Bases:
rigidity.rules.Rule
Drop the data in this column, replacing all data with an empty string value.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Float
(action=1)[source]¶ Bases:
rigidity.rules.Rule
Cast all data to floats or die trying.
-
ACTION_DROPROW
= 3¶ When invalid data is encountered, drop the row.
-
ACTION_ERROR
= 1¶ When invalid data is encountered, raise an exception.
-
ACTION_ZERO
= 2¶ When invalid data is encountered, return zero.
-
__init__
(action=1)[source]¶ Parameters: action – take the behavior indicated by ACTION_ERROR, ACTION_ZERO, or ACTION_DROPROW.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Integer
(action=1)[source]¶ Bases:
rigidity.rules.Rule
Cast all data to ints or die trying.
-
ACTION_DROPROW
= 3¶ When invalid data is encountered, drop the row.
-
ACTION_ERROR
= 1¶ When invalid data is encountered, raise an exception.
-
ACTION_ZERO
= 2¶ When invalid data is encountered, return zero.
-
__init__
(action=1)[source]¶ Parameters: action – take the behavior indicated by ACTION_ERROR, ACTION_ZERO, or ACTION_DROPROW.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Lower
[source]¶ Bases:
rigidity.rules.Rule
Convert a string value to lower-case.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
NoneToEmptyString
[source]¶ Bases:
rigidity.rules.Rule
Replace None values with an empty string. This is useful in cases where legacy software uses None to create an empty cell, but your other checks require a string.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
RemoveLinebreaks
[source]¶ Bases:
rigidity.rules.Rule
Remove linebreaks from the start and end of field values. These can sometimes be introduced into files and create problems for humans because they are invisible.to human users.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
ReplaceValue
(replacements={}, missing_action=4, default_value='')[source]¶ Bases:
rigidity.rules.Rule
Check if the value has a specified replacement. If it does, replace it with that value. If it does not, take one of the following configurable actions: pass it through unmodified, drop the row, or use a default value.
-
ACTION_BLANK
= 5¶ When no replacement is found, return an empty string.
-
ACTION_DEFAULT_VALUE
= 2¶ When no replacement is found, return a set default value.
-
ACTION_DROP
= 5¶ Warning
ACTION_DROP is deprecated due to the name being similar
to ACTION_DROPROW. Use ACTION_BLANK instead.
-
ACTION_DROPROW
= 1¶ When no replacement is found, drop the row.
-
ACTION_ERROR
= 4¶ When no replacement is found, raise an exception.
-
ACTION_PASSTHROUGH
= 3¶ When no replacement is found, allow the original to pass through.
-
__init__
(replacements={}, missing_action=4, default_value='')[source]¶ Parameters: - replacements (dict) – a mapping between original values and replacement values.
- missing_action – when a replacement is not found for a value, take the behavior specified by the specified value, such as ACTION_DROP, ACTION_DEFAULT_VALUE, ACTION_PASSTHROUGH, or ACTION_ERROR.
- default_value – if ACTION_DEFAULT_VALUE is the missing replacement behavior, use this variable as the default replacement value.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Rule
[source]¶ Bases:
object
Base rule class implementing a simple apply() method that returns the given data unchanged.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
read
(value)[source]¶ When reading data, it is validated with this method. By default, this method calls the apply() method of this class. However, you may override this method to achieve different behavior when reading and writing.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
write
(value)[source]¶ When writing data, it is validated with this method. By default, this method calls the apply() method of this class. However, you may override this method to achieve different behavior when reading and writing.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Static
(value)[source]¶ Bases:
rigidity.rules.Rule
Replace a field’s value with a static value declared during initialization.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Strip
(chars=None)[source]¶ Bases:
rigidity.rules.Rule
Strip excess white space from the beginning and end of a value.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
-
class
rigidity.rules.
Unique
(action=1)[source]¶ Bases:
rigidity.rules.Rule
Only allow unique values to pass. When a repeated value is found, the row may be dropped or an error may be raised.
-
ACTION_DROPROW
= 2¶ When repeat data is encountered, drop the row.
-
ACTION_ERROR
= 1¶ When repeat data is encountered, raise an exception.
-
-
class
rigidity.rules.
UpcA
(strict=False)[source]¶ Bases:
rigidity.rules.Rule
Validate UPC-A barscode numbers to ensure that they are 12 digits. Strict validation of the check digit may also be enabled.
-
class
rigidity.rules.
Upper
[source]¶ Bases:
rigidity.rules.Rule
Convert a string value to upper-case.
-
apply
(value)[source]¶ This is the default method for applying a rule to data. By default, the read() and write() methods will use this method to validate and modify data.
Parameters: value – the data to be validated. Returns: the validated and possibly modified value as documented by the rule. Raises: rigidity.errors.DropRow – when the rule wants to cancel processing of an entire row, it may do so with the DropRow error. This signifies to the rigidity.Rigidity
class that it should discontinue processing the row.
-
\ Sort by:\ best rated\ newest\ oldest\
\\
Add a comment\ (markup):
\``code``
, \ code blocks:::
and an indented block after blank line