reedsolo
Pure-Python Reed Solomon encoder/decoder
Rank: #2759Downloads: 2,019,553 (30 days)Stars: 407Forks: 90
Description
Reed Solomon
============
|PyPI-Status| |PyPI-Versions| |PyPI-Downloads|
|Build-Status| |Coverage|
|Conda-Forge-Status| |Conda-Forge-Platforms| |Conda-Forge-Downloads|
A pythonic `universal errors-and-erasures Reed-Solomon Codec <http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction>`_ to protect your data from errors and bitrot. It includes a pure python implementation and an optional speed-optimized Cython/C extension.
This is a burst-type implementation, so that it supports any Galois field higher than 2^3, but not binary streams. Burst errors are non-random errors that more often happen on data storage mediums such as hard drives, hence this library is better suited for data storage protection, and less for streams noise correction, although it also works for this purpose but with a bit of overhead (since it works with bytes only, instead of bits).
Based on the wonderful tutorial at `Wikiversity <http://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders>`_, written by "Bobmath" and "LRQ3000". If you are just starting with Reed-Solomon error correction codes, the Wikiversity article is a good beginner's introduction.
------------------------------------
.. contents:: Table of contents
:backlinks: top
:local:
Installation
------------
For the latest stable release, install with:
.. code:: sh
pip install --upgrade reedsolo
For the latest development release (do not use in production!), use:
.. code:: sh
pip install --upgrade git+https://github.com/tomerfiliba/reedsolomon
If you have some issues installing through pip, maybe this command may help:
.. code:: sh
pip install reedsolo --no-binary={reedsolo}
By default, only a pure-python implementation is installed. If you have Cython and a C++ compiler, a faster cythonized binary can be optionally built with:
.. code:: sh
pip install --upgrade reedsolo --install-option="--cythonize" --verbose
or locally with:
.. code:: sh
python setup.py install --cythonize
The setup.py will then try to build the Cython optimized module ``creedsolo.pyx`` if Cython is installed, which can then be imported as `import creedsolo` instead of `import reedsolo`, with the same features between both modules.
As an alternative, use `conda <https://docs.conda.io/en/latest/>`_ to install a compiled version for various platforms:
.. code:: sh
conda install -c conda-forge reedsolo
Usage
-----
Basic usage with high-level RSCodec class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
# Initialization
>>> from reedsolo import RSCodec, ReedSolomonError
>>> rsc = RSCodec(10) # 10 ecc symbols
# Encoding
# just a list of numbers/symbols:
>>> rsc.encode([1,2,3,4])
b'\x01\x02\x03\x04,\x9d\x1c+=\xf8h\xfa\x98M'
# bytearrays are accepted and the output will be matched:
>>> rsc.encode(bytearray([1,2,3,4]))
bytearray(b'\x01\x02\x03\x04,\x9d\x1c+=\xf8h\xfa\x98M')
# encoding a byte string is as easy:
>>> rsc.encode(b'hello world')
b'hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa'
# Note: strings of any length, even if longer than the Galois field, will be encoded as well using transparent chunking.
# Decoding (repairing)
>>> rsc.decode(b'hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa')[0] # original
b'hello world'
>>> rsc.decode(b'heXlo worXd\xed%T\xc4\xfdX\x89\xf3\xa8\xaa')[0] # 3 errors
b'hello world'
>>> rsc.decode(b'hXXXo worXd\xed%T\xc4\xfdX\x89\xf3\xa8\xaa')[0] # 5 errors
b'hello world'
>>> rsc.decode(b'hXXXo worXd\xed%T\xc4\xfdXX\xf3\xa8\xaa')[0] # 6 errors - fail
Traceback (most recent call last):
...
reedsolo.ReedSolomonError: Too many (or few) errors found by Chien Search for the errata locator polynomial!
**Important upgrade notice for pre-1.0 users:** Note that ``RSCodec.decode()`` returns 3 variables:
1. the decoded (corrected) message
2. the decoded message and error correction code (which is itself also corrected)
3. and the list of positions of the errata (errors and erasures)
Here is how to use these outputs:
.. code:: python
>>> tampered_msg = b'heXlo worXd\xed%T\xc4\xfdX\x89\xf3\xa8\xaa'
>>> decoded_msg, decoded_msgecc, errata_pos = rsc.decode(tampered_msg)
>>> print(decoded_msg) # decoded/corrected message
bytearray(b'hello world')
>>> print(decoded_msgecc) # decoded/corrected message and ecc symbols
bytearray(b'hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa')
>>> print(errata_pos) # errata_pos is returned as a bytearray, hardly intelligible
bytearray(b'\x10\t\x02')
>>> print(list(errata_pos)) # convert to a list to get the errata positions as integer indices
[16, 9, 2]
Since we failed to decode with 6 errors with a codec set with 10 error correction code (ecc) symbols, let's try to use a bigger codec, with 12 ecc symbols.
.. code:: python
>>> rsc = RSCodec(12) # using 2 more ecc symbols (to correct max 6 errors or 12 erasures)
>>> rsc.encode(b'hello world')
b'hello world?Ay\xb2\xbc\xdc\x01q\xb9\xe3\xe2='
>>> rsc.decode(b'hello worXXXXy\xb2XX\x01q\xb9\xe3\xe2=')[0] # 6 errors - ok, but any more would fail
b'hello world'
>>> rsc.decode(b'helXXXXXXXXXXy\xb2XX\x01q\xb9\xe3\xe2=', erase_pos=[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16])[0] # 12 erasures - OK
b'hello world'
This shows that we can decode twice as many erasures (where we provide the location of errors ourselves) than errors (with unknown locations). This is the cost of error correction compared to erasure correction.
To get the maximum number of errors *or* erasures that can be independently corrected (ie, not simultaneously):
.. code:: python
>>> maxerrors, maxerasures = rsc.maxerrata(verbose=True)
This codec can correct up to 6 errors and 12 erasures independently
>>> print(maxerrors, maxerasures)
6 12
To get the maximum number of errors *and* erasures that can be simultaneously corrected, you need to specify the number of errors or erasures you expect:
.. code:: python
>>> maxerrors, maxerasures = rsc.maxerrata(erasures=6, verbose=True) # we know the number of erasures, will calculate how many errors we can afford
This codec can correct up to 3 errors and 6 erasures simultaneously
>>> print(maxerrors, maxerasures)
3 6
>>> maxerrors, maxerasures = rsc.maxerrata(errors=5, verbose=True) # we know the number of errors, will calculate how many erasures we can afford
This codec can correct up to 5 errors and 2 erasures simultaneously
>>> print(maxerrors, maxerasures)
5 2
Note that if a chunk has more errors and erasures than the Singleton Bound as calculated by the ``maxerrata()`` method, the codec will try to raise a ``ReedSolomonError`` exception,
but may very well not detect any error either (this is a theoretical limitation of error correction codes). In other words, error correction codes are unreliable to detect if a chunk of a message
is corrupted beyond the Singleton Bound. If you want more reliability in errata detection, use a checksum or hash such as SHA or MD5 on your message, these are much more reliable and have no bounds
on the number of errata (the only potential issue is with collision but the probability is very very low).
Note: to catch a ``ReedSolomonError`` exception, do not forget to import it first with: ``from reedsolo import ReedSolomonError``
To check if a message is tampered given its error correction symbols, without decoding, use the ``check()`` method:
.. code:: python
# Checking
>> rsc.check(b'hello worXXXXy\xb2XX\x01q\xb9\xe3\xe2=') # Tampered message will return False
[False]
>> rmes, rmesecc, errata_pos = rsc.decode(b'hello worXXXXy\xb2XX\x01q\xb9\xe3\xe2=')
>> rsc.check(rmesecc) # Corrected or untampered message will return True
[True]
>> print('Number of detected errors and erasures: %i, their positions: %s' % (len(errata_pos), list(errata_pos)))
Number of detected errors and erasures: 6, their positions: [16, 15, 12, 11, 10, 9]
By default, most Reed-Solomon codecs are limited to characters that can be encoded in 256 bits and with a length of maximum 256 characters. But this codec is universal, you can reduce or increase the length and maximum character value by increasing the Galois Field:
.. code:: python
# To use longer chunks or bigger values than 255 (may be very slow)
>> rsc = RSCodec(12, nsize=4095) # always use a power of 2 minus 1
>> rsc = RSCodec(12, c_exp=12) # alternative way to set nsize=4095
>> mes = 'a' * (4095-12)
>> mesecc = rsc.encode(mes)
>> mesecc[2] = 1
>> mesecc[-1] = 1
>> rmes, rmesecc, errata_pos = rsc.decode(mesecc)
>> rsc.check(mesecc)
[False]
>> rsc.check(rmesecc)
[True]
Note that the ``RSCodec`` class supports transparent chunking, so you don't need to increase the Galois Field to support longer messages, but characters will still be limited to 256 bits (or
whatever field you set with ``c_exp``).
Low-level usage via direct access to math functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want full control, you can skip the API and directly use the library as-is. Here's how:
First you need to init the precomputed tables:
.. code:: python
>> import reedsolo as rs
>> rs.init_tables(0x11d)
Pro tip: if you get the error: ValueError: byte must be in range(0, 256), please check that your prime polynomial is correct for your field.
Pro tip2: by default, you can only encode messages of max length and max symbol value = 256. If you want to encode bigger messages,
please use the following (where c_exp is the exponent of your Galois Field, eg, 12 = max length 2^12 = 4096):
.. code:: python
>> prim = rs.find_prime_polys(c_exp=12, fast_primes=True, single=True)
>> rs.init_tables(c_exp=12, prim=prim)
Let's define our RS message and ecc size:
.. code:: python
>> n =