bitarray
efficient arrays of booleans -- C extension
bitarray: efficient arrays of booleans
======================================
This library provides an object type which efficiently represents an array
of booleans. Bitarrays are sequence types and behave very much like usual
lists. Eight bits are represented by one byte in a contiguous block of
memory. The user can select between two representations: little-endian
and big-endian. All functionality is implemented in C.
Methods for accessing the machine representation are provided, including the
ability to import and export buffers. This allows creating bitarrays that
are mapped to other objects, including memory-mapped files.
Key features
------------
* The bit-endianness can be specified for each bitarray object, see below.
* Sequence methods: slicing (including slice assignment and deletion),
  operations ``+``, ``*``, ``+=``, ``*=``, the ``in`` operator, ``len()``
* Bitwise operations: ``~``, ``&``, ``|``, ``^``, ``<<``, ``>>`` (as well as
  their in-place versions ``&=``, ``|=``, ``^=``, ``<<=``, ``>>=``).
* Fast methods for encoding and decoding variable bit length prefix codes.
* Bitarray objects support the buffer protocol (both importing and
  exporting buffers).
* Packing and unpacking to other binary data formats, e.g. ``numpy.ndarray``.
* Pickling and unpickling of bitarray objects.
* Immutable ``frozenbitarray`` objects which are hashable
* Sequential search
* Type hinting
* Extensive test suite with about 600 unit tests
* Utility module ``bitarray.util``:

  * conversion to and from hexadecimal strings
  * generating random bitarrays
  * pretty printing
  * conversion to and from integers
  * creating Huffman codes
  * compression of sparse bitarrays
  * (de-) serialization
  * various count functions
  * other helpful functions

Installation
------------
Python wheels are available on PyPI for all major platforms and Python
versions, so you can simply run:

.. code-block:: shell-session

    $ pip install bitarray

Once you have installed the package, you may want to test it:

.. code-block:: shell-session

    $ python -c 'import bitarray; bitarray.test()'
    bitarray is installed in: /Users/ilan/bitarray/bitarray
    bitarray version: 3.8.0
    sys.version: 3.13.5 (main, Jun 16 2025) [Clang 18.1.8]
    sys.prefix: /Users/ilan/miniforge
    pointer size: 64 bit
    sizeof(size_t): 8
    sizeof(bitarrayobject): 80
    HAVE_BUILTIN_BSWAP64: 1
    default bit-endianness: big
    machine byte-order: little
    Py_GIL_DISABLED: 0
    Py_DEBUG: 0
    DEBUG: 0
    .........................................................................
    .........................................................................
    ................................................................
    ----------------------------------------------------------------------
    Ran 595 tests in 0.165s

    OK

The ``test()`` function is part of the API. It returns a
``unittest.runner.TextTestResult`` object, so you can verify that all
tests ran successfully with:

.. code-block:: python

    import bitarray
    assert bitarray.test().wasSuccessful()

Usage
-----
As mentioned above, bitarray objects behave very much like lists, so
there is not too much to learn. The biggest difference from list
objects (apart from bitarrays obviously being homogeneous) is the ability
to access the machine representation of the object.
When doing so, the bit-endianness is of importance; this issue is
explained in detail in the section below. Here, we demonstrate the
basic usage of bitarray objects:

.. code-block:: python

    >>> from bitarray import bitarray
    >>> a = bitarray()  # create empty bitarray
    >>> a.append(1)
    >>> a.extend([1, 0])
    >>> a
    bitarray('110')
    >>> x = bitarray(2 ** 20)  # bitarray of length 1048576 (initialized to 0)
    >>> len(x)
    1048576
    >>> bitarray('1001 011')  # initialize from string (whitespace is ignored)
    bitarray('1001011')
    >>> lst = [1, 0, False, True, True]
    >>> a = bitarray(lst)  # initialize from iterable
    >>> a
    bitarray('10011')
    >>> a[2]  # indexing a single item will always return an integer
    0
    >>> a[2:4]  # whereas indexing a slice will always return a bitarray
    bitarray('01')
    >>> a[2:3]  # even when the slice length is just one
    bitarray('0')
    >>> a.count(1)
    3
    >>> a.remove(0)  # removes first occurrence of 0
    >>> a
    bitarray('1011')

Like lists, bitarray objects support slice assignment and deletion:

.. code-block:: python

    >>> a = bitarray(50)
    >>> a.setall(0)  # set all elements in a to 0
    >>> a[11:37:3] = 9 * bitarray('1')
    >>> a
    bitarray('00000000000100100100100100100100100100000000000000')
    >>> del a[12::3]
    >>> a
    bitarray('0000000000010101010101010101000000000')
    >>> a[-6:] = bitarray('10011')
    >>> a
    bitarray('000000000001010101010101010100010011')
    >>> a += bitarray('000111')
    >>> a[9:]
    bitarray('001010101010101010100010011000111')

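Because bitarray slicing follows the same rules as Python list slicing, the
lengths in the example above can be cross-checked with a plain list. This
sketch uses only built-in lists (not bitarray) and mirrors each step:

.. code-block:: python

    # Plain lists follow the same slice rules as bitarray objects.
    a = [0] * 50
    a[11:37:3] = [1] * 9      # extended slice: the lengths must match exactly
    assert len(a) == 50

    del a[12::3]              # removes indices 12, 15, ..., 48 (13 items)
    assert len(a) == 37

    a[-6:] = [1, 0, 0, 1, 1]  # simple slice: replacing 6 items with 5 shrinks a
    assert len(a) == 36

    a += [0, 0, 0, 1, 1, 1]
    assert len(a) == 42
    assert ''.join(map(str, a[9:])) == '001010101010101010100010011000111'
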
In addition, slices can be assigned to booleans, which is easier (and
faster) than assigning to a bitarray in which all values are the same:

.. code-block:: python

    >>> a = 20 * bitarray('0')
    >>> a[1:15:3] = True
    >>> a
    bitarray('01001001001001000000')

This is easier and faster than:

.. code-block:: python

    >>> a = 20 * bitarray('0')
    >>> a[1:15:3] = 5 * bitarray('1')
    >>> a
    bitarray('01001001001001000000')

Note that in the latter we have to create a temporary bitarray whose length
must be known or calculated. Another example of assigning slices to booleans
is setting ranges:

.. code-block:: python

    >>> a = bitarray(30)
    >>> a[:] = 0  # set all elements to 0 - equivalent to a.setall(0)
    >>> a[10:25] = 1  # set elements in range(10, 25) to 1
    >>> a
    bitarray('000000000011111111111111100000')

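If you do build the temporary bitarray, its length is the number of positions
the slice selects, which the built-in ``slice.indices()`` method can compute.
The helper ``slice_length()`` below is hypothetical, written only for this
illustration, and is not part of bitarray:

.. code-block:: python

    def slice_length(s: slice, n: int) -> int:
        """Return how many items slice s selects from a sequence of length n."""
        # s.indices(n) resolves None and negative values to concrete
        # (start, stop, step) for a sequence of length n
        return len(range(*s.indices(n)))

    assert slice_length(slice(1, 15, 3), 20) == 5     # indices 1, 4, 7, 10, 13
    assert slice_length(slice(10, 25), 30) == 15      # range(10, 25)

For example, ``a[1:15:3] = slice_length(slice(1, 15, 3), len(a)) * bitarray('1')``
has the same effect as ``a[1:15:3] = True``, at the cost of the extra
temporary object.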
As of bitarray version 2.8, indices may also be lists of arbitrary
indices (like in NumPy), or bitarrays that are treated as masks,
see `Bitarray indexing <https://github.com/ilanschnell/bitarray/blob/master/doc/indexing.rst>`__.
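
A rough plain-Python model of the two indexing styles, assuming the semantics
described in the linked document: a list of indices picks those positions in
order, and a mask keeps the positions where the mask is 1. The helpers
``take()`` and ``select_mask()`` are hypothetical names for this sketch; with
bitarray itself you would write ``a[indices]`` or ``a[mask]``.

.. code-block:: python

    from collections.abc import Sequence

    def take(bits: Sequence[int], indices: Sequence[int]) -> list[int]:
        """Return the items at the given positions, in the given order."""
        return [bits[i] for i in indices]

    def select_mask(bits: Sequence[int], mask: Sequence[int]) -> list[int]:
        """Return the items whose corresponding mask bit is 1."""
        if len(bits) != len(mask):
            # this sketch rejects mismatched lengths; bitarray's error may differ
            raise ValueError("mask must have the same length as bits")
        return [b for b, keep in zip(bits, mask) if keep]

    bits = [1, 0, 0, 1, 1]
    assert take(bits, [0, 3, 4]) == [1, 1, 1]
    assert select_mask(bits, [1, 1, 0, 0, 1]) == [1, 0, 1]
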
Bitwise operators
-----------------
Bitarray objects support the bitwise operators ``~``, ``&``, ``|``, ``^``,
``<<``, ``>>`` (as well as their in-place versions ``&=``, ``|=``, ``^=``,
``<<=``, ``>>=``). The behavior is very much what one would expect:

.. code-block:: python

    >>> a = bitarray('101110001')
    >>> ~a  # invert
    bitarray('010001110')
    >>> b = bitarray('111001011')
    >>> a ^ b  # bitwise XOR
    bitarray('010111010')
    >>> a &= b  # in-place AND
    >>> a
    bitarray('101000001')
    >>> a <<= 2  # in-place left-shift by 2
    >>> a
    bitarray('100000100')
    >>> b >> 1  # return b right-shifted by 1
    bitarray('011100101')

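To make the semantics concrete, the results above can be cross-checked with
Python integers by reading each bit string as a 9-bit binary number whose
index 0 is the most significant bit. With that reading, bitarray's ``<<``
(towards lower indices) matches integer ``<<`` once the result is truncated
to the original width. This is only an illustration, not how bitarray
computes these operations:

.. code-block:: python

    WIDTH = 9
    MASK = (1 << WIDTH) - 1   # keeps only the lowest WIDTH bits

    def to_str(x: int) -> str:
        """Format x as a zero-padded binary string of WIDTH digits."""
        return format(x, f'0{WIDTH}b')

    a = int('101110001', 2)   # index 0 of the bit string is the MSB
    b = int('111001011', 2)

    assert to_str(~a & MASK) == '010001110'              # invert
    assert to_str(a ^ b) == '010111010'                  # XOR
    assert to_str(a & b) == '101000001'                  # AND
    assert to_str(((a & b) << 2) & MASK) == '100000100'  # left shift, truncated
    assert to_str(b >> 1) == '011100101'                 # right shift
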
The C language does not specify the behavior of negative shifts, or of
left shifts greater than or equal to the width of the promoted left
operand; the exact behavior is compiler- and machine-specific.
This Python bitarray library specifies the behavior as follows:

* the length of the bitarray is never changed by any shift operation
* blanks are filled by 0
* negative shifts raise ``ValueError``
* shifts greater than or equal to the length of the bitarray result in
  bitarrays with all values 0

It is worth noting that (regardless of bit-endianness) the bitarray left
shift (``<<``) always shifts towards lower indices, and the right
shift (``>>``) always shifts towards higher indices.
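
The rules above can be written as a small reference model that operates on
strings of ``'0'`` and ``'1'``. The functions ``shift_left()`` and
``shift_right()`` are hypothetical helpers for illustration, not part of
bitarray; the assertions reuse values from the example earlier in this
section:

.. code-block:: python

    def shift_left(bits: str, n: int) -> str:
        """Shift towards lower indices; length is kept, blanks become '0'."""
        if n < 0:
            raise ValueError("negative shift count")
        if n >= len(bits):
            return '0' * len(bits)
        return bits[n:] + '0' * n

    def shift_right(bits: str, n: int) -> str:
        """Shift towards higher indices; length is kept, blanks become '0'."""
        if n < 0:
            raise ValueError("negative shift count")
        if n >= len(bits):
            return '0' * len(bits)
        return '0' * n + bits[:len(bits) - n]

    assert shift_left('101000001', 2) == '100000100'
    assert shift_right('111001011', 1) == '011100101'
    assert shift_left('111', 5) == '000'   # shift >= length gives all zeros
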
Bit-endianness
--------------
For many purposes the bit-endianness is not of any relevance to the end user
and can be regarded as an implementation detail of bitarray objects.
However, there are use cases when the bit-endianness becomes important.
These use cases involve explicitly reading and writing the bitarray buffer
using ``.tobytes()``, ``.frombytes()``, ``.tofile()`` or ``.fromfile()``,
or importing and exporting buffers. Also, a number of utility functions
in ``bitarray.util`` will return different results depending on
bit-endianness, such as ``ba2hex()`` or ``ba2int()``.
To better understand this topic, please read `bit-endianness <https://github.com/ilanschnell/bitarray/blob/master/doc/endianness.rst>`__.
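
As a rough illustration of why bit-endianness matters for ``.tobytes()``, the
following plain-Python sketch packs a bit string into bytes for both
representations. It assumes that the unused bits of the last byte (the pad
bits) are zero; the ``pack()`` function is hypothetical and is not bitarray's
implementation:

.. code-block:: python

    def pack(bits: str, endian: str) -> bytes:
        """Pack a '0'/'1' string into bytes, padding the last byte with zeros."""
        if endian not in ('big', 'little'):
            raise ValueError("endian must be 'big' or 'little'")
        out = bytearray((len(bits) + 7) // 8)   # eight bits per byte
        for i, c in enumerate(bits):
            if c == '1':
                # big: index 0 is the most significant bit of byte 0
                # little: index 0 is the least significant bit of byte 0
                shift = 7 - i % 8 if endian == 'big' else i % 8
                out[i // 8] |= 1 << shift
        return bytes(out)

    assert pack('1000000011', 'big') == b'\x80\xc0'
    assert pack('1000000011', 'little') == b'\x01\x03'

Under that assumption, ``bitarray('1000000011', endian='little').tobytes()``
should give ``b'\x01\x03'``, while the big-endian bitarray gives
``b'\x80\xc0'``: the same ten bits, two different byte sequences.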
Buffer protocol
---------------
Bitarray objects support the buffer protocol. They can both export their
own buffer, as well as import another object's buffer. To learn more about
this topic, please read `buffer protocol <https://github.com/ilanschnell/bitarray/blob/master/doc/buffer.rst>`__. There is also an example that shows how
to memory-map a file to a bitarray: `mmapped-file.py <https://github.com/ilanschnell/bitarray/blob/master/examples/mmapped-file.py>`__
Variable bit length prefix codes
--------------------------------
The ``.encode()`` method takes a dictionary mapping symbols to bitarrays
and an iterable, and extends the bitarray object with the encoded symbols
found while iterating. For example:

.. code-block:: python

    >>> d = {'H':bitarray('111'), 'e':bitarray('0'),
    ...      'l':bitarray('110'), 'o':bitarray('10')}
    ...
    >>> a = bitarray()
    >>> a.encode(d, 'Hello')
    >>> a
    bitarray('111011011010')

Note that the string ``'Hello'`` is an iterable, but the symbols are not
limited to characters; in fact, any immutable Python object can be a symbol.
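
Conceptually, encoding concatenates the code word of each symbol in turn. The
sketch below models this with ``'0'``/``'1'`` strings instead of bitarray
objects (``encode()`` here is a hypothetical helper, not the ``.encode()``
method) and shows tuples used as symbols:

.. code-block:: python

    from collections.abc import Hashable, Iterable, Mapping

    def encode(code: Mapping[Hashable, str], symbols: Iterable[Hashable]) -> str:
        """Concatenate the code word of each symbol, in order."""
        return ''.join(code[s] for s in symbols)

    d = {'H': '111', 'e': '0', 'l': '110', 'o': '10'}
    assert encode(d, 'Hello') == '111011011010'

    # Symbols only need to be usable as dictionary keys, e.g. tuples
    moves = {(0, 1): '0', (1, 0): '10', (0, -1): '110', (-1, 0): '111'}
    assert encode(moves, [(0, 1), (1, 0), (-1, 0)]) == '010111'
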
Taking the same dictionary, we can apply the ``.decode()`` method which will
return an iterable of the symbols:

.. code-block:: python

    >>> list(a.decode(d))
    ['H', 'e', 'l', 'l', 'o']
    >>> ''.join(a.decode(d))
    'Hello'

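Because the code is prefix-free, a decoder can read bits one at a time and
emit a symbol as soon as the bits read so far form a complete code word. The
sketch below does this with a reverse dictionary lookup over strings; it is a
simplified stand-in for the decode tree bitarray uses internally, and
``decode()`` is a hypothetical helper:

.. code-block:: python

    from collections.abc import Hashable, Iterator, Mapping

    def decode(code: Mapping[Hashable, str], bits: str) -> Iterator[Hashable]:
        """Yield the symbols encoded in bits, assuming code is prefix-free."""
        lookup = {word: sym for sym, word in code.items()}
        word = ''
        for b in bits:
            word += b
            if word in lookup:   # prefix-free, so the first match is the symbol
                yield lookup[word]
                word = ''
        if word:
            # this sketch raises ValueError; bitarray's own handling may differ
            raise ValueError(f"incomplete code word at end: {word!r}")

    d = {'H': '111', 'e': '0', 'l': '110', 'o': '10'}
    assert ''.join(decode(d, '111011011010')) == 'Hello'
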
The above dictionary ``d`` can be efficiently constructed using the function
``bitarray.util.huffman_code()``. I also wrote `Huffman coding in Python
using bitarray <http://ilan.schnell-web.net/prog/huffman/>`__ for more
background information.
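
For readers who want to see where such a dictionary comes from, here is a
compact sketch of Huffman's algorithm using ``heapq``. It returns
``'0'``/``'1'`` strings rather than bitarrays, and ``huffman_code()`` here is a
hypothetical stand-in, not ``bitarray.util.huffman_code()``:

.. code-block:: python

    import heapq
    from collections.abc import Hashable, Mapping
    from itertools import count

    def huffman_code(freq: Mapping[Hashable, int]) -> dict[Hashable, str]:
        """Build a prefix code (symbol -> '0'/'1' string) from frequencies."""
        if not freq:
            raise ValueError("need at least one symbol")
        if len(freq) == 1:
            # a single symbol still needs a non-empty code word
            return {sym: '0' for sym in freq}
        tiebreak = count()   # keeps heap entries comparable when weights tie
        heap = [(w, next(tiebreak), {sym: ''}) for sym, w in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w0, _, c0 = heapq.heappop(heap)   # the two lightest subtrees
            w1, _, c1 = heapq.heappop(heap)
            merged = {s: '0' + c for s, c in c0.items()}
            merged.update({s: '1' + c for s, c in c1.items()})
            heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
        return heap[0][2]

    code = huffman_code({'a': 5, 'b': 2, 'c': 1, 'd': 1})
    assert {s: len(w) for s, w in code.items()} == {'a': 1, 'b': 2, 'c': 3, 'd': 3}

Tie-breaking between equal weights can differ between implementations, so the
code words (and even the code lengths) produced by
``bitarray.util.huffman_code()`` may differ from this sketch, although every
Huffman code gives the same minimal total encoded length.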
When the codes are large, and you have many decode calls, most time will
be spent creating the (same) internal decode tree objects. In this case,
it will b