flexparser
Parsing made fun ... using typing.
Rank: #2044Downloads: 3,588,021 (30 days)Stars: 4Forks: 5
Description
.. image:: https://img.shields.io/pypi/v/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: Latest Version
.. image:: https://img.shields.io/pypi/l/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: License
.. image:: https://img.shields.io/pypi/pyversions/flexparser.svg
:target: https://pypi.python.org/pypi/flexparser
:alt: Python Versions
.. image:: https://github.com/hgrecco/flexparser/workflows/CI/badge.svg
:target: https://github.com/hgrecco/flexparser/actions?query=workflow%3ACI
:alt: CI
.. image:: https://github.com/hgrecco/flexparser/workflows/Lint/badge.svg
:target: https://github.com/hgrecco/flexparser/actions?query=workflow%3ALint
:alt: LINTER
.. image:: https://coveralls.io/repos/github/hgrecco/flexparser/badge.svg?branch=main
:target: https://coveralls.io/github/hgrecco/flexparser?branch=main
:alt: Coverage
flexparser
==========
Why write another parser? I have asked myself the same question while
working on this project. It is clear that there are excellent parsers out
there but I wanted to experiment with another way of writing them.
The idea is quite simple. You write a class for every type of content
(called here ``ParsedStatement``) you need to parse. Each class should
have a ``from_string`` constructor. We used extensively the ``typing``
module to make the output structure easy to use and less error prone.
For example:
.. code-block:: python
from dataclasses import dataclass
import flexparser as fp
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
lhs, rhs = s.split("<-")
return cls(lhs.strip(), rhs.strip())
(using a frozen dataclass is not necessary but it convenient. Being a
dataclass you get the init, str, repr, etc for free. Being frozen, sort
of immutable, makes them easier to reason around)
In certain cases you might want to signal the parser
that his class is not appropriate to parse the statement.
.. code-block:: python
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "<-" not in s:
# This means: I do not know how to parse it
# try with another ParsedStatement class.
return None
lhs, rhs = s.split("<-")
return cls(lhs.strip(), rhs.strip())
You might also want to indicate that this is the right ``ParsedStatement``
but something is not right:
.. code-block:: python
@dataclass(frozen=True)
class InvalidIdentifier(fp.ParsingError):
value: str
@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
"""Parses the following `this <- other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "<-" not in s:
# This means: I do not know how to parse it
# try with another ParsedStatement class.
return None
lhs, rhs = (p.strip() for p in s.split("<-"))
if not str.isidentifier(lhs):
return InvalidIdentifier(lhs)
return cls(lhs, rhs)
Put this into ``source.txt``
.. code-block:: text
one <- other
2two <- new
three <- newvalue
one == three
and then run the following code:
.. code-block:: python
parsed = fp.parse("source.txt", Assigment)
for el in parsed.iter_statements():
print(repr(el))
will produce the following output:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
The result is a collection of ``ParsedStatement`` or ``ParsingError`` (flanked by
``BOF`` and ``EOS`` indicating beginning of file and ending of stream respectively
Alternative, it can beginning with ``BOR`` with means beginning of resource and it
is used when parsing a Python Resource provided with a package).
Notice that there are two correctly parsed statements (``Assigment``), one
error found (``InvalidIdentifier``) and one unknown (``UnknownStatement``).
Cool, right? Just writing a ``from_string`` method that outputs a datastructure
produces a usable structure of parsed objects.
Now what? Let's say we want to support equality comparison. Simply do:
.. code-block:: python
@dataclass(frozen=True)
class EqualityComparison(fp.ParsedStatement):
"""Parses the following `this == other`
"""
lhs: str
rhs: str
@classmethod
def from_string(cls, s):
if "==" not in s:
return None
lhs, rhs = (p.strip() for p in s.split("=="))
return cls(lhs, rhs)
parsed = fp.parse("source.txt", (Assigment, Equality))
for el in parsed.iter_statements():
print(repr(el))
and run it again:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
EqualityComparison(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
You need to group certain statements together: welcome to ``Block``
This construct allows you to group
.. code-block:: python
class Begin(fp.ParsedStatement):
@classmethod
def from_string(cls, s):
if s == "begin":
return cls()
return None
class End(fp.ParsedStatement):
@classmethod
def from_string(cls, s):
if s == "end":
return cls()
return None
class ParserConfig:
pass
class AssigmentBlock(fp.Block[Begin, Assigment, End, ParserConfig]):
pass
parsed = fp.parse("source.txt", (AssigmentBlock, Equality))
Run the code:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
UnknownStatement(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other')
UnknownStatement(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new')
UnknownStatement(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)
Notice that there are a lot of ``UnknownStatement`` now, because we instructed
the parser to only look for assignment within a block. So change your text file to:
.. code-block:: text
begin
one <- other
2two <- new
three <- newvalue
end
one == three
and try again:
.. code-block:: text
BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='3d8ce0051dcdd6f0f80ef789a0df179509d927874f242005ac41ed886ae0b71a30b845b9bfcb30194461c0ef6a3ca324c36f411dfafc7e588611f1eb0269bb5a'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source2.txt'), mtime=1658550707.1248093)
Begin(start_line=1, start_col=0, end_line=1, end_col=5, raw='begin')
Assigment(start_line=2, start_col=0, end_line=2, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=3, start_col=0, end_line=3, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=4, start_col=0, end_line=4, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
End(start_line=5, start_col=0, end_line=5, end_col=3, raw='end')
EqualityComparison(start_line=6, start_col=0, end_line=6, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=7, start_col=0, end_line=7, end_col=0, raw=None)
Until now we have used ``parsed.iter_statements`` to iterate over all parsed statements.
But let's look inside ``parsed``, an object of ``ParsedProject`` type. It is a thin wrapper
over a dictionary mapping files to parsed content. Because we have provided a single file
and this does not contain a link another, our `