Walkthrough of PyDelphin Features
=================================
This guide provides a tour of the main features offered by PyDelphin.
ACE and Web Interfaces
----------------------
PyDelphin works with a number of data types, and a simple way to get
some data to play with is to parse a sentence. PyDelphin doesn't parse
things on its own, but it provides two interfaces to external
processors: one for the `ACE <http://sweaglesw.org/linguistics/ace/>`_
processor and another for the `HTTP-based "Web API"
<https://github.com/delph-in/docs/wiki/ErgApi>`_. I'll first show the Web API
as it's the simplest for parsing a single sentence:
>>> from delphin.web import client
>>> response = client.parse('Abrams chased Browne', params={'mrs': 'json'})
>>> response.result(0).mrs()
The response object returned by interfaces is a basic dictionary that
has been augmented with convenient access methods (such as `result()`
and `mrs()` above). Note that the Web API is platform-neutral, and is
thus currently the only way to dynamically retrieve parses in PyDelphin
on a Windows machine.
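To make the idea of an augmented dictionary concrete, here is a minimal sketch of a `dict` subclass that adds accessor methods. The class names `SketchResponse` and `SketchResult`, the `'results'` and `'mrs'` keys, and the placeholder MRS string are assumptions made for illustration. This is not PyDelphin's implementation, and a real accessor would deserialize the stored value rather than return it unchanged.

```python
class SketchResult(dict):
    """A plain dict with one convenience accessor (hypothetical class)."""

    def mrs(self):
        # A real interface would deserialize this value into an MRS
        # object; the sketch returns the stored value unchanged.
        return self['mrs']


class SketchResponse(dict):
    """A plain dict whose 'results' entry is a list of dicts (hypothetical)."""

    def result(self, i):
        # Wrap the i-th raw result so its accessor methods are available.
        return SketchResult(self['results'][i])


# Placeholder data; the MRS string is deliberately elided.
resp = SketchResponse(input='Abrams chased Browne',
                      results=[{'mrs': '[ TOP: h0 ... ]'}])
print(resp['input'])         # ordinary dictionary access still works
print(resp.result(0).mrs())  # convenience access through methods
```

In the sketch, the object remains an ordinary dictionary, so any stored value stays reachable by key even when no accessor method is defined for it.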
.. seealso::
- Wiki for the Web API: https://github.com/delph-in/docs/wiki/ErgApi
- Bottlenose server: https://github.com/delph-in/bottlenose
- :mod:`delphin.web` module
- :mod:`delphin.interface` module
If you're on a Linux or Mac machine and have
`ACE <http://sweaglesw.org/linguistics/ace/>`_ installed and a grammar
image available, you can use the ACE interface, which is faster than
the Web API and returns more complete response information.
>>> from delphin import ace
>>> grm = '~/grammars/erg-2018-x86-64-0.9.30.dat'
>>> response = ace.parse(grm, 'Abrams chased Browne')
NOTE: parsed 1 / 1 sentences, avg 2135k, time 0.01316s
>>> response.result(0).mrs()
.. seealso::
- ACE: http://sweaglesw.org/linguistics/ace/
- :mod:`delphin.ace` module
- :doc:`ace`
I will use the `response` object from ACE to illustrate some other
features below.
Inspecting Semantic Structures
------------------------------
The original motivation for PyDelphin, and the area that has received
the most work, is the modeling of DELPH-IN Semantics representations
such as MRS.
>>> m = response.result(0).mrs()
>>> [ep.predicate for ep in m.rels]
['proper_q', 'named', '_chase_v_1', 'proper_q', 'named']
>>> list(m.variables)
['h0', 'e2', 'h4', 'x3', 'h5', 'h6', 'h7', 'h1', 'x9', 'h10', 'h11', 'h12', 'h13']
>>> # get an EP by its ID (generally its intrinsic variable)
>>> m['x3']
>>> # quantifier IDs generally just replace 'x' with 'q'
>>> m['q3']
>>> # but if you want to be more careful you can do this...
>>> qmap = {p.iv: q for p, q in m.quantification_pairs()}
>>> qmap['x3']
>>> # EP arguments are available on the EPs
>>> m['e2'].args
{'ARG0': 'e2', 'ARG1': 'x3', 'ARG2': 'x9'}
>>> # While HCONS are available on the MRS
>>> [(hc.hi, hc.relation, hc.lo) for hc in m.hcons]
[('h0', 'qeq', 'h1'), ('h5', 'qeq', 'h7'), ('h11', 'qeq', 'h13')]
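One way to read the HCONS list is as a set of links from a hole (`hi`) to the label (`lo`) it is qeq-related to. The sketch below uses plain tuples and dictionaries rather than PyDelphin objects. The HCONS triples are copied from the output above. The label-to-predicate table is written by hand from the same example MRS (its labels are visible in the SimpleMRS serialization later in this guide), and it assumes one EP per label, which holds here but not in general.

```python
# Label -> predicate, hand-copied from the example MRS (assumes one EP per label).
labels = {'h4': 'proper_q', 'h7': 'named', 'h1': '_chase_v_1',
          'h10': 'proper_q', 'h13': 'named'}

# (hi, relation, lo) triples as printed above.
hcons = [('h0', 'qeq', 'h1'), ('h5', 'qeq', 'h7'), ('h11', 'qeq', 'h13')]

# Map each hole to the label it is qeq-related to.
qeq = {hi: lo for hi, rel, lo in hcons if rel == 'qeq'}

for hole, label in qeq.items():
    print(hole, '->', label, labels[label])
```

Read this way, the top handle `h0` leads to the verb's label, and each quantifier's restriction hole leads to the `named` EP it restricts.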
.. seealso::
- Wiki of MRS topics: https://github.com/delph-in/docs/wiki/RmrsTop
- :mod:`delphin.mrs` module
- :doc:`semantics`
Beyond the basic modeling of semantic structures, there are some
semantic operations defined in the :mod:`delphin.mrs` module.
>>> from delphin import mrs
>>> mrs.is_isomorphic(m, m)
True
>>> mrs.is_isomorphic(m, response.result(1).mrs())
False
>>> mrs.has_intrinsic_variable_property(m)
True
>>> mrs.is_connected(m)
True
.. seealso::
- MRS isomorphism wiki: https://github.com/delph-in/docs/wiki/MrsIsomorphism
Scoping semantic structures such as MRS and DMRS can make use of the
:mod:`delphin.scope` module, which allows for inspection of the scope
structures:
>>> from delphin import scope
>>> _response = ace.parse(grm, "Kim didn't think that Sandy left.")
>>> _m = _response.result(0).mrs()
>>> descendants = scope.descendants(_m)
>>> for ep_id, ds in descendants.items():
...     print(_m[ep_id].predicate, [d.predicate for d in ds])
...
proper_q ['named']
named []
neg ['_think_v_1', '_leave_v_1']
_think_v_1 ['_leave_v_1']
_leave_v_1 []
proper_q ['named']
named []
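The shape of that output can be reproduced with a small recursive function over a table of immediate scopal children. This is only a sketch of the idea: the `children` table is written by hand to mirror the `neg` chain in the output above (it is not computed from an MRS), and the function is not the implementation in :mod:`delphin.scope`.

```python
def descendants(children, node):
    """Return every node below *node*, each parent before its own children."""
    found = []
    for child in children.get(node, []):
        found.append(child)
        found.extend(descendants(children, child))
    return found


# Immediate scopal children, hand-built to mirror the output above.
children = {'neg': ['_think_v_1'], '_think_v_1': ['_leave_v_1']}

for pred in ('neg', '_think_v_1', '_leave_v_1'):
    print(pred, descendants(children, pred))
```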
.. seealso::
- :mod:`delphin.scope` module
Converting Semantic Representations
-----------------------------------
Conversions between MRS, DMRS, and EDS representations are a single
function call in PyDelphin. The converted representation has its own
data structures so it can be inspected and manipulated in a natural
way for the respective formalism. Here is DMRS conversion from MRS:
>>> from delphin import dmrs
>>> dmrs.from_mrs(m)
And EDS conversion from MRS:
>>> from delphin import eds
>>> eds.from_mrs(m)
It is also possible to convert to MRS from DMRS.
Serializing Semantic Representations
------------------------------------
The DELPH-IN community has designed many serialization formats of the
semantic representations for various uses. For instance, the JSON
formats are used in the Web API, and the PENMAN formats are sometimes
used in machine learning applications. PyDelphin implements almost all
of these formats, available in the :doc:`../api/delphin.codecs`
namespace.
>>> from delphin.codecs import simplemrs, mrx
>>> print(simplemrs.encode(m, indent=True))
[ TOP: h0
  INDEX: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ]
  RELS: < [ proper_q<0:6> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
          [ named<0:6> LBL: h7 ARG0: x3 CARG: "Abrams" ]
          [ _chase_v_1<7:13> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x9 [ x PERS: 3 NUM: sg IND: + ] ]
          [ proper_q<14:20> LBL: h10 ARG0: x9 RSTR: h11 BODY: h12 ]
          [ named<14:20> LBL: h13 ARG0: x9 CARG: "Browne" ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
>>> print(mrx.encode(m, indent=True))
[...]
To serialize a different representation you must convert it first:
>>> d = dmrs.from_mrs(m)
>>> from delphin.codecs import dmrx
>>> print(dmrx.encode(d, indent=True))
[...]
>>> e = eds.from_mrs(m)
>>> from delphin.codecs import eds as edsnative # avoid name collision
>>> print(edsnative.encode(e, indent=True))
{e2:
 _1:proper_q<0:6>[BV x3]
 x3:named<0:6>("Abrams")[]
 e2:_chase_v_1<7:13>[ARG1 x3, ARG2 x9]
 _2:proper_q<14:20>[BV x9]
 x9:named<14:20>("Browne")[]
}
.. seealso::
- Wiki of MRS formats: https://github.com/delph-in/docs/wiki/MrsRfc
- :doc:`../api/delphin.codecs` namespace
Some formats are currently export-only:
>>> from delphin.codecs import mrsprolog
>>> print(mrsprolog.encode(m, indent=True))
psoa(h0,e2,
[rel('proper_q',h4,
[attrval('ARG0',x3),
attrval('RSTR',h5),
attrval('BODY',h6)]),
rel('named',h7,
[attrval('CARG','Abrams'),
attrval('ARG0',x3)]),
rel('_chase_v_1',h1,
[attrval('ARG0',e2),
attrval('ARG1',x3),
attrval('ARG2',x9)]),
rel('proper_q',h10,
[attrval('ARG0',x9),
attrval('RSTR',h11),
attrval('BODY',h12)]),
rel('named',h13,
[attrval('CARG','Browne'),
attrval('ARG0',x9)])],
hcons([qeq(h0,h1),qeq(h5,h7),qeq(h11,h13)]))
Tokens and Token Lattices
-------------------------
The Response object from the interface can return both the initial
(string-level tokenization) and internal (token-mapped) tokens:
>>> response.tokens('initial')
>>> print('\n'.join(map(str,response.tokens('initial').tokens)))
(1, 0, 1, <0:6>, 1, "Abrams", 0, "null", "NNP" 1.0000)
(2, 1, 2, <7:13>, 1, "chased", 0, "null", "NNP" 1.0000)
(3, 2, 3, <14:20>, 1, "Browne", 0, "null", "NNP" 1.0000)
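For readers unfamiliar with the YY token format, the following sketch pulls the fields out of one of the lines printed above with a regular expression. The group names (`cfrom`, `cto`, `ipos`, `lrule`, and so on) are my own labels based on the YY description in the PetInput wiki, and the pattern only covers the simple shape shown here (one path, one lexical rule, one tag). It is not how :mod:`delphin.tokens` parses tokens.

```python
import re

# One YY token: (id, start, end, <from:to>, paths, "form", ipos, "lrule", "tag" prob)
YY_TOKEN = re.compile(
    r'\((?P<id>\d+), (?P<start>\d+), (?P<end>\d+), '
    r'<(?P<cfrom>\d+):(?P<cto>\d+)>, (?P<paths>\d+), '
    r'"(?P<form>[^"]*)", (?P<ipos>\d+), "(?P<lrule>[^"]*)", '
    r'"(?P<tag>[^"]*)" (?P<prob>[\d.]+)\)'
)

line = '(1, 0, 1, <0:6>, 1, "Abrams", 0, "null", "NNP" 1.0000)'
tok = YY_TOKEN.fullmatch(line).groupdict()  # all values are strings

print(tok['form'], tok['cfrom'], tok['cto'], tok['tag'])
```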
.. seealso::
- Wiki about YY tokens: https://github.com/delph-in/docs/wiki/PetInput
- :mod:`delphin.tokens` module
Derivations
-----------
[incr tsdb()] derivations (unambiguous "recipes" for an analysis with a
specific grammar version) are fully modeled:
>>> d = response.result(0).derivation()
>>> d.derivation().entity
'sb-hd_mc_c'
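>>> d.derivation().daughters  # two daughter nodes; reprs elided
[...]
>>> d.derivation().terminals()  # three terminal nodes; reprs elided
[...]
>>> d.derivation().preterminals()  # three preterminal nodes; reprs elided
[...]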
.. seealso::
- Wiki about derivations: https://github.com/delph-in/docs/wiki/ItsdbDerivations
- :mod:`delphin.derivation` module
[incr tsdb()] TestSuites
------------------------
PyDelphin has full support for reading and writing [incr tsdb()]
test suites:
>>> from delphin import itsdb
>>> ts = itsdb.TestSuite('~/grammars/erg/tsdb/gold/mrs')
>>> len(ts['item'])
107
>>> ts['item'][0]['i-input']
'It rained.'
>>> # modify a test suite in-memory
>>> ts['item'].update(0, {'i-input': 'It snowed.'})
>>> ts['item'][0]['i-input']
'It snowed.'
>>> # TestSuite.commit() writes changes to disk
>>> ts.commit()
>>> # TestSuites can be parsed with a processor like ACE
>>> from delphin import ace
>>> with ace.ACEParser('~/grammars/erg-2018-x86-64-0.9.30.dat') as cpu:
...     ts.process(cpu)
...
NOTE: parsed 107 / 107 sentences, avg 4744k, time 2.93924s
.. seealso::
- [incr tsdb()] wiki: https://github.com/delph-in/docs/wiki/ItsdbTop
- :mod:`delphin.itsdb` module
- :mod:`delphin.tsdb` module, for a low-level API
- :doc:`itsdb`
TSQL Queries
------------
Partial support of the Test Suite Query Language (TSQL) allows for
easy selection of [incr tsdb()] test suite data.
>>> from delphin import tsql
>>> selection = tsql.select('i-id i-input where i-length > 5 && readings > 0', ts)
>>> next(iter(selection))
(61, 'Abrams handed the cigarette to Browne.')
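To show what such a query expresses, here is a plain-Python equivalent over toy data. The rows are made up for illustration and are not taken from the `mrs` test suite. I am assuming that `i-length` comes from the `item` table and `readings` from the `parse` table, joined on `i-id`. This is a sketch of the query's meaning, not of how :mod:`delphin.tsql` evaluates it.

```python
# Toy rows standing in for the 'item' and 'parse' tables (made-up values).
items = [
    {'i-id': 1, 'i-input': 'It rained.', 'i-length': 2},
    {'i-id': 2, 'i-input': 'Abrams handed the cigarette to Browne.', 'i-length': 6},
    {'i-id': 3, 'i-input': 'Kim did not think that Sandy left early.', 'i-length': 8},
]
parses = [
    {'i-id': 1, 'readings': 1},
    {'i-id': 2, 'readings': 2},
    {'i-id': 3, 'readings': 0},
]

# Join on i-id, then apply: i-length > 5 && readings > 0
readings = {row['i-id']: row['readings'] for row in parses}
selection = [(row['i-id'], row['i-input'])
             for row in items
             if row['i-length'] > 5 and readings.get(row['i-id'], 0) > 0]

print(selection)
```

Only the second toy item survives: the first is too short and the third has no readings.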
.. seealso::
- TSQL documentation: http://www.delph-in.net/tsnlp/ftp/manual/volume2.ps.gz
- :mod:`delphin.tsql` module
Regular Expression Preprocessors (REPP)
---------------------------------------
PyDelphin provides a full implementation of Regular Expression
Preprocessors (REPP), including correct characterization and
loading from PET configuration files. Unique to PyDelphin (I think)
is the ability to trace through an application of the tokenization
rules.
>>> from delphin import repp
>>> r = repp.REPP.from_config('~/grammars/erg/pet/repp.set')
>>> for tok in r.tokenize("Abrams didn't chase Browne.").tokens:
...     print(tok.form, tok.lnk)
...
Abrams <0:6>
did <7:10>
n’t <10:13>
chase <14:19>
Browne <20:26>
. <26:27>
>>> for step in r.trace("Abrams didn't chase Browne."):
...     if isinstance(step, repp.REPPStep):
...         print('{}\t-> {}\t{}'.format(step.input, step.output, step.operation))
...
Abrams didn't chase Browne. -> Abrams didn't chase Browne. !^(.+)$ \1
Abrams didn't chase Browne. -> Abrams didn’t chase Browne. !' ’
Abrams didn't chase Browne. -> Abrams didn’t chase Browne. Internal group #1
Abrams didn't chase Browne. -> Abrams didn’t chase Browne. Internal group #1
Abrams didn't chase Browne. -> Abrams didn’t chase Browne. Module quotes
Abrams didn’t chase Browne. -> Abrams didn’t chase Browne. !^(.+)$ \1
Abrams didn’t chase Browne. -> Abrams didn’t chase Browne. ! +
Abrams didn’t chase Browne. -> Abrams didn’t chase Browne . !([^ ])(\.) ([])}”"’'… ]*)$ \1 \2 \3
Abrams didn’t chase Browne. -> Abrams didn’t chase Browne . Internal group #1
Abrams didn’t chase Browne. -> Abrams didn’t chase Browne . Internal group #1
Abrams didn’t chase Browne . -> Abrams did n’t chase Browne . !([^ ])([nN])[’']([tT]) \1 \2’\3
Abrams didn't chase Browne. -> Abrams did n’t chase Browne . Module tokenizer
Note that the trace shows the sequential order of rule applications,
but not the tree-like branching of REPP modules.
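The basic mechanism behind such a trace, sequential regular-expression rewrites with each change recorded, can be sketched with the standard `re` module. The three rules below are simplified versions of operations that appear in the trace above, and the `trace()` function here is a hypothetical helper, not PyDelphin's. The sketch ignores characterization (character offsets) and REPP's module and group structure.

```python
import re

# Simplified rules loosely based on the operations in the trace above.
RULES = [
    (r"'", "’"),                               # straight to curly apostrophe
    (r"([^ ])(\.)$", r"\1 \2"),                # split off a final period
    (r"([^ ])([nN])[’']([tT])", r"\1 \2’\3"),  # split off the n’t clitic
]


def trace(text):
    """Yield (input, output, pattern) for each rule that changes the text."""
    for pattern, replacement in RULES:
        result = re.sub(pattern, replacement, text)
        if result != text:
            yield text, result, pattern
        text = result


steps = list(trace("Abrams didn't chase Browne."))
for before, after, pattern in steps:
    print('{}\t-> {}\t{}'.format(before, after, pattern))
```

Each rule sees the output of the previous one, which is why the clitic rule can rely on the apostrophe already having been normalized.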
.. seealso::
- REPP wiki: https://github.com/delph-in/docs/wiki/ReppTop
- Wiki for PET's REPP configuration: https://github.com/delph-in/docs/wiki/ReppPet
- :mod:`delphin.repp` module
Type Description Language (TDL)
-------------------------------
TDL itself is a fairly simple language, but the interpretation of type
hierarchies (feature inheritance, re-entrancies, unification, and
subsumption) can be very complex. PyDelphin has partial support for
TDL: it can read nearly any kind of TDL in a DELPH-IN grammar (type
definitions, lexicons, transfer rules, etc.), but it does not interpret
what it reads. This is still useful for static code analysis.
>>> from delphin import tdl
>>> lex = {}
>>> for event, obj, lineno in tdl.iterparse('~/grammars/erg/lexicon.tdl'):
...     if event == 'TypeDefinition':
...         lex[obj.identifier] = obj
...
>>> len(lex)
40234
>>> lex['cactus_n1']
>>> lex['cactus_n1'].supertypes  # object reprs elided here and below
[...]
>>> lex['cactus_n1'].features()
[('ORTH', ...), ('SYNSEM', ...)]
>>> lex['cactus_n1']['ORTH'].features()
[('FIRST', ...), ('REST', None)]
>>> lex['cactus_n1']['ORTH'].values()
[...]
>>> lex['cactus_n1']['ORTH.FIRST']
>>> print(tdl.format(lex['cactus_n1']))
cactus_n1 := n_-_c_le &
  [ ORTH < "cactus" >,
    SYNSEM [ LKEYS.KEYREL.PRED "_cactus_n_1_rel",
             LOCAL.AGR.PNG png-irreg,
             PHON.ONSET con ] ].
.. seealso::
- A semi-formal specification of TDL: https://github.com/delph-in/docs/wiki/TdlRfc
- A grammar-engineering FAQ about TDL: https://github.com/delph-in/docs/wiki/GeFaqTdlSyntax
- :mod:`delphin.tdl` module
Semantic Interfaces (SEM-I)
---------------------------
A grammar's semantic model is encoded in the predicate inventory and
constraints of the grammar, but as the interpretation of a grammar is
non-trivial (see `Type Description Language (TDL)`_ above), using the
grammar to validate semantic representations is a significant burden. A
semantic interface (SEM-I) is a distilled and simplified representation
of a grammar's semantic model, and is thus a useful way to ensure that
grammar-external semantic representations are valid with respect to the
grammar. PyDelphin supports the reading and inspection of SEM-Is.
>>> from delphin import semi
>>> s = semi.load('~/grammars/erg/etc/erg.smi')
>>> list(s.variables)
['u', 'i', 'p', 'h', 'e', 'x']
>>> list(s.roles)
['ARG0', 'ARG1', 'ARG2', 'ARG3', 'ARG4', 'ARG', 'RSTR', 'BODY', 'CARG']
>>> s.roles['ARG2']
'u'
>>> list(s.properties)
['bool', 'tense', 'mood', 'gender', 'number', 'person', 'pt', 'sf', '+', '-', 'tensed', 'untensed', 'subjunctive', 'indicative', 'm-or-f', 'n', 'sg', 'pl', '1', '2', '3', 'refl', 'std', 'zero', 'prop-or-ques', 'comm', 'past', 'pres', 'fut', 'm', 'f', 'prop', 'ques']
>>> s.properties.children('tense')
{'untensed', 'tensed'}
>>> s.properties.descendants('tense')
{'past', 'untensed', 'tensed', 'fut', 'pres'}
>>> len(s.predicates)
23403
>>> s.predicates['_cactus_n_1']
[Synopsis([SynopsisRole(ARG0, x, {'IND': '+'}, False)])]
>>> s.predicates.descendants('some_q')
{'_what+a_q', '_some_q_indiv', '_an+additional_q', '_another_q', '_many+a_q', '_a_q', '_some_q', '_such+a_q'}
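The difference between `children()` and `descendants()` is easiest to see on a small hierarchy. The sketch below rebuilds the `tense` part of the property hierarchy as a plain dictionary. Placing `past`, `pres`, and `fut` under `tensed` is an assumption on my part (the output above only shows that they are descendants of `tense` but not its children), and the two functions are stand-ins, not the :mod:`delphin.semi` implementation.

```python
# Parent -> children for the 'tense' part of the property hierarchy.
# Assumption: 'past', 'pres', and 'fut' are placed under 'tensed'.
hierarchy = {
    'tense': {'tensed', 'untensed'},
    'tensed': {'past', 'pres', 'fut'},
}


def children(node):
    """Return the immediate subtypes of *node*."""
    return set(hierarchy.get(node, ()))


def descendants(node):
    """Return all subtypes of *node*, at any depth."""
    found = set()
    for child in children(node):
        found.add(child)
        found |= descendants(child)
    return found


print(sorted(children('tense')))
print(sorted(descendants('tense')))
```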
.. seealso::
- The SEM-I wikis:
  - https://github.com/delph-in/docs/wiki/SemiRfc
  - https://github.com/delph-in/docs/wiki/RmrsSemi
- :mod:`delphin.semi` module