Working with Semantic Structures

PyDelphin accommodates three kinds of semantic structures:

MRS is the original underspecified representation in DELPH-IN, and is the only one directly output when parsing with DELPH-IN grammars. In PyDelphin, all three implement the SemanticStructure interface, while MRS and DMRS additionally implement the ScopingSemanticStructure interface. Common properties of SemanticStructure include a notion of the top of the graph and a list of Predications. The following ASCII-diagram illustrates the class hierarchy of these representations:

+----------------------+
| delphin.lnk.LnkMixin |--------------------------+
+----------------------+                          |
  |                                               |
  |  +-----------------------------------+        |  +-----------------------------+
  +--| delphin.sembase.SemanticStructure |        +--| delphin.sembase.Predication |
     +-----------------------------------+           +-----------------------------+
       |                                               |
       |  +-----------------+                          |  +------------------+
       +--| delphin.eds.EDS |                          +--| delphin.eds.Node |
       |  +-----------------+                          |  +------------------+
       |                                               |
       |  +----------------------------------------+   |
       +--| delphin.scope.ScopingSemanticStructure |   |
          +----------------------------------------+   |
            |                                          |
            |  +-----------------+                     |  +----------------+
            +--| delphin.mrs.MRS |                     +--| delphin.mrs.EP |
            |  +-----------------+                     |  +----------------+
            |                                          |
            |  +-------------------+                   |  +-------------------+
            +--| delphin.dmrs.DMRS |                   +--| delphin.dmrs.Node |
               +-------------------+                      +-------------------+

Basic Semantic Structures

The basic SemanticStructure interface provides methods for inspecting a structure’s predications and arguments, morphosemantic properties, and quantification structure. First let’s load an MRS to play with:

>>> from delphin.codecs import simplemrs
>>> # Load MRS for "They have enough capital to build a second factory."
>>> # (Tanaka Corpus i-id=30000034)
>>> m = simplemrs.decode('''
...   [ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
...     RELS: < [ pron<0:4> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: pl IND: + PT: std ] ]
...             [ pronoun_q<0:4> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ]
...             [ _have_v_1<5:9> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 NUM: sg ] ]
...             [ _enough_q<10:16> LBL: h9 ARG0: x8 RSTR: h10 BODY: h11 ]
...             [ _capital_n_1<17:24> LBL: h12 ARG0: x8 ]
...             [ with_p<25:51> LBL: h12 ARG0: e13 [ e SF: prop ] ARG1: e14 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG2: x8 ]
...             [ _build_v_1<28:33> LBL: h12 ARG0: e14 ARG1: i15 ARG2: x16 [ x PERS: 3 NUM: sg IND: + ] ]
...             [ _a_q<34:35> LBL: h17 ARG0: x16 RSTR: h18 BODY: h19 ]
...             [ ord<36:42> LBL: h20 CARG: "2" ARG0: e22 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x16 ]
...             [ _factory_n_1<43:51> LBL: h20 ARG0: x16 ] >
...     HCONS: < h0 qeq h1 h6 qeq h4 h10 qeq h12 h18 qeq h20 >
...     ICONS: < > ]''')

Then the basic structure can be inspected as follows:

>>> m.top
'h0'
>>> len(m.predications)
10

These two attributes are the only two described by the SemanticStructure interface and subclasses then define additional data structures. For instance, MRS has several additional attributes:

>>> m.index
'e2'
>>> len(m.rels)  # m.rels is equivalent to m.predications
10
>>> len(m.hcons)
4
>>> len(m.icons)
0
>>> list(m.variables)
['e2', 'x3', 'h6', 'h7', 'x8', 'h10', 'h11', 'e13', 'e14', 'i15', 'x16', 'h18', 'h19', 'e22', 'h0', 'h1', 'h4', 'h12', 'h20', 'h5', 'h9', 'h17']

The basic interface for predications is defined by the Predication class:

>>> p = m.predications[2]  # for MRS, same as m.rels[2]
>>> p.id  # see note below
'e2'
>>> p.predicate
'_have_v_1'
>>> p.type
'e'

Note that while EDS and DMRS have unique ids for each node, MRS does not formally guarantee unique ids for each of its Elementary Predications, but PyDelphin creates one for each EP in an MRS. These ids are used for some methods on SemanticStructure instances, as exemplified in a later example.

For MRS, the EP subclass is used for predications, defining some additional attributes:

>>> p.label
'h1'
>>> p.iv  # intrinsic variable
'e2'
>>> p.args
{'ARG0': 'e2', 'ARG1': 'x3', 'ARG2': 'x8'}

SemanticStructure also defines methods for getting at information that may be implemented differently by subclasses. For instance, MRS and EDS define arguments (or edges) on their respective Predication objects, while DMRS lists them separately as links, but the SemanticStructure.arguments method works for all representations, and returns a dictionary mapping predication ids to lists of role-argument pairs for all outgoing arguments (MRS has ARG0 intrinsic arguments and CARG constant arguments which are not represented as arguments in EDS and DMRS, so these are accessed separately).

>>> for id, args in m.arguments().items():
...     print(id, args)
...
x3 []
q3 [('RSTR', 'h6'), ('BODY', 'h7')]
e2 [('ARG1', 'x3'), ('ARG2', 'x8')]
q8 [('RSTR', 'h10'), ('BODY', 'h11')]
x8 []
e13 [('ARG1', 'e14'), ('ARG2', 'x8')]
e14 [('ARG1', 'i15'), ('ARG2', 'x16')]
q16 [('RSTR', 'h18'), ('BODY', 'h19')]
e22 [('ARG1', 'x16')]
x16 []

Testing for and listing quantifiers also happens at the semantic structure level as it is more reliable than testing individual predications:

>>> m.is_quantifier('x3')
False
>>> m.is_quantifier('q3')  # use id, not intrinsic variable
True
>>> for p, q in m.quantification_pairs():
...     if q is None:  # unquantified predication
...         print('{}:{} (none)'.format(p.id, p.predicate))
...     else:
...         print('{}:{} ({}:{})'.format(p.id, p.predicate, q.id, q.predicate))
...
x3:pron (q3:pronoun_q)
e2:_have_v_1 (none)
x8:_capital_n_1 (q8:_enough_q)
e13:with_p (none)
e14:_build_v_1 (none)
e22:ord (none)
x16:_factory_n_1 (q16:_a_q)

Morphosemantic properties can be retrieved by a predication’s id:

>>> p = m.predications[2]
>>> m.properties(p.id)
{'SF': 'prop', 'TENSE': 'pres', 'MOOD': 'indicative', 'PROG': '-', 'PERF': '-'}

In MRS, they are also available via the variables attribute with the intrinsic variable of an EP:

>>> m.variables[p.iv]
{'SF': 'prop', 'TENSE': 'pres', 'MOOD': 'indicative', 'PROG': '-', 'PERF': '-'}

EDS and DMRS objects also implement the same attributes and methods (with their own relevant additions).

>>> from delphin import eds
>>> e = eds.from_mrs(m)
>>> len(e.predications) == len(e.nodes)
True
>>> e.nodes[2].predicate
'_have_v_1'
>>> for id, args in e.arguments().items():
...     print(id, args)
x3 []
_1 [('BV', 'x3')]
e2 [('ARG1', 'x3'), ('ARG2', 'x8')]
_2 [('BV', 'x8')]
x8 []
e13 [('ARG1', 'e14'), ('ARG2', 'x8')]
e14 [('ARG2', 'x16')]
_3 [('BV', 'x16')]
e22 [('ARG1', 'x16')]
x16 []

Note that there may be some differences in identifier forms or special role names (BV above for quantifiers).

Scoping Semantic Structures

MRS and DMRS are scoping semantic representations, meaning they encode the quantifier scope, although they do so rather differently. The ScopingSemanticStructure class normalizes an interface to the scoping information via some additional methods, such as for inspecting the labeled scopes:

>>> top, scopes = m.scopes()
>>> top  # the label of the top scope, not the top handle (MRS.top)
'h1'
>>> for label, predications in scopes.items():
...     print(label, [p.predicate for p in predications])
...
h4 ['pron']
h5 ['pronoun_q']
h1 ['_have_v_1']
h9 ['_enough_q']
h12 ['_capital_n_1', 'with_p', '_build_v_1']
h17 ['_a_q']
h20 ['ord', '_factory_n_1']

The scopal argument structure is also available:

>>> for id, args in m.scopal_arguments().items():
...     print(id, args)
...
x3 []
q3 [('RSTR', 'qeq', 'h4')]
e2 []
q8 [('RSTR', 'qeq', 'h12')]
x8 []
e13 []
e14 []
q16 [('RSTR', 'qeq', 'h20')]
e22 []
x16 []

Note that unlike arguments(), these return triples whose second member is the scopal relationship between the id and the scope label.

DMRS works similarly:

>>> from delphin import dmrs
>>> d = dmrs.from_mrs(m)
>>> top, scopes = d.scopes()
>>> top
'h2'
>>> for label, predications in scopes.items():
...     print(label, [p.predicate for p in predications])
...
h0 ['pron']
h1 ['pronoun_q']
h2 ['_have_v_1']
h3 ['_enough_q']
h6 ['_build_v_1', '_capital_n_1', 'with_p']
h7 ['_a_q']
h9 ['_factory_n_1', 'ord']

Because DMRS does not natively have scope labels, they are generated by DMRS.scopes. It is thus recommended to pass these generated scopes to other methods rather than generating them over again, both for computational efficiency and consistency:

>>> for id, args in d.scopal_arguments(scopes=scopes).items():
...     print(id, args)
...
10000 []
10001 [('RSTR', 'qeq', 'h8')]
10002 []
10003 [('RSTR', 'qeq', 'h8')]
10004 []
10005 []
10006 []
10007 [('RSTR', 'qeq', 'h8')]
10008 []
10009 []

Well-formed Structures

While it is possible to manipulate and create MRS, EDS, and DMRS objects, there is no guarantee that these actions result in a well-formed semantic structure. Well-formedness is crucial for certain operations, such as realizing sentences with a grammar or converting between representations. The delphin.mrs module has a number of functions for testing various facets of well-formedness:

>>> mrs.is_connected(m)
True
>>> mrs.has_intrinsic_variable_property(m)
True
>>> mrs.plausibly_scopes(m)
True
>>> mrs.is_well_formed(m)
True