delphin.lnk

Surface alignment for semantic entities.

In DELPH-IN semantic representations, entities are aligned to the input surface string through so-called “lnk” (pronounced “link”) values. There are four types of lnk values, which align to the surface in different ways:

  • Character spans (also called “characterization pointers”); e.g., <0:4>

  • Token indices; e.g., <0 1 3>

  • Chart vertex spans; e.g., <0#2>

  • Edge identifiers; e.g., <@42>

The latter two are unlikely to be encountered by users. Chart vertices were used by the PET parser but are now essentially deprecated, and edge identifiers are only used internally in the LKB for generation. This documentation therefore focuses on the first two kinds.

Character spans are by far the most commonly used type, and possibly the only type most users will encounter. These spans indicate the positions between characters in the input string that correspond to a semantic entity, similar to how Python and Perl do string indexing. For example, <0:4> would capture the first through fourth characters, a span that would correspond to the first word in a sentence like “Dogs bark”. These spans assume the input is a flat, or linear, string and can only select contiguous chunks. Character spans are used by REPP (the Regular Expression PreProcessor; see delphin.repp) to track the surface alignment prior to string changes introduced by tokenization.
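Because character spans name the positions between characters, they behave like Python slice bounds. The following is a minimal illustration using only built-in string slicing; the helper name span_text is made up for this example and is not part of PyDelphin.

```python
def span_text(surface: str, cfrom: int, cto: int) -> str:
    """Return the substring of *surface* covered by the span <cfrom:cto>."""
    # Span offsets count positions between characters, so they map
    # directly onto Python's half-open slice bounds.
    return surface[cfrom:cto]


sentence = "Dogs bark"
print(span_text(sentence, 0, 4))  # Dogs
print(span_text(sentence, 5, 9))  # bark
```

Note that the end offset is exclusive, so <0:4> covers four characters and leaves out the space at position 4.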

Token indices select input tokens rather than characters. This method, though not widely used, is more suitable for input sources that are not flat strings (e.g., a lattice of automatic speech recognition (ASR) hypotheses), or where non-contiguous sequences are needed (e.g., from input containing markup or other noise).
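As a rough sketch of how token indices can pick out a non-contiguous sequence, the example below selects tokens by position from a list. It assumes that each index is a 0-based position in a token list; the tokenization and the helper name select_tokens are hypothetical and not part of PyDelphin.

```python
from typing import List, Sequence


def select_tokens(tokens: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Return the tokens picked out by a token-index lnk such as <0 2 3>."""
    # Assumption: each index is a 0-based position in the token list.
    return [tokens[i] for i in indices]


# Hypothetical tokenization of noisy input; "uh" is a filler to skip.
tokens = ["Dogs", "uh", "bark", "loudly"]
print(select_tokens(tokens, (0, 2, 3)))  # ['Dogs', 'bark', 'loudly']
```

A character span could not express this selection, since the chosen tokens are not adjacent in the input.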

Note

Much of this background comes from comments in the LKB source code; see http://svn.emmtee.net/trunk/lingo/lkb/src/mrs/lnk.lisp

Support for lnk values in PyDelphin is rather simple. The Lnk class is able to parse lnk strings and model the contents for serialization of semantic representations. In addition, semantic entities such as DMRS Nodes and MRS EPs have cfrom and cto attributes which are the start and end pointers for character spans (defaulting to -1 if a character span is not specified for the entity).
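To make the four notations concrete, here is a standalone sketch that classifies a lnk string and extracts its data using regular expressions. It is not PyDelphin's implementation: the function parse_lnk, the kind labels, and the patterns are assumptions made for illustration, and only non-negative integers are accepted.

```python
import re
from typing import Tuple, Union

LnkData = Union[int, Tuple[int, ...]]

# Hypothetical patterns, one per lnk notation described above.
_PATTERNS = [
    ("charspan", re.compile(r"<(\d+):(\d+)>")),    # e.g., <0:4>
    ("chartspan", re.compile(r"<(\d+)#(\d+)>")),   # e.g., <0#2>
    ("edge", re.compile(r"<@(\d+)>")),             # e.g., <@42>
    ("tokens", re.compile(r"<(\d+(?: \d+)*)>")),   # e.g., <0 1 3>
]


def parse_lnk(s: str) -> Tuple[str, LnkData]:
    """Return a (kind, data) pair for a lnk string, or raise ValueError."""
    for kind, pattern in _PATTERNS:
        m = pattern.fullmatch(s)
        if m is None:
            continue
        if kind == "edge":
            return kind, int(m.group(1))
        if kind == "tokens":
            return kind, tuple(int(t) for t in m.group(1).split())
        # Both span kinds carry a (start, end) pair.
        return kind, (int(m.group(1)), int(m.group(2)))
    raise ValueError(f"not a recognized lnk string: {s!r}")


print(parse_lnk("<0:4>"))    # ('charspan', (0, 4))
print(parse_lnk("<0 1 3>"))  # ('tokens', (0, 1, 3))
```

The data shapes mirror those documented for the Lnk class: a pair for the two span types, a tuple for tokens, and a single integer for an edge.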

Classes

class delphin.lnk.Lnk(arg, data=None)

Surface-alignment information for predications.

Lnk objects link predicates to the surface form in one of several ways, the most common of which is the character span of the original string.

Valid types and their associated data are shown in the table below.

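=============  ===================  =========
type           data                 example
=============  ===================  =========
Lnk.CHARSPAN   surface string span  (0, 5)
Lnk.CHARTSPAN  chart vertex span    (0, 5)
Lnk.TOKENS     token identifiers    (0, 1, 2)
Lnk.EDGE       edge identifier      1
=============  ===================  =========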

Parameters:
  • arg – Lnk type or the string representation of a Lnk

  • data – alignment data (assumes arg is a Lnk type)

type

the way the Lnk relates the semantics to the surface form

Type:

int

data

the alignment data (depends on the Lnk type)

Type:

int | Tuple[int, …]

Example

>>> Lnk('<0:5>').data
(0, 5)
>>> str(Lnk.charspan(0,5))
'<0:5>'
>>> str(Lnk.chartspan(0,5))
'<0#5>'
>>> str(Lnk.tokens([0,1,2]))
'<0 1 2>'
>>> str(Lnk.edge(1))
'<@1>'

classmethod charspan(start, end)

Create a Lnk object for a character span.

Parameters:
  • start – the initial character position (cfrom)

  • end – the final character position (cto)

classmethod chartspan(start, end)

Create a Lnk object for a chart span.

Parameters:
  • start – the initial chart vertex

  • end – the final chart vertex

classmethod default()

Create a Lnk object for when no information is given.

classmethod edge(edge)

Create a Lnk object for an edge (used internally in generation).

Parameters:

edge – an edge identifier

classmethod tokens(tokens)

Create a Lnk object for a token range.

Parameters:

tokens – a list of token identifiers

class delphin.lnk.LnkMixin(lnk=None, surface=None)

A mixin class for adding cfrom and cto properties on structures.

property cfrom

The initial character position in the surface string.

Defaults to -1 if there is no valid cfrom value.

property cto

The final character position in the surface string.

Defaults to -1 if there is no valid cto value.
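The default behavior of cfrom and cto can be pictured with a small stand-in class. This is only a sketch of the documented fallback to -1, not the LnkMixin source; the class name SpanHolder and its constructor are hypothetical.

```python
from typing import Optional, Tuple


class SpanHolder:
    """Toy stand-in for a structure carrying an optional character span."""

    def __init__(self, span: Optional[Tuple[int, int]] = None):
        self._span = span

    @property
    def cfrom(self) -> int:
        # Fall back to -1 when no character span is available.
        return self._span[0] if self._span is not None else -1

    @property
    def cto(self) -> int:
        # Same fallback for the end position.
        return self._span[1] if self._span is not None else -1


aligned = SpanHolder((0, 4))
unaligned = SpanHolder()
print(aligned.cfrom, aligned.cto)      # 0 4
print(unaligned.cfrom, unaligned.cto)  # -1 -1
```

Code that consumes these values can therefore test for -1 to detect entities without a character span.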

Exceptions

exception delphin.lnk.LnkError(*args, **kwargs)

Bases: PyDelphinException

Raised on invalid Lnk values or operations.