delphin.tdl

Classes and functions for parsing and inspecting TDL.

This module makes it easy to inspect what is written in Type Description Language (TDL) definitions, but it does not interpret type hierarchies (for example, by performing unification, computing subsumption, or creating GLB types). That is, it would not be useful for building a parser, but it is useful if you want to statically inspect the types in a grammar and the constraints they apply.

TDL was originally described in Krieger and Schäfer, 1994 [KS1994], but that paper describes many features not used by the DELPH-IN variant, such as disjunction. Copestake, 2002 [COP2002] better describes the subset used by DELPH-IN. It has become outdated and its TDL syntax description is inaccurate in places, but it is still a great resource for understanding the interpretation of TDL grammar descriptions. The TdlRfc page of the DELPH-IN Wiki contains the most up-to-date description of the TDL syntax used by DELPH-IN grammars, including features such as documentation strings and regular expressions.

[KS1994] Hans-Ulrich Krieger and Ulrich Schäfer. TDL: a type description language for constraint-based grammars. In Proceedings of the 15th Conference on Computational Linguistics, volume 2, pages 893–899. Association for Computational Linguistics, 1994.
[COP2002] Ann Copestake. Implementing typed feature structure grammars, volume 110. CSLI Publications, Stanford, 2002.

Module Parameters

Some aspects of TDL parsing can be customized per grammar, and the following module variables may be reassigned to accommodate those differences. For instance, in the ERG, the type used for list feature structures is *list*, while for Matrix-based grammars it is list. PyDelphin defaults to the values used by the ERG.

delphin.tdl.LIST_TYPE = '*list*'

type of lists in TDL

delphin.tdl.EMPTY_LIST_TYPE = '*null*'

type of list terminators

delphin.tdl.LIST_HEAD = 'FIRST'

feature for list items

delphin.tdl.LIST_TAIL = 'REST'

feature for list tails

delphin.tdl.DIFF_LIST_LIST = 'LIST'

feature for diff-list lists

delphin.tdl.DIFF_LIST_LAST = 'LAST'

feature for the last path in a diff-list
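
Since these are module-level variables, a reassignment stays in effect until it is changed back. The following is a minimal sketch of one way to scope such a change. The helper `overridden` is hypothetical (it is not part of PyDelphin), and a stand-in namespace is used in place of the real module so the block runs without PyDelphin installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def overridden(module, **settings):
    """Temporarily reassign attributes of *module*, restoring them on exit."""
    saved = {name: getattr(module, name) for name in settings}
    try:
        for name, value in settings.items():
            setattr(module, name, value)
        yield module
    finally:
        for name, value in saved.items():
            setattr(module, name, value)


# Stand-in for delphin.tdl (an assumption made for this sketch only).
tdl_standin = SimpleNamespace(LIST_TYPE='*list*', EMPTY_LIST_TYPE='*null*')

# 'list' is the list type used by Matrix-based grammars.
with overridden(tdl_standin, LIST_TYPE='list'):
    inside = tdl_standin.LIST_TYPE
after = tdl_standin.LIST_TYPE
```

Inside the `with` block the stand-in reports the Matrix-style value; afterwards the ERG default is back in place.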

Functions

delphin.tdl.iterparse(source, encoding='utf-8')[source]

Parse the TDL file source and iteratively yield parse events.

If source is a filename, the file is opened, and it is closed when the generator has finished; otherwise source is an open file object and will not be closed when the generator has finished.

Parse events are (event, object, lineno) tuples, where event is a string (“TypeDefinition”, “TypeAddendum”, “LexicalRuleDefinition”, “LetterSet”, “WildCard”, “LineComment”, or “BlockComment”), object is the interpreted TDL object, and lineno is the line number where the entity began in source.

Parameters:
  • source (str, file) – a filename or open file object
  • encoding (str) – the encoding of the file (default: “utf-8”; ignored if source is an open file)
Yields:

(event, object, lineno) tuples

Example

>>> lex = {}
>>> for event, obj, lineno in tdl.iterparse('erg/lexicon.tdl'):
...     if event == 'TypeDefinition':
...         lex[obj.identifier] = obj
...
>>> lex['eucalyptus_n1']['SYNSEM.LKEYS.KEYREL.PRED']
<String object (_eucalyptus_n_1_rel) at 140625748595960>
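
Because each parse event is a plain (event, object, lineno) tuple, ordinary Python tools can summarize a file. Here is a small sketch, assuming a hypothetical helper `count_events` and hand-written stand-in events in which the TDL objects are replaced by strings, so the block runs without a grammar file.

```python
from collections import Counter


def count_events(events):
    """Count parse events by their event-name string."""
    return Counter(event for event, _obj, _lineno in events)


# Stand-in events shaped like the tuples described above.
sample_events = [
    ('LineComment', '; nouns', 1),
    ('TypeDefinition', 'eucalyptus_n1', 2),
    ('TypeDefinition', 'dog_n1', 8),
]
counts = count_events(sample_events)
```

The helper accepts any iterable of such tuples, so the generator returned by iterparse() could presumably be passed to it directly.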
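
delphin.tdl.format(obj, indent=0)[source]

Serialize TDL objects to strings.

Parameters:
  • obj – the TDL object to serialize
  • indent (int) – number of spaces to indent the formatted object (default: 0)
Returns: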

str – serialized form of obj

Example

>>> conj = tdl.Conjunction([
...     tdl.TypeIdentifier('lex-item'),
...     tdl.AVM([('SYNSEM.LOCAL.CAT.HEAD.MOD',
...               tdl.ConsList(end=tdl.EMPTY_LIST_TYPE))])
... ])
>>> t = tdl.TypeDefinition('non-mod-lex-item', conj)
>>> print(tdl.format(t))
non-mod-lex-item := lex-item &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < > ].
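
To make the shape of the serialized text concrete, here is a rough sketch of a serializer for a simplified type definition. The function `format_typedef` and its plain-string arguments are assumptions for illustration only. It is not how delphin.tdl.format() is implemented, and the real function's layout choices may differ for larger definitions.

```python
def format_typedef(identifier, supertypes, featvals, indent=0):
    """Serialize a simplified type definition to TDL-like text.

    supertypes is a list of type names and featvals is a list of
    (path, value) pairs whose values are already TDL strings.
    """
    terms = list(supertypes)
    if featvals:
        attrs = ',\n    '.join(
            '{} {}'.format(path, value) for path, value in featvals)
        terms.append('[ {} ]'.format(attrs))
    # Conjoined terms are separated by '&', one term per line.
    text = '{} := {}.'.format(identifier, ' &\n  '.join(terms))
    pad = ' ' * indent
    return '\n'.join(pad + line for line in text.splitlines())


serialized = format_typedef(
    'non-mod-lex-item',
    ['lex-item'],
    [('SYNSEM.LOCAL.CAT.HEAD.MOD', '< >')],
)
```

For this input the sketch produces the same two lines as the example above.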

Classes

The TDL entity classes are the objects returned by iterparse(), but they may also be used directly to build TDL structures, e.g., for serialization.

Terms

class delphin.tdl.Term(docstring=None)[source]

Base class for the terms of a TDL conjunction.

All terms are defined to handle the binary ‘&’ operator, which puts both into a Conjunction:

>>> TypeIdentifier('a') & TypeIdentifier('b')
<Conjunction object at 140008950372168>
Parameters:docstring (str) – documentation string
docstring

documentation string

Type:str
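
The way the ‘&’ operator collects terms can be illustrated with two stand-in classes. `SketchTerm` and `SketchConjunction` are hypothetical names used only here. The sketch assumes that combining with an existing conjunction flattens it rather than nesting it, in line with the description of Conjunction.add().

```python
class SketchTerm:
    """Stand-in for a TDL term; '&' combines terms into a conjunction."""

    def __init__(self, name):
        self.name = name

    def __and__(self, other):
        conj = SketchConjunction()
        conj.add(self)
        conj.add(other)
        return conj


class SketchConjunction:
    """Stand-in for a conjunction: an ordered list of terms."""

    def __init__(self, terms=None):
        self.terms = []
        for term in terms or []:
            self.add(term)

    def add(self, term):
        if isinstance(term, SketchConjunction):
            self.terms.extend(term.terms)  # flatten nested conjunctions
        elif isinstance(term, SketchTerm):
            self.terms.append(term)
        else:
            raise TypeError('not a term or conjunction: {!r}'.format(term))

    def __and__(self, other):
        conj = SketchConjunction(self.terms)
        conj.add(other)
        return conj


abc = SketchTerm('a') & SketchTerm('b') & SketchTerm('c')
names = [term.name for term in abc.terms]
```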
class delphin.tdl.TypeTerm(string, docstring=None)[source]

Bases: delphin.tdl.Term, str

Base class for type terms (identifiers, strings and regexes).

This subclass of Term also inherits from str and forms the superclass of the string-based terms TypeIdentifier, String, and Regex. Its purpose is to handle the correct instantiation of both the Term and str supertypes and to define equality comparisons such that different kinds of type terms with the same string value are not considered equal:

>>> String('a') == String('a')
True
>>> String('a') == TypeIdentifier('a')
False
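
One way to get this kind of type-sensitive equality from a str subclass is sketched below. The class names are hypothetical and this is not necessarily how TypeTerm is implemented. Note that a str subclass overriding `__eq__` must also restore `__hash__` and override `__ne__`, because str defines its own `__ne__`.

```python
class SketchTypeTerm(str):
    """A str subclass whose instances only equal same-kind terms."""

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        return str.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ would otherwise make the class unhashable.
    __hash__ = str.__hash__


class SketchString(SketchTypeTerm):
    pass


class SketchIdentifier(SketchTypeTerm):
    pass


same_kind = SketchString('a') == SketchString('a')
different_kind = SketchString('a') == SketchIdentifier('a')
```

When both operands decline the comparison by returning NotImplemented, Python falls back to identity, so terms of different kinds compare unequal.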
class delphin.tdl.TypeIdentifier(string, docstring=None)[source]

Bases: delphin.tdl.TypeTerm

Type identifiers, or type names.

Unlike other TypeTerms, TypeIdentifiers use case-insensitive comparisons:

>>> TypeIdentifier('MY-TYPE') == TypeIdentifier('my-type')
True
Parameters:
  • string (str) – type name
  • docstring (str) – documentation string
docstring

documentation string

Type:str
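
A case-insensitive comparison like the one shown for TypeIdentifier can be sketched as follows. `SketchCaselessName` is a hypothetical class, not PyDelphin's implementation. The sketch also hashes the lowercased form so that names differing only in case land in the same set or dict slot.

```python
class SketchCaselessName(str):
    """A str subclass compared and hashed without regard to case."""

    def __eq__(self, other):
        if not isinstance(other, SketchCaselessName):
            return NotImplemented
        return self.lower() == other.lower()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must have equal hashes.
        return hash(self.lower())


equal = SketchCaselessName('MY-TYPE') == SketchCaselessName('my-type')
unique = {SketchCaselessName('MY-TYPE'), SketchCaselessName('my-type')}
```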
class delphin.tdl.String(string, docstring=None)[source]

Bases: delphin.tdl.TypeTerm

Double-quoted strings.

Parameters:
  • string (str) – type name
  • docstring (str) – documentation string
docstring

documentation string

Type:str
class delphin.tdl.Regex(string, docstring=None)[source]

Bases: delphin.tdl.TypeTerm

Regular expression patterns.

Parameters:
  • string (str) – type name
  • docstring (str) – documentation string
docstring

documentation string

Type:str
class delphin.tdl.AVM(featvals=None, docstring=None)[source]

Bases: delphin.tfs.FeatureStructure, delphin.tdl.Term

A feature structure as used in TDL.

Parameters:
  • featvals (list, dict) – a sequence of (attribute, value) pairs or an attribute to value mapping
  • docstring (str) – documentation string
docstring

documentation string

Type:str
features(expand=False)[source]

Return the list of tuples of feature paths and feature values.

Parameters:expand (bool) – if True, expand all feature paths

Example

>>> avm = AVM([('A.B', TypeIdentifier('1')),
...            ('A.C', TypeIdentifier('2'))])
>>> avm.features()
[('A', <AVM object at ...>)]
>>> avm.features(expand=True)
[('A.B', <TypeIdentifier object (1) at ...>),
 ('A.C', <TypeIdentifier object (2) at ...>)]
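
The difference between the two calls can be pictured with nested dictionaries standing in for AVMs. The function `expand_features` below is a hypothetical sketch of the expansion idea, not the library's code.

```python
def expand_features(avm, prefix=''):
    """Flatten a nested dict into (dotted path, value) pairs."""
    pairs = []
    for attr, value in avm.items():
        path = prefix + attr
        if isinstance(value, dict):
            # Descend into the sub-structure, extending the path.
            pairs.extend(expand_features(value, path + '.'))
        else:
            pairs.append((path, value))
    return pairs


nested = {'A': {'B': '1', 'C': '2'}}
top_level = list(nested.items())      # analogous to features()
expanded = expand_features(nested)    # analogous to features(expand=True)
```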
normalize()[source]

Reduce trivial AVM conjunctions to just the AVM.

For example, in [ ATTR1 [ ATTR2 val ] ] the value of ATTR1 could be a conjunction with the sub-AVM [ ATTR2 val ]. This method removes the conjunction so the sub-AVM nests directly (equivalent to [ ATTR1.ATTR2 val ] in TDL).

class delphin.tdl.ConsList(values=None, end='*list*', docstring=None)[source]

Bases: delphin.tdl.AVM

AVM subclass for cons-lists (< ... >)

This provides a more intuitive interface for creating and accessing the values of list structures in TDL. Some combinations of the values and end parameters correspond to various TDL forms as described in the table below:

TDL form      values    end               state
< >           None      EMPTY_LIST_TYPE   closed
< ... >       None      LIST_TYPE         open
< a >         [a]       EMPTY_LIST_TYPE   closed
< a, b >      [a, b]    EMPTY_LIST_TYPE   closed
< a, ... >    [a]       LIST_TYPE         open
< a . b >     [a]       b                 closed
Parameters:
  • values (list) – a sequence of Conjunction or Term objects to be placed in the AVM of the list.
  • end (str, Conjunction, Term) – last item in the list (default: LIST_TYPE) which determines if the list is open or closed
  • docstring (str) – documentation string
terminated

if False, the list can be further extended by following the LIST_TAIL features.

Type:bool
docstring

documentation string

Type:str
append(value)[source]

Append an item to the end of an open ConsList.

Parameters:value (Conjunction, Term) – item to add
Raises:TdlError – when appending to a closed list
terminate(end)[source]

Set the value of the tail of the list.

Adding values via append() places them on the FIRST feature of some level of the feature structure (e.g., REST.FIRST), while terminate() places them on the final REST feature (e.g., REST.REST). If end is a Conjunction or Term, it is typically a Coreference; otherwise end is set to tdl.EMPTY_LIST_TYPE or tdl.LIST_TYPE. This method does not necessarily close the list: if end is tdl.LIST_TYPE, the list is left open; otherwise it is closed.

Parameters:
  • end (str, Conjunction, Term) – value to use as the end of the list.
values()[source]

Return the list of values in the ConsList feature structure.
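
The relationship between list items, the FIRST and REST features, and the open or closed state can be sketched as a plain function. This is a conceptual expansion only: `cons_list_paths` is a hypothetical helper, the constants are copied from the module parameters above, and ConsList itself may store its tail differently.

```python
LIST_TYPE = '*list*'
EMPTY_LIST_TYPE = '*null*'
LIST_HEAD = 'FIRST'
LIST_TAIL = 'REST'


def cons_list_paths(values, end=LIST_TYPE):
    """Return ([(path, value), ...], closed) for a cons-list."""
    pairs = []
    tails = []  # REST features leading to the current list cell
    for value in values:
        pairs.append(('.'.join(tails + [LIST_HEAD]), value))
        tails.append(LIST_TAIL)
    if tails:
        pairs.append(('.'.join(tails), end))  # value of the final REST
    closed = end != LIST_TYPE  # only LIST_TYPE leaves the list open
    return pairs, closed


closed_list = cons_list_paths(['a', 'b'], end=EMPTY_LIST_TYPE)
open_list = cons_list_paths(['a'])
```

Each appended item sits on a FIRST feature one REST deeper than the previous one, and the end value occupies the final REST.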

class delphin.tdl.DiffList(values=None, docstring=None)[source]

Bases: delphin.tdl.AVM

AVM subclass for diff-lists (<! ... !>)

As with ConsList, this provides a more intuitive interface for creating and accessing the values of list structures in TDL. Unlike ConsList, DiffLists are always closed lists with the last item coreferenced with the LAST feature, which allows for the joining of two diff-lists.

Parameters:
  • values (list) – a sequence of Conjunction or Term objects to be placed in the AVM of the list
  • docstring (str) – documentation string
last

the feature path to the list position coreferenced by the value of the DIFF_LIST_LAST feature.

Type:str
docstring

documentation string

Type:str
values()[source]

Return the list of values in the DiffList feature structure.
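
A diff-list can be pictured as a cons-list under the LIST feature whose final position shares a coreference with LAST. The sketch below is an illustration under that reading. `diff_list_paths` and the tag name `#last` are hypothetical, and the constants are copied from the module parameters above.

```python
LIST_HEAD = 'FIRST'
LIST_TAIL = 'REST'
DIFF_LIST_LIST = 'LIST'
DIFF_LIST_LAST = 'LAST'


def diff_list_paths(values, tag='#last'):
    """Return ([(path, value), ...], last) for a diff-list.

    last is the path of the list position that shares its
    coreference tag with the LAST feature.
    """
    cells = [DIFF_LIST_LIST]
    pairs = []
    for value in values:
        pairs.append(('.'.join(cells + [LIST_HEAD]), value))
        cells.append(LIST_TAIL)
    last = '.'.join(cells)
    pairs.append((last, tag))            # end of the list ...
    pairs.append((DIFF_LIST_LAST, tag))  # ... is coreferenced with LAST
    return pairs, last


pairs, last = diff_list_paths(['a', 'b'])
```

Because the end of the list is a coreference rather than a terminator, a second diff-list can be joined at that position.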

class delphin.tdl.Coreference(identifier, docstring=None)[source]

Bases: delphin.tdl.Term

TDL coreferences, which represent re-entrancies in AVMs.

Parameters:
  • identifier (str) – identifier or tag associated with the coreference; for internal use (e.g., in DiffList objects), the identifier may be None
  • docstring (str) – documentation string
identifier

coreference identifier or tag

Type:str
docstring

documentation string

Type:str

Conjunctions

class delphin.tdl.Conjunction(terms=None)[source]

Conjunction of TDL terms.

Parameters:terms (list) – sequence of Term objects
add(term)[source]

Add a term to the conjunction.

Parameters:term (Term, Conjunction) – term to add; if a Conjunction, all of its terms are added to the current conjunction.
Raises:TypeError – when term is an invalid type
features(expand=False)[source]

Return the list of feature-value pairs in the conjunction.

get(key, default=None)[source]

Get the value of attribute key in any AVM in the conjunction.

Parameters:
  • key – attribute path to search
  • default – value to return if key is not defined on any AVM
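
The lookup described here amounts to trying a dotted path against each AVM in turn. A minimal sketch follows, with nested dictionaries standing in for AVMs and a hypothetical function name `get_path`. Returning the first match is an assumption of the sketch.

```python
def get_path(avms, key, default=None):
    """Return the value at dotted path *key* in the first AVM defining it."""
    for avm in avms:
        value = avm
        for attr in key.split('.'):
            if not isinstance(value, dict) or attr not in value:
                break  # this AVM does not define the full path
            value = value[attr]
        else:
            return value
    return default


avms = [
    {'ORTH': 'dog'},
    {'SYNSEM': {'LKEYS': {'KEYREL': {'PRED': '_dog_n_1_rel'}}}},
]
pred = get_path(avms, 'SYNSEM.LKEYS.KEYREL.PRED')
missing = get_path(avms, 'SYNSEM.LOCAL', default='n/a')
```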
normalize()[source]

Rearrange the conjunction to a conventional form.

This puts any coreference(s) first, followed by type terms, then followed by AVM(s) (including lists). AVMs are normalized via AVM.normalize().
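
The ordering can be sketched with a stable sort over term kinds. Here terms are represented as hypothetical (kind, text) tuples rather than real Term objects, and `normalize_order` is an illustrative name.

```python
def normalize_order(terms):
    """Order terms as coreferences, then type terms, then AVMs."""
    rank = {'coreference': 0, 'type': 1, 'avm': 2}
    # sorted() is stable, so terms of the same kind keep their order.
    return sorted(terms, key=lambda term: rank[term[0]])


terms = [
    ('avm', '[ A b ]'),
    ('type', 'lex-item'),
    ('coreference', '#x'),
    ('type', 'noun'),
]
ordered = normalize_order(terms)
```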

string()[source]

Return the first string term in the conjunction, or None.

terms

The list of terms in the conjunction.

types()[source]

Return the list of type terms in the conjunction.

Type and Instance Definitions

class delphin.tdl.TypeDefinition(identifier, conjunction, docstring=None)[source]

A top-level Conjunction with an identifier.

Parameters:
  • identifier (str) – type name
  • conjunction (Conjunction, Term) – type constraints
  • docstring (str) – documentation string
identifier

type identifier

Type:str
conjunction

type constraints

Type:Conjunction
docstring

documentation string

Type:str
documentation(level='first')[source]

Return the documentation of the type.

By default, this is the first docstring on a top-level term. By setting level to “top”, the list of all docstrings on top-level terms is returned, including the type’s docstring value, if not None, as the last item. The docstring for the type itself is available via TypeDefinition.docstring.

Parameters:level (str) – “first” or “top”
Returns:a single docstring or a list of docstrings
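
The two levels can be sketched as a function over plain strings. The standalone `documentation` function below is hypothetical. It also assumes that, for level “first”, the type's own docstring is used when no top-level term has one, which the description above does not state.

```python
def documentation(term_docstrings, type_docstring=None, level='first'):
    """Select docstrings from top-level terms and the type itself."""
    docs = [doc for doc in term_docstrings if doc is not None]
    if type_docstring is not None:
        docs.append(type_docstring)  # the type's docstring comes last
    if level == 'first':
        return docs[0] if docs else None
    if level == 'top':
        return docs
    raise ValueError('level must be "first" or "top"')


first = documentation(['doc A', None, 'doc B'], 'type doc')
everything = documentation(['doc A', None, 'doc B'], 'type doc', level='top')
```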
features(expand=False)[source]

Return the list of feature-value pairs in the conjunction.

supertypes

The list of supertypes for the type.

class delphin.tdl.TypeAddendum(identifier, conjunction=None, docstring=None)[source]

Bases: delphin.tdl.TypeDefinition

An addendum to an existing type definition.

Type addenda, unlike type definitions, do not require supertypes, or even any feature constraints. An addendum, however, must have at least one supertype, AVM, or docstring.

Parameters:
  • identifier (str) – type name
  • conjunction (Conjunction, Term) – type constraints
  • docstring (str) – documentation string
identifier

type identifier

Type:str
conjunction

type constraints

Type:Conjunction
docstring

documentation string

Type:str
class delphin.tdl.LexicalRuleDefinition(identifier, affix_type, patterns, conjunction, **kwargs)[source]

Bases: delphin.tdl.TypeDefinition

An inflecting lexical rule definition.

Parameters:
  • identifier (str) – type name
  • affix_type (str) – “prefix” or “suffix”
  • patterns (list) – sequence of (match, replacement) pairs
  • conjunction (Conjunction, Term) – conjunction of constraints applied by the rule
  • docstring (str) – documentation string
identifier

type identifier

Type:str
affix_type

“prefix” or “suffix”

Type:str
patterns

sequence of (match, replacement) pairs

Type:list
conjunction

type constraints

Type:Conjunction
docstring

documentation string

Type:str

Morphological Patterns

class delphin.tdl.LetterSet(var, characters)[source]

A capturing character class for inflectional lexical rules.

LetterSets define a pattern (e.g., “!a”) that may match any one of its associated characters. Unlike WildCard patterns, LetterSet variables also appear in the replacement pattern of an affixing rule, where they insert the character matched by the corresponding letter set.

Parameters:
  • var (str) – variable used in affixing rules (e.g., “!a”)
  • characters (str) – string or collection of characters that may match an input character
var

letter-set variable

Type:str
characters

characters included in the letter-set

Type:str
class delphin.tdl.WildCard(var, characters)[source]

A non-capturing character class for inflectional lexical rules.

WildCards define a pattern (e.g., “?a”) that may match any one of its associated characters. Unlike LetterSet patterns, WildCard variables may not appear in the replacement pattern of an affixing rule.

Parameters:
  • var (str) – variable used in affixing rules (e.g., “?a”)
  • characters (str) – string or collection of characters that may match an input character
var

wild-card variable

Type:str
characters

characters included in the wild-card

Type:str
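
The capturing versus non-capturing distinction maps naturally onto regular expressions. The sketch below applies one (match, replacement) suffix pattern. `suffix_rule` is a hypothetical helper, the letter set in the example is made up, and the sketch assumes each letter-set variable occurs at most once in the match pattern. It is not PyDelphin's or any processor's actual rule application.

```python
import re

# '!x' or '?x' is a two-character variable; anything else is a literal.
TOKEN = re.compile(r'([!?].)|(.)', re.DOTALL)


def suffix_rule(match, replacement, letter_sets, wild_cards):
    """Build a function applying one (match, replacement) suffix pattern."""
    groups = {}  # letter-set variable -> regex group number
    regex = []
    for var, literal in TOKEN.findall(match):
        if var.startswith('!'):
            groups[var] = len(groups) + 1
            regex.append('([{}])'.format(re.escape(letter_sets[var])))
        elif var.startswith('?'):
            # Wild-cards match a character but do not capture it.
            regex.append('[{}]'.format(re.escape(wild_cards[var])))
        else:
            regex.append(re.escape(literal))
    pattern = re.compile(''.join(regex) + r'\Z')

    template = TOKEN.findall(replacement)
    if any(var.startswith('?') for var, _literal in template):
        raise ValueError('wild-cards may not appear in a replacement')

    def apply(word):
        m = pattern.search(word)
        if m is None:
            return None  # the rule does not apply to this word
        suffix = ''.join(
            m.group(groups[var]) if var else literal
            for var, literal in template)
        return word[:m.start()] + suffix

    return apply


# Made-up letter set: consonants that double before '-ed'.
double_ed = suffix_rule('!t', '!t!ted', {'!t': 'bdgkmnprt'}, {})
stopped = double_ed('stop')
no_match = double_ed('see')
```

The letter-set variable reinserts the matched consonant twice in the replacement, while a wild-card variable in the replacement is rejected when the rule is built.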

Deprecated

Use of the following functions and classes is no longer recommended, and they will be removed in a future version.

delphin.tdl.parse(f, encoding='utf-8')[source]

Parse the TDL file f and yield the interpreted contents.

If f is a filename, the file is opened, and it is closed when the generator has finished; otherwise f is an open file object and will not be closed when the generator has finished.

Parameters:
  • f (str, file) – a filename or open file object
  • encoding (str) – the encoding of the file (default: “utf-8”; ignored if f is an open file)
delphin.tdl.lex(stream)[source]
delphin.tdl.tokenize(s)[source]

Tokenize a string s of TDL code.

class delphin.tdl.TdlDefinition(supertypes=None, featvals=None)[source]

Bases: delphin.tfs.FeatureStructure

A typed feature structure with supertypes.

A TdlDefinition is like a FeatureStructure but each structure may have a list of supertypes.

local_constraints()[source]

Return the constraints defined in the local AVM.

class delphin.tdl.TdlConsList(supertypes=None, featvals=None)[source]

Bases: delphin.tdl.TdlDefinition

A TdlDefinition for cons-lists (< ... >)

Navigating the feature structure for lists can be cumbersome, so this subclass of TdlDefinition provides the values() method to collect the items nested inside the list and return them as a Python list.

values()[source]

Return the list of values.

class delphin.tdl.TdlDiffList(supertypes=None, featvals=None)[source]

Bases: delphin.tdl.TdlDefinition

A TdlDefinition for diff-lists (<! ... !>)

Navigating the feature structure for lists can be cumbersome, so this subclass of TdlDefinition provides the values() method to collect the items nested inside the list and return them as a Python list.

values()[source]

Return the list of values.

class delphin.tdl.TdlType(identifier, definition, coreferences=None, docstring=None)[source]

Bases: delphin.tdl.TdlDefinition

A top-level TdlDefinition with an identifier.

Parameters:
  • identifier (str) – type name
  • definition (TdlDefinition) – definition of the type
  • coreferences (list) – (tag, paths) tuple of coreferences, where paths is a list of feature paths that share the tag
  • docstring (list) – list of documentation strings
class delphin.tdl.TdlInflRule(identifier, affix=None, **kwargs)[source]

Bases: delphin.tdl.TdlType

TDL inflectional rule.

Parameters:
  • identifier (str) – type name
  • affix (str) – inflectional affixes