delphin.tdl
Type Description Language (TDL) is a declarative language for describing type systems, mainly for the creation of DELPH-IN HPSG grammars. TDL was originally described in Krieger and Schäfer, 1994 [KS1994], but that paper describes many features not in use by the DELPH-IN variant, such as disjunction. Copestake, 2002 [COP2002] better describes the subset in use by DELPH-IN, but this publication has become outdated with respect to the current usage of TDL in DELPH-IN grammars, and its TDL syntax description is inaccurate in places. It is, however, still a great resource for understanding the interpretation of TDL grammar descriptions. The TdlRfc page of the DELPH-IN Wiki contains the most up-to-date description of the TDL syntax used by DELPH-IN grammars, including features such as documentation strings and regular expressions.
Below is an example of a basic type from the English Resource Grammar (ERG):
basic_word := word_or_infl_rule & word_or_punct_rule &
  [ SYNSEM [ PHON.ONSET.--TL #tl,
             LKEYS.KEYREL [ CFROM #from,
                            CTO #to ] ],
    ORTH [ CLASS #class, FROM #from, TO #to, FORM #form ],
    TOKENS [ +LIST #tl & < [ +CLASS #class, +FROM #from, +FORM #form ], ... >,
             +LAST.+TO #to ] ].
The delphin.tdl module makes it easy to inspect what is written in
definitions in Type Description Language (TDL), but it doesn’t
interpret type hierarchies (such as by performing unification,
subsumption calculations, or creating GLB types). That is, while it
wouldn’t be useful for creating a parser, it is useful if you want to
statically inspect the types in a grammar and the constraints they
apply.
[KS1994] Hans-Ulrich Krieger and Ulrich Schäfer. TDL: a type description language for constraint-based grammars. In Proceedings of the 15th conference on Computational linguistics, volume 2, pages 893–899. Association for Computational Linguistics, 1994.
[COP2002] Ann Copestake. Implementing typed feature structure grammars, volume 110. CSLI Publications, Stanford, 2002.
Module Parameters
Some aspects of TDL parsing can be customized per grammar, and the
following module variables may be reassigned to accommodate those
differences. For instance, in the ERG, the type used for list
feature structures is *list*, while for Matrix-based grammars
it is list. PyDelphin defaults to the values used by the ERG.
- delphin.tdl.LIST_TYPE = '*list*'
- delphin.tdl.EMPTY_LIST_TYPE = '*null*'
- delphin.tdl.LIST_HEAD = 'FIRST'
- delphin.tdl.LIST_TAIL = 'REST'
- delphin.tdl.DIFF_LIST_LIST = 'LIST'
- delphin.tdl.DIFF_LIST_LAST = 'LAST'
Functions
- delphin.tdl.iterparse(path: str | Path, encoding: str = 'utf-8') -> Generator[tuple[str, str | TypeDefinition | _MorphSet | _Environment | FileInclude, int], None, None]
Parse the TDL file at path and iteratively yield parse events.
Parse events are (event, object, lineno) tuples, where event is a string ("TypeDefinition", "TypeAddendum", "LexicalRuleDefinition", "LetterSet", "WildCard", "BeginEnvironment", "EndEnvironment", "FileInclude", "LineComment", or "BlockComment"), object is the interpreted TDL object, and lineno is the line number where the entity began in path.
- Parameters:
path – path to a TDL file
encoding (str) – the encoding of the file (default: "utf-8")
- Yields:
(event, object, lineno) tuples
Example
>>> from delphin import tdl
>>> lex = {}
>>> for event, obj, lineno in tdl.iterparse("erg/lexicon.tdl"):
...     if event == "TypeDefinition":
...         lex[obj.identifier] = obj
>>> lex["eucalyptus_n1"]["SYNSEM.LKEYS.KEYREL.PRED"]
<String object (_eucalyptus_n_1_rel) at 140625748595960>
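Because parse events are plain (event, object, lineno) tuples, they can be consumed by ordinary Python code. The following is a minimal sketch of tallying events by kind; the helper name summarize_events and the sample tuples are hypothetical stand-ins so the snippet runs without a grammar on disk. In real use, the iterable would presumably be the generator returned by iterparse().

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def summarize_events(
    events: Iterable[Tuple[str, Any, int]]
) -> Tuple[Counter, Dict[str, int]]:
    """Count events by name and record the first line number of each kind.

    Hypothetical helper, not part of delphin.tdl.
    """
    counts: Counter = Counter()
    first_line: Dict[str, int] = {}
    for event, _obj, lineno in events:
        counts[event] += 1
        # setdefault keeps only the first line number seen per event kind
        first_line.setdefault(event, lineno)
    return counts, first_line


# Stand-in data shaped like parse events; the objects here are just strings.
sample_events = [
    ("LineComment", "; a comment", 1),
    ("TypeDefinition", "<type a>", 2),
    ("TypeDefinition", "<type b>", 5),
    ("TypeAddendum", "<addendum to a>", 9),
]

counts, first_line = summarize_events(sample_events)
print(counts["TypeDefinition"])    # 2
print(first_line["TypeAddendum"])  # 9
```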
- delphin.tdl.format(obj, indent=0)
Serialize TDL objects to strings.
- Parameters:
obj – instance of Term, Conjunction, or TypeDefinition classes or subclasses
indent (int) – number of spaces to indent the formatted object
- Returns:
str – serialized form of obj
Example
>>> conj = tdl.Conjunction(
...     [
...         tdl.TypeIdentifier("lex-item"),
...         tdl.AVM(
...             [
...                 (
...                     "SYNSEM.LOCAL.CAT.HEAD.MOD",
...                     tdl.ConsList(end=tdl.EMPTY_LIST_TYPE),
...                 )
...             ]
...         ),
...     ]
... )
>>> t = tdl.TypeDefinition("non-mod-lex-item", conj)
>>> print(tdl.format(t))
non-mod-lex-item := lex-item &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < > ].
Classes
The TDL entity classes are the objects returned by
iterparse(), but they may also be used directly to build TDL
structures, e.g., for serialization.
Terms
- class delphin.tdl.Term(docstring=None)
Base class for the terms of a TDL conjunction.
All terms are defined to handle the binary ‘&’ operator, which puts both into a Conjunction:
>>> TypeIdentifier("a") & TypeIdentifier("b")
<Conjunction object at 140008950372168>
- Parameters:
docstring (str) – documentation string
- class delphin.tdl.TypeIdentifier(string, docstring=None)
Bases: TypeTerm
Type identifiers, or type names.
Unlike other TypeTerms, TypeIdentifiers use case-insensitive comparisons:
>>> TypeIdentifier("MY-TYPE") == TypeIdentifier("my-type")
True
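For readers wondering what case-insensitive comparison involves, here is a small standalone sketch. The class CaselessName is hypothetical and is not how TypeIdentifier is necessarily implemented; it only shows that equality and hashing should agree on a case-folded form so that such names behave consistently in sets and dictionaries.

```python
class CaselessName:
    """A name that compares and hashes without regard to case (illustrative only)."""

    def __init__(self, string: str) -> None:
        self.string = string

    def __eq__(self, other: object) -> bool:
        if isinstance(other, CaselessName):
            return self.string.lower() == other.string.lower()
        return NotImplemented

    def __hash__(self) -> int:
        # Hash the same lowercased form used by __eq__ so equal names collide.
        return hash(self.string.lower())


upper = CaselessName("MY-TYPE")
lower = CaselessName("my-type")
print(upper == lower)       # True
print(len({upper, lower}))  # 1
```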
- class delphin.tdl.Regex(string, docstring=None)
Bases: TypeTerm
Regular expression patterns.
- class delphin.tdl.AVM(featvals: Sequence[tuple[str, Conjunction | Term]] | Mapping[str, Conjunction | Term] | None = None, docstring=None)
Bases: FeatureStructure, Term
A feature structure as used in TDL.
- Parameters:
- aggregate(featvals: Sequence[tuple[str, Conjunction | Term]] | Mapping[str, Conjunction | Term]) -> None
Combine features in a single AVM.
This function takes feature paths and values and merges them into the AVM, but does not do full unification. For example:
>>> avm = tdl.AVM([("FEAT", tdl.TypeIdentifier("val1"))])
>>> avm.aggregate(
...     [
...         ("FEAT", tdl.TypeIdentifier("val2")),
...         ("FEAT.SUB", tdl.TypeIdentifier("val3")),
...     ]
... )
>>> print(tdl.format(avm))
[ FEAT val1 & val2 & [ SUB val3 ] ]
The featvals argument may be a sequence of (feature, value) pairs or a mapping of features to values.
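The idea of merging without unifying can be sketched with plain dictionaries. Everything below (the node layout and the aggregate and render helpers) is a hypothetical illustration rather than the AVM implementation: values landing on the same path are simply collected side by side, and no check is made that they are compatible.

```python
def new_node():
    # Each feature holds the terms conjoined at that path plus any sub-features.
    return {"terms": [], "sub": {}}


def aggregate(features, featvals):
    """Merge (path, value) pairs into a nested feature dictionary."""
    for path, value in featvals:
        level = features
        node = None
        for attr in path.split("."):
            node = level.setdefault(attr, new_node())
            level = node["sub"]
        # Collect the value; nothing is unified or checked for consistency.
        node["terms"].append(value)
    return features


def render(features):
    """Serialize the nested dictionary in a TDL-like bracket notation."""
    parts = []
    for attr, node in features.items():
        terms = list(node["terms"])
        if node["sub"]:
            terms.append(render(node["sub"]))
        parts.append(attr + " " + " & ".join(terms))
    return "[ " + ", ".join(parts) + " ]"


avm = {}
aggregate(avm, [("FEAT", "val1")])
aggregate(avm, [("FEAT", "val2"), ("FEAT.SUB", "val3")])
print(render(avm))  # [ FEAT val1 & val2 & [ SUB val3 ] ]
```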
- features(expand=False)
Return the list of tuples of feature paths and feature values.
- Parameters:
expand (bool) – if True, expand all feature paths
Example
>>> avm = AVM([('A.B', TypeIdentifier('1')),
...            ('A.C', TypeIdentifier('2'))])
>>> avm.features()
[('A', <AVM object at ...>)]
>>> avm.features(expand=True)
[('A.B', <TypeIdentifier object (1) at ...>),
 ('A.C', <TypeIdentifier object (2) at ...>)]
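The difference between the unexpanded and expanded views can be illustrated with nested dictionaries standing in for AVMs. The function expand_paths is a hypothetical helper written for this sketch, not a delphin.tdl function.

```python
def expand_paths(avm, prefix=""):
    """Flatten a nested dict into (dotted.path, value) pairs."""
    pairs = []
    for attr, value in avm.items():
        path = prefix + "." + attr if prefix else attr
        if isinstance(value, dict):
            # A nested structure: descend and extend the path.
            pairs.extend(expand_paths(value, path))
        else:
            pairs.append((path, value))
    return pairs


nested = {"A": {"B": "1", "C": "2"}}
print(list(nested))          # ['A']
print(expand_paths(nested))  # [('A.B', '1'), ('A.C', '2')]
```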
- class delphin.tdl.ConsList(values=None, end='*list*', docstring=None)
Bases: AVM
AVM subclass for cons-lists (< ... >)
This provides a more intuitive interface for creating and accessing the values of list structures in TDL. Some combinations of the values and end parameters correspond to various TDL forms as described in the table below:
TDL form      values    end               state
< >           None      EMPTY_LIST_TYPE   closed
< ... >       None      LIST_TYPE         open
< a >         [a]       EMPTY_LIST_TYPE   closed
< a, b >      [a, b]    EMPTY_LIST_TYPE   closed
< a, ... >    [a]       LIST_TYPE         open
< a . b >     [a]       b                 closed
- Parameters:
values (list) – a sequence of Conjunction or Term objects to be placed in the AVM of the list.
end (str, Conjunction, Term) – last item in the list (default: LIST_TYPE) which determines if the list is open or closed
docstring (str) – documentation string
- append(value)
Append an item to the end of an open ConsList.
- Parameters:
value (Conjunction, Term) – item to add
- Raises:
TDLError – when appending to a closed list
- terminate(end)
Set the value of the tail of the list.
Adding values via append() places them on the FIRST feature of some level of the feature structure (e.g., REST.FIRST), while terminate() places them on the final REST feature (e.g., REST.REST). If end is a Conjunction or Term, it is typically a Coreference, otherwise end is set to tdl.EMPTY_LIST_TYPE or tdl.LIST_TYPE. This method does not necessarily close the list; if end is tdl.LIST_TYPE, the list is left open, otherwise it is closed.
- Parameters:
end (str, Conjunction, Term) – value to use as the end of the list.
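To make the FIRST/REST layout concrete, the sketch below computes the feature paths a cons-list would occupy. The constants mirror the module parameters documented above, but cons_list_paths itself is a hypothetical helper, and it assumes that a list with no items is represented by the end value alone (so no paths are produced).

```python
LIST_TYPE = "*list*"
EMPTY_LIST_TYPE = "*null*"
LIST_HEAD = "FIRST"
LIST_TAIL = "REST"


def cons_list_paths(values, end=LIST_TYPE):
    """Return (path, value) pairs for the items and the tail of a cons-list."""
    pairs = []
    tail = []  # the REST steps taken so far
    for value in values:
        pairs.append((".".join(tail + [LIST_HEAD]), value))
        tail.append(LIST_TAIL)
    if tail:
        # The end value sits on the final REST feature.
        pairs.append((".".join(tail), end))
    return pairs


def is_open(end):
    # A list stays open only when its tail is the general list type.
    return end == LIST_TYPE


# < a, b >
print(cons_list_paths(["a", "b"], end=EMPTY_LIST_TYPE))
# [('FIRST', 'a'), ('REST.FIRST', 'b'), ('REST.REST', '*null*')]

# < a, ... >
print(cons_list_paths(["a"]))
# [('FIRST', 'a'), ('REST', '*list*')]
```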
- class delphin.tdl.DiffList(values=None, docstring=None)
Bases: AVM
AVM subclass for diff-lists (<! ... !>)
As with ConsList, this provides a more intuitive interface for creating and accessing the values of list structures in TDL. Unlike ConsList, DiffLists are always closed lists with the last item coreferenced with the LAST feature, which allows for the joining of two diff-lists.
- Parameters:
values (list) – a sequence of Conjunction or Term objects to be placed in the AVM of the list
docstring (str) – documentation string
- last
The feature path to the list position coreferenced by the value of the DIFF_LIST_LAST feature.
- Type:
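A diff-list layout can be sketched in the same way. This is an illustration under assumptions: diff_list_paths is a hypothetical helper, the coreference name "#last" is made up, and the sketch assumes the conventional encoding in which the tail of the LIST value and the LAST feature share one coreference.

```python
DIFF_LIST_LIST = "LIST"
DIFF_LIST_LAST = "LAST"
LIST_HEAD = "FIRST"
LIST_TAIL = "REST"


def diff_list_paths(values, tag="#last"):
    """Return ((path, value) pairs, path of the coreferenced tail position)."""
    pairs = []
    tail = [DIFF_LIST_LIST]
    for value in values:
        pairs.append((".".join(tail + [LIST_HEAD]), value))
        tail.append(LIST_TAIL)
    last_path = ".".join(tail)
    # The same coreference appears at the end of LIST and on LAST.
    pairs.append((last_path, tag))
    pairs.append((DIFF_LIST_LAST, tag))
    return pairs, last_path


pairs, last_path = diff_list_paths(["a", "b"])
print(last_path)  # LIST.REST.REST
for path, value in pairs:
    print(path, value)
```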
Conjunctions
- class delphin.tdl.Conjunction(terms=None)
Conjunction of TDL terms.
- add(term)
Add a term to the conjunction.
- Parameters:
term (Term, Conjunction) – term to add; if a Conjunction, all of its terms are added to the current conjunction.
- Raises:
TypeError – when term is an invalid type
- get(key, default=None)
Get the value of attribute key in any AVM in the conjunction.
- Parameters:
key – attribute path to search
default – value to return if key is not defined on any AVM
- normalize()
Rearrange the conjunction to a conventional form.
This puts any coreference(s) first, followed by type terms, then followed by AVM(s) (including lists). AVMs are normalized via AVM.normalize().
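The ordering described for normalize() can be sketched as a stable sort over term categories. The (kind, text) tuples and the normalize_terms helper are hypothetical; keeping the original relative order within each category is a property of this sketch and an assumption about the library.

```python
# Lower rank sorts earlier: coreferences, then type terms, then AVMs.
RANK = {"coreference": 0, "type": 1, "avm": 2}


def normalize_terms(terms):
    """Reorder (kind, text) terms into the conventional order."""
    # sorted() is stable, so terms of the same kind keep their relative order.
    return sorted(terms, key=lambda term: RANK[term[0]])


terms = [
    ("avm", "[ FEAT val ]"),
    ("type", "b"),
    ("coreference", "#x"),
    ("type", "a"),
]
ordered = normalize_terms(terms)
print(" & ".join(text for _kind, text in ordered))
# #x & b & a & [ FEAT val ]
```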
- property terms
The list of terms in the conjunction.
Type and Instance Definitions
- class delphin.tdl.TypeDefinition(identifier, conjunction, docstring=None)
A top-level Conjunction with an identifier.
- Parameters:
identifier (str) – type name
conjunction (Conjunction, Term) – type constraints
docstring (str) – documentation string
- conjunction
type constraints
- Type:
- documentation(level='first')
Return the documentation of the type.
By default, this is the first docstring on a top-level term. By setting level to "top", the list of all docstrings on top-level terms is returned, including the type’s docstring value, if not None, as the last item. The docstring for the type itself is available via TypeDefinition.docstring.
- Parameters:
level (str) – "first" or "top"
- Returns:
a single docstring or a list of docstrings
- property supertypes
The list of supertypes for the type.
- class delphin.tdl.TypeAddendum(identifier, conjunction=None, docstring=None)
Bases: TypeDefinition
An addendum to an existing type definition.
Type addenda, unlike type definitions, do not require supertypes, or even any feature constraints. An addendum, however, must have at least one supertype, AVM, or docstring.
- Parameters:
identifier (str) – type name
conjunction (Conjunction, Term) – type constraints
docstring (str) – documentation string
- conjunction
type constraints
- Type:
- class delphin.tdl.LexicalRuleDefinition(identifier, affix_type, patterns, conjunction, **kwargs)
Bases: TypeDefinition
An inflecting lexical rule definition.
- Parameters:
- conjunction
type constraints
- Type:
Morphological Patterns
- class delphin.tdl.LetterSet(var, characters)
A capturing character class for inflectional lexical rules.
LetterSets define a pattern (e.g., "!a") that may match any one of its associated characters. Unlike WildCard patterns, LetterSet variables also appear in the replacement pattern of an affixing rule, where they insert the character matched by the corresponding letter set.
- Parameters:
- class delphin.tdl.WildCard(var, characters)
A non-capturing character class for inflectional lexical rules.
WildCards define a pattern (e.g., "?a") that may match any one of its associated characters. Unlike LetterSet patterns, WildCard variables may not appear in the replacement pattern of an affixing rule.
- Parameters:
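The capturing behavior of letter sets can be sketched with Python regular expressions. This is a simplified illustration, not the way delphin.tdl or any grammar processor applies affixing rules: apply_suffix_rule, the letter-set contents, and the example rule are all hypothetical, wildcards are left out entirely, and the sketch assumes that a repeated letter-set variable must match the same character each time.

```python
import re


def apply_suffix_rule(word, match, replace, letter_sets):
    """Apply a suffix rule whose patterns may contain !x letter-set variables.

    Returns the rewritten word, or None if the match pattern does not apply.
    """
    groups = {}  # letter-set variable -> regex group name
    regex = ""
    i = 0
    while i < len(match):
        if match[i] == "!" and i + 1 < len(match):
            var = match[i + 1]
            if var in groups:
                # Repeated variable: require the same character again.
                regex += "(?P=%s)" % groups[var]
            else:
                groups[var] = "ls_%d" % len(groups)
                regex += "(?P<%s>[%s])" % (groups[var], re.escape(letter_sets[var]))
            i += 2
        else:
            regex += re.escape(match[i])
            i += 1
    found = re.search(regex + "$", word)
    if found is None:
        return None
    out = ""
    i = 0
    while i < len(replace):
        if replace[i] == "!" and i + 1 < len(replace):
            # Insert the character captured by the corresponding letter set.
            out += found.group(groups[replace[i + 1]])
            i += 2
        else:
            out += replace[i]
            i += 1
    return word[: found.start()] + out


letter_sets = {"c": "bdfglmnprstz", "v": "aeiou"}
print(apply_suffix_rule("hit", "!v!c", "!v!c!cing", letter_sets))  # hitting
print(apply_suffix_rule("see", "!v!c", "!v!c!cing", letter_sets))  # None
```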
Environments and File Inclusion
- class delphin.tdl.TypeEnvironment(entries=None)
TDL type environment.
- Parameters:
entries (list) – TDL entries
- class delphin.tdl.FileInclude(value: str = '', basedir: str | Path = '')
Include other TDL files in the current environment.
- Parameters:
value – quoted value of the TDL include statement
basedir – directory containing the file with the include statement
- value
The quoted value of the TDL include statement.
- path
The path to the TDL file to include.
Exceptions and Warnings
- exception delphin.tdl.TDLError(*args, **kwargs)
Bases: PyDelphinException
Raised when there is an error in processing TDL.
- exception delphin.tdl.TDLSyntaxError(message=None, filename=None, lineno=None, offset=None, text=None)
Bases: PyDelphinSyntaxError
Raised when parsing TDL text fails.
- exception delphin.tdl.TDLWarning(*args, **kwargs)
Bases: PyDelphinWarning
Raised when parsing unsupported TDL features.
Comments
Single-line comments in TDL.
Multi-line comments in TDL.