delphin.codecs¶
Serialization Codecs for Semantic Representations
The delphin.codecs package is a namespace package for modules used in the
serialization and deserialization of semantic representations. All
modules included in this namespace must follow the common API (based
on Python’s pickle and json modules) in order to
work correctly with PyDelphin. This document describes that API.
Included Codecs¶
MRS:
DMRS:
EDS:
Codec API¶
Module Constants¶
There is one required module constant for codecs: CODEC_INFO. Its
purpose is primarily to specify which representation (MRS, DMRS, EDS)
it serializes. A codec without CODEC_INFO will work for programmatic
usage, but it will not work with the delphin.commands.convert()
function or at the command line with the delphin convert
command, which use the representation key in CODEC_INFO to
determine when and how to convert representations.
- CODEC_INFO¶
A dictionary containing information about the codec. While codec authors may put arbitrary data here, there are two keys used by PyDelphin’s conversion features: representation and description. Only representation is required, and should be set to one of mrs, dmrs, or eds. For example, the mrsjson codec uses the following:
    CODEC_INFO = {
        'representation': 'mrs',
        'description': 'JSON-serialized MRS for the Web API'
    }
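The role of the representation key can be illustrated with a small helper. This is only a sketch and not part of PyDelphin: representation_of() and VALID_REPRESENTATIONS are hypothetical names, and a types.SimpleNamespace object stands in for a real codec module.

```python
from types import SimpleNamespace

# The three values the text above allows for 'representation'.
VALID_REPRESENTATIONS = {'mrs', 'dmrs', 'eds'}


def representation_of(codec):
    """Return the representation a codec declares, or None if it has none.

    Hypothetical helper; `codec` is any object that may carry CODEC_INFO.
    """
    info = getattr(codec, 'CODEC_INFO', None)
    if info is None:
        # Still usable programmatically, but not by conversion features.
        return None
    representation = info.get('representation')
    if representation not in VALID_REPRESENTATIONS:
        raise ValueError(f'unsupported representation: {representation!r}')
    return representation


# Stand-in for a codec module, using the values shown for mrsjson.
mrsjson_like = SimpleNamespace(
    CODEC_INFO={
        'representation': 'mrs',
        'description': 'JSON-serialized MRS for the Web API',
    }
)
bare_codec = SimpleNamespace()  # a codec with no CODEC_INFO at all
```

Under these assumptions, representation_of(mrsjson_like) yields 'mrs', while representation_of(bare_codec) yields None, mirroring the case of a codec that works programmatically but cannot be selected for conversion.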
The following module constants are optional and describe strings that
must appear in valid documents when serializing multiple semantic
representations at a time, as with dump() and dumps(). They are used
by delphin.commands.convert() to provide a streaming serialization
rather than dumping the entire file at once. If the values are not
defined in the codec module, default values will be used.
- HEADER¶
The string to output before any semantic representations are serialized. For example, in delphin.codecs.mrx, the value of HEADER is <mrs-list>, and in delphin.codecs.dmrstikz it is an entire LaTeX preamble followed by \begin{document}.
- JOINER¶
The string used to join multiple serialized semantic representations. For example, in delphin.codecs.mrsjson, it is a comma (,) following JSON’s syntax. Normally it is either an empty string, a space, or a newline, depending on the conventions for the format and whether the indent argument is set.
- FOOTER¶
The string to output after all semantic representations have been serialized. For example, in delphin.codecs.mrx, it is </mrs-list>, and in delphin.codecs.dmrstikz it is \end{document}.
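How the three constants could combine into a streamed document can be sketched as follows. This is an illustration under stated assumptions, not the code used by delphin.commands.convert(): stream_document() is a hypothetical name, the serialized items are dummies, and the square brackets used as a JSON header and footer are assumed for the example.

```python
def stream_document(encoded, header='', joiner='', footer=''):
    """Yield the pieces of a document one at a time.

    `encoded` is an iterable of already-serialized representations, so
    nothing needs to be held in memory beyond the current item.
    """
    yield header
    for i, item in enumerate(encoded):
        if i > 0:
            # The joiner goes between items, never before the first one.
            yield joiner
        yield item
    yield footer


# Header and footer as described for delphin.codecs.mrx.
mrx_like = ''.join(
    stream_document(['<mrs/>', '<mrs/>'],
                    header='<mrs-list>', footer='</mrs-list>'))

# A comma joiner as described for delphin.codecs.mrsjson; the brackets
# are an assumption made for this example.
json_like = ''.join(
    stream_document(['{}', '{}'], header='[', joiner=',', footer=']'))
```

Because the function is a generator, each piece can be written to a stream as soon as it is produced, which is the point of defining the constants separately from the serialization functions.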
Deserialization Functions¶
The deserialization functions load(), loads(), and
decode() accept textual serializations and return the
interpreted semantic representation. Both load() and
loads() expect full documents (including headers and footers,
such as <mrs-list> and </mrs-list> around a
mrx serialization) and return lists of semantic
structure objects. The decode() function expects single
representations (without headers and footers) and returns a single
semantic structure object.
Reading from a file or stream¶
- load(source)¶
Deserialize and return semantic representations from source.
- Parameters:
source – path-like object or file handle of a source containing serialized semantic representations
- Return type:
list of objects that are subclasses of delphin.sembase.SemanticStructure
Reading from a string¶
Decoding from a string¶
- decode(s)¶
Deserialize and return the semantic representation from string s.
- Parameters:
s – string containing a serialized semantic representation
- Return type:
subclass of
delphin.sembase.SemanticStructure
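The relationship between the three deserialization functions can be sketched with a toy codec. Everything here is hypothetical: the <toy-list> format, the ToyStructure class (a stand-in for a SemanticStructure subclass), and the loads(s) signature, which is assumed by analogy with decode(s). It is not how any PyDelphin codec is implemented.

```python
import io
import os
import re
from dataclasses import dataclass

HEADER = '<toy-list>'
FOOTER = '</toy-list>'
_ITEM = re.compile(r'<toy top="([^"]*)"/>')


@dataclass
class ToyStructure:
    """Stand-in for a SemanticStructure subclass."""
    top: str


def decode(s):
    """Deserialize a single representation (no header or footer)."""
    match = _ITEM.fullmatch(s.strip())
    if match is None:
        raise ValueError(f'not a single toy representation: {s!r}')
    return ToyStructure(top=match.group(1))


def loads(s):
    """Deserialize a full document and return a list of structures."""
    s = s.strip()
    if not (s.startswith(HEADER) and s.endswith(FOOTER)):
        raise ValueError('document must include the header and footer')
    body = s[len(HEADER):len(s) - len(FOOTER)]
    return [decode(m.group(0)) for m in _ITEM.finditer(body)]


def load(source):
    """Deserialize from a path-like object or an open file handle."""
    if isinstance(source, (str, os.PathLike)):
        with open(source, encoding='utf-8') as fh:
            return loads(fh.read())
    return loads(source.read())


doc = '<toy-list><toy top="h0"/><toy top="h1"/></toy-list>'
from_string = loads(doc)
from_stream = load(io.StringIO(doc))
single = decode('<toy top="h0"/>')
```

In this sketch load() and loads() both return a list and reject input that lacks the header and footer, while decode() returns one object and rejects a full document.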
Serialization Functions¶
The serialization functions dump(), dumps(), and
encode() take semantic representations as input and either return
a string or write to a file or stream. Both dump() and
dumps() will provide the appropriate HEADER,
JOINER, and FOOTER values to make the result a valid
document. The encode() function only serializes a single
semantic representation, which is generally useful when working with
single representations, but is also useful when headers and footers
are not desired (e.g., if you want the dmrx
representation of a DMRS without <dmrs-list> and </dmrs-list>
surrounding it).
Writing to a file or stream¶
- dump(xs, destination, properties=True, lnk=True, indent=False, encoding='utf-8')¶
Serialize semantic representations in xs to destination.
- Parameters:
xs – iterable of SemanticStructure objects to serialize
destination – path-like object or file object where data will be written to
properties (bool) – if False, suppress morphosemantic properties
lnk (bool) – if False, suppress surface alignments and strings
indent – if True or an integer value, add newlines and indentation; some codecs may support an integer value for indent, which specifies how many columns to indent
encoding (str) – if destination is a filename, write to the file with the given encoding; otherwise it is ignored
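A toy codec can illustrate how the three serialization functions relate. This is a sketch under assumptions, not PyDelphin code: the <toy-list> format and ToyStructure class are hypothetical, and the dumps() and encode() signatures are assumed to mirror dump() minus the destination and encoding parameters.

```python
import io
import os
from dataclasses import dataclass, field

HEADER = '<toy-list>'
JOINER = ''
FOOTER = '</toy-list>'


@dataclass
class ToyStructure:
    """Stand-in for a SemanticStructure subclass."""
    top: str
    properties: dict = field(default_factory=dict)


def encode(x, properties=True, lnk=True, indent=False):
    """Serialize one structure, without header or footer."""
    attrs = [f'top="{x.top}"']
    if properties:
        attrs += [f'{k}="{v}"' for k, v in sorted(x.properties.items())]
    # lnk and indent are accepted but have no effect in this toy format.
    return '<toy ' + ' '.join(attrs) + '/>'


def dumps(xs, properties=True, lnk=True, indent=False):
    """Serialize structures to a string forming a complete document."""
    newline = '\n' if indent else ''
    items = [encode(x, properties=properties, lnk=lnk, indent=indent)
             for x in xs]
    body = (JOINER + newline).join(items)
    return HEADER + newline + body + newline + FOOTER


def dump(xs, destination, properties=True, lnk=True, indent=False,
         encoding='utf-8'):
    """Serialize structures to a path-like object or a file object."""
    text = dumps(xs, properties=properties, lnk=lnk, indent=indent)
    if isinstance(destination, (str, os.PathLike)):
        with open(destination, 'w', encoding=encoding) as fh:
            fh.write(text)
    else:
        destination.write(text)  # encoding is ignored for file objects


xs = [ToyStructure('h0', {'TENSE': 'past'}), ToyStructure('h1')]
buffer = io.StringIO()
dump(xs, buffer, properties=False)
```

Here dump() delegates to dumps(), which wraps the output of encode() in the header and footer, so the three functions differ only in how much of the document they produce and where it goes.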
Writing to a string¶
Encoding to a string¶
Variations¶
All serialization codecs should use the function signatures above, but
some variations are possible. Codecs should not remove any positional
or keyword arguments from functions, but they may ignore them. If any
new positional arguments are added, they should appear after the
function's last existing positional argument and before the keyword
arguments. New keyword arguments may be added in any order. Finally, a codec may
omit some functions entirely, such as for export-only codecs that do
not provide load(), loads(), or decode(). The module
constants HEADER, JOINER, and FOOTER are also
optional. Here are some examples of variations in PyDelphin:
- delphin.codecs.indexedmrs requires a semi positional argument.
- delphin.codecs.mrsjson, delphin.codecs.dmrsjson, and delphin.codecs.edsjson introduce to_dict() and from_dict() functions in their public API as they may be generally useful.
- delphin.codecs.dmrspenman and delphin.codecs.edspenman introduce to_triples() and from_triples() functions in their public API.
- delphin.codecs.eds allows a show_status keyword argument to turn on graph connectedness markers on serialization.
- delphin.codecs.mrsprolog and delphin.codecs.dmrstikz are export-only codecs and do not provide load(), loads(), or decode() functions.
- delphin.ace is an import-only codec and does not provide dump(), dumps(), or encode() functions.
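Since a codec may omit functions, calling code may want to check what a codec offers before using it. The following is a minimal sketch of such a check; codec_capabilities() is a hypothetical helper, not a PyDelphin function, and a types.SimpleNamespace object stands in for an export-only codec module.

```python
from types import SimpleNamespace

READ_FUNCTIONS = ('load', 'loads', 'decode')
WRITE_FUNCTIONS = ('dump', 'dumps', 'encode')


def codec_capabilities(codec):
    """Report whether a codec provides the full read and write APIs."""
    def provides(names):
        return all(callable(getattr(codec, name, None)) for name in names)

    return {
        'read': provides(READ_FUNCTIONS),
        'write': provides(WRITE_FUNCTIONS),
    }


# Stand-in for an export-only codec: it defines only the write functions.
export_only = SimpleNamespace(
    dump=lambda xs, destination, **kwargs: None,
    dumps=lambda xs, **kwargs: '',
    encode=lambda x, **kwargs: '',
)

capabilities = codec_capabilities(export_only)
```

Checking with getattr() and callable() avoids assuming that every module in the namespace implements all six functions.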