delphin.interfaces

Interface modules for external data providers.

PyDelphin interfaces manage the communication between PyDelphin and external DELPH-IN data providers. A data provider could be a local process, such as the ACE parser/generator, or a remote service, such as the DELPH-IN RESTful web server. The interfaces send requests to the providers, then receive and interpret the response. The interfaces may also detect and deserialize supported DELPH-IN formats.

Common Classes

class delphin.interfaces.base.FieldMapper[source]

A class for mapping responses to [incr tsdb()] fields.

This class provides two methods for mapping responses to fields:

  • map() - takes a response and returns a list of (table, data)
    tuples for the data in the response, as well as aggregating any necessary information
  • cleanup() - returns any (table, data) tuples resulting from
    aggregated data over all runs, then clears this data

In addition, the affected_tables attribute should list the names of tables that become invalidated by using this FieldMapper to process a profile. Generally this is the list of tables that map() and cleanup() create records for, but it may also include those that rely on the previous set (e.g., treebanking preferences, etc.).

Alternative [incr tsdb()] schemas can be handled by overriding these two methods and the __init__() method.

affected_tables

list of tables that are affected by the processing

cleanup()[source]

Return aggregated (table, rowdata) tuples and clear the state.

map(response)[source]

Process response and return a list of (table, rowdata) tuples.
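
The map()/cleanup() contract described above can be sketched without PyDelphin installed. The class below is a hypothetical stand-in, not the library's FieldMapper: the class name, the table and field names, and the layout of the response dictionary are all assumptions chosen for illustration.

```python
class CountingMapper:
    """Hypothetical stand-in that follows the map()/cleanup() contract."""

    def __init__(self):
        # Tables whose existing records would be invalidated by this mapper
        # (illustrative names, not taken from a real schema).
        self.affected_tables = ['parse', 'result', 'run']
        # State aggregated across calls to map().
        self._items_seen = 0

    def map(self, response):
        """Return (table, rowdata) tuples for a single response."""
        self._items_seen += 1
        results = response.get('results', [])
        rows = [('parse', {'i-id': response.get('i-id'),
                           'readings': len(results)})]
        for i, result in enumerate(results):
            rows.append(('result', {'result-id': i,
                                    'mrs': result.get('mrs')}))
        return rows

    def cleanup(self):
        """Return rows built from the aggregated state, then clear it."""
        rows = [('run', {'items': self._items_seen})]
        self._items_seen = 0
        return rows


mapper = CountingMapper()
rows = mapper.map({'i-id': 10, 'results': [{'mrs': '[ TOP: h0 ]'}]})
summary = mapper.cleanup()
```

Here map() emits per-item rows immediately, while cleanup() emits the single row that can only be computed once all items have been seen.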

class delphin.interfaces.base.ParseResponse[source]

A wrapper around the response dictionary for more convenient access to results.

result(i)[source]

Return a ParseResult object for the ith result.

results()[source]

Return ParseResult objects for each result.

tokens(tokenset='internal')[source]

Deserialize and return a YyTokenLattice object for the initial or internal token set, if provided, from the YY format or the JSON-formatted data; otherwise return the original string.

Parameters: tokenset (str) – return ‘initial’ or ‘internal’ tokens (default: ‘internal’)
Returns: YyTokenLattice

class delphin.interfaces.base.ParseResult[source]

A wrapper around a result dictionary to automate deserialization for supported formats. A ParseResult is still a dictionary, so the raw data can be obtained using dict access.

derivation()[source]

Deserialize and return a Derivation object for UDF- or JSON-formatted derivation data; otherwise return the original string.

dmrs()[source]

Deserialize and return a Dmrs object for JSON-formatted DMRS data; otherwise return the original string.

eds()[source]

Deserialize and return an Eds object for native- or JSON-formatted EDS data; otherwise return the original string.

mrs()[source]

Deserialize and return an Mrs object for simplemrs or JSON-formatted MRS data; otherwise return the original string.

tree()[source]

Deserialize and return a labeled syntax tree. The tree data may be a standalone datum, or embedded in the derivation.
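
The pattern shared by these methods (deserialize when the format is recognized, otherwise return the raw value, while still allowing plain dict access) can be sketched as follows. This is a simplified illustration and not the library's implementation: ResultSketch, Node, and the 'entity' and 'daughters' keys are assumptions, and only dictionary (JSON-style) data is handled, whereas ParseResult also recognizes string formats such as UDF.

```python
from collections import namedtuple

# Hypothetical lightweight node type standing in for a Derivation object.
Node = namedtuple('Node', ['entity', 'daughters'])


def _node_from_dict(data):
    """Recursively build Node objects from a JSON-style dictionary."""
    daughters = [_node_from_dict(d) for d in data.get('daughters', [])]
    return Node(data['entity'], daughters)


class ResultSketch(dict):
    """A dict subclass that deserializes recognized data on request."""

    def derivation(self):
        data = self.get('derivation')
        if isinstance(data, dict):
            # Recognized format: return a structured object.
            return _node_from_dict(data)
        # Unrecognized format: return the original value unchanged.
        return data


structured = ResultSketch(derivation={
    'entity': 'sb-hd_mc_c',
    'daughters': [{'entity': 'hdn_bnp-pn_c'}],
})
raw = ResultSketch(derivation='unrecognized text')
```

Because the wrapper is still a dictionary, structured['derivation'] returns the undecoded data regardless of what structured.derivation() returns.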

class delphin.interfaces.base.Processor[source]

Base class for processors.

This class defines the basic interface for all PyDelphin processors, such as AceProcess and DelphinRestClient. It can also be used to define preprocessor wrappers around other processors so that the wrappers share the same interface and can be used, e.g., with TestSuite.process().

task

name of the task the processor performs (e.g. “parse”, “transfer”, or “generate”)

process_item(datum, keys=None)[source]

Send datum to the processor and return the result.

This method is a generic wrapper around a processor-specific processing method that keeps track of additional item and processor information. Specifically, if keys is provided, it is copied into the keys key of the response object, and if the processor object’s task member is non-None, it is copied into the task key of the response. These entries help keep track of items when many are processed at once, and help downstream functions identify what the processor did.

Parameters:
  • datum – the item content to process
  • keys – a mapping of item identifiers which will be copied into the response
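
The bookkeeping described above might look roughly like the following. This is a self-contained sketch under stated assumptions, not PyDelphin's source: ProcessorSketch, its _process() hook, and UppercaseProcessor are hypothetical names, and the response is assumed to be a plain dictionary.

```python
class ProcessorSketch:
    """Hypothetical base class mirroring the described process_item() logic."""

    task = None  # subclasses set this to, e.g., 'parse'

    def _process(self, datum):
        # Processor-specific work; assumed hook name, overridden below.
        raise NotImplementedError

    def process_item(self, datum, keys=None):
        response = self._process(datum)
        if keys is not None:
            # Copy the item identifiers into the response.
            response['keys'] = keys
        if self.task is not None:
            # Record which task produced the response.
            response['task'] = self.task
        return response


class UppercaseProcessor(ProcessorSketch):
    """Toy processor used only to exercise the wrapper."""

    task = 'uppercase'

    def _process(self, datum):
        return {'input': datum, 'results': [{'surface': datum.upper()}]}


response = UppercaseProcessor().process_item(
    'abrams hired browne.', keys={'i-id': 1})
```

With the keys attached, a caller that processes many items can match each response back to its source item without relying on ordering.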

Wrapping a Processor for Preprocessing

The Processor class can be used to implement a preprocessor that maintains the same interface as the underlying processor. The following example wraps an AceParser instance, loaded with the English Resource Grammar, in a wrapper that first tokenizes the input with a REPP instance.

>>> from delphin.interfaces import ace, base
>>> from delphin import repp
>>>
>>> class REPPWrapper(base.Processor):
...     def __init__(self, cpu, rpp):
...         self.cpu = cpu
...         self.task = cpu.task
...         self.rpp = rpp
...     def process_item(self, datum, keys=None):
...         preprocessed_datum = str(self.rpp.tokenize(datum))
...         return self.cpu.process_item(preprocessed_datum, keys=keys)
...
>>> # The preprocessor can be used like a normal Processor:
>>> rpp = repp.REPP.from_config('../../grammars/erg/pet/repp.set')
>>> grm = '../../grammars/erg-1214-x86-64-0.9.27.dat'
>>> with ace.AceParser(grm, cmdargs=['-y']) as _cpu:
...     cpu = REPPWrapper(_cpu, rpp)
...     response = cpu.process_item('Abrams hired Browne.')
...     for result in response.results():
...         print(result.mrs())
...
<Mrs object (proper named hire proper named) at 140488735960480>
<Mrs object (unknown compound udef named hire parg addressee proper named) at 140488736005424>
<Mrs object (unknown proper compound udef named hire parg named) at 140488736004864>
NOTE: parsed 1 / 1 sentences, avg 1173k, time 0.00986s

A similar technique could be used to manage external processes, such as MeCab for morphological segmentation of Japanese for Jacy. It could also be used to make a postprocessor, a backoff mechanism in case an input fails to parse, etc.
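
As an illustration of the backoff idea, the sketch below tries a primary processor and falls back to a second one when the first returns no results. It is duck-typed and uses stub processors so that it runs without PyDelphin or a grammar; in practice the wrapper would presumably subclass Processor as REPPWrapper does. BackoffProcessor, StubProcessor, and the assumption that a response is a dictionary with a 'results' list are hypothetical.

```python
class BackoffProcessor:
    """Try a primary processor; use the fallback if it yields no results."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.task = primary.task

    def process_item(self, datum, keys=None):
        response = self.primary.process_item(datum, keys=keys)
        if not response.get('results'):
            # No analyses from the primary processor: back off.
            response = self.fallback.process_item(datum, keys=keys)
        return response


class StubProcessor:
    """Stand-in for a real processor; 'parses' only the inputs it knows."""

    task = 'parse'

    def __init__(self, name, known):
        self.name = name
        self.known = set(known)

    def process_item(self, datum, keys=None):
        results = [{'source': self.name}] if datum in self.known else []
        return {'input': datum, 'keys': keys, 'results': results}


strict = StubProcessor('strict', ['Abrams hired Browne.'])
robust = StubProcessor('robust', ['Abrams hired Browne.',
                                  'Abrams hire Browne.'])
cpu = BackoffProcessor(strict, robust)
first = cpu.process_item('Abrams hired Browne.')
second = cpu.process_item('Abrams hire Browne.')
```

Since the wrapper exposes the same task attribute and process_item() signature, code that accepts a processor should not need to know whether a backoff is in place.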