delphin.interfaces
Interface modules for external data providers.
PyDelphin interfaces manage the communication between PyDelphin and external DELPH-IN data providers. A data provider could be a local process, such as the ACE parser/generator, or a remote service, such as the DELPH-IN RESTful web server. The interfaces send requests to the providers, then receive and interpret the response. The interfaces may also detect and deserialize supported DELPH-IN formats.
Common Classes
class delphin.interfaces.base.FieldMapper
A class for mapping responses to [incr tsdb()] fields.
This class provides two methods for mapping responses to fields:
- map() - takes a response and returns a list of (table, data) tuples for the data in the response, as well as aggregating any necessary information
- cleanup() - returns any (table, data) tuples resulting from aggregated data over all runs, then clears this data
In addition, the affected_tables attribute should list the names of tables that become invalidated by using this FieldMapper to process a profile. Generally this is the list of tables that map() and cleanup() create records for, but it may also include those that rely on the previous set (e.g., treebanking preferences).
Alternative [incr tsdb()] schemas can be handled by overriding these two methods and the __init__() method.
affected_tables
list of tables that are affected by the processing
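As a rough illustration of the map()/cleanup() contract described above, the following standalone sketch mimics the interface without importing PyDelphin. The class name, the table and field names ('parse', 'run', 'i-id', 'readings', 'items'), and the shape of the response dictionary are assumptions made for the example; they are not the library's actual schema handling.

```python
class CountingFieldMapper:
    """Hypothetical mapper following the map()/cleanup() contract."""

    def __init__(self):
        # Tables whose records this mapper creates (assumed names).
        self.affected_tables = ['parse', 'run']
        # Information aggregated across calls to map().
        self._items_seen = 0

    def map(self, response):
        """Return (table, data) tuples for a single response."""
        self._items_seen += 1
        keys = response.get('keys', {})
        data = {
            'i-id': keys.get('i-id', -1),
            'readings': len(response.get('results', [])),
        }
        return [('parse', data)]

    def cleanup(self):
        """Return (table, data) tuples for aggregated data, then clear it."""
        rows = [('run', {'items': self._items_seen})]
        self._items_seen = 0
        return rows


mapper = CountingFieldMapper()
rows = mapper.map({'keys': {'i-id': 10}, 'results': [{}, {}]})
rows += mapper.map({'keys': {'i-id': 20}, 'results': []})
summary = mapper.cleanup()
```

Here map() yields one record per processed item, while cleanup() yields a single record summarizing the run and resets the counter so the mapper can be reused.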
class delphin.interfaces.base.ParseResponse
A wrapper around the response dictionary for more convenient access to results.
tokens(tokenset='internal')
Deserialize and return a YyTokenLattice object for the initial or internal token set, if provided, from the YY format or the JSON-formatted data; otherwise return the original string.
Parameters: tokenset (str) – return ‘initial’ or ‘internal’ tokens (default: ‘internal’)
Returns: YyTokenLattice
class delphin.interfaces.base.ParseResult
A wrapper around a result dictionary to automate deserialization for supported formats. A ParseResult is still a dictionary, so the raw data can be obtained using dict access.
derivation()
Deserialize and return a Derivation object for UDF- or JSON-formatted derivation data; otherwise return the original string.
dmrs()
Deserialize and return a Dmrs object for JSON-formatted DMRS data; otherwise return the original string.
eds()
Deserialize and return an Eds object for native- or JSON-formatted EDS data; otherwise return the original string.
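The "deserialize if possible, otherwise return the original value" behavior of these accessors can be pictured with a small standalone sketch. Structure and ResultSketch are hypothetical stand-ins invented for this example (they are not PyDelphin classes), and the check for a decoded dictionary is only an assumed way of recognizing JSON-formatted data.

```python
class Structure:
    """Hypothetical stand-in for a deserialized object such as Dmrs."""

    def __init__(self, data):
        self.data = data


class ResultSketch(dict):
    """Hypothetical dict subclass with a deserializing accessor."""

    def dmrs(self):
        value = self.get('dmrs')
        if isinstance(value, dict):
            # Decoded JSON data: build an object from it.
            return Structure(value)
        # Unsupported format (or missing): return the value unchanged.
        return value


result = ResultSketch({'dmrs': {'nodes': [], 'links': []}})
unparsed = ResultSketch({'dmrs': 'not JSON data'})
```

Because the sketch subclasses dict, result['dmrs'] still gives the raw data while result.dmrs() gives the wrapped object, which mirrors the dictionary access described for ParseResult.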
class delphin.interfaces.base.Processor
Base class for processors.
This class defines the basic interface for all PyDelphin processors, such as AceProcess and DelphinRestClient. It can also be used to define preprocessor wrappers around other processors so that the wrappers have the same interface, allowing them to be used, e.g., with TestSuite.process().
task
name of the task the processor performs (e.g., “parse”, “transfer”, or “generate”)
process_item(datum, keys=None)
Send datum to the processor and return the result.
This method is a generic wrapper around a processor-specific processing method that keeps track of additional item and processor information. Specifically, if keys is provided, it is copied into the keys key of the response object, and if the processor object’s task member is non-None, it is copied into the task key of the response. These help with keeping track of items when many are processed at once, and help downstream functions identify what the process did.
Parameters:
- datum – the item content to process
- keys – a mapping of item identifiers which will be copied into the response
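The bookkeeping that process_item() performs can be sketched as follows. This is a standalone illustration, not the library's implementation: UppercaseProcessor and its _process() method are hypothetical names, and the response is assumed to be a plain dictionary.

```python
class UppercaseProcessor:
    """Hypothetical processor used only to illustrate the bookkeeping."""

    task = 'transfer'

    def _process(self, datum):
        # Processor-specific work (a trivial stand-in here).
        return {'input': datum, 'results': [datum.upper()]}

    def process_item(self, datum, keys=None):
        response = self._process(datum)
        if keys is not None:
            # Copy the item identifiers into the response.
            response['keys'] = keys
        if self.task is not None:
            # Record which task produced this response.
            response['task'] = self.task
        return response


response = UppercaseProcessor().process_item('abrams', keys={'i-id': 10})
```

With the keys and task copied in, a response can later be matched back to its item and handled according to the kind of processing that produced it.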
Wrapping a Processor for Preprocessing
The Processor class can be used to implement a preprocessor that maintains the same interface as the underlying processor. The following example wraps an AceParser instance of the English Resource Grammar with a REPP instance.
>>> from delphin.interfaces import ace, base
>>> from delphin import repp
>>>
>>> class REPPWrapper(base.Processor):
...     def __init__(self, cpu, rpp):
...         self.cpu = cpu
...         self.task = cpu.task
...         self.rpp = rpp
...     def process_item(self, datum, keys=None):
...         preprocessed_datum = str(self.rpp.tokenize(datum))
...         return self.cpu.process_item(preprocessed_datum, keys=keys)
...
>>> # The preprocessor can be used like a normal Processor:
>>> rpp = repp.REPP.from_config('../../grammars/erg/pet/repp.set')
>>> grm = '../../grammars/erg-1214-x86-64-0.9.27.dat'
>>> with ace.AceParser(grm, cmdargs=['-y']) as _cpu:
...     cpu = REPPWrapper(_cpu, rpp)
...     response = cpu.process_item('Abrams hired Browne.')
...     for result in response.results():
...         print(result.mrs())
...
<Mrs object (proper named hire proper named) at 140488735960480>
<Mrs object (unknown compound udef named hire parg addressee proper named) at 140488736005424>
<Mrs object (unknown proper compound udef named hire parg named) at 140488736004864>
NOTE: parsed 1 / 1 sentences, avg 1173k, time 0.00986s
A similar technique could be used to manage external processes, such as MeCab for morphological segmentation of Japanese for Jacy. It could also be used to make a postprocessor, a backoff mechanism in case an input fails to parse, etc.
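The backoff idea mentioned above might look like the following minimal sketch. It runs without PyDelphin: StubProcessor is a hypothetical stand-in for real processors such as AceParser, BackoffProcessor is an invented name, and the responses are assumed to be plain dictionaries with a 'results' list.

```python
class BackoffProcessor:
    """Try a primary processor; use a fallback if it yields no results."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.task = primary.task

    def process_item(self, datum, keys=None):
        response = self.primary.process_item(datum, keys=keys)
        if not response.get('results'):
            # The primary processor produced nothing: back off.
            response = self.fallback.process_item(datum, keys=keys)
        return response


class StubProcessor:
    """Hypothetical processor that 'parses' only known words."""

    task = 'parse'

    def __init__(self, name, vocabulary):
        self.name = name
        self.vocabulary = set(vocabulary)

    def process_item(self, datum, keys=None):
        known = all(word in self.vocabulary for word in datum.split())
        results = [{'source': self.name}] if known else []
        return {'input': datum, 'results': results, 'keys': keys}


strict = StubProcessor('strict', ['dogs', 'bark'])
robust = StubProcessor('robust', ['dogs', 'bark', 'loudly'])
cpu = BackoffProcessor(strict, robust)
first = cpu.process_item('dogs bark')
second = cpu.process_item('dogs bark loudly')
```

As with the REPP wrapper, the backoff wrapper exposes task and process_item(), so code that expects a single processor does not need to know that two are involved.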