delphin.itsdb
See also
See Working with [incr tsdb()] Test Suites for a more user-friendly introduction
[incr tsdb()] Test Suites
Note
This module implements high-level structures and operations on top of TSDB test suites. For the basic, low-level functionality, see delphin.tsdb. For complex queries of the databases, see delphin.tsql.
[incr tsdb()] is a tool built on top of TSDB databases for the purpose of profiling and comparing grammar versions using test suites. This module is named after that tool because it also builds higher-level operations on top of TSDB test suites, but it has a much narrower scope. The aim of this module is to assist users with creating, processing, or manipulating test suites.
The typical test suite contains these files:
testsuite/
analysis fold item-set parse relations run tree
decision item output phenomenon result score update
edge item-phenomenon parameter preference rule set
Test Suite Classes
PyDelphin has three classes for working with [incr tsdb()] test suite databases:
-
class delphin.itsdb.TestSuite(path=None, schema=None, encoding='utf-8')
Bases: delphin.tsdb.Database
A [incr tsdb()] test suite database.
- Parameters
-
commit()
Commit the current changes to disk.
This method writes the current state of the test suite to disk. The effect is similar to using tsdb.write_database(), except that it also updates the test suite’s internal bookkeeping so that it is aware that the current transaction is complete. It may also be more efficient if the only changes are adding new rows to existing tables.
-
property in_transaction
Return True if there are uncommitted changes.
-
property path
The database directory’s path.
-
process(cpu, selector=None, source=None, fieldmapper=None, gzip=False, buffer_size=1000, callback=None)
Process each item in a [incr tsdb()] test suite.
The output rows are flushed to disk when the number of new rows in a table reaches buffer_size.
The callback parameter can be used, for example, to update a progress indicator.
- Parameters
selector – a pair of (table_name, column_name) that specifies the table and column used for processor input (e.g., ('item', 'i-input'))
source (Database) – test suite from which inputs are taken; if None, use the current test suite
fieldmapper (FieldMapper) – object for mapping response fields to [incr tsdb()] fields; if None, use a default mapper for the standard schema
gzip – if True, compress non-empty tables with gzip
buffer_size (int) – number of output rows to hold in memory before flushing to disk; ignored if the test suite is all in-memory; if None, do not flush to disk
callback – a function that is called with the response for each item processed; the return value is ignored
Examples
>>> ts.process(ace_parser)
>>> ts.process(ace_generator, 'result:mrs', source=ts2)
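The interaction of buffer_size and callback can be pictured with a small, library-free sketch. Everything in it (process_items, the toy processor, the lists standing in for memory and disk) is hypothetical and only mimics the behaviour described above; it is not PyDelphin's implementation, and unlike a real run it leaves the final partial buffer unflushed so that the distinction stays visible.

```python
def process_items(items, processor, buffer_size=1000, callback=None):
    """Toy stand-in for the buffering described above (not PyDelphin code)."""
    buffer = []    # output rows held in memory
    flushed = []   # stands in for rows already written to disk
    flushes = 0
    for item in items:
        response = processor(item)
        buffer.append(response)
        if callback is not None:
            callback(response)  # the return value is ignored
        # Flush once the number of buffered rows reaches buffer_size;
        # buffer_size=None disables flushing altogether.
        if buffer_size is not None and len(buffer) >= buffer_size:
            flushed.extend(buffer)
            buffer.clear()
            flushes += 1
    return flushed, buffer, flushes


seen = []  # filled by the callback, e.g. to drive a progress indicator
flushed, pending, flushes = process_items(
    ["It rained.", "It snowed.", "Dogs bark."],
    processor=str.upper,      # hypothetical processor: upper-case the input
    buffer_size=2,
    callback=seen.append,
)
```

With three inputs and buffer_size=2, the sketch flushes once after the second item and keeps the third in memory, while the callback sees every response.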
-
select_from(name, columns=None, cast=True)
Select the fields given by columns from each row in the table name.
If no field names are given, all fields are returned.
If cast is False, simple tuples of raw data are returned instead of Row objects.
- Yields
Row
Examples
>>> next(ts.select_from('item'))
Row(10, 'unknown', 'formal', 'none', 1, 'S', 'It rained.', ...)
>>> next(ts.select_from('item', ('i-id',)))
Row(10)
>>> next(ts.select_from('item', ('i-id', 'i-input')))
Row(10, 'It rained.')
>>> next(ts.select_from('item', ('i-id', 'i-input'), cast=False))
('10', 'It rained.')
-
class delphin.itsdb.Table(dir, name, fields, encoding='utf-8')
Bases: delphin.tsdb.Relation
A [incr tsdb()] table.
- Parameters
dir – path to the database directory
name – name of the table
fields – the table schema; an iterable of tsdb.Field objects
encoding – character encoding of the table file
-
dir
The path to the database directory.
-
name
The name of the table.
-
fields
The table’s schema.
-
encoding
The character encoding of table files.
-
append(row)
Add row to the end of the table.
- Parameters
row – a Row or other iterable containing column values
-
extend(rows)
Add each row in rows to the end of the table.
- Parameters
rows – an iterable of Row objects or other iterables containing column values
-
select(*names, cast=True)
Select fields given by names from each row in the table.
If no field names are given, all fields are returned.
If cast is False, simple tuples of raw data are returned instead of Row objects.
- Yields
Row
Examples
>>> next(table.select())
Row(10, 'unknown', 'formal', 'none', 1, 'S', 'It rained.', ...)
>>> next(table.select('i-id'))
Row(10)
>>> next(table.select('i-id', 'i-input'))
Row(10, 'It rained.')
>>> next(table.select('i-id', 'i-input', cast=False))
('10', 'It rained.')
-
class delphin.itsdb.Row(fields, data, field_index=None)
A row in a [incr tsdb()] table.
The third argument, field_index, is optional. Its purpose is to reduce memory usage because the same field index can be shared by all rows for a table, but using an incompatible index can yield unexpected results for value retrieval by field names (row[field_name]).
- Parameters
fields – column descriptions; an iterable of tsdb.Field objects
data – raw column values
field_index – mapping of field name to its index in fields; if not given, it will be computed from fields
-
fields
The fields of the row.
-
data
The raw column values.
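Why a shared field_index saves memory, and why an incompatible one is dangerous, can be seen in a minimal sketch. ToyRow below is a hypothetical miniature written for illustration only; it uses plain strings in place of tsdb.Field objects and is not how Row itself is implemented.

```python
fields = ["i-id", "i-input"]  # plain names standing in for tsdb.Field objects
shared_index = {name: i for i, name in enumerate(fields)}


class ToyRow:
    """Hypothetical miniature of a row that looks up values by field name."""
    __slots__ = ("data", "field_index")

    def __init__(self, data, field_index):
        self.data = tuple(data)
        self.field_index = field_index  # stored by reference, not copied

    def __getitem__(self, key):
        # Field names are translated to positions through the index.
        if isinstance(key, str):
            key = self.field_index[key]
        return self.data[key]


# Every row of a table can point at the same index object.
rows = [
    ToyRow((10, "It rained."), shared_index),
    ToyRow((20, "It snowed."), shared_index),
]

# An index built for a different column order raises no error;
# it silently returns the wrong column.
wrong_index = {"i-input": 0, "i-id": 1}
mismatched = ToyRow((10, "It rained."), wrong_index)
```

Here rows[0]["i-input"] retrieves the sentence as expected, whereas mismatched["i-id"] returns the sentence instead of the identifier, which is the kind of unexpected result the warning above refers to.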
Processing Test Suites
The TestSuite.process() method takes an optional FieldMapper object which manages the mapping of data in Response objects from a Processor to the tables and columns of a test suite. In most cases the user will not need to customize or instantiate these objects as the default works with standard [incr tsdb()] schemas, but FieldMapper can be subclassed in order to handle non-standard schemas, e.g., for machine translation workflows.
-
class delphin.itsdb.FieldMapper(source=None)
A class for mapping between response objects and test suites.
If source is given, it is the test suite providing the inputs used to create the responses, and it is used to provide some contextual information that may not be present in the response.
This class provides two methods for mapping responses to fields:
map() – takes a response and returns a list of (table, data) tuples for the data in the response, as well as aggregating any necessary information
cleanup() – returns any (table, data) tuples resulting from aggregated data over all runs, then clears this data
And one method for mapping test suites to responses:
collect() – maps from a test suite to response objects
In addition, the affected_tables attribute should list the names of tables that become invalidated by using this FieldMapper to process a profile. Generally this is the list of tables that map() and cleanup() create rows for, but it may also include those that rely on the previous set (e.g., treebanking preferences).
Alternative [incr tsdb()] schemas can be handled by overriding these three methods and the __init__() method. Note that overriding collect() is only necessary for mapping back from test suites to responses.
-
affected_tables
List of tables that are affected by the processing.
-
collect(ts)
Map from test suites to response objects.
The data in the test suite must be ordered.
Note
This method stores the ‘item’, ‘parse’, and ‘result’ tables in memory during operation, so it is not recommended when a test suite is very large as it may exhaust the system’s available memory.
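The map()/cleanup() contract described for FieldMapper can be sketched without the library. ToyMapper below is hypothetical: it does not subclass delphin.itsdb.FieldMapper, its responses are plain dictionaries, and the table and column names (including the made-up 'n-items' column) are chosen only for illustration. A real customization would subclass FieldMapper and follow the schema in use.

```python
class ToyMapper:
    """Hypothetical mapper following the map()/cleanup() contract."""

    # Tables that this mapper creates rows for.
    affected_tables = ["run", "parse", "result"]

    def __init__(self):
        self._parse_count = 0  # aggregated over all responses in a run

    def map(self, response):
        # One 'parse' row per response, one 'result' row per reading.
        self._parse_count += 1
        parse_id = self._parse_count
        results = response.get("results", [])
        rows = [("parse", {"parse-id": parse_id,
                           "i-id": response["i-id"],
                           "readings": len(results)})]
        for result_id, result in enumerate(results):
            rows.append(("result", {"parse-id": parse_id,
                                    "result-id": result_id,
                                    "mrs": result["mrs"]}))
        return rows

    def cleanup(self):
        # Emit the aggregated data, then clear it.
        rows = [("run", {"run-id": 0, "n-items": self._parse_count})]
        self._parse_count = 0
        return rows
```

The split mirrors the description above: map() returns per-response (table, data) tuples while accumulating state, and cleanup() turns that state into rows once and resets it.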
Utility Functions
-
delphin.itsdb.match_rows(rows1, rows2, key, sort_keys=True)
Yield triples of (value, left_rows, right_rows) where left_rows and right_rows are lists of rows that share the same column value for key. This means that both rows1 and rows2 must have a column with the same name key.
Warning
Both rows1 and rows2 will exist in memory for this operation, so it is not recommended for very large tables on low-memory systems.
- Parameters
- Yields
tuple – a triple containing the matched value for key, the list of any matching rows from rows1, and the list of any matching rows from rows2
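The grouping that match_rows() performs can be illustrated with a short, library-free sketch. The function match_rows_sketch and the dictionary rows are hypothetical stand-ins, not the library's code; in particular, the handling of values found on only one side (an empty list for the other side) and the plain sorted() ordering are assumptions of this sketch.

```python
def match_rows_sketch(rows1, rows2, key, sort_keys=True):
    """Library-free illustration of grouping two row sets by a shared key."""
    matched = {}  # value -> (left_rows, right_rows), in first-seen order
    for side, rows in enumerate((rows1, rows2)):
        for row in rows:
            matched.setdefault(row[key], ([], []))[side].append(row)
    values = sorted(matched) if sort_keys else list(matched)
    for value in values:
        left_rows, right_rows = matched[value]
        yield value, left_rows, right_rows


# Two hypothetical result sets that share the 'i-id' column.
gold = [{"i-id": 20, "readings": 1}, {"i-id": 10, "readings": 2}]
test = [{"i-id": 10, "readings": 3}, {"i-id": 30, "readings": 0}]
triples = list(match_rows_sketch(gold, test, "i-id"))
```

Both inputs are consumed in full before anything is yielded, which is why the memory warning above applies to this kind of operation.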
Exceptions
-
exception delphin.itsdb.ITSDBError(*args, **kwargs)
Bases: delphin.tsdb.TSDBError
Raised when there is an error processing a [incr tsdb()] profile.