Available Extractors
These pages detail all of the extractors currently available in Scythe.
Quick Summary
The extractors that are configured to work with the stevedore plugin are:
ase – Parse information from atomistic simulation input files using ASE.
crystal – Extract information about a crystal structure from many types of files.
csv – Describe the contents of a comma-separated value (CSV) file
dft – Extract metadata from Density Functional Theory calculation results
em – Extract metadata specific to electron microscopy.
filename – Extracts metadata in a filename, according to user-supplied patterns.
generic – Gather basic file information
image – Retrieves basic information about an image
json – Extracts fields in JSON into a user-defined new schema.
noop – Determine whether files exist, used for debugging
tdb – Extract metadata from a Thermodynamic Database (TDB) file.
xml – Extracts fields in XML into a user-defined new schema in JSON.
yaml – Extracts fields in YAML into a user-defined new schema in JSON.
Detailed Listing
Generic File Extractors
Extractors that work for any kind of file
Image Extractors
Extractors that read image data
Electron Microscopy Extractors
Extractors that read electron microscopy data of various sorts (images, spectra, spectrum images, etc.) using the HyperSpy package.
- class scythe.electron_microscopy.ElectronMicroscopyExtractor
Extract metadata specific to electron microscopy.
This parser handles any file supported by HyperSpy’s I/O capabilities. It extracts both the metadata interpreted directly by HyperSpy and any important values we can pick out manually.
For each value (if it is known), return a subdict with two keys: value, containing the actual value of the metadata parameter, and unit, a string containing a unit name from the QUDT vocabulary. Including a unit is optional, but highly recommended, if it is known.
The allowed metadata values are controlled by the JSONSchema specification in the schemas/electron_microscopy.json file.
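As a rough illustration of the value/unit convention described for this extractor, the Python sketch below builds such subdicts by hand. The helper name make_entry, the field names (acceleration_voltage, magnification), and the unit string are hypothetical and are not taken from schemas/electron_microscopy.json; that schema remains the authority on which keys are actually allowed.

```python
from typing import Any, Dict, Optional


def make_entry(value: Any, unit: Optional[str] = None) -> Dict[str, Any]:
    """Build a subdict with a 'value' key and, if known, a 'unit' key."""
    entry: Dict[str, Any] = {"value": value}
    if unit is not None:
        # The unit is optional, but recommended whenever it is known
        entry["unit"] = unit
    return entry


# Hypothetical field names, chosen only for illustration
metadata = {
    "acceleration_voltage": make_entry(200.0, "KiloV"),  # unit name is an assumption
    "magnification": make_entry(50000),  # no unit supplied for this value
}
```

Leaving the unit key out entirely, rather than setting it to None, keeps each subdict limited to the information that is actually known.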
Atomistic Data Extractors
Extractors related to data files that encode atom-level structure
- class scythe.crystal_structure.CrystalStructureExtractor
Extract information about a crystal structure from many types of files.
Uses either ASE or Pymatgen on the back end
- class scythe.ase.ASEExtractor
Parse information from atomistic simulation input files using ASE.
ASE can read many file types. These can be found at https://wiki.fysik.dtu.dk/ase/ase/io/io.html
Metadata are generated as ASE JSON DB format: https://wiki.fysik.dtu.dk/ase/ase/db/db.html
Calculation Extractors
Extractors that retrieve results from calculations
- class scythe.dft.DFTExtractor(quality_report=False)
Extract metadata from Density Functional Theory calculation results
Uses the dfttopif parser to extract metadata from each file
Initialize the extractor
- Parameters:
quality_report (bool) – Whether to generate a quality report
Structured Data Files
Extractors that read data from structured files
- class scythe.csv.CSVExtractor(return_records=True, **kwargs)
Describe the contents of a comma-separated value (CSV) file
- The context dictionary for the CSV parser includes several fields:
  - schema: Dictionary defining the schema for this dataset, following that of FrictionlessIO
  - na_values: Any values that should be interpreted as missing
- Parameters:
return_records (bool) – Whether to return each row in the CSV file
- Keyword:
All kwargs as passed to TableSchema’s infer method
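A context dictionary with the two fields described for the CSV parser might look like the sketch below. The column names and types are invented for illustration, and the schema layout assumes the Frictionless Table Schema convention of a fields list whose entries carry a name and a type. How the context is handed to the extractor is not shown, because this page does not document that call.

```python
# Hypothetical context for a CSV file with two columns
context = {
    # Table Schema-style description of the columns (column names are made up)
    "schema": {
        "fields": [
            {"name": "sample_id", "type": "string"},
            {"name": "temperature", "type": "number"},
        ]
    },
    # Strings that should be interpreted as missing values
    "na_values": ["", "N/A", "-999"],
}

# Collect the declared column names as a quick sanity check
column_names = [field["name"] for field in context["schema"]["fields"]]
```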