Functions for Data Import
All functions described below are part of the data_import module of the package.
They import delimited data files into NumPy or Matlab database format, and
merge or split such files and databases.
-
import_del(in_file, force=False, deli='\t', dec_mark='.', out_ext='npz', out_dir='', pad=0, colheadlines=1)
Import a delimited data file into NumPy or Matlab database format. The file
must have at least two data columns that are separated by deli.
Parameters: |
- in_file (str) – The path to the delimited file that is to be imported.
- force (bool, optional) – If True, existing output files will be overwritten during
import. Default is False.
- deli (str, optional) – The delimiter used to separate data columns in the delimited file.
Default is tab.
- dec_mark (str, optional) – The decimal mark of the data file. Default is dot.
- out_ext (str, optional) – The file extension (format) of the output file. Default is npz
for NumPy database format. Alternative is mat for Matlab database format.
- out_dir (str, optional) – The absolute or relative path to the output directory. Default is the
current working directory.
- pad (int, optional) – The number of leading data columns to skip. For pad = n, the first
n data columns will not be imported. Must be non-negative. Default is 0.
- colheadlines (int, optional) – The number of lines spanned by the column headers. If several lines are
spanned, the lines will be merged to generate the column keys in the
output dictionary. Default is 1.
|
Returns: |
- out_file (str) – The path of the output file that was generated during import.
- import_status (bool or None) – The import status of in_file. If True, the file was
successfully imported. If False, file import was attempted and
failed. If None, file import was not attempted (most likely
because an output file with the same name already exists).
- out_dict (dict) – The data that was imported from in_file.
|
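To make the roles of deli, dec_mark, pad and colheadlines concrete, the following is a minimal sketch written with the Python standard library only. It is not the package's implementation: the helper name parse_delimited is hypothetical, and joining multi-line column headers with an underscore is an assumption, since the text above only states that the header lines are merged.

```python
import csv
import io


def parse_delimited(text, deli="\t", dec_mark=".", pad=0, colheadlines=1):
    """Parse delimited text into a dict that maps column keys to float lists.

    Hypothetical helper for illustration; assumes colheadlines >= 1.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=deli))
    header_rows, data_rows = rows[:colheadlines], rows[colheadlines:]

    # Merge the header cells of each column across all header lines.
    # Joining with "_" is an assumption made for this sketch.
    n_cols = max(len(row) for row in header_rows)
    keys = []
    for col in range(n_cols):
        parts = [row[col].strip() for row in header_rows if col < len(row)]
        keys.append("_".join(part for part in parts if part))

    # Skip the first `pad` columns, then normalise the decimal mark and parse.
    columns = {key: [] for key in keys[pad:]}
    for row in data_rows:
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        for key, cell in zip(keys[pad:], row[pad:]):
            columns[key].append(float(cell.replace(dec_mark, ".")))
    return columns
```

For a semicolon-delimited file with decimal commas and a two-line header (names above units), a call such as parse_delimited(text, deli=";", dec_mark=",", colheadlines=2) yields one key per column, and pad=1 drops the first column from the result.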
-
import_dir(in_dir, in_ext='txt', recursive=False, force=False, deli='\t', dec_mark='.', out_ext='npz', out_dir='', print_stat=False, pcs=False, colheadlines=1)
Import all delimited data files in a directory into NumPy or Matlab
database format. Optionally, all data files in a directory and all its
child directories can be imported. The function can be applied to regular
delimited files as well as files generated by test rigs made by PCS
Instruments. All files must have at least two data columns that are
separated by deli.
Parameters: |
- in_dir (str) – Path to the directory for which to import all files with extension
in_ext. If recursive=True, imports are performed for all
files with extension in_ext in the directory tree rooted at in_dir.
- in_ext (str, optional) – File extension of files to import (without dot). Default is txt.
- recursive (bool, optional) – If True, all files in in_dir and all its child
directories are imported. Default is False.
- force (bool, optional) – If True, existing output files will be overwritten during
import. Default is False.
- deli (str, optional) – The delimiter used to separate data columns in the delimited file.
Default is tab.
- dec_mark (str, optional) – The decimal mark of the data file. Default is dot.
- out_ext (str, optional) – The file extension (format) of the output file. Default is npz
for NumPy database format. Alternative is mat for Matlab database format.
- out_dir (str, optional) – The path to the output directory where output databases are stored after
import. If out_dir is not specified, files are stored in in_dir if
recursive=False, and in the respective child directories of in_dir
if recursive=True.
- print_stat (bool, optional) – If True, the current import status is printed to the console.
Default is False.
- pcs (bool, optional) – If True, the delimited files are treated like files that were
generated using an MTM or EHD2 test rig manufactured by PCS Instruments.
Default is False.
- colheadlines (int, optional) – The number of lines spanned by the column headers. If several lines are
spanned, the lines will be merged to generate the column keys in the
output dictionary. Default is 1.
|
Returns: |
- in_files (list of str) – The paths of all files for which import was attempted.
- out_files (list of str) – The paths of all output files that were generated during the
import process.
- import_status (list of bool or None) – The import status of each file in in_files. If True,
the file was successfully imported. If False, file import was
attempted and failed. If None, file import was not attempted
(most likely because an output file with the same name already exists).
|
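The file selection and default output location described above can be sketched with pathlib. This is an illustration under stated assumptions, not the package's code: the helper names find_input_files and resolve_out_dir are hypothetical, and sorting the matches is a choice made here only to get a reproducible order.

```python
from pathlib import Path


def find_input_files(in_dir, in_ext="txt", recursive=False):
    """Return sorted paths of files below in_dir that have extension in_ext."""
    root = Path(in_dir)
    pattern = f"*.{in_ext}"
    # rglob descends into all child directories, glob stays in in_dir.
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(str(path) for path in matches if path.is_file())


def resolve_out_dir(in_file, out_dir=""):
    """Use out_dir if given, otherwise the directory that contains in_file."""
    return out_dir if out_dir else str(Path(in_file).parent)
```

With recursive=True and no out_dir, resolve_out_dir places each output next to its input file, which mirrors the behaviour described for out_dir above.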
-
import_pcs(in_file, force=False, out_ext='npz', out_dir='')
Import a delimited data file that was produced by an MTM, ETM or EHD2 test
rig manufactured by PCS Instruments. The function calls import_del
to perform a basic import of the delimited text file, and generates
additional output variables that simplify data analysis.
Parameters: |
- in_file (str) – The path to the delimited file that is to be imported.
- force (bool, optional) – If True, existing output files will be overwritten during
import. Default is False.
- out_ext (str, optional) – The file extension (format) of the output file. Default is npz
for NumPy database format. Alternative is mat for Matlab database format.
- out_dir (str, optional) – The absolute or relative path to the output directory. Default is the
current working directory.
|
Returns: |
- out_file (str) – The path of the output file that was generated during import.
- import_status (bool or None) – The import status of in_file. If True, the file was
successfully imported. If False, file import was attempted and
failed. If None, file import was not attempted (most likely
because an output file with the same name already exists).
- out_dict (dict) – The data that was imported from in_file.
|
-
merge_del(in_files, out_file=None)
Merge several delimited data files into a single file. The merged
file contains all data from the data files, in the order given in the
in_files argument. No checks are performed to ensure that the data files
have a compatible format, for example the same number of data columns.
Parameters: |
- in_files (list) – File paths to the files to be merged. Files will be merged in order.
- out_file (str, optional) – Path to output file, including file extension. If no path is provided,
a file name is generated based on the input file names.
|
Returns: | out_file_abs – Absolute path to the merged file.
|
Return type: | str
|
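An order-preserving merge without format checks can be sketched as a plain concatenation. The helper name concatenate_files is hypothetical, and inserting a newline between files that do not end with one is an assumption of this sketch. The automatic generation of an output file name when out_file is omitted is not reproduced here.

```python
from pathlib import Path


def concatenate_files(in_files, out_file):
    """Write the contents of in_files to out_file in the order given."""
    out_path = Path(out_file).resolve()
    with open(out_path, "w", encoding="utf-8") as merged:
        for path in in_files:
            text = Path(path).read_text(encoding="utf-8")
            merged.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                merged.write("\n")
    # Return the absolute path, as described for out_file_abs above.
    return str(out_path)
```

Because nothing is validated, merging files with different column counts succeeds silently, and any mismatch only surfaces when the merged file is imported.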
-
merge_npz(in_files, accum=None, safe=True)
Merge npz databases by concatenating all databases in in_files.
Databases are concatenated in the order given in in_files.
Database keys for which values are to be accumulated can be given as a list
using the accum argument. For example, if all databases have the
key time, then accum=['time'] will produce a continuous
time axis, adding the last time value of the first database to all time
values of the second database (and so on).
Parameters: |
- in_files (list) – Paths to database files to merge. Files are merged in order.
- accum (list, optional) – Database keys for which values should be accumulated. Values must be
numeric. Default is None (no accumulation).
- safe (bool, optional) – If True, checks are performed to ensure that all databases share the
exact same set of keys and that all keys in accum are in all
databases. A KeyError is raised if not. Default is True.
|
Returns: | merged – Merged data.
|
Return type: | dict
|
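The accumulation rule is easiest to see on small in-memory data. The sketch below works on plain dictionaries of lists instead of npz files, so it only illustrates the described logic. The helper name merge_dicts is hypothetical, and it assumes that "and so on" means each further database is shifted by the last value merged so far.

```python
def merge_dicts(databases, accum=None, safe=True):
    """Concatenate dicts of numeric lists, accumulating values for accum keys."""
    accum = list(accum or [])
    if safe:
        keys = set(databases[0])
        for db in databases:
            if set(db) != keys:
                raise KeyError("databases do not share the same set of keys")
        missing = [key for key in accum if key not in keys]
        if missing:
            raise KeyError(f"accum keys not found in databases: {missing}")

    merged = {}
    for db in databases:
        for key, values in db.items():
            target = merged.setdefault(key, [])
            # For accumulated keys, shift by the last value merged so far.
            offset = target[-1] if key in accum and target else 0
            target.extend(value + offset for value in values)
    return merged
```

For two databases with time values [0, 1, 2] and [0, 1], merging with accum=['time'] gives the continuous axis [0, 1, 2, 2, 3], while keys that are not listed in accum are simply concatenated.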
-
split_del(file, deli='\t', ext='txt', cmin=3, hspan=1, outdir=None, force=False)
Split a delimited data file into several separate data files, if the file
contains more than one block of data. Blocks of data are typically
separated by at least one line of column headers. The first data column
of each data block has to be numeric.
This function is meant to be used on data files where different blocks of
data have different numbers of columns or different column headers. After
splitting the data file into individual data files, import methods like
import_del can be used on the individual files. If all data should
be merged into a single database afterwards, the merge_npz function
can be used.
Parameters: |
- file (str) – Path to the data file.
- deli (str, optional) – Delimiter used to separate data columns in file. Default is tab.
- ext (str, optional) – File extension of output files. Default is txt.
- cmin (int, optional) – Minimum number of columns that a line of data needs to have in order to
be classified as data.
- hspan (int, optional) – Maximum number of non-data lines above each data block that should be
written to individual data files (usually equal to number of lines
spanned by the column headers).
- outdir (str, optional) – Path to output directory. Default is current working directory.
- force (bool, optional) – If True, existing output files will be overwritten. If False, an
exception is raised when an output file already exists. Default is False.
|
Returns: | outfiles – Paths to output files.
|
Return type: | list
|
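The roles of cmin and hspan can be illustrated with a small line classifier. This sketch is not the package's implementation: the helper names is_data_line and split_blocks are hypothetical, and it assumes that a line counts as data when it has at least cmin columns and a numeric first column, as the description above suggests. It groups lines in memory and does not write any files.

```python
def is_data_line(line, deli="\t", cmin=3):
    """Return True if line has at least cmin columns and a numeric first column."""
    cells = line.rstrip("\n").split(deli)
    if len(cells) < cmin:
        return False
    try:
        float(cells[0])
    except ValueError:
        return False
    return True


def split_blocks(lines, deli="\t", cmin=3, hspan=1):
    """Group lines into blocks of up to hspan header lines plus their data lines."""
    blocks, pending, current, in_data = [], [], None, False
    for line in lines:
        if is_data_line(line, deli, cmin):
            if not in_data:
                # A new block starts: keep the last hspan non-data lines above it.
                current = pending[-hspan:] if hspan > 0 else []
                blocks.append(current)
                pending = []
            current.append(line)
            in_data = True
        else:
            pending.append(line)
            in_data = False
    return blocks
```

Each returned block could then be written to its own file and passed to an import function. Non-data lines further than hspan lines above a block, such as report titles, are dropped.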