Functions for Data Import

All functions below are defined in the data_import module of the package.

They can be used to import delimited data files into NumPy or MATLAB database format.

import_del(in_file, force=False, deli='\t', dec_mark='.', out_ext='npz', out_dir='', pad=0, colheadlines=1)

Import a delimited data file into NumPy or MATLAB database format. The file must have at least two data columns, separated by deli.

Parameters:
  • in_file (str) – Path to the delimited file that is to be imported.
  • force (bool, optional) – If True, existing output files will be overwritten during import. Default is False.
  • deli (str, optional) – The delimiter used to separate data columns in the delimited file. Default is tab.
  • dec_mark (str, optional) – The decimal mark of the data file. Default is dot.
  • out_ext (str, optional) – The file extension (format) of the output file. Default is npz (NumPy database format). The alternative is mat (MATLAB database format).
  • out_dir (str, optional) – The absolute or relative path to the output directory. Default is the current working directory.
  • pad (int, optional) – The number of data columns to skip. For pad = n, the first n data columns will not be imported. Must be non-negative. Default is 0.
  • colheadlines (int, optional) – The number of lines spanned by the column headers. If several lines are spanned, the lines will be merged to generate the column keys in the output dictionary. Default is 1.
Returns:

  • out_file (str) – Path to the output file that was generated during import.
  • import_status (bool or None) – The import status of in_file. If True, the file was successfully imported. If False, file import was attempted and failed. If None, file import was not attempted (most likely because an output file with the same name already exists).
  • out_dict (dict) – The data that was imported from in_file.
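
The interplay of deli, dec_mark, pad and colheadlines can be illustrated with a minimal standard-library sketch. It is not the package's implementation: the helper name parse_delimited is hypothetical, and joining multi-line headers with an underscore is an assumption made only for illustration.

```python
def parse_delimited(text, deli="\t", dec_mark=".", pad=0, colheadlines=1):
    """Parse delimited text into a dict that maps column keys to lists of floats.

    Hypothetical helper for illustration; it mimics the documented parameters only.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Drop the first `pad` columns of every header line.
    header_rows = [ln.split(deli)[pad:] for ln in lines[:colheadlines]]
    # Merge the header lines column by column into one key per column
    # (joining with "_" is an assumption, not the package's documented behaviour).
    keys = ["_".join(part.strip() for part in col if part.strip())
            for col in zip(*header_rows)]
    data = {key: [] for key in keys}
    for ln in lines[colheadlines:]:
        cells = ln.split(deli)[pad:]
        for key, cell in zip(keys, cells):
            # Normalise the decimal mark before converting to float.
            data[key].append(float(cell.strip().replace(dec_mark, ".")))
    return data


# Two header lines, decimal comma, tab-delimited.
sample = "Time\tLoad\n(s)\t(N)\n0,0\t1,5\n0,5\t2,5\n"
result = parse_delimited(sample, dec_mark=",", colheadlines=2)
print(result)
```

With colheadlines=2, both header lines contribute to each key, so the first column ends up under a single merged key rather than under "Time" alone.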

import_dir(in_dir, in_ext='txt', recursive=False, force=False, deli='\t', dec_mark='.', out_ext='npz', out_dir='', print_stat=False, pcs=False, colheadlines=1)

Import all delimited data files in a directory into NumPy or MATLAB database format. Optionally, the files in all child directories can be imported as well. The function can be applied to regular delimited files as well as to files generated by test rigs made by PCS Instruments. All files must have at least two data columns, separated by deli.

Parameters:
  • in_dir (str) – Path to the directory from which all files with extension in_ext are imported. If recursive=True, all files with extension in_ext in the directory tree rooted at in_dir are imported.
  • in_ext (str, optional) – File extension of files to import (without dot). Default is txt.
  • recursive (bool, optional) – If True, all files in in_dir and all its child directories are imported. Default is False.
  • force (bool, optional) – If True, existing output files will be overwritten during import. Default is False.
  • deli (str, optional) – The delimiter used to separate data columns in the delimited file. Default is tab.
  • dec_mark (str, optional) – The decimal mark of the data file. Default is dot.
  • out_ext (str, optional) – The file extension (format) of the output file. Default is npz (NumPy database format). The alternative is mat (MATLAB database format).
  • out_dir (str, optional) – The path to the output directory where output databases are stored after import. By default, files are stored in in_dir if recursive=False. If recursive=True, files are stored in the respective child directories of in_dir if out_dir is not specified.
  • print_stat (bool, optional) – If True, the current import status is printed to the console. Default is False.
  • pcs (bool, optional) – If True, the delimited files are treated as files that were generated using an MTM or EHD2 test rig manufactured by PCS Instruments. Default is False.
  • colheadlines (int, optional) – The number of lines spanned by the column headers. If several lines are spanned, the lines will be merged to generate the column keys in the output dictionary. Default is 1.
Returns:

  • in_files (list of str) – The paths of all files for which import was attempted.
  • out_files (list of str) – The paths of all output files that were generated during the import process.
  • import_status (list of bool or None) – The import status of each file in in_files. If True, the file was successfully imported. If False, file import was attempted and failed. If None, file import was not attempted (most likely because an output file with the same name already exists).
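
The file selection and the three-valued import status described above can be sketched with the standard library alone. The helpers find_input_files and summarise_status are hypothetical names, not part of the package, and the selection logic only mirrors the documented meaning of in_ext and recursive.

```python
import tempfile
from pathlib import Path


def find_input_files(in_dir, in_ext="txt", recursive=False):
    """Collect files with extension in_ext, optionally from the whole directory tree."""
    pattern = f"*.{in_ext}"
    root = Path(in_dir)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(str(p) for p in matches if p.is_file())


def summarise_status(import_status):
    """Count imported (True), failed (False) and skipped (None) entries."""
    return {
        "imported": sum(1 for s in import_status if s is True),
        "failed": sum(1 for s in import_status if s is False),
        "skipped": sum(1 for s in import_status if s is None),
    }


# Build a small directory tree to show the effect of recursive=True.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("x\ty\n1\t2\n", encoding="utf-8")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("x\ty\n3\t4\n", encoding="utf-8")
    flat = find_input_files(tmp)
    deep = find_input_files(tmp, recursive=True)

# A status list of the kind described for import_status (values made up here).
counts = summarise_status([True, None, False, True])
print(len(flat), len(deep), counts)
```

Testing with `is True`, `is False` and `is None` keeps the three states apart; a plain truthiness check would lump failed and skipped files together.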

import_pcs(in_file, force=False, out_ext='npz', out_dir='')

Import a delimited data file that was produced by an MTM, ETM or EHD2 test rig manufactured by PCS Instruments. The function calls import_del to perform a basic import of the delimited text file and generates additional output variables that simplify data analysis.

Parameters:
  • in_file (str) – Path to the delimited file that is to be imported.
  • force (bool, optional) – If True, existing output files will be overwritten during import. Default is False.
  • out_ext (str, optional) – The file extension (format) of the output file. Default is npz (NumPy database format). The alternative is mat (MATLAB database format).
  • out_dir (str, optional) – The absolute or relative path to the output directory. Default is the current working directory.
Returns:

  • out_file (str) – Path to the output file that was generated during import.
  • import_status (bool or None) – The import status of in_file. If True, the file was successfully imported. If False, file import was attempted and failed. If None, file import was not attempted (most likely because an output file with the same name already exists).
  • out_dict (dict) – The data that was imported from in_file.

merge_del(in_files, out_file=None)

Merge several delimited data files into a single file. The merged file contains all data from the data files, in the order given in the in_files argument.

No checks are performed to ensure that the data files have a compatible format, for example the same number of data columns.

Parameters:
  • in_files (list) – File paths to the files to be merged. Files will be merged in order.
  • out_file (str, optional) – Path to output file, including file extension. If no path is provided, a file name is generated based on the input file names.
Returns:

out_file_abs – Absolute path to the merged file.

Return type:

str
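
Because merge_del performs no compatibility checks, it may be worth verifying the column layout before merging. The following is a minimal sketch of such a pre-check; the helpers column_counts and compatible are hypothetical and not part of the package.

```python
import tempfile
from pathlib import Path


def column_counts(path, deli="\t"):
    """Return the set of column counts found on the non-empty lines of a file."""
    with open(path, encoding="utf-8") as handle:
        return {len(line.rstrip("\n").split(deli)) for line in handle if line.strip()}


def compatible(paths, deli="\t"):
    """True if every non-empty line in every file has the same number of columns."""
    counts = set()
    for path in paths:
        counts |= column_counts(path, deli)
    return len(counts) == 1


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.txt"
    second = Path(tmp) / "second.txt"
    wider = Path(tmp) / "wider.txt"
    first.write_text("t\tv\n0\t1\n", encoding="utf-8")
    second.write_text("t\tv\n1\t2\n", encoding="utf-8")
    wider.write_text("t\tv\tw\n1\t2\t3\n", encoding="utf-8")
    same_layout = compatible([first, second])   # both files have two columns
    mixed_layout = compatible([first, wider])   # two versus three columns

print(same_layout, mixed_layout)
```

A check like this only compares column counts; it does not confirm that the column headers match.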

merge_npz(in_files, accum=None, safe=True)

Merge npz databases by concatenating all databases in in_files. Databases are concatenated in the order given in in_files.

Database keys for which values are to be accumulated can be given as a list using the accum argument. For example, if all databases have the key time, then accum=['time'] will produce a continuous time axis, adding the last time value of the first database to all time values of the second database (and so on).

Parameters:
  • in_files (list) – Paths to database files to merge. Files are merged in order.
  • accum (list, optional) – Database keys for which values should be accumulated. Values must be numeric. Default is None (no accumulation).
  • safe (bool, optional) – If True, checks are performed to ensure that all databases share the exact same set of keys and that all keys in accum are present in all databases. A KeyError is raised if either check fails. Default is True.
Returns:

merged – Merged data.

Return type:

dict
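
The accumulation described for accum can be sketched with plain dictionaries of lists standing in for npz databases. This is an illustration under stated assumptions, not the package's code: the name merge_dicts is hypothetical, and for a third or later database the offset is assumed to be the last value of the already merged series.

```python
def merge_dicts(databases, accum=None, safe=True):
    """Concatenate dicts of lists in order, accumulating the keys listed in accum."""
    accum = accum or []
    if safe and databases:
        expected = set(databases[0])
        for db in databases:
            if set(db) != expected:
                raise KeyError("databases do not share the same set of keys")
        missing = [key for key in accum if key not in expected]
        if missing:
            raise KeyError(f"accum keys missing from databases: {missing}")
    merged = {}
    offsets = {key: 0 for key in accum}
    for db in databases:
        for key, values in db.items():
            if key in offsets:
                # Shift accumulated keys by the last value merged so far.
                values = [v + offsets[key] for v in values]
            merged.setdefault(key, []).extend(values)
        for key in accum:
            if merged.get(key):
                offsets[key] = merged[key][-1]
    return merged


first = {"time": [0, 1, 2], "load": [5, 5, 5]}
second = {"time": [0, 1, 2], "load": [7, 7, 7]}
merged = merge_dicts([first, second], accum=["time"])
print(merged["time"])  # [0, 1, 2, 2, 3, 4]
print(merged["load"])  # [5, 5, 5, 7, 7, 7]
```

The time values of the second database are shifted by 2, the last time value of the first database, while load is concatenated unchanged.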

split_del(file, deli='\t', ext='txt', cmin=3, hspan=1, outdir=None, force=False)

Split a delimited data file into several separate data files, if the file contains more than one block of data. Blocks of data are typically separated by at least one line of column headers. The first data column of each data block has to be numeric.

This function is meant to be used on data files where different blocks of data have different numbers of columns or different column headers. After splitting the data file into individual data files, import methods like import_del can be used on the individual files. If all data should be merged into a single database afterwards, the merge_npz function can be used.

Parameters:
  • file (str) – Path to the data file.
  • deli (str, optional) – Delimiter used to separate data columns in file. Default is tab.
  • ext (str, optional) – File extension of output files. Default is txt.
  • cmin (int, optional) – Minimum number of columns that a line needs to have in order to be classified as data. Default is 3.
  • hspan (int, optional) – Maximum number of non-data lines above each data block that should be written to individual data files (usually equal to the number of lines spanned by the column headers). Default is 1.
  • outdir (str, optional) – Path to output directory. Default is current working directory.
  • force (bool, optional) – If True, existing output files will be overwritten. If False and an output file already exists, an exception is raised. Default is False.
Returns:

outfiles – Paths to output files.

Return type:

list
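
The roles of cmin and hspan can be illustrated with a small in-memory sketch of block detection. It is an assumption-based illustration rather than the package's algorithm: the helper names is_data_line and split_blocks are hypothetical, and a line counts as data here when it has at least cmin columns and a numeric first column.

```python
def is_data_line(line, deli="\t", cmin=3):
    """A line is data if it has at least cmin columns and a numeric first column."""
    cells = line.split(deli)
    if len(cells) < cmin:
        return False
    try:
        float(cells[0])
    except ValueError:
        return False
    return True


def split_blocks(text, deli="\t", cmin=3, hspan=1):
    """Split text into blocks, each with up to hspan header lines plus its data lines."""
    blocks = []
    pending = []    # non-data lines seen since the last data block
    current = None  # lines of the block currently being collected
    for line in text.splitlines():
        if is_data_line(line, deli, cmin):
            if current is None:
                # Start a new block with at most hspan non-data lines above it.
                current = pending[-hspan:] if hspan > 0 else []
                pending = []
            current.append(line)
        else:
            if current is not None:
                blocks.append(current)
                current = None
            if line.strip():
                pending.append(line)
    if current is not None:
        blocks.append(current)
    return blocks


text = "Step 1\nt\ta\tb\n0\t1\t2\n1\t3\t4\nStep 2\nt\ta\tb\tc\n0\t1\t2\t3\n"
blocks = split_blocks(text)
print(len(blocks))  # 2
```

With hspan=1, only the column header line directly above each block is kept, so the "Step 1" and "Step 2" lines are dropped. Each resulting block could then be written to its own file and imported separately.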