pnpl.datasets.libribrain2025.phoneme_dataset.LibriBrainPhoneme
- class pnpl.datasets.libribrain2025.phoneme_dataset.LibriBrainPhoneme(data_path, partition=None, label_type='phoneme', preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=0.0, tmax=0.5, include_run_keys=[], exclude_run_keys=[], exclude_tasks=[], standardize=True, clipping_boundary=10, channel_means=None, channel_stds=None, include_info=False, preload_files=False)
- Parameters:
data_path (str)
partition (str | None)
label_type (str)
preprocessing_str (str | None)
tmin (float)
tmax (float)
include_run_keys (list[str])
exclude_run_keys (list[str])
exclude_tasks (list[str])
standardize (bool)
clipping_boundary (float | None)
channel_means (ndarray | None)
channel_stds (ndarray | None)
include_info (bool)
preload_files (bool)
- __init__(data_path, partition=None, label_type='phoneme', preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=0.0, tmax=0.5, include_run_keys=[], exclude_run_keys=[], exclude_tasks=[], standardize=True, clipping_boundary=10, channel_means=None, channel_stds=None, include_info=False, preload_files=False)
data_path: Path to the serialized dataset.
label_type: "phoneme" or "voicing". Voicing labels are derived from phoneme labels and indicate whether a phoneme is voiced or unvoiced. See https://en.wikipedia.org/wiki/Voice_(phonetics) for more information.
preprocessing_str: Preprocessing string in the file name. Indicates the preprocessing steps applied to the data.
tmin: Start time of the sample in seconds, relative to the onset of the phoneme.
tmax: End time of the sample in seconds, relative to the onset of the phoneme.
standardize: Whether to standardize the data. Uses channel_means and channel_stds if provided; otherwise the mean and standard deviation are calculated for each channel of the dataset.
clipping_boundary: Min and max values to clip the data by.
channel_means: Standardize using these channel means.
channel_stds: Standardize using these channel stds.
include_info: Whether to include the info dict in the output. The info dict contains the dataset name, subject, session, task, run, onset time of the sample, and the full phoneme label, which indicates whether a phoneme is at the onset or offset of a word.
preload_files: If true, start parallel downloads of all sessions and runs into data_path. Otherwise, files are downloaded as they are needed.
Returns: data of shape channels x time.
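The interaction between tmin/tmax, standardize, channel_means, channel_stds and clipping_boundary can be illustrated with a small NumPy sketch. This is not the library's code: the function names, the (samples, channels, time) array layout, the order of operations (standardize first, then clip symmetrically to ±clipping_boundary) and the 250 Hz sampling rate are all assumptions made for illustration.

```python
import numpy as np


def window_length_in_samples(tmin, tmax, sfreq):
    """Number of time points in a [tmin, tmax) window at sampling rate sfreq (Hz).

    Hypothetical helper; sfreq is an assumption, not a documented parameter.
    """
    return int(round((tmax - tmin) * sfreq))


def standardize_and_clip(data, channel_means=None, channel_stds=None,
                         clipping_boundary=10.0):
    """Per-channel z-scoring followed by symmetric clipping (illustrative only).

    data is assumed to have shape (n_samples, n_channels, n_times).
    """
    data = np.asarray(data, dtype=np.float64)
    # Fall back to statistics computed over the whole array, one value per channel.
    if channel_means is None:
        channel_means = data.mean(axis=(0, 2))
    if channel_stds is None:
        channel_stds = data.std(axis=(0, 2))
    means = np.asarray(channel_means, dtype=np.float64)[None, :, None]
    stds = np.asarray(channel_stds, dtype=np.float64)[None, :, None]
    z = (data - means) / stds
    # Assumed semantics: clip to [-clipping_boundary, +clipping_boundary].
    if clipping_boundary is not None:
        z = np.clip(z, -clipping_boundary, clipping_boundary)
    return z
```

Under these assumptions, the default window (tmin=0.0, tmax=0.5) at 250 Hz would span 125 time points, and passing channel_means and channel_stds computed on a training partition would let other partitions be standardized with the same statistics.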
Methods
__init__(data_path[, partition, label_type, ...])
    data_path: path to serialized dataset.
ensure_file_download(fpath, data_path[, repo_id])
    Class method to download a file without requiring dataset instantiation.
load_phonemes_from_tsv(subject, session, ...)
prefetch_files(file_paths)
    Prefetch multiple files in parallel.
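This page does not show how prefetch_files is implemented. As a generic illustration of fetching several files in parallel, a thread pool from the standard library could be used as sketched below. The function prefetch_in_parallel and its fetch_one callback are hypothetical and are not part of pnpl.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch_in_parallel(file_paths, fetch_one, max_workers=4):
    """Call fetch_one(path) for every path using a pool of worker threads.

    fetch_one is a caller-supplied (hypothetical) function that downloads or
    loads a single file and returns some result, for example a local path.
    Returns a dict mapping each input path to the result of fetch_one.
    """
    file_paths = list(file_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with file_paths.
        results = list(pool.map(fetch_one, file_paths))
    return dict(zip(file_paths, results))
```

Threads suit this kind of work because downloads are I/O-bound; an exception raised by fetch_one for any file propagates to the caller when the results are collected.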