pnpl.datasets.libribrain2025.compat.LibriBrainWord

class pnpl.datasets.libribrain2025.compat.LibriBrainWord(data_path, partition=None, preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=None, tmax=None, include_run_keys=None, exclude_run_keys=None, exclude_tasks=None, standardize=True, clipping_boundary=10.0, channel_means=None, channel_stds=None, include_info=False, preload_files=True, download=True, preload_h5=False, min_word_length=1, max_word_length=None, keyword_detection=None, negative_buffer=0.0, positive_buffer=0.0)

Word classification dataset wrapper.

Supports multi-class word classification or, when keyword_detection is set, binary keyword detection.
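As a rough illustration of the two modes, the sketch below turns a list of word tokens into labels: in multi-class mode each distinct word gets an integer class index, while in keyword mode the label is 1 if the word matches a keyword and 0 otherwise. The helper `make_labels`, the case-insensitive matching, the character-based length filter, and the sorted-vocabulary class order are all assumptions for illustration, not the library's implementation.

```python
from typing import Iterable, Optional, Union


def make_labels(
    words: list[str],
    min_word_length: int = 1,
    max_word_length: Optional[int] = None,
    keyword_detection: Optional[Union[str, Iterable[str]]] = None,
) -> tuple[list[str], list[int]]:
    """Hypothetical helper: filter words by length, then assign labels.

    Returns the kept (lowercased) words and one integer label per kept word.
    """
    # Assumption: word length is counted in characters.
    kept = [
        w.lower()
        for w in words
        if len(w) >= min_word_length
        and (max_word_length is None or len(w) <= max_word_length)
    ]

    if keyword_detection is not None:
        # Binary keyword detection: 1 for a keyword, 0 for any other word.
        if isinstance(keyword_detection, str):
            keywords = {keyword_detection.lower()}
        else:
            keywords = {k.lower() for k in keyword_detection}
        return kept, [int(w in keywords) for w in kept]

    # Multi-class: one class index per distinct word, in sorted order.
    vocab = {w: i for i, w in enumerate(sorted(set(kept)))}
    return kept, [vocab[w] for w in kept]
```

For example, `make_labels(["The", "cat", "sat", "on", "the", "mat"], keyword_detection="the")` labels the two occurrences of "the" as 1 and every other word as 0.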

Parameters:

data_path (str) – Directory where the dataset is stored or downloaded to
partition (str | None) – Data split to use (train, validation, or test)
preprocessing_str (str) – Preprocessing string used to build data filenames
tmin (float | None) – Window start time relative to word onset
tmax (float | None) – Window end time relative to word onset
include_run_keys (list) – Specific runs to include
exclude_run_keys (list) – Specific runs to exclude
exclude_tasks (list) – Task names to exclude
standardize (bool) – Whether to z-score normalize each channel
clipping_boundary (float | None) – If set, clip values to [-boundary, boundary]; None disables clipping
channel_means (ndarray | None) – Pre-computed per-channel means
channel_stds (ndarray | None) – Pre-computed per-channel standard deviations
include_info (bool) – Whether to include metadata in each sample
preload_files (bool) – Whether to download all required files eagerly
download (bool) – Whether to allow downloads from Hugging Face
preload_h5 (bool) – Whether to preload H5 data arrays into memory
min_word_length (int) – Minimum word length to include
max_word_length (int | None) – Maximum word length to include (None for no limit)
keyword_detection (str | None) – Keyword(s) for binary detection; None selects multi-class classification
negative_buffer (float) – Extra time before word onset
positive_buffer (float) – Extra time after word end
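The window parameters combine with a run's sampling frequency to select a slice of the continuous recording. The sketch below assumes that all times are in seconds, that `tmin`/`tmax` (when given) define the window relative to word onset, that otherwise the window spans the word itself widened by `negative_buffer` before onset and `positive_buffer` after its end, and that the result is a half-open `[start, stop)` sample range. These conventions and the helper `window_to_samples` are illustrative assumptions, not the library's confirmed rule.

```python
from typing import Optional


def window_to_samples(
    onset: float,
    offset: float,
    sfreq: float,
    tmin: Optional[float] = None,
    tmax: Optional[float] = None,
    negative_buffer: float = 0.0,
    positive_buffer: float = 0.0,
) -> tuple[int, int]:
    """Hypothetical helper returning a half-open [start, stop) sample range.

    Assumed convention (not confirmed by the library docs):
    - if tmin/tmax are given, the window is [onset + tmin, onset + tmax);
    - otherwise it spans [onset - negative_buffer, offset + positive_buffer).
    """
    start_s = onset + tmin if tmin is not None else onset - negative_buffer
    stop_s = onset + tmax if tmax is not None else offset + positive_buffer
    if stop_s <= start_s:
        raise ValueError("window end must come after window start")
    # Convert seconds to the nearest sample index; never start before 0.
    start = max(0, int(round(start_s * sfreq)))
    stop = int(round(stop_s * sfreq))
    return start, stop
```

With `sfreq=250.0`, a word at `onset=2.0` and `tmin=-0.1, tmax=0.5` maps to samples 475 up to (but not including) 625, a 0.6 s window of 150 samples.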

__init__(data_path, partition=None, preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=None, tmax=None, include_run_keys=None, exclude_run_keys=None, exclude_tasks=None, standardize=True, clipping_boundary=10.0, channel_means=None, channel_stds=None, include_info=False, preload_files=True, download=True, preload_h5=False, min_word_length=1, max_word_length=None, keyword_detection=None, negative_buffer=0.0, positive_buffer=0.0)

Parameters:

data_path (str)
partition (str | None)
preprocessing_str (str)
tmin (float | None)
tmax (float | None)
include_run_keys (list)
exclude_run_keys (list)
exclude_tasks (list)
standardize (bool)
clipping_boundary (float | None)
channel_means (ndarray | None)
channel_stds (ndarray | None)
include_info (bool)
preload_files (bool)
download (bool)
preload_h5 (bool)
min_word_length (int)
max_word_length (int | None)
keyword_detection (str | None)
negative_buffer (float)
positive_buffer (float)
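The run-selection parameters (include_run_keys, exclude_run_keys, exclude_tasks) act as filters over the available recordings. The sketch below assumes a run key is a `(subject, session, task, run)` tuple, mirroring the arguments of methods such as `get_h5_path`, and that an include list, when given, restricts the candidates before exclusions are applied. The key format, the helper `select_runs`, and the example values are hypothetical.

```python
from typing import Iterable, Optional

RunKey = tuple[str, str, str, str]  # assumed layout: (subject, session, task, run)


def select_runs(
    available: Iterable[RunKey],
    include_run_keys: Optional[Iterable[RunKey]] = None,
    exclude_run_keys: Optional[Iterable[RunKey]] = None,
    exclude_tasks: Optional[Iterable[str]] = None,
) -> list[RunKey]:
    """Hypothetical run filter: keep included runs, then drop exclusions."""
    # Assumption: None (or an empty list) means "no include restriction".
    include = set(include_run_keys) if include_run_keys else None
    exclude = set(exclude_run_keys or [])
    bad_tasks = set(exclude_tasks or [])
    selected = []
    for key in available:
        _subject, _session, task, _run = key
        if include is not None and key not in include:
            continue  # an explicit include list restricts the candidates
        if key in exclude or task in bad_tasks:
            continue  # run-level and task-level exclusions always apply
        selected.append(key)
    return selected
```

Keeping exclusions last means an excluded run stays out even if it was also listed for inclusion, which is one reasonable precedence choice among several.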

Methods

`__init__`(data_path[, partition, ...])
`calculate_standardization_params`(h5_data_loader)	Calculate channel means and stds across all runs.
`clip_sample`(sample, boundary)	Clip sample values to [-boundary, boundary].
`close_h5_files`()	Close all open H5 file handles and drop preloaded arrays.
`ensure_file`(fpath)	Ensure a file exists locally, downloading if needed.
`ensure_file_download`(fpath, data_path[, repo_id])	Class method to download a file without requiring dataset instantiation.
`get_bids_raw_path`(subject, session, task, run)	Construct path to raw BIDS MEG file.
`get_calibration_files`()	Get paths to Maxwell filter calibration files.
`get_derivatives_path`(subject, session[, ...])	Construct path to derivatives directory.
`get_events_path`(subject, session, task, run)	Construct path to events TSV file.
`get_h5_dataset`(run_key)	Get (cached) H5 dataset for a run.
`get_h5_path`(subject, session, task, run[, ...])	Construct path to H5 file.
`get_headpos_path`(subject, session, task, run)	Construct path to cached head position file.
`get_preprocessed_path`(subject, session, ...)	Construct path to preprocessed file in derivatives.
`get_sfreq_from_h5`(h5_path)	Get sampling frequency from H5 file.
`init_continuous_h5`([preload_h5])	Initialize the H5 data cache.
`load_continuous_window`(subject, session, ...)	Load a time window from continuous H5 data.
`load_continuous_window_from_sample`(sample)	Load time window from a sample tuple.
`load_head_positions`(subject, session, task, run)	Load cached head positions from CSV file.
`load_preprocessed_bids`(subject, session, ...)	Load a preprocessed FIF file from the derivatives directory.
`load_raw_bids`(subject, session, task, run[, ...])	Load raw MEG data from BIDS structure.
`prefetch_files`(file_paths)	Prefetch multiple files in parallel.
`raw_bids_exists`(subject, session, task, run)	Check if raw BIDS data exists for given identifiers.
`setup_standardization`([standardize, ...])	Set up standardization parameters.
`standardize`(data)	Apply z-score normalization and optional clipping to data.
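prefetch_files is described as fetching several files in parallel, and ensure_file as downloading a file only when it is missing locally. A minimal sketch of that pattern using only the standard library follows; the `fetch` callback stands in for the real Hugging Face download and, like `ensure_local` and `prefetch`, is hypothetical rather than part of the pnpl API.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def ensure_local(path: str, fetch: Callable[[str], None]) -> str:
    """Hypothetical ensure_file: call `fetch` only if `path` is missing."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        fetch(path)  # stand-in for the actual download step
    return path


def prefetch(
    paths: Iterable[str], fetch: Callable[[str], None], max_workers: int = 4
) -> list[str]:
    """Hypothetical prefetch_files: ensure many files concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every task and re-raises any worker exception.
        return list(pool.map(lambda p: ensure_local(p, fetch), paths))
```

Threads suit this workload because downloads are I/O-bound; calling `prefetch` a second time on the same paths performs no new fetches.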

Attributes

`HUGGINGFACE_FALLBACK_REPOS`
`HUGGINGFACE_REPO`
`broadcasted_means`
`broadcasted_stds`
`channel_means`
`channel_stds`
`label_info`	Get label information from the task.
`n_channels`	Number of MEG channels (306 for Elekta/MEGIN).
`n_times`	Number of time points per sample.
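channel_means and channel_stds hold one value per channel, and broadcasted_means/broadcasted_stds presumably reshape them so they broadcast against a sample. The sketch below pools per-channel statistics over several runs (as calculate_standardization_params is described as doing) and then applies z-scoring followed by clipping, in the spirit of standardize and clip_sample. The `(n_channels, n_times)` sample layout, the `(n_channels, 1)` broadcast shape, the guard for zero-variance channels, and the helpers `pooled_channel_stats` and `standardize_sample` are assumptions.

```python
from typing import Optional

import numpy as np


def pooled_channel_stats(runs: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std pooled over runs shaped (n_channels, n_times_i).

    Accumulates sums and sums of squares, so longer runs carry more weight.
    (Simple, but less numerically stable than Welford's method.)
    """
    n = sum(r.shape[1] for r in runs)
    total = sum(r.sum(axis=1, dtype=np.float64) for r in runs)
    total_sq = sum((r.astype(np.float64) ** 2).sum(axis=1) for r in runs)
    means = total / n
    # Population variance; clamp tiny negative values caused by rounding.
    stds = np.sqrt(np.maximum(total_sq / n - means**2, 0.0))
    return means, stds


def standardize_sample(
    sample: np.ndarray,
    channel_means: np.ndarray,
    channel_stds: np.ndarray,
    clipping_boundary: Optional[float] = 10.0,
) -> np.ndarray:
    """Z-score each channel of an (n_channels, n_times) sample, then clip."""
    # Reshape to (n_channels, 1) so the stats broadcast along the time axis.
    means = channel_means[:, np.newaxis]
    stds = channel_stds[:, np.newaxis]
    # Assumption: flat channels (std == 0) are left unscaled.
    stds = np.where(stds > 0, stds, 1.0)
    out = (sample - means) / stds
    if clipping_boundary is not None:
        out = np.clip(out, -clipping_boundary, clipping_boundary)
    return out
```

Computing the statistics once over the training runs and reusing them for other splits is the usual reason to pass pre-computed channel_means and channel_stds.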
