pnpl.datasets.libribrain2025.compat.LibriBrainWord

class pnpl.datasets.libribrain2025.compat.LibriBrainWord(data_path, partition=None, preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=None, tmax=None, include_run_keys=None, exclude_run_keys=None, exclude_tasks=None, standardize=True, clipping_boundary=10.0, channel_means=None, channel_stds=None, include_info=False, preload_files=True, download=True, preload_h5=False, min_word_length=1, max_word_length=None, keyword_detection=None, negative_buffer=0.0, positive_buffer=0.0)

Word classification dataset wrapper.

Supports either multi-class word classification or binary keyword detection (see keyword_detection).
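
The difference between the two task framings can be illustrated without the library. The sketch below is a hypothetical illustration only: the helper names `multiclass_labels` and `keyword_labels` and the label encoding are assumptions, not the encoding LibriBrainWord uses internally.

```python
def multiclass_labels(words):
    """Map each distinct word to an integer class index (sorted for determinism)."""
    vocab = {word: index for index, word in enumerate(sorted(set(words)))}
    return [vocab[word] for word in words], vocab


def keyword_labels(words, keywords):
    """Binary labels: 1 where the word is one of the keywords, else 0."""
    keywords = {keywords} if isinstance(keywords, str) else set(keywords)
    return [1 if word in keywords else 0 for word in words]


words = ["the", "dog", "saw", "the", "cat"]
class_ids, vocab = multiclass_labels(words)  # one class per distinct word
binary = keyword_labels(words, "the")        # keyword vs. everything else
```

In the multi-class framing the number of classes grows with the vocabulary, whereas keyword detection always yields two classes and is typically heavily imbalanced toward the negative class.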

Parameters:
  • data_path (str) – Path to store/load the dataset

  • partition (str | None) – train/validation/test split

  • preprocessing_str (str) – Preprocessing string for filenames

  • tmin (float | None) – Start time relative to word onset

  • tmax (float | None) – End time relative to word onset

  • include_run_keys (list) – Specific runs to include

  • exclude_run_keys (list) – Specific runs to exclude

  • exclude_tasks (list) – Task names to exclude

  • standardize (bool) – Whether to apply z-score normalization

  • clipping_boundary (float | None) – Clip values to [-boundary, boundary]

  • channel_means (ndarray | None) – Pre-computed channel means

  • channel_stds (ndarray | None) – Pre-computed channel standard deviations

  • include_info (bool) – Include metadata in samples

  • preload_files (bool) – Eagerly download files

  • download (bool) – Enable Hugging Face downloads

  • min_word_length (int) – Minimum word length to include

  • max_word_length (int | None) – Maximum word length to include

  • keyword_detection (str | None) – Keyword(s) for binary detection

  • negative_buffer (float) – Extra time before word onset

  • positive_buffer (float) – Extra time after word end

  • preload_h5 (bool) – Whether to preload H5 data when the H5 data cache is initialized (see init_continuous_h5)
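
A minimal construction sketch based on the signature above. The wrapper name `make_keyword_dataset`, the `"train"` partition value, and the example path and keyword in the usage note are assumptions chosen for illustration. All other arguments keep their documented defaults. The import is deferred so the snippet can be loaded without pnpl installed.

```python
def make_keyword_dataset(data_path, keyword):
    """Build a LibriBrainWord dataset configured for binary keyword detection."""
    # Imported lazily: requires the pnpl package and, with download=True,
    # network access to fetch any missing files.
    from pnpl.datasets.libribrain2025.compat import LibriBrainWord

    return LibriBrainWord(
        data_path=data_path,
        partition="train",          # assumed split name (train/validation/test)
        standardize=True,           # documented default, restated for clarity
        clipping_boundary=10.0,     # documented default
        keyword_detection=keyword,  # None would select multi-class mode instead
    )
```

Calling `make_keyword_dataset("./data", "the")` would then return a dataset whose `n_channels` and `n_times` attributes describe the sample shape.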

Methods

  • __init__(data_path[, partition, ...])

  • calculate_standardization_params(h5_data_loader) – Calculate channel means and stds across all runs.

  • clip_sample(sample, boundary) – Clip sample values to [-boundary, boundary].

  • close_h5_files() – Close all open H5 file handles and drop preloaded arrays.

  • ensure_file(fpath) – Ensure a file exists locally, downloading if needed.

  • ensure_file_download(fpath, data_path[, repo_id]) – Class method to download a file without requiring dataset instantiation.

  • get_bids_raw_path(subject, session, task, run) – Construct path to raw BIDS MEG file.

  • get_calibration_files() – Get paths to Maxwell filter calibration files.

  • get_derivatives_path(subject, session[, ...]) – Construct path to derivatives directory.

  • get_events_path(subject, session, task, run) – Construct path to events TSV file.

  • get_h5_dataset(run_key) – Get (cached) H5 dataset for a run.

  • get_h5_path(subject, session, task, run[, ...]) – Construct path to H5 file.

  • get_headpos_path(subject, session, task, run) – Construct path to cached head position file.

  • get_preprocessed_path(subject, session, ...) – Construct path to preprocessed file in derivatives.

  • get_sfreq_from_h5(h5_path) – Get sampling frequency from H5 file.

  • init_continuous_h5([preload_h5]) – Initialize the H5 data cache.

  • load_continuous_window(subject, session, ...) – Load a time window from continuous H5 data.

  • load_continuous_window_from_sample(sample) – Load time window from a sample tuple.

  • load_head_positions(subject, session, task, run) – Load cached head positions from CSV file.

  • load_preprocessed_bids(subject, session, ...) – Load a preprocessed FIF file from the derivatives directory.

  • load_raw_bids(subject, session, task, run[, ...]) – Load raw MEG data from BIDS structure.

  • prefetch_files(file_paths) – Prefetch multiple files in parallel.

  • raw_bids_exists(subject, session, task, run) – Check if raw BIDS data exists for given identifiers.

  • setup_standardization([standardize, ...]) – Set up standardization parameters.

  • standardize(data) – Apply z-score normalization and optional clipping to data.
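
The standardization and clipping steps described for standardize and clip_sample can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, the statistics are computed from a single channels-by-time array rather than across all runs, and the guard for zero-variance channels is an added assumption.

```python
from statistics import fmean, pstdev


def channel_stats(data):
    """Per-channel mean and population std for a channels x time nested list."""
    means = [fmean(channel) for channel in data]
    stds = [pstdev(channel) for channel in data]
    return means, stds


def zscore_and_clip(data, means, stds, boundary=10.0):
    """Z-score each channel, then clip to [-boundary, boundary] if boundary is set."""
    out = []
    for channel, mean, std in zip(data, means, stds):
        std = std if std > 0 else 1.0  # assumption: avoid dividing by zero on flat channels
        row = [(x - mean) / std for x in channel]
        if boundary is not None:
            row = [max(-boundary, min(boundary, x)) for x in row]
        out.append(row)
    return out


data = [[1.0, 2.0, 3.0], [0.0, 0.0, 30.0]]
means, stds = channel_stats(data)
clipped = zscore_and_clip(data, means, stds, boundary=1.0)
unclipped = zscore_and_clip(data, means, stds, boundary=None)
```

Passing pre-computed `means` and `stds` mirrors the role of the channel_means and channel_stds parameters: statistics estimated once can be reused for other data so that all samples are scaled identically.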

Attributes

  • HUGGINGFACE_FALLBACK_REPOS

  • HUGGINGFACE_REPO

  • broadcasted_means

  • broadcasted_stds

  • channel_means

  • channel_stds

  • label_info – Get label information from the task.

  • n_channels – Number of MEG channels (306 for Elekta/MEGIN).

  • n_times – Number of time points per sample.
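
As a rough guide to how the number of time points per sample relates to a window around word onset, the sketch below converts a time window into sample indices. The function name, the rounding rule, and the 250 Hz sampling rate are assumptions for illustration. The actual rate can be read with get_sfreq_from_h5, and the library's exact windowing logic is not documented here.

```python
def window_samples(onset, tmin, tmax, sfreq):
    """Return (start, stop, n_times) in samples for the window [onset + tmin, onset + tmax)."""
    start = round((onset + tmin) * sfreq)
    stop = round((onset + tmax) * sfreq)
    return start, stop, stop - start


# Word onset at 2.0 s, window from -0.1 s to 0.4 s, assumed 250 Hz sampling rate.
start, stop, n_times = window_samples(onset=2.0, tmin=-0.1, tmax=0.4, sfreq=250.0)
```

With these illustrative values a 0.5 s window corresponds to 125 time points, so each sample would have shape 306 by 125 if all MEG channels are kept.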