pnpl.datasets.libribrain2025.speech_dataset.LibriBrainSpeech

class pnpl.datasets.libribrain2025.speech_dataset.LibriBrainSpeech(data_path, partition=None, preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=0.0, tmax=0.5, include_run_keys=None, exclude_run_keys=None, exclude_tasks=None, standardize=True, clipping_boundary=10.0, channel_means=None, channel_stds=None, include_info=False, oversample_silence_jitter=0, preload_files=True, stride=None, download=True, preload_h5=False)

Speech detection dataset wrapper.

Binary classification of speech vs silence segments.
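
A minimal construction sketch, using only keyword arguments that appear in the class signature. The helper name make_speech_dataset and the dictionary SPEECH_KWARGS are hypothetical, and the partition value "train" is an assumption based on the "train/validation/test split" wording in the parameter list. The import is placed inside the function so that the configuration can be inspected without pnpl installed; calling the helper with download=True may trigger network downloads.

```python
# Hypothetical defaults for illustration; every key is a documented
# constructor parameter, but the chosen values are assumptions.
SPEECH_KWARGS = {
    "partition": "train",       # assumed spelling of the training split
    "tmin": 0.0,                # window start, relative to window position
    "tmax": 0.5,                # window end, relative to window position
    "standardize": True,        # z-score normalize
    "clipping_boundary": 10.0,  # clip values to [-10, 10]
    "download": True,           # enable HuggingFace downloads
}


def make_speech_dataset(data_path, **overrides):
    """Build a LibriBrainSpeech instance from the defaults above.

    Keyword arguments passed as overrides replace the defaults.
    """
    # Lazy import: pnpl is only required when the helper is actually called.
    from pnpl.datasets.libribrain2025.speech_dataset import LibriBrainSpeech

    kwargs = {**SPEECH_KWARGS, **overrides}
    return LibriBrainSpeech(data_path, **kwargs)
```

A call such as make_speech_dataset("./data", partition="validation") would then reuse the same windowing and normalization settings for another split.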

Parameters:
  • data_path (str) – Path to store/load the dataset

  • partition (str | None) – train/validation/test split

  • preprocessing_str (str) – Preprocessing string for filenames

  • tmin (float) – Start time relative to window position

  • tmax (float) – End time relative to window position

  • include_run_keys (list) – Specific runs to include

  • exclude_run_keys (list) – Specific runs to exclude

  • exclude_tasks (list) – Task names to exclude

  • standardize (bool) – Whether to z-score normalize

  • clipping_boundary (float | None) – Clip values to [-boundary, boundary]

  • channel_means (ndarray | None) – Pre-computed channel means

  • channel_stds (ndarray | None) – Pre-computed channel stds

  • include_info (bool) – Include metadata in samples

  • oversample_silence_jitter (int) – Stride for oversampling silence

  • preload_files (bool) – Eagerly download files

  • stride (int | None) – Custom stride for sliding window

  • download (bool) – Enable HuggingFace downloads

  • preload_h5 (bool)
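
The interaction of standardize, clipping_boundary, channel_means and channel_stds can be pictured with the sketch below. It is not the library's implementation: the function name standardize_and_clip is hypothetical, the order (z-score first, then clip) is an assumption, and plain Python lists stand in for the ndarray inputs so the example has no third-party dependencies.

```python
from statistics import mean, pstdev


def standardize_and_clip(data, channel_means=None, channel_stds=None,
                         clipping_boundary=10.0):
    """Z-score each channel, then clip to [-boundary, boundary].

    data is a list of channels, each a list of float samples.
    Missing means/stds are computed from the data itself (assumption).
    A channel with zero standard deviation would raise ZeroDivisionError.
    """
    if channel_means is None:
        channel_means = [mean(ch) for ch in data]
    if channel_stds is None:
        channel_stds = [pstdev(ch) for ch in data]

    out = []
    for ch, mu, sigma in zip(data, channel_means, channel_stds):
        z = [(x - mu) / sigma for x in ch]
        if clipping_boundary is not None:
            z = [max(-clipping_boundary, min(clipping_boundary, v)) for v in z]
        out.append(z)
    return out


# Two channels, five samples each; the first contains one large outlier.
window = [[0.0, 1.0, 2.0, 3.0, 100.0], [5.0, 5.5, 4.5, 5.0, 5.0]]
result = standardize_and_clip(
    window,
    channel_means=[1.5, 5.0],  # pre-computed statistics, as the
    channel_stds=[1.0, 0.5],   # channel_means/channel_stds parameters allow
    clipping_boundary=10.0,
)
```

Here the outlier in the first channel standardizes to 98.5 and is clipped to 10.0, while the remaining values pass through unchanged. Passing clipping_boundary=None skips the clipping step, mirroring the float | None type in the parameter list.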

__init__(data_path, partition=None, preprocessing_str='bads+headpos+sss+notch+bp+ds', tmin=0.0, tmax=0.5, include_run_keys=None, exclude_run_keys=None, exclude_tasks=None, standardize=True, clipping_boundary=10.0, channel_means=None, channel_stds=None, include_info=False, oversample_silence_jitter=0, preload_files=True, stride=None, download=True, preload_h5=False)

Methods

__init__(data_path[, partition, ...])

ensure_file_download(fpath, data_path[, repo_id])

Class method to download a file without requiring dataset instantiation.

prefetch_files(file_paths)

Prefetch multiple files in parallel.
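
How prefetch_files parallelizes its work is not stated here. As a generic illustration of fetching several files in parallel, the sketch below uses a thread pool from the standard library. The names prefetch_in_parallel, fetch_one and fake_fetch are hypothetical, and fake_fetch merely stands in for a real per-file download call.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch_in_parallel(file_paths, fetch_one, max_workers=4):
    """Call fetch_one(path) for every path on a thread pool.

    Results are returned in the same order as file_paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, file_paths))


def fake_fetch(path):
    # Hypothetical stand-in for a download; returns a marker string.
    return f"cached:{path}"


results = prefetch_in_parallel(["run-1.h5", "run-2.h5"], fake_fetch)
```

Threads suit this kind of workload because downloads are I/O-bound; an exception raised by any single fetch propagates when the results are collected.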