Quickstart#
Every dataset in pnpl.datasets follows the same shape: pass a task
object from pnpl.tasks and (optionally) a preprocessing string,
then iterate samples like any PyTorch Dataset.
LibriBrain (Hugging Face, no auth)#
from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection
ds = LibriBrain(
data_path="./data/LibriBrain",
task=SpeechDetection(tmin=0.0, tmax=0.2),
partition="train",
standardize=True,
include_info=True,
)
print(len(ds), "samples")
x, y, info = ds[0]
print(x.shape, y.shape, info["dataset"]) # (channels,time), (time,), "libribrain2025"
LibriBrain has dataset-specific wrapper classes that don’t require a separate task object:
from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
include_run_keys = [constants.RUN_KEYS[0]]
speech_ds = LibriBrainSpeech(
data_path="./data/LibriBrain",
include_run_keys=include_run_keys,
tmin=0.0,
tmax=0.2,
)
phoneme_ds = LibriBrainPhoneme(
data_path="./data/LibriBrain",
preprocessing_str="bads+headpos+sss+notch+bp+ds",
include_run_keys=include_run_keys,
tmin=-0.2,
tmax=0.6,
standardize=True,
)
MEG-MASC / Gwilliams 2022 (OSF, no auth)#
from pnpl.datasets import Gwilliams2022
from pnpl.tasks.gwilliams2022 import PhonemeClassification
ds = Gwilliams2022(
data_path="./data/meg_masc",
task=PhonemeClassification(tmin=-0.2, tmax=0.6),
include_subjects=["01"],
include_sessions=["0"],
include_tasks=["0"], # story 0 = "lw1"
preprocessing="notch+bp+ds",
download=True,
standardize=True,
)
x, y = ds[0]
Armeni 2022 (Radboud, auth required)#
import os
os.environ["RADBOUD_USERNAME"] = "you@orcid.org"
os.environ["RADBOUD_PASSWORD"] = "..."
from pnpl.datasets import Armeni2022
from pnpl.tasks.armeni2022 import PhonemeClassification
ds = Armeni2022(
data_path="./data/armeni",
task=PhonemeClassification(tmin=-0.2, tmax=0.6),
include_subjects=["001"],
include_sessions=["001"],
preprocessing="notch+bp+ds",
standardize=True,
)
Schöffelen 2019 / MOUS (Radboud, auth required)#
from pnpl.datasets import Schoffelen2019
from pnpl.tasks.schoffelen2019 import TrialEpoching
ds = Schoffelen2019(
data_path="./data/schoffelen",
task=TrialEpoching(tmin=0.0, tmax=1.0, label_type="trigger"),
include_subjects=["A2002"],
include_tasks=["auditory"],
preprocessing="notch+bp+ds",
standardize=True,
)
Note
The first time you instantiate a non-LibriBrain dataset, files are
downloaded from the appropriate remote (OSF for MEG-MASC, Radboud
WebDAV for Armeni / MOUS) and the preprocessing pipeline runs against
the raw recording, caching the result as H5 under
data_path/derivatives/serialised/.... Subsequent constructions read
the cached H5 directly.
For LibriBrain, files are downloaded from Hugging Face and cached
under data_path on first use.
See Datasets for the full list of arguments and Preprocessing for how to customize the pipeline.