Aloha to the PNPL library! 🍍
PNPL is a friendly Python toolkit for loading and processing brain
datasets for deep learning. It ships PyTorch Dataset classes for
several public MEG corpora: LibriBrain, MEG-MASC (Gwilliams 2022),
Armeni 2022, and the MOUS corpus (Schoffelen 2019). It also provides a
composable preprocessing pipeline and task abstractions so you can
focus on modeling, not file plumbing.
Get Started
Install PNPL
pip install pnpl
Load LibriBrain with the task-based API
from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection
ds = LibriBrain(
    data_path="./data/LibriBrain",
    task=SpeechDetection(tmin=0.0, tmax=0.2),
    partition="train",
)
x, y, info = ds[0]
print(x.shape, y.shape)  # (channels, time), (time,)
The same task-based pattern works for the other datasets
(Gwilliams2022, Armeni2022, Schoffelen2019); they each take a
matching task object from pnpl.tasks.<dataset> and an optional
preprocessing string. Wrapper classes such as LibriBrainSpeech and
LibriBrainPhoneme are also available.
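Because the datasets are PyTorch Dataset classes, they should plug into a standard torch.utils.data.DataLoader for batching. The sketch below illustrates this with a hypothetical stand-in class, StandInMEG, which is not part of pnpl: it only mimics the (x, y, info) triple from the example above using random tensors, so it runs without downloading any data. The channel count, window length, and the contents of info are arbitrary assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class StandInMEG(Dataset):
    """Hypothetical stand-in (not a pnpl class) returning (x, y, info) triples."""

    def __init__(self, n_samples=8, n_channels=4, n_times=50):
        # Arbitrary sizes for illustration; real recordings will differ.
        gen = torch.Generator().manual_seed(0)
        self.x = torch.randn(n_samples, n_channels, n_times, generator=gen)
        self.y = torch.randint(0, 2, (n_samples, n_times), generator=gen)

    def __len__(self):
        return self.x.shape[0]

    def __getitem__(self, idx):
        info = {"index": idx}  # placeholder metadata, assumed to be a dict
        return self.x[idx], self.y[idx], info


# The default collate function stacks tensors and collates dicts key by key.
loader = DataLoader(StandInMEG(), batch_size=4, shuffle=False)
xb, yb, infob = next(iter(loader))
print(xb.shape, yb.shape)  # (batch, channels, time), (batch, time)
```

With a real PNPL dataset, you would pass ds in place of StandInMEG(). If the info element contains values the default collate function cannot stack, a custom collate_fn may be needed.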
Explore PNPL
Future Plans
LibriBrain is still the headline use case (the 2025 LibriBrain
competition ships against this package), but we’ll maintain pnpl for
years to come and hope to grow it into a useful asset for the wider
community. The recent refactor adds MEG-MASC, Armeni 2022, and the MOUS
corpus alongside a shared preprocessing pipeline. On the roadmap:
More public datasets and dataset loaders
Easy-to-use preprocessing pipelines
Data augmentation options
Contribute
We welcome issues and pull requests. See the Contributor Guide for setup and guidelines.