Aloha to the PNPL library! 🍍


PNPL is a friendly Python toolkit for loading and processing brain datasets for deep learning. It ships PyTorch Dataset classes for four public MEG corpora: LibriBrain, MEG-MASC (Gwilliams 2022), Armeni 2022, and the MOUS corpus (Schoffelen 2019). It also provides a composable preprocessing pipeline and task abstractions, so you can focus on modeling, not file plumbing.

Get Started

Install PNPL

pip install pnpl

Load LibriBrain with the task-based API

from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection

ds = LibriBrain(
    data_path="./data/LibriBrain",
    task=SpeechDetection(tmin=0.0, tmax=0.2),
    partition="train",
)
x, y, info = ds[0]
print(x.shape, y.shape)  # (channels, time), (time,)

The same task-based pattern works for the other datasets (Gwilliams2022, Armeni2022, Schoffelen2019); they each take a matching task object from pnpl.tasks.<dataset> and an optional preprocessing string. Wrapper classes such as LibriBrainSpeech and LibriBrainPhoneme are also available.
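
To make the tmin/tmax window in the SpeechDetection example more concrete, here is a minimal sketch of how a window given in seconds maps to a number of time samples. It is not part of PNPL: the helper name window_length_in_samples is hypothetical, the 250 Hz sampling rate is an assumption chosen for illustration, and whether the library includes the endpoint sample is not stated here, so this sketch excludes it.

```python
def window_length_in_samples(tmin: float, tmax: float, sfreq: float) -> int:
    """Return the number of samples in a [tmin, tmax) window.

    Hypothetical helper for illustration only; it is not a PNPL function.
    tmin and tmax are in seconds, sfreq is the sampling rate in Hz.
    """
    if tmax <= tmin:
        raise ValueError("tmax must be greater than tmin")
    # round() absorbs floating-point error in the duration * rate product.
    return round((tmax - tmin) * sfreq)


# ASSUMPTION: 250 Hz is used purely for illustration; check the actual
# sampling rate of the dataset and preprocessing you are working with.
ASSUMED_SFREQ_HZ = 250.0

n_samples = window_length_in_samples(tmin=0.0, tmax=0.2, sfreq=ASSUMED_SFREQ_HZ)
print(n_samples)  # 50 under the assumed 250 Hz rate
```

Under that assumption, the time axis of x and y in the example above would have 50 entries. Comparing this estimate with the printed x.shape is a quick sanity check that the window and sampling rate are what you expect.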

Explore PNPL

Start here

Quickstart

Install and load your first dataset run in a few lines.

Datasets

Dataset overview

The four shipped datasets, what each one needs, and how to pick.

Pipelines

Preprocessing

The composable pipeline that turns raw MEG into cached H5.

Pipelines

Tasks

How sample windows and labels are defined; available tasks per dataset.

Reference

API

Auto-generated docs for classes and modules with links to source.

Tutorial

Speech Detection (LibriBrain)

Learn speech vs. silence classification with a compact walkthrough and Colab GPU.

Tutorial

Phoneme Classification (LibriBrain)

Build a phoneme recognizer on MEG with practical tips and code.

Future Plans

LibriBrain is still the headline use case (the 2025 LibriBrain competition relies on this package), but we’ll maintain pnpl for years to come and hope to grow it into a useful asset for the wider community. The recent refactor adds MEG-MASC, Armeni 2022, and the MOUS corpus alongside a shared preprocessing pipeline. On the roadmap:

More public datasets and dataset loaders
Easy-to-use preprocessing pipelines
Data augmentation options

Contribute

We welcome issues and pull requests. See the Contributor Guide for setup and guidelines.