Tasks

A Task decides how a dataset’s continuous (or epoched) data is turned into supervised samples. Pass a task instance to a dataset’s task= argument and the dataset will:

call task.collect_samples(self) during __init__ to enumerate sample tuples (one per phoneme onset, word onset, trial onset, …),
call task.get_label(sample) inside __getitem__ to compute the label, and
expose task.label_info (classes, label↔id maps, n_classes).
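To make the three hooks concrete, here is a minimal sketch of the dataset side of the contract. ToyDataset and ConstantTask are hypothetical stand-ins, not pnpl classes. A real pnpl dataset also loads and slices MEG data, which is omitted here; only the call sites for collect_samples, get_label and label_info are shown.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConstantTask:
    """Hypothetical toy task: one sample per onset, every sample labelled "speech"."""
    tmin: float = 0.0
    tmax: float = 0.5

    def collect_samples(self, dataset) -> list[tuple]:
        # One tuple per onset; the label value rides along as the last element.
        return [(onset, "speech") for onset in dataset.onsets]

    def get_label(self, sample: tuple) -> Any:
        return self.label_info["label_to_id"][sample[-1]]

    @property
    def label_info(self) -> dict:
        classes = ["silence", "speech"]
        return {
            "classes": classes,
            "label_to_id": {c: i for i, c in enumerate(classes)},
            "n_classes": len(classes),
        }


class ToyDataset:
    """Hypothetical stand-in for a pnpl dataset, showing where the task hooks run."""

    def __init__(self, onsets: list[float], task) -> None:
        self.onsets = onsets
        self.task = task
        # 1. Enumerate all samples once, at construction time.
        self.samples = task.collect_samples(self)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        sample = self.samples[idx]
        # A real dataset would slice MEG data between onset + tmin and onset + tmax.
        window = (sample[0] + self.task.tmin, sample[0] + self.task.tmax)
        # 2. Compute the label per item.
        return window, self.task.get_label(sample)

    @property
    def label_info(self) -> dict:
        # 3. Expose the task's label metadata unchanged.
        return self.task.label_info


ds = ToyDataset(onsets=[1.0, 2.5], task=ConstantTask())
print(len(ds), ds[1], ds.label_info["n_classes"])  # → 2 ((2.5, 3.0), 1) 2
```

Keeping sample enumeration in __init__ means indexing is a cheap list lookup, while label computation stays lazy.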

Most tasks are dataclasses with tmin and tmax fields (window edges in seconds) plus task-specific parameters. They are cheap to construct, so instantiate one per dataset.

TaskProtocol

from pnpl.tasks import TaskProtocol

A minimal task implements three members: collect_samples, get_label, and a label_info property.

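from dataclasses import dataclass
from typing import Any

@dataclass
class MyTask:
    tmin: float = 0.0   # window edge, in seconds
    tmax: float = 0.5   # window edge, in seconds

    def collect_samples(self, dataset) -> list[tuple]:
        ...  # return one tuple per sample

    def get_label(self, sample: tuple) -> Any:
        ...  # map one sample tuple to its label

    @property
    def label_info(self) -> dict:
        return {"classes": [...], "label_to_id": {...}, "n_classes": ...}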
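A filled-in version of the skeleton might look like the following sketch. WordOnsetTask is hypothetical, and it assumes the dataset exposes an events attribute of (onset_seconds, word) pairs; real pnpl datasets store their events differently. The design point it illustrates is that the class vocabulary is fixed inside collect_samples, so label ids stay stable for the whole dataset.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class WordOnsetTask:
    """Hypothetical task: one sample per word onset; the label is the lower-cased word."""
    tmin: float = 0.0
    tmax: float = 0.5
    _classes: list[str] = field(default_factory=list, init=False, repr=False)

    def collect_samples(self, dataset) -> list[tuple]:
        # Assumption: dataset.events is a list of (onset_seconds, word) pairs.
        samples = [(onset, word.lower()) for onset, word in dataset.events]
        # Fix the vocabulary once so every later get_label call agrees on ids.
        self._classes = sorted({word for _, word in samples})
        return samples

    def get_label(self, sample: tuple) -> Any:
        return self.label_info["label_to_id"][sample[-1]]

    @property
    def label_info(self) -> dict:
        return {
            "classes": list(self._classes),
            "label_to_id": {c: i for i, c in enumerate(self._classes)},
            "n_classes": len(self._classes),
        }


toy = SimpleNamespace(events=[(0.2, "The"), (0.6, "cat"), (1.1, "the")])
task = WordOnsetTask()
samples = task.collect_samples(toy)
print(task.label_info["classes"], [task.get_label(s) for s in samples])
# → ['cat', 'the'] [1, 0, 1]
```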

pnpl.tasks.base.BaseTask is an optional convenience base class: if you set _classes and _label_to_id, it provides a default label_info implementation.
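The idea behind such a base class can be sketched as a small mixin. LabelInfoMixin and ToyVoicingTask are hypothetical and are not pnpl's BaseTask; the real class may build label_info differently. The sketch only shows how a subclass that sets _classes and _label_to_id gets label_info for free.

```python
from dataclasses import dataclass


class LabelInfoMixin:
    """Hypothetical stand-in for BaseTask's default label_info (not pnpl's code)."""

    _classes: list
    _label_to_id: dict

    @property
    def label_info(self) -> dict:
        # Built from the two attributes the subclass sets; copies avoid aliasing.
        return {
            "classes": list(self._classes),
            "label_to_id": dict(self._label_to_id),
            "n_classes": len(self._classes),
        }


@dataclass
class ToyVoicingTask(LabelInfoMixin):
    tmin: float = 0.0
    tmax: float = 0.5

    def __post_init__(self) -> None:
        # Only these two attributes are set here; label_info is inherited.
        self._classes = ["voiceless", "voiced"]
        self._label_to_id = {c: i for i, c in enumerate(self._classes)}


print(ToyVoicingTask().label_info)
# → {'classes': ['voiceless', 'voiced'], 'label_to_id': {'voiceless': 0, 'voiced': 1}, 'n_classes': 2}
```

Because the mixin is not itself a dataclass, its annotations do not become dataclass fields, so the subclass constructor still takes only tmin and tmax.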

LibriBrain tasks

from pnpl.tasks import (
    SpeechDetection,
    PhonemeClassification,
    WordClassification,
)

(Re-exported at pnpl.tasks for convenience; the canonical module is pnpl.tasks.libribrain.)

SpeechDetection(tmin, tmax, stride=None, oversample_silence_jitter=0) — slide a window across continuous MEG and label each step as speech / silence. Returns a per-time-point label array.
PhonemeClassification(tmin, tmax, label_type="phoneme" | "voicing", exclude_phonemes=[]) — sample windowed around each phoneme onset.
WordClassification(tmin, tmax, min_word_length=1, max_word_length=None, keyword_detection=None) — multi-class word classification or binary keyword detection (keyword_detection="cat" → 1 if the window’s word is "cat", else 0). tmin / tmax may be None to auto-compute from word duration.
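The sliding-window behaviour described for SpeechDetection can be illustrated on a plain 0/1 speech mask. This is a sketch, not pnpl's implementation. Its assumptions are that the window covers tmax - tmin seconds, that stride is measured in samples, and that stride=None means non-overlapping windows. pnpl's actual defaults and units may differ.

```python
def sliding_speech_windows(speech_mask, sfreq, tmin, tmax, stride=None):
    """Yield (start_index, per-sample labels) for each window over a 0/1 speech mask.

    Assumptions, not confirmed pnpl behaviour: the window covers tmax - tmin
    seconds, `stride` is in samples, and stride=None means non-overlapping windows.
    """
    win = int(round((tmax - tmin) * sfreq))
    if win <= 0:
        raise ValueError("tmax must be greater than tmin")
    step = win if stride is None else stride
    # Only full windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(speech_mask) - win + 1, step):
        # One label per time point inside the window (1 = speech, 0 = silence).
        yield start, list(speech_mask[start:start + win])


mask = [0, 0, 1, 1, 1, 0, 0, 1]
for start, labels in sliding_speech_windows(mask, sfreq=4, tmin=0.0, tmax=0.5):
    print(start, labels)
# → 0 [0, 0]
# → 2 [1, 1]
# → 4 [1, 0]
# → 6 [0, 1]
```

Passing a stride smaller than the window length (for example stride=1) produces overlapping windows, which yields many more training samples from the same recording.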

MEG-MASC tasks (Gwilliams 2022)

from pnpl.tasks.gwilliams2022 import PhonemeClassification, WordClassification

PhonemeClassification(tmin, tmax, label_type="phoneme" | "voicing") — phoneme-aligned epochs from the MEG-MASC events.tsv.
WordClassification(tmin, tmax, require_pronounced=True) — word-aligned epochs; label is the lower-cased word string.
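One way to picture label_type="voicing" is as a lookup that collapses phonemes into two classes. The table below is hypothetical: it assumes upper-case ARPABET symbols and a binary voiceless/voiced split. pnpl's own voicing mapping, and MEG-MASC's phoneme spelling, may differ.

```python
# Hypothetical voicing table in upper-case ARPABET; not pnpl's actual mapping.
VOICELESS = {"P", "T", "K", "F", "TH", "S", "SH", "CH", "HH"}


def voicing_label(phoneme: str) -> str:
    """Map a phoneme to an assumed binary voicing class."""
    # Vowels, nasals, liquids, glides and voiced obstruents all count as voiced.
    return "voiceless" if phoneme.upper() in VOICELESS else "voiced"


classes = ["voiceless", "voiced"]
label_to_id = {c: i for i, c in enumerate(classes)}
print([label_to_id[voicing_label(p)] for p in ["s", "z", "AE", "t"]])  # → [0, 1, 1, 0]
```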

Armeni 2022 tasks

from pnpl.tasks.armeni2022 import PhonemeClassification

PhonemeClassification(tmin, tmax, label_type, exclude_phonemes, skip_negative_onset) — phoneme-aligned epochs. ARPABET stress digits (AH0, IY1, …) are stripped before mapping to class ids.
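Stress stripping followed by id assignment can be sketched as follows. phonemes_to_ids is a hypothetical helper, and two details are assumptions rather than documented pnpl behaviour: exclusions are matched against the stripped symbols, and class ids follow sorted order.

```python
import re


def strip_stress(phone: str) -> str:
    """Drop a trailing ARPABET stress digit: "AH0" -> "AH", "IY1" -> "IY"."""
    return re.sub(r"[012]$", "", phone)


def phonemes_to_ids(phones, exclude_phonemes=()):
    """Hypothetical helper: strip stress, drop excluded symbols, assign sorted ids."""
    excluded = set(exclude_phonemes)
    kept = [p for p in map(strip_stress, phones) if p not in excluded]
    classes = sorted(set(kept))
    label_to_id = {c: i for i, c in enumerate(classes)}
    return classes, [label_to_id[p] for p in kept]


print(phonemes_to_ids(["AH0", "IY1", "AH1", "SIL", "T"], exclude_phonemes=["SIL"]))
# → (['AH', 'IY', 'T'], [0, 1, 0, 2])
```

Stripping first means AH0 and AH1 share one class, which is the point of removing the stress digits.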

Pallier 2025 (LittlePrince Listen) tasks

from pnpl.tasks.pallier2025 import WordClassification

WordClassification(tmin, tmax, min_word_length, max_word_length, keep_top_k) — windowed around each word onset (trial_type='Word' rows in the events.tsv). Default tmin=0.0, tmax=3.0 matches the d’Ascoli et al. (Nat Commun 2025) recipe. keep_top_k restricts the label vocabulary to the k most-frequent tokens — useful for the paper’s “top-250” evaluation.
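A keep_top_k-style restriction can be expressed with collections.Counter. This is a sketch, not the pnpl code. It assumes that samples whose word falls outside the top-k vocabulary are dropped rather than mapped to an "other" class, and it takes the word to be the last element of each (hypothetical) sample tuple.

```python
from collections import Counter


def top_k_vocabulary(words, k):
    """The k most frequent tokens; ties keep first-seen order (Counter.most_common)."""
    return [word for word, _ in Counter(words).most_common(k)]


def restrict_samples(samples, vocab):
    """Keep only samples whose last element (assumed to be the word) is in vocab."""
    allowed = set(vocab)
    return [s for s in samples if s[-1] in allowed]


samples = [(0.1, "le"), (0.4, "petit"), (0.9, "prince"), (1.3, "le"), (1.8, "petit"), (2.2, "le")]
vocab = top_k_vocabulary([w for _, w in samples], k=2)
print(vocab, len(restrict_samples(samples, vocab)))  # → ['le', 'petit'] 5
```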

Schöffelen 2019 (MOUS) tasks

from pnpl.tasks.schoffelen2019 import TrialEpoching

TrialEpoching(tmin, tmax, label_type="trigger" | "binary", include_tasks=None) — epochs around each trial onset. "trigger" labels with the first UPPT001 trigger code inside the trial; "binary" labels every trial with a constant 1.
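The two label_type modes can be sketched as a single function. trial_label is hypothetical. It assumes events are (onset_seconds, channel, value) tuples, that a trial covers the half-open interval from trial_start to trial_end, and that a trial with no UPPT001 trigger gets no label (None). The real MOUS event handling in pnpl may differ on all three points.

```python
def trial_label(events, trial_start, trial_end, label_type="trigger"):
    """Label one MOUS-style trial (hypothetical event layout)."""
    if label_type == "binary":
        return 1  # every trial gets the same constant label
    if label_type != "trigger":
        raise ValueError(f"unknown label_type: {label_type!r}")
    # First UPPT001 code whose onset lies in [trial_start, trial_end).
    for onset, channel, value in sorted(events, key=lambda e: e[0]):
        if channel == "UPPT001" and trial_start <= onset < trial_end:
            return int(value)
    return None  # assumption: no trigger found means no label


events = [(10.2, "UPPT001", 5), (10.0, "UPPT002", 99), (10.5, "UPPT001", 7), (12.0, "UPPT001", 3)]
print(trial_label(events, 10.0, 12.0), trial_label(events, 10.0, 12.0, "binary"))  # → 5 1
```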

Sample tuple convention

For continuous-data tasks, sample tuples follow:

(subject, session, task, run, onset, label_value, ...)

label_value is a string for tasks like word detection, an integer trigger code for MOUS, or a phoneme symbol for phoneme classification. The dataset translates it to the final tensor label via task.get_label(sample) and the label_info lookup.
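Because the convention is positional, a named view can make task code easier to read. SampleFields and fields_of are hypothetical helpers built on typing.NamedTuple, and the field values and label_to_id dictionary below are illustrative only. The sketch reads the leading six fields and then performs the label_to_id lookup a get_label implementation might do.

```python
from typing import Any, NamedTuple


class SampleFields(NamedTuple):
    """Hypothetical named view over the leading fields of a sample tuple."""
    subject: str
    session: str
    task: str
    run: str
    onset: float
    label_value: Any


def fields_of(sample: tuple) -> SampleFields:
    # Task-specific trailing fields (the "...") are ignored.
    return SampleFields(*sample[:6])


label_to_id = {"aa": 0, "ae": 1}  # illustrative stand-in for label_info["label_to_id"]
sample = ("sub-01", "ses-1", "task-x", "run-1", 12.34, "ae", "extra")
f = fields_of(sample)
print(f.onset, label_to_id[f.label_value])  # → 12.34 1
```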