pnpl.tasks.pallier2025.WordClassification
- class pnpl.tasks.pallier2025.WordClassification(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)
Word-onset classification on Pallier 2025 (LittlePrince Listen).
Sample tuples follow the continuous-data convention: (subject, session, task, run, onset, word_str). The label vocabulary is the set of unique words observed across the requested runs (lower-cased, stripped).
- Parameters:
tmin (float) – Start time relative to word onset (seconds). Default 0.0.
tmax (float) – End time relative to word onset (seconds). Default 3.0.
min_word_length (int) – Minimum stripped word length to include (default 1; set higher to drop function words / single letters from the elided audiobook tokenization).
max_word_length (int | None) – Maximum stripped word length to include (None for no limit).
keep_top_k (int | None) – If set, restrict the label vocabulary to the k most frequent tokens across the requested runs. Windows for all other words are dropped. Useful for the d’Ascoli et al. “top-250” evaluation.
_words_sorted (list)
_word_to_id (dict)
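The filtering described by min_word_length, max_word_length and keep_top_k can be illustrated with a minimal standalone sketch. This is not the pnpl implementation: build_vocabulary is a hypothetical helper, the alphabetical tie-breaking among equally frequent words is an assumption, and the mapping to _words_sorted and _word_to_id is inferred only from the field names.

```python
from collections import Counter


def build_vocabulary(words, min_word_length=1, max_word_length=None, keep_top_k=None):
    """Hypothetical helper: return (words_sorted, word_to_id) from raw word strings."""
    # Normalise as the description states: strip whitespace and lower-case.
    cleaned = [w.strip().lower() for w in words]

    # Apply the length bounds; max_word_length=None means no upper limit.
    kept = [
        w for w in cleaned
        if len(w) >= min_word_length
        and (max_word_length is None or len(w) <= max_word_length)
    ]

    counts = Counter(kept)
    if keep_top_k is not None:
        # Keep the k most frequent tokens.
        # Assumption: ties are broken alphabetically.
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        vocab = set(ranked[:keep_top_k])
    else:
        vocab = set(counts)

    # Assumption: label ids follow the sorted order of the vocabulary.
    words_sorted = sorted(vocab)
    word_to_id = {w: i for i, w in enumerate(words_sorted)}
    return words_sorted, word_to_id


words = ["Le", "petit", "prince", " le ", "a", "prince", "le"]
words_sorted, word_to_id = build_vocabulary(words, min_word_length=2, keep_top_k=2)
```

In this toy input, "a" is removed by the length filter and "petit" falls outside the two most frequent tokens, so any window labelled with either word would be dropped.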
- __init__(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)
- Parameters:
tmin (float)
tmax (float)
min_word_length (int)
max_word_length (int | None)
keep_top_k (int | None)
_words_sorted (list)
_word_to_id (dict)
- Return type:
None
Methods
__init__([tmin, tmax, min_word_length, ...])
collect_samples(dataset)
get_label(sample)
Attributes
keep_top_k
label_info
max_word_length
min_word_length
tmax
tmin
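The tmin and tmax attributes define a window in seconds relative to each word onset. As a rough sketch of how such a window might map onto sample indices, assuming the onset is also given in seconds and the recording has a known sampling rate (the sfreq argument and the window_to_samples helper are hypothetical and not part of the documented API):

```python
def window_to_samples(onset, tmin=0.0, tmax=3.0, sfreq=1000.0):
    """Hypothetical helper: convert a word-onset window to [start, stop) sample indices."""
    if tmax <= tmin:
        raise ValueError("tmax must be greater than tmin")
    # Window edges in seconds, converted to the nearest sample index.
    start = int(round((onset + tmin) * sfreq))
    stop = int(round((onset + tmax) * sfreq))
    return start, stop


# A word starting 12.5 s into a run recorded at 250 Hz (illustrative values only).
start, stop = window_to_samples(onset=12.5, tmin=0.0, tmax=3.0, sfreq=250.0)
```

With the default tmin=0.0 and tmax=3.0, each window spans three seconds of data starting at the word onset.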