pnpl.tasks.pallier2025.WordClassification


class pnpl.tasks.pallier2025.WordClassification(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)

Word-onset classification on Pallier 2025 (LittlePrince Listen).

Sample tuples follow the continuous-data convention: (subject, session, task, run, onset, word_str). The label vocabulary is the set of unique words (lower-cased and stripped) observed across the requested runs.
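The vocabulary construction described above can be sketched as follows. This is a minimal illustration, not pnpl's implementation: it assumes normalization is `str.lower()` followed by `str.strip()`, that the vocabulary is sorted alphabetically (suggested by, but not confirmed from, the `_words_sorted` field), and that label ids are positions in that sorted list. The helpers `normalize`, `build_vocabulary` and `label_of` are hypothetical, and the sample values are made up.

```python
# Minimal sketch, NOT pnpl's implementation. Assumed: normalization is
# lower() + strip(), the vocabulary is sorted alphabetically, and ids are
# positions in that sorted list.
from typing import Dict, List, Tuple

# (subject, session, task, run, onset, word_str), per the documented convention
Sample = Tuple[str, str, str, str, float, str]


def normalize(word: str) -> str:
    """Lower-case and strip surrounding whitespace (assumed normalization)."""
    return word.lower().strip()


def build_vocabulary(samples: List[Sample]) -> Tuple[List[str], Dict[str, int]]:
    """Return the sorted unique normalized words and a word -> id mapping."""
    words_sorted = sorted({normalize(s[5]) for s in samples})
    word_to_id = {w: i for i, w in enumerate(words_sorted)}
    return words_sorted, word_to_id


def label_of(sample: Sample, word_to_id: Dict[str, int]) -> int:
    """Hypothetical stand-in for get_label: the id of the sample's word."""
    return word_to_id[normalize(sample[5])]


# Illustrative samples from two runs (values are made up).
samples = [
    ("01", "1", "listen", "1", 0.5, " Le"),
    ("01", "1", "listen", "1", 0.75, "petit"),
    ("01", "1", "listen", "2", 1.25, "le "),
]
words, ids = build_vocabulary(samples)
print(words)                      # ['le', 'petit']
print(label_of(samples[2], ids))  # 0
```

Under these assumptions, " Le" and "le " collapse to the same label because both normalize to "le".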

Parameters:

tmin (float) – Start time relative to word onset (seconds). Default 0.0.
tmax (float) – End time relative to word onset (seconds). Default 3.0.
min_word_length (int) – Minimum stripped word length to include. Default 1; set it higher to drop function words and the single letters left by elision in the audiobook tokenization.
max_word_length (int | None) – Maximum stripped word length to include. Default None (no limit).
keep_top_k (int | None) – If set, restrict the label vocabulary to the k most frequent tokens across the requested runs and drop every window whose word falls outside it. Default None. Useful for the d’Ascoli et al. “top-250” evaluation.
_words_sorted (list)
_word_to_id (dict)
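A minimal sketch of how the parameters above might combine when selecting windows. The function `select_windows` is hypothetical and not part of pnpl, and several details are assumptions: word length is measured after lower-casing and stripping, the top-k counts are taken after the length filters, frequency ties are broken by first occurrence (the documented behaviour of `collections.Counter.most_common`), and each window spans `[onset + tmin, onset + tmax]` seconds.

```python
# Minimal sketch, NOT pnpl's implementation. Assumed: lengths are measured
# after lower() + strip(), top-k counts are taken after the length filters,
# and ties are broken by first occurrence (Counter.most_common behaviour).
from collections import Counter
from typing import List, Optional, Tuple

# (subject, session, task, run, onset, word_str)
Sample = Tuple[str, str, str, str, float, str]


def select_windows(
    samples: List[Sample],
    tmin: float = 0.0,
    tmax: float = 3.0,
    min_word_length: int = 1,
    max_word_length: Optional[int] = None,
    keep_top_k: Optional[int] = None,
) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: return (start, end, word) for surviving windows."""
    kept: List[Tuple[float, str]] = []
    for *_, onset, word in samples:
        w = word.lower().strip()
        # Length filters on the normalized word.
        if len(w) < min_word_length:
            continue
        if max_word_length is not None and len(w) > max_word_length:
            continue
        kept.append((onset, w))

    if keep_top_k is not None:
        # Restrict the vocabulary to the k most frequent words and drop
        # every window whose word falls outside it.
        counts = Counter(w for _, w in kept)
        top = {w for w, _ in counts.most_common(keep_top_k)}
        kept = [(onset, w) for onset, w in kept if w in top]

    # Each window spans [onset + tmin, onset + tmax] seconds.
    return [(onset + tmin, onset + tmax, w) for onset, w in kept]


# Illustrative single run (onsets in seconds, values made up).
run = [(1.0, "Le"), (1.5, "petit"), (2.0, "prince"), (2.5, "le"), (3.0, "a")]
samples = [("01", "1", "listen", "1", onset, word) for onset, word in run]

print(select_windows(samples, min_word_length=2, keep_top_k=2))
# [(1.0, 4.0, 'le'), (1.5, 4.5, 'petit'), (2.5, 5.5, 'le')]
```

In this example `min_word_length=2` removes the single-letter token "a", and "prince" loses the tie with "petit" for the second top-k slot only because "petit" occurs first; a different tie-breaking rule would change which windows survive.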

__init__(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)

Parameters:

tmin (float)
tmax (float)
min_word_length (int)
max_word_length (int | None)
keep_top_k (int | None)
_words_sorted (list)
_word_to_id (dict)

Return type:

None

Methods

`__init__`([tmin, tmax, min_word_length, ...])
`collect_samples`(dataset)
`get_label`(sample)

Attributes

`keep_top_k`
`label_info`
`max_word_length`
`min_word_length`
`tmax`
`tmin`
