pnpl.tasks.pallier2025.WordClassification


class pnpl.tasks.pallier2025.WordClassification(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)

Word-onset classification on Pallier 2025 (LittlePrince Listen).

Sample tuples follow the continuous-data convention: (subject, session, task, run, onset, word_str). The label vocabulary is the set of unique words observed across the requested runs (lower-cased, stripped).
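As a rough illustration of the vocabulary described above, the sketch below builds a label mapping from a few sample tuples. It does not call the library: the tuple values, the helper names normalise and label_of, and the choice of alphabetical ordering for integer ids are assumptions made for illustration (the ordering is only suggested by the _words_sorted and _word_to_id field names).

```python
# Hypothetical sample tuples: (subject, session, task, run, onset, word_str).
# All values here are made up for illustration.
samples = [
    ("01", "01", "listen", "01", 0.52, "Le "),
    ("01", "01", "listen", "01", 0.80, "petit"),
    ("01", "01", "listen", "01", 1.31, " Prince"),
    ("01", "01", "listen", "02", 0.40, "le"),
]


def normalise(word):
    """Lower-case a word and strip surrounding whitespace."""
    return word.strip().lower()


# Vocabulary: unique normalised words, sorted so that ids are reproducible.
words_sorted = sorted({normalise(s[-1]) for s in samples})
word_to_id = {word: idx for idx, word in enumerate(words_sorted)}


def label_of(sample):
    """Map a sample tuple to the integer id of its word (assumed scheme)."""
    return word_to_id[normalise(sample[-1])]


labels = [label_of(s) for s in samples]
```

Under these assumptions, "Le " and "le" collapse to the same label because normalisation happens before the vocabulary is built.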

Parameters:
  • tmin (float) – Start time relative to word onset (seconds). Default 0.0.

  • tmax (float) – End time relative to word onset (seconds). Default 3.0.

  • min_word_length (int) – Minimum stripped word length to include. Default 1; set higher to drop function words and single letters that arise from the elided audiobook tokenization.

  • max_word_length (int | None) – Maximum stripped word length to include. Default None (no limit).

  • keep_top_k (int | None) – If set, restrict the label vocabulary to the k most frequent tokens across the requested runs; windows whose word falls outside that vocabulary are dropped. Default None (no restriction). Useful for the d’Ascoli et al. “top-250” evaluation.

  • _words_sorted (list)

  • _word_to_id (dict)
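The combined effect of min_word_length, max_word_length and keep_top_k can be sketched in plain Python. This is a minimal sketch, not the library's implementation: the function name filter_words is hypothetical, and both the order of the two steps (length filter first, then frequency cut) and the tie-breaking among equally frequent words are assumptions.

```python
from collections import Counter


def filter_words(words, min_word_length=1, max_word_length=None, keep_top_k=None):
    """Return the words that survive the length and top-k filters."""
    # Normalise the same way the label vocabulary is described: lower-cased, stripped.
    cleaned = [w.strip().lower() for w in words]

    # Length filter; max_word_length=None means no upper limit.
    kept = [
        w
        for w in cleaned
        if len(w) >= min_word_length
        and (max_word_length is None or len(w) <= max_word_length)
    ]

    # Optional frequency cut: keep only occurrences of the k most frequent words.
    if keep_top_k is not None:
        top = {w for w, _ in Counter(kept).most_common(keep_top_k)}
        kept = [w for w in kept if w in top]
    return kept


words = ["le", "petit", "prince", "l", "le", "prince", "renard", "le"]
result = filter_words(words, min_word_length=2, keep_top_k=2)
```

Here the single-letter token is removed by the length filter, and only occurrences of the two most frequent remaining words are kept.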

__init__(tmin=0.0, tmax=3.0, min_word_length=1, max_word_length=None, keep_top_k=None, _words_sorted=<factory>, _word_to_id=<factory>)
Parameters:
  • tmin (float)

  • tmax (float)

  • min_word_length (int)

  • max_word_length (int | None)

  • keep_top_k (int | None)

  • _words_sorted (list)

  • _word_to_id (dict)

Return type:

None
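To make the role of tmin and tmax concrete, the sketch below converts a word onset into a sample-index window. It is only an illustration: the function name window_bounds and the sfreq argument (sampling frequency in Hz) are hypothetical and not part of the documented signature, and rounding to the nearest sample is an assumption.

```python
def window_bounds(onset, tmin=0.0, tmax=3.0, sfreq=1000.0):
    """Return (start, stop) sample indices for a window around a word onset.

    onset, tmin and tmax are in seconds; sfreq is an assumed sampling
    frequency in Hz. The window covers [onset + tmin, onset + tmax).
    """
    start = int(round((onset + tmin) * sfreq))
    stop = int(round((onset + tmax) * sfreq))
    return start, stop


# Default window: 3 s starting at the word onset.
default_window = window_bounds(1.5)

# A window that starts before onset, at a lower sampling rate.
early_window = window_bounds(2.0, tmin=-0.5, tmax=1.0, sfreq=250.0)
```

With the defaults, every window has the same length, tmax - tmin seconds, regardless of the word's own duration.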

Methods

__init__([tmin, tmax, min_word_length, ...])

collect_samples(dataset)

get_label(sample)

Attributes

keep_top_k

label_info

max_word_length

min_word_length

tmax

tmin