Armeni 2022
pnpl.datasets.Armeni2022 wraps the audiobook-listening MEG dataset
released by Armeni et al. (2022) on the Radboud Data Repository
(DSC_3011085.05_995_v1).
3 subjects × 10 sessions, ~10 hours of audiobook per subject
CTF275 axial gradiometer system, raw .ds directories
Sessions are large (~8 GB raw CTF each)
Single task: compr (comprehension)
Warning
Armeni 2022 is not open access. You need an approved data-sharing agreement with the dataset owner before you can download.
Auth
export RADBOUD_USERNAME="you@orcid.org" # often an ORCID
export RADBOUD_PASSWORD="..."
pnpl reads these from the environment (or a project-local .env)
and uses HTTP Basic auth against the Radboud WebDAV endpoint.
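To make the auth step concrete, here is a minimal sketch of how the two environment variables could be turned into an HTTP Basic `Authorization` header. The helper names `basic_auth_header` and `credentials_from_env` are hypothetical and are not part of pnpl; only the variable names come from the text above.

```python
import base64
import os


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def credentials_from_env(env=os.environ) -> tuple[str, str]:
    """Read both variables (hypothetical helper); fail early if either is unset."""
    names = ("RADBOUD_USERNAME", "RADBOUD_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Checking for missing credentials before constructing the dataset surfaces a configuration problem immediately rather than partway through a multi-GB download.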
Quickstart
from pnpl.datasets import Armeni2022
from pnpl.tasks.armeni2022 import PhonemeClassification

ds = Armeni2022(
    data_path="./data/armeni",
    task=PhonemeClassification(tmin=-0.2, tmax=0.6, label_type="phoneme"),
    include_subjects=["001"],
    include_sessions=["001"],
    include_tasks=["compr"],
    preprocessing="notch+bp+ds",
    download=True,
    standardize=True,
)

x, y = ds[0]
print(x.shape, y.item())
The first construction downloads the requested CTF .ds directory
(every chunked binary inside it), runs the preprocessing pipeline, and
caches the result as H5. Subsequent constructions read directly from
the cached H5.
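The build-once, read-afterwards behaviour described above follows a common caching pattern. The sketch below illustrates that pattern only: it uses a JSON file in place of H5, and `load_or_build`, `expensive_build`, and the file name are hypothetical stand-ins, not pnpl internals.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_file: Path, build):
    """Return the cached result if present; otherwise build it once and store it."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = build()  # stands in for download + preprocessing
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result


calls = []  # records how many times the expensive step actually runs


def expensive_build():
    calls.append(1)
    return {"n_samples": 3}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "derivatives" / "sub-001_ses-001.json"  # hypothetical name
    first = load_or_build(cache, expensive_build)   # builds and writes the cache
    second = load_or_build(cache, expensive_build)  # served from the cache
```

In this sketch the second call never invokes `expensive_build`, which is the property that makes repeat constructions cheap.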
Note
Because raw CTF sessions are multi-GB, the Armeni loader runs the pipeline chunk-by-chunk (default 120-second chunks) so peak memory stays bounded. Reference / EEG / EOG / STIM channels are dropped before filtering — only MEG channels are preserved.
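As a rough illustration of chunked processing, the sketch below computes half-open sample ranges for fixed-length chunks. The function name `chunk_bounds` and the 1200 Hz sampling rate in the example are assumptions for illustration; only the 120-second default comes from the note above. It does not show how pnpl handles filter behaviour at chunk boundaries.

```python
from typing import Iterator, Tuple


def chunk_bounds(
    n_samples: int, sfreq: float, chunk_seconds: float = 120.0
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) sample ranges covering the recording."""
    step = int(round(chunk_seconds * sfreq))
    if step <= 0:
        raise ValueError("a chunk must span at least one sample")
    for start in range(0, n_samples, step):
        # The last chunk is shorter when the recording is not a multiple of step.
        yield start, min(start + step, n_samples)


# Example: a hypothetical 5-minute recording sampled at 1200 Hz (assumed rate).
bounds = list(chunk_bounds(n_samples=5 * 60 * 1200, sfreq=1200.0))
```

Only one chunk needs to be held in memory at a time, so peak usage scales with the chunk length rather than with the session length.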
BIDS axes
| Axis | Values |
|---|---|
| | |
| | |
| | |
| | always |
Selected arguments
task: currently pnpl.tasks.armeni2022.PhonemeClassification.
preprocessing: pipeline string used in derivative filenames (e.g. "notch+bp+ds"). Set to None to materialize H5 unchanged from the raw CTF.
preprocessing_config: per-step overrides forwarded to pnpl.preprocessing.Pipeline. See Preprocessing.
include_subjects, include_sessions, include_tasks, include_run_keys (and their exclude_* counterparts).
standardize, clipping_boundary, channel_means, channel_stds.
create_h5_if_missing (default True).
preload_h5 (default False): read the H5 fully into RAM on first access. Faster repeat reads at the cost of memory.
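The include_* and exclude_* arguments are listed above without their exact semantics. One plausible reading, sketched here purely as an assumption, is that a run is kept when it matches every include list that was given and no exclude list. The `(subject, session, task, run)` tuple layout, the run value "1", and the helpers `keep` and `select_runs` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed layout: (subject, session, task, run). Not confirmed by the docs above.
RunKey = Tuple[str, str, str, str]


def keep(value: str, include: Optional[Iterable[str]], exclude: Optional[Iterable[str]]) -> bool:
    """True if value passes the include list (when given) and is not excluded."""
    if include is not None and value not in include:
        return False
    return not (exclude is not None and value in exclude)


def select_runs(
    run_keys: Iterable[RunKey],
    include_subjects=None,
    exclude_subjects=None,
    include_sessions=None,
    exclude_sessions=None,
) -> List[RunKey]:
    """Filter run keys on the subject and session axes (hypothetical helper)."""
    return [
        key
        for key in run_keys
        if keep(key[0], include_subjects, exclude_subjects)
        and keep(key[1], include_sessions, exclude_sessions)
    ]


keys = [
    ("001", "001", "compr", "1"),
    ("001", "002", "compr", "1"),
    ("002", "001", "compr", "1"),
]
selected = select_runs(keys, include_subjects=["001"], exclude_sessions=["002"])
```

Under this reading, leaving an include list as None means "no restriction on that axis", which differs from passing an empty list (nothing matches).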