research_augmentation_EEG/Organizing dataset.md
2025-06-16 14:15:05 +02:00


This document describes how we work with datasets. The basic idea is to survey how others handle them and implement the same practices in our pipeline.

Our approach is as follows: train from scratch on as large a mix of datasets as possible, then fine-tune on a benchmark dataset and evaluate on it.

This document covers the datasets used for training and evaluation and how they are used.

Unify datasets

Many decisions remain to be made:

  • Convert each dataset from its arbitrary source format to mne.io.Raw.
  • Resample to a common frequency. Common frequency: TODO.
  • Select only the relevant channels (probes). Selected common channels: TODO.
  • Fill missing channels with 0.
  • Normalize values to a single interval.
  • Bandpass filtering: the parameters still have to be decided.
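The channel selection, zero filling and normalization steps could be sketched as follows. This is only a rough illustration, assuming the recording is already available as a NumPy array of shape (n_channels, n_samples), for example from `raw.get_data()` and `raw.ch_names` on an mne.io.Raw object. The channel list `COMMON_CHANNELS` and both helper functions are hypothetical placeholders, since the common channel set is still a TODO. Resampling and bandpass filtering are left out because their parameters are not decided yet.

```python
import numpy as np

# Hypothetical target channel set; the real list is still a TODO in this document.
COMMON_CHANNELS = ["C3", "Cz", "C4"]


def align_channels(data, ch_names, target=COMMON_CHANNELS):
    """Reorder channels to `target`; channels missing from the recording stay 0."""
    index = {name: i for i, name in enumerate(ch_names)}
    out = np.zeros((len(target), data.shape[1]), dtype=float)
    for row, name in enumerate(target):
        if name in index:
            out[row] = data[index[name]]
    return out


def min_max_normalize(data, low=0.0, high=1.0):
    """Scale the whole array into [low, high]; a constant array maps to `low`."""
    d_min, d_max = data.min(), data.max()
    if d_max == d_min:
        return np.full_like(data, low, dtype=float)
    return low + (data - d_min) * (high - low) / (d_max - d_min)


# Toy recording: channels arrive in a different order and "Cz" is missing.
recording = np.array([[4.0, 2.0, 0.0], [-2.0, 1.0, 3.0]])
unified = align_channels(recording, ["C4", "C3"])
normalized = min_max_normalize(unified)
```

Note that with a global min-max the zero-filled channel no longer sits at 0 after normalization, so the order of filling and normalizing is one more decision to make.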

How others do it

Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial-Temporal Generating Adversarial Network

This paper uses a GAN to augment data and trains an identity recognition model on it. More details are in Augmentation methods. The authors used BCI Competition IV dataset 2A. This dataset contains EEG recorded during motor imagery tasks involving left hand, right hand, both feet, and tongue movements performed by 9 subjects. Each subject performed 72 trials of each of the 4 tasks during a single experiment, and each motor imagery trial lasted 3 s. The EEG data were recorded using 22 Ag/AgCl electrodes at a sampling frequency of 250 Hz and were bandpass filtered between 0.5 and 100 Hz.

(Figure: Image_comp4_2A)

Furthermore, the authors applied a 50 Hz notch filter to suppress line noise and excluded the three channels recording eye movement. For each individual's EEG data, a third-order Butterworth IIR filter was applied in the 4 to 40 Hz frequency band to reduce the influence of eye movements. Subsequently, the data were min-max normalized to the range [0, 1]. The dataset was divided into training and testing sets in a 4:1 ratio, with each individual's training set consisting of 864 samples.
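The preprocessing chain described for this paper (50 Hz notch, third-order Butterworth filter for 4 to 40 Hz, min-max normalization, 4:1 split) might look roughly like this with SciPy. It is a sketch, not the authors' code: zero-phase filtering, the notch quality factor `Q=30`, normalizing per trial, and shuffling before the split are all assumptions, and the function names are made up for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 250.0  # sampling frequency reported for the dataset, in Hz


def preprocess_trial(trial, fs=FS):
    """Notch at 50 Hz, Butterworth band-pass 4-40 Hz, then min-max to [0, 1].

    `trial` has shape (n_channels, n_samples). Zero-phase filtering, Q=30 and
    per-trial normalization are assumptions of this sketch.
    """
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = filtfilt(b, a, trial, axis=-1)
    # N=3 is the Butterworth order passed to SciPy for the band-pass design.
    sos = butter(3, [4.0, 40.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x, axis=-1)
    return (x - x.min()) / (x.max() - x.min())


def split_4_to_1(samples, rng):
    """Shuffle (assumed) and split samples 4:1 into training and testing sets."""
    order = rng.permutation(len(samples))
    cut = int(round(0.8 * len(samples)))
    return samples[order[:cut]], samples[order[cut:]]


rng = np.random.default_rng(0)
# Toy data: 10 random trials, 22 channels, 3 s at 250 Hz = 750 samples.
trials = rng.standard_normal((10, 22, 750))
clean = np.stack([preprocess_trial(t) for t in trials])
train, test = split_4_to_1(clean, rng)
```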

Augmentation methods#Generative Adversarial Networks-Based Data Augmentation for Brain-Computer Interface (2020)

Evaluation on their own dataset:

  • Leave one subject out: train on all subjects except one, then test on that one.
  • Adaptive training: train on all other subjects plus half of the data of one subject, then test on the second half of that subject's data.

Evaluation on BCI Competition III dataset IVa: data down-sampled to 100 Hz. Only generalizability is tested, using adaptive training with and without augmented data.
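The two evaluation schemes can be expressed as plain split functions. The sketch below assumes the data is stored as a dictionary mapping subject id to a list of trials. That layout, the function names, and taking the first half of the target subject's trials for training (rather than a random half) are assumptions made for illustration.

```python
def leave_one_subject_out(data_by_subject, held_out):
    """Train on every subject except `held_out`; test on `held_out` only."""
    train = [
        trial
        for subject, trials in data_by_subject.items()
        if subject != held_out
        for trial in trials
    ]
    test = list(data_by_subject[held_out])
    return train, test


def adaptive_split(data_by_subject, target):
    """Train on all other subjects plus the first half of `target`'s trials;
    test on the second half of `target`'s trials."""
    trials = list(data_by_subject[target])
    half = len(trials) // 2
    train, _ = leave_one_subject_out(data_by_subject, target)
    return train + trials[:half], trials[half:]


# Toy data: three hypothetical subjects with four trial labels each.
data = {s: [f"{s}_t{i}" for i in range(4)] for s in ("s1", "s2", "s3")}
loso_train, loso_test = leave_one_subject_out(data, "s3")
adapt_train, adapt_test = adaptive_split(data, "s3")
```

In both schemes the test trials never appear in the training set, which is the property worth checking when these splits are wired into the pipeline.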

Augmentation methods#Augmenting The Size of EEG datasets Using Generative Adversarial Networks (2018)

  • Evaluation using 5-fold cross-validation on the PhysioNet dataset against autoencoders and VAE, with reconstruction error as the metric.
  • Assessing the impact of RGAN with different classification models: classification accuracy is evaluated on a deep feed-forward NN, an SVM, and a random forest.
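For reference, 5-fold cross-validation can be sketched with the standard library alone. The helper names are hypothetical, and using mean squared error as the reconstruction error is an assumption, since the exact form of the metric is not recorded here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def reconstruction_error(original, reconstructed):
    """Mean squared error between two equal-length signals (assumed metric)."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return sum(squared) / len(squared)


splits = list(k_fold_indices(23, k=5))
error = reconstruction_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Each fold's error would then be averaged over the five test folds to compare the generative models.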

Augmentation methods#Data augmentation strategies for EEG-based motor imagery decoding (2022)

Used datasets: