Added augmentation methods
This document is about how to work with datasets. The basic idea is to research how others do it and implement it in our pipeline.
Our approach will be as follows: train from scratch on as large a mix of datasets as possible, then fine-tune on some benchmark dataset and evaluate on it.
This document describes the datasets used for training and evaluation, and how each is used.
## Unify datasets
Many decisions still need to be made.
Convert each dataset from its original format to `mne.io.Raw`. Resample to a common frequency and select only the relevant channels.
Common frequency: TODO.
Selected common channels: TODO. Missing channels will be filled with 0.
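
The zero-filling could look like this with plain NumPy (channel names are illustrative):

```python
import numpy as np

COMMON_CHANNELS = ["C3", "Cz", "C4"]

def fill_missing_channels(data, ch_names):
    """Map (n_present, n_samples) data onto the common channel layout,
    zero-filling any common channel the recording does not have."""
    out = np.zeros((len(COMMON_CHANNELS), data.shape[1]))
    for i, ch in enumerate(COMMON_CHANNELS):
        if ch in ch_names:
            out[i] = data[ch_names.index(ch)]
    return out

# Recording with only C3 and C4 -> the Cz row stays zero.
filled = fill_missing_channels(np.ones((2, 5)), ["C3", "C4"])
```
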
Normalize values to a common interval.
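
One candidate is per-channel min-max scaling to [0, 1], which is also the normalization the GAN paper below uses:

```python
import numpy as np

def min_max_normalize(data):
    """Scale each channel (row) independently to the [0, 1] interval."""
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    return (data - lo) / (hi - lo + 1e-12)  # epsilon guards flat channels

normalized = min_max_normalize(np.array([[0.0, 5.0, 10.0]]))
```
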
Bandpass filtering: what should the parameters be?
## How others do it
### [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]
This paper uses a GAN to augment data and trains something like a brain-identification model on it.
More info on that in [[Augmentation methods]].
They used the BCI Competition IV dataset 2a.
This dataset records EEG data during motor imagery tasks involving left hand, right hand, both feet, and tongue movements performed by 9 subjects. Each subject performed 72 trials of each of the 4 tasks during a single experiment, and each motor imagery trial lasted for 3 s. The EEG data were recorded using 22 Ag/AgCl electrodes at a sampling frequency of 250 Hz and were bandpass filtered between 0.5 and 100 Hz.
Furthermore, the authors used a 50 Hz notch filter to suppress line noise and excluded three channels recording eye movement.
For each individual’s EEG data, a third-order Butterworth IIR filter was applied in the 4–40 Hz frequency band to reduce the influence of eye movements.
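
An equivalent filter can be built with SciPy (assuming the dataset's 250 Hz sampling rate; the zero-phase forward-backward application via `sosfiltfilt` is one common choice, not stated in the paper):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250.0  # sampling frequency of BCI Competition IV 2a

# Third-order Butterworth bandpass, 4-40 Hz, as described in the paper.
sos = butter(3, [4.0, 40.0], btype="bandpass", fs=FS, output="sos")

# Toy signal: 1 Hz drift (outside the band) plus a 10 Hz component (inside).
t = np.arange(0, 2.0, 1.0 / FS)
signal = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 10.0 * t)
filtered = sosfiltfilt(sos, signal)  # drift removed, 10 Hz component kept
```
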
Subsequently, the data were min-max normalized to the range [0, 1].
**The dataset was divided into training and testing sets in a 4:1 ratio, with each individual’s training set consisting of 864 samples.**
### [[Augmentation methods#Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface(2020)]]
**Evaluation using their own dataset**:
Leave-one-subject-out: train on all subjects except one, then test on that one.
Adaptive training: train on all other subjects plus half of one subject's data; test on the second half of that subject's data.
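
Both evaluation schemes can be sketched as index bookkeeping over a toy dataset (subject and trial counts here are made up):

```python
import numpy as np

# Toy layout: 3 subjects, 10 trial indices each.
subjects = {s: np.arange(s * 10, s * 10 + 10) for s in range(3)}

def leave_one_subject_out(test_subject):
    """Train on every subject except one, test on that one."""
    train = np.concatenate(
        [idx for s, idx in subjects.items() if s != test_subject])
    return train, subjects[test_subject]

def adaptive_split(target_subject):
    """Train on all other subjects plus the first half of the target
    subject's trials; test on the second half."""
    half = len(subjects[target_subject]) // 2
    train = np.concatenate(
        [idx for s, idx in subjects.items() if s != target_subject]
        + [subjects[target_subject][:half]])
    return train, subjects[target_subject][half:]

loso_train, loso_test = leave_one_subject_out(0)
ad_train, ad_test = adaptive_split(0)
```
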
**Evaluation on BCI Competition III dataset IV a**:
Down-sampled to 100 Hz. Only generalizability is tested, using adaptive training with and without augmented data.