This document describes how to work with datasets. The basic idea is to survey how others handle their data and implement the same steps in our pipeline.

Our approach is as follows: train from scratch on as large a mix of datasets as possible, then fine-tune on a benchmark dataset and evaluate on it.

The sections below cover the datasets used for training and evaluation and how each is used.

## Unify datasets

Many decisions remain to be made.

Convert each dataset from its native format to `mne.io.Raw`, resample to a common frequency, and select only the relevant channels.

Common frequency: TODO.

Selected common channels: TODO. Missing channels will be filled with 0.

Normalize values to a common interval.

Bandpass filtering: what should the parameters be?
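
A minimal sketch of these unification steps on a plain array (skipping the `mne.io.Raw` conversion; in MNE the same steps map roughly onto `raw.resample` and `raw.filter`). The 250 Hz frequency, the three-channel list, the 1–40 Hz band, and the [-1, 1] interval are placeholders, since those choices are still TODO:

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

# Placeholder choices: the common frequency, channel list, band edges and
# normalization interval are all still open questions (TODO above).
COMMON_SFREQ = 250.0
COMMON_CHANNELS = ["C3", "Cz", "C4"]

def unify(data, ch_names, sfreq, band=(1.0, 40.0)):
    """Bring one recording (n_channels x n_samples) into the common format."""
    # 1. Resample to the common frequency.
    data = resample_poly(data, int(COMMON_SFREQ), int(sfreq), axis=1)
    # 2. Keep only the common channels; missing channels are filled with 0.
    out = np.zeros((len(COMMON_CHANNELS), data.shape[1]))
    for i, ch in enumerate(COMMON_CHANNELS):
        if ch in ch_names:
            out[i] = data[ch_names.index(ch)]
    # 3. Bandpass filter (zero-phase Butterworth).
    b, a = butter(4, band, btype="bandpass", fs=COMMON_SFREQ)
    out = filtfilt(b, a, out, axis=1)
    # 4. Scale each channel into [-1, 1]; flat (all-zero) channels stay 0.
    peak = np.abs(out).max(axis=1, keepdims=True)
    return out / np.where(peak == 0.0, 1.0, peak)
```

Zero-filling missing channels before filtering is harmless here because the filter maps zeros to zeros.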

## How others do it

### [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

This paper uses a GAN to augment EEG data and trains an identity-recognition model on it.

More info on that in [[Augmentation methods]].

They used BCI Competition IV dataset 2a.

This dataset records EEG data during motor imagery tasks involving left hand, right hand, both feet, and tongue movements performed by 9 subjects. Each subject performed 72 trials of each of the 4 tasks during a single experiment, and each motor imagery trial lasted for 3 s. The EEG data were recorded using 22 Ag/AgCl electrodes at a sampling frequency of 250 Hz and were bandpass filtered between 0.5 and 100 Hz.

Furthermore, the authors applied a 50 Hz notch filter to suppress line noise and excluded the three channels recording eye movements.

For each individual’s EEG data, a third-order Butterworth IIR filter was applied in the 4–40 Hz frequency band to reduce the influence of eye movements.

Subsequently, the data were min-max normalized to the range [0, 1].
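
Their preprocessing chain is simple enough to sketch directly (the function name is mine; the paper does not say whether filtering was forward-only, so the zero-phase `filtfilt` is an assumption):

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 250.0  # sampling rate of BCI Competition IV dataset 2a

def preprocess(data):
    """data: (n_channels, n_samples) EEG array sampled at 250 Hz."""
    # 50 Hz notch filter to suppress power-line noise.
    b, a = iirnotch(50.0, 30.0, fs=FS)
    data = filtfilt(b, a, data, axis=1)
    # Third-order Butterworth bandpass, 4-40 Hz.
    b, a = butter(3, (4.0, 40.0), btype="bandpass", fs=FS)
    data = filtfilt(b, a, data, axis=1)
    # Min-max normalize each channel to [0, 1].
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    span = np.where(hi - lo == 0.0, 1.0, hi - lo)
    return (data - lo) / span
```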

**The dataset was divided into training and testing sets in a 4:1 ratio, with each individual’s training set consisting of 864 samples.**

### [[Augmentation methods#Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface(2020)]]

**Evaluation using their own dataset**:

Leave-one-subject-out: train on all subjects except one, then test on that one.

Adaptive training: train on all subjects plus half of one subject's data, then test on the second half of that subject's data.
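
Both schemes are just index bookkeeping; a sketch with hypothetical function names:

```python
def loso_splits(subjects):
    """Leave-one-subject-out: train on everyone but the held-out subject."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out

def adaptive_split(other_subjects_data, subject_trials):
    """Adaptive training: all other subjects' data plus the first half of the
    target subject's trials; the second half is the test set."""
    half = len(subject_trials) // 2
    return other_subjects_data + subject_trials[:half], subject_trials[half:]
```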

**Evaluation on BCI Competition III dataset IVa**:

Down-sampled to 100 Hz. Used only to test generalizability, via the adaptive-training scheme with and without augmented data.

### [[Augmentation methods#Augmenting The Size of EEG datasets Using Generative Adversarial Networks (2018)]]

* Evaluation using 5-fold cross-validation on the PhysioNet dataset against autoencoders and VAEs, with reconstruction error as the metric.
* Assessing the impact of the RGAN with different classification models: evaluating classification accuracy with a deep feed-forward NN, an SVM, and a random forest.
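
The 5-fold evaluation above amounts to partitioning trial indices; a minimal sketch (the function name is mine, and in practice `sklearn.model_selection.KFold` does the same thing):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]
```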

### [[Augmentation methods#Data augmentation strategies for EEG-based motor imagery decoding (2022)]]

Used datasets:

* https://academic.oup.com/gigascience/article/6/7/gix034/3796323
* https://www.nature.com/articles/sdata2018211

For now I don't know where to get the raw data for these datasets.

Data processing:

* Bandpass filter 1-40 Hz
* Baseline correction using the first 200 ms pre-cue: subtract the average of the EEG signal before the cue.
* Artifact correction for ocular (EOG) and muscular (EMG) artifacts, with slightly different parameters for each dataset.
* Re-referencing to the common average to improve the signal-to-noise ratio: the signal at each channel is re-referenced to the average signal across all electrodes.
* Used [[Papers#Autoreject Automated artifact rejection for MEG and EEG data]]
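
Two of the listed steps, baseline correction and average re-referencing, reduce to mean subtractions; a sketch assuming a 250 Hz rate (the two datasets actually differ, so this is a placeholder):

```python
import numpy as np

FS = 250.0  # assumed sampling rate; the two datasets differ

def baseline_and_rereference(epoch, pre_cue_ms=200):
    """epoch: (n_channels, n_samples), cue at sample int(FS * pre_cue_ms / 1000)."""
    n_pre = int(FS * pre_cue_ms / 1000)
    # Baseline correction: subtract each channel's pre-cue average.
    epoch = epoch - epoch[:, :n_pre].mean(axis=1, keepdims=True)
    # Average re-reference: subtract the mean across channels at every sample.
    return epoch - epoch.mean(axis=0, keepdims=True)
```

Note the order: re-referencing after baseline correction still leaves each channel's pre-cue mean at zero, since the subtracted cross-channel average itself has zero pre-cue mean.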

The dataset was split 70:12:18 into train:validation:test.
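
The split itself is simple index arithmetic; whether the paper shuffled or stratified per subject is not stated, so the seeded shuffle here is an assumption:

```python
import numpy as np

def split_indices(n, ratios=(0.70, 0.12, 0.18), seed=0):
    """Shuffle n sample indices and cut them into train/val/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    # The test set takes whatever remains, so the three parts cover everything.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```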