Added augmentation methods
@ -50,4 +50,56 @@ Comparing
* Random erasing - replaces random patches of the image with Gaussian noise

GAN architecture was used to generate synthetic images.

Not exactly an EEG paper, but the methods can be used for BCI, I think.
## EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network (2024)
https://www.mdpi.com/2079-9292/13/21/4310

The authors propose an end-to-end EEG data augmentation method based on a spatial-temporal generative adversarial network (STGAN). The discriminator uses a temporal feature encoder (TODO: learn more about it [here](https://medium.com/@raphael.schoenenberger_95380/encoding-temporal-features-part-1-f26d08feebd8)) and a spatial feature encoder in parallel, which is good for capturing global dependencies across EEG channels and time. The GAN improves the quality and diversity of the augmented EEG data. Experiments were conducted on the BCI-IV2A dataset, and the Fréchet inception distance was used to evaluate data quality. Compared on the deep learning models EEGNet, ShallowConvNet and DeepConvNet, the STGAN approach showed better recognition performance and higher data quality. Worth exploring more; maybe use it in the pilot pipeline.

Good references to other papers about data augmentation.

TODO: very interesting, worth reading again.
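As a reminder of what the Fréchet inception distance mentioned above actually computes: it is the Fréchet distance between Gaussians fitted to feature vectors of real and generated data. A minimal sketch (the feature-extractor network that would normally produce these vectors is omitted, and the toy feature sets here are made up):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    # Frechet distance between Gaussians fitted to two feature sets:
    # ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2 * (Ca @ Cb)^(1/2))
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # sqrtm can return a tiny imaginary part
        covmean = covmean.real
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))        # made-up "real" feature vectors
fake = rng.standard_normal((500, 8)) + 0.5  # shifted distribution -> larger FID
```

Identical feature sets give a distance near zero; the further apart the two distributions, the larger the value.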
## Augmenting The Size of EEG datasets Using Generative Adversarial Networks (2018)
Explained here: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

Link: https://ieeexplore.ieee.org/abstract/document/8489727
## Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface (2020)
Explained here: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

Link: https://ieeexplore.ieee.org/abstract/document/9177281
Data augmentation using a deep convolutional GAN (DCGAN); the improvement was then tested on a BCI classifier.

How they do it:

Real EEG data are collected from 14 subjects performing a movement-intention task. The signal is then segmented into 2-second windows representing either task or rest. The resulting dataset has shape 80x500x62:

* 80 samples: 40 MI-task and 40 rest windows
* 500 time points per sample: a 2-second window at a 250 Hz sampling rate
* 62 EEG channels

For each subject, a **subject-specific feature vector** is extracted from the first half (40) of the samples, 20 MI and 20 rest. The feature vector has shape 1x100. It is then fed into the generator to generate EEG resembling the target subject.

**Then the DCGAN comes in.** The generator takes random noise plus the **subject-specific conditional vector** and generates one sample of shape 500x62. The discriminator takes real or generated EEG together with the same conditional vector and decides whether the input is real or fake. Training follows the standard GAN procedure.
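A shape-level sketch of this conditional setup, using small fully connected networks as a stand-in for the paper's convolutional DCGAN (the noise dimension and layer sizes are assumptions; only the 1x100 conditional vector and the 500x62 output shape come from the paper):

```python
import torch
import torch.nn as nn

T, C = 500, 62          # time points and channels of one sample (from the paper)
NOISE, COND = 100, 100  # conditional vector is 1x100; noise size is an assumption

class Generator(nn.Module):
    # noise + subject-specific conditional vector -> one 500x62 "EEG" sample
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE + COND, 256), nn.ReLU(),
            nn.Linear(256, T * C), nn.Tanh(),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1)).view(-1, T, C)

class Discriminator(nn.Module):
    # (real or fake) sample + the same conditional vector -> P(input is real)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(T * C + COND, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, cond):
        return self.net(torch.cat([x.flatten(1), cond], dim=1))

G, D = Generator(), Discriminator()
z = torch.randn(4, NOISE)
cond = torch.randn(4, COND)  # stand-in for the extracted subject vector
fake = G(z, cond)            # (4, 500, 62)
score = D(fake, cond)        # (4, 1), values in (0, 1)
```

Training would alternate the usual discriminator/generator losses; the point here is only how the conditional vector enters both networks.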
**Once the generator is trained**, it can produce an arbitrary amount of artificial data for each subject from noise. In the paper, the authors mix real and generated data to build the dataset for classifier training.

The dataset was obtained in house by the authors. They conducted two types of EEG data gathering: one with the subjects fully concentrated, the other closer to a real-life scenario where the subjects' attention was diverted by pauses to count beeps of a certain frequency.

The EEG signals were recorded using a g.HIamp-research amplifier and 62 gel-based active electrodes placed in a g.Gamma cap. The recorded EEG channels were Fp1, Fp2, Fpz, AF3, AF4, AF7, AF8, F1-8, Fz, FC1-6, FCz, FT7, FT8, C1-6, Cz, T7, T8, CP1-6, CPz, TP7-10, P1-8, Pz, PO3, PO4, PO7, PO8, POz, O1, O2, and Oz, referenced to the right earlobe. The AFz channel was used as the ground, and the recording frequency was 1200 Hz.

Data were bandpass filtered at 0.01-100 Hz with a 50 Hz notch filter. Independent component analysis (ICA) and artifact subspace reconstruction (ASR) were applied to remove electrooculogram (EOG) and EMG artifacts, and the data were high-pass filtered with a 0.5 Hz cutoff. These methods were implemented in MATLAB R2013b using EEGLab. The data were segmented into 2 s epochs and then down-sampled to 250 Hz.
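The authors used MATLAB/EEGLab, but the same pipeline (minus the ICA/ASR step) can be sketched in Python with SciPy. The filter orders are assumptions, and the 0.01-100 Hz band-pass plus the later 0.5 Hz high-pass are collapsed into a single 0.5-100 Hz band-pass for simplicity:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, resample_poly

fs = 1200                          # recording frequency from the paper
x = np.random.randn(62, fs * 10)   # fake 10 s of 62-channel EEG

# Band-pass 0.5-100 Hz (4th order is an assumption).
b, a = butter(4, [0.5, 100], btype="bandpass", fs=fs)
x = filtfilt(b, a, x, axis=1)

# 50 Hz notch for line noise.
bn, an = iirnotch(50, Q=30, fs=fs)
x = filtfilt(bn, an, x, axis=1)

# Down-sample 1200 Hz -> 250 Hz (ICA/ASR artifact removal omitted here).
x = resample_poly(x, up=250, down=1200, axis=1)

# Segment into non-overlapping 2 s epochs: (n_epochs, channels, 500 samples).
win = 2 * 250
n = x.shape[1] // win
epochs = x[:, :n * win].reshape(62, n, win).transpose(1, 0, 2)
```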
**Evaluation using their own dataset**:

Leave-one-subject-out: train on all subjects except one, then test on that one.

Adaptive training: train on all subjects plus half of one subject's data, then test on the second half of that subject's data.

**Evaluation on BCI Competition III dataset IVa**:

Down-sampled to 100 Hz. Only tests generalizability, using adaptive training with and without augmented data.
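Both protocols boil down to index splits; a sketch assuming 80 samples per subject, as in the shapes above:

```python
import numpy as np

n_subjects = 14
subject = np.repeat(np.arange(n_subjects), 80)  # 80 samples per subject

def loso_splits(subject_ids):
    # Leave-one-subject-out: each subject is the test set exactly once.
    for s in np.unique(subject_ids):
        yield np.where(subject_ids != s)[0], np.where(subject_ids == s)[0]

def adaptive_split(subject_ids, target):
    # Adaptive: all other subjects plus the first half of the target
    # subject's data for training; the second half for testing.
    tgt = np.where(subject_ids == target)[0]
    half = len(tgt) // 2
    train = np.concatenate([np.where(subject_ids != target)[0], tgt[:half]])
    return train, tgt[half:]

train_idx, test_idx = adaptive_split(subject, target=0)  # 1080 train, 40 test
```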
**What we can do**: extract something like an action-specific feature vector and feed it to the generator along with noise, so it generates new data specific to each movement.

Extracting that specific vector could be done (just thinking out loud for now) with some clustering, or with a method that finds discriminative features given the labels.
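One cheap baseline for such a vector: average a simple per-channel feature over all samples of one class. Everything here (log-variance features, zero-padding to 1x100) is an illustrative assumption, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 62, 500))  # samples x channels x time, as above
y = np.repeat([0, 1], 40)               # 0 = movement, 1 = rest (hypothetical)

# Log-variance per channel as a cheap stand-in for a learned feature extractor.
feats = np.log(X.var(axis=2))           # (80, 62)

def class_vector(feats, y, label, dim=100):
    # Mean feature of one class, zero-padded to the 1x100 size the DCGAN
    # paper uses for its conditional vector (the padding is purely illustrative).
    v = feats[y == label].mean(axis=0)
    return np.pad(v, (0, dim - v.size))

movement_vec = class_vector(feats, y, label=0)  # shape (100,)
```

A learned alternative (clustering, LDA-style discriminative features) would slot into `class_vector` without changing the generator interface.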

This is taken from: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

**Papers to cite from:**
About motion intention:

E. Lew, R. Chavarriaga, S. Silvoni, and J. D. R. Millán, “Detection of self-paced reaching movement intention from EEG signals,” Frontiers Neuroeng., vol. 5, p. 13, Jul. 2012.

M. Jahanshahi and M. Hallett, The Bereitschaftspotential: Movement-Related Cortical Potentials. Boston, MA, USA: Springer, 2003.
15
Ideas on how to do stuff.md
Normal file
@ -0,0 +1,15 @@
This document collects my ideas about what we can do, based on what I've read in papers or elsewhere on the internet.

## Augmentation

### Use GAN

A GAN generates new data samples. The generator is trained alongside a discriminator; after training we have a generator capable of producing new data. Strictly speaking, this is not augmentation of a dataset, it is creating a whole new dataset.

The generator still needs some input, though, and the question is what that input should be.

* One option is to use labels with some random noise as input.
* This paper shows a different approach: [[Augmentation methods#Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface(2020)]]. Based on it, I propose a method where we extract something like a **movement-specific feature vector** which, together with noise, would be the input to the generator.

### Contrastive learning

YouTube video with an explanation: https://www.youtube.com/watch?v=UqJauYELn6c
## Feature Extraction

I think we should try variational autoencoders and some novel architecture like VQ-VAE (vector quantization should give the classifier at the end a narrower space to work with).
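The core of VQ-VAE's narrower space is the quantization step: every encoder output is snapped to its nearest codebook entry, so the classifier downstream only ever sees one of K discrete codes. A numpy sketch of just that step (codebook size and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                        # codebook size and latent dim (arbitrary)
codebook = rng.standard_normal((K, D))
z = rng.standard_normal((16, D))   # pretend encoder outputs for 16 inputs

# Snap each latent vector to its nearest codebook entry (squared L2 distance).
dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (16, K)
codes = dists.argmin(axis=1)       # discrete indices in [0, K)
z_q = codebook[codes]              # quantized latents fed to the decoder
```

In a full VQ-VAE the argmin is non-differentiable, so training uses a straight-through gradient plus codebook/commitment losses; the sketch only shows the lookup itself.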
37
Organizing dataset.md
Normal file
@ -0,0 +1,37 @@
This document is about how to work with datasets. The basic idea is to research how others do it and implement it in our pipeline.

Our approach will be as follows: train from scratch on as large a mix of datasets as possible, then fine-tune on some benchmark dataset and evaluate on it.

This document also covers the datasets used for training and evaluation and how they are used.

## Unify datasets

Many decisions to be made.
Convert an arbitrary dataset format to `mne.io.Raw`. Resample to a common frequency and select only the relevant channels.

Common frequency: TODO.

Selected common channels: TODO. Missing channels will be filled with 0.

Normalize values to a common interval.

Bandpass filtering: what should the parameters be?
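A minimal sketch of the unification step with numpy/SciPy instead of MNE (the target frequency and channel list are placeholders, matching the TODOs above):

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_FS = 250                              # common frequency - placeholder, still TODO
TARGET_CH = ["Fz", "Cz", "Pz", "C3", "C4"]   # common montage - placeholder, still TODO

def unify(data, ch_names, fs):
    # Resample to TARGET_FS, reorder to TARGET_CH, zero-fill missing channels.
    data = resample_poly(data, up=TARGET_FS, down=fs, axis=1)
    out = np.zeros((len(TARGET_CH), data.shape[1]))
    for i, ch in enumerate(TARGET_CH):
        if ch in ch_names:
            out[i] = data[ch_names.index(ch)]
    return out

x = np.random.randn(3, 1000)              # 3 channels, 2 s at 500 Hz
u = unify(x, ["Cz", "C3", "C4"], fs=500)  # (5, 500); Fz and Pz rows stay zero
```

With MNE the same steps map onto `raw.resample(...)` and channel picking on the `mne.io.Raw` object; the zero-fill for missing channels would still be our own decision.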
## How others do it

### [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

This paper uses a GAN to augment data and trains something like brain-based identity recognition on it.

More info on that in [[Augmentation methods]].

They used the BCI Competition IV dataset 2a.

This dataset records EEG data during motor imagery tasks involving left hand, right hand, both feet, and tongue movements performed by 9 subjects. Each subject performed 72 trials of each of the 4 tasks during a single experiment, and each motor imagery trial lasted for 3 s. The EEG data were recorded using 22 Ag/AgCl electrodes at a sampling frequency of 250 Hz and were bandpass filtered between 0.5 and 100 Hz.

Furthermore, the authors used a 50 Hz notch filter to suppress line noise and excluded the three channels recording eye movement.

For each individual’s EEG data, a third-order Butterworth IIR filter was applied in the 4–40 Hz frequency band to reduce the influence of eye movements.

Subsequently, the data were min-max normalized to the range [0, 1].

**The dataset was divided into training and testing sets in a 4:1 ratio, with each individual’s training set consisting of 864 samples.**
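The min-max normalization step above is simple enough to pin down exactly; a per-channel numpy sketch:

```python
import numpy as np

def minmax(x, axis=-1):
    # Per-channel min-max normalization to [0, 1].
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    return (x - lo) / (hi - lo)

x = np.random.randn(22, 750)  # 22 channels x 3 s at 250 Hz (BCI IV 2a shapes)
xn = minmax(x)
```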
### [[Augmentation methods#Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface(2020)]]

**Evaluation using their own dataset**:

Leave-one-subject-out: train on all subjects except one, then test on that one.

Adaptive training: train on all subjects plus half of one subject's data, then test on the second half of that subject's data.

**Evaluation on BCI Competition III dataset IVa**:

Down-sampled to 100 Hz. Only tests generalizability, using adaptive training with and without augmented data.
@ -44,11 +44,6 @@ Models like GoogleNet and AlexNet have been used, where AlexNet outperformed Goo
## A generic framework for adaptive EEG-based BCI training and operation

https://arxiv.org/abs/1707.07935

## EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network

https://www.mdpi.com/2079-9292/13/21/4310
The authors propose an end-to-end EEG data augmentation method based on a spatial-temporal generative adversarial network (STGAN). The discriminator uses a temporal feature encoder (TODO: learn more about it [here](https://medium.com/@raphael.schoenenberger_95380/encoding-temporal-features-part-1-f26d08feebd8)) and a spatial feature encoder in parallel, which is good for capturing global dependencies across EEG channels and time. The GAN improves the quality and diversity of the augmented EEG data. Experiments were conducted on the BCI-IV2A dataset, and the Fréchet inception distance was used to evaluate data quality. Compared with the deep learning models EEGNet, ShallowConvNet and DeepConvNet, the STGAN approach was better in terms of data quality.
## Enhancing the decoding accuracy of EEG signals by the introduction of anchored-STFT and adversarial data augmentation method

https://www.nature.com/articles/s41598-022-07992-w
@ -80,7 +75,7 @@ Somehow we could leverage methods proposed here.
## Data augmentation for deep-learning-based electroencephalography

https://www.sciencedirect.com/science/article/pii/S0165027020303083?via%3Dihub

The authors try to augment EEG data to get better results when using it in deep learning.

It is only a review. The paper discusses these methods for data augmentation (DA):

* noise addition
* GAN
* sliding window
@ -164,6 +159,8 @@ Variations of VAE:
* VQ-VAE - provides a discrete latent space for sharper reconstructions

TODO:

- find augmentation methods worth trying
- find an example architecture for BCI classification
1
TODOs.md
@ -1,3 +1,4 @@
**Create thesis on Overleaf**

Take the template for a master's thesis and start working on it. Just start writing; it will also be a good place to record what has been explored.
BIN
images/Pasted_image_20250610150904.png
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
images/Pasted_image_20250610163351.png
(Stored with Git LFS)
Normal file
Binary file not shown.