## ManiFool

https://arxiv.org/abs/1711.09115

This method was used in the article [Data Augmentation with Manifold Exploring Geometric Transformations](https://arxiv.org/pdf/1901.04420), which is relevant to EEG research.

**Geometric robustness of deep networks: analysis and improvement**

The ManiFool paper introduces an algorithm that measures the geometric invariance of deep networks, and then shows how to make the networks more robust to those transformations. The authors search for the minimal fooling transformation to quantify the invariance of a network, and then fine-tune on transformed examples.

Preliminaries: let $\mathcal{T}$ be a set of transformations and $\tau \in \mathcal{T}$ a single transformation. We need a metric that quantitatively measures how much a transformation changes a sample. Some candidate metrics:

* L2 distance between transformation matrices. This does not reflect the semantics of transformations such as rotation vs. translation.
* Squared L2 distance between the two transformed images. Better, since it depends on the sample, but still not good enough.
* Length of the shortest curve between two transformations on the transformation manifold, i.e. the geodesic distance. This is the metric used by ManiFool.

![image](images/Pasted_image_20250319153406.png)

The metric is defined between two transformations; the first one is always the identity transformation, which we call $e$. ManiFool searches for the fooling transformation with minimal distance from $e$, i.e. the smallest transformation that leads to misclassification. The invariance score of a classifier is then calculated as the average geodesic distance of these minimal fooling transformations over a set of examples.

![image](images/Pasted_image_20250319194746.png)

**ManiFool algorithm**

An algorithm to find the minimal fooling transformation. The main idea is to iteratively move the image towards the decision boundary of the classifier while staying on the transformation manifold. Each iteration consists of two steps (see the sketch at the end of this section):

* choosing the movement direction,
* mapping the movement onto the manifold.

![image](images/Pasted_image_20250319200254.png)

This is repeated until the classification boundary of the classifier is crossed.

Choosing the movement direction and length: consider a binary classifier $k(x) = \operatorname{sign}(f(x))$, where $f: \mathbb{R}^n \to \mathbb{R}$ is an arbitrary differentiable classification function. For simplicity the authors consider an $x$ whose label is 1, so $f(x) > 0$ for this example. To reach the decision boundary in the shortest way, we choose the direction that maximally reduces $f(x)$, i.e. we move against the gradient of $f$ at $x$. After that some serious math is involved; I need more time to fully grasp it.

**Experiments**

In the experiments, the invariance score for minimal transformations, defined in (5), is calculated by finding fooling transformation examples with ManiFool for a set of images and computing the average geodesic distance of these examples. To measure invariance against random transformations, a number of random transformations with a given geodesic distance $r$ are generated for each image in a set, and the misclassification rate of the network on the transformed images is calculated.

**My summary**

We can take a pretrained model, then build a dataset tailored to that model which contains minimal fooling samples, and use that dataset for fine-tuning.
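To make the two-step iteration above concrete, here is a minimal sketch of a ManiFool-style search for the special case of a one-parameter transformation family (pure rotation), where choosing the direction and mapping back onto the manifold collapse into a single angle update. The finite-difference gradient, the fixed step size and the classifier `f` are my own simplifications, not the paper's exact algorithm (the paper handles multi-parameter transformation groups with a proper manifold mapping and line search).

```python
# Minimal sketch, assuming: a one-parameter rotation family, a scalar score
# function f with f(x) > 0 for the true class, and geodesic distance
# approximated by the rotation angle itself. Not the paper's exact algorithm.
import numpy as np
from scipy.ndimage import rotate

def manifool_rotation(x, f, step=1.0, max_iter=90):
    """Search for the smallest rotation angle (in degrees) that fools f."""
    angle = 0.0
    for _ in range(max_iter):
        # Finite-difference gradient of f with respect to the rotation angle.
        eps = 0.5
        f_plus = f(rotate(x, angle + eps, reshape=False))
        f_minus = f(rotate(x, angle - eps, reshape=False))
        grad = (f_plus - f_minus) / (2 * eps)
        # Move against the gradient to decrease the score of the true class.
        angle -= step * np.sign(grad)
        if f(rotate(x, angle, reshape=False)) < 0:  # decision boundary crossed
            return angle                            # minimal fooling rotation found
    return None                                     # no fooling rotation within budget

# Usage with a toy score function (hypothetical, for illustration only):
# x = np.random.randn(32, 32)
# f = lambda img: img[:, :16].mean() - img[:, 16:].mean()
# print(manifool_rotation(x, f))
```

The invariance score would then be the average of the returned fooling angles (in geodesic terms, distances) over a set of correctly classified images.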
## Data Augmentation with Manifold Exploring Geometric Transformations for Increased Performance and Robustness

https://arxiv.org/pdf/1901.04420

The authors create a dataset of images lying close to the decision boundary to maximize the variance the network is exposed to during training. Models used: ResNet, VGG16, InceptionV3. Compared augmentation strategies:

* no augmentation
* random augmentation - rotation and horizontal flipping
* ManiFool
* random erasing - replaces random patches of the image with Gaussian noise

A GAN architecture was used to generate synthetic images. Not exactly an EEG paper, but the methods can be used for BCI.

## EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network (2024)

https://www.mdpi.com/2079-9292/13/21/4310

The authors propose an end-to-end EEG data augmentation method based on a spatial-temporal generative adversarial network (STGAN). The discriminator uses a temporal feature encoder (TODO: learn more about it [here](https://medium.com/@raphael.schoenenberger_95380/encoding-temporal-features-part-1-f26d08feebd8)) and a spatial feature encoder in parallel, which captures global dependencies across EEG channels and time. The GAN improves the quality and diversity of the augmented EEG data. Experiments were conducted on the BCI-IV2A dataset, and Fréchet inception distance was used to evaluate data quality. Compared on the deep learning models EEGNet, ShallowConvNet and DeepConvNet, the STGAN approach showed better recognition performance and higher data quality.

Worth exploring more, maybe use in the pilot pipeline. Good references to other papers about data augmentation. TODO: very interesting, worth reading multiple times.

## Augmenting The Size of EEG datasets Using Generative Adversarial Networks (2018)

Explained here: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

Link: https://ieeexplore.ieee.org/abstract/document/8489727

The authors propose a recurrent generative adversarial network (RGAN) architecture. Its main feature is the use of a recurrent neural network in the generator.

![Image](images/Pasted_image_20250616105342.png)

## Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface (2020)

Explained here: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

Link: https://ieeexplore.ieee.org/abstract/document/9177281

Data augmentation using a deep convolutional GAN (DCGAN), with the improvements then tested on a BCI classifier. How they do it: real EEG data is collected from 14 subjects performing a movement intention task. The signal is segmented into 2-second windows representing either task or rest. The resulting dataset has shape 80x500x62:

* 80 samples: 40 MI task windows and 40 rest windows
* 500 time points per sample: a 2-second window at 250 Hz sampling rate
* 62 EEG channels

For each subject, a **subject-specific feature vector** of shape 1x100 is extracted from the first half (40) of the samples, 20 MI and 20 rest. This vector is fed into the generator to produce EEG resembling the target subject.

**Then the DCGAN comes in.** The generator takes random noise plus the **subject-specific conditional vector** and generates one sample of shape 500x62. The discriminator takes real or generated EEG together with the subject-specific conditional vector and decides whether the input is real or fake. Training follows the standard GAN procedure. **Once the generator is trained**, it can produce an arbitrary amount of artificial data for each subject from noise (see the sketch below).
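Below is a minimal PyTorch sketch of the conditional DCGAN idea described above: noise plus a subject-specific conditional vector in, a 62x500 EEG window out, and a discriminator that sees the same conditional vector. The noise dimension, layer sizes and kernel choices are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch, assuming noise_dim=100, a 100-dimensional subject-specific
# conditional vector, and EEG windows of shape (62 channels, 500 samples).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, cond_dim=100):
        super().__init__()
        self.fc = nn.Linear(noise_dim + cond_dim, 128 * 125)
        self.net = nn.Sequential(
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.ConvTranspose1d(128, 64, kernel_size=4, stride=2, padding=1),  # 125 -> 250
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.ConvTranspose1d(64, 62, kernel_size=4, stride=2, padding=1),   # 250 -> 500
        )

    def forward(self, z, cond):
        h = self.fc(torch.cat([z, cond], dim=1)).view(-1, 128, 125)
        return self.net(h)  # (batch, 62, 500) fake EEG window

class Discriminator(nn.Module):
    def __init__(self, cond_dim=100):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(62, 64, kernel_size=4, stride=2, padding=1),   # 500 -> 250
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=4, stride=2, padding=1),  # 250 -> 125
            nn.LeakyReLU(0.2),
        )
        self.fc = nn.Linear(128 * 125 + cond_dim, 1)

    def forward(self, x, cond):
        h = self.conv(x).flatten(1)
        return self.fc(torch.cat([h, cond], dim=1))  # real/fake logit

# Usage:
# z, cond = torch.randn(8, 100), torch.randn(8, 100)
# fake = Generator()(z, cond)               # (8, 62, 500)
# logit = Discriminator()(fake, cond)       # (8, 1)
```

The key design point from the paper is the conditioning: both networks receive the subject-specific vector, so the trained generator can be asked for data resembling a particular subject simply by fixing that vector and resampling the noise.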
In the paper the authors mix real and generated data to build the dataset for classifier training. The dataset was collected in house by the authors. They conducted two types of EEG recording: one with the subjects fully concentrated, and one closer to a real-life scenario in which the subjects' attention was diverted by pauses during which they counted beeps of a certain frequency.

The EEG signals were recorded using a g.HIamp-research amplifier and 62 gel-based active electrodes placed in a g.Gamma cap. The recorded EEG channels were Fp1, Fp2, Fpz, AF3, AF4, AF7, AF8, F1-8, Fz, FC1-6, FCz, FT7, FT8, C1-6, Cz, T7, T8, CP1-6, CPz, TP7-10, P1-8, Pz, PO3, PO4, PO7, PO8, POz, O1, O2, and Oz, referenced to the right earlobe. The AFz channel was used as the ground. The recording frequency was 1200 Hz. Data were bandpass filtered at 0.01-100 Hz with a 50 Hz notch filter. Independent component analysis (ICA) and artifact subspace reconstruction (ASR) were applied to remove electrooculogram (EOG) and EMG artifacts, and the data were high-pass filtered with a 0.5 Hz cutoff. These methods were implemented in MATLAB R2013b using EEGLAB. Data were segmented into 2 s epochs and then down-sampled to 250 Hz.

**Evaluation on their own dataset:**

* Leave one subject out - train on all subjects except one, then test on that one.
* Adaptive training - train on all subjects plus half of one subject's data, test on the second half of that subject's data.

**Evaluation on BCI Competition III dataset IVa:** down-sampled to 100 Hz. Only generalizability is tested, using adaptive training with and without augmented data.

**What we can do:** extract something like an action-specific feature vector and feed it into the generator along with noise, so it generates new data specific to each movement. Extracting that vector could be done (just a thought for now) with some clustering, or with a method that finds discriminative features given the labels.

## Data augmentation strategies for EEG-based motor imagery decoding (2022)

Link: https://www.cell.com/heliyon/fulltext/S2405-8440(22)01528-6

Good introduction, can be used as a source of citations for the intro. Evaluation of these augmentation techniques (a sketch of the two simplest ones, noise addition and cropping, is at the end of this note):

* averaging randomly selected trials - no good
* recombining time slices of randomly selected trials - no good
* recombining frequency slices of randomly selected trials - no good
* Gaussian noise addition
* cropping
* variational autoencoder data synthesis, trained with Kullback-Leibler (KL) divergence and mean squared reconstruction loss

Metrics for evaluating the generated data:

* prediction accuracy
* Fréchet inception distance
* t-distributed stochastic neighbor embedding (t-SNE) plots

![image](images/Pasted_image_20250610150904.png)

This is taken from: [[Augmentation methods#EEG Data Augmentation Method for Identity Recognition Based on Spatial–Temporal Generating Adversarial Network]]

**Papers to cite from:**

About motion intention:

E. Lew, R. Chavarriaga, S. Silvoni, and J. D. R. Millán, "Detection of self-paced reaching movement intention from EEG signals," Frontiers Neuroeng., vol. 5, p. 13, Jul. 2012.

M. Jahanshahi and M. Hallett, The Bereitschaftspotential: Movement-Related Cortical Potentials. Boston, MA, USA: Springer, 2003.
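As referenced in the augmentation-technique list above, here is a minimal sketch of the two simple augmentations, Gaussian noise addition and cropping, applied to a single EEG epoch. The noise standard deviation and crop length are illustrative values, not taken from the paper.

```python
# Minimal sketch of Gaussian noise addition and random cropping for EEG epochs
# of shape (channels, samples). Parameter values are illustrative assumptions.
import numpy as np

def add_gaussian_noise(epoch, std=0.1, rng=None):
    """Add zero-mean Gaussian noise to an EEG epoch."""
    rng = rng or np.random.default_rng()
    return epoch + rng.normal(0.0, std, size=epoch.shape)

def random_crop(epoch, crop_len, rng=None):
    """Cut a random contiguous window of `crop_len` samples from the epoch."""
    rng = rng or np.random.default_rng()
    start = rng.integers(0, epoch.shape[1] - crop_len + 1)
    return epoch[:, start:start + crop_len]

# Usage:
# epoch = np.random.randn(62, 500)          # 2 s window at 250 Hz, 62 channels
# noisy = add_gaussian_noise(epoch, std=0.05)
# crop  = random_crop(epoch, crop_len=400)  # classifier input must match crop_len
```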