Adversarial attacks and counterfactual explanations both play a crucial role in understanding and improving machine learning models. Using a two-step filtering technique, this study transforms adversarial attacks into counterfactual explanations without retraining the model. By combining Denoising Diffusion Probabilistic Models (DDPMs) with post-processing methods, the approach preserves image structure while explaining the classifier's predictions and improving its robustness. Evaluation metrics include flip rate, mean number of attributes changed, face verification accuracy, and face similarity, among others.
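The abstract names the evaluation metrics but not their formulas; the sketch below assumes the standard definitions used in the counterfactual-explanation literature (flip rate as the fraction of predictions that change, and mean number of attributes changed, often abbreviated MNAC, over binary attribute vectors). The function names and shapes here are illustrative, not taken from the study.

```python
import numpy as np

def flip_rate(orig_preds, cf_preds):
    # Standard definition (assumption): fraction of samples whose predicted
    # label differs between the original image and its counterfactual.
    orig_preds = np.asarray(orig_preds)
    cf_preds = np.asarray(cf_preds)
    return float(np.mean(orig_preds != cf_preds))

def mean_attributes_changed(orig_attrs, cf_attrs):
    # Standard MNAC definition (assumption): for per-sample binary attribute
    # vectors, count how many attributes differ, then average over samples.
    diff = np.asarray(orig_attrs) != np.asarray(cf_attrs)
    return float(diff.sum(axis=1).mean())

# Toy usage: 4 samples, 2 of which flip class under the counterfactual edit.
print(flip_rate([0, 1, 1, 0], [1, 1, 0, 0]))  # -> 0.5
# Two samples with 3 binary attributes each; one attribute changes per sample.
print(mean_attributes_changed([[1, 0, 1], [0, 0, 1]],
                              [[1, 1, 1], [0, 0, 0]]))  # -> 1.0
```

A higher flip rate indicates the counterfactuals successfully change the classifier's decision, while a lower MNAC indicates the edits are sparse, touching few attributes beyond the target one.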