Application of Deep Convolutional Generative Adversarial Networks to Generate Pose Invariant Facial Image Synthesis Data
Abstract
Technology is developing rapidly, and artificial intelligence is one of these developments. Artificial intelligence still struggles with tasks that are easy for humans to perform but difficult to describe formally for a computer, such as facial recognition. Existing facial recognition models often fail to recognize faces captured under non-ideal conditions, which arise from factors such as face position, lighting, expression, and obstacles covering the face. Among these factors, face position has the greatest influence. Therefore, this study applies deep convolutional generative adversarial networks (DCGANs) to generate fake image data with varying face positions. The research proceeds through data collection, data processing, model design and training, hyperparameter tuning, and analysis of the test results. Hyperparameter tuning was performed sequentially, and the best combination found is 200 epochs, a Generator learning rate of 0.002, a Generator momentum (beta1) of 0.5, Adam as the Generator optimizer, a Discriminator learning rate of 0.0002, a Discriminator momentum (beta1) of 0.5, and Adam as the Discriminator optimizer. This combination yields an FID score of 74.05. In testing with human observers, the generated fake images were judged relatively good overall, although a few poor-quality images remain.
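The FID score reported above compares the statistics of real and generated images in a feature space. The sketch below illustrates only the distance computation, and it is not the authors' evaluation code. It assumes the image features (normally taken from an Inception network, which is not shown here) are already available as NumPy arrays with one row per image; the function name `frechet_distance` and the random demo data are hypothetical.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array (rows = images, columns = feature
    dimensions, at least two columns). Assumption: features were
    extracted beforehand, e.g. by an Inception network.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))            # stand-in for real-image features
    fake = rng.normal(loc=0.3, size=(500, 8))   # stand-in for generated-image features
    print("Frechet distance:", frechet_distance(real, fake))
```

A lower value means the generated feature distribution is closer to the real one; identical feature sets give a distance of approximately zero.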
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).