Deep Learning Techniques for Image and Speech Recognition: Current Trends and Future Directions

Authors

  • Soleman Soleman, Universitas Borobudur

DOI:

https://doi.org/10.37296/esci.v5i1.212

Keywords:

Deep Learning, Image Recognition, Transformer Model

Abstract

This study examines recent developments and future directions of deep learning techniques for image and speech recognition. It focuses on neural network architectures such as Convolutional Neural Networks (CNNs) for image processing and Recurrent Neural Networks (RNNs) for speech recognition. The methodology comprises a comprehensive review of recent literature, an evaluation of the performance of various models, and a comparative analysis of existing techniques. The results show a significant improvement in recognition accuracy, with CNNs achieving up to 98% accuracy for image classification and transformer-based models outperforming traditional RNNs in speech recognition. The challenges identified include high computational requirements, reliance on high-quality datasets, and limited model interpretability. The study also proposes several directions for future development, including the integration of attention mechanisms, hybrid architectures, and more efficient learning techniques. In conclusion, despite rapid progress, there is still significant room for innovation in improving the efficiency and reliability of deep learning-based image and speech recognition systems.

Published

2024-11-30

How to Cite

Soleman, S. (2024). Deep Learning Techniques for Image and Speech Recognition: Current Trends and Future Directions. EScience Humanity Journal, 5(1), 277-288. https://doi.org/10.37296/esci.v5i1.212

Section

eScience Humanity Journal Volume 5 Number 1 November 2024