IJATCA | Volumes

SJIF: 5.966, IJIFACTOR: 3.8, RANKING: A+

Call for Papers – June 2025 Edition

IJATCA solicits original research papers for the June 2025 Edition.
The deadline for manuscript submission is June 30, 2025.

A Machine Learning Approach to Irony Detection in Text Using TF-IDF and Random Forests

Volume: 10 Issue: 2

Year of Publication: 2025

Pages: 30-37

Authors: A. M. John-Otumu, M. Fole, J. C. Ejibas, O. C. Nwokonkwo, R. O. Ekemonye, W. Ihonvbere

DOI: https://doi.org/10.5281/zenodo.15575229

Abstract

This paper presents a machine learning model for detecting irony in Pidgin English text, a challenging task because of the language's distinctive linguistic features. Social media has transformed global communication via text, but detecting irony, where the intended meaning differs from the literal one, remains difficult, especially in non-standard languages such as Pidgin English. Current irony detection models, designed primarily for standard English, struggle in this context. To address this, we collected 58,745 online comments, comprising ironic, hate, and neutral comments, from crowdsourced surveys and Kaggle datasets. After the data were cleaned and balanced through random undersampling, the final dataset comprised 6,000 instances evenly distributed among the three categories; it was used for training, validation, and testing. Term Frequency-Inverse Document Frequency (TF-IDF) weighting was applied to convert the text into numerical vectors, and a Random Forest classifier was used to classify them. The proposed model achieved an accuracy of 93%, a precision of 90%, and a recall of 91%, indicating its effectiveness in detecting ironic speech. These results demonstrate that machine learning can accurately identify irony even in non-standard languages such as Nigerian Pidgin English, which could reduce misinterpretations in social media interactions and potentially lower the incidence of conflicts caused by irony. This research contributes to natural language processing by emphasizing the importance of language-specific tools for irony detection.
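The preprocessing described in the abstract, random undersampling to balance the classes followed by TF-IDF vectorization, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify its tokenizer or TF-IDF variant, so the regex tokenizer and the smoothed IDF formula (the default in scikit-learn's `TfidfVectorizer`) are assumptions, and the toy comments and labels are hypothetical. The Random Forest step is not reimplemented; the resulting vectors would typically be passed to a library classifier such as scikit-learn's `RandomForestClassifier`.

```python
import math
import random
import re
from collections import Counter, defaultdict


def undersample(texts, labels, seed=0):
    """Random undersampling: keep a random subset of each class so that
    every class ends up with as many examples as the smallest class."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = [(text, label)
                for label, items in by_class.items()
                for text in rng.sample(items, n_min)]
    rng.shuffle(balanced)
    return balanced


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs):
    """Assumed smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector: raw term count times IDF, then L2-normalized.
    Terms missing from the fitted vocabulary are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


if __name__ == "__main__":
    # Hypothetical toy comments labelled with the paper's three categories.
    texts = [
        "wow great job breaking the road again",
        "thanks a lot for the power cut",
        "the market opens at nine",
        "rain fell in the city today",
        "the meeting is on monday",
        "you are useless go away",
        "nobody wants you here",
    ]
    labels = ["ironic", "ironic", "neutral", "neutral", "neutral", "hate", "hate"]
    balanced = undersample(texts, labels)  # neutral is cut from 3 to 2
    idf = fit_idf([text for text, _ in balanced])
    for text, label in balanced:
        print(label, tfidf_vector(text, idf))
```

Because every class is cut to the size of the smallest one, undersampling discards data, which is consistent with the abstract's balanced set of 6,000 instances being far smaller than the 58,745 comments collected.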

Keywords

Irony, Text, Social Media, Classification, TF-IDF, Random Forest Classifier.

ISSN for IJATCA

International Standard Serial Number (ISSN)

ISSN: 2395-3519

Publication Ethics

Policy on Publication Ethics - Ensuring genuine authorship

Be a Research Volunteer

IJATCA is fuelled by a geographically dispersed team of dynamic volunteers. IJATCA invites volunteers interested in contributing to scientific development in the field of computer science.

INDEXING, ABSTRACTING, AND ARCHIVING

ISSUU

ResearcherID

ORCID

ACADEMIA

Contact Us

Email: info@ijatca.com
Email: contactus@ijatca.com