With the growth and popularity of online social platforms, people can stay more connected than ever through tools like instant messaging. However, this connectivity also raises concerns about toxic speech, including cyberbullying, verbal harassment, and humiliation. Content moderation is therefore crucial for promoting healthy online discussions and creating healthy online environments. To detect toxic language content, researchers have been developing deep learning-based natural language processing (NLP) approaches, and most recent methods employ transformer-based pre-trained language models that achieve high toxicity detection accuracy. In real-world applications, however, toxicity filtering is deployed mostly in security-sensitive settings such as gaming platforms, where models are constantly challenged by social engineering and adversarial attacks. As a result, directly deploying text-based NLP toxicity detection models can be problematic, and preventive measures are necessary.
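As a minimal sketch of the transformer-based approach described above, the snippet below scores comments with a pre-trained toxicity classifier through the Hugging Face transformers pipeline API. The checkpoint name unitary/toxic-bert is an assumed, publicly available example; any fine-tuned sequence-classification model could be substituted.

```python
# Sketch: toxicity scoring with a pre-trained transformer classifier.
# Assumes the `transformers` library is installed; the checkpoint name
# "unitary/toxic-bert" is illustrative, not the method of this paper.
from transformers import pipeline

toxicity_classifier = pipeline(
    "text-classification",
    model="unitary/toxic-bert",  # assumed checkpoint; swap in your own
)

comments = [
    "Thanks for the helpful answer!",
    "You are an idiot and nobody wants you here.",
]

for comment in comments:
    # Each prediction is a dict with a label and a confidence score.
    result = toxicity_classifier(comment)[0]
    print(f"{result['label']:>10} ({result['score']:.2f}): {comment}")
```

Note that such a classifier, deployed as-is, inherits the vulnerability discussed above: adversarially perturbed inputs (e.g., character substitutions) can evade the learned decision boundary.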