Multi-class classification means a classification task with more than two classes in which the labels are mutually exclusive: each sample is assigned to one and only one label.

Multi-label classification, on the other hand, assigns a set of target labels to each sample. This can be thought of as predicting properties of a data point that are not mutually exclusive; a Tim Hortons, for example, is often categorized as both a bakery and a coffee shop. Multi-label text classification has many real-world applications, such as categorizing businesses on Yelp or classifying movies into one or more genres.

Problem Formulation

Anyone who has been the target of abuse or harassment online will know that it doesn’t go away when you log off or switch off your phone. Researchers at Google are working on tools to study toxic comments online. In this post, we will build a multi-label model that is capable of detecting different types of toxicity, such as severe toxicity, threats, obscenity and insults. We will be using supervised classifiers and text representations. A comment can carry any combination of the labels toxic, severe toxic, obscene, threat, insult and identity hate at the same time, or none of them.

Exploring

    %matplotlib inline
    import re

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # English stop words from NLTK
    stop_words = set(stopwords.words('english'))

    # Load the training data and preview the first rows
    df = pd.read_csv("train 2.csv", encoding = "ISO-8859-1")
    df.head()

(Disclaimer from the data source: the dataset contains text that may be considered profane, vulgar, or offensive.)

Number of comments in each category

    # Keep only the label columns (the list of columns to drop is missing from the source)
    df_toxic = df.drop(, axis=1)
    counts = []
    categories = list(df_toxic.columns)
    for i in categories:
        # Each label column is 0/1, so its sum is the number of comments with that label
        counts.append((i, df_toxic[i].sum()))
    # The column names for the summary table are missing from the source
    df_stats = pd.DataFrame(counts, columns=)
    df_stats

Most comments are within 500 characters in length, with some outliers up to 5,000 characters long.
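To make the multi-label setup concrete, here is a minimal, dependency-free sketch of how such labels can be laid out and summarized. It is an illustration under stated assumptions, not the post's own code: the four comments are invented placeholders, the category identifiers are guesses based on the categories named in the post, and plain Python lists stand in for the DataFrame so that the example runs anywhere.

```python
# Toy illustration only: the comments and labels below are invented, and the
# category identifiers are assumptions based on the categories named in the post.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Each row pairs a comment with the set of categories that apply to it.
# An empty set is the "none of the above" case.
rows = [
    ("thanks for the helpful edit", set()),
    ("placeholder comment A", {"toxic"}),
    ("placeholder comment B", {"toxic", "obscene", "insult"}),
    ("placeholder comment C", {"toxic", "threat"}),
]


def to_indicator(labels):
    """Encode a set of labels as a 0/1 vector with one slot per category."""
    return [1 if category in labels else 0 for category in CATEGORIES]


# Multi-label targets: one 0/1 vector per comment, any number of ones allowed.
indicator = [to_indicator(labels) for _, labels in rows]

# Number of comments in each category = column sums of the indicator matrix.
counts = {
    category: sum(row[j] for row in indicator)
    for j, category in enumerate(CATEGORIES)
}

# Number of labels per comment = row sums (0 means clean, >1 means several at once).
labels_per_comment = [sum(row) for row in indicator]

# Comment length in characters, the quantity behind the length statistics.
lengths = [len(text) for text, _ in rows]

for category, n in counts.items():
    print(category, n)
print("labels per comment:", labels_per_comment)
print("comment lengths:", lengths)
```

In a multi-class problem every row sum would be exactly 1; here the row sums range from 0 to 3, which is what makes the task multi-label. On a pandas DataFrame that holds only the 0/1 label columns, `sum()` gives the same column sums and `sum(axis=1)` the row sums.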