[Figure: Class Distribution, Toxic vs Non-Toxic Across Datasets]
[Figure: Vocabulary overlap, Jigsaw Civil Comments vs TweetEval. Jigsaw vocabulary: 538,531 (524K Jigsaw only); Twitter vocabulary: 17,726 (3.8K Twitter only); shared: 13,927 (2.6% vocabulary overlap).]
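The vocabulary-overlap figure does not say how its 2.6% is computed. The minimal sketch below checks the reported counts, assuming "Jigsaw only" and "Twitter only" are each corpus's vocabulary size minus the 13,927 shared entries. It compares three candidate definitions of overlap. Both Jaccard overlap (shared / union) and shared / Jigsaw vocabulary round to 2.6%. Shared / Twitter vocabulary would instead give about 78.6%. The helper names are hypothetical and not taken from the source.

```python
def vocab_overlap(vocab_a: set[str], vocab_b: set[str]) -> float:
    """Jaccard overlap |A & B| / |A | B| between two vocabularies (assumed definition)."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return len(vocab_a & vocab_b) / len(union)


def jaccard_from_counts(size_a: int, size_b: int, shared: int) -> float:
    """Same Jaccard overlap from sizes alone, using |A | B| = |A| + |B| - |A & B|."""
    return shared / (size_a + size_b - shared)


# Counts as reported in the figure.
jigsaw_size, twitter_size, shared = 538_531, 17_726, 13_927

print(f"Jigsaw only:  {jigsaw_size - shared:,}")   # 524,604 (shown as 524K)
print(f"Twitter only: {twitter_size - shared:,}")  # 3,799 (shown as 3.8K)

# Candidate definitions of "vocabulary overlap".
print(f"shared / union:   {jaccard_from_counts(jigsaw_size, twitter_size, shared):.1%}")  # 2.6%
print(f"shared / Jigsaw:  {shared / jigsaw_size:.1%}")   # 2.6%
print(f"shared / Twitter: {shared / twitter_size:.1%}")  # 78.6%
```

Because the Jigsaw vocabulary is about 30 times larger than the Twitter one, the low percentage mostly reflects that size imbalance. Roughly four in five Twitter vocabulary entries also appear in Jigsaw.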