Lab Journal: Machine Learning on tweets related to the Roseanne-ABC firing incident

This is my lab journal for the analysis of a data set of tweets related to the 2018 firing of Roseanne from ABC over a racist statement.

A summary of these results with methodological comments can be found here.

1) Preparation/Parsing the data set

2) Running the data set as-is (“vanilla”)

3) Merged data set (Pro, Anti, UncNeut)

4) Meta classifications – Voting and Stacking on the vanilla and merged data sets

5) Introduction of Penalties via CostSensitiveClassifier

Preparation
Several things had to be done to the data set as provided. Firstly, every string needed to be wrapped in quotation marks. Secondly, all double quotes already inside the tweets had to be removed or changed to single quotes; otherwise Weka could not parse the strings correctly. This was straightforward. Lastly, the StringToWordVector filter was applied. Two data sets were generated: the original four-class set, and a merged three-class set.
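A minimal Python sketch of the preparation step, under my own assumptions. `quote_for_arff` follows the quoting approach described above and additionally flattens line breaks (ARFF expects each instance on a single line). `word_presence` is only a rough stand-in for StringToWordVector: lowercased alphabetic tokens, one 0/1 column per word, and none of Weka's tokenizer, stemming or pruning options. Both function names are hypothetical.

```python
import re


def quote_for_arff(text: str) -> str:
    """Wrap one tweet in double quotes so Weka reads it as a single string value.

    Inner double quotes become single quotes so they cannot close the value
    early; line breaks are replaced with spaces (an extra assumption).
    """
    cleaned = text.replace('"', "'")
    cleaned = re.sub(r"[\r\n]+", " ", cleaned)
    return f'"{cleaned}"'


def word_presence(texts):
    """Very rough analogue of StringToWordVector: a sorted vocabulary of
    lowercased alphabetic tokens plus one 0/1 presence vector per text."""
    token_sets = [set(re.findall(r"[a-z]+", t.lower())) for t in texts]
    vocab = sorted(set().union(*token_sets))
    vectors = [[1 if word in tokens else 0 for word in vocab] for tokens in token_sets]
    return vocab, vectors


# Hypothetical tweets, not taken from the data set
print(quote_for_arff('She said "fired"\nagain'))
print(word_presence(["Roseanne fired", "ABC fired her"]))
```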

All computations were performed on the following:
2-core 2.50 GHz Intel i7 (Skylake), 16 GB RAM, Intel HD Graphics 520, Intel Management Engine disabled, and Spectre/Meltdown patches applied (which comes at the expense of hyper-threading). The software was the latest version of the Java VM running on a custom fork of Debian-Experimental.

——————————————————
First run (“Vanilla”):

Tractability and Accuracy results

ZeroR – very fast, but as the accuracy results show, wholly without merit. By classifying *everything* as Unclear/Unrelated, the majority class, it achieved a trivial 37.9333% Correctly Classified Instances (CCI), which is simply that class's share of the data (2276 of 6000).

=== Summary ===

Correctly Classified Instances 2276 37.9333 %
Incorrectly Classified Instances 3724 62.0667 %
Kappa statistic 0
Mean absolute error 0.3435
Root mean squared error 0.4144
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 1.000 0.379 1.000 0.550 ? 0.499 0.379 Unclear/Unrelated
0.000 0.000 ? 0.000 ? ? 0.500 0.367 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.092 Neutral
Weighted Avg. 0.379 0.379 ? 0.379 ? ? 0.499 0.313

=== Confusion Matrix ===

a b c d <– classified as
2276 0 0 0 | a = Unclear/Unrelated
2200 0 0 0 | b = Anti-Roseanne
974 0 0 0 | c = Pro-Roseanne
550 0 0 0 | d = Neutral
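ZeroR ignores every attribute and predicts the most frequent class, so its CCI is just that class's share of the data. A minimal sketch (the function name is mine), with the class sizes read off the row totals of the confusion matrix above:

```python
from collections import Counter


def zero_r(labels):
    """Return ZeroR's single prediction (the majority class) and its accuracy."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)


# Class sizes taken from the row totals of the vanilla confusion matrix
labels = (
    ["Unclear/Unrelated"] * 2276
    + ["Anti-Roseanne"] * 2200
    + ["Pro-Roseanne"] * 974
    + ["Neutral"] * 550
)
prediction, accuracy = zero_r(labels)
print(prediction, f"{accuracy:.4%}")
```

This reproduces the 37.9333% figure without looking at a single word of any tweet.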

NaiveBayes

Took a little while to classify, but the weighted average ROC AUC (0.816) looks good.

=== Summary ===

Correctly Classified Instances 3831 63.85 %
Incorrectly Classified Instances 2169 36.15 %
Kappa statistic 0.4781
Mean absolute error 0.1905
Root mean squared error 0.3808
Relative absolute error 55.4478 %
Root relative squared error 91.8898 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.680 0.177 0.702 0.680 0.691 0.507 0.828 0.735 Unclear/Unrelated
0.612 0.202 0.637 0.612 0.625 0.414 0.783 0.681 Anti-Roseanne
0.480 0.128 0.421 0.480 0.449 0.335 0.777 0.447 Pro-Roseanne
0.851 0.019 0.821 0.851 0.836 0.819 0.968 0.857 Neutral
Weighted Avg. 0.639 0.163 0.644 0.639 0.641 0.474 0.816 0.680

=== Confusion Matrix ===

a b c d <– classified as
1548 445 272 11 | a = Unclear/Unrelated
422 1347 356 75 | b = Anti-Roseanne
207 283 468 16 | c = Pro-Roseanne
29 38 15 468 | d = Neutral
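To make the summary statistics less of a black box, here is a small sketch that recomputes CCI and Cohen's kappa directly from a confusion matrix (rows are actual classes, columns are predicted classes, as in Weka's output). Kappa compares the observed agreement with the agreement expected by chance from the row and column totals. The function name is my own.

```python
def accuracy_and_kappa(matrix):
    """CCI (as a fraction) and Cohen's kappa from a square confusion matrix
    whose rows are actual classes and columns are predicted classes."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of matching row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1 - expected)


# Vanilla NaiveBayes confusion matrix from above
nb_vanilla = [
    [1548, 445, 272, 11],
    [422, 1347, 356, 75],
    [207, 283, 468, 16],
    [29, 38, 15, 468],
]
cci, kappa = accuracy_and_kappa(nb_vanilla)
print(f"CCI {cci:.2%}, kappa {kappa:.4f}")
```

This gives the same kappa of 0.4781 that Weka reports. Fed the ZeroR matrix instead, it returns a kappa of 0, since ZeroR's agreement is exactly what chance predicts.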

lazy.IBK
Overall quick (three minutes), but not great in terms of accuracy.

=== Summary ===

Correctly Classified Instances 3392 56.5333 %
Incorrectly Classified Instances 2608 43.4667 %
Kappa statistic 0.3359
Mean absolute error 0.2209
Root mean squared error 0.4386
Relative absolute error 64.3207 %
Root relative squared error 105.8332 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.941 0.579 0.498 0.941 0.652 0.390 0.707 0.517 Unclear/Unrelated
0.311 0.076 0.704 0.311 0.431 0.308 0.646 0.560 Anti-Roseanne
0.188 0.018 0.670 0.188 0.294 0.301 0.632 0.354 Pro-Roseanne
0.696 0.014 0.836 0.696 0.760 0.742 0.903 0.672 Neutral
Weighted Avg. 0.565 0.251 0.633 0.565 0.523 0.378 0.691 0.520

=== Confusion Matrix ===

a b c d <– classified as
2142 105 24 5 | a = Unclear/Unrelated
1390 684 62 64 | b = Anti-Roseanne
650 135 183 6 | c = Pro-Roseanne
115 48 4 383 | d = Neutral
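IBk is Weka's k-nearest-neighbour classifier; as far as I know it defaults to k = 1 with Euclidean distance over normalised attributes. The sketch below shows the core idea only: it leaves out the normalisation and Weka's distance-weighting options, and the word-presence vectors are made up.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Majority class among the k training vectors closest to `query`
    (plain Euclidean distance). Neighbours are ranked by distance, then label;
    a tie in the vote goes to the class whose member ranks first."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 0/1 word-presence vectors for four labelled tweets
train_x = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_y = ["Anti-Roseanne", "Anti-Roseanne", "Pro-Roseanne", "Pro-Roseanne"]

print(knn_predict(train_x, train_y, [0, 0, 1]))       # nearest single neighbour
print(knn_predict(train_x, train_y, [0, 1, 0], k=3))  # vote among three
```

Because there is no model to build, almost all of the cost is paid at classification time, one distance per training instance.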

RandomTree
Took approx. 1 minute, but the results are unremarkable: both CCI and weighted average ROC AUC were lower than for NaiveBayes (59.58% vs 63.85%, and 0.680 vs 0.816).

=== Summary ===

Correctly Classified Instances 3575 59.5833 %
Incorrectly Classified Instances 2425 40.4167 %
Kappa statistic 0.4068
Mean absolute error 0.208
Root mean squared error 0.4474
Relative absolute error 60.5695 %
Root relative squared error 107.9566 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.720 0.235 0.652 0.720 0.684 0.477 0.718 0.519 Unclear/Unrelated
0.562 0.229 0.587 0.562 0.575 0.337 0.636 0.498 Anti-Roseanne
0.333 0.104 0.382 0.333 0.355 0.241 0.605 0.241 Pro-Roseanne
0.682 0.029 0.705 0.682 0.693 0.663 0.827 0.510 Neutral
Weighted Avg. 0.596 0.192 0.589 0.596 0.592 0.405 0.680 0.465

=== Confusion Matrix ===

a b c d <– classified as
1639 416 188 33 | a = Unclear/Unrelated
548 1237 318 97 | b = Anti-Roseanne
267 356 324 27 | c = Pro-Roseanne
59 97 19 375 | d = Neutral

OneR
This is better than ZeroR but has a similar problem: it only ever predicted Unclear/Unrelated or Anti-Roseanne, never Pro-Roseanne or Neutral, and it misclassified most Anti-Roseanne tweets (1684 of 2200) as Unclear/Unrelated. In general it was heavily biased towards Unclear/Unrelated.

=== Summary ===

Correctly Classified Instances 2580 43 %
Incorrectly Classified Instances 3420 57 %
Kappa statistic 0.0847
Mean absolute error 0.285
Root mean squared error 0.5339
Relative absolute error 82.975 %
Root relative squared error 128.826 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.907 0.794 0.411 0.907 0.566 0.149 0.557 0.408 Unclear/Unrelated
0.235 0.122 0.526 0.235 0.324 0.146 0.556 0.404 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.092 Neutral
Weighted Avg. 0.430 0.346 ? 0.430 ? ? 0.542 0.338

=== Confusion Matrix ===

a b c d <– classified as
2064 212 0 0 | a = Unclear/Unrelated
1684 516 0 0 | b = Anti-Roseanne
745 229 0 0 | c = Pro-Roseanne
526 24 0 0 | d = Neutral
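OneR builds one rule per attribute (each attribute value predicts its most frequent class) and keeps the attribute whose rule makes the fewest training errors. The sketch below handles nominal attributes only and skips Weka's discretisation of numeric attributes, so it is a simplification; the function name and toy data are hypothetical. Because every value of the chosen attribute maps to a single class, a rule on one 0/1 word attribute can predict at most two classes, which is consistent with the two-class output above.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, value -> class rule, training errors) for the
    single attribute whose one-level rule misclassifies the fewest rows."""
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        # Each value predicts its most frequent class
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Hypothetical 0/1 word-presence attributes for six tweets
rows = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
labels = ["Anti", "Anti", "Anti", "Pro", "Pro", "Unc"]
print(one_r(rows, labels))
```

Here attribute 0 wins with a single error, and its rule never predicts "Unc" at all.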

SimpleLogistic
Despite the name, the complexity made this very slow. It took approx. 20 minutes to generate the model, followed by another hour to cross-validate (and nearly 75% of the laptop’s reported battery life from a full charge).

As the results show, this yielded more accurate classification within a reasonable timeframe (though not a rapid one by any means): 73.65% CCI, the best so far. However, the weighted average ROC AUC of 0.885 was only moderately higher than that of NaiveBayes (0.816).

=== Summary ===

Correctly Classified Instances 4419 73.65 %
Incorrectly Classified Instances 1581 26.35 %
Kappa statistic 0.6108
Mean absolute error 0.1863
Root mean squared error 0.3065
Relative absolute error 54.2321 %
Root relative squared error 73.9571 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.840 0.189 0.731 0.840 0.782 0.637 0.897 0.824 Unclear/Unrelated
0.706 0.150 0.732 0.706 0.719 0.561 0.860 0.793 Anti-Roseanne
0.508 0.046 0.683 0.508 0.583 0.523 0.862 0.646 Pro-Roseanne
0.833 0.014 0.854 0.833 0.843 0.828 0.972 0.871 Neutral
Weighted Avg. 0.737 0.135 0.735 0.737 0.732 0.608 0.885 0.788

=== Confusion Matrix ===

a b c d <– classified as
1912 266 84 14 | a = Unclear/Unrelated
453 1554 139 54 | b = Anti-Roseanne
204 265 495 10 | c = Pro-Roseanne
46 39 7 458 | d = Neutral
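The "Weighted Avg." row weights each class's figure by the number of instances of that class. This short sketch recomputes the two ROC AUC averages compared above from the rounded per-class values; the function name is mine.

```python
def weighted_average(per_class, class_sizes):
    """Weighted average of per-class figures, each weighted by class size."""
    total = sum(class_sizes)
    return sum(v * n for v, n in zip(per_class, class_sizes)) / total


class_sizes = [2276, 2200, 974, 550]  # Unclear/Unrelated, Anti, Pro, Neutral
simple_logistic_auc = [0.897, 0.860, 0.862, 0.972]
naive_bayes_auc = [0.828, 0.783, 0.777, 0.968]

print(round(weighted_average(simple_logistic_auc, class_sizes), 3))
print(round(weighted_average(naive_bayes_auc, class_sizes), 3))
```

Both come out at the reported 0.885 and 0.816, which also shows how much the two large classes dominate the average: Neutral's very high per-class AUC contributes less than a tenth of the weight.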

DecisionTable
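Building a decision table involves searching for a good subset of attributes, and finding the optimal subset is NP-hard, so with thousands of word attributes it may take impractically long to finish. I did not let the model finish building, so I cannot comment on the time it takes to classify.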

MultilayerPerceptron
I set this up and went to sleep so it would have approx. 7 hours to run (and/or crash my computer). It had not finished building the model, let alone moved on to cross-validation, by the time I woke up. Due to time and geographic constraints I will not have the opportunity to run this on a faster computer, but out of curiosity I might try it later.

RandomForest
This took a long time to classify so I abandoned it.

——————————————–

——————
Merged Unclear/Unrelated and Neutral into UncNeut
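The merge is a plain relabelling of the class attribute. A minimal sketch (the mapping and names are mine, written to match the class names used below), which also shows where the new 2826-instance majority class comes from:

```python
from collections import Counter

# Relabelling used for the merged set: two classes collapse into UncNeut
MERGE = {"Unclear/Unrelated": "UncNeut", "Neutral": "UncNeut"}


def merge_label(label: str) -> str:
    """Map a vanilla class label to its merged-set label."""
    return MERGE.get(label, label)


vanilla_sizes = {
    "Unclear/Unrelated": 2276,
    "Anti-Roseanne": 2200,
    "Pro-Roseanne": 974,
    "Neutral": 550,
}
merged_sizes = Counter()
for label, size in vanilla_sizes.items():
    merged_sizes[merge_label(label)] += size

print(dict(merged_sizes))
```

2826 of 6000 is exactly the 47.1% that ZeroR scores on the merged set.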

ZeroR
This yielded a higher CCI than vanilla ZeroR, but only because merging Unclear/Unrelated and Neutral created a larger majority class (2826 of 6000 instances); the result is just as vacuous as before.

=== Summary ===

Correctly Classified Instances 2826 47.1 %
Incorrectly Classified Instances 3174 52.9 %
Kappa statistic 0
Mean absolute error 0.4116
Root mean squared error 0.4536
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.000 0.000 ? 0.000 ? ? 0.500 0.367 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.162 Pro-Roseanne
1.000 1.000 0.471 1.000 0.640 ? 0.499 0.471 UncNeut
Weighted Avg. 0.471 0.471 ? 0.471 ? ? 0.499 0.382

=== Confusion Matrix ===

a b c <– classified as
0 0 2200 | a = Anti-Roseanne
0 0 974 | b = Pro-Roseanne
0 0 2826 | c = UncNeut

NaiveBayes
This had a marginally higher CCI than vanilla NaiveBayes (63.97% vs 63.85%), but a lower weighted average ROC AUC (0.800 vs 0.816).

=== Summary ===

Correctly Classified Instances 3838 63.9667 %
Incorrectly Classified Instances 2162 36.0333 %
Kappa statistic 0.424
Mean absolute error 0.2551
Root mean squared error 0.4374
Relative absolute error 61.9739 %
Root relative squared error 96.4239 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.622 0.214 0.627 0.622 0.625 0.409 0.789 0.675 Anti-Roseanne
0.494 0.131 0.422 0.494 0.455 0.341 0.779 0.446 Pro-Roseanne
0.703 0.217 0.742 0.703 0.722 0.488 0.817 0.807 UncNeut
Weighted Avg. 0.640 0.202 0.648 0.640 0.643 0.435 0.800 0.700

=== Confusion Matrix ===

a b c <– classified as
1369 351 480 | a = Anti-Roseanne
283 481 210 | b = Pro-Roseanne
530 308 1988 | c = UncNeut

IBK
=== Summary ===

Correctly Classified Instances 3575 59.5833 %
Incorrectly Classified Instances 2425 40.4167 %
Kappa statistic 0.2804
Mean absolute error 0.2833
Root mean squared error 0.4972
Relative absolute error 68.842 %
Root relative squared error 109.6102 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.321 0.066 0.739 0.321 0.448 0.336 0.635 0.568 Anti-Roseanne
0.205 0.022 0.643 0.205 0.311 0.305 0.634 0.352 Pro-Roseanne
0.944 0.650 0.564 0.944 0.706 0.359 0.670 0.589 UncNeut
Weighted Avg. 0.596 0.334 0.641 0.596 0.547 0.342 0.651 0.543

=== Confusion Matrix ===

a b c <– classified as
707 76 1417 | a = Anti-Roseanne
127 200 647 | b = Pro-Roseanne
123 35 2668 | c = UncNeut

RandomTree
Compared with vanilla RandomTree, this had a better CCI (62.72% vs 59.58%) but a slightly lower weighted average ROC AUC (0.670 vs 0.680).

=== Summary ===

Correctly Classified Instances 3763 62.7167 %
Incorrectly Classified Instances 2237 37.2833 %
Kappa statistic 0.3858
Mean absolute error 0.2566
Root mean squared error 0.4954
Relative absolute error 62.3435 %
Root relative squared error 109.1988 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.556 0.200 0.617 0.556 0.585 0.365 0.647 0.515 Anti-Roseanne
0.358 0.099 0.412 0.358 0.383 0.275 0.620 0.258 Pro-Roseanne
0.775 0.309 0.691 0.775 0.731 0.467 0.706 0.598 UncNeut
Weighted Avg. 0.627 0.235 0.619 0.627 0.621 0.398 0.670 0.512

=== Confusion Matrix ===

a b c <– classified as
1223 294 683 | a = Anti-Roseanne
328 349 297 | b = Pro-Roseanne
431 204 2191 | c = UncNeut

OneR
This performed much better in terms of CCI (52.98% vs 43% for vanilla) but had only a slightly better weighted average ROC AUC (0.566 vs 0.542); furthermore, it did not classify anything as Pro-Roseanne at all.

=== Summary ===

Correctly Classified Instances 3179 52.9833 %
Incorrectly Classified Instances 2821 47.0167 %
Kappa statistic 0.138
Mean absolute error 0.3134
Root mean squared error 0.5599
Relative absolute error 76.1539 %
Root relative squared error 123.4158 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.246 0.107 0.571 0.246 0.344 0.184 0.570 0.417 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.162 Pro-Roseanne
0.933 0.761 0.522 0.933 0.670 0.236 0.586 0.519 UncNeut
Weighted Avg. 0.530 0.398 ? 0.530 ? ? 0.566 0.424

=== Confusion Matrix ===

a b c <– classified as
541 0 1659 | a = Anti-Roseanne
218 0 756 | b = Pro-Roseanne
188 0 2638 | c = UncNeut

SimpleLogistic
This took less time to run than on the vanilla ARFF (approx. 35 minutes, with one less class to work with). The weighted average ROC AUC is actually slightly lower (0.873 vs 0.885), and the CCI is the same (73.65%).

=== Summary ===

Correctly Classified Instances 4419 73.65 %
Incorrectly Classified Instances 1581 26.35 %
Kappa statistic 0.5609
Mean absolute error 0.2446
Root mean squared error 0.3502
Relative absolute error 59.4381 %
Root relative squared error 77.196 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.686 0.146 0.731 0.686 0.708 0.548 0.859 0.792 Anti-Roseanne
0.488 0.041 0.695 0.488 0.573 0.518 0.867 0.648 Pro-Roseanne
0.862 0.258 0.749 0.862 0.801 0.605 0.886 0.862 UncNeut
Weighted Avg. 0.737 0.182 0.734 0.737 0.730 0.570 0.873 0.802

=== Confusion Matrix ===

a b c <– classified as
1509 122 569 | a = Anti-Roseanne
250 475 249 | b = Pro-Roseanne
305 86 2435 | c = UncNeut

——————————————–
Meta classifiers (Stacking, Vote):

Stack 1: ZeroR, NaiveBayes on the “Vanilla” set
This took about 20 minutes to run, but turned out to be quite a let-down: the result is identical to ZeroR.

=== Summary ===

Correctly Classified Instances 2276 37.9333 %
Incorrectly Classified Instances 3724 62.0667 %
Kappa statistic 0
Mean absolute error 0.3435
Root mean squared error 0.4144
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 1.000 0.379 1.000 0.550 ? 0.499 0.379 Unclear/Unrelated
0.000 0.000 ? 0.000 ? ? 0.500 0.367 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.092 Neutral
Weighted Avg. 0.379 0.379 ? 0.379 ? ? 0.499 0.313

=== Confusion Matrix ===

a b c d <– classified as
2276 0 0 0 | a = Unclear/Unrelated
2200 0 0 0 | b = Anti-Roseanne
974 0 0 0 | c = Pro-Roseanne
550 0 0 0 | d = Neutral
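Stacking trains a meta-level learner on the base learners' out-of-fold predictions. The pure-Python sketch below builds that level-1 data with a simple 3-fold split; `fit_zero_r`, the nearest-centroid `fit_centroid` and the numeric toy data are hypothetical stand-ins, not Weka's implementation. It makes one thing plain: the stack's final answer always comes from the meta learner, so a ZeroR meta learner returns the same class whatever the base learners say. As far as I know ZeroR is Weka's default Stacking meta classifier, though this journal does not record which meta classifier each stack used.

```python
from collections import Counter

CLASSES = ["Anti-Roseanne", "Pro-Roseanne"]  # toy two-class problem


def fit_zero_r(X, y):
    """Base learner 1: always outputs the training-set class priors."""
    counts = Counter(y)
    prior = [counts[c] / len(y) for c in CLASSES]
    return lambda x: prior


def fit_centroid(X, y):
    """Base learner 2 (hypothetical): nearest class centroid, hard 0/1 output."""
    centroids = {}
    for c in CLASSES:
        members = [x for x, label in zip(X, y) if label == c]
        if members:
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        best = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        return [1.0 if c == best else 0.0 for c in CLASSES]

    return predict


def stack_features(X, y, base_fitters, folds=3):
    """Level-1 data: each row holds every base learner's class probabilities
    for that instance, predicted by models trained without it."""
    meta_X = [None] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in range(f, len(X), folds):  # the held-out fold
            meta_X[i] = [p for m in models for p in m(X[i])]
    return meta_X


def fit_stacking(X, y, base_fitters, fit_meta):
    """Train the meta learner on out-of-fold outputs, then refit the bases on all data."""
    meta_model = fit_meta(stack_features(X, y, base_fitters), y)
    final_bases = [fit(X, y) for fit in base_fitters]

    def predict(x):
        probs = meta_model([p for m in final_bases for p in m(x)])
        return CLASSES[max(range(len(CLASSES)), key=probs.__getitem__)]

    return predict


X = [[0], [1], [2], [10], [11], [12]]
y = ["Anti-Roseanne"] * 3 + ["Pro-Roseanne"] * 3
stack = fit_stacking(X, y, [fit_zero_r, fit_centroid], fit_meta=fit_centroid)
print(stack([1]), stack([11]))
```

Passing `fit_meta=fit_zero_r` instead makes every prediction the same class, regardless of the base learners.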

Stack 2: NaiveBayes, RandomTree on the “Merged” set
=== Summary ===

Correctly Classified Instances 3719 61.9833 %
Incorrectly Classified Instances 2281 38.0167 %
Kappa statistic 0.3728
Mean absolute error 0.2942
Root mean squared error 0.4523
Relative absolute error 71.4835 %
Root relative squared error 99.7147 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.538 0.201 0.607 0.538 0.570 0.346 0.682 0.536 Anti-Roseanne
0.351 0.099 0.407 0.351 0.377 0.268 0.668 0.297 Pro-Roseanne
0.776 0.321 0.683 0.776 0.727 0.456 0.720 0.637 UncNeut
Weighted Avg. 0.620 0.241 0.610 0.620 0.613 0.385 0.698 0.545

=== Confusion Matrix ===

a b c <– classified as
1183 316 701 | a = Anti-Roseanne
315 342 317 | b = Pro-Roseanne
450 182 2194 | c = UncNeut

Stack 3: NaiveBayes, RandomTree on the “Vanilla” set
Took approx 20 minutes to classify
=== Summary ===

Correctly Classified Instances 3822 63.7 %
Incorrectly Classified Instances 2178 36.3 %
Kappa statistic 0.4751
Mean absolute error 0.1912
Root mean squared error 0.3795
Relative absolute error 55.6555 %
Root relative squared error 91.5838 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.747 0.238 0.657 0.747 0.699 0.499 0.839 0.795 Unclear/Unrelated
0.522 0.154 0.662 0.522 0.584 0.391 0.788 0.666 Anti-Roseanne
0.520 0.120 0.456 0.520 0.486 0.379 0.795 0.444 Pro-Roseanne
0.847 0.018 0.825 0.847 0.836 0.819 0.947 0.804 Neutral
Weighted Avg. 0.637 0.168 0.642 0.637 0.635 0.469 0.823 0.691

=== Confusion Matrix ===

a b c d <– classified as
1701 325 240 10 | a = Unclear/Unrelated
625 1149 352 74 | b = Anti-Roseanne
233 220 506 15 | c = Pro-Roseanne
30 42 12 466 | d = Neutral

Vote 1 (ZeroR, NaiveBayes, RandomTree on “Vanilla” data set)

=== Summary ===

Correctly Classified Instances 3725 62.0833 %
Incorrectly Classified Instances 2275 37.9167 %
Kappa statistic 0.434
Mean absolute error 0.2422
Root mean squared error 0.3414
Relative absolute error 70.5037 %
Root relative squared error 82.3736 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.754 0.258 0.641 0.754 0.693 0.485 0.858 0.817 Unclear/Unrelated
0.616 0.233 0.605 0.616 0.610 0.381 0.805 0.704 Anti-Roseanne
0.289 0.068 0.452 0.289 0.352 0.267 0.823 0.460 Pro-Roseanne
0.676 0.016 0.807 0.676 0.736 0.715 0.967 0.829 Neutral
Weighted Avg. 0.621 0.196 0.612 0.621 0.611 0.433 0.843 0.718

=== Confusion Matrix ===

a b c d <– classified as
1716 423 128 9 | a = Unclear/Unrelated
588 1356 195 61 | b = Anti-Roseanne
308 366 281 19 | c = Pro-Roseanne
63 98 17 372 | d = Neutral
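As far as I know, Weka's Vote combines its members by averaging their class-probability estimates unless another combination rule is chosen. The sketch below assumes that rule. The NaiveBayes and RandomTree outputs are made up for a single tweet, and the ZeroR vector is roughly the vanilla class priors. It shows how a ZeroR member, which always outputs the priors, can tip a close vote towards the majority class.

```python
def vote_average(prob_vectors, classes):
    """Average the members' class-probability estimates and predict the argmax."""
    k = len(classes)
    avg = [sum(p[i] for p in prob_vectors) / len(prob_vectors) for i in range(k)]
    best = max(range(k), key=avg.__getitem__)
    return classes[best], avg


classes = ["Unclear/Unrelated", "Anti-Roseanne", "Pro-Roseanne", "Neutral"]
zero_r = [0.379, 0.367, 0.162, 0.092]   # approximately the vanilla class priors
naive_bayes = [0.35, 0.10, 0.45, 0.10]  # hypothetical outputs for one tweet
random_tree = [0.50, 0.00, 0.50, 0.00]  # hypothetical outputs for one tweet

print(vote_average([naive_bayes, random_tree], classes)[0])
print(vote_average([zero_r, naive_bayes, random_tree], classes)[0])
```

Without ZeroR the two members agree on Pro-Roseanne; adding ZeroR flips the averaged vote to Unclear/Unrelated.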

Vote 2 (ZeroR, NaiveBayes, RandomTree on “Merged” data set)

Note the 64.05% CCI, higher than every other classifier run on the merged set so far except SimpleLogistic (73.65%); the weighted average ROC AUC is 0.833.

=== Summary ===

Correctly Classified Instances 3843 64.05 %
Incorrectly Classified Instances 2157 35.95 %
Kappa statistic 0.3952
Mean absolute error 0.3003
Root mean squared error 0.383
Relative absolute error 72.9533 %
Root relative squared error 84.4275 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.563 0.182 0.642 0.563 0.600 0.393 0.817 0.720 Anti-Roseanne
0.299 0.065 0.469 0.299 0.365 0.283 0.830 0.489 Pro-Roseanne
0.818 0.358 0.670 0.818 0.737 0.465 0.846 0.845 UncNeut
Weighted Avg. 0.641 0.246 0.627 0.641 0.626 0.409 0.833 0.741

=== Confusion Matrix ===

a b c <– classified as
1239 181 780 | a = Anti-Roseanne
326 291 357 | b = Pro-Roseanne
365 148 2313 | c = UncNeut

——————

CostSensitiveClassifier (“Penalties”) on the Vanilla/Merged sets:

To try to improve accuracy, I wrapped ZeroR, OneR, NaiveBayes, RandomTree, IBK and SimpleLogistic in CostSensitiveClassifier with penalties in the cost matrix (0 on the main diagonal, 5.0 for all other cells).
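CostSensitiveClassifier can either reweight the training instances according to the costs or predict the class with the minimum expected cost. The journal does not record which mode was used, so treat the sketch below as an illustration of the second mode rather than a reconstruction of these runs. The function names, the row = actual / column = predicted convention and the probabilities are my own assumptions. With a uniform 0/5.0 matrix, the expected cost of predicting class p is 5.0 × (1 − P(p)), so the cheapest class is always the most probable one. That is consistent with the vanilla NaiveBayes, RandomTree, IBK and SimpleLogistic results below being identical to their unpenalised runs. Only uneven costs can move a prediction.

```python
def uniform_cost(k, penalty=5.0):
    """0 on the main diagonal, `penalty` everywhere else."""
    return [[0.0 if actual == pred else penalty for pred in range(k)] for actual in range(k)]


def min_expected_cost(probs, cost):
    """Return (predicted class index, expected cost of each possible prediction).

    probs[a]   : estimated probability that the true class is a
    cost[a][p] : cost of predicting p when the true class is a
                 (row = actual, column = predicted; my convention here)
    """
    k = len(probs)
    expected = [sum(probs[a] * cost[a][p] for a in range(k)) for p in range(k)]
    return min(range(k), key=expected.__getitem__), expected


probs = [0.5, 0.4, 0.1, 0.0]  # hypothetical class probabilities for one tweet
print(min_expected_cost(probs, uniform_cost(4)))  # same choice as the argmax

skewed = uniform_cost(4)
for actual in range(1, 4):
    skewed[actual][0] = 20.0  # make wrongly predicting class 0 expensive
print(min_expected_cost(probs, skewed))
```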

For the Vanilla model:

ZeroR – penalties on the left-most column of the cost matrix.
=== Summary ===

Correctly Classified Instances 2276 37.9333 %
Incorrectly Classified Instances 3724 62.0667 %
Kappa statistic 0
Mean absolute error 0.3435
Root mean squared error 0.4144
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 1.000 0.379 1.000 0.550 ? 0.499 0.379 Unclear/Unrelated
0.000 0.000 ? 0.000 ? ? 0.500 0.367 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.092 Neutral
Weighted Avg. 0.379 0.379 ? 0.379 ? ? 0.499 0.313

=== Confusion Matrix ===

a b c d <– classified as
2276 0 0 0 | a = Unclear/Unrelated
2200 0 0 0 | b = Anti-Roseanne
974 0 0 0 | c = Pro-Roseanne
550 0 0 0 | d = Neutral

NaiveBayes
=== Summary ===

Correctly Classified Instances 3831 63.85 %
Incorrectly Classified Instances 2169 36.15 %
Kappa statistic 0.4781
Mean absolute error 0.1905
Root mean squared error 0.3808
Relative absolute error 55.4478 %
Root relative squared error 91.8898 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.680 0.177 0.702 0.680 0.691 0.507 0.828 0.735 Unclear/Unrelated
0.612 0.202 0.637 0.612 0.625 0.414 0.783 0.681 Anti-Roseanne
0.480 0.128 0.421 0.480 0.449 0.335 0.777 0.447 Pro-Roseanne
0.851 0.019 0.821 0.851 0.836 0.819 0.968 0.857 Neutral
Weighted Avg. 0.639 0.163 0.644 0.639 0.641 0.474 0.816 0.680

=== Confusion Matrix ===

a b c d <– classified as
1548 445 272 11 | a = Unclear/Unrelated
422 1347 356 75 | b = Anti-Roseanne
207 283 468 16 | c = Pro-Roseanne
29 38 15 468 | d = Neutral

RandomTree
=== Summary ===

Correctly Classified Instances 3575 59.5833 %
Incorrectly Classified Instances 2425 40.4167 %
Kappa statistic 0.4068
Mean absolute error 0.208
Root mean squared error 0.4474
Relative absolute error 60.5695 %
Root relative squared error 107.9566 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.720 0.235 0.652 0.720 0.684 0.477 0.718 0.519 Unclear/Unrelated
0.562 0.229 0.587 0.562 0.575 0.337 0.636 0.498 Anti-Roseanne
0.333 0.104 0.382 0.333 0.355 0.241 0.605 0.241 Pro-Roseanne
0.682 0.029 0.705 0.682 0.693 0.663 0.827 0.510 Neutral
Weighted Avg. 0.596 0.192 0.589 0.596 0.592 0.405 0.680 0.465

=== Confusion Matrix ===

a b c d <– classified as
1639 416 188 33 | a = Unclear/Unrelated
548 1237 318 97 | b = Anti-Roseanne
267 356 324 27 | c = Pro-Roseanne
59 97 19 375 | d = Neutral

OneR
=== Summary ===

Correctly Classified Instances 2562 42.7 %
Incorrectly Classified Instances 3438 57.3 %
Kappa statistic 0.0804
Mean absolute error 0.2865
Root mean squared error 0.5353
Relative absolute error 83.4117 %
Root relative squared error 129.1645 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.886 0.760 0.416 0.886 0.566 0.155 0.563 0.412 Unclear/Unrelated
0.248 0.159 0.474 0.248 0.325 0.108 0.544 0.393 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.092 Neutral
Weighted Avg. 0.427 0.347 ? 0.427 ? ? 0.540 0.335

=== Confusion Matrix ===

a b c d <– classified as
2017 259 0 0 | a = Unclear/Unrelated
1655 545 0 0 | b = Anti-Roseanne
705 269 0 0 | c = Pro-Roseanne
472 78 0 0 | d = Neutral

IBK
=== Summary ===

Correctly Classified Instances 3392 56.5333 %
Incorrectly Classified Instances 2608 43.4667 %
Kappa statistic 0.3359
Mean absolute error 0.2209
Root mean squared error 0.4386
Relative absolute error 64.3207 %
Root relative squared error 105.8332 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.941 0.579 0.498 0.941 0.652 0.390 0.707 0.517 Unclear/Unrelated
0.311 0.076 0.704 0.311 0.431 0.308 0.646 0.560 Anti-Roseanne
0.188 0.018 0.670 0.188 0.294 0.301 0.632 0.354 Pro-Roseanne
0.696 0.014 0.836 0.696 0.760 0.742 0.903 0.672 Neutral
Weighted Avg. 0.565 0.251 0.633 0.565 0.523 0.378 0.691 0.520

=== Confusion Matrix ===

a b c d <– classified as
2142 105 24 5 | a = Unclear/Unrelated
1390 684 62 64 | b = Anti-Roseanne
650 135 183 6 | c = Pro-Roseanne
115 48 4 383 | d = Neutral

SimpleLogistic
=== Summary ===

Correctly Classified Instances 4419 73.65 %
Incorrectly Classified Instances 1581 26.35 %
Kappa statistic 0.6108
Mean absolute error 0.1863
Root mean squared error 0.3065
Relative absolute error 54.2321 %
Root relative squared error 73.9571 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.840 0.189 0.731 0.840 0.782 0.637 0.897 0.824 Unclear/Unrelated
0.706 0.150 0.732 0.706 0.719 0.561 0.860 0.793 Anti-Roseanne
0.508 0.046 0.683 0.508 0.583 0.523 0.862 0.646 Pro-Roseanne
0.833 0.014 0.854 0.833 0.843 0.828 0.972 0.871 Neutral
Weighted Avg. 0.737 0.135 0.735 0.737 0.732 0.608 0.885 0.788

=== Confusion Matrix ===

a b c d <– classified as
1912 266 84 14 | a = Unclear/Unrelated
453 1554 139 54 | b = Anti-Roseanne
204 265 495 10 | c = Pro-Roseanne
46 39 7 458 | d = Neutral

For the Merged model:

ZeroR
=== Summary ===

Correctly Classified Instances 2200 36.6667 %
Incorrectly Classified Instances 3800 63.3333 %
Kappa statistic 0
Mean absolute error 0.4253
Root mean squared error 0.4623
Relative absolute error 103.3205 %
Root relative squared error 101.9046 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 1.000 0.367 1.000 0.537 ? 0.500 0.367 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.162 Pro-Roseanne
0.000 0.000 ? 0.000 ? ? 0.499 0.470 UncNeut
Weighted Avg. 0.367 0.367 ? 0.367 ? ? 0.499 0.382

=== Confusion Matrix ===

a b c <– classified as
2200 0 0 | a = Anti-Roseanne
974 0 0 | b = Pro-Roseanne
2826 0 0 | c = UncNeut

NaiveBayes
=== Summary ===

Correctly Classified Instances 3845 64.0833 %
Incorrectly Classified Instances 2155 35.9167 %
Kappa statistic 0.4278
Mean absolute error 0.2573
Root mean squared error 0.4365
Relative absolute error 62.5082 %
Root relative squared error 96.2157 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.638 0.223 0.624 0.638 0.631 0.414 0.789 0.676 Anti-Roseanne
0.501 0.133 0.421 0.501 0.458 0.344 0.780 0.448 Pro-Roseanne
0.691 0.201 0.753 0.691 0.721 0.494 0.817 0.807 UncNeut
Weighted Avg. 0.641 0.198 0.652 0.641 0.645 0.440 0.801 0.701

=== Confusion Matrix ===

a b c <– classified as
1404 355 441 | a = Anti-Roseanne
288 488 198 | b = Pro-Roseanne
558 315 1953 | c = UncNeut

OneR
=== Summary ===

Correctly Classified Instances 2377 39.6167 %
Incorrectly Classified Instances 3623 60.3833 %
Kappa statistic 0.0308
Mean absolute error 0.4026
Root mean squared error 0.6345
Relative absolute error 97.8042 %
Root relative squared error 139.8632 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.926 0.887 0.377 0.926 0.536 0.064 0.520 0.376 Anti-Roseanne
0.000 0.000 ? 0.000 ? ? 0.500 0.162 Pro-Roseanne
0.120 0.080 0.573 0.120 0.198 0.067 0.520 0.483 UncNeut
Weighted Avg. 0.396 0.363 ? 0.396 ? ? 0.517 0.392

=== Confusion Matrix ===

a b c <– classified as
2038 0 162 | a = Anti-Roseanne
883 0 91 | b = Pro-Roseanne
2487 0 339 | c = UncNeut

IBK
=== Summary ===

Correctly Classified Instances 3575 59.5833 %
Incorrectly Classified Instances 2425 40.4167 %
Kappa statistic 0.2804
Mean absolute error 0.2833
Root mean squared error 0.4972
Relative absolute error 68.842 %
Root relative squared error 109.6102 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.321 0.066 0.739 0.321 0.448 0.336 0.635 0.568 Anti-Roseanne
0.205 0.022 0.643 0.205 0.311 0.305 0.634 0.352 Pro-Roseanne
0.944 0.650 0.564 0.944 0.706 0.359 0.670 0.589 UncNeut
Weighted Avg. 0.596 0.334 0.641 0.596 0.547 0.342 0.651 0.543

=== Confusion Matrix ===

a b c <– classified as
707 76 1417 | a = Anti-Roseanne
127 200 647 | b = Pro-Roseanne
123 35 2668 | c = UncNeut

RandomTree
=== Summary ===

Correctly Classified Instances 3803 63.3833 %
Incorrectly Classified Instances 2197 36.6167 %
Kappa statistic 0.4083
Mean absolute error 0.2596
Root mean squared error 0.4728
Relative absolute error 63.0709 %
Root relative squared error 104.2302 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.629 0.245 0.598 0.629 0.613 0.380 0.676 0.541 Anti-Roseanne
0.373 0.119 0.377 0.373 0.375 0.254 0.615 0.262 Pro-Roseanne
0.728 0.210 0.756 0.728 0.741 0.519 0.759 0.672 UncNeut
Weighted Avg. 0.634 0.208 0.636 0.634 0.635 0.425 0.705 0.557

=== Confusion Matrix ===

a b c <– classified as
1384 353 463 | a = Anti-Roseanne
409 363 202 | b = Pro-Roseanne
523 247 2056 | c = UncNeut

SimpleLogistic
=== Summary ===

Correctly Classified Instances 4487 74.7833 %
Incorrectly Classified Instances 1513 25.2167 %
Kappa statistic 0.5862
Mean absolute error 0.2493
Root mean squared error 0.3478
Relative absolute error 60.563 %
Root relative squared error 76.6581 %
Total Number of Instances 6000

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.759 0.181 0.708 0.759 0.732 0.570 0.865 0.799 Anti-Roseanne
0.532 0.049 0.676 0.532 0.595 0.533 0.873 0.660 Pro-Roseanne
0.814 0.181 0.800 0.814 0.807 0.632 0.890 0.869 UncNeut
Weighted Avg. 0.748 0.160 0.746 0.748 0.745 0.593 0.878 0.809

=== Confusion Matrix ===

a b c <– classified as
1669 141 390 | a = Anti-Roseanne
270 518 186 | b = Pro-Roseanne
419 107 2300 | c = UncNeut

<end of report>
