Summary of results: which methodology/modality “wins”?

| Algorithm | Speed | CCI % | ROC AUC | RMSE | F-1 | CCI % | ROC AUC | F-1 | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| DecisionTable | Slow | too slow for viable computation on consumer-grade hardware | | | | | | | |

| Meta-Classifier | Speed | CCI % | ROC AUC | RMSE | F-1 | CCI % | ROC AUC | F-1 | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Stack (ZR, NB) | Moderate | 37.9333 | 0.4990 | 0.4144 | NULL | vacuous results, omitted | | | |
| Stack (NB, RT) | Moderate | 63.7000 | 0.8230 | 0.3795 | 0.6350 | 61.9833 | 0.6980 | 0.6130 | 0.4523 |
| Vote (ZR, NB, RT) | Moderate | 62.0833 | 0.8430 | 0.3414 | 0.6110 | 64.0500 | 0.8330 | 0.6260 | 0.3830 |

Abbreviations: ZR = ZeroR, NB = NaiveBayes, RT = RandomTree.
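For reference, CCI % is the percentage of correctly classified instances, ROC AUC is the area under the ROC curve, RMSE is root mean squared error, and F-1 is the F-measure. The sketch below computes CCI %, a class-weighted F-1, and an RMSE over predicted class-probability distributions. It assumes, to my understanding of Weka's convention for nominal classes, that the squared error is summed over the class-probability vector and then averaged over the number of classes; the function names are hypothetical, and this is an illustration rather than Weka's own code.

```python
import math
from collections import Counter


def cci_percent(y_true, y_pred):
    """Percentage of correctly classified instances."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F-1, averaged with weights proportional to each class's true count."""
    counts = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in counts.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        # F-1 = 2TP / (2TP + FP + FN); a class that is never predicted scores 0 here
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += f1 * n / total
    return score


def nominal_rmse(y_true, prob_dists, classes):
    """RMSE between predicted distributions and one-hot targets.

    Squared error is averaged over the classes for each instance
    (assumed Weka-style convention), then over instances.
    """
    k = len(classes)
    sq = 0.0
    for t, dist in zip(y_true, prob_dists):
        sq += sum((dist[i] - (1.0 if c == t else 0.0)) ** 2
                  for i, c in enumerate(classes)) / k
    return math.sqrt(sq / len(y_true))
```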
My full results are contained in a separate text file in lab journal format. The salient results are summarized in the tables above and discussed below.
My methodological strategy began with a wide selection of algorithms. In particular, I was concerned about the trade-off between speed and accuracy. If an algorithm yields >80% accuracy (note: none of those I queried did) but takes days to compute (e.g., Multilayer Perceptron), it may be unsuitable for rapid analysis of the constantly changing media landscape in which we work.
An algorithm that takes an hour and returns ~70% accuracy may be more desirable (as was the case with SimpleLogistic). Furthermore, depending on the use case (such as a user interface or a mobile app), one might prefer something with lower accuracy but near-instantaneous results (NaiveBayes being an excellent candidate).
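One way to make this trade-off explicit is to treat runtime as a hard budget and pick the most accurate model that fits inside it. The sketch below is a minimal illustration, assuming accuracy and runtime have already been measured for each candidate; the `Candidate` type and `pick_within_budget` function are hypothetical names, not part of any toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str
    accuracy: float  # e.g. CCI % from cross-validation
    seconds: float   # wall-clock time to build and evaluate the model


def pick_within_budget(candidates: Sequence[Candidate],
                       budget_seconds: float) -> Optional[Candidate]:
    """Return the most accurate candidate whose runtime fits the budget, or None."""
    feasible = [c for c in candidates if c.seconds <= budget_seconds]
    if not feasible:
        return None
    # Ties on accuracy go to the faster model
    return max(feasible, key=lambda c: (c.accuracy, -c.seconds))
```

A budget of a few seconds, as for a mobile app, pushes the choice toward something like NaiveBayes; a budget of an hour admits SimpleLogistic; only a budget of days would admit a Multilayer Perceptron.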
Another concern is whether certain modalities support the rather cynical scenario in which people willing to publicly defend a racist cannot easily be discerned from noise and chatter. I shall return to this concern shortly.
All in all, NaiveBayes performed well in terms of worst-case computation time, returning results very quickly. Using what I refer to as the “vanilla” dataset, it achieved a weighted average ROC AUC just over 0.8 and approximately 64% correctly classified instances. ZeroR, on the other hand, performed horribly, classifying essentially everything into a single class (note: this is still better than guessing at random).
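ZeroR ignores every attribute and always predicts the most frequent class, so its accuracy equals that class's share of the data. Guessing uniformly among k classes is expected to be right 1/k of the time, and guessing each class at its own frequency p_i is right sum(p_i²) of the time; the majority share is never below either, which is why ZeroR serves as a floor. A minimal sketch (function names are hypothetical; accuracy is computed on the same labels the rule is fit to):

```python
from collections import Counter


def zero_r_accuracy(labels):
    """Accuracy of always predicting the majority class (ZeroR's rule)."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def uniform_guess_accuracy(labels):
    """Expected accuracy of guessing uniformly among the observed classes."""
    return 1.0 / len(set(labels))


def prior_guess_accuracy(labels):
    """Expected accuracy of guessing each class with its empirical frequency."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())
```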
A dark horse contender, however, showed up late in the game: SimpleLogistic. SimpleLogistic took under an hour to build a model and run a 10-fold cross-validation while returning better accuracy (CCI %, ROC AUC, RMSE) than the other algorithms I queried. A run took about 30-40 minutes, depending on such modalities as merged data, penalties, and so on. I find this speed-accuracy trade-off reasonable. Furthermore, unlike almost all other algorithms investigated, applying CostSensitiveClassifier to SimpleLogistic on the merged data yielded slightly improved accuracy (in terms of CCI % and ROC AUC), though at the cost of a higher RMSE.
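Weka's CostSensitiveClassifier can either reweight training instances according to a cost matrix or keep the base model and predict the class with the lowest expected cost. The sketch below illustrates only the second idea, in plain Python, assuming the base model (e.g., SimpleLogistic) already outputs a class-probability distribution. The cost-matrix layout (rows = actual class, columns = predicted class) and the function name are assumptions for illustration, not Weka's API.

```python
from typing import Sequence


def min_expected_cost_class(probs: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Index of the class with the lowest expected misclassification cost.

    probs[i]   -- model's probability that the true class is i
    cost[i][j] -- cost of predicting j when the true class is i (assumed layout)
    """
    k = len(probs)
    # Expected cost of predicting j = sum over true classes i of P(i) * cost[i][j]
    expected = [sum(probs[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])
```

With a zero-one cost matrix this reduces to picking the most probable class; raising the cost of missing one class (a "penalty") shifts borderline instances toward that class.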
Lastly, apart from the observation made in the previous paragraph, meta-classification strategies such as voting and stacking, as well as the introduction of penalties, did not yield results very different from those of the algorithms they wrapped, but they were nonetheless interesting to observe.
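A voting ensemble can be as simple as averaging its members' class-probability distributions and predicting the most probable class. Weka's Vote offers several combination rules; the sketch below shows only this averaging rule, in plain Python, and the function names are hypothetical. Which rule the runs above used is not recorded here.

```python
from typing import List, Sequence


def average_vote(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average several members' probability distributions for one instance."""
    m = len(distributions)
    k = len(distributions[0])
    return [sum(d[j] for d in distributions) / m for j in range(k)]


def vote_predict(distributions: Sequence[Sequence[float]]) -> int:
    """Predicted class index: argmax of the averaged distribution."""
    avg = average_vote(distributions)
    return max(range(len(avg)), key=lambda j: avg[j])
```

Stacking differs in that a second-level model is trained on the members' outputs instead of applying a fixed rule like this one.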
On further scrutiny, I was tempted to lend credence to the rather bold claim that Pro-Roseanne rhetoric is hard to discern from noise (a claim suggested by running the RandomTree algorithm with penalties). A counterexample surfaced when running vanilla IBk with penalties: there, Anti-Roseanne instances were rather often misclassified as Unclear/Unrelated.
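One way to test a claim like "Pro-Roseanne rhetoric is hard to discern from noise" across models is to compute, from each model's confusion matrix, the share of each actual class that ends up predicted as the noise class. The sketch below assumes a confusion matrix with rows as actual classes and columns as predicted classes, with class labels as used in this post; the function name is hypothetical, and the sketch is meant for comparing models side by side rather than for drawing a conclusion from any single one.

```python
from typing import Dict, Sequence


def share_predicted_as(confusion: Sequence[Sequence[int]],
                       labels: Sequence[str],
                       target: str) -> Dict[str, float]:
    """For each actual class, the fraction of its instances predicted as `target`.

    confusion[i][j] counts instances whose actual class is labels[i]
    and whose predicted class is labels[j] (assumed layout).
    """
    t = labels.index(target)
    shares = {}
    for i, row in enumerate(confusion):
        total = sum(row)
        shares[labels[i]] = row[t] / total if total else 0.0
    return shares
```

Calling this with `target="Unclear/Unrelated"` on the RandomTree and IBk confusion matrices could help show whether the leak into the noise class is a property of the rhetoric or of the model.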
This underscores a subtle but critical notion: just as one entertains a moral hazard by cherry-picking data, one may run into a similar, far graver problem by tweaking models until they support a claim and then hiding behind the algorithms as a sort of unassailable black box.