An Introduction

This site encompasses some of my research as a graduate student in Logic, Computer Science, and Mathematics. My interests include computational mathematics, conceptual modeling, feature engineering, and explainable/ethical artificial intelligence.

As time permits I shall publish research material and works in progress. This may include topics in Cognitive Systems (Organic and Machine Learning), Model Theory, and Computational Complexity, but I may also share some anecdotes about methodology as my lab work progresses. Additionally, my previous and forthcoming publications/collaborations shall reside here.

My Spring 2020 JPL Reading List

In January 2020, I began a very pleasant internship at NASA Jet Propulsion Laboratory, studying graphics preprocessors and how to exploit parallelism in large terrain-rendering tasks. As I had barely any background in either of these very interesting subjects, and needed to build a bridge between the two, I spent nearly every day in the JPL library. That is, until the work-from-home order struck. So this has become my Plague Reading List, Part 1.

Standard C++ IOStreams and locales : advanced programmer’s guide and reference

Earths of distant suns : how we find them, communicate with them, and maybe even travel there

Introduction to applied nonlinear dynamical systems and chaos

Graph-based natural language processing and information retrieval

Explanatory nonmonotonic reasoning

Learning GNU Emacs

Computer systems : digital design, fundamentals of computer architecture and assembly language

Qualitative approaches for reasoning under uncertainty

MySQL reference manual : documentation from the source

Introduction to high performance computing for scientists and engineers

Data communications and networks : an engineering approach

Extraterrestrial intelligence

High performance scientific and engineering computing : hardware/software support

Regression modeling strategies : with applications to linear models, logistic regression, and survival analysis

Machine learning and systems engineering

Concrete abstract algebra : from numbers to Gröbner bases

Nets, terms and formulas : three views of concurrent processes and their relationship

Mathematical foundations of computer science

Categorical data analysis

Matching theory

Data integration blueprint and modeling : techniques for a scalable and sustainable architecture

Parameterized complexity

High performance computing : paradigm and infrastructure

TCP/IP protocol suite

Large-scale C++ software design

Encountering life in the universe : ethical foundations and social implications of astrobiology

Other minds : the octopus, the sea, and the deep origins of consciousness

Modern C++ design : generic programming and design patterns applied

Reverse engineering of object oriented code

Software abstractions : logic, language and analysis

Extraterrestrial languages

Algorithms and theory of computation handbook

Secure programming cookbook for C and C++

Archaeology, anthropology, and interstellar communication

Semiparametric theory and missing data

The pocket handbook of image processing algorithms in C

Design of experiments : ranking and selection : essays in honor of Robert E. Bechhofer

Designing digital systems with SystemVerilog

C++ network programming

Intelligent control and computer engineering

Operations research : an introduction

Logic-based methods for optimization : combining optimization and constraint satisfaction

Applied combinatorial mathematics.

Visual complexity : mapping patterns of information

Dawn of the new everything : encounters with reality and virtual reality

Handbook of logic and language

Recurrent neural networks : design and applications

Handbook of computational methods for integration

Data structure programming : with the standard template library in C++

If the universe is teeming with aliens … where is everybody? : fifty solutions to the Fermi paradox and the problem of extraterrestrial life

The computational beauty of nature : computer explorations of fractals, chaos, complex systems, and adaptation

Tutorial: Neural Networks

Neural networks are a paradigm rather than just a single algorithm.

They originally resembled biological neurons in that they received inputs and then output a signal only if a threshold was met. This is an all-or-nothing process: either the neuron fires or it does not. Modern computing power has extended the paradigm dramatically, and it now allows massively complicated binary decision problems to be tackled.
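To make the all-or-nothing behavior concrete, here is a minimal sketch of a single threshold neuron in Python; the inputs, weights, and threshold are invented for illustration rather than taken from any particular network:

```python
# A single threshold ("all-or-nothing") neuron: fire only if the
# weighted sum of the inputs meets the threshold.
def threshold_neuron(inputs, weights, threshold):
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Illustrative values only: two inputs, equal weights, threshold 1.0
print(threshold_neuron([1, 0], [0.6, 0.6], 1.0))  # 0 -- does not fire
print(threshold_neuron([1, 1], [0.6, 0.6], 1.0))  # 1 -- fires
```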

Although many neural networks have a ton of arcane connections and “hidden layers” (hidden only in that displaying them all is tedious), I have hand-drawn an example. My inspiration comes from participating in YCombinator’s Startup School this summer. It solves a problem many of us keep running into: how to rapidly tell whether a startup is BS (a bad startup).

This is a fully functional neural network. It takes in a set of words and decides whether a start-up venture is a “bad startup” or not. It does not say whether the start-up is good; only whether it is a “bad startup.” At each layer, a series of keywords sit in circles; if a sufficient number of these nodes are activated, the neural network tells you that the idea is BS.

So, for example, if a startup is going to disrupt the cloud using crypto, and has a web 2.0 big data blockchain, you can assume their initial coin-offering, despite being full-stack, is really just a slow spreadsheet done in a browser by math minors. That’s BS man!
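For fun, here is a toy software rendition of the hand-drawn detector; the keyword layers and thresholds below are invented for illustration and are not the ones in my drawing:

```python
# A toy version of the "BS detector": each layer is a set of buzzword
# nodes, and a layer activates if enough of its keywords appear in the
# pitch. If enough layers activate, the verdict is BS.
LAYERS = [
    ({"disrupt", "cloud", "crypto"}, 2),
    ({"web", "2.0", "big", "data", "blockchain"}, 3),
    ({"ico", "coin-offering", "full-stack"}, 1),
]

def is_bs(pitch, layers=LAYERS, layers_needed=2):
    words = set(pitch.lower().split())
    active = sum(1 for keywords, k in layers if len(words & keywords) >= k)
    return active >= layers_needed

print(is_bs("we disrupt the cloud using crypto and a big data blockchain"))  # True
print(is_bs("we sell handmade furniture"))                                   # False
```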

Many thanks to William Gasarch at UMD for inspiring this.

Introducing: Guardia-NN

At Hacktech 2019, I intend to build a functioning prototype of the Guardia-NN system.

The hardware prototype system consists of the following:

A hardware device that features a huge button (representing “I DON’T WANT TO SEE THIS”) and an indicator (representing “YOU SAID YOU DON’T LIKE STUFF LIKE THIS”).

The device solves two problems at once: Firstly, people don’t like to see things that offend/traumatize them. But secondly, people generally don’t like being told by others (who may not share the same views and values) what is and is not offensive or triggering.

In other words, the device serves as a guardian built on neural networks (hence the name Guardia-NN), replacing the almost universally reviled trope of the overly politically-correct “social justice warrior” with a machine. Now that’s automation we can all get behind!

It performs this task by picking up on user-defined trigger warnings (text tags indicating that content will cause distress, offend, or otherwise bother the user), and relies on neural networks (large chains of decision-making procedures) to learn from repeated use what the end-user does and does not want to see. The device warns the user that content ahead may trigger them, and allows them to navigate around said content with minimal exposure.

Because of its limited computational power, and reliance upon user interaction, it does not represent a feasible means of censoring others’ internet use. Rather, its intended use is to allow people to have a warning sign before exposure to content that may cause psychological harm and thus mitigate their exposure, akin to an allergen warning system.
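To make the idea concrete, here is a minimal software sketch of the filtering loop; the class, tag names, and scoring rule are all invented for illustration, and the real prototype would rely on a trained neural network rather than simple per-tag weights:

```python
# Minimal sketch of the Guardia-NN filtering idea (illustrative only):
# the user defines trigger tags, content is scored against them, and the
# button ("I DON'T WANT TO SEE THIS") feeds back into the per-tag weights.
class Guardian:
    def __init__(self, trigger_tags):
        # Every user-defined tag starts with the same weight.
        self.weights = {tag: 1.0 for tag in trigger_tags}

    def score(self, content_tags):
        # Higher score = more likely the user does not want to see this.
        return sum(self.weights.get(tag, 0.0) for tag in content_tags)

    def should_warn(self, content_tags, threshold=1.0):
        return self.score(content_tags) >= threshold

    def press_button(self, content_tags):
        # "I DON'T WANT TO SEE THIS": reinforce the tags that were present.
        for tag in content_tags:
            self.weights[tag] = self.weights.get(tag, 0.0) + 0.5

g = Guardian(["violence", "spiders"])
print(g.should_warn(["spiders", "travel"]))  # True -- a defined trigger tag is present
print(g.should_warn(["cooking"]))            # False
```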

Tutorial: Machine Learning Data Set Preparation, Part 4

In this installment, I want to talk about the presentation of output. How we frame the output of an algorithm is as delicate a matter as the original choice of algorithm itself. The informal cycle I have been working with is data sourcing and preparation, followed by processing (applying statistical methods and machine learning algorithms), and then presenting the results in a form people can understand.

At each of these points, decisions have to be made.

Take a gander at the following two charts:

They look different. But they also look the same. What accounts for this? In the second chart, a great deal of empty space up top evokes a kind of downward, negative pressure. Assuming the viewer is a native user of a language that reads left-to-right, top-to-bottom, one might be inclined to say that the first chart shows good performance while the second indicates worse performance.

On closer examination, however, there is no actual difference in the data. The difference is in the y-axis. In the second chart, the y-axis begins at 200 (rather than 0) and runs all the way up to 1000. The peak, just before 2008, remains just over 700 in both charts. Both charts use and represent the same data without any distortion. So the manipulation happens at the framing level.
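For anyone who wants to reproduce the effect, here is a small sketch (with invented numbers, not the data behind the charts above) that plots the same series twice, once with the y-axis anchored at zero and once shifted upward:

```python
import matplotlib.pyplot as plt

# Invented yearly values with a peak just over 700 before 2008.
years = list(range(2000, 2013))
values = [310, 360, 420, 480, 550, 620, 680, 710, 640, 560, 500, 460, 430]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Same data, axis starting at 0.
ax1.plot(years, values)
ax1.set_ylim(0, 800)
ax1.set_title("y-axis from 0")

# Same data, axis shifted so empty space looms above the curve.
ax2.plot(years, values)
ax2.set_ylim(200, 1000)
ax2.set_title("y-axis from 200 to 1000")

plt.tight_layout()
plt.show()
```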

Future installments will cover some sneakier ways to present data. The source code for this example can be found here on my github.

Tutorial: Machine Learning Data Set Preparation, Part 3

To see the entire Machine Learning Tutorial, go here.

Remember that boring data?

NAME             COUNTRY    OWN A CAR  LIKES ICE-CREAM
Eliza Santiago   Guatemala  Yes        No
Fred Winchester  Canada     No         Yes
Marvin Ngoma     Ghana      Yes        No
Xiong Mao        USA        Yes        ???

This example may remind some of the battered adage that “correlation does not imply causation.” This cautionary statement is often the only thing people remember from their brief exposure to statistics. While it is certainly useful, it is not the whole story.

Let’s remove the extraneous details, such as the name and country, replacing them with a generic, indexed tag. After all, we aren’t really interested (at this point) in whether details such as the number of vowels in a name, or which part of the world a country is in, have any impact on car ownership or ice-cream preference. To take those into account would invite overfitting: packing so much information into a model that it becomes burdensome to separate the data that has a causal effect from that which we ought to consider arbitrary.

To take it a step further, let’s generalize car ownership and ice-cream preference to “A” and “B.” We obtain something very similar to a truth table in deductive logic.

NAME  COUNTRY  A      B
n1    c1       A      NOT-B
n2    c2       NOT-A  B
n3    c3       A      NOT-B
n4    c4       A      ???
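Here is one way that anonymization step might look in code, assuming pandas and using my own column names for the toy table above:

```python
import pandas as pd

# The toy table from above.
df = pd.DataFrame({
    "NAME":    ["Eliza Santiago", "Fred Winchester", "Marvin Ngoma", "Xiong Mao"],
    "COUNTRY": ["Guatemala", "Canada", "Ghana", "USA"],
    "A":       ["A", "NOT-A", "A", "A"],       # owns a car
    "B":       ["NOT-B", "B", "NOT-B", None],  # likes ice-cream (unknown for the last row)
})

# Replace the identifying columns with generic, indexed tags so a model
# cannot latch onto details like name length or country.
df["NAME"] = [f"n{i+1}" for i in range(len(df))]
df["COUNTRY"] = [f"c{i+1}" for i in range(len(df))]

print(df)
```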

One of the early promises of inductive and probabilistic models is that by putting data into a sophisticated enough machine, hidden rules will emerge. From this it becomes very tempting to treat these hidden rules as having deductive weight, in the same way that the statements “all birds have wings” and “all winged creatures can fly” allow one to deduce “all birds can fly.” But there are massive problems with this beyond mere arrogance. The biggest problem is that data obtained in the wild may not have been generated by a deductive rule (if A then B). In my own work, chaos typically reigns supreme.

Another is a threshold problem. As I see it, what is the ideal balance between underfitting (too little data to glean any decent insight) and overfitting (so many details that the model cannot provide reliable, accurate results in a reasonable amount of time)?

Consider this model:

Animal       Feathers?  Wings?  Can Fly?
Merlin       Yes        Yes     Yes
Kiwi         Yes        No      No
Dolphin      No         No      No
Vampire Bat  No         Yes     ???
Penguin      Yes        Yes     ???

Here, we give our inductive engine (i.e. a machine learning agent) plenty of details from which to issue decisions. We could assume that this engine is intelligent enough not to take “the animal’s name ends with –in” as a criterion, but that is a bold assumption. Sure, if we are doing supervised machine learning, then we should train our machine to answer whether a given animal can fly based on a combination of the most relevant information. But exactly how this machine agent knows which information is relevant and which should be considered a coincidence lies squarely with the humans training it.

In unsupervised models, we cannot assume that machines won’t learn from superfluous details such as whether an animal’s name ends with –in. Substituting generic, indexed tags for the animal names, as in the second table of this lesson, sidesteps this and lessens the risk of overfitting.

Given the data on Merlins, Kiwis, Dolphins, Vampire Bats, and Penguins, what answer should we expect regarding bats’ and penguins’ ability to fly?
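As a deliberately undersized experiment, here is a sketch of what an off-the-shelf learner does with this table; the encoding and the choice of scikit-learn’s decision tree are mine, not a prescription:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [has feathers, has wings]; labels: can fly (training rows only).
X_train = [[1, 1],   # Merlin
           [1, 0],   # Kiwi
           [0, 0]]   # Dolphin
y_train = [1, 0, 0]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Ask about the two unknowns.
X_unknown = [[0, 1],  # Vampire Bat: no feathers, wings
             [1, 1]]  # Penguin: feathers, wings
print(clf.predict(X_unknown))
# With this tiny table the tree keys on wings and predicts both can fly --
# exactly the kind of confident rule the text warns about (penguins cannot fly).
```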

Tutorial: Machine Learning Data Set Preparation, Part 2

To see the entire Machine Learning Tutorial, go here.

Let’s start with the data.

Data just means “that which is given” in Latin. If I handed you a muffin, that’s data. But that’s not the data we’re going to think about. Data in this case typically means a body of information, and we need to extract that information from a source, put it into a format a machine can read, and then parse it in order to glean something that isn’t immediately obvious on first examination.

What we’re given can be a picture or a string of letters. The task of machine learning is to infer, from what is given, those things that are not given. In some cases, humans will assist in a training role: a person will be asked to identify what object(s) are in a picture (is it a fire hydrant? is it a car?) or what a sentence means. In other cases, there is no training, and the machine learns from the data alone.

This concept translates fairly well over both visual and linguistic cases, but the linguistic case is a bit easier to start with.

So let’s consider some really boring data.

NAME             COUNTRY    OWN A CAR  LIKES ICE-CREAM
Eliza Santiago   Guatemala  Yes        No
Fred Winchester  Canada     No         Yes
Marvin Ngoma     Ghana      Yes        No
Xiong Mao        USA        Yes        ???

We have some made-up people: Fred Winchester, Eliza Santiago, Marvin Ngoma, and Xiong Mao. And we have their phone numbers, whether they like ice-cream, and whether they own a car. From this, we must ask what part of the data we actually need and what is extraneous. Phone numbers are assigned in a largely arbitrary (but not necessarily random) fashion.

For the sake of reducing details (and thereby lessening the possibility of overfitting), we ought either to discard the phone number entirely, or at least reduce it to just the country code (say, +44 for the UK) and area code prefix; the table above keeps only the country. Let’s assume we had their name, their number, and the answer to at least one of the questions about their ice-cream preferences and whether they drive.
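Here is a small sketch of that reduction step, assuming the raw records carry a full phone number in international format; the data and column names are invented for illustration:

```python
import pandas as pd

# Invented raw records: full phone numbers in international format.
raw = pd.DataFrame({
    "NAME":  ["Eliza Santiago", "Fred Winchester", "Marvin Ngoma", "Xiong Mao"],
    "PHONE": ["+502 5555 1234", "+1 604 555 0199", "+233 24 555 6789", "+1 415 555 2671"],
    "OWNS_A_CAR":      ["Yes", "No", "Yes", "Yes"],
    "LIKES_ICE_CREAM": ["No", "Yes", "No", None],
})

# Keep only the country code; drop the rest of the number entirely.
raw["COUNTRY_CODE"] = raw["PHONE"].str.split().str[0]
reduced = raw.drop(columns=["PHONE"])

print(reduced)
```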

So the easy question is, provided with this information, do we have sufficient data to know Xiong Mao’s ice cream preference?

The harder question is why this information is or is not sufficient. In subsequent installments of this tutorial I will go into the arguments for and against whether we can make a deductive claim from incomplete information (this being one of the core premises and promises of machine learning).

Tutorial: Machine Learning Data Set Preparation, Part 1

To see the entire Machine Learning Tutorial, go here.

In this multi-part tutorial, I shall go over the basics of taking “live” human data and putting it into a suitable format to feed into one’s machine learning platform of choice. This series will go from general to specific, offering insight on methodology before going into gathering data, putting that data into a machine-readable format, and then feeding it into machine learning platforms such as TensorFlow and Weka.

To start, I have to think of the data set in at least two lights.

One is theme-oriented: the data has to have a thematic character that is neither so broad as to capture spurious correlations, nor so narrow that it merely confirms the obvious. This is the story told by the data.

The other is feature-oriented. A good data set needs its raw data converted into a workable ontology, which is just a fancy word for the objects and the landscape they inhabit. These are the nouns of the story told by the data.

These concerns are methodological, but they reflect a deeper, doxastic question: what human beliefs do these data reinforce? Will they reinforce a prevailing status quo, something people take as a given, or will the data be such that new insights can be gleaned? This goes beyond merely trying to avoid overfitting (when data is so finely grained that it carries too much extraneous information to be of much use). It means that if your data set is to be free from racial bias, care must be taken to remove proxies for racial demographics, such as zip code.
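As a tiny illustration of that last point, with invented records and column names: dropping the proxy column is mechanically trivial, and the real methodological work is knowing that it is a proxy.

```python
import pandas as pd

# Invented applicant records; ZIP_CODE can act as a proxy for race.
applicants = pd.DataFrame({
    "APPLICANT_ID": [101, 102, 103],
    "ZIP_CODE":     ["60621", "90210", "10027"],
    "INCOME":       [41000, 125000, 67000],
    "APPROVED":     [0, 1, 1],
})

# Removing the proxy is one line; deciding which columns stand in for
# protected attributes is the hard part.
debiased = applicants.drop(columns=["ZIP_CODE"])
print(debiased)
```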

Summary: Machine Learning on the Roseanne-ABC Firing Incident Dataset

Summary of results: which methodology/modality “wins?”

                                 ----------- Vanilla -----------   ------------ Merged -----------
Algorithm             Speed      CCI %    ROC AUC  F-1     RMSE     CCI %    ROC AUC  F-1     RMSE
ZeroR                 Instant    37.9333  0.4990   NULL    0.4144   47.1000  0.4990   NULL    0.4536
OneR                  Instant    43.0000  0.5420   NULL    0.5339   52.9833  0.5660   NULL    0.5599
NaiveBayes            Fast       63.8500  0.8160   0.6410  0.3808   63.9667  0.8000   0.6430  0.4374
IBK                   Fast       56.5333  0.6910   0.5230  0.4386   59.5833  0.6510   0.5470  0.4972
RandomTree            Fast       59.5833  0.6800   0.5920  0.4474   62.7167  0.6700   0.6210  0.4954
SimpleLogistic        Moderate   73.6500  0.8850   0.7320  0.3065   73.6500  0.8730   0.7300  0.3502
DecisionTable         Slow       too slow for viable computation on consumer-grade hardware
MultilayerPerceptron  Slow       too slow for viable computation on consumer-grade hardware
RandomForest          Slow       too slow for viable computation on consumer-grade hardware

                                 ----------- Vanilla -----------   ------------ Merged -----------
Meta-Classifier       Speed      CCI %    ROC AUC  F-1     RMSE     CCI %    ROC AUC  F-1     RMSE
Stack (ZR, NB)        Moderate   37.9333  0.4990   NULL    0.4144   vacuous results, omitted
Stack (NB, RT)        Moderate   63.7000  0.8230   0.6350  0.3795   61.9833  0.6980   0.6130  0.4523
Vote (ZR, NB, RT)     Moderate   62.0833  0.8430   0.6110  0.3414   64.0500  0.8330   0.6260  0.3830
CostSensitive (ZR)    Instant    37.9333  0.4990   NULL    0.4144   36.6667  0.4990   NULL    0.4623
CostSensitive (OR)    Instant    42.7000  0.5400   NULL    0.5353   39.6167  0.5170   NULL    0.6345
CostSensitive (NB)    Fast       63.8500  0.8160   0.6410  0.3808   64.0833  0.8010   0.6450  0.4365
CostSensitive (IBK)   Fast       56.5333  0.6910   0.5230  0.4386   59.5833  0.6510   0.5470  0.4972
CostSensitive (RT)    Fast       59.5833  0.6800   0.5920  0.4474   63.3833  0.7050   0.6350  0.4728
CostSensitive (SL)    Moderate   73.6500  0.8850   0.7320  0.3065   74.7833  0.8780   0.7450  0.3478

(CCI % = correctly classified instances, ROC AUC = area under the ROC curve, RMSE = root mean squared error. The F-1 and RMSE columns are listed in the same order for both the Vanilla and Merged runs.)

My results are contained in a separate text file in lab journal format. Salient results consisted of:

Continue reading “Summary: Machine Learning on the Rosanne-ABC Firing Incident Dataset”

Lab Journal: Machine Learning on tweets related to the Roseanne-ABC firing incident

This is my lab journal for the analysis of a data set composed from tweets related to the 2018 firing of Roseanne from ABC over a racist statement.

A summary of these results with methodological comments can be found here.

1) Preparation/Parsing the data set

2) Running the data-set as-is (“vanilla”)

3) Merged data set (Pro, Anti, UncNeut)

4) Meta classifications – Voting and Stacking on the vanilla and merged data sets

5) Introduction of Penalties via CostSensitiveClassifier

Continue reading “Lab Journal: Machine Learning on tweets related to the Roseanne-ABC firing incident”

Turing Tests (Chess)

One of these sets of chess moves represents a match between two human agents; the other has at least one machine agent as a player.

a) 1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. d3 Bc5 5. Bxc6 dxc6 6. Nbd2 Bg4 7. h3 Bh5 8. Nf1 Nd7 9. Ng3 Bxf3 10. Qxf3 g6 11. Be3 Qe7 12. 0-0-0 0-0-0 13. Ne2 Rhe8 14. Kb1 b6 15. h4 Kb7 16. h5 Bxe3 17. Qxe3 Nc5 18. hxg6 hxg6 19. g3 a5 20. Rh7 Rh8 21. Rdh1 Rxh7 22. Rxh7 Qf6 23. f4 Rh8 24. Rxh8 Qxh8 25. fxe5 Qxe5 26. Qf3 f5 27. exf5 gxf5 28. c3 Ne6 29. Kc2 Ng5 30. Qf2 Ne6 31. Qf3 Ng5 32. Qf2 Ne6 ½–½

b) 1. Nf3 Nf6 2. d4 e6 3. c4 b6 4. g3 Bb7 5. Bg2 Be7 6. O-O O-O 7. d5 exd5 8. Nh4 c6 9. cxd5 Nxd5 10. Nf5 Nc7 11. e4 d5 12. exd5 Nxd5 13. Nc3 Nxc3 14. Qg4 g6 15. Nh6+ Kg7 16. bxc3 Bc8 17. Qf4 Qd6 18. Qa4 g5 19. Re1 Kxh6 20. h4 f6 21. Be3 Bf5 22. Rad1 Qa3 23. Qc4 b5 24. hxg5+ fxg5 25. Qh4+ Kg6 26. Qh1 Kg7 27. Be4 Bg6 28. Bxg6 hxg6 29. Qh3 Bf6 30. Kg2 Qxa2 31. Rh1 Qg8 32. c4 Re8 33. Bd4 Bxd4 34. Rxd4 Rd8 35. Rxd8 Qxd8 36. Qe6 Nd7 37. Rd1 Nc5 38. Rxd8 Nxe6 39. Rxa8 Kf6 40. cxb5 cxb5 41. Kf3 Nd4+ 42. Ke4 Nc6 43. Rc8 Ne7 44. Rb8 Nf5 45. g4 Nh6 46. f3 Nf7 47. Ra8 Nd6+ 48. Kd5 Nc4 49. Rxa7 Ne3+ 50. Ke4 Nc4 51. Ra6+ Kg7 52. Rc6 Kf7 53. Rc5 Ke6 54. Rxg5 Kf6 55. Rc5 g5 56. Kd4 1-0

The answer is below.

Continue reading “Turing Tests (Chess)”