Tuesday, 16 July 2019

Decision trees for feature selection

I asked Twitter who was using decision trees and what for. Most people were using them, unsurprisingly, to make decisions. It wasn't always clear how the trees themselves were built, though.

If you are armed with data, so that each row has some features and a category - either yes/no, or one of many classes - you can build a classifier from it. There are various ways to decide how to split up the data, but each algorithm follows the same overall process: start with a root node holding all the data, then add child nodes, each taking part of the data.

Then:

  1. Pick a feature.
  2. Split the data set, sending some rows down one branch and the rest down the other branch (or branches), depending on the value of that feature.
  3. If all the data at a node is in the same category (or almost all in the same category), form a leaf node.
  4. Continue until every node is a leaf node.

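In code, the whole process is short. Here is a minimal sketch in Python, assuming each row arrives as a (feature dictionary, category) pair - a format made up for this example - and naively taking the next unused feature rather than choosing one carefully:

```python
from collections import Counter

def build_tree(rows, features):
    categories = Counter(category for _, category in rows)
    # Steps 3 and 4: if everything left is one category (or we have run
    # out of features), form a leaf node with the majority category.
    if len(categories) == 1 or not features:
        return {"leaf": categories.most_common(1)[0][0]}
    # Step 1: pick a feature - naively, just the next one along.
    feature = features[0]
    # Step 2: split the data set by the value of that feature.
    branches = {}
    for row, category in rows:
        branches.setdefault(row[feature], []).append((row, category))
    return {"feature": feature,
            "branches": {value: build_tree(subset, features[1:])
                         for value, subset in branches.items()}}

rows = [({"currency": "USD", "size": "big"}, "missing"),
        ({"currency": "EUR", "size": "big"}, "present"),
        ({"currency": "USD", "size": "small"}, "missing"),
        ({"currency": "GBP", "size": "small"}, "present")]
print(build_tree(rows, ["currency", "size"]))
```

The interesting part is step 1, picking the feature, which this sketch dodges completely.
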
This is a bit like a sorting algorithm: in quick sort, you choose a pivot value and send data down one branch or the other, until you have single points at the nodes. Here, instead of a pivot value, you choose features. The choice of feature can be based on statistics or information theory, or even made at random. At each step, you want to know whether all the items in one category tend to have the same value, or range of values, of a feature. Once you are done you have a tree (or flow chart) you can apply to new data.

Each way to split has its pros and cons. You can even build several trees. A random forest builds lots of trees, which then vote on the class of new, unseen data. You could build your own voting system, using a variety of tree induction techniques. This might avoid problems specific to one technique, such as over-fitting.
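
The information theory option deserves a concrete example. The classic ID3 algorithm picks, at each node, the feature with the highest information gain: the biggest drop in entropy once you split on it. Here is a minimal sketch, reusing the made-up (feature dictionary, category) row format from the sketch above:

```python
import math
from collections import Counter

def entropy(categories):
    # Shannon entropy of a list of category labels, in bits.
    counts = Counter(categories).values()
    total = len(categories)
    return -sum((n / total) * math.log2(n / total) for n in counts)

def information_gain(rows, feature):
    # How much splitting on this feature reduces the entropy.
    before = entropy([category for _, category in rows])
    splits = {}
    for row, category in rows:
        splits.setdefault(row[feature], []).append(category)
    after = sum(len(s) / len(rows) * entropy(s) for s in splits.values())
    return before - after

rows = [({"currency": "USD", "size": "big"}, "missing"),
        ({"currency": "EUR", "size": "big"}, "present"),
        ({"currency": "USD", "size": "small"}, "missing"),
        ({"currency": "GBP", "size": "small"}, "present")]
best = max(["currency", "size"], key=lambda f: information_gain(rows, f))
print(best)  # currency - it separates the two categories completely
```

Swap in a different measure, such as Gini impurity, and you get a CART-style tree instead.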

You can use decision tree induction in a variety of places, even if you don't want a full decision tree or rule set. A rule set is a tree written out as a sequence of if statements. For example,

If currency is USD then data goes missing.

If you are moving a data source from one provider to another, and some data goes missing, can you spot what the missing items have in common? You could do a bit of manual investigation, say using pivot tables and data filters in Excel. However, a decision tree might find the common features far more quickly than you can. This is a form of feature selection - using an algorithm to find pertinent features. Find a decision tree implementation in a language you know, or write one yourself, and experiment.
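
If Python is that language, scikit-learn is one place to start. Here is a hedged sketch using its DecisionTreeClassifier and export_text; the reconciliation data, column names and values are all invented for illustration:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up reconciliation data: which rows went missing in the move?
frame = pd.DataFrame({
    "currency": ["USD", "USD", "EUR", "GBP", "USD", "EUR"],
    "region":   ["EU",  "US",  "EU",  "EU",  "US",  "US"],
    "missing":  [True,  True,  False, False, True,  False],
})

# One-hot encode the categorical features; scikit-learn trees want numbers.
features = pd.get_dummies(frame[["currency", "region"]])
tree = DecisionTreeClassifier().fit(features, frame["missing"])

# Print the tree as a rule set - here it spots the currency straight away.
print(export_text(tree, feature_names=list(features.columns)))
```

The printed rules are exactly the kind of "if currency is USD then data goes missing" statement we were after.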

My book, Genetic Algorithms and Machine Learning for Programmers, has a chapter explaining how to build one type of decision tree. Take a look. Lots of machine learning frameworks also have tools to help you build decision trees. Next time you want to know what certain things have in common, try a decision tree and see what you learn. Machine learning is often about humans learning, rather than the machines.