Why Learn The Math Behind The Method
I love learning how and why machine learning works, when to use it, and whether there is a better method. Studying mathematics at university embedded in me the desire to see what’s under the hood whenever I work on a new project or with a new method. As a result, I am reluctant to apply a method that I haven’t studied or whose derivation I don’t understand. One of my biggest fears is applying a method in a situation where it doesn’t make sense (e.g. reporting an R² value for a nonlinear regression model, where the variance decomposition behind R² generally no longer holds). I constantly meet people who say something like, “I don’t need to know how that works; there is already a package out there that does it for me.” I couldn’t disagree more. I recently saw a blurb in a Georgia Tech edX course that addresses this sentiment, and I wanted to share its wisdom.
You may rightly ask, why bother with such details? Here are three reasons it’s worth looking more closely.
- It’s helpful to have some deeper intuition for how one formalizes a mathematical problem and derives a computational solution, in case you ever encounter a problem that does not exactly fit what a canned library can do for you.
- If you have ever used a statistical analysis package, it’s likely you have encountered “strange” numerical errors or warnings. Knowing how problems are derived can help you understand what might have gone wrong. We will see an example below.
- Because data analysis is quickly evolving, it’s likely that new problems and new models will not exactly fit the template of existing models. Therefore, it’s possible you will need to derive a new model or know how to talk to someone who can derive one for you.
I would add that it’s important to know when a model is appropriate and why you would choose one model over another. For example, k-nearest neighbors (KNN) is a very popular classification algorithm because of its flexibility. However, if the goal is to figure out which features are most important or influential, KNN offers no direct answer, because it has no fitted coefficients to inspect. In that case, linear discriminant analysis (LDA) or logistic regression may be more informative, since their coefficients indicate how each feature contributes to the classification.
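To make the KNN point concrete, here is a minimal sketch in plain NumPy. Everything in it is an assumption for illustration: the synthetic data (one informative feature, one noise feature), the helper names `fit_logistic` and `knn_predict`, and the learning rate and iteration count. It is not how any particular package implements these models; it only shows what each method hands back to you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): feature 0 drives the label,
# feature 1 is pure noise.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(float)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by plain gradient descent; return (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of the mean log-loss w.r.t. b
    return w, b


def knn_predict(X_train, y_train, X_new, k=5):
    """Label each row of X_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest points
    return (y_train[nearest].mean(axis=1) > 0.5).astype(float)


w, b = fit_logistic(X, y)
knn_labels = knn_predict(X, y, X)  # predicting on the training points themselves

# Logistic regression returns one coefficient per feature that you can inspect.
print("logistic coefficients:", w)
# KNN returns only predicted labels; nothing here says which feature mattered.
print("KNN output (first 5 labels):", knn_labels[:5])
```

In this setup the coefficient on feature 0 should come out much larger in magnitude than the one on feature 1, which is the kind of signal you would look at when asking which features matter (assuming the features are on comparable scales). The KNN function may classify just as well, but its output gives you nothing comparable to read off.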