Machine learning is steadily gaining ground, with economists predicting that machine learning roles will be among the most coveted jobs in the near future. This has led many millennials to start taking courses and training in the field. In its simplest terms, machine learning is the study of computer systems that automatically improve with experience. That is, studying machine learning provides insight into how we can create almost self-sufficient computer systems programmed to be in a never-ending learning process. While that may seem like a walk in the park, we must approach the study of machine learning with certain theories in mind. An article posted on EliteDataScience looks at the key questions we must ask ourselves before venturing into machine learning.
Below are five theories to know, along with the questions we must be willing to answer, before venturing into machine learning, according to the article:
Planning and data collection: Data collection can be an expensive and time-consuming process. What types of data do I need to collect? How much data do I need (hint: it’s different depending on the model)? Is this challenge feasible?
Data assumptions and preprocessing: Different algorithms have different assumptions about the input data. How should I preprocess my data? Should I normalize it? Is my model robust to missing data? How about outliers?
Interpreting model results: The notion that ML is a “black box” is simply false. Yes, not all results are directly interpretable, but you need to be able to diagnose your models to improve them. How can I tell if my model is overfit or underfit? How do I explain these results to business stakeholders? How much room for improvement is left?
Improving and tuning your models: You’ll rarely reach the best model on your first try. You need to understand the nuances of different tuning parameters and regularization methods. If my model is overfit, how can I remedy it? Should I spend more time on feature-engineering or on data collection? Can I ensemble my models?
Driving to business value: ML is never done in a vacuum. If you don’t truly understand the tools in your arsenal, you can’t maximize their effectiveness. Which outcome metrics are most important to optimize? Are there other algorithms that work better here? When is ML not the answer?
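To make two of these questions concrete ("How should I preprocess my data?" and "How can I tell if my model is overfit?"), here is a minimal sketch in plain Python. It is not taken from the EliteDataScience article. The function names, the toy numbers, and the choice of median imputation, z-score scaling, and a 1-nearest-neighbor model are all assumptions made purely for illustration, and a real project would more likely use a library such as scikit-learn.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict by copying the target of the closest training point (1-NN)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


# Hypothetical toy feature with missing entries.
ages = [22, None, 35, 41, None, 58]
ages_clean = standardize(impute_median(ages))

# Hypothetical toy regression data: y is roughly x plus noise.
train_x = [1, 2, 3, 4, 5]
train_y = [1.2, 1.9, 3.4, 3.8, 5.3]
val_x = [1.4, 2.6, 3.4, 4.6]
val_y = [1.4, 2.6, 3.4, 4.6]

# A 1-NN model memorizes the training set, so its training error is zero.
train_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]
val_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in val_x]

train_mse = mean_squared_error(train_y, train_pred)
val_mse = mean_squared_error(val_y, val_pred)

print("scaled ages:", [round(a, 2) for a in ages_clean])
print("train MSE:", train_mse, "validation MSE:", val_mse)
```

A training error far below the validation error, as in this toy case, is the usual sign of overfitting. If both errors are high, the model is more likely underfit. Whether median imputation or z-score scaling is the right choice depends on the algorithm and the data, which is exactly the kind of assumption the article asks you to check.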