Key steps in ML setup

Problem setup
1. What is the existing solution?
2. How users interact with the system, and whether it runs in batch or online mode
3. Inputs, outputs, and available historical data
Mode - Scale and latency requirements
1. Online: latency matters
2. Offline (batch): freshness of the results matters
Metrics - Telemetry
1. Offline metrics
  1. AUC, log loss, precision, recall, F1, NDCG, etc.
2. Online metrics
  1. End-to-end metrics and component metrics
3. User behavior indicators
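
The offline metrics listed above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, labels are assumed to be binary (0 or 1), and NDCG uses the linear-gain form of DCG, which is only one of the common variants.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the number of examples, so this is
    only suitable for small samples.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ndcg_at_k(relevances, k):
    """NDCG@k for relevance grades given in the order the model ranked them."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0
```

In practice a library such as scikit-learn would normally be used instead. The sketch is only meant to make the definitions concrete.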
Architecture
1. Components
Training data
1. Corpus-centric - manually labeled data
2. Closed-loop - labels derived from historical user interactions
  1. The interactions may come from a heuristic-based first version (a first step with no ML)
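
As a rough sketch of the closed-loop approach, the snippet below turns an interaction log into binary labels. The `Event` record, the action names "impression" and "click", and the labeling rule (shown and clicked = 1, shown but not clicked = 0) are all assumptions made for illustration. A real system would define its own events and would also need to handle time windows and position bias.

```python
from collections import namedtuple

# Hypothetical log record; real logs will carry more fields (timestamp, etc.).
Event = namedtuple("Event", ["user_id", "item_id", "action"])


def labels_from_logs(events):
    """Map each shown (user, item) pair to 1 if it was clicked, else 0."""
    shown = set()
    clicked = set()
    for e in events:
        key = (e.user_id, e.item_id)
        if e.action == "impression":
            shown.add(key)
        elif e.action == "click":
            clicked.add(key)
    # Only pairs that were actually shown become training examples.
    return {key: int(key in clicked) for key in shown}


if __name__ == "__main__":
    log = [
        Event("u1", "a", "impression"),
        Event("u1", "a", "click"),
        Event("u1", "b", "impression"),
    ]
    print(labels_from_logs(log))
```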
Feature Engineering
1. Make the crucial signals in the data stand out
Model Training
1. Select appropriate model structure per problem
2. Tune hyperparameters against the offline metrics
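
One simple way to tune against an offline metric is an exhaustive grid search, sketched below. The names `grid_search` and `toy_train_and_score`, and the parameters `learning_rate` and `depth`, are hypothetical. The scoring function is a stand-in for "train a model with these settings and return its offline metric on held-out data", where higher is assumed to be better.

```python
import itertools


def grid_search(param_grid, train_and_score):
    """Try every combination in param_grid and keep the best-scoring one.

    param_grid: dict mapping a parameter name to a list of candidate values.
    train_and_score: callable taking a params dict and returning a score.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_train_and_score(params):
    """Stand-in for real training: peaks at learning_rate=0.1, depth=4."""
    return -(params["learning_rate"] - 0.1) ** 2 - (params["depth"] - 4) ** 2


if __name__ == "__main__":
    grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
    print(grid_search(grid, toy_train_and_score))
```

Grid search grows exponentially with the number of parameters, so random search or a dedicated tuning library is often a better fit for larger search spaces.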
Piloting
1. Direct a small percentage of users to the new model, collect telemetry, and decide whether to launch
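
A common way to route a small share of users to the new model is deterministic hashing of the user ID, so each user always sees the same variant. The sketch below assumes string user IDs. The function name `assign_model`, the `salt` value, and the variant names "new" and "current" are hypothetical.

```python
import hashlib


def assign_model(user_id, pilot_percent, salt="pilot-v1"):
    """Return "new" for roughly pilot_percent % of users, else "current".

    The assignment is a pure function of (salt, user_id), so it is stable
    across requests. Changing the salt reshuffles the buckets.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000  # 10,000 buckets = 0.01 % granularity
    return "new" if bucket < pilot_percent * 100 else "current"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10000)]
    share = sum(assign_model(u, 5) == "new" for u in users) / len(users)
    print(f"share routed to the new model: {share:.3f}")
```

Telemetry from the two groups can then be compared on the online metrics before deciding whether to launch.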
Iterative Model Improvement