Data Scientists Keep Saying “It Generalizes” Until Production Says Otherwise
Strong metrics don’t survive first contact with reality.

Read time: 2.5 minutes
At one point or another, every data scientist has told themselves: "It will generalize."
The testing phase was fantastic: metrics uniformly high, cross-validation results consistent. Confidence ran high, and someone even said it out loud: "This should generalize."
Then the production data arrived. Columns stopped behaving the way they had during testing, and the distribution began to drift. Edge cases multiplied, so the model was still technically functional, just nothing like the version described in the report. Suddenly, "generalization" stopped being a property of the model and became an expectation silently transferred onto the data.
What Can “Production Trauma” Teach Us About Data Science?
1. Testing is nice... but production isn't!
Test sets are assembled before production exists. Field data often arrives late, in the wrong format, and without the careful preparation your training data received.
2. We think the world will not change.
Users change their behaviour; systems evolve; the process for generating data tends to change as well.
3. Strong metrics don't mean you're going to make good decisions.
AUC does not tell you exactly what to do when your model is indecisive or fails for unknown reasons.
4. Your first deployment is just a part of collecting more information about how the model works.
You don't know how a model will behave until it faces real inputs, real stakes, and real consequences.
5. Monitoring a model in production is more important than measuring its initial performance.
Models that survive in production are those that continue to be monitored and not just relied upon.
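The monitoring point above can be sketched in a few lines: keep a rolling window of outcomes and alert when live accuracy sinks below an agreed floor. The window size and threshold below are illustrative assumptions, not prescriptions:

```python
from collections import deque

class RollingMonitor:
    """Minimal production-monitoring sketch: track recent prediction
    outcomes and flag the model when rolling accuracy drops too low."""

    def __init__(self, window=500, floor=0.85):
        self.window = deque(maxlen=window)  # keeps only the last `window` outcomes
        self.floor = floor

    def record(self, prediction, actual):
        self.window.append(prediction == actual)

    @property
    def accuracy(self):
        return sum(self.window) / len(self.window) if self.window else None

    def needs_attention(self):
        # Only alert once the window is full enough to be meaningful.
        return len(self.window) == self.window.maxlen and self.accuracy < self.floor
```

Wired into the serving path (wherever delayed labels come back), this is the difference between a model that is monitored and one that is merely relied upon.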
💡Key Takeaway:
Saying "it generalizes" is not an endpoint... it is a hypothesis. Production does not care how confident you were in the model. It rewards models built to withstand what actually happens after deployment, not models built only to post the best numbers on yesterday's metrics.
👉 LIKE this if your production data wrecked the model you thought was the greatest thing since sliced bread.
👉 SUBSCRIBE now for insights into data science that you will not find in tutorials.
👉 Follow Glenda Carnate for insights on what really happens to your models once they go live.
Instagram: @glendacarnate
LinkedIn: Glenda Carnate on LinkedIn
X (Twitter): @glendacarnate
👉 COMMENT with the first time production data blew your model apart.
👉 SHARE this with someone who is about to deploy their "best model yet."