

there’s already mountains of evidence of static typing making code significantly less buggy on average
What is this mountain of evidence? The only evidence I remember about the bugginess of code across languages is that bug count correlates closely with lines of code, no matter the language.
That is great. But isn’t this more than the scientific method? It is rigor, and the capacity to look at all the factors in reality instead of cherry-picking one for a laboratory experiment.
I do agree we could all do with more rigor, and with less cherry-picking of factors to study, in any discipline.