Pwns wrote: ↑Sun Nov 13, 2022 1:13 pm
JohnStOnge wrote: ↑Sun Nov 13, 2022 7:41 am
A story that I think illustrates how decisions made by the investigator, even when reasonable, can affect conclusions. Skip to the paragraph near the end that starts with "But here is the thing" if you just want the gist.
Yesterday I had some fun doing an analysis of the association between Party in total control of States and higher cumulative COVID-19 death rates since the start of the pandemic. I got the info on States with Republican Governors as well as Republicans in control of both State Houses and on States with Democrat Governors and Democrats in Control of both State Houses at
https://ballotpedia.org/Partisan_compos ... #Trifectas .
Since COVID-19 started in 2020 I did not include States where the situation has changed since 2019 (Montana, New Hampshire). That left me with 21 Republican controlled States and 14 Democrat controlled States. I used COVID-19 death rate (deaths per million population) data from Worldometers . com updated through 11/11/2022.
States controlled by Republicans had a higher mean death rate (3544 vs. 2922) than States controlled by Democrats did. The difference is significant at 97% confidence.
I decided to see if the difference would persist if I did a model and controlled for % population >=65, population density, % Black population, and % in Poverty.
I ended up with a model suggesting that, most of the time, Republican control means higher death rate even when those factors are controlled for. When expected values are predicted for the States with all variables as they are except the Party control variable is set at either Republican or Democrat, the mean death rate when all States are set at Republican is 3507 and that when all States are set at Democrat is 3134. In 29 of 35 cases, the expected value for a State was higher when Party control was set at Republican.
But there were exceptions associated with States having high percent Black populations. There is a coefficient in the model that causes it to predict, when all other things are equal, a higher death rate when Party is set at Democrat and percent Black population exceeds 27.2 percent.
But here is the thing: I could have cheated and removed that coefficient from the model. There is a thing called Variance Inflation Factor (VIF). An online resource I use actually recommended removing the variable from the model because VIF is >5. But my standard approach is to remove a variable when VIF is >=10. And the VIF for the critical variable in this case is 9.9. Close but no cigar. So the variable stayed in my model.
Had I removed it, the model would have always predicted, when all other things are equal, a higher death rate when Party control was set at Republican. And it would have been a reasonable decision because that was recommended by my online resource. However, leaving it in was also reasonable. My criterion is >=10 because a statistics textbook on multivariable analysis that I obtained back in the 1990s recommends that.
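To make the two cutoffs concrete, here is a minimal Python sketch. It uses the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing the variable in question on the other predictors. The function names are made up for illustration, and the only number taken from the analysis is the VIF of 9.9.

```python
def vif_from_r_squared(r_squared):
    """VIF for a predictor, given the R^2 from regressing it on the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Invert the definition: the R^2 implied by a given VIF."""
    return 1.0 - 1.0 / vif


def drop_variable(vif, threshold, inclusive):
    """Apply a removal rule: 'VIF >= threshold' if inclusive, else 'VIF > threshold'."""
    return vif >= threshold if inclusive else vif > threshold


vif = 9.9  # the value reported for the critical variable

print(drop_variable(vif, 5, inclusive=False))   # rule "VIF > 5"   -> True (variable removed)
print(drop_variable(vif, 10, inclusive=True))   # rule "VIF >= 10" -> False (variable stays)
print(round(r_squared_from_vif(vif), 3))        # R^2 implied by VIF = 9.9
```

Same VIF, same data, opposite decision. The only thing that differs is which rule was picked, which is why the rule has to be fixed before looking at the results.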
So one reasonable decision yields saying Republican States have done worse overall but with the caveat that the situation is different when percent Black population is high. The other reasonable decision yields saying Republican States have done worse under all circumstances.
It's why I believe that intellectual honesty dictates establishing the rules ahead of time and not being tempted to change them...even slightly...when the results aren't quite what you were expecting or maybe even wanting.
If I'm understanding right, when you had percentage of black people in the model Democrat states had higher death rates. I've seen VIF used different ways in different contexts but what I assume it means here is that the coefficient standard errors grow when you put the percentage of black people in the state in the model. Well, poverty% and black% are correlated so that's going to inflate the variance with only 50 observations and 3 versus 2 variables. Look at what the model does with only the political variable and black% and see how it looks.
You also failed to adjust for age. I would think Republican states generally have older (at least older white) populations.
For your information, here is the model:
Death Rate = -603.2066125 + 0.998075804 x D + 5609.471561 x B + 23722.48294 x P + 6361.081799 x RT x 65 - 6462.969037 x RT x B
Where:
D is population density in persons per square mile
B is % Black population
P is % in poverty in population
RT is a dummy variable assuming the value 1 if Republican Control and 0 if Democrat control
65 is % population >=65
So what happens is that, if Democrats are in control, the last two terms become 0. That means, in Democrat controlled States, % population >=65 has no effect. It always has some effect in Republican controlled States, and the effect increases as % population >=65 rises.
What also happens is that, when % Black population is low, the net effect is higher for Republican States because the positive 5609.471561xB term has a larger absolute value than the negative -6462.969037xRTxB term.
I did make an error in the discussion above about % Black. What happens is that, if a State is Republican, Death Rate always goes down as % Black goes up. That's because, when RT is 1, you get +5609xB - 6462xB. When RT is 0 for Democrat control, the RTxB term drops out and only the +5609xB term is left. So when you get to high % Black population, it becomes more likely that a lower value will be predicted for Republican control. The 6 States where the model predicted lower death rates, all other things being as they were, for Republican Control have 23.4, 38.8, 26.6, 17.3, 32.1, and 27.3 percent Black populations. The median percent Black population among the States is 9.1.
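For anyone who wants to plug in numbers, here is a rough Python sketch of the equation as posted. The coefficients are copied from the model line. The function and constant names are made up for illustration, and it assumes B, P, and the 65+ share are entered as proportions (0.12 for 12%), which is a guess from the size of the coefficients rather than something stated in the post.

```python
# Coefficients copied from the posted model. Inputs for Black, poverty and 65+
# are ASSUMED to be proportions (e.g. 0.12), not whole-number percents.
INTERCEPT = -603.2066125
B_DENSITY = 0.998075804      # persons per square mile
B_BLACK = 5609.471561        # share Black
B_POVERTY = 23722.48294      # share in poverty
B_RT_AGE65 = 6361.081799     # Republican dummy x share aged 65+
B_RT_BLACK = -6462.969037    # Republican dummy x share Black


def predicted_death_rate(density, black, poverty, age65, republican):
    """Predicted deaths per million under the posted equation."""
    rt = 1 if republican else 0
    return (INTERCEPT
            + B_DENSITY * density
            + B_BLACK * black
            + B_POVERTY * poverty
            + B_RT_AGE65 * rt * age65
            + B_RT_BLACK * rt * black)


def party_gap(black, age65):
    """Republican minus Democrat prediction with everything else held equal.

    Only the two RT terms differ, so density and poverty cancel out.
    """
    return B_RT_AGE65 * age65 + B_RT_BLACK * black


def black_share_break_even(age65):
    """Share Black at which the two predictions are equal for a given 65+ share."""
    return B_RT_AGE65 * age65 / -B_RT_BLACK


# Slope of Death Rate on share Black under each setting of RT:
slope_democrat = B_BLACK                  # RT = 0: only the main term remains
slope_republican = B_BLACK + B_RT_BLACK   # RT = 1: about -853.5, so it falls as B rises

# Hypothetical inputs, purely for illustration (not any actual State):
print(party_gap(black=0.09, age65=0.17) > 0)   # True: Republican prediction is higher
print(party_gap(black=0.30, age65=0.17) < 0)   # True: Republican prediction is lower
```

Read this way, whether the Republican prediction comes out higher or lower for a State depends only on how its share Black compares with its 65+ share, since the break-even point is roughly 0.98 times the 65+ share.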
I did test the model for significant departure from normal distribution of residuals and for homogeneity of variances and there were no significant departures. As noted, multicollinearity was on the edge but the model barely passed. I checked to see what would happen if I removed RTxB and continued backwards elimination and ended up with only Population Density, % in Poverty, and RTx65 in the model. But that model was characterized by significant departure from homogeneity of variances so that's another argument for sticking with the first one.
Anyway, I think it is a great example of how one could get the result they want with observational data by changing the rules after the fact. An argument CAN be made for using the model without % Black population in it. Even though the homogeneity of variances assumption is not met with the model with no % Black population in it, one can still say the coefficients are unbiased estimators and still make the general statement that death rate is always predicted to be higher for Republican controlled States. And one could just say RTxB was removed because VIF was >5 without mentioning that they went in with a rule of rejecting a variable if VIF was >=10. Rejecting it at >5 would be viewed as acceptable and no one would ever know you changed the rules because you didn't get what you wanted.
I personally have always believed that a requirement for publication in a journal should be that you send your detailed methods to the journal beforehand so you have to do what you went in planning to do and can't fudge like that.