I know you guys won't care, but I feel compelled to type this. To me, there is an aspect of that paper Skyfetti linked that is disturbing. I see the problem all the time, and that paper is a good example of it. I Googled each author of the paper to find the institution they are listed as affiliated with. Here is the list:
Sabine Rohrmann - University of Zurich
William G. Nelson - Johns Hopkins
Nader Rifai - Harvard
Terry R. Brown - Johns Hopkins
Adrian Dobs - Johns Hopkins
Norma Kanarek - Johns Hopkins
James D. Yager - Johns Hopkins
Elizabeth A. Platz - Johns Hopkins
There is a lot of prestige and credibility attached to those institutional names.
But as I wrote above, they stated their results incorrectly. I already quoted the abstract, and I have since looked at the entire paper. The conclusions section includes this statement:
In conclusion, in this large, nationally representative sample, there was no difference in circulating testosterone concentrations between non-Hispanic black and white men overall.
First of all, one can't even evaluate that statement, because they did not report the unadjusted mean testosterone levels. What they're referring to are estimated mean testosterone levels derived by "...applying sampling weights and adjusting for age, percent body fat, alcohol, smoking, and activity."
But the big one is the statement that there was no difference. There actually WAS a difference between the estimated adjusted mean testosterone levels of the non-Hispanic black and non-Hispanic white men in their sample. Their estimates were 5.25 ng/mL for non-Hispanic black men and 5.11 ng/mL for non-Hispanic white men.
But the difference between the sample adjusted means is not LARGE enough to be statistically significant at their chosen significance level of 0.05 (the 95% confidence level). The proper way to state it is something like, "There is not sufficient evidence, at the 95% confidence level, to conclude that the levels in the populations represented differ once they are adjusted as described."
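To make the "not significant" versus "no difference" distinction concrete, here is a minimal sketch of a two-sided Wald z-test on a difference between two estimated means. The 5.25 and 5.11 ng/mL point estimates are the paper's; the standard errors are HYPOTHETICAL placeholders I made up, since I don't have the paper's values, and treating the two estimates as independent is a simplifying assumption (a survey-weighted regression would supply its own design-based standard error for the difference).

```python
from statistics import NormalDist


def two_sided_z_test(mean_a, mean_b, se_a, se_b, alpha=0.05):
    """Wald z-test for a difference between two estimated means.

    Assumes the two estimates are independent and approximately normal.
    """
    diff = mean_a - mean_b
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5  # SE of a difference of independent estimates
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    if p <= alpha:
        verdict = f"reject H0 at alpha={alpha}: evidence of a difference"
    else:
        # Failing to reject is NOT evidence that the difference is zero.
        verdict = (f"fail to reject H0 at alpha={alpha}: "
                   "insufficient evidence of a difference (NOT 'no difference')")
    return diff, z, p, verdict


# Point estimates from the paper (ng/mL); se_a and se_b are HYPOTHETICAL.
diff, z, p, verdict = two_sided_z_test(5.25, 5.11, se_a=0.10, se_b=0.05)
print(f"difference = {diff:.2f} ng/mL, z = {z:.2f}, p = {p:.3f}")
print(verdict)
```

The observed difference is still 0.14 ng/mL; the test only says whether that is big relative to its uncertainty.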
Or you could put a 95% confidence interval around the estimated difference between the two adjusted means and say that you're 95% confident the true difference lies between some lower bound and some upper bound. Since the result wasn't significant at the 0.05 level, that interval would include zero, and you could discuss what that range means.
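A minimal sketch of the interval approach, using a normal-approximation (Wald) interval: difference ± z × SE. The 0.14 ng/mL difference is 5.25 − 5.11 from the paper; the standard error of 0.11 is a HYPOTHETICAL value, not one the paper reports.

```python
from statistics import NormalDist


def diff_confidence_interval(diff, se_diff, level=0.95):
    """Normal-approximation (Wald) interval: diff +/- z * se_diff."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * se_diff
    return diff - margin, diff + margin


# 0.14 ng/mL = 5.25 - 5.11 (paper's adjusted estimates); se_diff = 0.11 is HYPOTHETICAL.
low, high = diff_confidence_interval(0.14, 0.11)
print(f"95% CI for the difference: {low:.2f} to {high:.2f} ng/mL")
print("interval includes zero" if low <= 0 <= high else "interval excludes zero")
```

The useful part is the width: an interval that contains zero but also stretches well into positive values says the data can't rule out a real difference, which is very different from showing there is none.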
But what they did was completely wrong. Two things about it: 1) it's very disturbing that people that smart and with that kind of prestige are doing it, and 2) that language would not have made it through peer review if peer review were really the quality-control process people think it is. If you did that on a test in an introductory college statistics course, you'd get marked wrong. The principle that a statistical hypothesis test can never show "no difference" (failing to reject the null hypothesis is not the same as showing it's true) is a "big one" in statistics. You wouldn't get away with it.
And the end result is that people will cite that paper as showing there is "no difference" when the paper didn't show that.
One other thing: the paper is deficient in that it does not show the actual p-values. It only tells you whether they were <0.05 or <0.01. That does not let you really judge the evidence. For example, say someone tests a difference and ends up with p = 0.10, and say they were using a 0.05 significance level (the 95% confidence level). You know it didn't meet their standard for accepting that the difference exists. But you also know that if they'd picked a 0.10 significance level (the 90% confidence level), the evidence would have just been sufficient: it's significant at the 90% confidence level.
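A small sketch of why the exact p-value matters: the same hypothetical p = 0.10 from the example above leads to different decisions at different significance levels, while a table that only flags "<0.05" or "<0.01" hides that. The `starred` helper is a hypothetical stand-in for that reporting style, not anything from the paper.

```python
def decisions(p_value, alphas=(0.01, 0.05, 0.10)):
    """For each significance level alpha, report whether H0 is rejected (p <= alpha)."""
    return {alpha: p_value <= alpha for alpha in alphas}


def starred(p_value):
    """Hypothetical helper mimicking a table that only flags p < 0.01 and p < 0.05."""
    if p_value < 0.01:
        return "p < 0.01"
    if p_value < 0.05:
        return "p < 0.05"
    return "n.s."


p = 0.10  # the hypothetical p-value from the example above
for alpha, reject in decisions(p).items():
    confidence = round((1 - alpha) * 100)
    label = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha:.2f} ({confidence}% level): {label}")

# Both of these print as "n.s." even though the evidence is very different.
print(starred(0.06), starred(0.60))
```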
Yes, I know I'm just a regular guy criticizing people with very impressive credentials who are affiliated with very impressive institutions. Also, they all probably have way higher IQs than I do. But I'm right. And that's bad.