Sex in Science: The NIH Gets it Wrong?

Published 17 May 2014
Author Douglas Fields
Source BrainFacts/SfN

Beginning on October 1, researchers seeking NIH grants must balance male and female cells and animals in their NIH-funded research. Under the banner of ending sex bias, this new mandate appears to be a significant advance in the way research is done, but many scientists fear the well-intentioned directive is misguided. The reasons that most research scientists believe this policy is a serious mistake are less likely to appear in public forums because, understandably, scientists are reluctant to criticize the hand that feeds them. But this is an important issue that deserves careful consideration and reasoned discourse between scientists, granting agencies, and the public that supports them.

No one questions that it is critical to have an understanding of the biological differences between males and females. The issue is how best to obtain this vital information.

It is true that females are often excluded from experimental studies. It is a mistake, however, to view this as the New York Times does, as “a neglected variable.” This sex bias is intentional. In fact, that bias strengthens experimental research, including research that illuminates sex differences. It is important to understand the reasons, both scientific and practical, for this deliberate bias in experimental design. To comprehend this issue it is necessary to go beyond the PC veneer of public policy righteousness and understand how experimental science is done. This is the crux of the matter.

Finding Truth

In doing scientific research — exploring the unknown — it is important to appreciate that the correct answers are not found in “the back of the book.” No one, not even the scientist who conducts the experiments, can ever be absolutely certain that their conclusions are correct. The scientific method is all we have to guide us. This method never provides what we are seeking — the truth. The method only measures the level of uncertainty, calculated mathematically, that the results we obtained in our data collection could have come about by chance alone.

The problem is that there is always variation in responses, measurement errors, and uncontrolled factors that cause the measurements to vary. If you measure your child’s height by drawing a pencil line on the wall, you will get slightly different values if you do it several times. Obviously, your child’s height did not change, but there is always measurement error due to slop in the measurement method and from uncontrolled factors such as his/her posture. “Stand up straight this time!” These errors can make it difficult to decide if responses between two datasets are indeed different.

Think about Olympic skiers crossing the finish line a fraction of a second apart. Are the two skiers really any different? In a race the decision can be defined arbitrarily, but if one is testing a new drug, the consequences of making an error could be a serious matter. Thus, researchers would never reach a conclusion based on the outcome of one race, for example. Scientists must repeat the same experiment over and over to calculate the exact probability that the differences they obtain might be due simply to chance variation and measurement error.

Two factors go into the decision about whether or not data from two groups are different: (1) the magnitude of the difference between two groups, say two seconds difference between a first and second place finish in a race, and (2) the variation in outcome determined by repeated measurements. The first place skier might have just had an exceptionally good run, avoided a slick patch of ice, or benefited from any number of other factors to finish two seconds ahead of his competitor in one race. Scientists would make the two racers do the downhill run a dozen times, calculate the mean finish time for each racer, and also calculate the amount of variation in each skier’s time. Then they would calculate whether the difference they obtain could have been due to chance alone. Obviously, if there is enormous variation around the mean times, it will be increasingly difficult to conclude that the two skiers’ times are really different, no matter what the magnitude of the mean difference in their scores is. It might happen that after a dozen races the differences between the two skiers cannot be distinguished from chance. In that case, the scientist would have the skiers do a dozen more races, and this would increase the precision of the measurements and possibly allow the differences to be distinguished from chance.
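To make the two factors concrete, here is a minimal sketch in Python. It uses made-up finish times (not real data) and the Welch t statistic, which is one common way, though not the only one, of weighing a mean difference against the scatter in repeated runs. The function name and all numbers are assumptions chosen for illustration.

```python
import math
import statistics

def welch_t(times_a, times_b):
    """Return (mean difference, Welch t statistic) for two sets of run times."""
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    # Sample variance: how much each skier's times scatter around the mean.
    var_a = statistics.variance(times_a)
    var_b = statistics.variance(times_b)
    # Standard error of the difference between the two means.
    std_err = math.sqrt(var_a / len(times_a) + var_b / len(times_b))
    diff = mean_b - mean_a
    return diff, diff / std_err

# Hypothetical finish times in seconds for two skiers (illustrative only).
steady_a = [100.0, 101.0, 99.0, 100.0]
steady_b = [102.0, 103.0, 101.0, 102.0]
erratic_a = [90.0, 110.0, 95.0, 105.0]
erratic_b = [92.0, 112.0, 97.0, 107.0]

diff_steady, t_steady = welch_t(steady_a, steady_b)
diff_erratic, t_erratic = welch_t(erratic_a, erratic_b)

print(f"steady skiers:  difference {diff_steady:.1f} s, t = {t_steady:.2f}")
print(f"erratic skiers: difference {diff_erratic:.1f} s, t = {t_erratic:.2f}")
```

Both pairs of skiers are two seconds apart on average, but the t statistic is far smaller for the erratic pair. The same mean difference is much harder to distinguish from chance when the run-to-run variation is large.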

Vive la difference!

That there are differences between males and females escapes no one in biology. In designing their experiments scientists must remove or control all sources of variation that might be confounding their data or they may never obtain a meaningful result that can be distinguished from chance. This is why scientists do not use wild rats for biomedical research — they use purebred lines of laboratory rats with well-known and well-controlled genetics. They use rats of precisely the same age range. Clearly the biology of aged individuals is different from young or middle-aged individuals.

They control for the size of the animals, especially in studies where drugs are delivered in precise dosages according to the mass of the animal. If there is any basis for suspecting that sex could be a variable in their experiment, scientists may use only one sex. Usually this is a male because there are many sources of variation in females due to their hormonal cycles. In many experiments there is no reason to suspect that sex could affect the outcome and scientists will eagerly use both sexes. This cuts the animal costs in half.

Now let’s imagine that you have repeated your experiments on 10 male rats in your experimental group and 10 male rats in your control group to determine if a new drug lowers blood pressure. The outcome is that the difference in blood pressure in the animals on and off the drug could have occurred by chance three times out of 100 (3%). In other words, if you took the data that you collected only from the control animals and randomly assigned it to “control” and “experimental” columns in your notebook, then took the averages, you would get a difference between these two columns as big as the difference in your real experiment about 3 times out of every 100 random assignments.
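The shuffling idea described above can be sketched as a permutation test. This is an illustration under stated assumptions, not the analysis from any particular study: the blood pressure readings are invented, the function name is hypothetical, and the sketch pools both groups before shuffling, which is the usual textbook form and a slight variation on shuffling the control data alone.

```python
import random
import statistics

def permutation_p_value(control, treated, n_shuffles=5000, seed=0):
    """Estimate how often random relabeling yields a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Arbitrarily relabel the shuffled data as "control" and "treated".
        fake_control = pooled[:n_control]
        fake_treated = pooled[n_control:]
        fake_diff = abs(statistics.mean(fake_treated) - statistics.mean(fake_control))
        if fake_diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical systolic blood pressures (mmHg) for 10 control and 10 treated rats.
control_bp = [128, 131, 125, 134, 129, 127, 133, 130, 126, 132]
treated_bp = [118, 121, 116, 123, 119, 117, 122, 120, 115, 124]

p_value = permutation_p_value(control_bp, treated_bp)
print(f"estimated probability of a difference this large by chance: {p_value:.4f}")
```

A result below 0.05 corresponds to the conventional 5% cutoff. With these invented numbers the two groups do not overlap at all, so the estimate comes out very close to zero.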

This calculation allows scientists an objective way to decide whether or not to believe that their experimental results are not due to chance alone. Obviously, no one would believe a difference between experimental and control groups that has a 50:50 probability of resulting from chance. Arbitrarily, scientists generally accept that if the probability that the difference they find in their experiments could occur by chance alone is less than 5%, they will accept the result.

You can conclude that the two groups in your study of blood pressure medication are different because there is only a 3% chance that this result could have happened by chance. The new drug works! But now, the NIH insists that you use both males and females in your study. Now the results you obtain are more variable. The outcome of your study is now a 10% chance that the difference between your treated and control groups arose by chance alone. By the conventional 5% threshold, you would then conclude that the drug does not work, and a potentially life-saving treatment will be trashed.

It is important to realize that the lack of a significant difference in the study when females are included does not mean that the drug works only on males. It might work equally well on both males and females, but the data are more variable for well-known physiological reasons that make cardiovascular responses in males and females different from each other. The PC intrusion into science has denied men and women a new medication.

You can’t necessarily sift through the data afterwards and look at the male and female responses separately because your sample size has been cut in half. By considering males and females separately (five in each group) you now face the same situation as determining who is the fastest Olympic skier by comparing only 6 runs rather than a dozen. To have the same power of making a decision about males and females you could double the sample size, and use 10 males and 10 females in both groups. But this is rarely practical or ethically justifiable. Can you afford the financial cost of doubling the experiment? Can you afford to take twice the time to do the research? Is this duplication in spending a prudent use of taxpayer money? Is it justified to kill twice as many animals?
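The cost of halving the sample can be illustrated with the standard error of a mean, which shrinks with the square root of the number of animals. The sketch below assumes a hypothetical animal-to-animal standard deviation of 10 units. That figure is an assumption for illustration and does not come from any experiment.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a mean estimated from n independent measurements."""
    return std_dev / math.sqrt(n)

ASSUMED_SD = 10.0  # hypothetical animal-to-animal standard deviation

se_full = standard_error(ASSUMED_SD, 10)  # 10 animals of one sex per group
se_half = standard_error(ASSUMED_SD, 5)   # 5 males (or 5 females) per group
inflation = se_half / se_full

print(f"standard error with 10 animals: {se_full:.2f}")
print(f"standard error with 5 animals:  {se_half:.2f}")
print(f"uncertainty grows by a factor of {inflation:.2f}")
```

Cutting each group from ten animals to five makes the uncertainty in each mean about 1.41 times larger (the square root of 2), whatever standard deviation is assumed.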

Bear in mind that research is extremely expensive and very time consuming. All labs stretch budgets to the maximum. Researchers work tirelessly and with iconic devotion to their research. A graduate student may have four years to complete the research project. Likewise for a post-doctoral fellow or an assistant professor who has 5-7 years to make tenure or lose his/her job. Is it justified to take twice as long or to use twice as many researchers to do the study? What about the planned new research project that could not be performed because the NIH dictates that this study must be duplicated with two sexes?

The greatest cost of all from this new dictate may be to bog down the pace of research. For those awaiting new treatments for disease, the pace of biomedical research now is far too slow. Even more devastating is the incalculable loss of discoveries that may never be made because planned new research must be put aside to duplicate the NIH-funded study. The NIH mandates this waste of resources regardless of whether or not the scientist considers sex an important criterion for the hypothesis they are testing, and they do this without providing any funding to support the extravagance they would impose.

If the outcome is that the responses between sexes are the same, you have doubled the time and effort for an answer of equivocal merit. That result is not likely to be published in a high-profile journal. However, the situation becomes far worse in practical terms if differences between sexes start to emerge from the data. The sex differences could be due to hormones. Now the researcher needs to determine which hormone is responsible: testosterone, estrogen, progesterone? A difference that is due to a hormone elevated during one phase of a female menstrual cycle does not mean that the drug might not benefit women who are postmenopausal or who are in a different phase of the cycle from the phase that the female rats happened to be in that were used in the study. The results are getting muddled.

Scientists who investigate sex differences know these issues well and they carry out extensive experiments to draw correct conclusions about sex differences. After finding a difference between males and females, for example, they would remove the ovaries of females to control specific hormones by injection and repeat the experiments. Now how large has your study grown from the original one designed to test a very different hypothesis? The study has multiplied in size.

The situation is even more complicated, because the differences between sexes go well beyond hormonal differences. Males and females have different developmental profiles, for example, which could in effect mix animals of different ages. Females generally live longer so a 65-year-old woman is not equivalent biologically to a 65-year-old man (likewise for rats). Neither is a 12-year-old girl the same biological age as a 12-year-old boy. Males and females are not the same size and they differ in strength and many other aspects. But the fascinating differences between the sexes go even deeper. Male and female cells — every one of them — are different because males and females do not have the same complement of X and Y chromosomes. This genetic difference, having nothing to do with hormones, can make all the difference in how male and female cells respond in a lab dish or in the body. Scientists know this.

Now the experiment has exploded well beyond the specific hypothesis and carefully thought-out design the researcher undertook. The NIH mandate, although well intentioned, runs counter to the fundamental process of experimental research. It is wasteful, bogs down the pace of research, and promotes poor science that will in fact undermine the objective of identifying the important differences in biology between sexes. That goal represents an experiment in itself, and it cannot be obtained as a mandated intrusion into every hypothesis being investigated.

Defying Logic

Advocates of this new directive are dismissive of the fact that including equal numbers of male and female cells or animals in studies will increase variance. This dismissal defies logic and it undermines the premise of their argument for incorporating both sexes in experiments. If the sexes were not different, there would be no need to use both. If the sexes are different, variation in the data must increase if both males and females are included. This is not rocket science. If weightlifting in the Olympics were co-ed instead of separated into male and female competitions, the variation in the mean scores for two teams (say, pounds lifted in a bench press) would be far greater than if the competition were separated into male and female events. The means would differ and, importantly, the variance would be far greater if you mixed both men’s and women’s scores together.
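The weightlifting argument can be checked with a few lines of arithmetic. The sketch below uses invented lift totals (not real competition results) and shows that the variance of a mixed group equals the average variance within each sex plus the variance between the two group means. That decomposition holds exactly when the two groups are the same size, as they are here.

```python
import statistics

# Hypothetical bench press results in pounds (illustrative numbers only).
male_lifts = [100.0, 110.0, 120.0]
female_lifts = [60.0, 70.0, 80.0]

var_male = statistics.pvariance(male_lifts)
var_female = statistics.pvariance(female_lifts)
var_mixed = statistics.pvariance(male_lifts + female_lifts)

# With equal group sizes, the mixed variance splits into two parts:
# the average scatter within each sex, and the gap between the two means.
within = (var_male + var_female) / 2
group_means = [statistics.mean(male_lifts), statistics.mean(female_lifts)]
between = statistics.pvariance(group_means)

print(f"variance, males only:   {var_male:.1f}")
print(f"variance, females only: {var_female:.1f}")
print(f"variance, mixed group:  {var_mixed:.1f}")
print(f"within + between:       {within + between:.1f}")
```

With these numbers the mixed-group variance is seven times the variance within either sex, and nearly all of it comes from the gap between the two means. If the two means were equal, the between-group term would vanish and mixing would add no variance.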

Some may argue that obviously, men and women must be evaluated separately in something like a heavyweight lifting competition because it is well known that women are not as big and strong as men. That is precisely the point. Why mandate that both sexes be studied in every proposed experiment funded by the NIH regardless of the specific hypothesis under investigation or the biological basis for suspecting that the outcome could differ for the two sexes? Here’s where you need to give biologists some credit for being the people best placed, and with the greatest interest at stake, to rationally design experiments to test a specific hypothesis as rigorously and as economically as possible. If there is a biological reason to study both sexes, they will necessarily do so. If there is no prior justification for studying both sexes, or if the researcher cannot perform the study for lack of resources or competence in the area of sex research, they should not do it.

If a scientist has excluded one sex from their experiments, clearly this means that there is reason to suspect that the results will differ between the sexes. The fact that the researcher has excluded one sex most likely means that no new insight would be gained from studying both sexes. It makes no sense to mix men and women in a footrace. Males are bigger, stronger, and faster than females. Does it make any sense in a study on learning to use both sexes of rats running a maze if males are faster runners? A grant proposal submitted to the NIH to prove that male rats are bigger and stronger than females would never be funded, yet under the new mandate, NIH will require every researcher to prove this and similar well-known sex differences over and over again in every study. The taxpayer should be and will be outraged by the government mandate to prove known differences between sexes of rats and mice in every study funded by the NIH.

Reaching the Shared Goal

We know well that biological responses vary greatly among different groups. Age, race, geography, environment during rearing, nutrition, and a plethora of other differences, including sex, will influence biological responses. Even individual differences are critical. An antibiotic that is life-saving for one person may kill another person who has sensitivity to it. We know these facts from years, indeed decades, of research that isolated these variables and tested each one carefully. These findings did not emerge from one study. They never could have.

It is intolerable that our mothers, daughters, sisters, and wives face a situation in which their health, their medical treatments, and often the fundamental biological processes at work in their bodies are poorly understood or even misunderstood. This important problem, which directly affects every one of us, requires a solution based on science and the scientific method. Instead of the absurd imposition that testing a sex-based hypothesis be added to every study funded by the NIH, the NIH should promote and fund new research programs to investigate sex-based biological differences as a high priority.

This would enable scientists to study this important and complex area if they feel their expertise, resources, and research priorities equip them to do so in the most rigorous manner possible. A feel-good mandate with a PC box to check off on a grant application is not going to resolve the problem. Worse, most scientists feel that a solution to a serious problem resulting from a clash between political rhetoric and science could backfire, undermining not only the intended goal of understanding sex differences in research studies but also the entire body of NIH-funded research.

The Society for Neuroscience and its partners are not responsible for the opinions and information posted on this page.

About the Author

Douglas Fields

R. Douglas Fields is Chief of the Nervous System Development and Plasticity Section at the National Institutes of Health, NICHD, in Bethesda, Maryland, and author of the new book about sudden anger and aggression “Why We Snap,” published by Dutton, and a popular book about glia “The Other Brain” published by Simon and Schuster.