Combined p-values of baseline variables of randomized controlled trials published in 2022 indicate non-randomness beyond chance

Date & Time
Monday, September 4, 2023, 12:30 PM - 2:00 PM
Session Type
Statistical methods
Klang R¹, Bodnar O², Olsson L¹
¹Camtö (Centre for Assessment of Medical Technology in Örebro), Sweden
²Örebro University School of Business, Sweden

Background: Randomized controlled trials (RCTs) are crucial for evaluating interventions, but only if the randomization is carried out correctly. The anaesthetist Carlisle has developed a method to test whether the baseline variables of an RCT could reasonably originate from a true randomization, on the premise that the p-values of baseline comparisons are then uniformly distributed. In a 2017 study of 5,087 RCTs from 8 medical journals, 5.6% more RCTs than expected had a combined p-value > 0.95 or < 0.05 [1].
Objectives: Apply Carlisle’s method to a sample of recent RCTs and compare the findings to Carlisle’s results.
Methods: A sample of 1,075 RCTs published in February 2022 and indexed with the MeSH term ‘Randomized Controlled Trial’ in MEDLINE was checked for eligibility. The inclusion criteria were primary or secondary analyses of RCTs that reported the number of participants, the mean, and the standard deviation or standard error of baseline variables. Carlisle’s method uses Monte Carlo simulation, ANOVA, and t-tests to obtain p-values for the baseline variables; Stouffer’s method then combines them into one p-value per RCT, and the combined p-values are compared with a uniform distribution. The analyses were carried out in R. A combined p-value near 0 indicates that the groups are more dissimilar than expected by chance; a value near 1 indicates that they are more similar than expected.
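The combination step can be illustrated with a minimal sketch of Stouffer's method. It is written in Python with the standard library only, not the R code used in the study, and it is not Carlisle's implementation. The function name `stouffer_combine`, the clamping constant, and the sign convention (each p-value is mapped to z = Φ⁻¹(p), so the combined value moves in the same direction as its inputs) are assumptions made for illustration. Stouffer's method also assumes that the p-values being combined are independent, which correlated baseline variables may not satisfy.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def stouffer_combine(p_values):
    """Combine p-values into one p-value with unweighted Stouffer's method.

    Hypothetical helper for illustration; assumes independent p-values.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clamp away from 0 and 1 so the inverse normal CDF stays finite.
    eps = 1e-12
    clamped = [min(max(p, eps), 1 - eps) for p in p_values]
    # Map each p-value to a z-score, sum the scores, and rescale by sqrt(k)
    # so that the combined statistic is again standard normal.
    z_scores = [_STD_NORMAL.inv_cdf(p) for p in clamped]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    # Map the combined z-score back to a p-value.
    return _STD_NORMAL.cdf(z_combined)


if __name__ == "__main__":
    # Made-up baseline p-values for one hypothetical trial.
    print(stouffer_combine([0.40, 0.55, 0.62, 0.48]))
```

Under this convention, several moderately small p-values combine into a smaller one, which is why a trial can be flagged even when no single baseline variable looks unusual.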
Results: 566 RCTs were included, and 13,085 means of 5,780 (range 1-100) baseline variables were extracted. The proportions of combined p-values in the ranges p-value > 0.95 or p-value < 0.05, p-value < 0.01, and p-value < 0.00001 were 22.8%, 4.8%, and 0.05%, respectively, i.e., 2, 5, and 500 times larger than would be expected by chance (Table 1). Possible non-randomness was more common in this sample than in Carlisle’s at the arbitrary limit of p-value > 0.95 or p-value < 0.05, but less common at the extreme limit of p-value < 0.00001. The distribution of the combined p-values is presented in Figure 1.
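The comparison with chance rests on a simple property: if p-values are uniformly distributed, the expected share falling in a region equals the width of that region (10% for p > 0.95 or p < 0.05, 1% for p < 0.01, 0.001% for p < 0.00001). The sketch below shows one way to tabulate observed shares against these expectations. The function name `tail_proportions` and the input values are hypothetical and are not data from this study.

```python
def tail_proportions(combined_p_values):
    """Return {region: (observed share, expected share, ratio)}.

    Expected shares assume uniformly distributed p-values.
    Hypothetical helper for illustration only.
    """
    n = len(combined_p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Each region: (membership test, expected share under uniformity).
    regions = {
        "p > 0.95 or p < 0.05": (lambda p: p > 0.95 or p < 0.05, 0.10),
        "p < 0.01": (lambda p: p < 0.01, 0.01),
        "p < 0.00001": (lambda p: p < 0.00001, 0.00001),
    }
    result = {}
    for label, (in_region, expected) in regions.items():
        observed = sum(1 for p in combined_p_values if in_region(p)) / n
        result[label] = (observed, expected, observed / expected)
    return result


if __name__ == "__main__":
    # Made-up combined p-values, one per hypothetical trial.
    example = [0.001, 0.03, 0.20, 0.47, 0.51, 0.66, 0.80, 0.97]
    for label, (obs, exp, ratio) in tail_proportions(example).items():
        print(f"{label}: observed {obs:.3f}, expected {exp:.5f}, ratio {ratio:.1f}")
```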
Conclusions: The preliminary findings from this sample of recent RCTs indicate that a larger proportion of trials is associated with non-randomness than would be expected by chance. The findings are not completely in accordance with Carlisle’s results. Further analyses will be conducted, more baseline variables will be added, and subgroups, such as type of intervention, will be compared. Nevertheless, Carlisle’s method seems to be a promising statistical tool for systematic reviews and the evaluation of RCTs.