Evaluation of database contributions to a review on predictive analytics for disease progression: A SWAR using the R package CiteSource

Date & Time
Tuesday, September 5, 2023, 12:30 PM - 2:00 PM
Location Name
Session Type
Information retrieval
Young S¹, Padman R¹, Al-Sayouri SA¹
¹Carnegie Mellon University, USA

Background: Significant advancements in the use of artificial intelligence and machine learning (AI/ML) for the prediction of disease progression have led to a rapidly growing literature in this space. Systematic and scoping reviews in this area could benefit from methods research to improve the efficiency of the review process. In the context of a scoping review on AI/ML applications in predicting the progression of chronic kidney disease, we are conducting a study within a review (SWAR) to evaluate database contributions to this topic, which lies at the intersection of biomedical and computer science research.
Objectives: We aim to evaluate the usefulness of five databases for reviews in predictive analytics and disease progression: Scopus, Medline, CINAHL, ACM Digital Library, and IEEE Xplore. We are interested in overlap across databases, unique contributions of each database, and what each database contributes to a set of benchmark studies and the different screening stages of the review.
Methods: We will use a new R package called CiteSource to assess database overlap and contribution to benchmark studies and review stages. The unique records from each database will be further evaluated on characteristics such as publication year, journal titles, topic, and keywords.
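The overlap and unique-contribution measures described above can be illustrated with a minimal sketch. It is written in Python rather than R and does not use or reproduce the CiteSource API. The function names, the source labels A, B, and C, and the record identifiers are all hypothetical, and the sketch assumes records have already been deduplicated to a shared identifier.

```python
from itertools import combinations


def unique_contributions(records_by_source):
    """Return, per source, the records found in that source and in no other."""
    unique = {}
    for source, records in records_by_source.items():
        found_elsewhere = set()
        for other, other_records in records_by_source.items():
            if other != source:
                found_elsewhere |= other_records
        unique[source] = records - found_elsewhere
    return unique


def pairwise_overlap(records_by_source):
    """Count the records shared by each pair of sources."""
    return {
        (a, b): len(records_by_source[a] & records_by_source[b])
        for a, b in combinations(sorted(records_by_source), 2)
    }


def benchmark_recall(records_by_source, benchmark):
    """Return the fraction of benchmark records retrieved by each source."""
    return {
        source: len(records & benchmark) / len(benchmark)
        for source, records in records_by_source.items()
    }


# Invented toy data: placeholder identifiers, not records from the review.
toy_records = {
    "A": {"id1", "id2", "id3", "id4"},
    "B": {"id1", "id2"},
    "C": {"id5"},
}
toy_benchmark = {"id1", "id5"}

unique = unique_contributions(toy_records)
overlap = pairwise_overlap(toy_records)
recall = benchmark_recall(toy_records, toy_benchmark)
```

Applying the same functions to the subset of records included at each screening stage would give a per-stage view of what each source contributes.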
Results: We conducted a preliminary analysis of database contribution prior to screening and found that Scopus retrieves twice as many studies as Medline. There are early indications that many of these results are from Medline-indexed journals and are retrieved owing to the liberal application of Emtree terms, which are searched in Scopus as part of the Keywords field. Moreover, the computer science database ACM Digital Library contributes relatively few, but mostly unique, records to the search. We will investigate these findings in more depth, along with other aspects of the databases searched.
Conclusions: Understanding the contributions of multidisciplinary and discipline-specific databases to the searches for our scoping review can inform decisions about source selection for future reviews on predictive analytics for disease progression. This SWAR provides a case study using the R package CiteSource, presenting new opportunities for methods research related to source selection and search strategy validation.
Patient, public and/or healthcare consumer involvement: None.