Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool


Bibliographic Details
Main Author: Gates, Allison
Corporate Authors: United States Agency for Healthcare Research and Quality, University of Alberta Evidence-based Practice Center
Format: eBook
Language: English
Published: Rockville, MD: Agency for Healthcare Research and Quality, August 2021
Series: Methods research report
Online Access:
Collection: National Center for Biotechnology Information - Collection details see MPG.ReNa
Description
Summary: BACKGROUND: Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We prospectively evaluated an online machine learning and text mining tool's ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification. METHODS: For 75 randomized trials published in 2017, we manually extracted and verified data for 21 unique data elements. We uploaded the randomized trials to ExaCT, an online machine learning and text mining tool, and quantified performance by evaluating the tool's ability to identify the reporting of data elements (reported or not reported), and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by ExaCT (simulating semi-automated data extraction).
We summarized the relevance of the extractions for each data element using counts and proportions, and calculated the median and interquartile range (IQR) across data elements. We calculated the median (IQR) time for manual and semi-automated data extraction, and the overall time savings. RESULTS: The tool identified the reporting (reported or not reported) of data elements with a median (IQR) accuracy of 91 percent (75% to 99%). Performance was perfect for four data elements: eligibility criteria, enrolment end date, control arm, and primary outcome(s). Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88 percent (83% to 99%) of cases. Performance was perfect for four data elements: funding number, registration number, enrolment start date, and route of administration.
Among a median (IQR) 90 percent (86% to 96%) of relevant sentences, pertinent fragments had been highlighted by the system; exact matches were unreliable (median (IQR) 52 percent (32% to 73%)). A median 48 percent of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in a modest time savings compared with manual extraction by a single reviewer (17.9 vs. 21.6 hours total extraction time across 75 randomized trials). CONCLUSIONS: Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. The tool's ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.
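The methods above summarize per-element performance as a median with an interquartile range across the 21 data elements. A minimal sketch of that computation in Python, using Tukey's hinges for the quartiles (the per-element proportions below are invented placeholders, not values from the report):

```python
from statistics import median

# Hypothetical per-data-element accuracy proportions; the report's actual
# per-element values are not listed in this record.
accuracy = {
    "eligibility criteria": 1.00,
    "enrolment end date": 1.00,
    "control arm": 1.00,
    "primary outcome(s)": 1.00,
    "funding source": 0.91,
    "sample size": 0.75,
    "intervention arm": 0.99,
}

def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile) using Tukey's hinges:
    the quartiles are the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # half below the median
    upper = ordered[(n + 1) // 2 :]    # half above the median
    return median(ordered), median(lower), median(upper)

med, q1, q3 = median_iqr(list(accuracy.values()))
print(f"median {med:.0%} (IQR {q1:.0%} to {q3:.0%})")
```

Note that quartiles have several accepted definitions (e.g. linear interpolation as in `numpy.percentile`), so a sketch like this will not necessarily reproduce the report's exact IQR boundaries.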
Physical Description: 1 PDF file (various pagings), illustrations