FDA Sentinel Innovation Center
A test bed to identify, develop, and evaluate innovative methods to study drug safety and effectiveness using real-world data.

Funder: United States Food and Drug Administration
Institute Leadership: Darren Toh
Mass General Brigham Leadership: Rishi J Desai
Year Initiated: 2019
The Sentinel Innovation Center is led by Mass General Brigham and Harvard Pilgrim Health Care Institute.
Project Summary
The Sentinel Innovation Center (IC) is a test bed to identify, develop, and evaluate innovative methods to study drug safety and effectiveness using real-world data. The IC was created in response to the Sentinel Five-Year Strategy and FDA’s Medical Data Enterprise Initiative to build a new system containing electronic health records (EHRs) from 10 million lives. Achieving this vision requires:
- Developing new analysis tools for unstructured EHR data
- Establishing novel data sources by finding new ways to extract, standardize, and quality-check clinical data and free text in EHRs
- Creating state-of-the-art approaches to identify clinical phenotypes, extract key data elements, and adjust for confounding in drug safety and effectiveness studies
The Sentinel Innovation Center was created in 2019 to attract new data partnerships, engage external investigators to develop new methods, and expand the types of questions that can be addressed in the Sentinel System. Since developing a strategic plan in 2019, the IC has worked to advance learning in the following strategic priority areas:
- Data infrastructure
- Feature engineering
- Causal inference
- Detection analytics
The data infrastructure and feature engineering priority areas support development of a query-ready distributed data network containing electronic health records (EHRs). To expand the utility of real-world data for regulatory decision-making, the IC has established a query-ready, quality-checked network of EHRs linked with insurance claims data for at least 10 million individuals.
The Sentinel Innovation Center has also made significant advancements in feature engineering.
The "Advancing scalable natural language processing (NLP) approaches for unstructured electronic health record data" project used information-rich unstructured electronic health record (EHR) data to address the limitations of administrative claims data. Specifically, the project used NLP and text mining methods to identify patients with COVID-19 and to extract clinical features from unstructured EHR data. The resulting algorithms and methods position Sentinel to address future pandemics and advance Sentinel’s long-term objective of enhancing medical product safety surveillance by incorporating EHR data into surveillance methods.
The causal inference work stream has focused on developing methods to improve confounding control and handle missing data in pharmacoepidemiologic analyses. "Approaches to Handling Partially Observed Confounder Data From Electronic Health Records (EHR) In Non-randomized Studies of Medication Outcomes" systematically investigated approaches to detect underlying missingness mechanisms, compare imputation approaches, and showcase sensitivity analyses to build confidence in pharmacoepidemiologic analyses with partially observed confounder variables. The study focused on missingness in the context of estimating causal treatment effects in EHR and EHR-linked databases. The project developed standardized “toolkits” that can be readily applied to EHR data to describe and, when assumptions permit, address missingness in confounding variables.
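One simple way to describe missingness in a partially observed confounder is to compare an observed covariate between patients with and without that confounder recorded. The following is a minimal sketch under stated assumptions, not the project's toolkit: the function name, the toy ages, and the HbA1c example are hypothetical, and the absolute standardized mean difference (ASMD) is used here only as one commonly used descriptive diagnostic. A large ASMD suggests that missingness is related to observed patient characteristics, so the data are unlikely to be missing completely at random.

```python
from math import sqrt
from statistics import mean, variance


def abs_standardized_mean_difference(values, is_missing):
    """ASMD of an observed covariate between patients with and
    without the partially observed confounder recorded."""
    missing_group = [v for v, m in zip(values, is_missing) if m]
    observed_group = [v for v, m in zip(values, is_missing) if not m]
    if len(missing_group) < 2 or len(observed_group) < 2:
        raise ValueError("each group needs at least two patients")

    difference = abs(mean(missing_group) - mean(observed_group))
    # Pool the two sample variances with equal weight.
    pooled_sd = sqrt((variance(missing_group) + variance(observed_group)) / 2)
    if pooled_sd == 0:
        return 0.0 if difference == 0 else float("inf")
    return difference / pooled_sd


# Hypothetical toy data: patient age, and whether HbA1c is missing.
ages = [40, 45, 50, 55, 60, 65, 70, 75]
hba1c_missing = [True, True, True, False, True, False, False, False]

asmd_age = abs_standardized_mean_difference(ages, hba1c_missing)
```

In this toy example, younger patients are more likely to lack an HbA1c measurement, so the ASMD for age is large. A diagnostic like this cannot distinguish missingness that depends only on observed data from missingness that also depends on the unobserved values themselves, which is one reason sensitivity analyses remain necessary.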
The final methodological work stream within the Innovation Center is detection analytics. Data mining approaches such as TreeScan™, which group ICD diagnosis codes into hierarchical levels, have been developed for signal detection in insurance claims data within the Sentinel infrastructure. EHRs offer a promising complementary source of information for medication safety signal detection but may require tailored approaches that account for and leverage their differences in data content and structure compared with insurance claims. A current project, "Development and evaluation of EHR information extraction pipeline and tree based scan statistic (TBSS) methods for EHR-based signal detection," aims to: 1) combine structured diagnosis codes with NLP-extracted terms in a common hierarchical ontology, such as MedDRA, to serve as a tree of outcomes; 2) develop methods to account for multi-pathway outcomes when scanning across a tree with multi-axial pathways, such as those present in the MedDRA hierarchy; 3) develop TBSS methods that can test continuous outcomes, such as laboratory measurements or vital signs; and 4) apply these methods and evaluate the results.
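The scanning step of a tree-based scan statistic can be sketched as follows. This is an illustrative example under stated assumptions, not the Sentinel or TreeScan™ implementation: the outcome hierarchy, the counts, and all function names are hypothetical, each node has a single parent (so multi-axial pathways are not handled), and the unconditional Poisson log-likelihood ratio is assumed as the node-level statistic. Leaf-level observed and expected counts are summed up the tree, a log-likelihood ratio is computed at every node, and the node with the largest value is the candidate signal.

```python
from math import log

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "all_outcomes": None,
    "cardiac": "all_outcomes",
    "hepatic": "all_outcomes",
    "arrhythmia": "cardiac",
    "heart_failure": "cardiac",
    "liver_injury": "hepatic",
}


def aggregate_up(leaf_values):
    """Add each leaf-level value to the leaf and all of its ancestors."""
    totals = {node: 0.0 for node in PARENT}
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:
            totals[node] += value
            node = PARENT[node]
    return totals


def poisson_llr(observed, expected):
    """Unconditional Poisson log-likelihood ratio for an excess of events;
    zero when there is no excess."""
    if expected <= 0 or observed <= expected:
        return 0.0
    return observed * log(observed / expected) - (observed - expected)


def scan_tree(observed_leaves, expected_leaves):
    """Return the node with the largest log-likelihood ratio and all node values."""
    observed = aggregate_up(observed_leaves)
    expected = aggregate_up(expected_leaves)
    llr = {node: poisson_llr(observed[node], expected[node]) for node in PARENT}
    best_node = max(llr, key=llr.get)
    return best_node, llr


# Hypothetical counts among exposed patients, with expected counts under the null.
observed_leaves = {"arrhythmia": 12, "heart_failure": 9, "liver_injury": 3}
expected_leaves = {"arrhythmia": 5.0, "heart_failure": 5.0, "liver_injury": 4.0}

best_node, llr = scan_tree(observed_leaves, expected_leaves)
```

In this toy example the strongest signal appears at the intermediate "cardiac" node rather than at either of its leaves, which illustrates why scanning across hierarchical levels can detect excess risk spread over related outcomes. A full analysis would also assess statistical significance, typically by comparing the maximum log-likelihood ratio against its distribution in data simulated under the null hypothesis, which adjusts for the multiple overlapping nodes tested. That step is omitted here.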