Development of a Foundation for a Registry and Clinical Trial Information Input Support System Using Natural Language Processing of Electronic Medical Record Data
Abstract
In contemporary clinical medicine, it is essential to identify findings from previous studies that are relevant to an individual patient's care and to provide evidence-based treatment accordingly. Producing research findings that support clinical practice requires studies that compile clinical information from large numbers of patients; registry studies and clinical trials are particularly important designs of this kind. One of the most time-consuming steps in such research is collecting patient health information and clinical outcomes (the medical results of care). This is generally done by healthcare professionals who manually review medical records. Because the task must be repeated for many patients and many data points, data collection in clinical research currently demands considerable human effort.
In recent years, artificial intelligence (AI) has advanced significantly and has become substantially able to "understand" natural language, the language humans use in daily communication. This capability, known as natural language processing (NLP), underpins large language models (LLMs). Our JH research and development project aims to build a system in which LLMs perform, or assist with, tasks such as collecting patient information and clinical outcomes, thereby making data extraction more efficient.
A distinctive aspect of this project is its collaboration with six National Centers in Japan (NCs), with the goal of creating a system applicable to a wide range of medical fields. The project plans to use research data already collected by these six NCs as training data, enabling the development of a system in which LLMs can extract "correct" information from text data in electronic medical records.
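As an illustration only, the idea of treating already-collected research data as the "correct" answer can be sketched as a per-field agreement check between values extracted from record text and values in an existing registry. Everything in this sketch is hypothetical: the function name `field_accuracy`, the patient identifiers, and the fields `diabetes` and `smoker` are assumptions made for the example and do not describe the project's actual data or evaluation method.

```python
from collections import defaultdict


def field_accuracy(extracted, registry):
    """Return, for each field, the share of registry values that the
    extracted values reproduce exactly (hypothetical scoring rule)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for patient_id, gold_fields in registry.items():
        predicted = extracted.get(patient_id, {})
        for field, gold_value in gold_fields.items():
            totals[field] += 1
            if predicted.get(field) == gold_value:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Hypothetical registry values, treated here as the reference answers.
registry = {
    "p001": {"diabetes": True, "smoker": False},
    "p002": {"diabetes": False, "smoker": True},
}

# Hypothetical values extracted from free-text records.
extracted = {
    "p001": {"diabetes": True, "smoker": True},
    "p002": {"diabetes": False, "smoker": True},
}

scores = field_accuracy(extracted, registry)
print(scores)
```

Scoring each field separately makes it visible which data items are extracted reliably and which would still need manual review.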
Furthermore, the project will develop a consent management system for clinical research that allows patients to manage their participation consent directly. This system will use blockchain, a highly tamper-resistant technology. The initiative will also develop a Personal Health Record (PHR) system that enables patients to access their health information directly.
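To illustrate why hash-linked records are hard to alter unnoticed, the following minimal sketch keeps consent decisions in a chain where each entry stores the hash of the previous one. It is an assumption-laden toy, not the project's design: a real blockchain also involves distributed storage and consensus, which are omitted here, and the names `append_consent`, `verify`, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first entry


def _digest(record, prev_hash):
    """SHA-256 over the record together with the previous entry's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_consent(chain, record):
    """Append a consent record linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"record": record, "prev": prev_hash, "hash": _digest(record, prev_hash)}
    )


def verify(chain):
    """Return True only if every entry still matches its stored hash and link."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != _digest(block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
# Hypothetical consent history: the patient consents, then withdraws.
append_consent(chain, {"patient": "p001", "study": "registry-A", "consent": True})
append_consent(chain, {"patient": "p001", "study": "registry-A", "consent": False})

ok_before = verify(chain)

# Altering an earlier record after the fact breaks verification.
chain[0]["record"]["consent"] = False
ok_after = verify(chain)

print(ok_before, ok_after)
```

Because each hash depends on the previous entry, changing any past consent record invalidates that entry, and rewriting it would require recomputing every later entry as well.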
Perspectives
- Reducing the burden of data entry and enabling large-scale data collection
- Promoting the recruitment of participants for clinical trials
- Accelerating the creation of historical controls for rare diseases
- Speeding up the acquisition of event rates and covariate information for sample size calculation
- Streamlining the consent acquisition process with blockchain technology
- Promoting secondary use of clinical trial data
Comments from principal researcher
Kunihiro Nishimura, MD, PhD, MPH, MS
Director, Department of Preventive Medicine and Epidemiology, National Cerebral and Cardiovascular Center

This research aims to apply advances in large language models (LLMs) to address workforce shortages in clinical research, reduce the burden and cost borne by clinicians and on-site personnel, and develop an efficient framework for digital transformation (DX) that contributes to work style reform. In addition to streamlining and accelerating clinical research and the publication of academic papers, it will use Personal Health Records (PHR) and blockchain technology to give patients greater autonomy and to enable swift, reliable monitoring.
Co-Investigators
National Cerebral and Cardiovascular Center (NCVC)
Koji Iihara, Director General, NCVC Hospital
Teruo Noguchi, Deputy Director, NCVC Hospital
Kengo Kusano, Department of Cardiovascular Medicine
Haruhiko Hiramatsu, Risa Sakurai, Department of Medical Informatics
Soshiro Ogata, Department of Preventive Medicine and Epidemiology
Yusuke Yoshikawa, Department of Biostatistics
National Center for Geriatrics and Gerontology
Takashi Sakurai, Director General, Research Institute
Hiroshi Watanabe, Innovation Center for Translational Research
National Center for Global Health and Medicine
Kengo Miyo, Ryota Nishi, Department of Planning Information and Management
Mitsuru Ohsugi, Diabetes Research Center
National Cancer Center
Katsuya Tanaka, Masami Mukai, Department of Medical Informatics
Takahiro Higashi, Center for Cancer Registries
National Center for Child Health and Development
Satoko Uematsu, Division of Pediatric Emergency and Transport Services
Takashi Noguchi, Department of Information Technology and Management
National Center of Neurology and Psychiatry
Kenji Hatano, Department of Clinical Data Science, Clinical Research & Education Promotion Division
