Standardized Processing Algorithms for Medical Text Big Data
Project description/goals
Electronic Medical Records (EMRs) store patients' medical and health data in electronic form, including text, symbols, charts, graphs, numerical data, images, and other digital information. Clinical text is the data type in EMRs that is richest in medical knowledge. It is also personalized medical data that relates directly to individual patients, and it embodies the clinical experience that doctors apply based on their own knowledge. Extracting and mining knowledge of clinical experience from clinical text has become a major research trend in recent years.
Importance/impact, challenges/pain points
Standardized processing algorithms are expected to effectively improve the quality of the common data elements in Chinese and Western (CW) medicine clinical text big data. This would make possible the cross-institutional integration and reuse of fragmented clinical text data, knowledge discovery and innovation drawn from clinical experience, the transmission of knowledge from experienced Chinese medicine practitioners, and the cross-institutional migration of intelligent diagnostic models. Achieving this is a major scientific frontier problem that urgently needs to be solved.
Solution description
The solution has three parts: building a corpus of Chinese clinical text; using a unified information model and a normalized metadata modeling method to build a semantically supported, shareable, and reusable set of standardized clinical terms for CW medicine descriptions; and using natural language processing (NLP) techniques to automatically extract, annotate, cluster, and analyze associations among the features of CW medicine clinical text data.
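To make the link between a standardized term set and automatic extraction more concrete, the sketch below shows one simple way raw clinical text can be mapped onto standardized terms. It is a dictionary-based baseline (forward maximum matching over a synonym lexicon), swapped in purely for illustration; it is not the project's method, which relies on NLP models. The lexicon `LEXICON`, the function `extract_terms`, and the example sentence are all hypothetical, and mapping "干咳" (dry cough) to "咳嗽" (cough) is a deliberate simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym lexicon: surface form -> standardized term.
# A real standardized terminology set would be far larger and would also
# carry concept codes and semantic types.
LEXICON: Dict[str, str] = {
    "发热": "发热",  # fever (canonical form)
    "发烧": "发热",  # colloquial "fever", normalized to the canonical form
    "咳嗽": "咳嗽",  # cough
    "干咳": "咳嗽",  # dry cough, collapsed to "cough" for illustration only
    "胸闷": "胸闷",  # chest tightness
}


def extract_terms(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Forward maximum matching: at each position, take the longest lexicon
    entry that matches and return (start, end, surface_form, standardized_term)."""
    max_len = max(len(k) for k in lexicon)
    results: List[Tuple[int, int, str, str]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer terms win over shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = (i, i + length, candidate, lexicon[candidate])
                break
        if match:
            results.append(match)
            i = match[1]  # resume scanning after the matched span
        else:
            i += 1  # no term starts here; advance one character
    return results


if __name__ == "__main__":
    note = "患者发烧三天，伴干咳、胸闷。"  # "Fever for three days, with dry cough and chest tightness."
    for start, end, surface, term in extract_terms(note, LEXICON):
        print(start, end, surface, "->", term)
```

Each match keeps both the original surface form and its standardized term, so different wordings of the same finding ("发烧" vs. "发热") become comparable across records. A learned extractor would replace the lookup loop, but the output shape (span plus normalized concept) is the kind of fine-grained, standardized data the project aims to produce.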
Key contribution/commercial implication
If successful, this project will contribute to: the construction of a standard terminology set for CW medicine clinical texts, together with a technical system that gives it semantic support and allows it to be dynamically updated and evolved; the intelligent extraction of entities such as disease symptoms, signs, imaging changes, treatment plans, and side effects from CW medicine clinical texts, providing standardized fine-grained data; the mining of disease progression patterns hidden in these texts, providing data support for detecting changes in a patient's condition and for early intervention; and, finally, the sharing and exchange of CW clinical big data to promote health-related applications.
Next steps
We have launched research on three tasks: medical report summary generation, automatic medical report generation, and biomedical named entity recognition, and have already produced research outputs for each of them. The next step is to bring some of these outputs into clinical use and, eventually, into products.
Collaborators/partners
Guangdong Provincial People's Hospital
Team/contributors
Xiang Wan, Zhihong Chen, Jinpeng Hu, Yang Liu