Clinical concept recognition (CCR) is a fundamental task in clinical natural language processing (NLP). Almost all current machine learning-based CCR systems recognize only clinical concepts made up of consecutive words (consecutive clinical concepts, CCCs) and cannot handle concepts made up of disjoint words (disjoint clinical concepts, DCCs), which are common in clinical text. In this paper, we propose two novel types of representations for disjoint clinical concepts and apply two state-of-the-art machine learning methods to recognize both consecutive and disjoint concepts. Experiments on the 2013 ShARe/CLEF challenge corpus show that our best system achieves a “strict” F-measure of 0.803 for CCCs, 0.477 for DCCs, and 0.783 for all clinical concepts, significantly higher than the baseline systems by 4.2% and 4.1%, respectively.
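
The abstract does not define the “strict” F-measure. The sketch below assumes it means exact span matching, so that a predicted concept counts as correct only if every one of its spans matches the gold annotation. The names `Concept`, `concept`, and `strict_prf`, and the character-offset encoding, are hypothetical and chosen for illustration. They are not the paper's representations of disjoint concepts, nor the official evaluation script.

```python
from typing import FrozenSet, Set, Tuple

# Assumed encoding: a span is (start, end) in character offsets, end exclusive.
Span = Tuple[int, int]
# A concept is a set of spans: one span for a consecutive concept (CCC),
# two or more spans for a disjoint concept (DCC).
Concept = FrozenSet[Span]


def concept(*spans: Span) -> Concept:
    """Build a concept from one or more spans."""
    return frozenset(spans)


def strict_prf(gold: Set[Concept], pred: Set[Concept]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure under exact matching of all spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


# Toy data (not from the paper): one consecutive and one disjoint gold concept.
gold = {concept((0, 5)), concept((10, 14), (20, 28))}
# The system finds the consecutive concept but only one piece of the disjoint
# one, which does not count as a match under this strict criterion.
pred = {concept((0, 5)), concept((10, 14))}

precision, recall, f_measure = strict_prf(gold, pred)
print(precision, recall, f_measure)
```

Under this assumption, a system that can only output consecutive spans can never be credited with a disjoint concept, which is consistent with the gap the abstract describes between CCC and DCC scores.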