TY - JOUR
AU - Traore, L.
AU - Daniel, C.
AU - Jaulent, M.-C.
AU - Schrader, T.
AU - Racoceanu, D.
AU - Kergosien, Y.
TI - Sustainable Formal Representation of Breast Cancer Grading Histopathological Knowledge
JF - Diagnostic Pathology; Vol 1 No 8 (2016): 13. European Congress on Digital Pathology
DO - 10.17629/www.diagnosticpathology.eu-2016-8:154
KW - 
N2 - Introduction/Background: Histopathology has recently seen the introduction of tools such as slide scanners and virtual slide technologies. These tools create the conditions for broader adoption of computer-aided diagnosis based on whole slide images (WSI), which aims to reduce inter-observer variability between pathologists. This change raises a number of new scientific challenges. One of them is the sustainable management of the semantics associated with the grading process, image analysis and annotation, so that pre-filled reports can be generated. The College of American Pathologists cancer checklists and protocols (CAP-CC&P) [1] are reference resources for complete Anatomic Pathology (AP) reporting of malignant tumors. Current terminology systems for AP structured reporting gather terms of very different granularity [2][3] and have not yet been compiled systematically. Semantic data models are formal representations of knowledge in a given domain that allow both human users and software applications to interpret domain terminology consistently and accurately [4][5]. Aims: Our objective is i) to analyze the histopathological knowledge for breast cancer grading available in the reference CAP-CC&P and ii) to build a sustainable formal representation of this knowledge based on existing biomedical ontologies in NCBO BioPortal [6][7] and UMLS semantic types [8]. Methods: We first tested our methodology on two cancer grading methods: one for invasive breast carcinoma (Nottingham system) and one for ductal carcinoma in situ.
An AP expert first selected a corpus of 5 texts, or “notes”, from the two corresponding CAP-CC&Ps. From each note, the expert also extracted a list of key concepts to serve as a “gold standard”. We used NCBO Annotator [9] for automatic analysis of the corpus. Annotator supports the biomedical community by automatically tagging raw text with concepts from relevant biomedical ontology and terminology repositories. The methodology consists of two steps. i) Automatic textual analysis and annotation of the corpus based on the 417 ontologies available on the NCBO platform. From these, we selected a subset of ontologies based on the number of identified concepts and evaluated their relevance with respect to the gold standard. ii) Semantic modeling of the automatically extracted concepts into a sustainable formal representation based on their UMLS semantic types. Results: We identified NCIT, SNOMED-CT, NCI caDSR Value Sets, LOINC and PathLex as the ontologies providing the highest number of annotated concepts. Table 1 gives, as percentages, the coverage of the concepts of each note by the annotations from the 5 reference ontologies. Percentages for a single note can add up to more than 100 because the ontologies' coverages may overlap. Table 2 uses the same format but counts only gold-standard concepts when quantifying annotations. From the list of extracted concepts, we built a preliminary formal representation of the histopathological knowledge based on the UMLS semantic types of the concepts. Figure 1 shows the proposed semantic modeling in the context of tubular differentiation. The novelty of this approach lies in federating knowledge from different sources (CAP-CC&P, NCBO ontologies and UMLS Metathesaurus) and in the sustainable management of the associated semantics. This opens the perspective of building an AP observation ontology that will allow an accurate representation of AP reports understandable by both humans and software applications.
UR - http://www.diagnosticpathology.eu/content/index.php/dpath/article/view/154
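The per-note coverage percentages described in the abstract, and the reason they can sum past 100, can be illustrated with simple set ratios. The following is a minimal sketch. It assumes that each note's gold standard and each ontology's annotations are reduced to sets of normalized concept labels and compared by exact string match. That is a simplification: the abstract does not describe the actual matching procedure. The function names (`coverage_by_ontology`, `union_coverage`) and any concept labels used with them are hypothetical and are not data from the study.

```python
from typing import Dict, Set


def coverage_by_ontology(gold: Set[str], annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of a note's gold-standard concepts matched by each ontology.

    Assumption (hypothetical): concepts are compared as exact, pre-normalized labels.
    """
    if not gold:
        return {name: 0.0 for name in annotations}
    return {
        name: 100.0 * len(gold & found) / len(gold)
        for name, found in annotations.items()
    }


def union_coverage(gold: Set[str], annotations: Dict[str, Set[str]]) -> float:
    """Percentage of gold-standard concepts matched by at least one ontology.

    Unlike the sum of per-ontology percentages, this never exceeds 100,
    because a concept found by several ontologies is counted once.
    """
    if not gold:
        return 0.0
    covered = set().union(*annotations.values()) & gold
    return 100.0 * len(covered) / len(gold)
```

For example, with a hypothetical gold standard of four labels, if one ontology matches three of them (75%) and another matches two of those same three (50%), the per-ontology figures sum to 125%. Only 75% of the gold standard is actually covered, which is the overlap effect the abstract mentions.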