Sustainable Formal Representation Of Breast Cancer Grading Histopathological Knowledge

  • L. Traore Université Pierre Marie Curie, UPMC-Paris 6, LIMICS: INSERM U 1142, LIB: CNRS UMR 7371, INSERM U 1146, Paris, France Sorbonne Universités, UPMC Univ Paris 06, INSERM, Université Paris 13, Sorbonne Paris Cité, Laboratoire d’Informatique Médicale et Ingénierie des Connaissances en eSanté (LIMICS - UMR_S 1142), 15 rue de l’école de médecine, Paris, France Sorbonne Universités, UPMC Univ Paris 06, CNRS, INSERM, Laboratoire d’Imagerie Biomédicale (LIB), 75013, Paris, France
  • C. Daniel Sorbonne Universités, UPMC Univ Paris 06, INSERM, Université Paris 13, Sorbonne Paris Cité, Laboratoire d’Informatique Médicale et Ingénierie des Connaissances en eSanté (LIMICS - UMR_S 1142), 15 rue de l’école de médecine, Paris, France Assistance Publique-Hôpitaux de Paris (AP-HP), CCS SI Patient, Paris, France
  • M.-C. Jaulent Sorbonne Universités, UPMC Univ Paris 06, INSERM, Université Paris 13, Sorbonne Paris Cité, Laboratoire d’Informatique Médicale et Ingénierie des Connaissances en eSanté (LIMICS - UMR_S 1142), 15 rue de l’école de médecine, Paris, France
  • T. Schrader University of Applied Sciences Brandenburg Magdeburger, Department Informatics and Media, Brandenburg, Germany
  • D. Racoceanu Sorbonne Universités, UPMC Univ Paris 06, CNRS, INSERM, Laboratoire d’Imagerie Biomédicale (LIB), 75013, Paris, France
  • Y. Kergosien Sorbonne Universités, UPMC Univ Paris 06, INSERM, Université Paris 13, Sorbonne Paris Cité, Laboratoire d’Informatique Médicale et Ingénierie des Connaissances en eSanté (LIMICS - UMR_S 1142), 15 rue de l’école de médecine, Paris, France Département d’Informatique Université de Cergy-Pontoise, Cergy-Pontoise, France

Abstract

Introduction/Background

Histopathology has recently seen the introduction of tools such as slide scanners and virtual slide technologies, creating the conditions for broader adoption of computer-aided diagnosis based on whole slide images (WSI) to reduce inter-observer variability among pathologists. This change raises new scientific challenges, such as the sustainable management of the semantics associated with the grading process, image analysis and annotation, in order to facilitate the generation of pre-filled reports. The College of American Pathologists cancer checklists and protocols (CAP CC&P) [1] are reference resources for complete Anatomic Pathology (AP) reporting of malignant tumors. Current terminology systems for AP structured reporting gather terms of very different granularity [2][3] and have not yet been compiled in a systematic way. Semantic data models are formal representations of the knowledge in a given domain that allow both human users and software applications to interpret domain terminology consistently and accurately [4][5].

Aims

Our objective is i) to analyze the histopathological knowledge for breast cancer grading available in the reference CAP CC&P and ii) to build a sustainable formal representation of this knowledge based on existing biomedical ontologies in NCBO BioPortal [6][7] and UMLS semantic types [8].

Methods

Our methodology was first tested on two breast cancer grading methods: the Nottingham system for invasive carcinoma and the grading of ductal carcinoma in situ. A corpus of 5 texts, or “notes”, was selected by an AP expert from the two corresponding CAP CC&Ps. From each note, the expert also extracted a list of key concepts to be used as a “gold standard”. We used NCBO Annotator [9] for automatic analysis of the corpus. Annotator supports the biomedical community in tagging raw texts automatically with concepts from relevant biomedical ontology and terminology repositories. The methodology consists of:

i) Automatic textual analysis and annotation of the corpus based on the 417 ontologies available on the NCBO platform. We selected a subset of ontologies based on the number of concepts identified and evaluated their relevance with respect to the gold standard.

ii) Semantic modeling of the automatically extracted concepts into a sustainable formal representation based on their UMLS semantic types.
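The two steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the annotation rows, the helper names (`rank_ontologies`, `group_by_semantic_type`) and the concept-to-semantic-type assignments are hypothetical placeholders, and no call to the NCBO Annotator service is made.

```python
from collections import defaultdict

# Hypothetical annotator output: (ontology acronym, matched concept, UMLS semantic type).
# These rows are illustrative placeholders, not data from the study.
annotations = [
    ("NCIT", "tubule formation", "Finding"),
    ("NCIT", "nuclear pleomorphism", "Finding"),
    ("SNOMEDCT", "tubule formation", "Finding"),
    ("SNOMEDCT", "mitotic count", "Laboratory Procedure"),
    ("LOINC", "mitotic count", "Laboratory Procedure"),
]


def rank_ontologies(rows):
    """Step i): order ontologies by the number of distinct concepts they annotate."""
    found = defaultdict(set)
    for ontology, concept, _ in rows:
        found[ontology].add(concept.lower())
    # Most concepts first; ties are broken alphabetically for a stable order.
    return sorted(found, key=lambda name: (-len(found[name]), name))


def group_by_semantic_type(rows):
    """Step ii): gather the extracted concepts under their semantic type."""
    groups = defaultdict(set)
    for _, concept, semantic_type in rows:
        groups[semantic_type].add(concept.lower())
    return dict(groups)


ranking = rank_ontologies(annotations)
model = group_by_semantic_type(annotations)
```

Grouping by semantic type gives one possible starting point for a formal model: each type becomes a candidate class and the concepts under it become candidate members, whichever ontology supplied them.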

Results

We identified NCIT, SNOMED-CT, NCI caDSR Value Sets, LOINC and PathLex as the ontologies providing the highest number of annotated concepts. Table 1 shows, as percentages, the coverage of the concepts of each note by the annotations of these 5 reference ontologies. Percentages for a single note can add up to more than 100 because the coverage of different ontologies may overlap. Table 2 uses the same format, counting only concepts from the gold standard. From the list of extracted concepts, we built a preliminary formal representation of the histopathological knowledge based on the UMLS semantic types of the concepts. Figure 1 shows the proposed semantic model in the context of tubular differentiation. The novelty of this approach is the federation of knowledge from different sources (CAP CC&P, NCBO ontologies and UMLS Metathesaurus) and the sustainable management of the associated semantics. This opens the perspective of building an AP observation ontology that will allow an accurate representation of AP reports understandable by both humans and software applications.
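The remark that per-ontology percentages for one note can sum to more than 100 can be illustrated with a small calculation. The sketch below assumes coverage is the share of a note's concepts annotated by an ontology; the concept sets and the `coverage_percent` helper are hypothetical, and the numbers are not those reported in Table 1 or Table 2.

```python
# Hypothetical concepts found in one note, and the subset each ontology annotated.
# Values are illustrative only; they are not the figures reported in the tables.
note_concepts = {
    "tubule formation",
    "glandular differentiation",
    "nuclear grade",
    "mitotic rate",
}
annotated = {
    "NCIT": {"tubule formation", "nuclear grade", "mitotic rate"},
    "SNOMEDCT": {"tubule formation", "glandular differentiation", "nuclear grade"},
}


def coverage_percent(covered, reference):
    """Percentage of reference concepts that appear in the covered set."""
    return 100.0 * len(covered & reference) / len(reference)


# Coverage computed separately for each ontology.
per_ontology = {
    name: coverage_percent(concepts, note_concepts)
    for name, concepts in annotated.items()
}

# Coverage of the union of all ontologies, which can never exceed 100.
combined = coverage_percent(set().union(*annotated.values()), note_concepts)

# Concepts annotated by both ontologies are counted once per ontology,
# so the per-ontology percentages may sum to more than 100.
total_of_percentages = sum(per_ontology.values())
```

Passing the gold-standard concepts as `reference` instead of all note concepts gives the restricted variant of the same measure.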

References

[1] College of American Pathologists, Cancer Protocols and Checklists, 2013: DCIS – Breast, Revised: December 18, 2013, Version: 3.2.0.0; Invasive Breast, Posted: December 18, 2013, Version: 3.2.0.0, available from: http://www.cap.org/

[2] Daniel C., Booker D., Beckwith B., Della Mea V., García-Rojo M., Havener L., Kennedy M., Klossa J., Laurinavicius A., Macary F.,
Punys V., Scharber W., Schrader T., Standards and specifications in pathology: image management, report management and terminology,
Stud Health Technol Inf. 2012, 179: 105–122.

[3] Haroske G., Schrader T., A reference model based interface terminology for generic observations in Anatomic Pathology Structured
Reports, Diagnostic Pathology 2014, 9 (1):S4.

[4] Bodenreider O., Biomedical ontologies in action: role in knowledge management, data integration and decision support, Yearb Med
Inform. 2008:67-79.

[5] Rubin D.L., Shah N.H., Noy N.F., Biomedical ontologies: a functional perspective, Briefings in Bioinformatics 2008, 9(1): 75-90.

[6] Musen M.A., Noy N.F., Shah N.H., Whetzel P.L., Chute C.G., Story M.A., Smith B.; NCBO team, The National Center for Biomedical
Ontology, J Am Med Inform Assoc. 2012, 19(2):190-5. Epub 2011 Nov 10.

[7] Whetzel P.L., Noy N.F., Shah N.H., Alexander P.R., Nyulas C., Tudorache T., Musen M.A., Bioportal: enhanced functionality via new
Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications, Nucleic Acids Res.
2011, 39(Web Server issue):W541-5. Epub 2011 Jun 14.

[8] Bodenreider O., The Unified Medical Language System (UMLS): integrating biomedical terminology [Internet], [Accessed: 17-Dec-2015], Available from: http://nar.oxfordjournals.org

[9] Jonquet C., Shah N.H., Musen M.A., The open biomedical annotator, Summit on Translat Bioinforma. 2009, 1:56-60.

How to Cite

TRAORE, L. et al. Sustainable Formal Representation Of Breast Cancer Grading Histopathological Knowledge. Diagnostic Pathology, [S.l.], v. 1, n. 8, june 2016. ISSN 2364-4893. Available at: <http://www.diagnosticpathology.eu/content/index.php/dpath/article/view/154>. Date accessed: 26 may 2019. doi: https://doi.org/10.17629/www.diagnosticpathology.eu-2016-8:154.