Sustainable Formal Representation Of Breast Cancer Grading Histopathological Knowledge
Abstract
Introduction/Background
Recently, histopathology has seen the introduction of several tools, such as slide scanners and virtual slide technologies, creating the conditions for broader adoption of computer-aided diagnosis based on whole slide images (WSI) to reduce observer variability between pathologists. This change raises a number of new scientific challenges, such as the sustainable management of the semantics associated with the grading process, image analysis and annotation, in order to facilitate pre-filled report generation. The College of American Pathologists cancer checklists and protocols (CAP-CC&P) [1] are reference resources for complete Anatomic Pathology (AP) reporting of malignant tumors. Current terminology systems for AP structured reporting gather terms of very different granularity [2][3] and have not yet been compiled in a systematic way. Semantic data models are formal representations of knowledge in a given domain that allow both human users and software applications to interpret domain terminology consistently and accurately [4][5].
Aims
Our objective is i) to analyze the histopathological knowledge for breast cancer grading available in the reference CAP-CC&P and ii) to build a sustainable formal representation of this knowledge based on existing biomedical ontologies in NCBO BioPortal [6][7] and UMLS semantic types [8].
Methods
Our methodology was first tested on two cancer grading methods, one for invasive breast carcinoma (Nottingham system) and one for ductal carcinoma in situ of the breast. A corpus of 5 texts, or “notes”, was first selected by an AP expert from the two corresponding CAP-CC&Ps. From each note, the expert also extracted a list of key concepts to be used as a “gold standard”. We used NCBO Annotator [9] for automatic analysis of the corpus. Annotator supports the biomedical community in tagging raw texts automatically with concepts from relevant biomedical ontology and terminology repositories. The methodology consists of:
i) Automatic textual analysis and annotation of the corpus based on the 417 ontologies available on the NCBO platform. We selected a subset of ontologies based on the number of identified concepts and evaluated their relevance with respect to the gold standard.
ii) Semantic modeling of the automatically extracted concepts into a sustainable formal representation based on their UMLS semantic types.
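The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the annotation records, the ontology acronyms attached to them, and the semantic type assigned to each concept are hypothetical toy data, and the helper names `rank_ontologies` and `group_by_semantic_type` are introduced here for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical annotation records standing in for annotator output:
# (concept label, ontology acronym, UMLS semantic type). Toy data only.
ANNOTATIONS = [
    ("carcinoma", "NCIT", "Neoplastic Process"),
    ("carcinoma", "SNOMEDCT", "Neoplastic Process"),
    ("tubule", "NCIT", "Body Part, Organ, or Organ Component"),
    ("mitotic count", "NCIT", "Finding"),
    ("score", "LOINC", "Quantitative Concept"),
    ("grade", "SNOMEDCT", "Qualitative Concept"),
]


def rank_ontologies(annotations):
    """Step i: rank ontologies by the number of distinct concepts they annotate."""
    concepts = defaultdict(set)
    for label, ontology, _ in annotations:
        concepts[ontology].add(label)
    counts = Counter({onto: len(labels) for onto, labels in concepts.items()})
    return counts.most_common()


def group_by_semantic_type(annotations):
    """Step ii: group distinct concept labels under their semantic type."""
    groups = defaultdict(set)
    for label, _, sem_type in annotations:
        groups[sem_type].add(label)
    return {sem_type: sorted(labels) for sem_type, labels in groups.items()}


ranking = rank_ontologies(ANNOTATIONS)
model = group_by_semantic_type(ANNOTATIONS)
```

Here `ranking` lists the ontologies from most to fewest distinct concepts, and `model` maps each semantic type to the concepts placed under it, which is the kind of grouping a formal representation could start from.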
Results
We identified NCIT, SNOMED-CT, NCI CaDSR Values set, LOINC and PathLex as the ontologies providing the highest number of annotated concepts. Table 1 shows, as percentages, the coverage of the concepts of each note by the annotations of the 5 reference ontologies. Percentages can add up to more than 100 for a single note because the coverage of different ontologies can overlap. Table 2 uses the same format, counting only concepts from the gold standard. From the list of extracted concepts, we built a preliminary formal representation of the histopathological knowledge based on the UMLS semantic types of the concepts. Figure 1 shows the proposed semantic modeling in the context of tubular differentiation. The novelty of this approach is the federation of knowledge from different sources (CAP-CC&P, NCBO ontologies and UMLS Metathesaurus) and the sustainable management of the associated semantics. This opens the perspective of building an AP observation ontology that will allow an accurate representation of AP reports understandable by both human users and software applications.
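To make the coverage figures concrete, the sketch below computes per-ontology coverage percentages for a single note and shows why they can sum to more than 100. It assumes that coverage means the share of a note's reference concepts that an ontology annotates; the exact definition used for Tables 1 and 2 is not given in the text. The concept sets are hypothetical toy values, not data from the tables, and `coverage_percent` is an illustrative helper, not part of any NCBO tool.

```python
def coverage_percent(annotated, reference):
    """Percentage of reference concepts that appear in the annotated set."""
    if not reference:
        return 0.0
    return 100.0 * len(annotated & reference) / len(reference)


# Hypothetical concepts found in one note (toy data).
note_concepts = {"carcinoma", "tubule", "gland", "grade"}
# Hypothetical expert-selected key concepts for the same note.
gold_standard = {"carcinoma", "tubule"}

# Hypothetical concepts annotated by each ontology in that note.
annotated_by = {
    "NCIT": {"carcinoma", "tubule", "grade"},
    "SNOMEDCT": {"carcinoma", "gland"},
}

# Table 1 style: all concepts of the note are counted.
all_cov = {o: coverage_percent(c, note_concepts) for o, c in annotated_by.items()}
# Table 2 style: only gold standard concepts are counted.
gold_cov = {o: coverage_percent(c, gold_standard) for o, c in annotated_by.items()}

# "carcinoma" is annotated by both ontologies, so the sum exceeds 100.
total = sum(all_cov.values())
```

Because the same concept can be annotated by several ontologies, each ontology's percentage is computed independently against the same reference set, so the per-note sum is not bounded by 100.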
References
[1] College of American Pathologists cancer checklists and protocols (CAP-CC&P), Invasive Breast, Version 3.2.0.0, posted December 18, 2013, available from: http://www.cap.org/
[2] Daniel C., Booker D., Beckwith B., Della Mea V., García-Rojo M., Havener L., Kennedy M., Klossa J., Laurinavicius A., Macary F.,
Punys V., Scharber W., Schrader T., Standards and specifications in pathology: image management, report management and terminology,
Stud Health Technol Inf. 2012, 179: 105–122.
[3] Haroske G., Schrader T., A reference model based interface terminology for generic observations in Anatomic Pathology Structured
Reports, Diagnostic Pathology 2014, 9 (1):S4.
[4] Bodenreider O., Biomedical ontologies in action: role in knowledge management, data integration and decision support, Yearb Med
Inform. 2008:67-79.
[5] Rubin D.L., Shah N.H., Noy N.F., Biomedical ontologies: a functional perspective, Briefings in Bioinformatics 2008, 9(1): 75-90.
[6] Musen M.A., Noy N.F., Shah N.H., Whetzel P.L., Chute C.G., Story M.A., Smith B.; NCBO team, The National Center for Biomedical
Ontology, J Am Med Inform Assoc. 2012, 19(2):190-5. Epub 2011 Nov 10.
[7] Whetzel P.L., Noy N.F., Shah N.H., Alexander P.R., Nyulas C., Tudorache T., Musen M.A., Bioportal: enhanced functionality via new
Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications, Nucleic Acids Res.
2011, 39(Web Server issue):W541-5. Epub 2011 Jun 14
[8] Bodenreider O., The Unified Medical Language System (UMLS): integrating biomedical terminology [Internet], [Accessed: 17-Dec-2015],
Available from: http://nar.oxfordjournals.org
[9] Jonquet C., Shah N.H., Musen M.A., The open biomedical annotator, Summit on Translat Bioinforma. 2009, 1:56-60

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.