Application Of Ki-67 Analysis In A Distributed Computing Infrastructure
Abstract
Introduction/Background
Over the last few years, the protein Ki-67 [1] has been established as one of the most important biomarkers for cell proliferation in breast cancer. High Ki-67 values indicate high tumor growth and have a direct impact on the patient’s treatment. Several automated image analysis methods for identifying Ki-67-positive and Ki-67-negative tumor cells have been presented.
Aims
For small regions of a virtual slide, the Ki-67 analysis can be completed within an acceptable period of time. However, most current methods are not yet fast enough to analyse an entire whole slide image (WSI [2]). On a typical office computer, processing 3,752 tiles extracted from an H-DAB stained WSI took more than 24 hours. We therefore propose an approach that significantly speeds up the analysis of entire WSIs by using a distributed computing infrastructure.
Methods
To evaluate the approach, an unmodified and validated [3] [4] analysis software for Ki-67 was deployed on a six-node setup supporting two different software engines: Hadoop Streaming [5] and Apache Spark [6]. Both engines support the MapReduce methodology; Apache Spark additionally offers alternative programming models. In addition, heat maps visualizing the Ki-67 scores for an entire slide were generated, which can provide additional information for clinical research.
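The tile-wise processing described above can be illustrated with a minimal Python sketch. It is not the software used in this work. The tile layout `(row, column, path)`, the function names `map_tile` and `build_heat_map`, and the stand-in `fake_analyse` are assumptions made for illustration, and the score is assumed to be the fraction of Ki-67-positive cells among all counted tumor cells in a tile.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Tile = Tuple[int, int, str]   # (row, column, tile image path): hypothetical layout
Counts = Tuple[int, int]      # (Ki-67-positive cells, Ki-67-negative cells)
Scored = Tuple[int, int, Optional[float]]


def map_tile(tile: Tile, analyse: Callable[[str], Counts]) -> Scored:
    """Map step: score one tile independently of all other tiles."""
    row, col, path = tile
    positive, negative = analyse(path)
    total = positive + negative
    # Tiles without any detected tumor cells get no score.
    score = positive / total if total > 0 else None
    return row, col, score


def build_heat_map(scored: Iterable[Scored]) -> List[List[Optional[float]]]:
    """Reduce step: place the per-tile scores on the slide grid."""
    items = list(scored)
    if not items:
        return []
    rows = max(r for r, _, _ in items) + 1
    cols = max(c for _, c, _ in items) + 1
    grid: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for r, c, s in items:
        grid[r][c] = s
    return grid


def fake_analyse(path: str) -> Counts:
    """Stand-in for the real analysis software (illustrative counts only)."""
    table = {
        "t00.png": (30, 70),
        "t01.png": (0, 0),
        "t10.png": (5, 15),
        "t11.png": (50, 50),
    }
    return table[path]


tiles: List[Tile] = [
    (0, 0, "t00.png"), (0, 1, "t01.png"),
    (1, 0, "t10.png"), (1, 1, "t11.png"),
]
heat_map = build_heat_map(map_tile(t, fake_analyse) for t in tiles)
for line in heat_map:
    print(line)
```

Because each tile is scored without reference to any other tile, the map step is the part that a distributed engine can spread across nodes. Only the small per-tile results need to be gathered to assemble the heat map.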
Results
First results from automated and reproducible tests are available. When processing 3,752 tiles, the speedup increased linearly with the number of tiles. The overall processing time improved by roughly a factor of 10, from 28 hours on a typical office computer to three hours in the distributed environment. Further optimization strategies besides WSI partitioning will be considered. To achieve additional improvements in processing speed, the underlying algorithm of the Ki-67 analysis can be examined with a focus on how to adapt it to distributed processing workflows.
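The reported figures can be checked with a short calculation. The sketch below uses only the numbers stated above (3,752 tiles, 28 hours, three hours). The function names are illustrative, and no per-node efficiency is derived because the hardware details of the nodes are not given here.

```python
TILES = 3752
SINGLE_MACHINE_HOURS = 28.0   # typical office computer
DISTRIBUTED_HOURS = 3.0       # distributed environment


def speedup(baseline_hours: float, improved_hours: float) -> float:
    """Ratio of baseline run time to improved run time."""
    if improved_hours <= 0:
        raise ValueError("improved_hours must be positive")
    return baseline_hours / improved_hours


def seconds_per_tile(hours: float, tiles: int) -> float:
    """Average wall-clock time spent per tile."""
    if tiles <= 0:
        raise ValueError("tiles must be positive")
    return hours * 3600.0 / tiles


factor = speedup(SINGLE_MACHINE_HOURS, DISTRIBUTED_HOURS)
single = seconds_per_tile(SINGLE_MACHINE_HOURS, TILES)
distributed = seconds_per_tile(DISTRIBUTED_HOURS, TILES)

print(f"speedup: {factor:.1f}x")
print(f"single machine: {single:.1f} s per tile")
print(f"distributed: {distributed:.1f} s per tile")
```

The ratio comes out slightly below 10, which is consistent with the rounded factor quoted in the text.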
References
[2] F Ghaznavi, A Evans, A Madabhushi, and M Feldman, (2013), Digital imaging in pathology: whole-slide imaging and beyond, Annual Review of Pathology: Mechanisms of Disease, Vol. 8: 331-359, https://dx.doi.org/10.1146/annurev-pathol-011811-120902
[3] F Klauschen, (2015), Standardized Ki67 Diagnostics Using Automated Scoring – Clinical Validation in the GeparTrio Breast Cancer Study, Clinical Cancer Research, https://dx.doi.org/10.1158/1078-0432.CCR-14-1283
[4] S Wienert, D Heim, K Saeger, A Stenzinger, M Beil, P Hufnagl, M Dietel, C Denkert and F Klauschen, (2012), Detection and segmentation of cell nuclei in virtual microscopy images: a minimum-model approach, Scientific Reports, https://dx.doi.org/10.1038/srep00503
[5] Apache Software Foundation, (2016), Hadoop Streaming allows to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer, https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html
[6] M Zaharia, M Chowdhury, T Das, A Dave, J Ma, M McCauley, M J Franklin, S Shenker, I Stoica, (2012), Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, https://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.