Application Of Ki-67 Analysis In A Distributed Computing Infrastructure

  • Marco Strutz University of Applied Sciences, HTW, Berlin, Germany
  • B. Lindequist University of Applied Sciences, HTW, Berlin, Germany
  • M. Witt University of Applied Sciences, HTW, Berlin, Germany
  • H. Heßling University of Applied Sciences, HTW, Berlin, Germany
  • P. Hufnagl Charité - Universitätsmedizin Berlin, Institute of Pathology, Berlin, Germany
  • D. Krefting University of Applied Sciences, HTW, Berlin, Germany


Introduction/Background

Over the last few years, the protein Ki-67 [1] has been established as one of the most important biomarkers for cell proliferation in breast cancer. High Ki-67 values indicate high tumor growth and have a direct impact on the patient’s treatment. Several automated image analysis methods for identifying Ki-67-positive and Ki-67-negative tumor cells have been presented.


For small regions of a virtual slide, the Ki-67 analysis can be completed within an acceptable period of time. However, most current methods are not yet fast enough to analyse an entire whole slide image (WSI [2]). On a typical office computer, processing 3,752 tiles extracted from an H-DAB stained WSI took more than 24 hours. Therefore, we propose an approach that significantly speeds up the analysis of entire WSIs by using a distributed computing infrastructure.


To evaluate the approach, an unmodified and validated [3] [4] analysis software for Ki-67 was deployed on a six-node setup supporting two different software engines: Hadoop Streaming [5] and Apache Spark [6]. Both tools support the MapReduce methodology; Apache Spark additionally offers alternative programming models. In addition, heat maps visualizing the Ki-67 scores for an entire slide were generated, which can provide additional information for clinical research.
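The tile-wise MapReduce idea can be illustrated with a minimal, self-contained Python sketch. It is not the software used in this work: the function names, the tile record layout, and the definition of the score as the fraction of positive cells are assumptions made for illustration, and `analyse_tile` is only a placeholder for the external Ki-67 analysis. Under Hadoop Streaming, the map and reduce steps would typically be separate executables that read from standard input and write to standard output; here they are plain functions so that the sketch runs on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tile record: (row, col, positive_cells, negative_cells).
Tile = Tuple[int, int, int, int]
Key = Tuple[int, int]


def analyse_tile(tile: Tile) -> Tuple[int, int]:
    """Placeholder for the external Ki-67 analysis (assumption).

    A real analysis would count cells in the tile image; here the counts
    are simply read from the tile record.
    """
    _, _, positive, negative = tile
    return positive, negative


def mapper(tiles: Iterable[Tile]) -> Iterator[Tuple[Key, Tuple[int, int]]]:
    """Map step: emit ((row, col), (positive, negative)) for every tile."""
    for tile in tiles:
        yield (tile[0], tile[1]), analyse_tile(tile)


def reducer(pairs: Iterable[Tuple[Key, Tuple[int, int]]]) -> Dict[Key, float]:
    """Reduce step: sum the counts per tile position and derive a score.

    The score is assumed to be positive / (positive + negative).
    Positions without any cells get NaN.
    """
    totals: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])
    for key, (positive, negative) in pairs:
        totals[key][0] += positive
        totals[key][1] += negative
    scores: Dict[Key, float] = {}
    for key, (positive, negative) in totals.items():
        cells = positive + negative
        scores[key] = positive / cells if cells else float("nan")
    return scores


def heat_map(scores: Dict[Key, float], rows: int, cols: int) -> List[List[float]]:
    """Arrange per-tile scores on a rows x cols grid; missing tiles stay NaN."""
    grid = [[float("nan")] * cols for _ in range(rows)]
    for (row, col), score in scores.items():
        grid[row][col] = score
    return grid


if __name__ == "__main__":
    demo_tiles = [(0, 0, 10, 30), (0, 1, 5, 5), (1, 0, 0, 0)]
    for line in heat_map(reducer(mapper(demo_tiles)), rows=2, cols=2):
        print(line)
```

Because each tile is scored independently, the map step can be spread across nodes without coordination; only the reduce step has to see all results to assemble the slide-wide grid.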


First results from automated and reproducible tests have been produced. When processing the 3,752 tiles, the speedup turned out to increase linearly with the number of tiles. The overall processing time improved by roughly a factor of 10 (28 h / 3 h ≈ 9.3), from 28 hours on a typical office computer to three hours in the distributed environment. Further optimization strategies besides WSI partitioning will be considered. To improve processing speed further, the underlying Ki-67 analysis algorithm can be examined with a focus on adapting it to distributed processing workflows.
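The reported figures can be checked with a short calculation. The sketch below uses only the numbers stated above (3,752 tiles, 28 hours, three hours); the function and constant names are illustrative, and the per-tile times are averages derived from the totals, not measured values.

```python
def speedup(t_single_h: float, t_distributed_h: float) -> float:
    """Ratio of single-machine runtime to distributed runtime."""
    return t_single_h / t_distributed_h


def seconds_per_tile(total_hours: float, tiles: int) -> float:
    """Average wall-clock time per tile, in seconds."""
    return total_hours * 3600.0 / tiles


TILES = 3752          # tiles extracted from the WSI
T_OFFICE_H = 28.0     # typical office computer
T_CLUSTER_H = 3.0     # distributed environment

if __name__ == "__main__":
    print(f"speedup: {speedup(T_OFFICE_H, T_CLUSTER_H):.2f}")
    print(f"office:  {seconds_per_tile(T_OFFICE_H, TILES):.1f} s per tile")
    print(f"cluster: {seconds_per_tile(T_CLUSTER_H, TILES):.2f} s per tile")
```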


[1] D G Booth, M Takagi, L Sanchez-Pulido, E Petfalski, G Vargiu, K Samejima, N Imamoto, C P Ponting, D Tollervey, W C Earnshaw, P Vagnarelli, (2014), Ki-67 is a PP1-interacting protein that organises the mitotic chromosome periphery, eLife 2014,

[2] F Ghaznavi, A Evans, A Madabhushi, and M Feldman, (2013), Digital imaging in pathology: whole-slide imaging and beyond, Annual Review of Pathology: Mechanisms of Disease, Vol. 8: 331-359, https://dx.doi.org/10.1146/annurev-pathol-011811-120902

[3] F Klauschen, (2015), Standardized Ki67 Diagnostics Using Automated Scoring – Clinical Validation in the GeparTrio Breast Cancer Study, Clinical Cancer Research,

[4] S Wienert, D Heim, K Saeger, A Stenzinger, M Beil, P Hufnagl, M Dietel, C Denkert and F Klauschen, (2012), Detection and segmentation of cell nuclei in virtual microscopy images: a minimum-model approach, Scientific Reports,

[5] Apache Software Foundation, (2016), Hadoop Streaming allows to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer., hadoop-streaming/HadoopStreaming.html

[6] M Zaharia, M Chowdhury, T Das, A Dave, J Ma, M McCauley, M J Franklin, S Shenker, I Stoica, (2012), Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, matei/papers/2012/nsdi_spark.pdf
How to Cite
STRUTZ, Marco et al. Application Of Ki-67 Analysis In A Distributed Computing Infrastructure. Diagnostic Pathology, [S.l.], v. 1, n. 8, June 2016. ISSN 2364-4893. Date accessed: 26 May 2019.