Why Is It Not Solved Yet? Challenges for Production-Ready Autoscaling. Straesser, Martin; Grohmann, Johannes; von Kistowski, Jóakim; Eismann, Simon; Bauer, André; Kounev, Samuel; in Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering (2022). 105–115. Association for Computing Machinery, New York, NY, USA.
Autoscaling is a task of major importance in the cloud computing domain, as it directly affects both operating costs and customer experience. Although there has been active research in this area for over ten years, there is still a significant gap between the methods proposed in the literature and the autoscalers deployed in practice; many research autoscalers never find their way into production deployments. This paper describes six core challenges that arise in production systems and are still not solved by most research autoscalers. We illustrate these problems through experiments in a realistic cloud environment with a real-world multi-service business application and show that commonly used autoscalers have various shortcomings. In addition, we analyze the behavior of overloaded services and show that it can be problematic for existing autoscalers. Overall, our analysis shows that these challenges are insufficiently addressed in the literature, and we conclude that future scaling approaches should focus on the needs of production systems.
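To make the gap concrete: most production autoscalers follow a reactive, threshold-style rule of the kind the paper critiques. The sketch below shows such a rule in Python, modeled on the proportional scaling formula popularized by the Kubernetes Horizontal Pod Autoscaler; the thresholds, bounds, and metric source are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of a reactive, threshold-based autoscaler of the kind the
# paper critiques. Thresholds, names, and the metric source are illustrative
# assumptions, not the paper's implementation.
import math

def reactive_scale(current_replicas: int, avg_cpu_utilization: float,
                   target_utilization: float = 0.6,
                   min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Kubernetes-HPA-style rule: scale proportionally to the ratio of
    observed to target utilization."""
    desired = math.ceil(current_replicas * avg_cpu_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# Example: 4 replicas at 90% CPU with a 60% target -> 6 replicas.
print(reactive_scale(4, 0.9))
```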
ComBench: A Benchmarking Framework for Publish/Subscribe Communication Protocols Under Network Limitations. Herrnleben, Stefan; Leidinger, Maximilian; Lesch, Veronika; Prantl, Thomas; Grohmann, Johannes; Krupitzer, Christian; Kounev, Samuel; in Performance Evaluation Methodologies and Tools, Q. Zhao, L. Xia (eds.) (2021). 72–92. Springer International Publishing, Cham.
Efficient and dependable communication is a highly relevant aspect of Internet of Things (IoT) systems, in which tiny sensors, actuators, wearables, or other smart devices exchange messages. Various publish/subscribe protocols address the challenges of communication in IoT systems. The selection process for a suitable protocol should consider the communication behavior of the application, the protocol's performance, the resource requirements on the end device, and the network connection quality, as IoT environments often rely on wireless networks. Benchmarking is a common approach to evaluate and compare systems with respect to performance as well as aspects like dependability and security. In this paper, we present ComBench, our IoT communication benchmarking framework for publish/subscribe protocols, focusing on constrained networks with varying quality conditions. The benchmarking framework supports system designers, software engineers, and application developers in selecting and investigating the behavior of communication protocols. It can be used to (i) show the impact of fluctuating network quality on communication, (ii) compare multiple protocols, protocol features, and protocol implementations, and (iii) analyze the scalability, robustness, and dependability of clients, networks, and brokers in different scenarios. Our case study demonstrates the applicability of our framework for supporting the choice of the best-suited protocol in various scenarios.
Sizeless: Predicting the Optimal Size of Serverless Functions. Eismann, Simon; Bui, Long; Grohmann, Johannes; Abad, Cristina; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 22nd International MIDDLEWARE Conference (2021). 248–259.
Best Student Paper Award, ACM Artifacts Evaluated — Functional
Serverless functions are an emerging cloud computing paradigm that is being rapidly adopted by both industry and academia. In this cloud computing model, the provider opaquely handles resource management tasks such as resource provisioning, deployment, and auto-scaling. The only resource management task that developers are still in charge of is selecting how many resources are allocated to each worker instance. However, selecting the optimal size of serverless functions is quite challenging, so developers often neglect it despite its significant cost and performance benefits. Existing approaches aiming to automate serverless function resource sizing require dedicated performance tests, which are time-consuming to implement and maintain. In this paper, we introduce an approach to predict the optimal resource size of a serverless function using monitoring data from a single resource size. As our approach does not require dedicated performance tests, it enables cloud providers to implement resource sizing on a platform level and automate the last resource management task associated with serverless functions. We evaluate our approach on four different serverless applications on AWS, where it predicts the execution time of the other memory sizes based on monitoring data for a single memory size with an average prediction error of 15.3%. Based on these predictions, it selects the optimal memory size for 79.0% of the serverless functions and the second-best memory size for 12.3% of the serverless functions, which results in an average speedup of 39.7% while also decreasing average costs by 2.6%.
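For intuition, the cost trade-off behind function sizing can be sketched as follows: AWS Lambda bills duration in GB-seconds plus a per-request fee, so a larger memory size that shortens execution can be cheaper overall. The memory sizes and predicted execution times below are invented placeholders for the predictions the paper derives from monitoring data.

```python
# Illustrative cost/performance trade-off underlying serverless resource
# sizing. Timings are made-up assumptions; prices approximate AWS Lambda's
# GB-second plus per-request billing model.
PRICE_PER_GB_SECOND = 0.0000166667  # approximate AWS Lambda rate (USD)
PRICE_PER_REQUEST = 0.0000002

def invocation_cost(memory_mb: int, duration_s: float) -> float:
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# Hypothetical execution times predicted from monitoring data at one size,
# in the spirit of the paper's approach (values are invented):
predicted = {512: 2.0, 1024: 0.9, 2048: 0.55, 3008: 0.50}

costs = {mb: invocation_cost(mb, t) for mb, t in predicted.items()}
optimal = min(costs, key=costs.get)
print(costs, "->", optimal)  # here the faster 1024 MB size is also cheapest
```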
A Predictive Maintenance Methodology: Predicting the Time-to-Failure of Machines in Industry 4.0. Züfle, Marwin; Agne, Joachim; Grohmann, Johannes; Dörtoluk, Ibrahim; Kounev, Samuel; in Proceedings of the 21st IEEE IES International Conference on Industrial Informatics (2021). IEEE.
Predictive maintenance is an essential aspect of the concept of Industry 4.0. In contrast to previous maintenance strategies, which plan repairs based on periodic schedules or threshold values, predictive maintenance is normally based on estimating the time-to-failure of machines. Thus, predictive maintenance enables a more efficient and effective maintenance approach. Although much research has already been done on time-to-failure prediction, most existing works provide only specialized approaches for specific machines. In most cases, these are either rotary machines (i.e., bearings) or lithium-ion batteries. To bridge the gap to a more general time-to-failure prediction, we propose a generic end-to-end predictive maintenance methodology for the time-to-failure prediction of industrial machines. Our methodology exhibits a number of novel aspects including a universally applicable method for feature extraction based on different types of sensor data, well-known feature transformation and selection techniques, adjustable target class assignment based on fault records with three different labeling strategies, and the training of multiple state-of-the-art machine learning classification models including hyperparameter optimization. We evaluated our time-to-failure prediction methodology in a real-world case study consisting of monitoring data gathered over several years from a large industrial press. The results demonstrated the effectiveness of the proposed methodology for six different time-to-failure prediction windows, as well as for the downscaled binary prediction of impending failures. In this case study, the multi-class feed-forward neural network model achieved the overall best results.
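As a rough illustration of such a pipeline (not the paper's implementation, which includes multiple model types and hyperparameter optimization), the sketch below windows a sensor signal into statistical features and trains a classifier on hypothetical time-to-failure classes; all data and labels are synthetic.

```python
# Sketch of a time-to-failure classification pipeline: window sensor data
# into features, assign each window a time-to-failure class, train a model.
# Features, window size, and labels are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
signal = rng.normal(size=5000)             # stand-in for one sensor channel
window = 50
features = np.array([
    [w.mean(), w.std(), w.min(), w.max()]  # simple per-window statistics
    for w in np.split(signal, len(signal) // window)
])
# Hypothetical labels: which time-to-failure window each sample falls into,
# e.g. 0 = ">7 days", 1 = "1-7 days", 2 = "<1 day".
labels = rng.integers(0, 3, size=len(features))

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```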
SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications. Grohmann, Johannes; Straesser, Martin; Chalbani, Avi; Eismann, Simon; Arian, Yair; Herbst, Nikolas; Peretz, Noam; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
Acceptance Rate: 29%
Application performance management (APM) tools are useful to observe the performance properties of an application during production. However, APM is normally purely reactive, that is, it can only report current or past performance degradation. Although some approaches capable of predictive application monitoring have been proposed, they can only report a predicted degradation but cannot explain its root cause, making it hard to prevent the expected degradation. In this paper, we present SuanMing, a framework for predicting performance degradation of microservice applications running in cloud environments. SuanMing predicts future root causes for anticipated performance degradations and therefore aims at preventing performance degradations before they actually occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and we show that our approach is able to predict and pinpoint performance degradations with an accuracy of over 90%.
Libra: A Benchmark for Time Series Forecasting Methods. Bauer, André; Züfle, Marwin; Eismann, Simon; Grohmann, Johannes; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
In many areas of decision making, forecasting is an essential pillar. Consequently, there are many different forecasting methods. According to the "No-Free-Lunch Theorem", there is no single forecasting method that performs best for all time series. In other words, each method has its advantages and disadvantages, depending on the specific use case. Therefore, choosing a forecasting method for a given use case remains an expert task that cannot be fully automated. To establish a level playing field for evaluating the performance of time series forecasting methods in a broad setting, we propose Libra, a forecasting benchmark that automatically evaluates and ranks forecasting methods based on their performance in a diverse set of evaluation scenarios. The benchmark comprises four different use cases, each covering 100 heterogeneous time series taken from different domains. The data set was assembled from publicly available time series and was designed to exhibit much higher diversity than existing forecasting competitions. Based on this benchmark, we perform a comprehensive evaluation to compare different existing time series forecasting methods.
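The core benchmark mechanic, scoring every method on every series with a common error measure and ranking by the aggregate, can be sketched as follows; the toy series, baseline methods, and the choice of sMAPE are assumptions for illustration, not Libra's actual measures or data.

```python
# Minimal sketch of benchmark-style ranking of forecasting methods: score
# each method on each series with a common error measure and rank by the
# average. Methods and data are toy stand-ins, not Libra itself.
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error."""
    return 100 * np.mean(2 * np.abs(forecast - actual) /
                         (np.abs(actual) + np.abs(forecast)))

rng = np.random.default_rng(1)
series = [np.cumsum(rng.normal(size=120)) + 50 for _ in range(5)]

methods = {
    "naive": lambda train, h: np.repeat(train[-1], h),
    "mean":  lambda train, h: np.repeat(train.mean(), h),
    "drift": lambda train, h: train[-1] + (train[-1] - train[0]) /
                              (len(train) - 1) * np.arange(1, h + 1),
}

h = 20  # forecast horizon: hold out the last 20 points of each series
scores = {name: np.mean([smape(s[-h:], f(s[:-h], h)) for s in series])
          for name, f in methods.items()}
for rank, (name, err) in enumerate(sorted(scores.items(), key=lambda kv: kv[1]), 1):
    print(rank, name, round(err, 2))
```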
A Simulation-based Optimization Framework for Online Adaptation of Networks. Herrnleben, Stefan; Grohmann, Johannes; Rygielski, Piotr; Lesch, Veronika; Krupitzer, Christian; Kounev, Samuel; in Proceedings of the 12th EAI International Conference on Simulation Tools and Techniques (SIMUtools), H. Song, D. Jiang (eds.) (2021). 513–532. Springer International Publishing, Cham.
Today's data centers face a rapid change of deployed services, growing complexity, and increasing performance requirements. Customers expect not only round-the-clock availability of the hosted services but also high responsiveness. Besides optimizing software architectures and deployments, networks have to be adapted to handle the changing and volatile demands. Approaches from self-adaptive systems can be used for optimizing data center networks to continuously meet Service Level Agreements (SLAs) between data center operators and customers. However, existing approaches focus only on specific objectives like topology design, power optimization, or traffic engineering. In this paper, we present an extensible framework that analyzes networks using different types of simulation and adapts them subject to multiple objectives using various adaptation techniques. Analyzing each suggested adaptation ensures that performance requirements and SLAs are continuously met. We evaluate our framework w.r.t. (i) general requirements and assessments of languages and frameworks for adaptation models, (ii) finding Pareto-optimal solutions considering a multi-dimensional cost model, and (iii) scalability. The evaluation shows that our approach detects the bottlenecks and the violated SLAs correctly, outputs valid and cost-optimal adaptations, and keeps the runtime for the adaptation process constant even with increasing network size and an increasing number of alternative configurations.
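One building block mentioned above, finding Pareto-optimal adaptations under a multi-dimensional cost model, can be illustrated with a minimal dominance filter; the candidate configurations and cost dimensions below are invented.

```python
# Sketch of selecting Pareto-optimal network adaptations under a
# multi-dimensional cost model (e.g., monetary cost vs. predicted latency).
# Candidate tuples are invented; lower is better in every dimension.
def pareto_front(candidates):
    """Return candidates not dominated by any other candidate."""
    front = []
    for c in candidates:
        dominated = any(all(o <= c_ for o, c_ in zip(other, c)) and other != c
                        for other in candidates)
        if not dominated:
            front.append(c)
    return front

# (cost in EUR, predicted latency in ms) for alternative configurations
adaptations = [(100, 30), (80, 45), (120, 25), (80, 50), (90, 45)]
print(pareto_front(adaptations))  # [(100, 30), (80, 45), (120, 25)]
```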
Predicting the Costs of Serverless Workflows. Eismann, Simon; Grohmann, Johannes; van Eyk, Erwin; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 2020 ACM/SPEC International Conference on Performance Engineering (ICPE) (2020). 265–276. Association for Computing Machinery (ACM), New York, NY, USA.
Acceptance Rate: 23.4% (15/64)
Function-as-a-Service (FaaS) platforms enable users to run arbitrary functions without being concerned about operational issues, while only paying for the consumed resources. Individual functions are often composed into workflows for complex tasks. However, the pay-per-use model and non-transparent reporting by cloud providers make it challenging to estimate the expected cost of a workflow, which prevents informed business decisions. Existing cost-estimation approaches assume a static response time for the serverless functions, without taking input parameters into account. In this paper, we propose a methodology for the cost prediction of serverless workflows consisting of input-parameter-sensitive function models and a Monte Carlo simulation of an abstract workflow model. Our approach enables workflow designers to predict, compare, and optimize the expected costs and performance of a planned workflow, which currently requires time-intensive experimentation. In our evaluation, we show that our approach can predict the response time and output parameters of a function based on its input parameters with an accuracy of 96.1%. In a case study with two audio-processing workflows, our approach predicts the costs of the two workflows with an accuracy of 96.2%.
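The Monte Carlo idea can be sketched in a few lines: sample input-dependent durations for each function in the workflow and aggregate them into a cost distribution. The workflow steps, distributions, and price constant below are illustrative assumptions, not the paper's calibrated models.

```python
# Sketch of Monte Carlo cost prediction for a serverless workflow: sample
# input-dependent response times per function, aggregate duration into cost.
# Functions, distributions, and prices are illustrative assumptions.
import random

PRICE_PER_GB_SECOND = 0.0000166667  # approximate FaaS duration price (USD)

def sample_duration(mean_s: float, input_size_mb: float) -> float:
    # Toy input-parameter-sensitive model: duration grows with input size.
    return random.gauss(mean_s * (1 + 0.1 * input_size_mb), 0.05 * mean_s)

def workflow_cost(input_size_mb: float, memory_gb: float = 0.5) -> float:
    # Sequential workflow: transcode -> analyze -> store (hypothetical steps).
    total = sum(sample_duration(m, input_size_mb) for m in (1.2, 0.8, 0.3))
    return memory_gb * total * PRICE_PER_GB_SECOND

random.seed(0)
samples = sorted(workflow_cost(input_size_mb=4.0) for _ in range(10_000))
print("mean cost:", sum(samples) / len(samples))
print("95th percentile:", samples[int(0.95 * len(samples))])
```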
Model-based Performance Predictions for SDN-based Networks: A Case Study. Herrnleben, Stefan; Rygielski, Piotr; Grohmann, Johannes; Eismann, Simon; Hossfeld, Tobias; Kounev, Samuel; in Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (2020). Springer, Cham.
Emerging paradigms for network virtualization like Software-Defined Networking (SDN) and Network Functions Virtualization (NFV) pose new challenges for accurate performance modeling and analysis tools. Performance modeling and prediction approaches that support SDN or NFV technologies therefore help system operators to analyze the performance of a data center and its corresponding network. The Descartes Network Infrastructures (DNI) approach offers a high-level descriptive language to model SDN-based networks, which can be transformed into various predictive modeling formalisms. However, these modeling concepts have not yet been evaluated in a realistic scenario. In this paper, we present an extensive case study evaluating the DNI modeling capabilities, the transformations to predictive models, and the performance prediction using the OMNeT++ and SimQPN simulation frameworks. We present five realistic scenarios of a content distribution network (CDN), compare the performance predictions with real-world measurements, and discuss modeling gaps and calibration issues causing mispredictions in some scenarios.
To Fail Or Not To Fail: Predicting Hard Disk Drive Failure Time Windows. Züfle, Marwin; Krupitzer, Christian; Erhard, Florian; Grohmann, Johannes; Kounev, Samuel; in Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (2020). 19–36. Springer, Cham.
Due to the increasing size of today's data centers and the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support the monitoring of hardware. However, those techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two and with multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 hours) for predicting the time-to-failure.
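A minimal sketch of the third task, exact time-to-failure regression evaluated with a hit rate and mean absolute error, might look as follows; the features are synthetic stand-ins for S.M.A.R.T. attributes, and the tolerance is an assumed value.

```python
# Sketch of exact time-to-failure regression with hit-rate evaluation
# (prediction within a tolerance of the true failure time). Data and
# tolerance are invented stand-ins for S.M.A.R.T. attributes of real drives.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(369, 8))                  # 8 mock S.M.A.R.T. features
y = np.abs(X @ rng.normal(size=8)) * 10 + 5    # mock time-to-failure [hours]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mae = np.mean(np.abs(pred - y_te))
hit_rate = np.mean(np.abs(pred - y_te) <= 8.0)  # within +/- 8 hours (assumed)
print(f"MAE: {mae:.1f} h, hit rate: {hit_rate:.1%}")
```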
Incremental Calibration of Architectural Performance Models with Parametric Dependencies. Mazkatli, Manar; Monschein, David; Grohmann, Johannes; Koziolek, Anne; in 2020 IEEE International Conference on Software Architecture (ICSA 2020) (2020). 23–34. IEEE.
Architecture-based Performance Prediction (AbPP) allows evaluating the performance of systems and answering what-if questions without measurements for all alternatives. A difficulty when creating models is that Performance Model Parameters (PMPs, such as resource demands, loop iteration numbers, and branch probabilities) depend on various influencing factors like input data, used hardware, and the applied workload. To enable a broad range of what-if questions, Performance Models (PMs) need to have predictive power beyond what has been measured to calibrate the models. Thus, PMPs need to be parametrized over the influencing factors that may vary. Existing approaches allow for the estimation of parametrized PMPs by measuring the complete system; thus, they are too costly to be applied frequently, let alone after each code change. Moreover, they do not preserve manual changes to the model when recalibrating. In this work, we present the Continuous Integration of Performance Models (CIPM), which incrementally extracts and calibrates the performance model, including parametric dependencies. CIPM responds to source code changes by updating the PM and adaptively instrumenting the changed parts. To allow AbPP, CIPM estimates the parametrized PMPs using measurements (generated by performance tests or by executing the system in production) and statistical analysis, e.g., regression analysis and decision trees. Additionally, our approach responds to production changes (e.g., load or deployment changes) and calibrates the usage and deployment parts of the PMs accordingly. For the evaluation, we used two case studies. The evaluation results show that we were able to calibrate the PM incrementally and accurately.
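The calibration step for a single parametric dependency can be pictured as fitting a statistical model that maps an input parameter to a resource demand, in line with the regression analysis and decision trees named above; the monitoring data and parameter names below are invented.

```python
# Sketch of estimating one parametrized performance model parameter: fit the
# resource demand of a component as a function of an input parameter from
# monitoring data. Data and names are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
input_size = rng.uniform(1, 100, size=200).reshape(-1, 1)  # items per request
demand_ms = 2.0 + 0.35 * input_size.ravel() + rng.normal(0, 1, 200)

# Two of the statistical analyses the paper names: regression and trees.
linear = LinearRegression().fit(input_size, demand_ms)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(input_size, demand_ms)

# The learned dependency replaces a constant resource demand in the model:
print("demand(40 items) ~", linear.predict([[40]])[0], "ms")
```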
An IoT Network Emulator for Analyzing the Influence of Varying Network Quality. Herrnleben, Stefan; Ailabouni, Rudy; Grohmann, Johannes; Prantl, Thomas; Krupitzer, Christian; Kounev, Samuel; in Proceedings of the 12th EAI International Conference on Simulation Tools and Techniques (SIMUtools) (2020).
IoT devices often communicate over wireless or cellular networks with varying connection quality. These fluctuations are caused, among other factors, by free-space path loss (FSPL), buildings, topological obstacles, weather, and the mobility of the receiver. Varying signal quality affects bandwidth, transmission delays, packet loss, and jitter. Mobile IoT applications exposed to varying connection characteristics have to handle such variations and take them into account during development and testing. However, tests in real mobile networks are complex and challenging to reproduce. Therefore, network emulators can be used to mimic the behavior of real-world networks by adding artificial disturbance. However, existing network emulators often require extensive technical knowledge and a complex setup, which makes integrating them into automated software testing pipelines challenging. In this paper, we propose a framework for emulating IoT networks with varying quality characteristics. An existing base emulator is integrated into our framework, enabling users to utilize it without extensive network expertise or configuration effort. The evaluation shows that our framework can emulate a variety of different network quality characteristics as well as real-world network traces.
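On Linux, a common base tool for this kind of artificial disturbance is netem, driven via tc; whether this is the base emulator used in the paper is not stated here. The sketch below applies assumed delay, jitter, and loss profiles to an interface (requires root; the device name and profile values are illustrative).

```python
# Sketch of applying artificial network disturbance with Linux tc/netem.
# Requires root and an existing interface; device name and values are
# assumptions for illustration, not the paper's tooling.
import subprocess

def apply_network_profile(dev: str, delay_ms: int, jitter_ms: int,
                          loss_pct: float) -> None:
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root", "netem",
         "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
         "loss", f"{loss_pct}%"],
        check=True)

def clear_network_profile(dev: str) -> None:
    subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)

# Emulate a degrading cellular link on eth0 (hypothetical trace):
for delay, jitter, loss in [(50, 10, 0.1), (120, 40, 1.0), (300, 100, 5.0)]:
    apply_network_profile("eth0", delay, jitter, loss)
    # ... run one measurement interval of the IoT workload here ...
clear_network_profile("eth0")
```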
Baloo: Measuring and Modeling the Performance Configurations of Distributed DBMS. Grohmann, Johannes; Seybold, Daniel; Eismann, Simon; Leznik, Mark; Kounev, Samuel; Domaschka, Jörg; in 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2020). 1–8. IEEE.
Acceptance Rate: 27%
Correctly configuring a distributed database management system (DBMS) deployed in a cloud environment for maximizing performance poses many challenges to operators. Even if the entire configuration spectrum could be measured directly, which is often infeasible due to the multitude of parameters, single measurements are subject to random variations and need to be repeated multiple times. In this work, we propose Baloo, a framework for systematically measuring and modeling different performance-relevant configurations of distributed DBMS in cloud environments. Baloo dynamically estimates the required number of configurations, as well as the number of required measurement repetitions per configuration based on a desired target accuracy. We evaluate Baloo based on a data set consisting of 900 DBMS configuration measurements conducted in our private cloud setup. Our evaluation shows that the highly configurable framework is able to achieve a prediction error of up to 12% while saving 80% of the measurement effort. We also publish all code and the acquired data set to foster future research.
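Baloo's notion of a target accuracy suggests a simple stopping rule: repeat a configuration measurement until the relative width of the confidence interval falls below a threshold. The sketch below implements that rule with a mock measurement function; the thresholds and the normal-approximation interval are assumptions, not Baloo's exact estimator.

```python
# Sketch of an accuracy-driven stopping rule for repeated measurements of
# one DBMS configuration. The measurement function and thresholds are
# illustrative assumptions.
import statistics
import random

def measure_throughput() -> float:   # stand-in for one DBMS benchmark run
    return random.gauss(1000, 50)

def measure_until_accurate(target_rel_ci: float = 0.02,
                           min_reps: int = 3, max_reps: int = 30) -> list:
    samples = []
    while len(samples) < max_reps:
        samples.append(measure_throughput())
        if len(samples) >= min_reps:
            mean = statistics.mean(samples)
            # ~95% CI half-width using the normal approximation
            half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
            if half_width / mean <= target_rel_ci:
                break
    return samples

random.seed(0)
runs = measure_until_accurate()
print(len(runs), "repetitions, mean:", round(statistics.mean(runs), 1))
```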
Integrating Statistical Response Time Models in Architectural Performance Models. Eismann, Simon; Grohmann, Johannes; Walter, Jürgen; von Kistowski, Jóakim; Kounev, Samuel; in Proceedings of the 2019 IEEE International Conference on Software Architecture (ICSA) (2019). 71–80. IEEE.
Acceptance Rate: 21.9% (21/96)
Performance predictions enable software architects to optimize the performance of a software system early in the development cycle. Architectural performance models and statistical response time models are commonly used to derive these performance predictions. However, both methods have significant downsides: Statistical response time models can only predict scenarios for which training data is available, making the prediction of previously unseen system configurations infeasible. In contrast, the time required to simulate an architectural performance model increases exponentially with both system size and level of modeling detail, making the analysis of large, detailed models challenging. Existing approaches use statistical response time models in architectural performance models to avoid modeling subsystems that are difficult or time-consuming to model, yet they do not consider simulation time. In this paper, we propose to model software systems using classical queuing theory and statistical response time models in parallel. This approach allows users to tailor the model for each analysis run, based on the performed adaptations and the requested performance metrics. Our approach enables faster model solution compared to traditional performance models while retaining their ability to predict previously unseen scenarios. In our experiments we observed speedups of up to 94.8%, making the analysis of much larger and more detailed systems feasible.
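The hybrid idea can be sketched by solving one tier analytically with a textbook M/M/1 formula while a fitted regression stands in for a subsystem that is hard to model; the rates, training data, and single-queue simplification are all illustrative assumptions.

```python
# Sketch of combining classical queueing theory with a statistical response
# time model: an M/M/1 formula covers one tier analytically, while a fitted
# regression replaces a hard-to-model subsystem. Numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Statistical model for subsystem B, trained on (load, response time) data:
load = np.array([[10], [20], [40], [80]])
rt_b_ms = np.array([5.0, 6.0, 9.0, 18.0])
stat_model = LinearRegression().fit(load, rt_b_ms)

def mm1_response_time_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue must be stable"
    return 1000.0 / (service_rate - arrival_rate)

lam = 40.0  # requests per second
end_to_end = mm1_response_time_ms(lam, service_rate=60.0) \
             + stat_model.predict([[lam]])[0]
print("predicted end-to-end response time:", round(end_to_end, 1), "ms")
```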
On Learning in Collective Self-Adaptive Systems: State of Practice and a 3D Framework. D’Angelo, M.; Gerasimou, S.; Ghahremani, S.; Grohmann, J.; Nunes, I.; Pournaras, E.; Tomforde, S.; in Proceedings of the 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (2019). 13–24. IEEE Press.
Collective self-adaptive systems (CSAS) are distributed and interconnected systems composed of multiple agents that can perform complex tasks such as environmental data collection, search and rescue operations, and discovery of natural resources. By providing individual agents with learning capabilities, CSAS can cope with challenges related to distributed sensing and decision-making and operate in uncertain environments. This unique characteristic of CSAS enables the collective to exhibit robust behaviour while achieving system-wide and agent-specific goals. Although learning has been explored in many CSAS applications, selecting suitable learning models and techniques remains a significant challenge that is heavily influenced by expert knowledge. We address this gap by performing a multifaceted analysis of existing CSAS with learning capabilities reported in the literature. Based on this analysis, we introduce a 3D framework that illustrates the learning aspects of CSAS considering the dimensions of autonomy, knowledge access, and behaviour, and facilitates the selection of learning techniques and models. Finally, using example applications from this analysis, we derive open challenges and highlight the need for research on collaborative, resilient and privacy-aware mechanisms for CSAS.
Predicting Server Power Consumption from Standard Rating Results. von Kistowski, Jóakim; Grohmann, Johannes; Schmitt, Norbert; Kounev, Samuel; in Proceedings of the 19th ACM/SPEC International Conference on Performance Engineering (2019). 301–312. Association for Computing Machinery (ACM), New York, NY, USA.
Full Paper Acceptance Rate: 18.6% (13/70)
Data center providers and server operators try to reduce the power consumption of their servers. Finding an energy efficient server for a specific target application is a first step in this regard. Estimating the power consumption of an application on an unavailable server is difficult, as nameplate power values are generally overestimations. Offline power models are able to predict the consumption accurately, but are usually intended for system design, requiring very specific and detailed knowledge about the system under consideration. In this paper, we introduce an offline power prediction method that uses the results of standard power rating tools. The method predicts the power consumption of a specific application for multiple load levels on a target server that is otherwise unavailable for testing. We evaluate our approach by predicting the power consumption of three applications on different physical servers. Our method is able to achieve an average prediction error of 9.49% for three workloads running on real-world, physical servers.
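A drastically simplified version of the underlying idea, relating rated power at discrete load levels to the load of a target application, is plain interpolation; the measured points below are invented, and the paper's method additionally accounts for workload characteristics.

```python
# Sketch of interpolating server power across load levels, the kind of
# relationship standard rating results provide at discrete load levels.
# The measured points are invented.
import numpy as np

load_levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])        # relative load
power_watts = np.array([45.0, 80.0, 110.0, 150.0, 200.0])  # rated server

# Predict power at the target application's measured load of 62%:
print(np.interp(0.62, load_levels, power_watts), "W")
```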
Monitorless: Predicting Performance Degradation in Cloud Applications with Machine Learning. Grohmann, Johannes; Nicholson, Patrick K.; Iglesias, Jesus Omana; Kounev, Samuel; Lugones, Diego; in Proceedings of the 20th International Middleware Conference (2019). 149–162. Association for Computing Machinery (ACM), New York, NY, USA.
Today, software operation engineers rely on application key performance indicators (KPIs) for sizing and orchestrating cloud resources dynamically. KPIs are monitored to assess the achievable performance and to configure various cloud-specific parameters such as flavors of instances and autoscaling rules, among others. Usually, keeping KPIs within acceptable levels requires application expertise, which is expensive and can slow down the continuous delivery of software. Expertise is required because KPIs are normally based on application-specific quality-of-service metrics, like service response time and processing rate, instead of generic platform metrics typical across various environments (e.g., CPU and memory utilization, I/O rate, etc.). In this paper, we investigate the feasibility of outsourcing the management of application performance from developers to cloud operators. In the same way that the serverless paradigm allows the execution environment to be fully managed by a third party, we discuss a monitorless model to streamline application deployment by delegating performance management. We show that training a machine learning model with platform-level data, collected from the execution of representative containerized services, allows inferring application KPI degradation. This is an opportunity to simplify operations, as engineers can rely solely on platform metrics, while still fulfilling application KPIs, to configure portable and application-agnostic rules and other cloud-specific parameters to automatically trigger actions such as autoscaling, instance migration, and network slicing. Results show that monitorless infers KPI degradation with an accuracy of 97% and, notably, it performs similarly to typical autoscaling solutions, even when autoscaling rules are optimally tuned with knowledge of the expected workload.
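The core of the monitorless idea, learning to infer (normally application-specific) KPI violations from platform-level metrics alone, can be sketched with a simple classifier; the metrics, the synthetic ground truth, and the model choice are illustrative assumptions rather than the paper's setup.

```python
# Sketch of inferring application KPI violations from platform-level metrics
# only. Metrics, labels, and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
# Platform metrics: CPU util, memory util, disk I/O rate, network rate
X = rng.uniform(0, 1, size=(n, 4))
# Mock ground truth: the (normally unobserved) application KPI degrades
# under combined CPU and memory pressure.
kpi_violated = (0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.05, n)) > 0.75

X_tr, X_te, y_tr, y_te = train_test_split(X, kpi_violated, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```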
Detecting Parametric Dependencies for Performance Models Using Feature Selection Techniques. Grohmann, Johannes; Eismann, Simon; Elflein, Sven; Mazkatli, Manar; von Kistowski, Jóakim; Kounev, Samuel; in 2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2019). 309–322. IEEE Computer Society.
Acceptance Rate: 23.8% (29/122)
Architectural performance models are a common approach to predict the performance properties of a software system. Parametric dependencies, which describe the relation between the input parameters of a component and its performance properties, significantly increase the prediction accuracy of architectural performance models. However, manually modeling parametric dependencies is time-intensive and requires expert knowledge. Existing automated extraction approaches require dedicated performance tests, which are often infeasible. In this paper, we introduce an approach to automatically identify parametric dependencies from monitoring data using feature selection techniques from the area of machine learning. We evaluate the applicability of three techniques, one selected from each of the three groups of feature selection methods: a filter method, an embedded method, and a wrapper method. Our evaluation shows that the filter technique outperforms the other approaches. Based on these results, we apply this technique to a distributed micro-service web-shop, where it correctly identifies 11 performance-relevant dependencies, achieving a precision of 91.7% based on a manually labeled gold standard.
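A filter-style detection step like the one that performed best can be sketched with a mutual-information ranking of candidate input parameters against a performance metric; the parameter names and synthetic data below are assumptions for illustration.

```python
# Sketch of filter-style dependency detection: rank candidate input
# parameters by a statistical relevance score with respect to a performance
# metric. Data and parameter names are invented.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
candidates = {
    "payload_size": rng.uniform(1, 100, n),
    "item_count":   rng.integers(1, 50, n).astype(float),
    "user_id":      rng.integers(1, 10_000, n).astype(float),  # irrelevant
}
X = np.column_stack(list(candidates.values()))
response_time = 3.0 * candidates["payload_size"] \
                + 10.0 * np.log(candidates["item_count"]) \
                + rng.normal(0, 5, n)

scores = mutual_info_regression(X, response_time, random_state=0)
for name, score in sorted(zip(candidates, scores), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")  # high score -> likely parametric dependency
```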
On the Value of Service Demand Estimation for Auto-Scaling. Bauer, André; Grohmann, Johannes; Herbst, Nikolas; Kounev, Samuel; in Proceedings of 19th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (MMB 2018) (2018). (Vol. 10740) 142–156. Springer, Cham.
In the context of performance models, service demands are key model parameters capturing the average time individual requests of different workload classes are actively processed. In a system under load, service demands normally cannot be measured directly due to measurement interference; however, a number of estimation approaches exist that are based on high-level performance metrics. In this paper, we show that service demands provide significant benefits for implementing modern auto-scalers. Auto-scaling describes the process of dynamically adjusting the number of allocated virtual resources (e.g., virtual machines) in a data center according to the incoming workload. We demonstrate that even a simple auto-scaler that leverages information about service demands significantly outperforms auto-scalers based solely on CPU utilization measurements. This is shown by testing the two approaches in three different scenarios. Our results show that the service-demand-based auto-scaler outperforms the CPU-utilization-based one in all scenarios. These results encourage further research on the application of service demand estimates for resource management in data centers.
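The benefit of service demands follows from the service demand law, D = U / X (utilization divided by throughput): once D is estimated, the number of instances needed for an expected arrival rate can be computed directly instead of reacting to a utilization threshold. The numbers in the sketch below are invented.

```python
# Sketch of a service-demand-based sizing rule using the service demand law
# D = U / X. All numbers are invented for illustration.
import math

utilization = 0.45   # measured CPU utilization of one instance
throughput = 90.0    # completed requests per second on that instance
demand_s = utilization / throughput   # ~5 ms of CPU time per request

def required_instances(expected_rate: float, target_util: float = 0.7) -> int:
    # Capacity needed so that each instance stays below the target utilization.
    return math.ceil(expected_rate * demand_s / target_util)

print(required_instances(1200.0))  # instances needed for 1200 req/s -> 9
```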
TeaStore: A Micro-Service Reference Application for Benchmarking, Modeling and Resource Management Research. von Kistowski, Jóakim; Eismann, Simon; Schmitt, Norbert; Bauer, André; Grohmann, Johannes; Kounev, Samuel; in Proceedings of the 26th IEEE International Symposium on the Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (2018). 223–236. IEEE Computer Society.
Acceptance Rate: 29.5% (23/78)
Modern distributed applications offer complex performance behavior and many degrees of freedom regarding deployment and configuration. Researchers employ various methods of analysis, modeling, and management that leverage these degrees of freedom to predict or improve non-functional properties of the software under consideration. In order to demonstrate and evaluate their applicability in the real world, methods resulting from such research areas require test and reference applications that offer a range of different behaviors, as well as the necessary degrees of freedom. Existing production software is often inaccessible for researchers or closed off to instrumentation. Existing testing and benchmarking frameworks, on the other hand, are either designed for specific testing scenarios, or they do not offer the necessary degrees of freedom. Further, most test applications are difficult to deploy and run, or are outdated. In this paper, we introduce the TeaStore, a state-of-the-art micro-service-based test and reference application. TeaStore offers services with different performance characteristics and many degrees of freedom regarding deployment and configuration to be used as a benchmarking framework for researchers. The TeaStore allows evaluating performance modeling and resource management techniques; it also offers instrumented variants to enable extensive run-time analysis. We demonstrate TeaStore's use in three contexts: performance modeling, cloud resource management, and energy efficiency analysis. Our experiments show that TeaStore can be used for evaluating novel approaches in these contexts and also motivates further research in the areas of performance modeling and resource management.