Why Is It Not Solved Yet? Challenges for Production-Ready Autoscaling. Straesser, Martin; Grohmann, Johannes; von Kistowski, Jóakim; Eismann, Simon; Bauer, André; Kounev, Samuel; in Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering (ICPE) (2022). 105–115. Association for Computing Machinery, New York, NY, USA.
Autoscaling is a task of major importance in the cloud computing domain, as it directly affects both operating costs and customer experience. Although there has been active research in this area for over ten years now, there is still a significant gap between the methods proposed in the literature and the autoscalers deployed in practice. Hence, many research autoscalers do not find their way into production deployments. This paper describes six core challenges that arise in production systems and are still not solved by most research autoscalers. We illustrate these problems through experiments in a realistic cloud environment with a real-world multi-service business application and show that commonly used autoscalers have various shortcomings. In addition, we analyze the behavior of overloaded services and show that it can be problematic for existing autoscalers. Overall, we find that these challenges are insufficiently addressed in the literature and conclude that future scaling approaches should focus on the needs of production systems.
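For context, a minimal sketch of the kind of threshold-driven reactive autoscaler that is common in practice (the proportional rule mirrors the Kubernetes Horizontal Pod Autoscaler; all thresholds and the get_avg_cpu/set_replicas/get_replicas hooks are illustrative assumptions, not code from the paper):

```python
# Minimal sketch of a reactive, threshold-based autoscaler of the kind
# commonly deployed in practice. Thresholds and hooks are illustrative.
import math
import time

TARGET_CPU = 0.6                 # desired average CPU utilization per replica
MIN_REPLICAS, MAX_REPLICAS = 1, 20
COOLDOWN_S = 60                  # wait between actions to avoid flapping

def reconcile(current_replicas: int, avg_cpu: float) -> int:
    """Proportional rule: desired = ceil(current * observed / target)."""
    desired = math.ceil(current_replicas * avg_cpu / TARGET_CPU)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, desired))

def control_loop(get_avg_cpu, set_replicas, get_replicas):
    while True:
        replicas = get_replicas()
        desired = reconcile(replicas, get_avg_cpu())
        if desired != replicas:
            set_replicas(desired)
        time.sleep(COOLDOWN_S)
```

Purely reactive loops of this shape illustrate the style of autoscaler whose production shortcomings, such as lagging behind bursts and oscillating around the threshold, the paper examines.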
Sizeless: Predicting the Optimal Size of Serverless Functions. Eismann, Simon; Bui, Long; Grohmann, Johannes; Abad, Cristina; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 22nd International Middleware Conference (2021). 248–259.
Best Student Paper Award, ACM Artifacts Evaluated — Functional
Serverless functions are an emerging cloud computing paradigm that is being rapidly adopted by both industry and academia. In this cloud computing model, the provider opaquely handles resource management tasks such as resource provisioning, deployment, and auto-scaling. The only resource management task that developers are still in charge of is selecting how many resources are allocated to each worker instance. However, selecting the optimal size of serverless functions is quite challenging, so developers often neglect it despite its significant cost and performance benefits. Existing approaches aiming to automate serverless function resource sizing require dedicated performance tests, which are time-consuming to implement and maintain. In this paper, we introduce an approach to predict the optimal resource size of a serverless function using monitoring data from a single resource size. As our approach does not require dedicated performance tests, it enables cloud providers to implement resource sizing on a platform level and automate the last resource management task associated with serverless functions. We evaluate our approach on four different serverless applications on AWS, where it predicts the execution time of the other memory sizes based on monitoring data for a single memory size with an average prediction error of 15.3%. Based on these predictions, it selects the optimal memory size for 79.0% of the serverless functions and the second-best memory size for 12.3% of the serverless functions, which results in an average speedup of 39.7% while also decreasing average costs by 2.6%.
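A minimal sketch of how such predictions translate into a sizing decision under a GB-second pricing model like AWS Lambda's. The duration stub stands in for the paper's learned model, and the price, the cost-performance objective, and all numbers are illustrative assumptions, not the authors':

```python
# Sketch: selecting a serverless function's memory size from predicted
# execution times. predict_duration_ms stands in for a learned model
# trained on monitoring data from a single size; the stub and the
# price constant below are illustrative assumptions.
PRICE_PER_GB_S = 0.0000166667  # USD per GB-second (illustrative rate)

def cost_usd(memory_mb: int, duration_ms: float) -> float:
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_S

def pick_memory_size(predict_duration_ms, candidates_mb):
    # One reasonable notion of "optimal": minimize the cost-performance
    # product, trading a little cost for a large speedup.
    def score(mb):
        d = predict_duration_ms(mb)
        return cost_usd(mb, d) * d
    return min(candidates_mb, key=score)

# Made-up duration model: a fixed 100 ms part plus a part that scales
# inversely with memory (CPU share grows with memory on AWS Lambda).
model = lambda mb: 100 + 204_800 / mb
print(pick_memory_size(model, [128, 256, 512, 1024, 2048, 3008]))  # -> 2048
```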
Libra: A Benchmark for Time Series Forecasting Methods. Bauer, André; Züfle, Marwin; Eismann, Simon; Grohmann, Johannes; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
In many areas of decision making, forecasting is an essential pillar. Consequently, there are many different forecasting methods. According to the "No-Free-Lunch Theorem", there is no single forecasting method that performs best for all time series. In other words, each method has its advantages and disadvantages depending on the specific use case. Therefore, choosing a forecasting method remains a mandatory expert task; however, such expert knowledge cannot be fully automated. To establish a level playing field for evaluating the performance of time series forecasting methods in a broad setting, we propose Libra, a forecasting benchmark that automatically evaluates and ranks forecasting methods based on their performance in a diverse set of evaluation scenarios. The benchmark comprises four different use cases, each covering 100 heterogeneous time series taken from different domains. The data set was assembled from publicly available time series and was designed to exhibit much higher diversity than existing forecasting competitions. Based on this benchmark, we perform a comprehensive evaluation to compare different existing time series forecasting methods.
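A minimal sketch of the benchmark's core loop, evaluating a set of forecasters over many series and ranking them by average error. The three baseline methods and the sMAPE metric are common choices assumed here for illustration, not necessarily Libra's exact setup:

```python
# Sketch: rank forecasting methods by their average error across a
# collection of time series, in the spirit of a forecasting benchmark.
import numpy as np

def smape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 200 * np.mean(np.abs(forecast - actual) /
                         (np.abs(actual) + np.abs(forecast)))

METHODS = {  # simple baseline forecasters (illustrative)
    "naive-last": lambda train, h: np.repeat(train[-1], h),
    "mean":       lambda train, h: np.repeat(np.mean(train), h),
    "drift":      lambda train, h: train[-1] + (train[-1] - train[0])
                                   / (len(train) - 1) * np.arange(1, h + 1),
}

def rank_methods(series_list, horizon=10):
    errors = {name: [] for name in METHODS}
    for y in series_list:
        train, test = y[:-horizon], y[-horizon:]
        for name, method in METHODS.items():
            errors[name].append(smape(test, method(train, horizon)))
    return sorted(errors, key=lambda n: np.mean(errors[n]))  # best first

rng = np.random.default_rng(0)
series = [np.cumsum(rng.normal(1, 5, 200)) + 100 for _ in range(20)]
print(rank_methods(series))
```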
SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications. Grohmann, Johannes; Straesser, Martin; Chalbani, Avi; Eismann, Simon; Arian, Yair; Herbst, Nikolas; Peretz, Noam; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
Acceptance Rate: 29%
Application performance management (APM) tools are useful to observe the performance properties of an application during production. However, APM is normally purely reactive; that is, it can only report on current or past performance degradations. Although some approaches capable of predictive application monitoring have been proposed, they can only report a predicted degradation but cannot explain its root cause, making it hard to prevent the expected degradation. In this paper, we present SuanMing, a framework for predicting performance degradation of microservice applications running in cloud environments. SuanMing is able to predict future root causes for anticipated performance degradations and therefore aims at preventing performance degradations before they actually occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and we show that our approach is able to predict and pinpoint performance degradations with an accuracy of over 90%.
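A minimal sketch of the explained-prediction idea under strong simplifying assumptions (a linear response-time model over forecast metrics, with the explanation read off the coefficient contributions); SuanMing's actual models and root-cause analysis are more involved:

```python
# Sketch (not the authors' implementation): predict a near-future
# response time from forecast component metrics and attribute the
# predicted degradation to the metric contributing most.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monitoring history: per-minute CPU of service A and
# queue length of service B, plus the end-to-end response time.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))           # [cpu_svc_a, queue_svc_b]
y = 50 + 120 * X[:, 0] + 40 * X[:, 1] + rng.normal(0, 3, 200)

model = LinearRegression().fit(X, y)

x_future = np.array([[0.95, 0.30]])            # forecast metrics (assumed)
pred_ms = model.predict(x_future)[0]
if pred_ms > 150:                              # SLO threshold (assumed)
    contrib = model.coef_ * x_future[0]        # per-metric contribution
    culprit = ["cpu_svc_a", "queue_svc_b"][int(np.argmax(contrib))]
    print(f"predicted {pred_ms:.0f} ms, likely root cause: {culprit}")
```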
Baloo: Measuring and Modeling the Performance Configurations of Distributed DBMS. Grohmann, Johannes; Seybold, Daniel; Eismann, Simon; Leznik, Mark; Kounev, Samuel; Domaschka, Jörg; in 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2020). 1–8. IEEE.
Acceptance Rate: 27%
Correctly configuring a distributed database management system (DBMS) deployed in a cloud environment to maximize performance poses many challenges to operators. Even if the entire configuration spectrum could be measured directly, which is often infeasible due to the multitude of parameters, single measurements are subject to random variations and need to be repeated multiple times. In this work, we propose Baloo, a framework for systematically measuring and modeling different performance-relevant configurations of distributed DBMS in cloud environments. Baloo dynamically estimates the required number of configurations, as well as the number of required measurement repetitions per configuration, based on a desired target accuracy. We evaluate Baloo on a data set consisting of 900 DBMS configuration measurements conducted in our private cloud setup. Our evaluation shows that the highly configurable framework is able to achieve a prediction error of up to 12% while saving 80% of the measurement effort. We also publish all code and the acquired data set to foster future research.
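A minimal sketch of the adaptive-repetition part of the idea: measure a configuration until the relative confidence-interval half-width meets a target accuracy. The stopping rule is a textbook one and the names are assumptions; Baloo's estimators additionally decide which configurations to measure at all:

```python
# Sketch: repeat a noisy benchmark run until the mean is known to a
# target relative accuracy, instead of using a fixed repetition count.
import random
import statistics
from scipy import stats

def measure_until_stable(run_benchmark, target_rel_ciw=0.05,
                         confidence=0.95, min_runs=3, max_runs=30):
    samples = [run_benchmark() for _ in range(min_runs)]
    while len(samples) < max_runs:
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        half_width = stats.t.ppf((1 + confidence) / 2, len(samples) - 1) * sem
        if half_width / mean <= target_rel_ciw:
            break                        # target accuracy reached
        samples.append(run_benchmark())  # otherwise: one more repetition
    return statistics.mean(samples), len(samples)

# Usage with a stand-in for a real DBMS benchmark run.
mean_tput, runs = measure_until_stable(lambda: random.gauss(1000, 50))
print(f"{mean_tput:.0f} ops/s after {runs} repetitions")
```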
Microservices: A Performance Tester’s Dream or Nightmare? Eismann, Simon; Bezemer, Cor-Paul; Shang, Weiyi; Okanovic, Dusan; van Hoorn, Andre; in Proceedings of the 2020 ACM/SPEC International Conference on Performance Engineering (ICPE) (2020).
Acceptance Rate: 23.4% (15/64), ACM Artifacts Evaluated — Functional
In recent years, there has been a shift in software development towards microservice-based architectures, which consist of small services that each focus on one particular functionality. Many companies are migrating their applications to such architectures to reap the benefits of microservices, such as increased flexibility, scalability, and a smaller granularity of the functionality offered by a service. On the one hand, the benefits of microservices for functional testing are often praised, as the focus on one functionality and the smaller granularity allow for more targeted and more convenient testing. On the other hand, using microservices has consequences (both positive and negative) for other types of testing, such as performance testing. Performance testing is traditionally done by establishing the baseline performance of a software version, which is then used to compare the performance testing results of later software versions. However, as we show in this paper, establishing such a baseline performance is challenging in microservice applications. In this paper, we discuss the benefits and challenges of microservices from a performance tester's point of view. Through a series of experiments on the TeaStore application, we demonstrate how microservices affect the performance testing process, and we demonstrate that it is not straightforward to achieve reliable performance testing results for a microservice application.
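A minimal sketch of the traditional baseline comparison the paper revisits: two response-time samples compared with a Mann-Whitney U test and Cliff's delta as effect size (a common setup assumed here for illustration, not necessarily the paper's exact procedure):

```python
# Sketch: classic baseline-vs-new-version comparison of response times.
import numpy as np
from scipy.stats import mannwhitneyu

def compare_to_baseline(baseline_ms, new_version_ms, alpha=0.05):
    _, p = mannwhitneyu(baseline_ms, new_version_ms, alternative="two-sided")
    # Cliff's delta in [-1, 1]: how often the new version is slower
    # minus how often it is faster, normalized over all sample pairs.
    b, n = np.asarray(baseline_ms), np.asarray(new_version_ms)
    delta = (np.mean(n[:, None] > b[None, :]) -
             np.mean(n[:, None] < b[None, :]))
    regression = p < alpha and delta > 0.147  # at least a "small" effect
    return p, delta, regression
```

The paper demonstrates that the baseline such a comparison relies on is much harder to establish reliably for microservice applications than for monoliths.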
Model-based Performance Predictions for SDN-based Networks: A Case Study. Herrnleben, Stefan; Rygielski, Piotr; Grohmann, Johannes; Eismann, Simon; Hossfeld, Tobias; Kounev, Samuel; in Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (2020). Springer, Cham.
Emerging network virtualization paradigms like Software-Defined Networking (SDN) and Network Functions Virtualization (NFV) pose new challenges for accurate performance modeling and analysis tools. Therefore, performance modeling and prediction approaches that support SDN or NFV technologies help system operators to analyze the performance of a data center and its corresponding network. The Descartes Network Infrastructures (DNI) approach offers a high-level descriptive language to model SDN-based networks, which can be transformed into various predictive modeling formalisms. However, these modeling concepts have not yet been evaluated in a realistic scenario. In this paper, we present an extensive case study evaluating the DNI modeling capabilities, the transformations to predictive models, and the performance prediction using the OMNeT++ and SimQPN simulation frameworks. We present five realistic scenarios of a content distribution network (CDN), compare the performance predictions with real-world measurements, and discuss modeling gaps and calibration issues causing mispredictions in some scenarios.
Predicting the Costs of Serverless Workflows. Eismann, Simon; Grohmann, Johannes; van Eyk, Erwin; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 2020 ACM/SPEC International Conference on Performance Engineering (ICPE) (2020). 265–276. Association for Computing Machinery (ACM), New York, NY, USA.
Acceptance Rate: 23.4% (15/64)
Function-as-a-Service (FaaS) platforms enable users to run arbitrary functions without being concerned about operational issues, while only paying for the consumed resources. Individual functions are often composed into workflows for complex tasks. However, the pay-per-use model and non-transparent reporting by cloud providers make it challenging to estimate the expected cost of a workflow, which prevents informed business decisions. Existing cost-estimation approaches assume a static response time for the serverless functions, without taking input parameters into account. In this paper, we propose a methodology for the cost prediction of serverless workflows consisting of input-parameter-sensitive function models and a Monte Carlo simulation of an abstract workflow model. Our approach enables workflow designers to predict, compare, and optimize the expected costs and performance of a planned workflow, which currently requires time-intensive experimentation. In our evaluation, we show that our approach can predict the response time and output parameters of a function based on its input parameters with an accuracy of 96.1%. In a case study with two audio-processing workflows, our approach predicts the costs of the two workflows with an accuracy of 96.2%.
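A minimal sketch of the Monte Carlo idea: sample workflow inputs, drive simple input-sensitive per-function models, and aggregate cost across many simulated runs. The two-step audio workflow, the function models, and the prices are illustrative assumptions, not the paper's calibrated models:

```python
# Sketch: Monte Carlo cost estimation for a two-step serverless workflow.
import random

PRICE_PER_GB_S = 0.0000166667   # illustrative GB-second rate
PRICE_PER_REQUEST = 0.0000002   # illustrative per-invocation fee

def invocation_cost(memory_mb, duration_s):
    return memory_mb / 1024 * duration_s * PRICE_PER_GB_S + PRICE_PER_REQUEST

def split_audio(minutes):
    # Duration scales with input length; the output is a chunk count.
    chunks = max(1, round(minutes * 4))
    return invocation_cost(512, 0.2 + 0.05 * minutes), chunks

def transcribe(chunks):
    # Fan-out: one invocation per chunk, with stochastic duration.
    return sum(invocation_cost(1024, random.gauss(1.5, 0.2))
               for _ in range(chunks))

def workflow_cost(minutes):
    split_cost, chunks = split_audio(minutes)
    return split_cost + transcribe(chunks)

random.seed(0)
samples = sorted(workflow_cost(random.lognormvariate(1.0, 0.5))
                 for _ in range(10_000))
print(f"mean ${sum(samples) / len(samples):.6f}, "
      f"p95 ${samples[int(0.95 * len(samples))]:.6f}")
```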
Detecting Parametric Dependencies for Performance Models Using Feature Selection Techniques. Grohmann, Johannes; Eismann, Simon; Elflein, Sven; Mazkatli, Manar; von Kistowski, Jóakim; Kounev, Samuel; in 2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2019). 309–322. IEEE Computer Society.
Acceptance Rate: 23.8% (29/122)
Architectural performance models are a common approach to predict the performance properties of a software system. Parametric dependencies, which describe the relation between the input parameters of a component and its performance properties, significantly increase the prediction accuracy of architectural performance models. However, manually modeling parametric dependencies is time-intensive and requires expert knowledge. Existing automated extraction approaches require dedicated performance tests, which are often infeasible. In this paper, we introduce an approach to automatically identify parametric dependencies from monitoring data using feature selection techniques from the area of machine learning. We evaluate the applicability of three techniques, one selected from each of the three groups of feature selection methods: a filter method, an embedded method, and a wrapper method. Our evaluation shows that the filter technique outperforms the other approaches. Based on these results, we apply this technique to a distributed micro-service web-shop, where it correctly identifies 11 performance-relevant dependencies, achieving a precision of 91.7% based on a manually labeled gold-standard.
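A minimal sketch of the filter approach on synthetic monitoring data, scoring candidate input parameters against a component's response time with mutual information; the column names, the data, and the threshold are illustrative assumptions:

```python
# Sketch: filter-style feature selection flags input parameters that
# carry information about a component's response time as candidate
# parametric dependencies.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
params = {                                     # hypothetical monitoring data
    "payload_kb": rng.uniform(1, 500, 1000),
    "cache_hit":  rng.integers(0, 2, 1000).astype(float),
    "client_id":  rng.integers(0, 50, 1000).astype(float),  # irrelevant
}
X = np.column_stack(list(params.values()))
resp_ms = (5 + 0.4 * params["payload_kb"] * (1 - 0.8 * params["cache_hit"])
           + rng.normal(0, 5, 1000))

scores = mutual_info_regression(X, resp_ms, random_state=0)
deps = [name for name, s in zip(params, scores) if s > 0.1]
print(dict(zip(params, scores.round(2))), "->", deps)
```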
Integrating Statistical Response Time Models in Architectural Performance Models. Eismann, Simon; Grohmann, Johannes; Walter, Jürgen; von Kistowski, Jóakim; Kounev, Samuel; in Proceedings of the 2019 IEEE International Conference on Software Architecture (ICSA) (2019). 71–80. IEEE.
Acceptance Rate: 21.9% (21/96)
Performance predictions enable software architects to optimize the performance of a software system early in the development cycle. Architectural performance models and statistical response time models are commonly used to derive these performance predictions. However, both methods have significant downsides: Statistical response time models can only predict scenarios for which training data is available, making the prediction of previously unseen system configurations infeasible. In contrast, the time required to simulate an architectural performance model increases exponentially with both system size and level of modeling detail, making the analysis of large, detailed models challenging. Existing approaches use statistical response time models in architectural performance models to avoid modeling subsystems that are difficult or time-consuming to model, yet they do not consider simulation time. In this paper, we propose to model software systems using classical queuing theory and statistical response time models in parallel. This approach allows users to tailor the model for each analysis run, based on the performed adaptations and the requested performance metrics. Our approach enables faster model solution compared to traditional performance models while retaining their ability to predict previously unseen scenarios. In our experiments we observed speedups of up to 94.8%, making the analysis of much larger and more detailed systems feasible.
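A minimal sketch of the hybrid idea: compose a classical queueing formula for one component with a learned response-time model for another along the call path. The M/M/1 component and the regression stub are illustrative assumptions, not the paper's models:

```python
# Sketch: combine queueing theory and a statistical response-time model.
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable (utilization >= 1)")
    return 1.0 / (service_rate - arrival_rate)

def end_to_end_response(arrival_rate, learned_backend_model):
    frontend = mm1_response_time(arrival_rate, service_rate=200.0)
    backend = learned_backend_model(arrival_rate)  # statistical model
    return frontend + backend

# Stand-in for a regression fitted on monitoring data of a subsystem
# that is difficult or slow to model analytically.
backend_stub = lambda lam: 0.004 + 0.00002 * lam
print(end_to_end_response(150.0, backend_stub))  # seconds per request
```

The appeal of the combination is that the analytical part remains valid for unseen configurations while the statistical part avoids simulating hard-to-model subsystems in detail.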
TeaStore: A Micro-Service Reference Application for Benchmarking, Modeling and Resource Management Research. von Kistowski, Jóakim; Eismann, Simon; Schmitt, Norbert; Bauer, André; Grohmann, Johannes; Kounev, Samuel; in Proceedings of the 26th IEEE International Symposium on the Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (2018). 223–236. IEEE Computer Society.
Acceptance Rate: 29.5% (23/78)
Modern distributed applications offer complex performance behavior and many degrees of freedom regarding deployment and configuration. Researchers employ various methods of analysis, modeling, and management that leverage these degrees of freedom to predict or improve non-functional properties of the software under consideration. In order to demonstrate and evaluate their applicability in the real world, methods resulting from such research areas require test and reference applications that offer a range of different behaviors, as well as the necessary degrees of freedom. Existing production software is often inaccessible for researchers or closed off to instrumentation. Existing testing and benchmarking frameworks, on the other hand, are either designed for specific testing scenarios, or they do not offer the necessary degrees of freedom. Further, most test applications are difficult to deploy and run, or are outdated. In this paper, we introduce the TeaStore, a state-of-the-art micro-service-based test and reference application. TeaStore offers services with different performance characteristics and many degrees of freedom regarding deployment and configuration to be used as a benchmarking framework for researchers. The TeaStore allows evaluating performance modeling and resource management techniques; it also offers instrumented variants to enable extensive run-time analysis. We demonstrate TeaStore's use in three contexts: performance modeling, cloud resource management, and energy efficiency analysis. Our experiments show that TeaStore can be used for evaluating novel approaches in these contexts and also motivates further research in the areas of performance modeling and resource management.
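A minimal sketch of using TeaStore as a system under test: a closed-loop client that records response times of the WebUI. The URL and context path assume a default single-host Docker deployment and may need adjusting:

```python
# Sketch: simple closed-loop load against a deployed TeaStore WebUI.
import statistics
import time
import urllib.request

BASE = "http://localhost:8080/tools.descartes.teastore.webui/"  # assumed

def measure(n_requests=100):
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        with urllib.request.urlopen(BASE, timeout=10) as resp:
            resp.read()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return statistics.mean(latencies), latencies[int(0.95 * len(latencies))]

mean_ms, p95_ms = measure()
print(f"mean {mean_ms:.1f} ms, p95 {p95_ms:.1f} ms")
```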
Modeling of Parametric Dependencies for Performance Prediction of Component-based Software Systems at Run-time. Eismann, Simon; Walter, Jürgen; von Kistowski, Jóakim; Kounev, Samuel; in 2018 IEEE International Conference on Software Architecture (ICSA) (2018). 135–144.
Acceptance Rate: 25.6% (22/86)
Model-based performance analysis can be leveraged to explore performance properties of software systems. Capturing the behavior of varying workload mixes, configurations, and deployments of a software system requires formally modeling the impact of configuration parameters and user input on the system behavior. Such influences are represented as parametric dependencies in software performance models. Existing modeling approaches focus on modeling parametric dependencies at design time. This paper identifies runtime-specific parametric dependency features that are not supported by existing work. Therefore, this paper proposes a novel modeling methodology for parametric dependencies and a corresponding graph-based resolution algorithm. This algorithm enables the solution of models containing component instance-level dependencies, variables with multiple descriptions in parallel, and correlations modeled as parametric dependencies. We integrate our work into the Descartes Modeling Language (DML), allowing for accurate and efficient modeling and analysis of parametric dependencies. These performance predictions are valuable for various purposes such as capacity planning, bottleneck analysis, configuration optimization, and proactive auto-scaling. Our evaluation analyzes a video store application. The prediction for varying language mixes and video sizes shows a mean error below 5% for utilization and below 10% for response time.
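A minimal sketch of the graph-based resolution idea: parametric dependencies form a directed graph from input parameters to derived variables, which is resolved in topological order. The toy dependency chain is illustrative; DML's dependency model is much richer (instance-level scopes, parallel variable descriptions, correlations):

```python
# Sketch: resolve a chain of parametric dependencies in topological order.
from graphlib import TopologicalSorter

# variable -> (variables it depends on, resolving function)
DEPENDENCIES = {
    "item_count":    ((), lambda v: 10),                    # user input
    "payload_kb":    (("item_count",),
                      lambda v: 2.5 * v["item_count"]),
    "cpu_demand_ms": (("payload_kb",),
                      lambda v: 1.0 + 0.1 * v["payload_kb"]),
    "resp_time_ms":  (("cpu_demand_ms",),
                      lambda v: 1.8 * v["cpu_demand_ms"]),
}

def resolve(deps):
    order = TopologicalSorter({k: set(d) for k, (d, _) in deps.items()})
    values = {}
    for name in order.static_order():   # predecessors resolved first
        inputs, fn = deps[name]
        values[name] = fn(values)
    return values

print(resolve(DEPENDENCIES))
```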