Why Is It Not Solved Yet? Challenges for Production-Ready Autoscaling. Straesser, Martin; Grohmann, Johannes; von Kistowski, Jóakim; Eismann, Simon; Bauer, André; Kounev, Samuel; in Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering (2022). 105–115. Association for Computing Machinery, New York, NY, USA.
Autoscaling is a task of major importance in the cloud computing domain, as it directly affects both operating costs and customer experience. Although there has been active research in this area for over ten years, there is still a significant gap between the methods proposed in the literature and the autoscalers deployed in practice; many research autoscalers never find their way into production deployments. This paper describes six core challenges that arise in production systems and are still not solved by most research autoscalers. We illustrate these problems through experiments in a realistic cloud environment with a real-world multi-service business application and show that commonly used autoscalers have various shortcomings. In addition, we analyze the behavior of overloaded services and show that it can be problematic for existing autoscalers. Overall, our analysis shows that these challenges are insufficiently addressed in the literature, and we conclude that future scaling approaches should focus on the needs of production systems.
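To make the gap concrete: most production autoscalers follow a reactive, threshold-style rule of the kind the paper critiques. The sketch below shows such a rule in Python, modeled on the proportional scaling formula popularized by the Kubernetes Horizontal Pod Autoscaler; the thresholds, bounds, and metric source are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of a reactive, threshold-based autoscaler of the kind the
# paper critiques. Thresholds, names, and the metric source are illustrative
# assumptions, not the paper's implementation.
import math

def reactive_scale(current_replicas: int, avg_cpu_utilization: float,
                   target_utilization: float = 0.6,
                   min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Kubernetes-HPA-style rule: scale proportionally to the ratio of
    observed to target utilization."""
    desired = math.ceil(current_replicas * avg_cpu_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# Example: 4 replicas at 90% CPU with a 60% target -> 6 replicas.
print(reactive_scale(4, 0.9))
```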
ComBench: A Benchmarking Framework for Publish/Subscribe Communication Protocols Under Network Limitations. Herrnleben, Stefan; Leidinger, Maximilian; Lesch, Veronika; Prantl, Thomas; Grohmann, Johannes; Krupitzer, Christian; Kounev, Samuel; in Performance Evaluation Methodologies and Tools, Q. Zhao, L. Xia (eds.) (2021). 72–92. Springer International Publishing, Cham.
Efficient and dependable communication is a highly relevant aspect of Internet of Things (IoT) systems, in which tiny sensors, actuators, wearables, or other smart devices exchange messages. Various publish/subscribe protocols address the challenges of communication in IoT systems. The selection process for a suitable protocol should consider the communication behavior of the application, the protocol's performance, the resource requirements on the end device, and the network connection quality, as IoT environments often rely on wireless networks. Benchmarking is a common approach to evaluate and compare systems with respect to performance as well as aspects like dependability and security. In this paper, we present ComBench, our IoT communication benchmarking framework for publish/subscribe protocols, focusing on constrained networks with varying quality conditions. The benchmarking framework supports system designers, software engineers, and application developers in selecting and investigating the behavior of communication protocols. It can be used to (i) show the impact of fluctuating network quality on communication, (ii) compare multiple protocols, protocol features, and protocol implementations, and (iii) analyze the scalability, robustness, and dependability of clients, networks, and brokers in different scenarios. Our case study demonstrates the applicability of our framework for supporting the choice of the best-suited protocol in various scenarios.
Sizeless: Predicting the Optimal Size of Serverless Functions. Eismann, Simon; Bui, Long; Grohmann, Johannes; Abad, Cristina; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 22nd International MIDDLEWARE Conference (2021). 248–259.
Best Student Paper Award, ACM Artifacts Evaluated — Functional
Serverless functions are an emerging cloud computing paradigm that is being rapidly adopted by both industry and academia. In this cloud computing model, the provider opaquely handles resource management tasks such as resource provisioning, deployment, and auto-scaling. The only resource management task that developers are still in charge of is selecting how many resources are allocated to each worker instance. However, selecting the optimal size of serverless functions is quite challenging, so developers often neglect it despite its significant cost and performance benefits. Existing approaches aiming to automate serverless function resource sizing require dedicated performance tests, which are time-consuming to implement and maintain. In this paper, we introduce an approach to predict the optimal resource size of a serverless function using monitoring data from a single resource size. As our approach does not require dedicated performance tests, it enables cloud providers to implement resource sizing on a platform level and automate the last resource management task associated with serverless functions. We evaluate our approach on four different serverless applications on AWS, where it predicts the execution time of the other memory sizes based on monitoring data for a single memory size with an average prediction error of 15.3%. Based on these predictions, it selects the optimal memory size for 79.0% of the serverless functions and the second-best memory size for 12.3% of the serverless functions, which results in an average speedup of 39.7% while also decreasing average costs by 2.6%.
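For intuition, the cost trade-off behind function sizing can be sketched as follows: AWS Lambda bills duration in GB-seconds plus a per-request fee, so a larger memory size that shortens execution can be cheaper overall. The memory sizes and predicted execution times below are invented placeholders for the predictions the paper derives from monitoring data.

```python
# Illustrative cost/performance trade-off underlying serverless resource
# sizing. Timings are made-up assumptions; prices approximate AWS Lambda's
# GB-second plus per-request billing model.
PRICE_PER_GB_SECOND = 0.0000166667  # approximate AWS Lambda rate (USD)
PRICE_PER_REQUEST = 0.0000002

def invocation_cost(memory_mb: int, duration_s: float) -> float:
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# Hypothetical execution times predicted from monitoring data at one size,
# in the spirit of the paper's approach (values are invented):
predicted = {512: 2.0, 1024: 0.9, 2048: 0.55, 3008: 0.50}

costs = {mb: invocation_cost(mb, t) for mb, t in predicted.items()}
optimal = min(costs, key=costs.get)
print(costs, "->", optimal)  # here the faster 1024 MB size is also cheapest
```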
A Predictive Maintenance Methodology: Predicting the Time-to-Failure of Machines in Industry 4.0. Züfle, Marwin; Agne, Joachim; Grohmann, Johannes; Dörtoluk, Ibrahim; Kounev, Samuel; in Proceedings of the 21st IEEE IES International Conference on Industrial Informatics (2021). IEEE.
Predictive maintenance is an essential aspect of the concept of Industry 4.0. In contrast to previous maintenance strategies, which plan repairs based on periodic schedules or threshold values, predictive maintenance is normally based on estimating the time-to-failure of machines. Thus, predictive maintenance enables a more efficient and effective maintenance approach. Although much research has already been done on time-to-failure prediction, most existing works provide only specialized approaches for specific machines. In most cases, these are either rotary machines (i.e., bearings) or lithium-ion batteries. To bridge the gap to a more general time-to-failure prediction, we propose a generic end-to-end predictive maintenance methodology for the time-to-failure prediction of industrial machines. Our methodology exhibits a number of novel aspects including a universally applicable method for feature extraction based on different types of sensor data, well-known feature transformation and selection techniques, adjustable target class assignment based on fault records with three different labeling strategies, and the training of multiple state-of-the-art machine learning classification models including hyperparameter optimization. We evaluated our time-to-failure prediction methodology in a real-world case study consisting of monitoring data gathered over several years from a large industrial press. The results demonstrated the effectiveness of the proposed methodology for six different time-to-failure prediction windows, as well as for the downscaled binary prediction of impending failures. In this case study, the multi-class feed-forward neural network model achieved the overall best results.
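As a rough illustration of such a pipeline (not the paper's implementation, which includes multiple model types and hyperparameter optimization), the sketch below windows a sensor signal into statistical features and trains a classifier on hypothetical time-to-failure classes; all data and labels are synthetic.

```python
# Sketch of a time-to-failure classification pipeline: window sensor data
# into features, assign each window a time-to-failure class, train a model.
# Features, window size, and labels are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
signal = rng.normal(size=5000)             # stand-in for one sensor channel
window = 50
features = np.array([
    [w.mean(), w.std(), w.min(), w.max()]  # simple per-window statistics
    for w in np.split(signal, len(signal) // window)
])
# Hypothetical labels: which time-to-failure window each sample falls into,
# e.g. 0 = ">7 days", 1 = "1-7 days", 2 = "<1 day".
labels = rng.integers(0, 3, size=len(features))

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```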
SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications. Grohmann, Johannes; Straesser, Martin; Chalbani, Avi; Eismann, Simon; Arian, Yair; Herbst, Nikolas; Peretz, Noam; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
Acceptance Rate: 29%
Application performance management (APM) tools are useful to observe the performance properties of an application during production. However, APM is normally purely reactive, that is, it can only report current or past performance degradation. Although some approaches capable of predictive application monitoring have been proposed, they can only report a predicted degradation but cannot explain its root cause, making it hard to prevent the expected degradation. In this paper, we present SuanMing, a framework for predicting performance degradation of microservice applications running in cloud environments. SuanMing predicts future root causes for anticipated performance degradations and therefore aims at preventing performance degradations before they actually occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and we show that our approach is able to predict and pinpoint performance degradations with an accuracy of over 90%.
Libra: A Benchmark for Time Series Forecasting Methods. Bauer, André; Züfle, Marwin; Eismann, Simon; Grohmann, Johannes; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 12th ACM/SPEC International Conference on Performance Engineering (ICPE) (2021). ACM, New York, NY, USA.
In many areas of decision making, forecasting is an essential pillar. Consequently, there are many different forecasting methods. According to the "No-Free-Lunch Theorem", there is no single forecasting method that performs best for all time series. In other words, each method has its advantages and disadvantages, depending on the specific use case. Therefore, choosing a forecasting method for a given use case remains an expert task that cannot be fully automated. To establish a level playing field for evaluating the performance of time series forecasting methods in a broad setting, we propose Libra, a forecasting benchmark that automatically evaluates and ranks forecasting methods based on their performance in a diverse set of evaluation scenarios. The benchmark comprises four different use cases, each covering 100 heterogeneous time series taken from different domains. The data set was assembled from publicly available time series and was designed to exhibit much higher diversity than existing forecasting competitions. Based on this benchmark, we perform a comprehensive evaluation to compare different existing time series forecasting methods.
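The core benchmark mechanic, scoring every method on every series with a common error measure and ranking by the aggregate, can be sketched as follows; the toy series, baseline methods, and the choice of sMAPE are assumptions for illustration, not Libra's actual measures or data.

```python
# Minimal sketch of benchmark-style ranking of forecasting methods: score
# each method on each series with a common error measure and rank by the
# average. Methods and data are toy stand-ins, not Libra itself.
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error."""
    return 100 * np.mean(2 * np.abs(forecast - actual) /
                         (np.abs(actual) + np.abs(forecast)))

rng = np.random.default_rng(1)
series = [np.cumsum(rng.normal(size=120)) + 50 for _ in range(5)]

methods = {
    "naive": lambda train, h: np.repeat(train[-1], h),
    "mean":  lambda train, h: np.repeat(train.mean(), h),
    "drift": lambda train, h: train[-1] + (train[-1] - train[0]) /
                              (len(train) - 1) * np.arange(1, h + 1),
}

h = 20  # forecast horizon: hold out the last 20 points of each series
scores = {name: np.mean([smape(s[-h:], f(s[:-h], h)) for s in series])
          for name, f in methods.items()}
for rank, (name, err) in enumerate(sorted(scores.items(), key=lambda kv: kv[1]), 1):
    print(rank, name, round(err, 2))
```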
A Simulation-based Optimization Framework for Online Adaptation of Networks. Herrnleben, Stefan; Grohmann, Johannes; Rygielski, Piotr; Lesch, Veronika; Krupitzer, Christian; Kounev, Samuel; in Proceedings of the 12th EAI International Conference on Simulation Tools and Techniques (SIMUtools), H. Song, D. Jiang (eds.) (2021). 513–532. Springer International Publishing, Cham.
Today's data centers face a rapid change of deployed services, growing complexity, and increasing performance requirements. Customers expect not only round-the-clock availability of the hosted services but also high responsiveness. Besides optimizing software architectures and deployments, networks have to be adapted to handle the changing and volatile demands. Approaches from self-adaptive systems can be used for optimizing data center networks to continuously meet Service Level Agreements (SLAs) between data center operators and customers. However, existing approaches focus only on specific objectives like topology design, power optimization, or traffic engineering. In this paper, we present an extensible framework that analyzes networks using different types of simulation and adapts them subject to multiple objectives using various adaptation techniques. Analyzing each suggested adaptation ensures that performance requirements and SLAs are continuously met. We evaluate our framework w.r.t. (i) general requirements and assessments of languages and frameworks for adaptation models, (ii) finding Pareto-optimal solutions considering a multi-dimensional cost model, and (iii) scalability. The evaluation shows that our approach detects the bottlenecks and the violated SLAs correctly, outputs valid and cost-optimal adaptations, and keeps the runtime for the adaptation process constant even with increasing network size and an increasing number of alternative configurations.
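One building block mentioned above, finding Pareto-optimal adaptations under a multi-dimensional cost model, can be illustrated with a minimal dominance filter; the candidate configurations and cost dimensions below are invented.

```python
# Sketch of selecting Pareto-optimal network adaptations under a
# multi-dimensional cost model (e.g., monetary cost vs. predicted latency).
# Candidate tuples are invented; lower is better in every dimension.
def pareto_front(candidates):
    """Return candidates not dominated by any other candidate."""
    front = []
    for c in candidates:
        dominated = any(all(o <= c_ for o, c_ in zip(other, c)) and other != c
                        for other in candidates)
        if not dominated:
            front.append(c)
    return front

# (cost in EUR, predicted latency in ms) for alternative configurations
adaptations = [(100, 30), (80, 45), (120, 25), (80, 50), (90, 45)]
print(pareto_front(adaptations))  # [(100, 30), (80, 45), (120, 25)]
```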
Predicting the Costs of Serverless Workflows. Eismann, Simon; Grohmann, Johannes; van Eyk, Erwin; Herbst, Nikolas; Kounev, Samuel; in Proceedings of the 2020 ACM/SPEC International Conference on Performance Engineering (ICPE) (2020). 265–276. Association for Computing Machinery (ACM), New York, NY, USA.
Acceptance Rate: 23.4% (15/64)
Function-as-a-Service (FaaS) platforms enable users to run arbitrary functions without being concerned about operational issues, while only paying for the consumed resources. Individual functions are often composed into workflows for complex tasks. However, the pay-per-use model and non-transparent reporting by cloud providers make it challenging to estimate the expected cost of a workflow, which prevents informed business decisions. Existing cost-estimation approaches assume a static response time for the serverless functions, without taking input parameters into account. In this paper, we propose a methodology for the cost prediction of serverless workflows consisting of input-parameter-sensitive function models and a Monte Carlo simulation of an abstract workflow model. Our approach enables workflow designers to predict, compare, and optimize the expected costs and performance of a planned workflow, which currently requires time-intensive experimentation. In our evaluation, we show that our approach can predict the response time and output parameters of a function based on its input parameters with an accuracy of 96.1%. In a case study with two audio-processing workflows, our approach predicts the costs of the two workflows with an accuracy of 96.2%.
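The Monte Carlo idea can be sketched in a few lines: sample input-dependent durations for each function in the workflow and aggregate them into a cost distribution. The workflow steps, distributions, and price constant below are illustrative assumptions, not the paper's calibrated models.

```python
# Sketch of Monte Carlo cost prediction for a serverless workflow: sample
# input-dependent response times per function, aggregate duration into cost.
# Functions, distributions, and prices are illustrative assumptions.
import random

PRICE_PER_GB_SECOND = 0.0000166667  # approximate FaaS duration price (USD)

def sample_duration(mean_s: float, input_size_mb: float) -> float:
    # Toy input-parameter-sensitive model: duration grows with input size.
    return random.gauss(mean_s * (1 + 0.1 * input_size_mb), 0.05 * mean_s)

def workflow_cost(input_size_mb: float, memory_gb: float = 0.5) -> float:
    # Sequential workflow: transcode -> analyze -> store (hypothetical steps).
    total = sum(sample_duration(m, input_size_mb) for m in (1.2, 0.8, 0.3))
    return memory_gb * total * PRICE_PER_GB_SECOND

random.seed(0)
samples = sorted(workflow_cost(input_size_mb=4.0) for _ in range(10_000))
print("mean cost:", sum(samples) / len(samples))
print("95th percentile:", samples[int(0.95 * len(samples))])
```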
Model-based Performance Predictions for SDN-based Networks: A Case Study. Herrnleben, Stefan; Rygielski, Piotr; Grohmann, Johannes; Eismann, Simon; Hossfeld, Tobias; Kounev, Samuel; in Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (2020). Springer, Cham.
Emerging paradigms for network virtualization like Software-Defined Networking (SDN) and Network Functions Virtualization (NFV) pose new challenges for accurate performance modeling and analysis tools. Performance modeling and prediction approaches that support SDN or NFV technologies therefore help system operators to analyze the performance of a data center and its corresponding network. The Descartes Network Infrastructures (DNI) approach offers a high-level descriptive language to model SDN-based networks, which can be transformed into various predictive modeling formalisms. However, these modeling concepts have not yet been evaluated in a realistic scenario. In this paper, we present an extensive case study evaluating the DNI modeling capabilities, the transformations to predictive models, and the performance prediction using the OMNeT++ and SimQPN simulation frameworks. We present five realistic scenarios of a content distribution network (CDN), compare the performance predictions with real-world measurements, and discuss modeling gaps and calibration issues causing mispredictions in some scenarios.
To Fail Or Not To Fail: Predicting Hard Disk Drive Failure Time Windows. Züfle, Marwin; Krupitzer, Christian; Erhard, Florian; Grohmann, Johannes; Kounev, Samuel; in Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (2020). 19–36. Springer, Cham.
Due to the increasing size of today's data centers and the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support the monitoring of hardware. However, those techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two and with multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 hours) for predicting the time-to-failure.
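A minimal sketch of the third task, exact time-to-failure regression evaluated with a hit rate and mean absolute error, might look as follows; the features are synthetic stand-ins for S.M.A.R.T. attributes, and the tolerance is an assumed value.

```python
# Sketch of exact time-to-failure regression with hit-rate evaluation
# (prediction within a tolerance of the true failure time). Data and
# tolerance are invented stand-ins for S.M.A.R.T. attributes of real drives.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(369, 8))                  # 8 mock S.M.A.R.T. features
y = np.abs(X @ rng.normal(size=8)) * 10 + 5    # mock time-to-failure [hours]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mae = np.mean(np.abs(pred - y_te))
hit_rate = np.mean(np.abs(pred - y_te) <= 8.0)  # within +/- 8 hours (assumed)
print(f"MAE: {mae:.1f} h, hit rate: {hit_rate:.1%}")
```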
Incremental Calibration of Architectural Performance Models with Parametric Dependencies. Mazkatli, Manar; Monschein, David; Grohmann, Johannes; Koziolek, Anne; in 2020 IEEE International Conference on Software Architecture (ICSA 2020) (2020). 23–34. IEEE.
Architecture-based Performance Prediction (AbPP) allows evaluating the performance of systems and answering what-if questions without measurements for all alternatives. A difficulty when creating models is that Performance Model Parameters (PMPs, such as resource demands, loop iteration numbers, and branch probabilities) depend on various influencing factors like input data, used hardware, and the applied workload. To enable a broad range of what-if questions, Performance Models (PMs) need to have predictive power beyond what has been measured to calibrate the models. Thus, PMPs need to be parametrized over the influencing factors that may vary. Existing approaches allow for the estimation of parametrized PMPs by measuring the complete system; thus, they are too costly to be applied frequently, let alone after each code change. Moreover, they do not preserve manual changes to the model when recalibrating. In this work, we present the Continuous Integration of Performance Models (CIPM), which incrementally extracts and calibrates the performance model, including parametric dependencies. CIPM responds to source code changes by updating the PM and adaptively instrumenting the changed parts. To allow AbPP, CIPM estimates the parametrized PMPs using measurements (generated by performance tests or by executing the system in production) and statistical analysis, e.g., regression analysis and decision trees. Additionally, our approach responds to production changes (e.g., load or deployment changes) and calibrates the usage and deployment parts of the PMs accordingly. For the evaluation, we used two case studies. The evaluation results show that we were able to calibrate the PM incrementally and accurately.
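The calibration step for a single parametric dependency can be pictured as fitting a statistical model that maps an input parameter to a resource demand, in line with the regression analysis and decision trees named above; the monitoring data and parameter names below are invented.

```python
# Sketch of estimating one parametrized performance model parameter: fit the
# resource demand of a component as a function of an input parameter from
# monitoring data. Data and names are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
input_size = rng.uniform(1, 100, size=200).reshape(-1, 1)  # items per request
demand_ms = 2.0 + 0.35 * input_size.ravel() + rng.normal(0, 1, 200)

# Two of the statistical analyses the paper names: regression and trees.
linear = LinearRegression().fit(input_size, demand_ms)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(input_size, demand_ms)

# The learned dependency replaces a constant resource demand in the model:
print("demand(40 items) ~", linear.predict([[40]])[0], "ms")
```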
An IoT Network Emulator for Analyzing the Influence of Varying Network Quality. Herrnleben, Stefan; Ailabouni, Rudy; Grohmann, Johannes; Prantl, Thomas; Krupitzer, Christian; Kounev, Samuel; in Proceedings of the 12th EAI International Conference on Simulation Tools and Techniques (SIMUtools) (2020).
IoT devices often communicate over wireless or cellular networks with varying connection quality. These fluctuations are caused, among other factors, by free-space path loss (FSPL), buildings, topological obstacles, weather, and the mobility of the receiver. Varying signal quality affects bandwidth, transmission delays, packet loss, and jitter. Mobile IoT applications exposed to varying connection characteristics have to handle such variations and take them into account during development and testing. However, tests in real mobile networks are complex and challenging to reproduce. Therefore, network emulators can be used to mimic the behavior of real-world networks by adding artificial disturbance. However, existing network emulators often require extensive technical knowledge and a complex setup, which makes integrating them into automated software testing pipelines challenging. In this paper, we propose a framework for emulating IoT networks with varying quality characteristics. An existing base emulator is integrated into our framework, enabling users to utilize it without extensive network expertise or configuration effort. The evaluation shows that our framework can emulate a variety of different network quality characteristics as well as real-world network traces.
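On Linux, a common base tool for this kind of artificial disturbance is netem, driven via tc; whether this is the base emulator used in the paper is not stated here. The sketch below applies assumed delay, jitter, and loss profiles to an interface (requires root; the device name and profile values are illustrative).

```python
# Sketch of applying artificial network disturbance with Linux tc/netem.
# Requires root and an existing interface; device name and values are
# assumptions for illustration, not the paper's tooling.
import subprocess

def apply_network_profile(dev: str, delay_ms: int, jitter_ms: int,
                          loss_pct: float) -> None:
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root", "netem",
         "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
         "loss", f"{loss_pct}%"],
        check=True)

def clear_network_profile(dev: str) -> None:
    subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)

# Emulate a degrading cellular link on eth0 (hypothetical trace):
for delay, jitter, loss in [(50, 10, 0.1), (120, 40, 1.0), (300, 100, 5.0)]:
    apply_network_profile("eth0", delay, jitter, loss)
    # ... run one measurement interval of the IoT workload here ...
clear_network_profile("eth0")
```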
Baloo: Measuring and Modeling the Performance Configurations of Distributed DBMS. Grohmann, Johannes; Seybold, Daniel; Eismann, Simon; Leznik, Mark; Kounev, Samuel; Domaschka, Jörg; in 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2020). 1–8. IEEE.
Acceptance Rate: 27%
Correctly configuring a distributed database management system (DBMS) deployed in a cloud environment for maximizing performance poses many challenges to operators. Even if the entire configuration spectrum could be measured directly, which is often infeasible due to the multitude of parameters, single measurements are subject to random variations and need to be repeated multiple times. In this work, we propose Baloo, a framework for systematically measuring and modeling different performance-relevant configurations of distributed DBMS in cloud environments. Baloo dynamically estimates the required number of configurations, as well as the number of required measurement repetitions per configuration based on a desired target accuracy. We evaluate Baloo based on a data set consisting of 900 DBMS configuration measurements conducted in our private cloud setup. Our evaluation shows that the highly configurable framework is able to achieve a prediction error of up to 12% while saving 80% of the measurement effort. We also publish all code and the acquired data set to foster future research.
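Baloo's notion of a target accuracy suggests a simple stopping rule: repeat a configuration measurement until the relative width of the confidence interval falls below a threshold. The sketch below implements that rule with a mock measurement function; the thresholds and the normal-approximation interval are assumptions, not Baloo's exact estimator.

```python
# Sketch of an accuracy-driven stopping rule for repeated measurements of
# one DBMS configuration. The measurement function and thresholds are
# illustrative assumptions.
import statistics
import random

def measure_throughput() -> float:   # stand-in for one DBMS benchmark run
    return random.gauss(1000, 50)

def measure_until_accurate(target_rel_ci: float = 0.02,
                           min_reps: int = 3, max_reps: int = 30) -> list:
    samples = []
    while len(samples) < max_reps:
        samples.append(measure_throughput())
        if len(samples) >= min_reps:
            mean = statistics.mean(samples)
            # ~95% CI half-width using the normal approximation
            half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
            if half_width / mean <= target_rel_ci:
                break
    return samples

random.seed(0)
runs = measure_until_accurate()
print(len(runs), "repetitions, mean:", round(statistics.mean(runs), 1))
```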
Integrating Statistical Response Time Models in Architectural Performance Models. Eismann, Simon; Grohmann, Johannes; Walter, Jürgen; von Kistowski, Jóakim; Kounev, Samuel; in Proceedings of the 2019 IEEE International Conference on Software Architecture (ICSA) (2019). 71–80. IEEE.
Acceptance Rate: 21.9% (21/96)
Performance predictions enable software architects to optimize the performance of a software system early in the development cycle. Architectural performance models and statistical response time models are commonly used to derive these performance predictions. However, both methods have significant downsides: Statistical response time models can only predict scenarios for which training data is available, making the prediction of previously unseen system configurations infeasible. In contrast, the time required to simulate an architectural performance model increases exponentially with both system size and level of modeling detail, making the analysis of large, detailed models challenging. Existing approaches use statistical response time models in architectural performance models to avoid modeling subsystems that are difficult or time-consuming to model, yet they do not consider simulation time. In this paper, we propose to model software systems using classical queuing theory and statistical response time models in parallel. This approach allows users to tailor the model for each analysis run, based on the performed adaptations and the requested performance metrics. Our approach enables faster model solution compared to traditional performance models while retaining their ability to predict previously unseen scenarios. In our experiments we observed speedups of up to 94.8%, making the analysis of much larger and more detailed systems feasible.
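The hybrid idea can be sketched by solving one tier analytically with a textbook M/M/1 formula while a fitted regression stands in for a subsystem that is hard to model; the rates, training data, and single-queue simplification are all illustrative assumptions.

```python
# Sketch of combining classical queueing theory with a statistical response
# time model: an M/M/1 formula covers one tier analytically, while a fitted
# regression replaces a hard-to-model subsystem. Numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Statistical model for subsystem B, trained on (load, response time) data:
load = np.array([[10], [20], [40], [80]])
rt_b_ms = np.array([5.0, 6.0, 9.0, 18.0])
stat_model = LinearRegression().fit(load, rt_b_ms)

def mm1_response_time_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue must be stable"
    return 1000.0 / (service_rate - arrival_rate)

lam = 40.0  # requests per second
end_to_end = mm1_response_time_ms(lam, service_rate=60.0) \
             + stat_model.predict([[lam]])[0]
print("predicted end-to-end response time:", round(end_to_end, 1), "ms")
```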
On Learning in Collective Self-Adaptive Systems: State of Practice and a 3D Framework. D’Angelo, M.; Gerasimou, S.; Ghahremani, S.; Grohmann, J.; Nunes, I.; Pournaras, E.; Tomforde, S.; in Proceedings of the 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (2019). 13–24. IEEE Press.
Collective self-adaptive systems (CSAS) are distributed and interconnected systems composed of multiple agents that can perform complex tasks such as environmental data collection, search and rescue operations, and discovery of natural resources. By providing individual agents with learning capabilities, CSAS can cope with challenges related to distributed sensing and decision-making and operate in uncertain environments. This unique characteristic of CSAS enables the collective to exhibit robust behaviour while achieving system-wide and agent-specific goals. Although learning has been explored in many CSAS applications, selecting suitable learning models and techniques remains a significant challenge that is heavily influenced by expert knowledge. We address this gap by performing a multifaceted analysis of existing CSAS with learning capabilities reported in the literature. Based on this analysis, we introduce a 3D framework that illustrates the learning aspects of CSAS considering the dimensions of autonomy, knowledge access, and behaviour, and facilitates the selection of learning techniques and models. Finally, using example applications from this analysis, we derive open challenges and highlight the need for research on collaborative, resilient and privacy-aware mechanisms for CSAS.
Predicting Server Power Consumption from Standard Rating Results. von Kistowski, Jóakim; Grohmann, Johannes; Schmitt, Norbert; Kounev, Samuel; in Proceedings of the 19th ACM/SPEC International Conference on Performance Engineering (2019). 301–312. Association for Computing Machinery (ACM), New York, NY, USA.
Full Paper Acceptance Rate: 18.6% (13/70)
Data center providers and server operators try to reduce the power consumption of their servers. Finding an energy efficient server for a specific target application is a first step in this regard. Estimating the power consumption of an application on an unavailable server is difficult, as nameplate power values are generally overestimations. Offline power models are able to predict the consumption accurately, but are usually intended for system design, requiring very specific and detailed knowledge about the system under consideration. In this paper, we introduce an offline power prediction method that uses the results of standard power rating tools. The method predicts the power consumption of a specific application for multiple load levels on a target server that is otherwise unavailable for testing. We evaluate our approach by predicting the power consumption of three applications on different physical servers. Our method is able to achieve an average prediction error of 9.49% for three workloads running on real-world, physical servers.
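A drastically simplified version of the underlying idea, relating rated power at discrete load levels to the load of a target application, is plain interpolation; the measured points below are invented, and the paper's method additionally accounts for workload characteristics.

```python
# Sketch of interpolating server power across load levels, the kind of
# relationship standard rating results provide at discrete load levels.
# The measured points are invented.
import numpy as np

load_levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])        # relative load
power_watts = np.array([45.0, 80.0, 110.0, 150.0, 200.0])  # rated server

# Predict power at the target application's measured load of 62%:
print(np.interp(0.62, load_levels, power_watts), "W")
```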
Monitorless: Predicting Performance Degradation in Cloud Applications with Machine Learning. Grohmann, Johannes; Nicholson, Patrick K.; Iglesias, Jesus Omana; Kounev, Samuel; Lugones, Diego; in Proceedings of the 20th International Middleware Conference (2019). 149–162. Association for Computing Machinery (ACM), New York, NY, USA.
Today, software operation engineers rely on application key performance indicators (KPIs) for sizing and orchestrating cloud resources dynamically. KPIs are monitored to assess the achievable performance and to configure various cloud-specific parameters such as flavors of instances and autoscaling rules, among others. Usually, keeping KPIs within acceptable levels requires application expertise, which is expensive and can slow down the continuous delivery of software. Expertise is required because KPIs are normally based on application-specific quality-of-service metrics, like service response time and processing rate, instead of generic platform metrics typical across various environments (e.g., CPU and memory utilization, I/O rate, etc.). In this paper, we investigate the feasibility of outsourcing the management of application performance from developers to cloud operators. In the same way that the serverless paradigm allows the execution environment to be fully managed by a third party, we discuss a monitorless model to streamline application deployment by delegating performance management. We show that training a machine learning model with platform-level data, collected from the execution of representative containerized services, allows inferring application KPI degradation. This is an opportunity to simplify operations, as engineers can rely solely on platform metrics, while still fulfilling application KPIs, to configure portable and application-agnostic rules and other cloud-specific parameters to automatically trigger actions such as autoscaling, instance migration, and network slicing. Results show that monitorless infers KPI degradation with an accuracy of 97% and, notably, it performs similarly to typical autoscaling solutions, even when autoscaling rules are optimally tuned with knowledge of the expected workload.
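The core of the monitorless idea, learning to infer (normally application-specific) KPI violations from platform-level metrics alone, can be sketched with a simple classifier; the metrics, the synthetic ground truth, and the model choice are illustrative assumptions rather than the paper's setup.

```python
# Sketch of inferring application KPI violations from platform-level metrics
# only. Metrics, labels, and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
# Platform metrics: CPU util, memory util, disk I/O rate, network rate
X = rng.uniform(0, 1, size=(n, 4))
# Mock ground truth: the (normally unobserved) application KPI degrades
# under combined CPU and memory pressure.
kpi_violated = (0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.05, n)) > 0.75

X_tr, X_te, y_tr, y_te = train_test_split(X, kpi_violated, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```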
Detecting Parametric Dependencies for Performance Models Using Feature Selection Techniques. Grohmann, Johannes; Eismann, Simon; Elflein, Sven; Mazkatli, Manar; von Kistowski, Jóakim; Kounev, Samuel; in 2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (2019). 309–322. IEEE Computer Society.
Acceptance Rate: 23.8% (29/122)
Architectural performance models are a common approach to predict the performance properties of a software system. Parametric dependencies, which describe the relation between the input parameters of a component and its performance properties, significantly increase the prediction accuracy of architectural performance models. However, manually modeling parametric dependencies is time-intensive and requires expert knowledge. Existing automated extraction approaches require dedicated performance tests, which are often infeasible. In this paper, we introduce an approach to automatically identify parametric dependencies from monitoring data using feature selection techniques from the area of machine learning. We evaluate the applicability of three techniques, one selected from each of the three groups of feature selection methods: a filter method, an embedded method, and a wrapper method. Our evaluation shows that the filter technique outperforms the other approaches. Based on these results, we apply this technique to a distributed micro-service web-shop, where it correctly identifies 11 performance-relevant dependencies, achieving a precision of 91.7% based on a manually labeled gold standard.
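A filter-style detection step like the one that performed best can be sketched with a mutual-information ranking of candidate input parameters against a performance metric; the parameter names and synthetic data below are assumptions for illustration.

```python
# Sketch of filter-style dependency detection: rank candidate input
# parameters by a statistical relevance score with respect to a performance
# metric. Data and parameter names are invented.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
candidates = {
    "payload_size": rng.uniform(1, 100, n),
    "item_count":   rng.integers(1, 50, n).astype(float),
    "user_id":      rng.integers(1, 10_000, n).astype(float),  # irrelevant
}
X = np.column_stack(list(candidates.values()))
response_time = 3.0 * candidates["payload_size"] \
                + 10.0 * np.log(candidates["item_count"]) \
                + rng.normal(0, 5, n)

scores = mutual_info_regression(X, response_time, random_state=0)
for name, score in sorted(zip(candidates, scores), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")  # high score -> likely parametric dependency
```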
On the Value of Service Demand Estimation for Auto-Scaling. Bauer, André; Grohmann, Johannes; Herbst, Nikolas; Kounev, Samuel; in Proceedings of 19th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (MMB 2018) (2018). (Vol. 10740) 142–156. Springer, Cham.
In the context of performance models, service demands are key model parameters capturing the average time individual requests of different workload classes are actively processed. In a system under load, service demands normally cannot be measured directly due to measurement interference; however, a number of estimation approaches exist that are based on high-level performance metrics. In this paper, we show that service demands provide significant benefits for implementing modern auto-scalers. Auto-scaling describes the process of dynamically adjusting the number of allocated virtual resources (e.g., virtual machines) in a data center according to the incoming workload. We demonstrate that even a simple auto-scaler that leverages information about service demands significantly outperforms auto-scalers based solely on CPU utilization measurements. This is shown by testing the two approaches in three different scenarios. Our results show that the service-demand-based auto-scaler outperforms the CPU-utilization-based one in all scenarios. These results encourage further research on the application of service demand estimates for resource management in data centers.
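The benefit of service demands follows from the service demand law, D = U / X (utilization divided by throughput): once D is estimated, the number of instances needed for an expected arrival rate can be computed directly instead of reacting to a utilization threshold. The numbers in the sketch below are invented.

```python
# Sketch of a service-demand-based sizing rule using the service demand law
# D = U / X. All numbers are invented for illustration.
import math

utilization = 0.45   # measured CPU utilization of one instance
throughput = 90.0    # completed requests per second on that instance
demand_s = utilization / throughput   # ~5 ms of CPU time per request

def required_instances(expected_rate: float, target_util: float = 0.7) -> int:
    # Capacity needed so that each instance stays below the target utilization.
    return math.ceil(expected_rate * demand_s / target_util)

print(required_instances(1200.0))  # instances needed for 1200 req/s -> 9
```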
TeaStore: A Micro-Service Reference Application for Benchmarking, Modeling and Resource Management Research. von Kistowski, Jóakim; Eismann, Simon; Schmitt, Norbert; Bauer, André; Grohmann, Johannes; Kounev, Samuel; in Proceedings of the 26th IEEE International Symposium on the Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (2018). 223–236. IEEE Computer Society.
Acceptance Rate: 29.5% (23/78)
Modern distributed applications offer complex performance behavior and many degrees of freedom regarding deployment and configuration. Researchers employ various methods of analysis, modeling, and management that leverage these degrees of freedom to predict or improve non-functional properties of the software under consideration. In order to demonstrate and evaluate their applicability in the real world, methods resulting from such research areas require test and reference applications that offer a range of different behaviors, as well as the necessary degrees of freedom. Existing production software is often inaccessible for researchers or closed off to instrumentation. Existing testing and benchmarking frameworks, on the other hand, are either designed for specific testing scenarios, or they do not offer the necessary degrees of freedom. Further, most test applications are difficult to deploy and run, or are outdated. In this paper, we introduce the TeaStore, a state-of-the-art micro-service-based test and reference application. TeaStore offers services with different performance characteristics and many degrees of freedom regarding deployment and configuration to be used as a benchmarking framework for researchers. The TeaStore allows evaluating performance modeling and resource management techniques; it also offers instrumented variants to enable extensive run-time analysis. We demonstrate TeaStore's use in three contexts: performance modeling, cloud resource management, and energy efficiency analysis. Our experiments show that TeaStore can be used for evaluating novel approaches in these contexts and also motivates further research in the areas of performance modeling and resource management.