Profiling Distributed Systems in Lightweight Virtualized Environments with Logs and Resource Metrics

Aidi Pi, Wei Chen, Xiaobo Zhou, Mike Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Understanding and troubleshooting distributed systems in the cloud is considered a very difficult problem because the execution of a single user request is distributed across multiple machines. In addition, the multi-tenant nature of cloud environments introduces interference that causes performance issues. Most existing troubleshooting tools focus on either log analysis or intrusive tracing methods, leaving resource usage monitoring unexplored.

We propose and implement LRTrace, a non-intrusive tracing and feedback control tool for distributed applications in lightweight virtualized environments. LRTrace profiles both the log messages and the actual resource consumption of an application at runtime in a fine-grained manner, which is made possible by lightweight container-based virtualization. By correlating these two kinds of information, LRTrace lets users relate changes in resource consumption to application events. Furthermore, LRTrace allows users to define and implement their own feedback control plug-ins to manage the cluster in a semi-automatic manner. In the system evaluation, we run Spark and MapReduce applications in a multi-tenant cluster and show that LRTrace can diagnose performance issues caused by interference, bugs, or both. It also helps users understand the workflows of data-parallel applications.
Original language: English
Title of host publication: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
Place of Publication: New York, NY, USA
Publisher: ACM
Pages: 168-179
Number of pages: 12
ISBN (Print): 978-1-4503-5785-2
Publication status: Published - 11 Jun 2018
Event: 27th International Symposium on High-Performance Parallel and Distributed Computing - Tempe, United States
Duration: 11 Jun 2018 to 15 Jun 2018
http://www.hpdc.org/2018/

Publication series

Name: HPDC '18
Publisher: ACM

Conference

Conference: 27th International Symposium on High-Performance Parallel and Distributed Computing
Abbreviated title: HPDC 2018
Country/Territory: United States
City: Tempe
Period: 11/06/18 to 15/06/18

Keywords / Materials (for Non-textual outputs)

  • data-parallel applications
  • lightweight virtualization
  • logs
  • resource metrics
  • troubleshooting
