Computer science
Analytics
Provisioning
Artificial intelligence
Data science
Operating system
Authors
Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li
Source
Journal: IEEE Transactions on Parallel and Distributed Systems
[Institute of Electrical and Electronics Engineers]
Date: 2022-05-03
Volume (Issue): 33 (12): 3833-3849
Citations: 10
Identifier
DOI:10.1109/tpds.2022.3172069
Abstract
Serverless computing simplifies code deployment through one-click upload and lightweight execution, and has emerged as a promising and increasingly popular paradigm. However, open challenges remain in adapting data-intensive analytics applications to the serverless context: users of serverless analytics have difficulty coordinating computation across different stages and provisioning resources in a large configuration space. This paper presents the design and implementation of Astrea, which configures and orchestrates serverless analytics jobs autonomously while taking into account flexibly specified user requirements. Astrea relies on performance and cost models that characterize the intricate interplay among multi-dimensional factors (e.g., function memory size and degree of parallelism at each stage). We formulate an optimization problem based on user-specific requirements for performance enhancement or cost reduction, and develop a set of algorithms based on graph theory to obtain the optimal job execution. We deploy Astrea on the AWS Lambda platform and conduct real-world experiments over representative benchmarks, including Big Data analytics and machine learning workloads, at different scales. Extensive results demonstrate that Astrea achieves the optimal execution decision for serverless data analytics in comparison with various provisioning and deployment baselines. For example, compared with three provisioning baselines, Astrea reduces job completion time by 21% to 69% under a given budget constraint, and saves cost by 20% to 84% without violating performance requirements.
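The trade-off described in the abstract (memory size and per-stage parallelism jointly determining completion time and cost) can be illustrated with a toy sketch. This is not Astrea's model or its graph-based algorithms: the time and cost formulas, the price constant, the candidate memory sizes and parallelism degrees, and every function name below are assumptions made purely for illustration, and the search is a plain brute-force enumeration over a tiny configuration space.

```python
from itertools import product

# All constants below are illustrative assumptions, not values from the paper.
PRICE_PER_GB_SECOND = 0.0000166667   # assumed per-GB-second price
MEMORY_GB = [0.5, 1.0, 2.0]          # assumed candidate function memory sizes
PARALLELISM = [1, 2, 4, 8]           # assumed candidate numbers of functions per stage


def stage_time(work, mem_gb, workers):
    """Toy model: work splits evenly across workers, speed scales with
    memory, and each extra worker adds a small coordination overhead."""
    return work / (workers * mem_gb) + 0.1 * workers


def stage_cost(work, mem_gb, workers):
    """Toy model: every worker is billed for the stage duration times its memory."""
    return stage_time(work, mem_gb, workers) * mem_gb * workers * PRICE_PER_GB_SECOND


def best_plan(stage_work, budget):
    """Enumerate every per-stage (memory, parallelism) choice and return
    (time, cost, plan) with the smallest total time whose cost fits the
    budget, or None if no configuration is affordable."""
    options = list(product(MEMORY_GB, PARALLELISM))
    best = None
    for plan in product(options, repeat=len(stage_work)):
        time = sum(stage_time(w, m, p) for w, (m, p) in zip(stage_work, plan))
        cost = sum(stage_cost(w, m, p) for w, (m, p) in zip(stage_work, plan))
        if cost <= budget and (best is None or time < best[0]):
            best = (time, cost, plan)
    return best
```

Calling `best_plan([40.0, 10.0], budget)` with a loose and then a tight budget shows the behaviour the abstract refers to: a tighter budget pushes the search toward smaller or fewer functions and a longer completion time. Exhaustive enumeration grows exponentially with the number of stages, which is one reason a structured formulation such as the paper's graph-based one is needed in practice.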