ServerlessLLM: Low-latency serverless inference for large language models

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
