CAMBRIDGE, England, July 29, 2025 /PRNewswire/ — Myrtle.ai, a recognized leader in accelerating machine learning inference, today released support for its VOLLO® inference accelerator on the AMD Alveo™ V80 compute accelerator card.
VOLLO achieves industry-leading ML inference latencies, which can be less than one microsecond, while delivering excellent throughput, power, and rack-space efficiencies. This new release enables ML developers to run larger models on a single FPGA for the lowest latency. For example, a 22-million-parameter, 3-layer LSTM model can be run with a p99 latency of under 10 microseconds. Even larger models may be sharded and run across multiple FPGAs while still achieving lower latencies than competing solutions.
VOLLO has been in demand across a wide range of applications, including financial trading, wireless telecommunications, cybersecurity, network management, and others, where running ML inference at the lowest possible latency confers advantages in security, safety, profit, efficiency, and cost.
“Demand for VOLLO has come from both sides,” remarked Peter Baldwin, CEO of Myrtle.ai. “We have customers who have a fixed latency window, and they’re delighted to be able to run larger models at the same latencies that they could only achieve with smaller models before. We also have customers who want the very lowest latency they can achieve for their specific model. Increasing the size of models that can be run on a single FPGA has really helped both.”
“We’re delighted that VOLLO is now supported on the production-ready AMD Alveo V80 compute accelerator for memory-intensive workloads,” said Girish Malipeddi, director for Data Center FPGA business, AMD. “AMD customers can now run ML inference at very low latency using VOLLO, while those wishing to purchase a compute accelerator to run VOLLO have the option to choose the Alveo V80, with the highest model capacity of any single FPGA yet supported by VOLLO.”
The Alveo V80 card is based on the AMD Versal™ Adaptive SoC, offering 2.6M LUTs of logic density, 32GB of HBM, an additional 32GB of DDR4, and 800G of network interfaces. It features FPGA fabric to adapt the hardware to the application, coupled with HBM2e for large data sets and memory-intensive compute.
Interested parties may download the ML-oriented VOLLO compiler from vollo.myrtle.ai today and discover what latencies can be achieved with their models on the AMD Alveo V80 compute accelerator card.
About Myrtle.ai
Myrtle.ai is an AI/ML software company that delivers world-class inference accelerators on FPGA-based platforms from all the leading FPGA suppliers. With neural network expertise across the complete spectrum of ML networks, Myrtle.ai has delivered accelerators for FinTech, speech processing, and recommendation workloads.
AMD, the AMD logo, Alveo, Versal, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Photo: https://mma.prnasia.com/media2/2739187/Myrtle_ai.jpg?p=medium600
Logo: https://mma.prnasia.com/media2/2739186/Myrtle_ai_Logo.jpg?p=medium600