ANE Acceleration for llama.cpp — ggml CoreML Backend + QVAC Integration

01/05/2026

5,000 USD₮

Description

The Apple Neural Engine (ANE) in M3 / M4 / M5 Macs and in A17 Pro / A18 / A19 iPhones and iPads delivers high-throughput int8 matrix multiplication at a fraction of the energy cost of the Metal GPU path. llama.cpp and ggml currently route all LLM inference through Metal or the CPU, so the ANE sits unused on every Apple device that has one.

Draw Things recently demonstrated that the ANE can be integrated as a targeted accelerator for int8 matmul inside a custom inference stack. CoreML serves only as the entry point to the ANE, while the host runtime keeps full ownership of intermediate allocations, the KV cache, kernel caching, and scheduling. This grant funds bringing that architectural pattern into llama.cpp / ggml and the QVAC addon stack: the ANE accelerates matmul, Metal / CPU continue to own everything else, and the existing Metal path remains the fallback.
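The split described above comes down to a per-op routing decision. Below is a minimal sketch of that decision in C. Everything in it is hypothetical and written only for illustration: `backend_t`, `op_desc_t`, `ane_supports`, `choose_backend`, `matmul_i8_ref`, and the "k divisible by 64" eligibility rule are not part of ggml's API. The sketch does not call CoreML. Under these assumptions, only int8 matmuls are offered to the ANE. Every other op, and any matmul the ANE cannot take, goes to Metal, with the CPU as the last resort. The reference kernel assumes symmetric per-tensor int8 quantization with int32 accumulation, which is one common scheme and is not necessarily what an ANE path would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backend identifiers (not ggml's real enums). */
typedef enum { BACKEND_ANE, BACKEND_METAL, BACKEND_CPU } backend_t;

/* Hypothetical op kinds; only OP_MATMUL is ever considered for the ANE. */
typedef enum { OP_MATMUL, OP_SOFTMAX, OP_ROPE, OP_KV_WRITE } op_kind_t;

typedef struct {
    op_kind_t kind;
    bool      int8_weights; /* operands quantized to int8 */
    int       m, n, k;      /* matmul shape: (m x k) * (k x n) */
} op_desc_t;

/* Hypothetical eligibility check: int8 matmuls with k divisible by 64.
 * The shape rule is a placeholder, not a documented ANE constraint. */
static bool ane_supports(const op_desc_t *op, bool ane_available) {
    return ane_available && op->kind == OP_MATMUL && op->int8_weights &&
           op->k % 64 == 0;
}

/* ANE for eligible matmuls, then Metal, then CPU as the last resort. */
static backend_t choose_backend(const op_desc_t *op, bool ane_available,
                                bool metal_available) {
    if (ane_supports(op, ane_available)) return BACKEND_ANE;
    if (metal_available) return BACKEND_METAL;
    return BACKEND_CPU;
}

/* CPU reference for the offloaded operation: row-major int8 inputs,
 * int32 accumulation, then per-tensor dequantization to float. */
static void matmul_i8_ref(const int8_t *a, const int8_t *b, float *c,
                          int m, int n, int k, float scale_a, float scale_b) {
    assert(m > 0 && n > 0 && k > 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int p = 0; p < k; p++)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            c[i * n + j] = (float)acc * scale_a * scale_b;
        }
    }
}
```

With this design, turning off the ANE (`ane_available = false`) changes only where eligible matmuls run. Non-matmul ops such as softmax, RoPE, and KV-cache writes never leave the host-owned Metal / CPU path. A reference kernel like `matmul_i8_ref` also gives a simple numerical baseline for checking the results of the accelerated path.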