
{"id":9860,"date":"2026-07-04T07:02:10","date_gmt":"2026-07-03T23:02:10","guid":{"rendered":"https:\/\/infernews.com\/blog\/llm-inference-server-with-continuous-batching-amp-ssd-caching-for-apple-silicon\/"},"modified":"2026-07-04T07:06:10","modified_gmt":"2026-07-03T23:06:10","slug":"llm-inference-server-with-continuous-batching-amp-ssd-caching-for-apple-silicon","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/llm-inference-server-with-continuous-batching-amp-ssd-caching-for-apple-silicon\/","title":{"rendered":"oMLX: Turning the Mac into a Local LLM Console"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/07\/oMLX-Mac-LLM.jpg\" alt=\"oMLX\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">oMLX is a local LLM inference tool for Apple Silicon, and also a server-management project with both a graphical interface and a CLI. The problem it targets is not whether a model can run at all, but how to manage multiple models on a Mac more reliably, keep the KV cache around, and cut the waiting time caused by repeated computation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The project's approach is clear: a menu bar interface handles common operations, while the terminal and Apple Shortcuts control the same service. Installation is also straightforward: macOS users can install from a .dmg, and a Homebrew option is available. The log location, background service, and CLI shim 
are all documented, which makes it friendlier to people who keep local models running for long periods.<\/p>\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"lyte-wrapper\" title=\"Finally, The CORRECT Way to Run Local AI on a Mac\" style=\"width:853px;max-width:100%;margin:5px auto;\"><div class=\"lyMe\" id=\"WYL_JpJaEPGzPF4\" itemprop=\"video\" itemscope itemtype=\"https:\/\/schema.org\/VideoObject\"><div><meta itemprop=\"thumbnailUrl\" content=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJpJaEPGzPF4%2Fhqdefault.jpg\" \/><meta itemprop=\"embedURL\" content=\"https:\/\/www.youtube.com\/embed\/JpJaEPGzPF4\" \/><meta itemprop=\"duration\" content=\"PT9M3S\" \/><meta itemprop=\"uploadDate\" content=\"2026-06-30T14:00:12Z\" \/><\/div><div id=\"lyte_JpJaEPGzPF4\" data-src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJpJaEPGzPF4%2Fhqdefault.jpg\" class=\"pL\"><div class=\"tC\"><div class=\"tT\" itemprop=\"name\">Finally, The CORRECT Way to Run Local AI on a Mac<\/div><\/div><div class=\"play\"><\/div><div class=\"ctrl\"><div class=\"Lctrl\"><\/div><div class=\"Rctrl\"><\/div><\/div><\/div><noscript><a href=\"https:\/\/youtu.be\/JpJaEPGzPF4\" rel=\"nofollow noopener\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJpJaEPGzPF4%2F0.jpg\" alt=\"Finally, The CORRECT Way to Run Local AI on a Mac\" width=\"853\" height=\"460\" \/><br \/>Watch this video on YouTube<\/a><\/noscript><meta itemprop=\"description\" content=\"
Download oMLX: https:\/\/omlx.ai\/ This video explores why OMLX is the definitive choice for founders looking to reclaim their data and run powerful LLMs locally on Mac hardware. Key Takeaways: - Why OMLX is superior to Ollama and LM Studio for professional Mac workflows. - The technical benefits of SSD-backed caching and LRU policies for persistent context. - How to set up agentic models like Qwen 3.6 MoE for real-world coding tasks. - A breakdown of why the M5 Max is the current sweet spot for personal AI infrastructure. - Practical steps to integrate local models into tools like Pie and Open Code. Code examples: https:\/\/samuelgregory.co.uk\/videos\/finally-the-correct-way-to-run-local-ai-on-a-mac &#36; cat timestamps.txt 00:00 Finally, the correct way to run AI on a Mac 00:29 oMLX has a special trick up its sleeve 02:40 Where do Ollama and LM Studio land? 
03:27 Downloading oMLX 03:48 Rundown of the UI and downloading models 04:49 Serving your local model 06:12 Seeing the cache in action 06:54 Playing around with parameters 07:29 Configuring providers in harnesses #LocalLLM #LocalAI #AI\"><\/div><\/div><div class=\"lL\" style=\"max-width:100%;width:853px;margin:5px auto;\"><\/div><figcaption><\/figcaption><\/figure>\n\n\n<p class=\"wp-block-paragraph\">What sets it apart from a typical local LLM server is its tiered KV cache design. oMLX keeps frequently used content in a hot tier in RAM and, when space runs out, moves it to a cold tier on SSD, stored in safetensors format. Even after a server restart, requests that share the same prefix can still reuse the cache, which is especially valuable for long conversations, coding assistance, and tool calling.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">OpenClaw, OpenCode, Codex, Hermes Agent, Copilot, and Pi can each be set up with a single click from the admin panel, with no manual config editing.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tiered caching across a hot tier (RAM) and a cold tier (SSD)<\/li>\n\n\n\n<li>Automatic LRU (least recently used) unloading of less-used models<\/li>\n\n\n\n<li>Manual model load\/unload from the admin interface<\/li>\n\n\n\n<li>Menu bar controls, a CLI, and Apple Shortcuts integration<\/li>\n\n\n\n<li>Suited to Mac workflows that need long context and switching between multiple models<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The available material mentions continuous batching, context limits, 
and a benchmark page, but the README excerpt lists no concrete numbers, so performance claims should be treated with caution. What is clear is that it suits teams or individuals who do ongoing development locally, pair it with tools such as Claude Code, and centrally manage a setup of small always-resident models plus large models swapped in on demand. On the model side, the material explicitly mentions everyday models, heavier models, and optional native custom kernels support for GLM-5.2 and MiniMax M3.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/jundot\/omlx\" rel=\"noopener noreferrer\" target=\"_blank\"><strong>GitHub<\/strong><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>oMLX is built for local LLM 
deployment on the Mac; the focus is not the model itself but making caching, loading, and switching less of a chore.<\/p>\n","protected":false},"author":8,"featured_media":9859,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ai_generated_summary":"","footnotes":""},"categories":[133,116,166,197,76,187],"tags":[],"class_list":["post-9860","post","type-post","status-publish","format-standard","hentry","category-133","category-agentic","category-mac","category-framework","category-76","category-187"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=9860"}],"version-history":[{"count":2,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9860\/revisions"}],"predecessor-version":[{"id":9863,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9860\/revisions\/9863"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/9859"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=9860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=9860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=9860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}