It could be even easier: we implemented a two-click-install, open-source local AI manager (plus RAG and other nice extras) for Windows / Mac / Linux. You can check it out at shinkai.com or browse the code at https://github.com/dcspark/shinkai-apps
What exactly does the P2P network do? Does my node communicate with the nodes of strangers?
Only when you aren't looking.
Depending on your internet speed, llamafile can get you going even faster. Go to https://github.com/Mozilla-Ocho/llamafile , find the Quickstart section, and have fun. Scroll down a bit further for more models.
I use LM Studio https://lmstudio.ai/ for my lazy setups. The ten minutes goes to downloading the actual models.
Any M-series mac with 16GB+ can do this also.
Any PC regardless of CPU architecture with 16GB+ can do this, right?
You're supposed to act impressed that they recommended an Apple product.
And yes, but there's no lower limit on the memory; it's entirely dependent on the model or kernel size.
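The model-size dependence above is easy to estimate: the weights dominate, at roughly parameter count times bits per weight. A rough sketch (the function name and the weights-only simplification are mine; KV cache and runtime overhead add more on top):

```python
def model_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-only memory footprint in GB (ignores KV cache and overhead)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model at 4-bit quantization fits comfortably in 8 GB;
# the same model at fp16 wants a 16 GB machine for the weights alone.
print(model_ram_gb(7, 4))   # 3.5 (GB)
print(model_ram_gb(7, 16))  # 14.0 (GB)
```

So "16GB+" is a comfortable default for 7B-class models, not a hard floor.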
I only have some light use cases, so I use a cheap laptop (<$250) with a Ryzen APU and 8 GB of soldered/shared RAM. I added a 16 GB RAM stick, booted a modded BIOS from GitHub off a USB stick, and increased the UMA buffer to 8 GB. I had Stable Diffusion working; it was slow, but I'm pretty sure it's faster than CPU/RAM (about 2 minutes for a 512x512 image at 20-25 steps).
Not really, no. You want to use the GPU, not the CPU. Macs are neat here since they can use their shared memory at rather high bandwidth. Even where the GPU is much slower and the RAM is much worse than proper VRAM (and ridiculously overpriced at that), the bottlenecks are often RAM amount and bandwidth.
Having run ollama on CPU: Yes, it's just slower. Not even intolerably slow IMO, though I used small models and don't mind some turnaround time.
I've seen llamafile go about 10x faster on CPU if you try it.
Really! Same model and everything? I guess I need to go benchmark them - 10x would legit obviate GPU for me
I've run it on a Pi 5 with 8 GB, and get about a token a second.
M-series are a LOT faster :)
Even my M1/16GB gets decent speeds. 7+ tokens/second with llama3
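Those token rates line up with the memory-bandwidth point made earlier: generating each token streams roughly the whole set of weights through memory once, so bandwidth divided by model size gives a ceiling on tokens/sec. A back-of-envelope sketch (the bandwidth and model-size figures are illustrative assumptions, not measurements):

```python
def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    return bandwidth_gb_s / model_gb

# Assumed illustrative numbers: a 7B model at 4-bit is ~4 GB;
# dual-channel DDR4 manages ~50 GB/s, M1 unified memory ~68 GB/s.
print(max_tokens_per_sec(4.0, 50.0))  # 12.5 tok/s ceiling (CPU + DDR4)
print(max_tokens_per_sec(4.0, 68.0))  # 17.0 tok/s ceiling (M1)
```

Real throughput lands below these ceilings, which is consistent with the ~7 tokens/second reported above for an M1.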
ollama. Download a few models. Bob's your uncle. Simple as.
Better still, you can then use Python to call it via LangChain's ChatOllama and build anything you want, with a little help from Claude and ChatGPT, or CodeQwen if you want to do it all locally.
absolute AI don. impress the ladies with that one, you'll be beating them off with a stick when they see it in action.
just need plenty of VRAM after that.
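The LangChain route above is one option; under the hood, ollama just serves a REST API on localhost:11434, so plain stdlib Python works too. A minimal sketch (the model name and prompt are placeholders, and `ask` assumes `ollama serve` is already running with the model pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint

def build_request(prompt: str, model: str = "llama3") -> bytes:
    """Encode the JSON payload that ollama's /api/generate endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def ask(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a locally running ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With `ollama serve` running: print(ask("Why is the sky blue?"))
```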