It could be even easier: we implemented a two-click-install, open-source local AI manager (plus RAG and other nice extras) for Windows / Mac / Linux. You can check it out at shinkai.com or browse the code at https://github.com/dcspark/shinkai-apps
What exactly does the P2P network do? Does my node communicate with the nodes of strangers?
Only when you aren't looking.
Depending on your internet speed, llamafile can get you going even faster. Go to https://github.com/Mozilla-Ocho/llamafile , find the Quickstart section, and have fun. Scroll down a bit further for more models.
I use LM Studio https://lmstudio.ai/ for my lazy setups. The ten minutes goes to downloading the actual models.
Any M-series mac with 16GB+ can do this also.
Any PC regardless of CPU architecture with 16GB+ can do this, right?
You're supposed to act impressed that they recommended an Apple product.
And yes, but there's no lower limit on the memory; it's entirely dependent on the model or kernel size.
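The model-size dependence above is easy to estimate: the weights dominate, at roughly parameter count times bits per weight. A rough sketch (the function name and the weights-only simplification are mine; KV cache and runtime overhead add more on top):

```python
def model_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-only memory footprint in GB (ignores KV cache and overhead)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model at 4-bit quantization fits comfortably in 8 GB;
# the same model at fp16 wants a 16 GB machine for the weights alone.
print(model_ram_gb(7, 4))   # 3.5 (GB)
print(model_ram_gb(7, 16))  # 14.0 (GB)
```

So "16GB+" is a comfortable default for 7B-class models, not a hard floor.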
I only have some light use cases, so I use a cheap laptop (<$250) with a Ryzen APU and 8 GB of soldered/shared RAM. I added a 16 GB RAM stick, booted a modded BIOS from GitHub off a USB stick, and increased the UMA buffer to 8 GB. I had Stable Diffusion working; it was slow, but I'm pretty sure it's faster than CPU/RAM (about 2 minutes for a 512x512 image at 20-25 steps).
Not really, no. You want to use the GPU, not the CPU. Macs are neat here since they can use their shared memory at rather high bandwidth. Even where the GPU is much slower and the RAM is much worse than proper VRAM (and ridiculously overpriced at that), the bottlenecks are often RAM amount and bandwidth.
Having run ollama on CPU: Yes, it's just slower. Not even intolerably slow IMO, though I used small models and don't mind some turnaround time.
I've seen llamafile go about 10x faster on CPU if you try it.
Really! Same model and everything? I guess I need to go benchmark them - 10x would legit obviate GPU for me
I've run it on a Pi 5 with 8 GB, and get about a token a second.
M-series are a LOT faster :)
Even my M1/16GB gets decent speeds. 7+ tokens/second with llama3
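Those token rates line up with the memory-bandwidth point made earlier: generating each token streams roughly the whole set of weights through memory once, so bandwidth divided by model size gives a ceiling on tokens/sec. A back-of-envelope sketch (the bandwidth and model-size figures are illustrative assumptions, not measurements):

```python
def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    return bandwidth_gb_s / model_gb

# Assumed illustrative numbers: a 7B model at 4-bit is ~4 GB;
# dual-channel DDR4 manages ~50 GB/s, M1 unified memory ~68 GB/s.
print(max_tokens_per_sec(4.0, 50.0))  # 12.5 tok/s ceiling (CPU + DDR4)
print(max_tokens_per_sec(4.0, 68.0))  # 17.0 tok/s ceiling (M1)
```

Real throughput lands below these ceilings, which is consistent with the ~7 tokens/second reported above for an M1.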
ollama. Download a few models. Bob's your uncle. Simple as.
Better still, you can then use Python to call it via LangChain's ChatOllama and build anything you want, with a little help from Claude and ChatGPT, or CodeQwen if you want to do it all locally.
absolute AI don. impress the ladies with that one, you'll be beating them off with a stick when they see it in action.
just need plenty of VRAM after that.
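The LangChain route above is one option; under the hood, ollama just serves a REST API on localhost:11434, so plain stdlib Python works too. A minimal sketch (the model name and prompt are placeholders, and `ask` assumes `ollama serve` is already running with the model pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint

def build_request(prompt: str, model: str = "llama3") -> bytes:
    """Encode the JSON payload that ollama's /api/generate endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def ask(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a locally running ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With `ollama serve` running: print(ask("Why is the sky blue?"))
```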