nijaar 4 days ago

It could be even easier: we built a two-click-install, open-source local AI manager (+RAG and other cool stuff) for Windows / Mac / Linux. You can check it out at shinkai.com or look at the code at https://github.com/dcspark/shinkai-apps

  • dr_kiszonka 4 days ago

    What exactly does the P2P network do? Does my node communicate with the nodes of strangers?

sgt101 4 days ago

Any M-series Mac with 16GB+ can also do this.

  • actionfromafar 4 days ago

    Any PC with 16GB+, regardless of CPU architecture, can do this, right?

    • washadjeffmad 4 days ago

      You're supposed to act impressed that they recommended an Apple product.

      And yes, but there's no lower limit on the memory; it's entirely dependent on the model or kernel size.

    • gentile 4 days ago

      I only have some light use cases, so I use a cheap laptop (<$250) with a Ryzen APU and 8GB of soldered/shared RAM. I then added a 16GB RAM stick, booted off a USB with a BIOS from GitHub, and increased the UMA buffer to 8GB. I got Stable Diffusion working; it was slow, but I'm pretty sure it's faster than running on CPU/RAM (about 2 minutes for a 512x512 image at 20-25 steps).

    • diffeomorphism 4 days ago

      Not really, no. You want to use the GPU, not the CPU. Macs are neat here since they can use shared memory with rather high bandwidth. So even if the GPU is much slower, the RAM is worse than proper VRAM, and it's ridiculously overpriced for what it is... often the bottlenecks are RAM amount and bandwidth, and that's exactly where the shared memory helps.
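
      A rough back-of-envelope (all numbers below are assumptions, not benchmarks) shows why bandwidth tends to dominate: generating a token means streaming essentially all of the weights through memory once, so bandwidth divided by model size gives an upper bound on tokens/sec.

        # Back-of-envelope estimate: token generation is usually memory-bandwidth-bound,
        # since each generated token requires reading roughly all model weights once.
        # Bandwidth figures and model size are illustrative assumptions, not measurements.
        model_size_gb = 4.7  # e.g. an 8B model at 4-bit quantization

        bandwidth_gb_s = {
            "dual-channel DDR4 (typical PC)": 50,
            "Apple M1 unified memory": 68,
            "Apple M1 Max unified memory": 400,
            "discrete GPU with GDDR6 VRAM": 450,
        }

        for name, bw in bandwidth_gb_s.items():
            # Upper bound: one full pass over the weights per generated token.
            print(f"{name}: ~{bw / model_size_gb:.0f} tokens/s upper bound")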

      • yjftsjthsd-h 4 days ago

        Having run ollama on CPU: Yes, it's just slower. Not even intolerably slow IMO, though I used small models and don't mind some turnaround time.

        • jart 4 days ago

          I've seen llamafile go about 10x faster on CPU; give it a try.

          • yjftsjthsd-h 4 days ago

            Really! Same model and everything? I guess I need to go benchmark them - 10x would legit obviate the GPU for me.
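
            Something like this (assuming the default Ollama port and a model that's already pulled) would at least give a rough tokens/sec figure to compare against whatever llamafile reports for the same prompt:

              # Ask a locally running Ollama server (default port 11434) for a completion
              # and compute tokens/sec from the timing fields it returns.
              # Assumes `ollama pull llama3` has been run and `requests` is installed.
              import requests

              resp = requests.post(
                  "http://localhost:11434/api/generate",
                  json={"model": "llama3", "prompt": "Explain RAID 5 briefly.", "stream": False},
              ).json()

              # eval_count = generated tokens, eval_duration is in nanoseconds.
              print(resp["eval_count"] / (resp["eval_duration"] / 1e9), "tokens/s")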

  • MarcScott 4 days ago

    I've run it on a Pi 5 with 8GB and get about a token a second.

    • theshrike79 4 days ago

      M-series are a LOT faster :)

      Even my M1/16GB gets decent speeds: 7+ tokens/second with Llama 3.

jackdawipper 4 days ago

Ollama. Download a few models. Bob's your uncle. Simple as.

Better still, you can then use Python to call it via LangChain's ChatOllama and build anything you want, with a little help from Claude and ChatGPT (or CodeQwen if you want to do it all locally). Rough sketch at the end of this comment.

Absolute AI don. Impress the ladies with that one; you'll be beating them off with a stick when they see it in action.

You just need plenty of VRAM after that.
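
Sketch of the Python side, assuming the langchain-ollama package (pip install langchain-ollama) and a model you've already pulled with ollama pull llama3:

    # Minimal LangChain + Ollama chat call. Package name and model tag are
    # assumptions; swap in whatever you have pulled locally.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model="llama3", temperature=0.7)

    reply = llm.invoke("Write a haiku about running LLMs locally.")
    print(reply.content)

From there it's standard LangChain plumbing if you want to chain prompts or bolt retrieval on top.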