Run LLMs Locally | Fardeem Munir

Just use GPT4All.

It comes with a nice wrapper app that you can use just like ChatGPT, with the option to download a lot more models, such as:

MPT-7B Chat from Mosaic
Stable Vicuna from StabilityAI
WizardLM

And yes, there's a Python API!

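import gpt4all

# The first run downloads the model, which takes a bit;
# it is cached locally for later use.
gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")

# Messages are a list of dicts, each with a role and content
messages = [{
  "role": "user",
  "content": "Name 3 colors"
}]

# Send the conversation to the model and keep the response
response = gptj.chat_completion(messages)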
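
If you want a back-and-forth conversation, the same list-of-dicts format can carry earlier turns too. Here's a rough sketch of a tiny history helper. Note that `add_turn` is a name I made up for illustration, the "assistant" and "system" roles are an assumption borrowed from the ChatGPT-style convention, and the reply text is a placeholder, not real model output.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    # Assumed role names (ChatGPT-style convention); only "user" appears in the post.
    if role not in ("user", "assistant", "system"):
        raise ValueError(f"unexpected role: {role}")
    return history + [{"role": role, "content": content}]


messages = add_turn([], "user", "Name 3 colors")
# Placeholder reply standing in for whatever the model actually said.
messages = add_turn(messages, "assistant", "Red, green, blue.")
messages = add_turn(messages, "user", "Which one is the warmest?")

# `messages` now holds three turns and can be passed to
# gptj.chat_completion(messages) the same way as the single-turn list.
```

Returning a new list instead of mutating the old one makes it easy to branch a conversation and try different follow-ups from the same point.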

I was worried about inference speed, but on my M1 MacBook it was pretty fast. It didn't feel considerably slower than ChatGPT.
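
If you'd rather measure than go by feel, here's a minimal sketch for timing a single call with the standard library. `time_call` and `fake_model` are hypothetical names; `fake_model` just sleeps so the snippet runs without downloading anything, and you would swap in your real model call.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_model(messages):
    # Stand-in for a real model call; sleeps briefly to simulate inference.
    time.sleep(0.05)
    return "Red, green, blue."


reply, seconds = time_call(fake_model, [{"role": "user", "content": "Name 3 colors"}])
print(f"Got a reply in {seconds:.2f}s")

# With the real model, something like:
#   reply, seconds = time_call(gptj.chat_completion, messages)
```

Keep in mind that the very first call may include the model download and load time, so time a second call to get a fair number.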

Happy LLM-ing!