Just use GPT4All.
It comes with a nice wrapper app that you can use just like ChatGPT, with the option to download a lot more models.
And yes, there's an API!
import gpt4all
# The first run takes a while because it downloads the model
# The model is cached for later use
gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
messages = [{
    "role": "user",
    "content": "Name 3 colors"
}]
gptj.chat_completion(messages)
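If you only want the reply text, here's a rough sketch of how you might pull it out. I'm assuming chat_completion returns an OpenAI-style dict with a "choices" list, which may not hold for your version of the bindings, so check what you actually get back. The helper names user_message and extract_reply are made up for this example and are not part of gpt4all.

```python
def user_message(prompt):
    """Wrap a prompt in the role/content format used above."""
    return {"role": "user", "content": prompt}


def extract_reply(response):
    """Return the assistant's text from an OpenAI-style response dict.

    Assumption: the dict looks like
    {"choices": [{"message": {"role": "assistant", "content": "..."}}]}.
    Returns None if that shape is not present.
    """
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written sample for illustration only, not real model output.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Red, green, blue"}}
    ]
}
print(extract_reply(sample))
```

With a loaded model you would call something like extract_reply(gptj.chat_completion([user_message("Name 3 colors")])). If that returns None, the response has a different shape than assumed here.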
I was worried about inference speed, but on my M1 MacBook it was pretty fast. It didn't feel considerably slower than ChatGPT.
Happy LLM-ing!