Rhino RAG
Modelfiles.zip (676.4 KB)
Let me post my progress so far in case it’s useful to anyone. My basic goal is to have offline, local LLMs access the RhinoCommon documentation and either find items within the documentation and explain how to implement them, or use the queried information to generate code.
Elements
For software, I’m using Ollama with Open WebUI, plus MSTY for downloading .gguf files from huggingface.co (it sets them up nicely for Ollama to find).
Next, I create a modelfile (ollama/docs/modelfile.md at main · ollama/ollama · GitHub) that describes the purpose of the LLM and its parameters. Because I am using quite basic LLMs, having a good system prompt and carefully controlling the parameters really helps to ensure that the output is of decent quality.
Attached you should find three modelfiles. One is for Codestral, which is larger (my preferred local LLM for coding); another is for Phi4, which is medium-sized. I use both of these for generating code. The final one is for Granite, which is smaller but fairly robust; I use that for querying the documentation because it’s faster.
The final components are the documentation files themselves, which are formatted text documents with a layout that is easy for the embedding LLMs to digest. I have created two versions of the RhinoCommon documentation: one is the full version and the other is a smaller one covering just the essentials (focused on Geometry).
Creating the Models
Obviously, install the three packages above if you haven’t already. (My preferred method for Open WebUI is the Python/pip install rather than Docker.)
In the terminal, you need to create your models. So that would be:
ollama create NAME -f LOCATION
Where NAME is the name that you want your model to be called (e.g. Rhino_RAG), and LOCATION is the path to your modelfile.
I have included a little Python script app. When you run it, you can just drag and drop your modelfiles onto it, hit Create, and it will create the models for you.
Please note that if you don’t already have the base LLM downloaded, the command will download it for you automatically and then wrap it with the modelfile to create the new model.
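If you prefer to script this step yourself, the sketch below shows the general idea: write a modelfile to disk and register it with `ollama create`. This is not the attached app, and the base model, parameter and system prompt shown are just placeholders, not the contents of the attached modelfiles.

```python
# Minimal sketch of scripted model creation. The modelfile contents below are
# placeholders, not the ones from the attached modelfiles.
import subprocess
import tempfile
from pathlib import Path

MODELFILE = """
FROM phi4
PARAMETER temperature 0.2
SYSTEM You answer questions about the RhinoCommon API using the documentation supplied in the context.
"""

def create_model(name: str, modelfile: str) -> None:
    """Save the modelfile and run: ollama create NAME -f LOCATION."""
    path = Path(tempfile.gettempdir()) / f"{name}.modelfile"
    path.write_text(modelfile, encoding="utf-8")
    subprocess.run(["ollama", "create", name, "-f", str(path)], check=True)

if __name__ == "__main__":
    create_model("Rhino_RAG", MODELFILE)
```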
Initial Settings
Let me just explain how RAG works in Open WebUI. Your prompt and the generated information get passed between different LLMs:
Your Prompt → Rewritten for the Embedding LLM → Embedding LLM gets Document Data → Reranker selects information to add to your Prompt → Prompt + Document Data given to the main model to process.
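In rough pseudo-code, that flow looks something like the sketch below. It is purely conceptual: every helper here is a dummy stand-in, not how Open WebUI is actually implemented, although the numbers 21, 10 and 0.2 match the settings I describe further down.

```python
# Conceptual sketch of the RAG flow above. The helper functions are dummy
# stand-ins; Open WebUI handles each of these steps with real models.

def task_model_rewrite(prompt: str) -> str:
    return prompt  # stand-in for the small "Local Model" rewriting the query

def embed(text: str) -> list[float]:
    return [float(len(text))]  # stand-in for the embedding model

def vector_db_search(vector: list[float], top_k: int) -> list[str]:
    return ["chunk"] * top_k  # stand-in for the vector database lookup

def rerank(query: str, chunks: list[str], top_k: int, threshold: float) -> list[str]:
    return chunks[:top_k]  # stand-in for the reranker

def main_model(prompt: str, context: list[str]) -> str:
    return f"answer to {prompt!r} using {len(context)} chunks"  # stand-in

def answer(user_prompt: str) -> str:
    search_query = task_model_rewrite(user_prompt)           # 1. rewrite prompt
    candidates = vector_db_search(embed(search_query), 21)   # 2. Top K retrieval
    context = rerank(search_query, candidates, 10, 0.2)      # 3. rerank and trim
    return main_model(user_prompt, context)                  # 4. final answer

if __name__ == "__main__":
    print(answer("How do I join curves in RhinoCommon?"))
```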
The main model is not suitable for rewriting the prompt for the Embedding LLM, so we need to find something small and fast. In Open WebUI, we define this through the Admin Panel settings. Click your name in the bottom left and select Admin Panel, then Settings at the top, then click ‘Interface’ on the left.
Here you will be able to select a ‘Local Model’, which will do all of those background AI things for us, including giving the prompt to the embedding LLM. From the dropdown choose a small model. I use Gemma3 (gemma3:4b-it-qat).
Set Up Embedding LLMs
Next, still in Admin Panel and Settings, we need to go to ‘Documents’ on the left. This is where you set up the LLMs that are going to translate (embed) your documents into a vector database. They will also be the LLMs that retrieve the data from the database and add it to the context for your main model.
Here I have used two models: Mixedbread (mxbai-embed-large:latest from Ollama) for the embedding, and the Jina reranker (jina-reranker-v1-turbo-en.f16-1746414494057), downloaded via MSTY, for the reranking.
In order to see the option for the reranker, you need to turn on ‘Hybrid’. I’ve also reduced the Chunk Size to 700 and the Overlap to 100, because that seems to match the size of the documentation entries better.
‘Top K’ defines how many results the embedding LLM should return, so I put that quite high at 21. ‘Top K Reranker’ defines how many results the reranker LLM should give to your main model (you don’t want to overwhelm it); 10 seems to be a good number. ‘Relevance Threshold’ helps to trim off results that have a low relevance; 0.2 has worked well for me so far.
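To make the chunk settings a little more concrete, here is a rough sketch of what the splitting and embedding step amounts to. Open WebUI does all of this for you; the snippet only illustrates the idea, calling Ollama’s /api/embeddings endpoint with mxbai-embed-large, and the file name is just a placeholder.

```python
# Rough illustration of chunking with overlap, then embedding each chunk with
# Ollama. Open WebUI performs the equivalent steps internally.
import requests

CHUNK_SIZE = 700  # matches the Chunk Size setting (characters, in this sketch)
OVERLAP = 100     # matches the Overlap setting

def split_into_chunks(text: str) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + CHUNK_SIZE])
        start += CHUNK_SIZE - OVERLAP
    return chunks

def embed(chunk: str) -> list[float]:
    # mxbai-embed-large is the embedding model chosen above.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": chunk},
    )
    r.raise_for_status()
    return r.json()["embedding"]

if __name__ == "__main__":
    # Placeholder file name; point this at your own formatted documentation file.
    text = open("RhinoCommon_Essentials.txt", encoding="utf-8").read()
    vectors = [embed(c) for c in split_into_chunks(text)]
    print(f"Embedded {len(vectors)} chunks")
```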
Embedding the Documents
We embed the documents through the ‘Knowledge’ section. Go to ‘Workspace’ in the top left and you will see ‘Knowledge’; click that. Press the + on the right to create a new ‘Collection’, which is like a library of documents. Give it a name and a brief description, then, inside the collection, click the + on the right to add your files. It takes a while to process, so be patient.
I have created two ‘Collections’, one with the Essentials version of the documentation, and the other for the Full version.
Giving the Document to the LLM
There are different ways to do this in Open WebUI: you can do it in the chat, which I find repetitive, or you can link the knowledge directly to the model. Since the purpose of this model is RAG, I just link it directly.
To do this, go back to the Admin Panel and Settings. This time, select ‘Models’ and you will see a list of your models that you can now edit. Select the model that you want to attach the ‘Knowledge’ database to, go to ‘Knowledge → Select Knowledge’ and add whichever collections you like. You can also edit other details of the model here. Keep Citation ticked.
(You could actually define the whole model here; notice there are ‘System Prompt’ and ‘Advanced Params’ fields, which hold the same information as the modelfile. The only problem is that you then won’t be able to use the model in other apps.)
Using the Models
Now create a New Chat, select the model that you have just created from the drop down list and ask it a question. One of my test queries is:
Create a rounded rectangle component in Grasshopper using Rhino C#. The inputs are width, height and radius. It should have two outputs, a list of the curves, and a single curve that is made from joining the curves.
It takes a little while to answer because it has to query the documentation first, so be patient!
(You should see the updated query that your Local Model has generated for the embedding next to the spinning icon.)
Further Uses
If you would prefer a different interface for Ollama, you can do a similar setup through MSTY, or directly in VS Code. I’ve been using the Continue extension with some success. Cline and Copilot itself can also access your models on Ollama, but you will need to set up the documents differently in each.