Training AI on RhinoCommon using RAG

Hello,

I’ve just been playing around with Ollama and Open WebUI, and I found that it is possible to augment LLMs using the RAG (retrieval-augmented generation) method. Ideally, I would like to train a small coding LLM on the RhinoCommon SDK documentation, but I couldn’t find a .chm file.

Is there any offline documentation that I could use? Has anyone tried this before?

There must be a bunch of Rhino pros out there who have tried this before, or have some sage advice on how to proceed.

Ash

Hello,
I don’t know of a single file that contains all the Rhino documentation.
Perhaps you can consider using Google’s NotebookLM or Perplexity. Both have free versions and allow you to query documents or web links. You just need to enter the index URL of the documentation web page to search all the pages.
I haven’t tested it with the Rhino documentation, but with other documentation it works very well.

You can download the offline RhinoCommon SDK documentation in the form of an XML help file from the Rhino Developer website. It works well for tools like RAG since it provides structured data.

1 Like

Here’s a giant JSON file that is used to drive all of the data on the RhinoCommon API website.

10 Likes

Thanks @kitjmv, that sounds like a good approach. It never occurred to me that Perplexity had that functionality, and I haven’t tried NotebookLM. To be honest, I am looking for something that can also work offline, because where I live Google et al. are often blocked. I should try these approaches and compare the results.

Thanks again for your advice!

@stevebaer and @Paul39 that’s brilliant!

The XML and JSON approaches sound good. It appears that they are easy to format into the chunks that RAG requires, although it might take me a little while, as I’ve not done much with these formats in the past.

There are also some documentation and samples on the McNeel GitHub page, so I’ll have a look at what I can do with those too.

Thanks again gentlemen, I do appreciate your help!

Why limit yourself to RhinoCommon when you can build your own PointNet model that works excellently with Rhino?

Just some ideas; you can do some very crazy integrations.

You can also integrate:

Result:

AI really is the future. I am collaborating with one of the biggest Italian S/Y shipyards to create a full-on drafter, way better than my first preview video; it can place symbology and more. It’s under NDA, so I can’t share a sample here.

Those are just some ideas anyway.
Hope this gives you some food for thought.
Farouk
farouk.serragedine@gmail.com for any inquiries :wink:

5 Likes

Hi Farouk,

Thanks for this. This stuff is really cool. You are obviously much further down this rabbit hole than I am. I’m slow and take a while to digest things, so it might be a few months until I have an idea of how to integrate these into my workflow; apologies in advance for my tardiness. Currently, I’m still leaning towards the idea of getting AI to generate tools to do tasks, rather than to directly generate data. However, I am happy to be proven wrong. I should drop you an email.

Thanks again, I do appreciate it.

@stevebaer, just wondering if the same kind of API document exists for the Grasshopper API? (Currently, I can only find the .chm file: grasshopper-api-docs/api/grasshopper at gh-pages · mcneel/grasshopper-api-docs · GitHub)

No, I don’t think we ever got to building out an API site for Grasshopper like we did for RhinoCommon. It is probably possible to tweak the ‘docify’ application I wrote to spit out a big JSON file for Grasshopper. You can find this project at

2 Likes

That’s fantastic! Thanks @stevebaer this is a big help. When I’m a little further along, I’ll post what I have below.

1 Like

Rhino RAG

Modelfiles.zip (676.4 KB)

Let me post my progress so far in case it’s useful to anyone. My basic goal is to have offline, local LLMs accessing the RhinoCommon documentation and either finding items within the documentation and explaining how to implement them, or using the queried information to generate code.

Elements

For software, I’m using Ollama with Open WebUI, plus MSTY for downloading .gguf files from huggingface.co (it sets them up nicely for Ollama to find).

The next step is to create a modelfile (ollama/docs/modelfile.md at main · ollama/ollama · GitHub) that describes the purpose of the LLM and its parameters. Because I am using quite basic LLMs, having a good system prompt and carefully controlling the parameters really helps to ensure that the output is of decent quality.

Attached you should find three modelfiles. One is for Codestral, which is larger (my preferred local LLM for coding); another is for Phi4, which is medium sized. I use both of these for generating code. The final one is for Granite, which is smaller but fairly robust; I use that for querying the documentation because it’s faster.
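
To give an idea of the shape of these files, here is a minimal sketch of a modelfile (the system prompt and parameter values below are illustrative, not the exact contents of the attached files):

FROM codestral:latest
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM """
You are a RhinoCommon coding assistant. Answer using the supplied documentation context, and return concise, working C# where possible.
"""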

The final components are the documents themselves, which are formatted text files with a layout that is easy for the embedding LLMs to digest. I have created two versions of the RhinoCommon documentation: one is the full version, and the other is a smaller one with just the essentials (focused around Geometry).
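
For illustration, an entry laid out something like this chunks and embeds nicely (this is just an example of the idea, not the exact format of the attached files):

Rhino.Geometry.Curve.JoinCurves
Static method. Joins a collection of curve segments into one or more continuous curves.
Signature: static Curve[] JoinCurves(IEnumerable<Curve> inputCurves)
Returns: an array of joined curves, which may be empty if joining fails.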

Creating the Models

Obviously, install the three packages above if you haven’t already. (My preferred method for open-webui is via Python rather than Docker.)

In the terminal, you need to create your models. So that would be:
ollama create NAME -f LOCATION
Where NAME is the name that you want your model to be called (e.g. Rhino_RAG), and LOCATION is the place where your modelfile is located.
I have included a little Python script app. When you run it, you can just drag and drop your modelfiles onto it, hit Create, and it will create the models for you.
Please note that if you don’t already have the LLMs downloaded, the command will automatically download the base model for you and then wrap it with the modelfile to create the new model.
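
For example, assuming the Codestral modelfile is saved in the current folder as Codestral.modelfile (the file name here is just illustrative), the command would be:

ollama create Rhino_RAG -f ./Codestral.modelfile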

Initial Settings

Let me just explain how RAG works in Open WebUI. Your prompt and the information generated get passed between different LLMs:

Your Prompt → Rewritten for the Embedding LLM → Embedding LLM gets Document Data → Reranker selects information to add to your Prompt → Prompt + Document Data given to the main model to process.

The main model is not suitable for rewriting the prompt for the embedding LLM, so we need something small and fast. In Open WebUI, we define this through the Admin Panel settings: click your name in the bottom left and select Admin Panel, then Settings at the top, then click ‘Interface’ on the left.

Here you will be able to select a ‘Local Model’, which will handle those background AI tasks for us, including rewriting the prompt for the embedding LLM. From the dropdown, choose a small model; I use Gemma3 (gemma3:4b-it-qat).

Set Up Embedding LLMs

Next, still in Admin Panel and Settings, we need to go to ‘Documents’ on the left. This is where you set up the LLMs that are going to translate (embed) your documents into a vector database. They will also be the LLMs that retrieve the data from the database and add it to the context for your main model.

Here I have used two models: Mixedbread (mxbai-embed-large:latest from Ollama) for the embedding, and a Jina reranker (jina-reranker-v1-turbo-en.f16-1746414494057) that I downloaded via MSTY.

In order to see the option for the reranker, you need to turn on ‘Hybrid’. I’ve also reduced the Chunk Size to 700 and the Chunk Overlap to 100, because that seems to match the size of the documentation entries better.

‘Top K’ defines how many results the embedding LLM should return, so I put that quite high, at 21. ‘Top K Reranker’ defines how many results the reranker should pass to your main model (you don’t want to overwhelm it); 10 seems to be a good number. ‘Relevance Threshold’ helps to trim off results that have low relevance; 0.2 has worked well for me so far.

Embedding the Documents

We embed the documents through the ‘Knowledge’ section. Go to ‘Workspace’ in the top left and click ‘Knowledge’. Press the + on the right to create a new ‘Collection’; this is like a library of documents. Give it a name and a brief description, then, inside the collection, click the + on the right to add your files. It takes a while to process, so be patient.

I have created two ‘Collections’, one with the Essentials version of the documentation, and the other for the Full version.

Giving the Document to the LLM

There are different ways to do this in Open WebUI: you can do it in the chat, which I find repetitive, or you can link it directly to the model. Since the purpose of this model is RAG, I just link it directly.

To do this, we go back to the Admin Panel and Settings. This time, select ‘Models’ and you will see a list of your models that can now be edited. Select the model that you want to attach the ‘Knowledge’ collection to, go to ‘Knowledge → Select Knowledge’, and add whichever collections you like. You can also edit other details of the model here. Keep ‘Citation’ ticked.

(You could actually define the whole model here; notice there are ‘System Prompt’ and ‘Advanced Params’ fields, which hold the same information as the modelfile. The only problem is that you then won’t be able to use the model in other apps.)

Using the Models

Now create a New Chat, select the model that you have just created from the dropdown list, and ask it a question. One of my test queries is:

Create a rounded rectangle component in Grasshopper using Rhino C#. The inputs are width, height and radius. It should have two outputs, a list of the curves, and a single curve that is made from joining the curves.

It takes a little while to answer because it has to query the documentation first, so be patient!
(You should see the updated query that your Local Model has generated for the embedding next to the spinning icon.)
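
For reference, a working answer looks roughly like the Grasshopper C# Script component body below. This is a sketch of my own rather than the model’s actual output; it assumes a recent Rhino where Curve.CreateFilletCornersCurve is available, and in real code the tolerances would come from the document.

// Inputs: width, height, radius (doubles). Outputs: Curves (list of segments), Joined (single curve).
private void RunScript(double width, double height, double radius, ref object Curves, ref object Joined)
{
  // Base rectangle on the world XY plane.
  var rect = new Rectangle3d(Plane.WorldXY, width, height);
  Curve outline = rect.ToNurbsCurve();

  // Fillet all four corners with the given radius (returns null if the radius doesn't fit).
  Curve rounded = Curve.CreateFilletCornersCurve(outline, radius, 0.01, 0.01);

  // First output: the individual line and arc segments.
  Curves = rounded.DuplicateSegments();

  // Second output: the single joined, closed curve.
  Joined = rounded;
}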

Further Uses

If you would prefer a different interface for Ollama, you can do a similar setup through MSTY, or directly in VS Code. I’ve been using the Continue extension with some success. Cline and Copilot itself also allow access to your models on Ollama, but you will need to set up the documents differently in each.

9 Likes

Have you tried the qwen3 models for this? I have been blown away by the performance of qwen3:8b, having been a bit frustrated by qwen2.5:7b previously. It does a surprisingly good job of following instructions, coding, and even running ReAct agent workflows.

1 Like

Great question! To be honest, I’ve been put off any model that has a ‘thinking’ mode because it takes so long to return an answer. In a few tests I did a while back, I was getting quicker results from Llama3.3, even though it is a 70B-parameter model that spat out ~20 words a minute. Qwen3 has been getting rave reviews, so it’s interesting that you have brought it up. That’s definitely worth exploring.

If you wanted to edit any of the modelfiles to be Qwen3 compatible, you would just have to change the first line to:

FROM qwen3:8b
1 Like

Use /no_think to disable thinking mode on Qwen3; it still emits <think> tags, but they are empty.

With “/no_think”, qwen3:30b was remarkably fast, and out of the box, without a modelfile, the results were pretty good in terms of code structure, but a bit buggy in the logic. qwen3:30b gave moderately better results than qwen3:32b.

It is also familiar with a lot of RhinoCommon, so it will use the correct methods, even if it isn’t always clear how to pass the parameters. That is not a massive problem, and it could make it useful for generating code.

For RAG, it just didn’t use the references. I think the reason is that the context window is relatively small (32k); even if it is technically big enough to take in the additional content from the embedding LLM, it is common for LLMs to get a bit overwhelmed or forgetful when the context gets larger.

What it will be useful for is code refactoring and optimization. I’m experimenting with modelfiles and context documents for that too, and will share anything that works well.

1 Like