[FEEDBACK] Performance: Please optimize the code core by using multi-core and data oriented cache allocation layout

AlanMattano · November 27, 2019, 3:08am

I was mirroring this complex model, which amounts to multiplying x by -1 along one axis. I noticed that the Mirror tool is pretty fast at creating the preview instance in the user interface, as if it were all in cache. I can rotate it and place it where I want with no problem. The problem arrives when applying the transformation: it takes a minute or several minutes (i5 9600K, 32 GB RAM, 1080 Ti). Applying looks several orders of magnitude slower than previewing. I understand that there is a lot going on under the hood, and Rhino has always behaved this way, but it does not look optimized. Copy and paste suffers from the same behavior, and probably Scale too (*10). I can imagine that at some point I will be unable to work because applying a transformation takes too much time, and I will give up.

And after that, I waited about a minute until the mirror was done.

I’m pretty sure (or at least it feels like) this can be optimized by laying the data out in memory and applying the transformation only to what is needed. You are probably working on that already. Also, consider multithreading for the 128-core CPUs to come: let the user keep working on the draft model while Rhino uses all the other cores to apply the transformations to the final model, so that it does not freeze.

Thanks for reading. Rhino is super awesome.

JimCarruthers · November 27, 2019, 4:07am

“Content creation” as the kids call it these days just isn’t inherently multithread-able, outside of very specific parts of very specific operations that can easily be split into independent units. You’re mostly doing this thing then that thing then the other thing, 9 women can’t make a baby in 1 month. There’s basically zero chance of ever seeing massive general speedups for any kind of ‘productivity’ application from parallelization, only being able to have more stuff going on at once.

FYI, copying and pasting is the same thing as exporting and importing a file; that’s literally what it’s doing.

Your Rhino model is at its core a kind of database, and it’s not possible (I’m sure there’s an asterisk there, but what Google does with thousands of distributed servers isn’t really applicable to this) to let more than one ‘thing’ happen to it at a time while making sure it won’t blow up and implementing Undo. The quick preview you see for something like that Mirror is only working on a fraction of the data that has to be copied to make a new object, and it’ll be forgotten the moment you hit Enter.

AlanMattano · November 28, 2019, 2:35am

Hi Jim, Nice car airplane. I was doing something very similar (tailless) at school in the 90s. Then at Pininfarina we tried to sell it, but after thinking harder about the idea of having an airplane on the road, we gave up. An airplane is better kept intact in a safe environment, not exposed to a hostile environment like the street. Safety first. Still, the evolution of materials will tell.

Thanks for the answer. I appreciate this deeper look inside the code.

Yes, there will be more cache levels, higher IPC and more Hz, but Moore’s law no longer holds. ALMA tells me different. That progress is unstoppable. From now on, CPU evolution will mostly mean more cores (thousands) per chip and more chip clusters inside one single main CPU. It can also grow vertically. So Rhino must implement multi-core, just as games are starting to do. I’m a certified C# game developer (with a little experience) making a multi-core game. If it does not, the cores will sit there doing nothing, and Rhino in 2020 will be using about 5 or even 1 percent of the full capacity of an advanced CPU. In other words, a mirror will always take minutes unless the programmers change the code’s data layout.

This kind of exporting and importing takes too much time for each function. I’m using an M.2 PCIe SSD capable of 3 GB/s, and the file is just 43 MB, so the problem is not there. What does this import-export mean? That a lot of data is moving from RAM to the CPU? Where is the bottleneck?

Also, I notice that Rhino is not using 100% of that single core. Is it moving all the data, including the whole undo history, to that single CPU core?

The quick preview could be kept alive instead of being deleted after pressing Enter. Then I could orbit around my model and inspect the result right after applying the function, or hit Undo, while Rhino performs (or cancels) the operation on a second core. This quick preview should be the way of working and inspecting, not something forgotten after we hit Enter, with Rhino carrying out the operations one after another on another thread. At some point, when it catches up, it replaces the preview with the new object. We spend a lot of time inspecting the object, so it is frustrating that I can’t inspect it when I hit Enter. And I know that after waiting a minute for the result (which will look exactly the same as the forgotten preview!) I will often press Undo, because users frequently repeat the same thing several times.

It looks like it is manipulating too much data for what are just simple arithmetic operations on vectors.
A better option would probably be to extract only what needs to be manipulated, put it in order, and run the calculations on that alone, making better use of the stack and the heap, rather than working on the entire data set (data-oriented code design).
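As a rough illustration of the data-oriented idea described above, the following Python sketch contrasts two layouts for the same vertices. The class and function names are hypothetical and say nothing about how Rhino stores geometry; it shows only the layout idea and is not a benchmark (the cache benefit matters mostly in languages such as C, C++ or C# that control memory layout directly).

```python
from array import array


class Vertex:
    """Hypothetical 'array of structures' element: one object per vertex,
    carrying data (normal, color) that a mirror on x does not need."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
        self.normal = (0.0, 0.0, 1.0)
        self.color = (255, 255, 255)


def mirror_x_aos(vertices):
    """Negate x by visiting every vertex object in turn."""
    for v in vertices:
        v.x = -v.x


class VertexBuffers:
    """Hypothetical 'structure of arrays': one contiguous buffer per coordinate."""

    def __init__(self, points):
        self.xs = array("d", (p[0] for p in points))
        self.ys = array("d", (p[1] for p in points))
        self.zs = array("d", (p[2] for p in points))


def mirror_x_soa(buffers):
    """Negate x by rewriting only the x buffer; ys and zs are never read."""
    buffers.xs = array("d", (-x for x in buffers.xs))


points = [(1.0, 2.0, 3.0), (-4.0, 5.0, 6.0)]
aos = [Vertex(*p) for p in points]
soa = VertexBuffers(points)
mirror_x_aos(aos)
mirror_x_soa(soa)
```

In the second layout the operation touches a single packed array of doubles, which is the "extract only what is needed and put it in order" step in concrete form.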

AlanMattano · November 28, 2019, 3:42am

I also found this answer from @dale

Hi Robin,

Rhino is not a “multi-threaded” application. Well, it does split off a few minor processes to other cores but nothing major. That’s because modeling is a serial process. Modeling has to be done “in order.”

Let’s use the example of a box with filleted edges in a shaded display. The render mesh needed for the shaded display can’t be generated until after the edges of the box are filleted, and the fillets themselves can’t be made until after the box is created. First the box is made, then the edges are filleted, then the render mesh created. You can’t put the box creation in one thread, the filleting in a second, and the mesh generation in a third and run all three processes at the same time. They have to be done one at a time, in the right order.

Some tasks in computer work can be multi-threaded. Rendering is a good example. Since an array of pixels are being generated into an image, the image can be broken into 4 quadrants, and each processor can work on one quadrant independently.
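The quadrant example can be sketched in a few lines of Python. The `shade` function and the 8x8 image size are placeholders invented for illustration; the point is only that each pixel depends on nothing but its own coordinates, so the four regions can be computed in any order and merged afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 8


def shade(x, y):
    """Hypothetical per-pixel function; depends only on its own coordinates."""
    return (x * 31 + y * 17) % 256


def render_region(region):
    """Render one rectangle (x0, y0, x1, y1) and return pixels keyed by (x, y)."""
    x0, y0, x1, y1 = region
    return {(x, y): shade(x, y) for y in range(y0, y1) for x in range(x0, x1)}


def render_serial():
    """Render the whole image in one pass."""
    return render_region((0, 0, WIDTH, HEIGHT))


def render_in_quadrants():
    """Split the image into four quadrants and render them on four workers."""
    hx, hy = WIDTH // 2, HEIGHT // 2
    quadrants = [
        (0, 0, hx, hy), (hx, 0, WIDTH, hy),
        (0, hy, hx, HEIGHT), (hx, hy, WIDTH, HEIGHT),
    ]
    image = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        for part in pool.map(render_region, quadrants):
            image.update(part)
    return image
```

Both functions produce the same image. The box, fillet, mesh chain has no such split: each step takes the previous step's output as its input, so there is nothing independent to hand to a second worker. (In CPython, threads like these illustrate the structure rather than deliver a real speedup for pure-Python arithmetic.)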

With this all said, we are and will continue to look for areas where we can increase performance by threading a calculation or an operation.

Thanks,

– Dale

But it does not convince me. I can understand that for modeling. But if you mirror a mesh, you need just a few ms to multiply the mesh points by -1 along one axis. The same applies to moving a polysurface object. You do not need to rebuild the entire object from scratch each time to keep things non-destructive. And if you do need to, you can do it in the background on a second core, laying out the data so that it uses 100% of that core.

In other words, I wish Rhino did not forget that object the moment you hit Enter, and eventually let you continue working.
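For reference, the vertex arithmetic of a mesh mirror looks roughly like the Python sketch below. It is a minimal illustration with made-up function names, not Rhino's implementation, and it covers only the geometry, not whatever bookkeeping a modeler does when the new object is added to the document. One detail beyond "multiply x by -1": a reflection reverses orientation, so the triangle winding has to be swapped to keep face normals consistent.

```python
def mirror_mesh_x(vertices, triangles):
    """Return a mirrored copy of a triangle mesh across the YZ plane (x -> -x).

    vertices  -- list of (x, y, z) tuples
    triangles -- list of (i, j, k) vertex-index tuples
    """
    mirrored_vertices = [(-x, y, z) for (x, y, z) in vertices]
    # A reflection reverses orientation, so swap two indices per triangle
    # to keep the faces pointing the same way relative to the surface.
    mirrored_triangles = [(i, k, j) for (i, j, k) in triangles]
    return mirrored_vertices, mirrored_triangles


def face_normal(vertices, tri):
    """Unnormalised face normal from the cross product of two edges."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = (vertices[n] for n in tri)
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2)]
m_verts, m_tris = mirror_mesh_x(verts, tris)
```

For this flat triangle in the XY plane, the face normal points along +z both before and after the mirror; without the index swap it would flip to -z.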

JimCarruthers · November 28, 2019, 1:44pm

Of course that’s not necessarily what it does, you don’t know if it uses that or not. What you’re missing is that applying a transform to a mesh is NOT updating the model database, it’s a minuscule fraction of that work.

The oft-requested dream of having tasks churn away in the background while you do something else isn’t likely to happen. How is Rhino supposed to stop the database from breaking if you change something else (remember, Rhino has a plugin architecture that lets people do anything they want at any time), and how much would the added overhead be compared with just letting the task finish or finding other ways to optimize?
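The consistency problem raised here can be made concrete with a toy model. The `Document` class below is entirely hypothetical and is not how Rhino works; it sketches one common approach (a revision counter checked at commit time) to show the kind of overhead background work adds: every result computed from a snapshot has to be validated, and is discarded if anything else changed the document in the meantime.

```python
import threading


class Document:
    """Hypothetical stand-in for a model database; not Rhino's actual design."""

    def __init__(self, points):
        self._lock = threading.Lock()
        self.points = list(points)
        self.revision = 0

    def snapshot(self):
        """Return the current revision and a private copy of the data."""
        with self._lock:
            return self.revision, list(self.points)

    def edit(self, new_points):
        """Any foreground edit (user or plugin) bumps the revision."""
        with self._lock:
            self.points = list(new_points)
            self.revision += 1

    def commit_if_unchanged(self, base_revision, new_points):
        """Accept background work only if nothing changed since the snapshot."""
        with self._lock:
            if self.revision != base_revision:
                return False  # stale result: the work is thrown away
            self.points = list(new_points)
            self.revision += 1
            return True


def background_mirror(doc, base_revision, points, outcome):
    """Worker: mirror the snapshot on x, then try to commit the result."""
    mirrored = [(-x, y, z) for (x, y, z) in points]
    outcome.append(doc.commit_if_unchanged(base_revision, mirrored))


doc = Document([(1.0, 0.0, 0.0)])
outcome = []

# Case 1: nothing else touches the document while the worker runs.
rev, pts = doc.snapshot()
worker = threading.Thread(target=background_mirror, args=(doc, rev, pts, outcome))
worker.start()
worker.join()

# Case 2: a foreground edit lands between the snapshot and the commit.
rev, pts = doc.snapshot()
doc.edit([(5.0, 5.0, 5.0)])
worker = threading.Thread(target=background_mirror, args=(doc, rev, pts, outcome))
worker.start()
worker.join()
```

The first commit succeeds and the second is rejected, leaving the foreground edit in place. A real application would also have to decide what Undo means for a rejected or half-finished task, which this sketch does not attempt.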

AlanMattano · November 28, 2019, 9:41pm

Working with a preview can be very handy. I understand the difficulty but not the problem. I think it is possible. A preview that works with a fraction of the data can be handy.

For example, if you copy and paste a mesh with just 8000 triangular polygons, it takes 10 seconds (i5 9600K at 4.8 GHz). And this mesh is freshly imported. If I take out the material, it takes around 8 seconds to copy the 8k polygons. This instance copy should take only about 10 milliseconds in the interface (or on the GPU). Then the user can move the instance and see it in its final position; meanwhile you can orbit around, think, and make a new decision, and by that point the 8-second copy will probably be done. Then you run the next command or script. If the copy (in the background) has not finished updating the instance, Rhino would freeze until the copy finishes and then start the new action.

Just making a mirror makes this problem and its solution pretty obvious.

AlanMattano · November 28, 2019, 10:04pm

I’m starting to think that my slowdown could be a bug related to normal map textures or a material problem. If I save small and take out everything else, the mesh is not the problem and Rhino is pretty fast.

3DM files
V1 copy and paste slow (includes a normal map texture)
V2 copy and paste fast (does not include a normal map texture)

PolyReductionV1.3dm (450.4 KB)
PolyReductionV2.3dm (410.5 KB)

Gijs · November 28, 2019, 11:09pm

no difference here between the models. can you post the normal map?

AlanMattano · November 28, 2019, 11:15pm

OK, it looks like it was an 8k normal map. I’m unable to upload that one. I use 8k occasionally.
Here is a 1K one.

Gijs · November 28, 2019, 11:27pm

no strange things here… copying 144 tires took about 5 secs, pasting 2 secs

AlanMattano · November 28, 2019, 11:28pm

upscale this normal map to 8k and copy one mesh

Gijs · November 28, 2019, 11:34pm

ok, some strange things: when I do that, no matter how many I copy (1 or many), it takes about 4 seconds to paste. If however I make a copy with gumball, the copy/paste (move with copy) is always instant

AlanMattano · November 28, 2019, 11:41pm

Are you using 8k texture map?
And if you make a mirror?

Gijs · November 28, 2019, 11:56pm

sorry, I was using 4K… at 8k it’s about 10 seconds to paste

Mirror goes instantly, same as gumball copy, also at 8k

nathanletwory · November 29, 2019, 5:20am

Copy (CopyToClipboard) also takes along all the materials in the document. If there are textures, those are embedded with that copy. Huge textures make that extra slow.

Note that 1k -> 8k is an increase of 64 times the data…

There are some bugs in our tracker related to this copy&paste problem. I’ve nudged them a little.
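The factor of 64 follows from the pixel count growing with the square of the side length. A quick check in Python, assuming "1k" and "8k" mean square textures of 1024 and 8192 pixels per side, stored uncompressed as 8-bit RGBA (the actual file format and compression of the normal map in question are not stated in the thread):

```python
def texture_bytes(side_pixels, channels=4, bytes_per_channel=1):
    """Uncompressed size in bytes of a square texture (assumed 8-bit RGBA)."""
    return side_pixels * side_pixels * channels * bytes_per_channel


MIB = 1024 * 1024

one_k = texture_bytes(1024)      # 4 MiB
eight_k = texture_bytes(8192)    # 256 MiB
ratio = eight_k // one_k         # 8 * 8 = 64
```

Under those assumptions a single 8k map is 256 MiB of raw pixel data, compared with 4 MiB at 1k.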

Gijs · November 29, 2019, 7:48am

@nathanletwory both seem to make no sense to me. Why is it needed to copy all materials instead of just the material(s) of the object? And copying the textures seems completely unnecessary as these are always referenced.
Also what’s odd is that copying doesn’t take much time, it’s the paste that is becoming very slow.

wim · November 29, 2019, 8:15am

Hoi Gijs,

as Nathan wrote, the current copy & paste behavior is considered to be a bug - RH-47093.

Gijs · November 29, 2019, 8:43am

@wim it seems a different issue. Here copy is not slow but paste is. Also there is no mention in that youtrack about ALL materials being copied?

wim · November 29, 2019, 9:32am

@Gijs - it seems to me that the copy and paste terms are used somewhat interchangeably…
The RH-47093 issue was marked “Related to” RH-25208 (“Paste Takes Forever”) that is visible to developers only:

In that issue, I take “737 unused materials” to mean ALL materials.
-wim
