Threading issue with Reduce Mesh?

Holo · January 8, 2015, 7:46pm

I tested Holomark on a dual Xeon X5650 machine that I just purchased, and the Reduce Mesh test baffled me: it ran at half the speed of my i7, AND it used 100% of all 24 threads…

I also had it tested on an i7-4930K.
Edit: and it is 3x slower than my old i7 950.
No Rhino plugins are installed on either machine.

CPU scores: 4756
CPU_01 - 16.22 sec - Booleans and Contours
CPU_02 - 2.19 sec - Twist and Taper (UDT)
CPU_03 - 7.35 sec - Meshing Mini
CPU_04 - 0.03 sec - Extract Render Mesh
CPU_05 - 0.06 sec - Join Render Mesh
CPU_06 - 67.24 sec - Reduce Mesh
CPU_07 - 6.14 sec - Calculating Technical display
CPU_08 - 5.88 sec - Making Silhouettes

Gigabyte Technology Co., Ltd.
To be filled by O.E.M.

NVIDIA GeForce GTX 780 - 3072.0 MB
DriverVersion: 9.18.13.4475

Intel® Core™ i7-4930K CPU @ 3.40GHz
NumberOfCores: 6 NumberOfLogicalProcessors: 12
MaxClockSpeed: 3.7 GHz

Holo · January 10, 2015, 1:36pm

Bump

CPearase · January 11, 2015, 12:06am

Have you tried it with hyper-threading off?

Holo · January 11, 2015, 11:53am

I tried both on and off on the 12 core (dual x5650) machine and the results are:

17.5 sec with HT on - (100% over 24 threads)
12.5 sec with HT off - (100% over 12 threads)
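
To put a number on the difference, here is the arithmetic on the two timings above as a small Python sketch. The function name is only illustrative; the inputs are the reported times.

```python
def slowdown_percent(slower_sec: float, faster_sec: float) -> float:
    """Return how much longer the slower run took, as a percentage of the faster run."""
    return (slower_sec - faster_sec) / faster_sec * 100.0

ht_on_sec = 17.5   # reported above: HT on, 24 threads
ht_off_sec = 12.5  # reported above: HT off, 12 threads

print(f"HT on takes {slowdown_percent(ht_on_sec, ht_off_sec):.0f}% longer than HT off")
```

So on this machine Reduce Mesh takes about 40% longer with hyper-threading enabled.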

menno · January 11, 2015, 1:29pm

I suspect that you get a lot of inter-CPU communication on the Xeon machine, because it has two physical CPUs while each i7 machine has only one. The bus between CPUs will be slower than on-chip communication.

This seems to be supported by the fact that turning off hyper threading actually speeds up your benchmark. I wonder what happens if you turn off 1 Xeon and turn on hyper threading.

Holo · January 11, 2015, 8:32pm

This could be the case. I don’t know if I can turn one off in the BIOS, but I can look next week.
BUT it doesn’t explain why that new six-core i7 is even slower than my old four-core i7, or do you see it differently?

menno · January 12, 2015, 6:55am

Hmmm, that is strange indeed…

Holo · February 27, 2015, 7:02pm

Hi guys, a new Holomark score just came in, from an 8-core Xeon, and it too has Reduce Mesh issues.
Can you take a look? It is in the “Holomark 2 Released!” thread.

wattzie · March 2, 2015, 8:02pm

I just saw this thread. All 8 cores (16 threads) of the Xeon are jumping up to 100% on the Reduce Mesh section of that test. I will try to turn off hyper-threading later today and see what effect that has.

I am more than happy to test some other fixes if you have any.

wattzie · March 2, 2015, 8:49pm

Just ran it without hyper-threading. The major differences (with HT vs without HT) are:

GPU_16 (40 units vs 43 units)
GPU_21 (8.3 fps vs 11 fps)
and the big one:
CPU_06 - Reduce Mesh (14.96 sec vs 12.37 sec)

Everything else was the same or slightly quicker with hyperthreading on.

pascal · March 2, 2015, 8:51pm

@deranen, any input on this?

thanks,

-Pascal

DavidEranen · March 3, 2015, 8:15am

I’m not able to repeat this with a hyper-threaded CPU since we don’t have one at the office here in Finland. I have noticed, however, that the v5 ReduceMesh command is severely bottlenecked by a repeated UI-related function call. This has been fixed in v6, and testing shows that not only is the v6 command 3x faster out of the gate, but it also gets faster as the number of cores is increased (at least going from 1 to 4 cores).

@pascal, could you or someone at your office try ReduceMesh with a hyper-threaded CPU, and compare to the Commands.rhp plug-in “TestReduceMesh” in v6? What are the differences, and how does it scale?

Thanks,
-David
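
As a rough illustration of why a serial bottleneck like that UI-related call keeps a command from scaling with core count, here is a small sketch based on Amdahl's law. The parallel fractions are made-up values chosen for illustration, not measurements of ReduceMesh.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical fractions, chosen only to show the shape of the curve.
for parallel_fraction in (0.5, 0.9):
    for cores in (1, 4, 12, 24):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"parallel={parallel_fraction:.0%} cores={cores:2d} speedup={speedup:.2f}x")
```

If half the work is serial, the speedup stays below 2x no matter how many cores are added. Note that this ideal model only predicts a plateau; it does not by itself explain a run getting slower with more threads, which would need some extra per-thread overhead or contention.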

DavidEranen · March 3, 2015, 8:48am

Holo, do you have access to Serengeti? If so, please try the TestReduceMesh command and tell me what results you get.

@pascal, never mind testing hyper-threading in general, but if you have access to Xeons then give them a spin with v5 and v6 ReduceMesh (TestReduceMesh in v6). We do have hyper-threaded CPUs here.

Holo · March 3, 2015, 9:42am

I just tried on V5 and WIP, with a single 200K mesh that I reduced to 20K:

WIP ReduceMesh: 23.2 sec
WIP TestReduce…: 18.4 sec
V5 ReduceMesh: 16.8 sec

So no improvements there.
All done on an older dual-core i5 laptop with HT enabled.
I have not tried it on the dual-CPU Xeon workstation (yet).

nathanletwory · March 3, 2015, 10:02am

@deranen, you might want to have users try tweaking some of https://msdn.microsoft.com/en-us/library/6sfk977f.aspx and see if that improves behaviour.

Perhaps craft some .bat files that set these, then run Rhino.

Edit: also see this SO thread.

/Nathan
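
As an alternative to .bat files, a small launcher script could set the variables and then start Rhino. This is a minimal sketch under two assumptions that are not confirmed in this thread: that the variables in question are the OpenMP ones (for example `OMP_NUM_THREADS`), and that ReduceMesh actually responds to them. The Rhino path is a placeholder and the helper names are made up for this example.

```python
import os
import subprocess

def build_env(overrides: dict[str, str]) -> dict[str, str]:
    """Copy the current environment and apply the given overrides."""
    env = os.environ.copy()
    env.update(overrides)
    return env

def launch(command: list[str], overrides: dict[str, str]) -> subprocess.Popen:
    """Start `command` with the overridden environment and return the process handle."""
    return subprocess.Popen(command, env=build_env(overrides))

if __name__ == "__main__":
    # Placeholder path: adjust to wherever Rhino is installed on your machine.
    rhino_exe = r"C:\path\to\Rhino.exe"
    # Assumption: the thread count can be capped via OpenMP's OMP_NUM_THREADS.
    if os.path.exists(rhino_exe):
        launch([rhino_exe], {"OMP_NUM_THREADS": "6"})
    else:
        print(f"Rhino not found at {rhino_exe}; edit rhino_exe first")
```

The overrides only apply to the launched process, so it is easy to run the benchmark several times with different values and compare the Reduce Mesh timings.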

DavidEranen · March 3, 2015, 10:41am

@jesterKing, thanks for the links. Definitely try fiddling with the environment variables and see if they help.

@Holo, I’m getting wildly different timings compared to you, but one thing we can agree on is that WIP ReduceMesh is the slowest. This is OK, since we’re going to use TestReduceMesh instead. The 4x improvement I mentioned seems to be between WIP ReduceMesh and WIP TestReduceMesh, at least on computers in our office.

-David

wattzie · March 3, 2015, 1:49pm

If you send me a link to the WIP, I can give it a go here.