.NETCore plugin loader inconsistencies

Could someone explain the following inconsistencies with the same Rhino build when running with .NETCore?

Rhino 8 SR5 2024-3-12 (Rhino 8, 8.5.24072.13001) on Windows 10 (10.0.19045) with .NET 7.0.0 (PL), running in EN without the PL lang pack, does not complain, loads the plugin, and works fine.

While these two other machines:

  • Rhino 8 SR5 2024-3-12 (Rhino 8, 8.5.24072.13001) on Windows 11 (10.0.22631) with .NET 7.0.9 (CN) using EN lang pack
  • Rhino 8 SR5 2024-3-12 (Rhino 8, 8.5.24072.13001) on Windows 10 (10.0.19045) with .NET 7.0.0 (CN) using EN lang pack

and also this one with a newer build:

  • Rhino 8 SR6 2024-4-2 (Rhino 8, 8.6.24093.11001) on Windows 10 (10.0.19045) with .NET 7.0.0 (CZ) using CZ pack

do not work; the plugin fails to load… BUT

The next one works fine:

  • (Rhino 8, 8.7.24088.06001) on Windows 11 (10.0.22631) with .NET ? (PL) - don’t know the lang pack

In addition, I just received a report that this one works fine too:

  • Rhino 8 SR5 2024-3-12 (Rhino 8, 8.5.24072.13001) on Windows 11 (10.0.22631) with .NET ? (CN) with CN lang pack

Btw, how can you tell whether it runs on 7.0.0 or 7.0.9? I suspect this could be an issue, though it would be weird if plain 7.0.0 works but 7.0.9 does not, wouldn’t it? On the other hand, the failing 8.6 machine has 7.0.0 :face_with_monocle:

Rhino’s plugin compat checker literally explodes right at the beginning, while checking “.NETCore compatibility” on load, and fails so quickly that my logger isn’t even triggered.

I don’t know whether this helps with diagnosis, but it looks like laptops are less prone to this loader compat issue.


OK. For the first time I was able to reproduce this issue on a different local machine, BUT the Rhino Compat tool does not report anything useful. To be consistent, the setup is:

  • Fresh install of Rhino 8 SR5 2024-3-12 (Rhino 8, 8.5.24072.13001) on Windows 10 (10.0.19045) with .NET 7.0.0 (System PL) using EN lang pack only

Interestingly, the most probable “fail” factor seems to be whether the PC is a desktop or a laptop…

The only output provided by the Rhino Compat tool is:

Rhino Nature, Version=1.0.4.16355, Culture=neutral, PublicKeyToken=null

*Note: It makes no difference whether I call Compat.exe on the clean or the obfuscated .rhp; the client devices listed above also produced no output besides that one line. Of course, the clean build also fails to load.

Nothing else: no error, no warning, nothing… Sorry for pinging, guys, but I have no idea who I should ask about this. @dale @pascal @scottd?

Hey @D-W,

I’m sure you’ve likely checked and know about this, but do all of these computers have Rhino running in .NET Core? Rhino 8 can run using either .NET Framework or .NET Core so perhaps the ones configured to run in .NET Framework are the ones that are successful. Using the SetDotNetRuntime or SystemInfo commands will tell you which runtime is currently in use.

Only one version of the .NET 7.0 desktop runtime can be installed, so if 7.0.9 is installed Rhino will use that. We still ship with 7.0.0, but any version should work and there are no known issues with doing so. We will also likely be updating the runtime we ship in a future service release.

One first step that could be tried would be to update the .NET Runtime to the latest (7.0.17) by downloading it here: Download .NET 7.0 Desktop Runtime (v7.0.17) - Windows x64 Installer

Rhino 8 runs compat on all assemblies of your plugin, not just the .rhp. So you have to run it on any .rhp, .gha, and .dll in your plugin folder to get the full report. Rhino should write out the report when compat fails in %temp%/RhinoCompat, so perhaps check the machines that failed for any information there.
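To make that check repeatable on each failing machine, here is a minimal Python sketch. It only lists the assemblies in a plugin folder and reads whatever files sit in the report folder. The function names are made up for illustration, the recursive search is an assumption (Rhino's actual scan rules may differ), and nothing is assumed about how the report files are named.

```python
import tempfile
from pathlib import Path

# Extensions named in the post above; Rhino's real scan rules may differ.
ASSEMBLY_EXTENSIONS = {".rhp", ".gha", ".dll"}


def find_plugin_assemblies(plugin_dir):
    """Return every .rhp/.gha/.dll file under plugin_dir, sorted by path.

    Searching subfolders recursively is an assumption of this sketch.
    """
    root = Path(plugin_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in ASSEMBLY_EXTENSIONS
    )


def read_compat_reports(report_dir=None):
    """Return {file name: text} for every file in the report folder.

    Defaults to <temp>/RhinoCompat, the location mentioned in the post.
    Returns an empty dict if the folder does not exist.
    """
    if report_dir:
        root = Path(report_dir)
    else:
        root = Path(tempfile.gettempdir()) / "RhinoCompat"
    if not root.is_dir():
        return {}
    return {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(root.iterdir())
        if p.is_file()
    }
```

Each path returned by `find_plugin_assemblies` is a candidate to run through Compat by hand, and `read_compat_reports()` gives a quick view of whatever the failing machine wrote out.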

I would be absolutely flabbergasted if that were the cause, but perhaps it is a CPU (Intel/AMD) difference?

Do you have any exception details or other information that would help? When it fails to load, does it bring down Rhino or does it just say it’s not compatible?

Cheers,
Curtis.

Hey @curtisw ,

Thanks for picking up the topic. Yes, this topic, as well as the client device info I provided, applies strictly to Rhino set to .NETCore. So far I haven’t received any reports about Rhino 8 set to .NETFramework; I asked for ‘_SystemInfo’ data and verified that this is .NETCore related only.

I understand your assumption that the successful machines are the .NETFramework ones, but that’s not the case (I mean, this particular problem occurs only on some .NETCore setups, not all). One of the CN clients, after having trouble on his desktop, installed RN on his laptop and was surprised that it just worked there on the mentioned R8 (8.5.24072.13001) with Win 11 and .NETCore. That matches my own case exactly: on my laptop it runs perfectly on .NETCore with the same Rhino, but on Win 10.

Also, as mentioned, I was able to reproduce this in-house on the Dual Xeon workstation, while the laptop is a somewhat old i7, so Intel vs. AMD is probably not the factor.

It does, but the file contains only the one line I provided in the previous post, nothing more. See the screenshot further down.

I will try that.

Understood. However, RN has no satellite DLLs; everything, including localization files, is an embedded resource. The only other assemblies are references, and those are RhinoCommon, System, or Framework DLLs. Anyway, if this were the cause, it wouldn’t run on any .NETCore Rhino setup (am I right?).

And so am I, seeing what is reported and what I’m able to reproduce… :open_mouth:

I don’t have any exceptions; if I did, I would track them as far as possible. Rhino is stable, it just refuses to load the plugin, complaining that it needs .NETFramework, even though I put quite a lot of effort into making it work with both .NETFramework and .NETCore…

Just to be clear, I’m adding screenshots. The first is my laptop (i7), which works fine with R8.5, Win 10 and .NETCore, with a tip for the user to consider .NETFramework if any issue is found.

The second is the Dual Xeon with the same Rhino, the same Windows, and the same .NET 7.0.0.

Is there anything more I could investigate?

Do you mean all plugins (maybe except Rhino Nature) fail to load?

I don’t know. I’m chasing this bug only for RN, as the user base expects RN to run with .NETCore too. Although the current build was adapted to meet the requirements and IS running on it (at least on laptops), Rhino refuses to load it (on desktops) for reasons yet unknown.

Maybe my explanations are not so clear, so I’ll put it like this (for the RN case):

  • The plugin loads with Rhino set to .NETFramework and works as expected.
  • The plugin loads only on some PCs with Rhino set to .NETCore, and works as expected when it does load. According to reports and my attempts to reproduce, desktops fail to load it, while laptops run fine.

This sounds weird, but for now, considering CPUs and platforms, it looks like this:

(DESKTOP) Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz - Fail
(DESKTOP) Intel Dual Xeon - Fail
(DESKTOP) AMD Ryzen 9 3900XT - Fail
(DESKTOP) AMD Ryzen ? (family 25 model 97 stepping 2) - Fail
(LAPTOP) Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz - Success
(LAPTOP) AMD R9-5900HX - Success

NOTE: Rhino’s built-in tool Compat.exe produces only a one-line output on both desktops and laptops, indicating the plugin build, with no success or failure info whatsoever.

For what it’s worth, if you are keeping statistics…

(DESKTOP) AMD Ryzen 7 2700X @ 4.15 GHz - Fail

-wim

The clean build also fails to load and leaves no trace here.

I triple-checked the non-obfuscated version against Compat.exe on my Dual Xeon rig, and got the same one-line output. However, even if you get this exception that looks like a StackOverflowException, how is it possible that it doesn’t happen on laptops?

Plus, can you elaborate on how I can also get such a result?

  1. Enable user-mode crash dumps
  2. Clear the compat cache at %appdata%\McNeel\Rhinoceros\8.0\compat_cache_netcore
  3. Run Rhino & wait for Compat.exe to crash
  4. Find the crash dump file in the location configured in step 1.
  5. Load it in WinDbg and look for traces, possibly with !clrstack
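Steps 2 and 4 are easy to script when the test has to be repeated across several machines. Below is a rough Python sketch under a few assumptions: the helper names are invented for illustration, the cache path is the one quoted in step 2, and the dump file name is assumed to start with the process name (check your own dump settings, as this may not hold).

```python
import os
import shutil
from pathlib import Path


def default_compat_cache():
    """Cache path quoted in step 2; needs the APPDATA variable (Windows)."""
    appdata = Path(os.environ["APPDATA"])
    return appdata / "McNeel" / "Rhinoceros" / "8.0" / "compat_cache_netcore"


def clear_compat_cache(cache_dir=None):
    """Delete the compat cache folder; return True if something was removed."""
    target = Path(cache_dir) if cache_dir else default_compat_cache()
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True


def newest_dump(dump_dir, process_name="Compat.exe"):
    """Return the most recently modified matching *.dmp file, or None.

    Assumption: dump file names begin with the process name.
    """
    prefix = process_name.lower()
    dumps = [
        p for p in Path(dump_dir).glob("*.dmp")
        if p.name.lower().startswith(prefix)
    ]
    return max(dumps, key=lambda p: p.stat().st_mtime, default=None)
```

Run `clear_compat_cache()` with Rhino closed, start Rhino, and then point `newest_dump` at whatever dump folder was configured in step 1.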

Or you may debug a live instance of Rhino with WinDbg/VS and use !StopOnException or a similar technique to capture CLR exceptions.

It’s possible the issue is not related to stack overflow at all. Just my two cents.

I managed to identify where Compat died.

System.Boolean SimpleLogger.SimpleLog::???(System.String,System.String)
? PASS System.Boolean System.String::op_Equality(System.String,System.String) < mscorlib
System.Boolean SimpleLogger.SimpleLog::???(System.String)

It looks like calls to System.Diagnostics.EventLog somehow break Compat on my computer.

That is an interesting finding. SimpleLogger is an external piece, so it will be easy to detach it all at once, rebuild without it, and check whether it indeed causes the issue. Though… this still doesn’t explain why/how laptops are doing great in this whole thing :flying_saucer:

AFAIK there’s another class, seemingly a floating license class, that uses EventLog. Anyway, you may do whatever you like: clear the compat cache and check if it still crashes.

@gankeyu Thanks for your input, but it seems this is not the right path: completely removing the logger and any possible EventLog-related code doesn’t change a thing. I cleared the cache, but there is still no output and the plugin fails to load. However, I’ll try to break it down with your CLR approach. It sounds reasonable, so who knows, maybe I’ll get lucky.

@curtisw @gankeyu The funny thing is that when I run “Rhino.exe” through WinDbg, it loads the plugin without any issue (the original/release/obfuscated one) on the Dual Xeon and doesn’t complain. I see nothing special in WinDbg, or I’m not experienced enough with it. But without WinDbg, it fails to load.

EDIT: Wrong. After it loaded once under WinDbg, it now works fine without it. Plus, I finally received a breakthrough report: the first laptop that failed to load. After @gankeyu’s input I realized one thing: an old cache is probably the reason why the plugin actually loads without complaining. So the question is: how long is the cache valid?

Can you share with me the file without EventLog reference?

are you familiar with fuslogvw?

I have used this many times in the past to diagnose assembly load failures in strange and difficult-to-debug contexts (e.g. loading a .net control as activex in solidworks)

it is a bit fiddly to get set up, and I cannot guarantee how it will work in your specific scenario, but it may help

@gankeyu To be honest, I would prefer not to expose intermediary builds to the world. I highly appreciate your help, as the key factor is probably the old compat cache, which you brought to the forefront.

Long time no see @jdhill :slight_smile: Thanks for your input, well if you are out of options anything is an option :ring_buoy:

If I can confirm that the compat cache tricked me during development, I’ll then find the exact lines that introduced the incompatibility(?) or just the compat troubles.

OK. This is getting weird. Now it doesn’t want to fail even after deleting the cache :joy: I mean, the Dual Xeon machine, which failed to load from the beginning, doesn’t want to fail anymore after running WinDbg on it. @curtisw, is the compat cache also stored somewhere else? Also, how much does the first plugin run differ from subsequent ones?

@gankeyu @curtisw It seems the compat cache got stuck for some reason; a reboot helped. Now I’m back, and it looks like Compat.exe never gave good results on the Dual Xeon (except when Rhino.exe is started through WinDbg): it always fails there, no matter which past released build I use. Let’s treat the Dual Xeon as representative of the NL (not-loading) device group. This is especially interesting because, as you remember @curtisw, you were able to load it a while ago. No less interesting, the compat cache wasn’t wrong on my i7 dev laptop: it doesn’t fail there, so there is no dump to explore on the i7 (the SL, successfully-loading, device group). After manually deleting the compat cache, it is always rebuilt on Rhino init without any issues.

What I’m pointing out is that this could be(?) a Compat.exe issue, since, as @gankeyu pointed out, it fails with a stack overflow while resolving references (I’d like to highlight again that this does not happen on the i7). Could it be that some references are not yet loaded at runtime on [NL] but are ready to go on [SL]? Some sources suggest a possible memory leak(?), others say it is P/Invoke related(?). No idea what is behind it.

On the clean .rhp it gave me (short form):

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffa47a40e04 (ntdll!RtlVirtualUnwind+0x0000000000000024)
   ExceptionCode: c00000fd (Stack overflow)
  ExceptionFlags: 00000001
NumberParameters: 2
   Parameter[0]: 0000000000000001
   Parameter[1]: 0000002cb7205fe0
PROCESS_NAME:  Compat.exe
RECURRING_STACK: From frames 0x5 to 0x0
ERROR_CODE: (NTSTATUS) 0xc00000fd - A New Guard Page for the Stack Cannot be Created.
EXCEPTION_CODE_STR:  c00000fd
EXCEPTION_PARAMETER1:  0000000000000001
EXCEPTION_PARAMETER2:  0000002cb7205fe0
IP_ON_HEAP:  00007ff976b04505
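For anyone comparing several dumps, a small Python helper can pull the key facts out of a pasted analysis like the one above. This is only a sketch: the function name and the short code table are illustrative, and the table covers just a few well-known codes (0xC00000FD is the stack overflow status shown above).

```python
import re

# A few well-known exception codes; deliberately not a complete list.
KNOWN_CODES = {
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xE0434352: "CLR managed exception",
}


def summarize_exception(analysis_text):
    """Extract the exception code from WinDbg analysis text.

    Returns None when no 'ExceptionCode:' line is present.
    """
    match = re.search(r"ExceptionCode:\s*([0-9a-fA-F]{8})", analysis_text)
    if not match:
        return None
    code = int(match.group(1), 16)
    return {
        "code": code,
        "name": KNOWN_CODES.get(code, "unknown"),
        # A RECURRING_STACK line means the debugger saw repeating frames.
        "recurring_stack": "RECURRING_STACK" in analysis_text,
    }
```

A stack overflow code together with a recurring stack points towards runaway recursion, which fits the reference-resolving suspicion above, though it does not prove where the recursion starts.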

Should I forward this dump file to some “report email”?

Yeah from the information here it looks like an issue with Compat itself, and likely a problem with Mono.Cecil. I have run into some race conditions before with it that cause it to hang, but none that had a stack overflow. From the stack trace it doesn’t look like it will be easy for us to fix so maybe we will need to use something different to scan assemblies for api compatibility.

I’ve created RH-81476 to see if we can circumvent the issue in Mono.Cecil or (hopefully not) use something else, which would be a huge effort.

Thanks for all of the effort to dig into this issue everyone! Hopefully we can get this resolved soon.

Still, the weirdest part is that it depends on the environment and runs fine in some of them. While we are discussing it, just as a suggestion/question to think about: is the per-user %appdata% roaming folder the best choice for the compat cache?

@curtisw I also have one small question: does Compat rebuild the whole cache on certain events, e.g. after an SR update?