Set rgba channel in RenderWindow

Hi,

As I’m waiting for RH-64925 to be resolved, I’ve started testing the RhinoCommon implementation of a RenderWindow viewer instead of the C++ SDK.

Now I’m trying to find the best way to set the frame buffer from an rgba float4 array in C++.

Looking at the API, RenderWindow.Channel.SetValues, and the .NET sample, this feels like a reasonable implementation:

Scene.RenderFrame();  // render image to temporary buffer in C++
var size = rw.Size();
IntPtr ptr = Marshal.AllocHGlobal(sizeof(float) * size.Width * size.Height * 4);
Scene.SetFrameBuffer(ptr); // copy the framebuffer into the allocated pointer

using (var channel = rw.OpenChannel(RenderWindow.StandardChannels.RGBA))
{
   channel.SetValues(System.Drawing.Rectangle.FromLTRB(0, 0, size.Width, size.Height), rw.Size(), new PixelBuffer(ptr));
}

But it renders black.

I’ve verified that my values are in the IntPtr through:

float[] frameBuffer = new float[rw.Size().Width * rw.Size().Height * 4];
Marshal.Copy(ptr, frameBuffer, 0, frameBuffer.Length);

I’ve also looped through the buffer like so:

for (var x = 0; x < size.Width; x++)
{
     for (var y = 0; y < size.Height; y++)
     {
          int loc = (y * size.Width + x) * 4;
          channel.SetValue(x, y, Color4f.FromArgb(frameBuffer[loc + 3], frameBuffer[loc], frameBuffer[loc + 1], frameBuffer[loc + 2]));
     }
     if (_shutdown) break;
}

which does show the image but it’s obviously very slow.

Any recommendation on the ideal workflow for this?

The best way is to pass around a pointer to your native framebuffer, don’t Marshal.Copy it.

For example, let’s look at how I do it for RhinoCycles – the renderer Cycles is native code, RhinoCycles is a RhinoCommon plug-in:

The function we look at is BlitPixelsToRenderWindowChannel() in RhinoCycles.

Here we first:

  • get a pointer to a pixel buffer (RGBA, but also normals and depth in separate passes) with GetPixelBuffer(). The pointer to the buffer is stored in pixel_buffer.
  • when a valid pixel_buffer is acquired, create a Rhino.Render.PixelBuffer holder pb for it
  • on the RenderWindow open the Channel we need
  • use the channel to SetValuesRect(rect, size, pb)

You only need to make sure the memory pointed to by pixel_buffer stays valid until SetValuesRect returns; after that your implementation is free to release or collect it.

edit: to put it in other words

The pointer is expected to be an unmanaged pointer.


Thanks, got it working now!

I think I’ve tried this before but probably didn’t get my marshaling right for the pointer.

I assume then that the data in the unmanaged buffer is copied to an internal buffer as I call SetValues, and that I need to call SetValues every time there is new data in the unmanaged buffer?

That is correct. This is how RhinoCycles works: render, copy to an unmanaged buffer (owned by Cycles), give the pointer to said buffer to managed (RhinoCycles) so it can be passed to the RenderWindow, which in turn gives it back to unmanaged native Rhino code. SetValues indeed behind the scenes copies the data contained by your unmanaged buffer to its own for further processing.

The Rhino post-effects pipeline will do its thing (denoising, bloom, tone mapping, etc) on this internal buffer before showing it in the viewport or the render window.

Hi Nathan,

Waking this up again with a little question.

My tests for a scene show:

  • Shaded view in Rhino, 16ms per frame. (60fps)
  • Using my own Display mode, 90ms per frame. (11fps)

The part of this process which is taken up by my renderer is about 1-2ms. So for Rhino to copy my buffer to the pipeline, apply other pipeline stuff like wires, gumballs, and post effects, and display the view takes about 85-90ms. I assume most of this also happens in the shaded view mode, except the buffer copying and maybe the post effects.

Is there any way to speed up the rhino pipeline part of this process? Like disable post effects? (I don’t have a lot of hope here since Cycles is also fairly slow.) Or do I have to implement something like a custom unmanaged pipeline? Or maybe a custom rendermode which doesn’t use the “Raytraced mode” implementation?

Hi @oborgstrom

You can double-check in the viewport properties settings under post effects if you have any enabled.

I assume that you are currently implementing a RealtimeDisplayMode. How are you timing the process of getting a pixel from your renderer to the viewport display? You should be aware that with the RealtimeDisplayMode you don’t directly update the viewport. The mechanism of putting pixels in the RenderWindow buffer is separate from the mechanism that puts those pixels on screen. You use SignalUpdate() to tell the RDK that new data is available. The RDK maintains a timer by which it periodically checks the update flag that gets set by SignalUpdate(), and only refreshes the viewport once it is set. This also means that your renderer could be updating the RenderWindow buffer faster than the rate at which the RDK updates the viewport. As an implementation detail, the current timer period is 200ms, and it is currently not available for customization.

Out of curiosity, what are you looking to gain with a higher update frequency?

Adding @andy to the discussion in case he wants to be involved in performance discussion related to the RDK.

Hi @nathanletwory

No post processing happening; see below some more thorough testing from two separate machines at a resolution of 933x1976:

Laptop with GTX 1050 Ti:

Flush change queue (set camera data):

  • 0.2 ms

Render frame on GPU:

  • 100ms for raytracing
  • 108ms total incl. copying the framebuffer from the GPU.

Set new values in RGBAChannel and signal redraw:

  • 32.6 ms

Time until next frame starts:

  • 60 ms

Workstation with RTX A6000:

Flush change queue (set camera data):

  • 0.2 ms

Render frame on GPU:

  • 2ms for raytracing
  • 6ms total incl. copying the framebuffer from the GPU.

Set new values in RGBAChannel and signal redraw view:

  • 100ms

Time until next frame starts:

  • 60ms

I’m testing the performance by running _testmaxspeed in Rhino and my improvements refer to the workstation testing.

I’m at the moment rendering the first frame on my main thread, as I otherwise get a delay when changing the view. The timer reports:

  • the time it takes to apply changes to the renderer
  • the time it takes to render a new frame and download the data from the GPU
  • the time it takes to copy that data to the render window RGBA channel and signaling update with:
using (var rgba = rw.OpenChannel(RenderWindow.StandardChannels.RGBA))
{
     rgba?.SetValues(rect, rw.Size(), pb);
}
SignalRedraw?.Invoke(this, EventArgs.Empty);
  • and then, last, it reports the time when the next change queue flush is triggered (camera is moved). I assume this time would include everything that happens in the pipeline plus other Rhino stuff.

The main reason for this is to improve the interaction with the model. Would prefer higher than 10fps if possible, especially if the GPU performance is there.

Then the second reason, which would apply to my parallel render thread, would be faster convergence to a less noisy image. But if I can’t speed up the SetValues call, then I can maybe have something that keeps on iterating while the copy happens, since I could currently render about 50 iterations (2ms each) in that time. So two parallel threads: one that copies data into Rhino buffers and triggers the redraw, and one that just keeps on rendering. Obviously the timing situation is very different when I’m not on an A6000 card.

By the 200ms timer, do you mean that if there is no event triggering the change queue, the frame would only update every 200ms, assuming SignalUpdate has been called?

The timer is one that checks the flag set by SignalUpdate. If the flag is set to true only then does it tell the display pipeline to refresh the viewport.

Hi again @nathanletwory ,

I’ve now run into the DrawOpenGl possibility and I’m trying to test it out for drawing the buffer, to see what the improvements would be. I’m looking closely at the implementation in Cycles, but I’m having problems with Rhino not loading my native DLL when I include GLEW and call functions from it.

Would you know if there’s anything specific I need to do for this to work? Maybe something with glew-mx and the fact that there is already a context? Fumbling in the dark here a bit.

Thanks
Oscar

RhinoCycles actually no longer uses this to draw its results to the viewport. In Rhino 6 it did. I’ll check this tomorrow and write up something, now putting kids to bed.

I see, any particular reason for changing? Now using the C# SetValues() for every case?

Good luck!

The reason was that using OpenGL in combination with rendering on the GPU caused a lot of crashes. Also with rendering on CPU to some extent but not so much. Lots of driver-related crashes and other instability. Now that the responsibility lies with Rhino RDK Raytraced has been much more stable.

One feature that I liked a lot with the old drawing method was the fast drawing I did, where I’d alpha blend results from Raytraced over the OpenGL rendered mode, which gave a very smooth response. That has unfortunately been lost.

I see, well our renderer is at the moment only implemented on an up-to-date version of OptiX, so it already requires a fairly new NVIDIA driver and NVIDIA hardware.

Any thoughts on why Rhino won’t load my dll? Maybe @stevebaer?

It builds nicely, but as soon as I include GLEW and call GL functions my native methods throw DllNotFoundException. I have zero OpenGL experience and I’m only trying to use it to draw the buffer in the Rhino viewport. Might be something obvious that I’m missing.

Maybe you can use Depends (Dependency Walker) to figure out if all dependencies can be found, and if not, which of the DLLs is the offending one.

Other than that make sure you’re building for x64.


Thanks @nathanletwory, that got me on the right path – I forgot to copy over glew32.dll… Now it works, fast!

But it seems a bit confused with the view frustum for the curves. Do I have to set this manually now? Or is it because I’m not rendering any depth data for opengl?

About that I am not sure. Let us ask @andy how this should be done to get the wireframe channels drawn correctly.

@andy would you have any suggestions on this?

@nathanletwory I’m having the same issue Oscar was above. I’m on Rhino 8.10. I’m setting

using (var channel = RenderWindow.OpenChannel(RenderWindow.StandardChannels.RGBA))
{
    unsafe
    {
        fixed (float* p = backingBuffer) // Pin the float[] so it doesn't move
        {
            var buffer = new PixelBuffer((IntPtr)p);
            channel.SetValues(rectangle, size, buffer);
        }
    }
}

I’m doing this inside an implementation of RealtimeDisplayMode, following the same code structure as the MockingBirdRenderPlugin example – though I’m only rendering a single pass.

The performance appears to be good when the viewport is small (800x600), but gets much worse when the viewport is larger (1700x1100), to the point that SetValues is taking the vast majority of the execution time (it takes over 50ms, while my rendering code is only taking 10ms). I’m skeptical SignalRedraw() can be the issue, since the performance is fine on a small viewport, and my profiling of the function with the single line calling SetValues() removed yields the expected performance. I’d like to improve interaction with the model to better than 15fps, especially since the actual rendering is not the bottleneck – I’m trying to implement a custom view mode that does some non-photorealistic rendering that can be pretty quick to calculate, with the intent that this would be one of the primary views used when interacting with the model.

Is there any chance of channel.SetValues being made faster, or another function exposed that can set the resulting values more directly? I’m not using any channels other than RGBA, and would be happy to skip any post-processing steps entirely. It would be nice to still see overlays like the gumball – but the performance issues I’m seeing are happening even if there is no gumball or Rhino wireframes visible at all.

My questions are:

  1. Should I be implementing this using something other than RealtimeDisplayMode?
  2. I see Oscar found a workaround using DrawOpenGl. I’d like to avoid this if possible – we’re not using OpenGL to do our rendering, so it’s a lot to pull in just to get pixels on the screen faster, and I’m very wary of the instability you mentioned with using it farther up in this thread. If there is absolutely no way around it: what’s the recommended way to implement this function from a C# plugin? Should we be using OpenTK? Is there any special setup that needs to be done, or perhaps an example you know of?

This function is the fastest way currently available to set values in the buffer.

From the managed world (C#, .NET) point of view there is not much to be done here. For RhinoCycles I am also using this approach, although I’m not pinning buffers, since the buffer lives in the unmanaged world.

I haven’t done any profiling of this before, so let’s take a look.

On my M2 Max at 2598x1678 it takes between 20 and 30ms for me, but not 50ms.

But on my Windows machine with Ryzen 9 7900X (12-core) it takes 5-6 times longer.

I hope there is room for improvement there. I logged a bug report to track that: RH-83576 SetValues on framebuffer slow.

No, that is fine.

I don’t know, RhinoCycles hasn’t used OpenGL drawing in a long time, and I’m not sure in what state that particular mechanism is. But back in the day when it did there was a bit of C/C++ code that I PInvoked to do that.

Let’s see if we can have @johnc or @andy make performance improvements in SetValues.

Thanks for logging the bug and taking the time to look! Appreciate the time and effort - your documentation was essential in getting me this far. I’ll keep whacking at it from my end (I’m on Windows with a Ryzen 7 2700) and see where I get.

I tried another “approach” of invoking the renderer synchronously from a conduit, putting my pixels in a bitmap, and drawing that using DrawBitmap on the entire viewport during PostDrawObjects. Obviously not ideal, but it’s done in ~20ms.