SetDataList has very slow performance

I have found that calling SetDataList instead of SetData carries a big performance penalty. I am curious as to why, and whether there is a better way to set big lists of data.

I’ve just used a Stopwatch from System.Diagnostics to get an indication of the performance (note that I used a release build and checked the times in Grasshopper; the totals were similar to those reported below).

[object] Elapsed time to set data: 0ms
[object] Elapsed time to set data: 0ms
[object] Elapsed time to set data: 0ms
[object] Elapsed time to set data: 0ms
[object] Elapsed time to set data: 0ms
[List] Elapsed time to set data list: 398ms [Size=8349]
[List] Elapsed time to set data list: 2792ms [Size=47802]
[List] Elapsed time to set data list: 0ms [Size=0]

I feel like 3s to set a data list of 48,000 items is way too long. The component logic that produces this list contains various loops and if/switch statements, and it runs in 50ms.

I don’t see anything complex about the code:
DA.SetDataList(name, enumerable);

@DavidRutten sorry David tagging you again. I feel like this may be best answered by a McNeel dev who is familiar with the inner workings. I’m unsure if there is a better forum for this…

@brian Hey Brian, sorry for tagging more of the McNeel team. I’m keen to understand how to avoid this performance hit. It occurs on any enumerable containing object types (I have not tested structs). Is there any insight you can provide?

If you are working with IEnumerable, there is a common culprit to be aware of:
A LINQ statement (e.g. “Select”) is only executed when the enumerable is actually enumerated. So unless you convert it into a list beforehand, your LINQ expression may be executed for the first time inside SetDataList. In other words, you might also be measuring your LINQ expression’s performance, because it is accidentally injected into and executed within the DA call. Can you post more code?
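
To illustrate the deferred execution described above, here is a minimal sketch in plain C# with no Grasshopper dependency. The names `DeferredLinqDemo`, `Expensive` and `Consume` are made up for illustration; `Consume` is only a stand-in for any method that enumerates its argument, which is what SetDataList is assumed to do here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredLinqDemo
{
    // Counts how many times the (hypothetical) expensive projection runs.
    public static int Evaluations;

    private static int Expensive(int x)
    {
        Evaluations++;
        return x * x;
    }

    // Stand-in for a consumer that enumerates its input exactly once.
    public static int Consume(IEnumerable<int> items)
    {
        int count = 0;
        foreach (var unused in items) count++;
        return count;
    }

    // Returns the evaluation counter at three points in time.
    public static int[] Run()
    {
        Evaluations = 0;

        // Building the query does not run Expensive.
        IEnumerable<int> lazy = Enumerable.Range(0, 1000).Select(Expensive);
        int afterQuery = Evaluations;

        // ToList enumerates the query, so the projection runs here.
        List<int> materialized = lazy.ToList();
        int afterToList = Evaluations;

        // Consuming the materialized list does not run Expensive again.
        Consume(materialized);
        int afterConsume = Evaluations;

        return new[] { afterQuery, afterToList, afterConsume };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "0, 1000, 1000"
    }
}
```

If `lazy` were handed to the consumer directly, the 1000 calls to `Expensive` would happen inside `Consume` and would show up in any timing wrapped around it.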

Thanks TomTom, yep, I ensure any lazy evaluation is performed before SetDataList by calling ToList/ToArray beforehand. This was also the first thing I ruled out. I’ll post some code tomorrow.

Here is some code

public class TestSetDataList : GH_Component
{
    /// <summary>
    /// Initializes a new instance of the TestSetDataList class.
    /// </summary>
    public TestSetDataList()
      : base("TestSetDataList", "SetDataList",
          "Check Performance of Set Data List",
          "Test", "Test")
    {
    }

    /// <summary>
    /// Registers all the input parameters for this component.
    /// </summary>
    protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager)
    {
        pManager.AddIntegerParameter("Count", "Count", "Number of items to add to List<object>", GH_ParamAccess.item);
    }

    /// <summary>
    /// Registers all the output parameters for this component.
    /// </summary>
    protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager)
    {
        pManager.AddGenericParameter("Item", "Item", "Single Item", GH_ParamAccess.item);
        pManager.AddGenericParameter("Items", "Items", "Multiple Items", GH_ParamAccess.list);
        pManager.AddNumberParameter("Set Data Time", "Set Data Time", "Set Data Time", GH_ParamAccess.item);
        pManager.AddNumberParameter("Set Data List Time", "Set Data List Time", "Set Data List Time", GH_ParamAccess.item);
    }

    /// <summary>
    /// This is the method that actually does the work.
    /// </summary>
    /// <param name="DA">The DA object is used to retrieve from inputs and store in outputs.</param>
    protected override void SolveInstance(IGH_DataAccess DA)
    {
        int count = default;

        DA.GetData(0, ref count);

        var genericObject = new object();
        var genericList = new List<object>();
        for (int i = 0; i < count; i++)
        {
            genericList.Add(new object());
        }

        var stopWatch = new Stopwatch();

        stopWatch.Start();
        DA.SetData(0, genericObject);
        stopWatch.Stop();

        var timeToSetData = stopWatch.ElapsedMilliseconds;

        stopWatch.Restart();
        DA.SetDataList(1, genericList);
        stopWatch.Stop();

        var timeToSetDataList = stopWatch.ElapsedMilliseconds;

        DA.SetData(2, timeToSetData);
        DA.SetData(3, timeToSetDataList);
    }

    /// <summary>
    /// Provides an Icon for the component.
    /// </summary>
    protected override System.Drawing.Bitmap Icon
    {
        get
        {
            //You can add image files to your project resources and access them like this:
            // return Resources.IconForThisComponent;
            return null;
        }
    }

    /// <summary>
    /// Gets the unique ID for this component. Do not change this ID after release.
    /// </summary>
    public override Guid ComponentGuid
    {
        get { return new Guid("1CBF48EE-CCDD-47DF-8DCA-088DA75DBB6A"); }
    }
}

Here is the performance on my machine (the numbers reported are milliseconds)

This was run in Debug. I’ll compile a release build now and retest.

Similar results

SetDataList calls IEnumerable.ToList and AddRange on GH_Structure so it’s an O(2n) action (with much memory pressure) while SetData is O(1).

It’s not Set.

Where did you find the source code?

The IEnumerable.ToList call should be O(1) in my case as the concrete type is already a list, but I understand the worst case is O(2n). I also realise there may be a need to iterate through the enumerable as part of the SetDataList call, but the iteration itself cannot take 10s for 100,000 items. The time to create the initial list of objects is less than 10ms.

IIRC It’s O(n) because the list is copied.
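
A small sketch of the point about copying, in plain C# (the class name `ToListCopyDemo` is made up for illustration). It only shows that `ToList` on something that is already a `List<T>` still allocates a new list; it says nothing about what SetDataList does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCopyDemo
{
    // True only if ToList handed back the very same list instance.
    public static bool ReturnsSameInstance(List<object> source)
    {
        List<object> copy = source.ToList();
        return ReferenceEquals(source, copy);
    }

    public static void Main()
    {
        var source = new List<object> { new object(), new object() };
        List<object> copy = source.ToList();

        Console.WriteLine(ReferenceEquals(source, copy));       // False: a new list was allocated
        Console.WriteLine(ReferenceEquals(source[0], copy[0])); // True: the elements are shared, not cloned

        copy.Add(new object());
        Console.WriteLine(source.Count); // 2: the original list is unaffected
    }
}
```

The copy is a shallow one, so it costs one pass over the n references rather than n object allocations.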


Ah right, yeah, that makes sense. But these operations still can’t account for the time penalty. I’ll try a simple item-to-item component, passing in 100,000 objects (I thought this didn’t incur the same time penalty).

EDIT: (I’m not sure this is a completely valid comparison) Feeding 100,000 objects into the simple item-to-item component caused no noticeable lag (55ms appears to be the total execution time). But this illustrates the performance of SetDataList.

This code is buggy… If you call

stopWatch.Start();
DA.SetData(0, genericObject);
stopWatch.Stop();

you measure how long it takes to add one object to the output, whereas with the following you add n objects. It doesn’t even matter if RunScript is called n times, because you only measure the last RunScript execution:

stopWatch.Restart();
DA.SetDataList(1, genericList);
stopWatch.Stop();

So you are comparing apples with oranges.
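
A like-for-like measurement times the same number of items on both sides. Here is a minimal sketch in plain C#, where `List<object>.Add` and `AddRange` are only stand-ins for the per-item and bulk calls (they are not the Grasshopper methods), and `LikeForLikeTiming` and `MeasureMs` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LikeForLikeTiming
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int n = 100000;
        var items = new List<object>(n);
        for (int i = 0; i < n; i++) items.Add(new object());

        var sinkA = new List<object>();
        var sinkB = new List<object>();

        // Both measurements cover all n items, so the totals are comparable.
        double oneByOne = MeasureMs(() => { foreach (var item in items) sinkA.Add(item); });
        double bulk = MeasureMs(() => sinkB.AddRange(items));

        Console.WriteLine($"one by one: {oneByOne:F3} ms for {sinkA.Count} items");
        Console.WriteLine($"bulk:       {bulk:F3} ms for {sinkB.Count} items");
    }
}
```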


Maybe to clarify: if you output something over the DA, you are not passing a reference to your list. Instead, your data is converted into a tree structure. The data is copied, which is safe but inefficient. I think there is a way to create a GH_Structure directly and return that, which gets rid of the duplicate copying. But I don’t remember how this works. Maybe someone else can help here…

SetDataTree just sets the output tree. But the component then needs to handle the structure of the input itself.

What I compared here was not from the code above. It was the total execution time of a component that receives a list of n items and calls SetData once per item, so SetData was called 100,000 times. I compared against that, rather than against SetData being called 100,000 times inside the same component, because the latter is not a real use case and I didn’t want that apples-to-apples comparison. It took 55ms for a component that receives a single item and calls SetData on it to process 100,000 items, compared to 5.5s for SetDataList to be called with 100,000 objects, so both were timed for n=100,000.

In terms of using GH_Structure, I have tried this and I am surprised at how much more performant it is. We are now within the range I expected for this operation. In this example it took 58ms to create a DataTree with 100,000 items and 17ms to set it, which is an enormous difference from SetDataList, which took 13.3s. I have also added the SetData loop to this example, @TomTom. I don’t see this as a real use case, but it is here for the apples-to-apples comparison, and it took the same time as SetDataList.

(I’ve had to reduce the number to 50,000; for some reason, with the screen recording active, 100,000 items could take 20s+)
[Screen recording: SetDataList]

What surprises me is that the rightmost component processes 50,000 inputs in 20ms. It seems like the solution is to avoid SetDataList and handle the creation of the GH_Structure yourself. GH_Structure does require its generic type to implement IGH_Goo; without looking into the details, I figured using GH_ObjectWrapper would be a quick approach.

What’s unclear to me is what to use when working with a list of lists of objects. Am I really expected to loop through the nested lists, wrapping all my objects in a GH_ObjectWrapper, or can I wrap the top-level list and wrap the nested lists in another type that Grasshopper understands?
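
One straightforward way to handle the nested case is sketched below, on the assumption that each inner list should become its own branch. It needs a reference to Grasshopper.dll. `TreeBuilder` and `FromNestedLists` are hypothetical names, and this is not necessarily the approach the Grasshopper developers would recommend. It still wraps every object individually.

```csharp
using System.Collections.Generic;
using Grasshopper.Kernel.Data;
using Grasshopper.Kernel.Types;

public static class TreeBuilder
{
    // Builds a tree with one branch ({0}, {1}, ...) per inner list.
    // Empty inner lists add no items, so they produce no branch here.
    public static GH_Structure<GH_ObjectWrapper> FromNestedLists(
        IEnumerable<IEnumerable<object>> nested)
    {
        var tree = new GH_Structure<GH_ObjectWrapper>();
        int branch = 0;
        foreach (var inner in nested)
        {
            var path = new GH_Path(branch++);
            foreach (var item in inner)
            {
                // Each object is wrapped individually so it satisfies IGH_Goo.
                tree.Append(new GH_ObjectWrapper(item), path);
            }
        }
        return tree;
    }
}
```

Inside SolveInstance the result could then be assigned with something like `DA.SetDataTree(2, TreeBuilder.FromNestedLists(nestedLists));`, where the output index and `nestedLists` are placeholders.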

A screen shot too:

Also code too:

protected override void SolveInstance(IGH_DataAccess DA)
{
    int count = default;

    DA.GetData(0, ref count);

    var stopWatchCreateData = new Stopwatch();
    var stopWatchSetData = new Stopwatch();
    var stopWatchSetDataList = new Stopwatch();
    var stopWatchCreateStructure = new Stopwatch();
    var stopWatchNewWrapperObj = new Stopwatch();
    var stopWatchSetDataTree = new Stopwatch();

    var genericObject = new object();

    stopWatchCreateData.Start();
    var genericList = new List<object>();
    for (int i = 0; i < count; i++)
    {
        genericList.Add(new object());
    }
    stopWatchCreateData.Stop();

    stopWatchSetData.Restart();
    for (int i = 0; i < count; i++)
    {
        DA.SetData(0, genericObject);
    }
    stopWatchSetData.Stop();

    stopWatchSetDataList.Start();
    DA.SetDataList(1, genericList);
    stopWatchSetDataList.Stop();

    stopWatchCreateStructure.Start();
    var structure = new GH_Structure<GH_ObjectWrapper>();
    foreach (var item in genericList)
    {
        // Time only the wrapper allocation; the stopwatch accumulates across iterations.
        stopWatchNewWrapperObj.Start();
        var objwrap = new GH_ObjectWrapper(item); // wrap the item, not the whole list
        stopWatchNewWrapperObj.Stop();
        structure.Append(objwrap);
    }
    stopWatchCreateStructure.Stop();

    stopWatchSetDataTree.Start();
    DA.SetDataTree(2, structure);
    stopWatchSetDataTree.Stop();

    DA.SetData(3, stopWatchCreateData.ElapsedMilliseconds);
    DA.SetData(4, stopWatchCreateStructure.ElapsedMilliseconds);
    DA.SetData(5, stopWatchNewWrapperObj.ElapsedMilliseconds); // report the stopwatch that was actually run
    DA.SetData(6, stopWatchSetData.ElapsedMilliseconds);
    DA.SetData(7, stopWatchSetDataList.ElapsedMilliseconds);
    DA.SetData(8, stopWatchSetDataTree.ElapsedMilliseconds);
}

@Mike26
It seems that when you set double data on an output, it takes time to convert it to GH_Number. Maybe you could try creating a List<GH_Number> first.
For me, using SetDataTree(int paramIndex, IGH_Structure tree) set 600,000 items within 776ms, which includes the time to process the data from the database.
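
A minimal sketch of the pre-wrapping suggestion, assuming a reference to Grasshopper.dll (`NumberWrapping` and `Wrap` are hypothetical names). Whether this actually removes the conversion cost has not been measured here.

```csharp
using System.Collections.Generic;
using Grasshopper.Kernel.Types;

public static class NumberWrapping
{
    // Wraps each double in a GH_Number up front, so the values are already IGH_Goo.
    public static List<GH_Number> Wrap(IEnumerable<double> values)
    {
        var wrapped = new List<GH_Number>();
        foreach (double v in values)
        {
            wrapped.Add(new GH_Number(v));
        }
        return wrapped;
    }
}
```

The wrapped list could then be passed on with something like `DA.SetDataList(1, NumberWrapping.Wrap(values));`, where the output index and `values` are placeholders.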