C# Parallel For Loop, "Data is incomplete or lost"?

Hi, recently I've been trying to speed up some mesh and matrix operations by using a Parallel.For loop.
When I run the simple serial version the result seems to be complete, but when I use a function inside the Parallel.For loop it appears to be losing some results. I'm new to threading, so I don't know if the For loop needs something else to wait for completion (I'm pretty sure I'm missing something).

Here I leave the GH file with a C# script; one version uses a normal for loop and the other runs in parallel. Thanks.

Parallel For Loop.gh (61.6 KB)

The lists are not thread-safe (different threads can access the same index at the same time, so you can experience data loss).

In your case I would take advantage of the fact that you are not changing the number of items being processed (fixed-length data), and that the input data is also well known, namely a mesh. Therefore, use arrays inside the Parallel loops. Each thread then works on guaranteed unique array indexes, so you will not get the kind of data races you get with dynamic lists (a regular List<T> is not thread-safe and is an absolute no-no in parallel loops, except for reading data; writing to a shared list is the no-no).

The problem with lists can be illustrated like so:

list.Add(...);   // Say, thread 0 added this -> list is now [0]
list.Add(...);   // thread 4 added this -> list is now [0, 1]
list.Add(...);   // thread 2 added this -> list is now [0, 1, 2]
list.Add(...);   // Oops, thread 3 added at the same time -> [0, 1, 3], messing up the list

With arrays the index intervals are spread out among the threads, so any one thread will never write to the same location (index) as any other thread:

array[0| | | ];  // Let's say that thread 0 wrote this
array[0| | |3];  // thread 3 wrote this
array[0|1| |3];  // thread 1 wrote this
array[0|1|2|3];  // thread 2 wrote this, and 'collisions' thus can't happen.

Notice how you cannot know in which order things happen, and they're not always even "in order" (threads can do things at the same time, which is why collisions can occur in dynamic lists).
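If you want to see this outside of Grasshopper, here is a minimal console sketch (my own toy example, not from the attached file); the List<T> version usually ends up with fewer items than expected (or can even throw), while the array version is always complete:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ListVsArrayDemo
{
    static void Main()
    {
        const int n = 100000;

        // Unsafe: many threads call Add() on the same List<int> at once.
        // Count usually ends up below n, and the list may even be corrupted.
        var list = new List<int>();
        Parallel.For(0, n, i => list.Add(i));
        Console.WriteLine("List count:   " + list.Count + " (expected " + n + ")");

        // Safe: each thread writes only to its own index, so nothing collides.
        var array = new int[n];
        Parallel.For(0, n, i => array[i] = i);
        Console.WriteLine("Array length: " + array.Length + " (always " + n + ")");
    }
}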

I know there are more advanced concepts out there for dealing with things like this, but fixed arrays "cost nothing" to create and fill in (and they are cheap to sum up afterwards in a sequential loop). Arrays allow for quick solutions without diving into any deeper threading philosophy; just use them and off you go.

Also, try to keep everything you write to inside the loop as local variables. Variables you only read from don't matter (reading doesn't cause data races, so reading is no problem inside Parallel loops).

Use arrays. The size is given (var result_array = new double[mesh.Vertices.Count];), and since the array never changes size during the loop, nothing bad (colliding indexes, for example) can happen in the parallel loop.
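Put together, the pattern looks roughly like this (a minimal sketch, not the exact code from your file; the per-vertex computation is just a placeholder, and I'm assuming the mesh comes in through x and the result goes out through A):

var result = new double[x.Vertices.Count];   // size known up front, never changes

System.Threading.Tasks.Parallel.For(0, x.Vertices.Count, i =>
{
  // Only locals and result[i] are written here, and no other thread
  // ever touches index i, so there is no data race.
  var z = (double) x.Vertices[i].Z;
  result[i] = z * 2.0;                       // placeholder per-vertex computation
});

A = result;                                  // hand the finished array to the output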

// Rolf

Yes, that fixes the problem. Thanks, RIL, for the quick and nicely developed answer!! Threading still seems like kind of a black-box concept to me. While I was searching for a possible answer I found this link: https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism - no mention of this "problem", and it seems to be a fairly basic one.
I have uploaded the fixed script for people who may run into similar mistakes. Thanks again.
FIX - Parallel For Loop.gh (62.1 KB)

Taste varies regarding coding style, but personally I prefer using duck-typing (var) to make the code more readable.

Also recommended is to keep as much of the code as possible inside the Parallel.For loop. That makes it easier to see whether any code is writing to something "dynamic", or to something which isn't declared inside the loop (and then the "black" part of the Parallel.For magic usually disappears). :)

So, skipping the redundant types to the left of the variable names (where the compiler can infer them) and replacing them with the var keyword makes the code shorter and "cleaner", like so:

    var TPI = new GH_Number[x.Vertices.Count];
    System.Threading.Tasks.Parallel.For(0, x.Vertices.Count, i =>
    {
      // Sum the heights (Z) of all neighbouring vertices
      var sum = 0.0;
      var NeighbourIndices = x.Vertices.GetConnectedVertices(i);
      if (NeighbourIndices.Length > 0)
      {
        for (var j = 0; j < NeighbourIndices.Length; j++)
          sum += x.Vertices[NeighbourIndices[j]].Z; // the neighbour's height
      }
      // Each height sum goes into its own slot in the array
      TPI[i] = new GH_Number(sum);
    });
    A = TPI;

But as said, coding style is often a matter of taste.

// Rolf

Small nitpicky correction there: var is not technically "duck-typing" or dynamic in the Python sense of the term. It's still statically typed, but the compiler uses compile-time information to infer the type from the right-hand side of the assignment, so you don't have to go through the ritual of declaring stuff like:

SomeReallyLongTypeName foo = new SomeReallyLongTypeName(); 

C# has a dynamic type as well, but it's not commonly used (only for specific use cases where the type of a variable isn't known until runtime).
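A quick toy illustration of the difference (my own example, not from the files in this thread):

var a = "hello";       // statically typed: the compiler infers string at compile time
// a = 42;             // would not compile - a is and stays a string

dynamic b = "hello";   // dynamically typed: member lookup is deferred until runtime
b = 42;                // compiles and runs - the variable can change type at runtime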

Speaking of coding style, the above code could also be written in a declarative manner as:

var TPI = x.Vertices
  .Select(
    (v,i) => x.Vertices.GetConnectedVertices(i).Sum(
        ni => x.Vertices[ni].Z
    )
  ).Select(
    n => new GH_Number(n)
  );
A = TPI;

Personally I find this more readable (although yes, it's a matter of taste), plus it can easily be parallelized while preserving order by adding AsParallel() and AsOrdered():

var TPI = x.Vertices
  .AsParallel()
  .AsOrdered()
  .Select(
    // exactly the same as above!
  )

Yeah, that was nitpicking, but it was interesting nitpicking. :) I did actually point out "where it can be inferred by the compiler", but duck-typing is inferred at runtime while var is inferred at compile time, so yes, I stand corrected.

Interesting code style you demonstrate. Is this all LINQ or standard .NET? And what about speed?

// Rolf

It's all standard LINQ. It takes a while to get used to, but the higher level of abstraction means you never have to worry about the off-by-one errors that are so common in array indexing. :)

Speed looks to be about the same on the data set above; however, an odd discrepancy showed up in the summed results. I haven't looked into it closely, but it might be some sort of floating-point error?

Ah okay, figured it out: mesh vertices are stored as single-precision floating-point values (Point3f), and writing

var sum = 0.0

infers sum to be of type double, which automatically casts each Z coordinate from float to double as it is added to the sum.
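You can see the effect in isolation with a toy example (mine, not from the file):

float big = 16777216f;                 // 2^24: the last integer a float can represent exactly
float floatSum   = big + 1f;           // rounds back to 16777216 in single precision
double doubleSum = (double) big + 1.0; // 16777217 survives in double precision
// floatSum == 16777216, doubleSum == 16777217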

On the other hand, using LINQ to sum over the Point3f values keeps everything in single-precision land, because nothing told it otherwise. After adding an explicit double cast, both approaches agree:

var TPI = x.Vertices
    .AsParallel()
    .AsOrdered()
    .Select(
        (v, i) =>
        x.Vertices.GetConnectedVertices(i).Sum(
            ni => (double) x.Vertices[ni].Z // sum over neighbor indices, cast to double
        )
    ).Select(
        n => new GH_Number(n)
    );
A = TPI;

Parallel For Loop- Linq.gh (66.0 KB)
