I am working in the Rh5 environment and need to support Multithreading of certain components for performance reasons… recently I pieced together some old code with some new code to make a Multithreaded Boundary Surfaces component that supports Datatrees, and unfortunately I have discovered that it arbitrarily crashes my application (and Rhino).
I was able to get some debug info out of Visual Studio even though it’s unmanaged code (in a C# component)… it seems to be pointing to a Memory Access Violation in System.Collections.ListDictionaryInternal – I’m using a ConcurrentDictionary so I thought that was supposed to be threadsafe. Anyway, my code:
private void RunScript(DataTree C, ref object A)
{
Print("Input: " + C.ToString());
// set up a dictionary for our resulting path structure
System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Brep[]> collectResults = new System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Brep[]>();
if (C.DataCount > 0) {
// this try catch block wraps the entire MT operation so that we can catch
// any exceptions on operations that we haven't specifically wrapped internally
try {
// set up threads for each branch
System.Threading.Tasks.Parallel.For(0, C.Branches.Count, (i) => {
// collect the path for this branch
GH_Path p = C.Path(i);
Print(p.ToString());
if (C.Branches[i].Count > 0) {
// Get the geometry for this branch
List<Curve> CrvsInBranch = C.Branches[i] as List<Curve>;
Brep[] result = Brep.CreatePlanarBreps(CrvsInBranch);
// Set the results for this path in our container Dictionary
if (!collectResults.ContainsKey(p))
{
collectResults[p] = result;
};
} else {
if (!collectResults.ContainsKey(p))
{
collectResults[p] = new Brep[0];
};
}
});
} catch (System.Exception err) {
Print(String.Format("Outer wrapper exception: debug required. \n {0}", err.ToString()));
}
}
// convert our dictionary to our datatree
A = DictToTree(collectResults);
}
//
// Method for converting our dictionary into a GH Datatree
private DataTree DictToTree(System.Collections.Concurrent.ConcurrentDictionary < GH_Path,Brep[] > dict){
DataTree dt = new DataTree();
foreach(KeyValuePair<GH_Path,Brep[]> item in dict){
dt.AddRange(item.Value, item.Key);
}
return dt;
}
Any thoughts? My biggest idea at this point is to wrap this into a plugin in VS, as I don’t trust Multithreading in a C# component at this point.
You’re accessing the C’s (the data tree) properties from within the For loop… This is the first thing I would blame here. The C.Branches() and the C.Path() are accessing either the list of branches or the dictionary of paths. None of those collections are thread-safe, including the data tree.
Try… Catch might not help with multiple threads when called from outside of the For loop.
Print() might also cause some issues when called from multiple threads.
The easiest way of multi-threading (the one to use most of the time when coding for Rhino/GH) is to put everything in an array and make sure no 2 threads access the same object at any point. Array as a fixed sized collection should work most of the time given this constraint. Make sure you assign different ranges of the array to different threads.
For instance:
4 threads running on 16 item long array should have the index ranges split such as 0-3, 4-7, 8-11, 12-15.
It’s a crude solution, but it works (given there are no 2 references to the same object in the array).
Interesting. Rather than deal with fixed size arrays, what if I preprocess the datatrees, dumping them into a “source” ConcurrentDictionary, which I can access the same way I’m doing with the results (by GH_Path)? Any red flags there?
Thanks,
Marc
P.S. so far, the try catch outside the for loops has been successful at catching exceptions from within threads, and the Print statements have been working as well – even allows me to see the order in which the threads were processed. Perhaps I’m just getting lucky there.
I’ve done tons of arrays processed by Parallel.For and I never had to split them manually since it is claimed that the Parallel.For does that behind the scenes. But if that’s actually not the case, then I still have tons of potential performance gains not being utilized…
sorry, wrong reading. I just like to keep the following statement:
you can use a non threadsafe collection if you are using a lock. This would make sense if each iteration takes ages to complete, and order doesn’t matter.
Well, I guess an array is as thread safe as the Parallel.For’s inbuilt automagic partitioner, since I have never had any dataraces. I tend to put results in equal length arrays (as the source data array / or list) and sum them up after the Parallel.For-loop, and so I don’t even have to lock when collecting data.
Arrays are threadsafe, List of T (so DataTree of T) are not. That is why I deleted the post above. I assumed you are talking about datatrees and not arrays. But you clearly wrote “array”.
For source data (reading) it doesn’t matter in a Parallel.For-loop what type of list (as long as the list doesn’t change size during the loop). It’s the collection type for the result (writing) which is the problem.
Result collections need to be thread safe, or alternatively, put the results into equal length arrays (same size as the source data) and then no problem no more. That last approach (equal length result arrays) has served me well in most cases.
I didn’t know that reading from a non-ts collection in parallel is okay.
But that’s what I do as well. As I said I commented because I was misreading. I just kept saying, whenever you like to use a non threadsafe collection -> f.e. datatree you only need to lock when storing results in it.
In most cases this doesn’t make any sense, only if each iteration takes a while to compute, so that overhead from locking can be less as when copying the data from an array into a data tree later on.
I’ve safely embedded many of your excellent advice in Parallel.For loops. Not sure though how efficient the partitioning is compared to handcrafting it it myself. Seems that the load-balancing of the partitions is the most important thing to get right. It’d be intersting to know whether someo0ne managed to get iot better by handcrafting rather than realying on .Net to automagically find the best scheme.
Not necessarily. Arrays around value-types whose size exceeds the atomic assignment (usually 64-bits on 64-bit systems) may get into a race condition when an assignment is made to the same index from two different threads. The end value in the array may be a mashup of both original values.
Of course if you’re not assigning to the same index, or if you’re only reading, then yes, arrays are threadsafe.
Most collections will be threadsafe if you’re only reading from them.
You’re right, I was thinking about manually setting up each thread (the raw way)… as you’re suggesting, For.Parallel manages the number of threads automatically.
That should work. I think. It’s only the modification of the sortedlist/dictionary that isn’t threadsafe, but once it exists and no longer changes, look up should work just fine. The only way that would break is if the lookup method either use class-level or static fields (which seems unlikely in the extreme) or actually builds caches during individual lookups (which is possible, but one expects the .NET programmers to have made that safe).
Well, I’m the last to be sure of anything under the hood. But I’ve abused available lists so much the last couple of months using Parallel.For, and I haven’t experienced any problems or surprises of any kind. Everything just works.
I added a ConcurrentDictionary for my source data, and it didn’t help. I’m still experiencing crashes, which are repeatable with certain data sets. Code here:
private void RunScript(DataTree<Curve> C, ref object A)
{
Print("Input: " + C.ToString());
// accessing the data tree from inside a parallel thread is not threadsafe, so we set up a dictionary to hold our GH data
System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Curve[]> C_source = new System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Curve[]>();
// set up a dictionary for our resulting path structure
System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Brep[]> collectResults = new System.Collections.Concurrent.ConcurrentDictionary<GH_Path,Brep[]>();
if (C.DataCount > 0) {
// this try catch block wraps the entire MT operation so that we can catch
// any exceptions on operations that we haven't specifically wrapped internally
try {
// populate our source dictionary (before engaging the parallel threads)
for (int i = 0; i < C.Branches.Count; i++) {
GH_Path p = C.Path(i);
C_source[p] = C.Branches[i].ToArray();
}
List<GH_Path> keys = C_source.Keys.ToList();
int branchCount = keys.Count;
// set up threads for each branch
System.Threading.Tasks.Parallel.For(0, branchCount, (i) => {
// collect the path for this branch
GH_Path p = keys[i];
Print(p.ToString());
if (C_source[p].Length > 0) {
// Get the geometry for this branch
List<Curve> CrvsInBranch = C_source[p].ToList();
Brep[] result = Brep.CreatePlanarBreps(CrvsInBranch);
// Set the results for this path in our container Dictionary
if (!collectResults.ContainsKey(p))
{
collectResults[p] = result;
};
} else {
if (!collectResults.ContainsKey(p))
{
collectResults[p] = new Brep[0];
};
}
});
} catch (System.Exception err) {
Print(String.Format("Outer wrapper exception: debug required. \n {0}", err.ToString()));
}
}
// convert our dictionary to our datatree
A = DictToTree(collectResults);
}
// <Custom additional code>
// Method for converting our dictionary into a GH Datatree
private DataTree<Brep> DictToTree(System.Collections.Concurrent.ConcurrentDictionary < GH_Path,Brep[] > dict){
DataTree<Brep> dt = new DataTree<Brep>();
foreach(KeyValuePair<GH_Path,Brep[]> item in dict){
dt.AddRange(item.Value, item.Key);
}
return dt;
}
File attached in case anyone is interested in testing.