C++ Parallel std::for_each


I am using ON_RTree together with a slow geometrical operation.
For this reason I would like to use a parallel loop. I am using the parallel std::for_each and I see a performance increase. The problem is that I need to get output from this loop: if I push_back the integer pairs I find into the pairsList vector, the application crashes. How can I append values to a list safely?

    std::vector<int> pairsList;

    std::vector<int> a(AABB.size());
    std::generate(a.begin(), a.end(), [n = 0]() mutable { return n++; });

    std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {

        ON_SimpleArray<int> neighbours;
        tree.Search(AABB[i].Min(), AABB[i].Max(), neighbours);

        for (int j = 0; j < neighbours.Count(); j++) {
            if (neighbours[j] > i) { // Ignore previous pairs
                if (getCollisionOpenNurbs(OBB[i], OBB[neighbours[j]])) { // Check OBB collision, +4-6 ms
                    if (FaceFace(P[i], P[neighbours[j]], Pl[i], Pl[neighbours[j]])) {
                        pairsList.push_back(i);             // <- Issue here: crash
                        pairsList.push_back(neighbours[j]); // <- Issue here: crash
                    }
                }
            }
        }
    });


The vector container is not thread safe: calling push_back on the same std::vector from several threads at once is a data race (one thread can reallocate the buffer while another is writing into it), so it cannot be updated in parallel. You need to use a separate vector for each thread and then combine them after the parallel loop.

Another way, if you know how many elements each thread will add, is to use one simple array (not a container type), initialize all elements to zero, and then pass an offset to each thread so that its elements are written to their correct slots in the list. This approach has the advantage of not requiring a combining step after the parallel step is complete, but it can only be used when you know the number of elements generated by each thread ahead of time. I was able to get this approach to work for many of my parallel sections of code.
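A minimal sketch of the offset method, assuming the number of outputs per index is known before the loop. `countFor` and `fillWithOffsets` are hypothetical names, and the payload written is just the index itself. The counts are turned into starting offsets with `std::exclusive_scan`, so each iteration writes only to its own slice of one preallocated array.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical stand-in: how many values index i will emit. The offset
// method only works when this is known before the parallel loop runs.
std::size_t countFor(int i) { return static_cast<std::size_t>(i % 3); }

std::vector<int> fillWithOffsets(int n) {
    std::vector<std::size_t> counts(n);
    for (int i = 0; i < n; ++i) counts[i] = countFor(i);

    // offsets[i] = counts[0] + ... + counts[i - 1]
    std::vector<std::size_t> offsets(n);
    std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(), std::size_t{0});

    const std::size_t total = counts.empty() ? 0 : offsets.back() + counts.back();
    std::vector<int> out(total, 0); // sized once, never resized inside the loop

    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);

    // Iteration i writes only out[offsets[i] .. offsets[i] + counts[i] - 1],
    // so no two iterations touch the same element: no lock, no merge step.
    std::for_each(std::execution::par, idx.begin(), idx.end(), [&](int i) {
        for (std::size_t k = 0; k < counts[i]; ++k)
            out[offsets[i] + k] = i; // placeholder payload: the index itself
    });
    return out;
}
```

For example, `fillWithOffsets(5)` returns `{1, 2, 2, 4}`. With MSVC the parallel policies work out of the box; with GCC's libstdc++ they may require linking against TBB (`-ltbb`).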


I assume that with the RTree I cannot know how many elements will be added, but I do know that I have to iterate n times.
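When the result count per index is unknown, one simple alternative (not one proposed in this thread) is to keep a single output vector and serialize only the append with a `std::mutex`. This is a sketch: `collides` is a hypothetical stand-in for the `tree.Search` + `getCollisionOpenNurbs` + `FaceFace` work, and pairs are stored as `std::pair` rather than a flat list. Note the policy is `std::execution::par`, not `par_unseq` as in the question, because locking a mutex is not allowed under `par_unseq`.

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive RTree search + collision tests.
bool collides(int i, int j) { return (i + j) % 4 == 0; }

std::vector<std::pair<int, int>> findPairsLocked(int n) {
    std::vector<std::pair<int, int>> pairs;
    std::mutex pairsMutex;

    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);

    // par, not par_unseq: blocking synchronization is forbidden under par_unseq.
    std::for_each(std::execution::par, idx.begin(), idx.end(), [&](int i) {
        for (int j = i + 1; j < n; ++j) {     // j > i: ignore previous pairs
            if (collides(i, j)) {             // expensive test runs without the lock
                std::lock_guard<std::mutex> lock(pairsMutex);
                pairs.emplace_back(i, j);     // only the append is serialized
            }
        }
    });

    // Threads append in a nondeterministic order; sort for a stable result.
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}
```

The lock is cheap here because it is taken only on a hit, after the expensive test; if hits are very frequent, contention on the mutex can eat into the parallel speedup.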

Do you have any sample code for a similar situation?

Also, do you know whether C++17 is faster than an older standard such as C++98? Are there any requirements on which version I should use with Rhino?

You can use the first method I mentioned. Just create an array of vectors and use the parallel loop index i as an index into the array to select the vector that stores the results of iteration i. Because each iteration writes only to its own vector, no two threads modify the same container. Then, after the parallel loop, combine the vectors.
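A minimal sketch of that array-of-vectors approach, assuming one vector per loop index (std::for_each does not expose a thread id). `collides` and `findPairsPerIndex` are hypothetical names; `collides` stands in for the RTree search and the OBB/face tests. The output keeps the flat `i, j, i, j, ...` layout of `pairsList` from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the RTree search + OBB/face collision tests.
bool collides(int i, int j) { return (i + j) % 4 == 0; }

std::vector<int> findPairsPerIndex(int n) {
    // One result vector per loop index: iteration i writes only perIndex[i],
    // so no two iterations ever share a container.
    std::vector<std::vector<int>> perIndex(n);

    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);

    std::for_each(std::execution::par, idx.begin(), idx.end(), [&](int i) {
        std::vector<int>& local = perIndex[i];
        for (int j = i + 1; j < n; ++j) { // j > i: ignore previous pairs
            if (collides(i, j)) {
                local.push_back(i);
                local.push_back(j);
            }
        }
    });

    // Serial combine step after the parallel loop.
    std::size_t total = 0;
    for (const auto& v : perIndex) total += v.size();
    std::vector<int> pairsList;
    pairsList.reserve(total);
    for (const auto& v : perIndex)
        pairsList.insert(pairsList.end(), v.begin(), v.end());
    return pairsList;
}
```

This keeps the loop free of locks and produces the pairs in a deterministic order (grouped by i), at the cost of allocating n small vectors; grouping indices into larger blocks with one vector per block reduces that overhead.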

I may be able to send some code later, once my computer is back online. We just moved to a new apartment, and it is taking a bit of time to find places for all our stuff, which is in 75 boxes.

I generate my C++ code using Visual Studio 2019 for Rhino 7 (and VS 2017 for Rhino 6) to create a C++ DLL, using the flow I posted a few months back. But I do not know which version of C++ it is using or whether it can be changed. To find my post, search for:

Step-by-step details for adding C++ DLL to a Python script with working example
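To check which C++ standard a project actually compiles with, a small probe like the sketch below can help (`cppStandard` is a hypothetical name). In Visual Studio 2017 (15.3 and later) and 2019 the standard is chosen per project under Project Properties > C/C++ > Language > C++ Language Standard, for example `/std:c++17`, which `std::execution` and the parallel algorithms require. MSVC leaves `__cplusplus` at `199711L` unless `/Zc:__cplusplus` is set, so the sketch prefers MSVC's `_MSVC_LANG` macro when it exists.

```cpp
// Returns the C++ standard this translation unit is compiled against,
// e.g. 201402L for C++14 or 201703L for C++17.
long cppStandard() {
#if defined(_MSVC_LANG)
    return _MSVC_LANG;   // MSVC: reflects the /std: setting
#else
    return __cplusplus;  // other compilers report the standard here
#endif
}
```

A compile-time guard such as `static_assert(cppStandard() >= 201703L)` is not possible with this non-`constexpr` function; for that, test the macros directly in an `#if`.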