One failure in a batched call kills entire call

fergus.hudson · April 9, 2021, 3:18am

Something I’m running into that is a bit frustrating for my use case is that, when batching calls, a single failure kills the entire batch.

So when I’m making a series of batched Compute calls, an unexpected result in an early batch call might mean a ‘None’ in the input array to the next batch call, which kills the entire process instead of letting the ‘None’ values propagate through the batch calls to the next call.

Is this by design? Is this necessary? Are there any plans to change this?
Most importantly, does anyone have any suggestions to deal with/work around it in Compute’s current state?

Batch calls are critical for speed with Compute, but batch calls that fail due to a single ‘None’ among the inputs are very, very painful to deal with in a large system.

Here’s some example code to demonstrate what I’m talking about:

import rhino3dm
import compute_rhino3d.Brep as Brep
import compute_rhino3d.AreaMassProperties as AMP

c1 = rhino3dm.Point3d(0,0,0)
c2 = rhino3dm.Point3d(1000,0,0)
c3 = rhino3dm.Point3d(1000,1000,0)
c4 = rhino3dm.Point3d(0,1000,0)
surf = Brep.CreateFromCornerPoints1( c1, c2, c3, c4, 1 )

surf_arr   = [surf]*30
boolParams = [True]*30

result = AMP.Compute6( surf_arr, boolParams, boolParams, boolParams, boolParams, multiple=True ) 
print( len(result), type(result) )

#no None in input, so no problem
#output:   30 <class 'list'>

surf_arr   = [surf]*29 + [None]

result = AMP.Compute6( surf_arr, boolParams, boolParams, boolParams, boolParams, multiple=True ) 
print( len(result), type(result) )

#one None in the input array, so no results, entire batch call fails
#output:   3 <class 'dict'>

print(result)

#output: {'statusCode': 500, 'message': 'Something went horribly, horribly wrong while servicing your request.', 'details': 'Nancy.RequestExecutionException: Oh noes! ---> //snip

fergus.hudson · April 13, 2021, 8:22am

Can anyone offer any insight into this?

fergus.hudson · April 15, 2021, 6:04am

@dale @will @fraguada I don’t know who’s best to ping to learn about this, but are you able to offer any info? Is this behaving as intended, or is it something on the todo list?

will · April 15, 2021, 11:44am

I think the main problem here is that Compute’s errors aren’t very informative. It throws 500 in many places where a specific error would be more useful, such as this case where Compute fails to deserialise {} (a.k.a. None) to a Rhino.Geometry.Brep.

I don’t think returning partial results has been considered. @steve have you thought about how this might work?

As for workarounds, you could of course check the input before sending it (less ideal) or encapsulate the whole workflow in a custom endpoint via a Compute plug-in.
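To make the first workaround concrete, here is a minimal client-side sketch, not anything built into Compute. The helper name `batch_skipping_none` is made up for illustration, and it assumes the batched function returns a list with one result per input on success and something else (such as the error dict shown earlier in the thread) on failure.

```python
def batch_skipping_none(batch_fn, items, *per_item_args):
    """Call batch_fn on the non-None items only.

    Returns a list the same length as `items`, with None kept at the
    positions where the input was None. `batch_skipping_none` is a
    hypothetical helper, not part of compute_rhino3d.
    """
    # Indices of the inputs that are safe to send.
    keep = [i for i, item in enumerate(items) if item is not None]
    results = [None] * len(items)
    if not keep:
        return results  # nothing to send, so skip the request entirely

    # Filter every per-item argument list with the same indices so the
    # arrays stay aligned.
    valid_items = [items[i] for i in keep]
    valid_args = [[arg[i] for i in keep] for arg in per_item_args]

    output = batch_fn(valid_items, *valid_args)

    # Assumption: a successful batch call returns one result per input.
    if not isinstance(output, list) or len(output) != len(keep):
        raise RuntimeError("batch call failed: %r" % (output,))

    # Put each result back at its original position.
    for i, value in zip(keep, output):
        results[i] = value
    return results


# Possible usage with the call from the first post (untested against a server):
# result = batch_skipping_none(
#     lambda surfs, *flags: AMP.Compute6(surfs, *flags, multiple=True),
#     surf_arr, boolParams, boolParams, boolParams, boolParams)
```

The output stays index-aligned with the input, so downstream code can still treat position `i` as the same part of the model. The cost is an extra pass over the inputs on the client for every call.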

fergus.hudson · April 19, 2021, 2:01am

Thanks for the info, @will.

We are making a large number of calls in pseudo-parallel: we pass a fixed number of elements to a batched Compute call, the output of which is the input for the next call, and so on in a long chain of batched Compute calls, all with the same number of elements (each element is a specific part or location in the overall model).

Some elements in some calls will fail, or return results that are unacceptable. Ideally this would not stop the computation for all elements. We can’t remove those problem elements from the array without causing major inconvenience, so we’d like to replace them with None and have None elements propagate through all subsequent calls, but batched Compute calls don’t allow this.
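The propagation described here can be emulated on the client side until Compute supports it. The following is a rough sketch under assumptions: `run_chain`, `stages` and `is_acceptable` are hypothetical names, each stage is assumed to be a function that takes a list and returns a list of the same length, and a non-list return value is treated as a failed batch.

```python
def run_chain(stages, items, is_acceptable=lambda value: True):
    """Run `items` through each stage in order, keeping array positions fixed.

    An element that is None, or whose result fails `is_acceptable`,
    becomes None and is skipped by every later stage. All names here
    are illustrative; nothing in this function is Compute API.
    """
    current = list(items)
    for stage in stages:
        # Only elements that are still alive get sent to this stage.
        live = [i for i, value in enumerate(current) if value is not None]
        following = [None] * len(current)
        if live:
            output = stage([current[i] for i in live])
            # Assumption: success means one result per element sent.
            if not isinstance(output, list) or len(output) != len(live):
                raise RuntimeError("stage failed: %r" % (output,))
            for i, value in zip(live, output):
                # Unacceptable results are replaced with None so they
                # drop out of later stages without shifting the array.
                if value is not None and is_acceptable(value):
                    following[i] = value
        current = following
    return current


# Stand-in stages for demonstration; real stages would wrap batched
# Compute calls made with multiple=True.
stages = [lambda xs: [x + 1 for x in xs], lambda xs: [x * 10 for x in xs]]
print(run_chain(stages, [1, None, 3]))  # [20, None, 40]
```

Every stage sees only the live elements, so a single bad element costs one slot rather than the whole batch, and the final list is still the same length as the original input.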