Hello,

I’m running DQN on a `Tuple([Box(shape=(1,)), Discrete(n=2)])` action space. If I bucketise the continuous action as `Discrete(n=11)` (say) and make the action space the product of both discrete spaces, it becomes `Discrete(n=22)`. In that case, RLlib uses the `Categorical` TF action distribution and it runs.
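For context, the mapping I have in mind from the flat `Discrete(n=22)` index back to the original `(continuous, discrete)` pair is just an index decode. A minimal sketch (the bucket-major ordering and the `[-1, 1]` bounds here are my own conventions, not anything RLlib imposes):

```python
import numpy as np

N_BUCKETS = 11   # bucketised Box(shape=(1,)) -> Discrete(n=11)
N_DISCRETE = 2   # original Discrete(n=2); product space is Discrete(n=22)

# Continuous value represented by each bucket, assuming the Box is
# bounded in [-1, 1] (my convention; adjust to your actual bounds).
BUCKET_VALUES = np.linspace(-1.0, 1.0, N_BUCKETS)

def decode(flat_action: int):
    """Map a flat index in [0, 22) back to (continuous, discrete)."""
    bucket, discrete = divmod(flat_action, N_DISCRETE)
    return float(BUCKET_VALUES[bucket]), int(discrete)

# e.g. flat action 0 -> (-1.0, 0), flat action 21 -> (1.0, 1)
```

This is exactly the shape `(batch_size, 2)` output I want my custom sampling op to produce directly, instead of the flat integer.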

I would now like to override `Categorical`'s sampling operator and basically use the `(batch_size, 22)` input tensor to output both the continuous action and the discrete action. I subclassed `Categorical` into a custom action distribution that does just that (note that the policy object also needs to be subclassed so that `get_distribution_inputs_and_class` returns the right distribution class).

In particular, when I print the sampled tensor in my custom `_build_sample_op` and `deterministic_sample` functions, I get the expected (and desired) shape `(batch_size, 2)`. However, the environment's step function still receives an integer between 0 and 21, meaning that the custom sampling is not being used.

It is also worth mentioning that, since RLlib first creates a dummy action based on the action space to get things started, I replace the received action `x` in `logp` with `tf.zeros_like(x)`.

Is there something I’m missing, e.g. something in the rollout worker or the sample batch?

Thanks a lot!