Multi-Policy Decision Making for Reliable Navigation in Dynamic Uncertain Environments
The complex, tightly coupled interactions between dynamic agents make everyday social environments unpredictable, posing a major obstacle for autonomous navigation. Trajectory planners often produce reasonable behavior, but they do not account for the future closed-loop interactions of other agents with the trajectory being constructed. As a consequence, the resulting trajectories cannot anticipate cooperative interactions (such as a human yielding) or adverse interactions (such as the robot blocking the way).
In this dissertation, we introduce Multi-Policy Decision Making (MPDM), a novel framework for autonomous navigation in dynamic, uncertain environments. Rather than explicitly planning a trajectory, the robot dynamically switches between a set of candidate closed-loop policies, which allows it to adapt to the different situations encountered in such environments.
The candidate policies are evaluated through multiple forward simulations of samples drawn from the estimated distribution of the agents' current states. These forward simulations, and thereby the cost function, capture the coupled interactions between the agents' behaviors.
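A minimal sketch of sampling-based policy evaluation, under heavy simplifying assumptions. The world is a 1-D corridor with one pedestrian, whose state estimate is an assumed Gaussian. Both agents follow a hand-made reactive motion model, and three hypothetical candidate policies are each reduced to a cruise speed. Every name here (`simulate`, `evaluate_policy`, `POLICIES`), along with the cost weights, is illustrative and not taken from the dissertation. Each sample is rolled forward with the robot and pedestrian reacting to each other. The per-rollout costs are averaged into a Monte Carlo estimate, and the robot switches to the cheapest policy.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PedestrianState:
    x: float  # position along a 1-D corridor (m); the robot starts at 0
    v: float  # velocity (m/s); negative means walking toward the robot


# Hypothetical candidate policies, each reduced to a cruise speed (m/s).
POLICIES = {"stop": 0.0, "slow": 0.5, "go": 1.2}


def sample_agents(rng, n, mean_x=3.0, std_x=0.3, mean_v=-1.0, std_v=0.2):
    """Draw n samples from an assumed Gaussian estimate of the pedestrian's state."""
    return [PedestrianState(rng.gauss(mean_x, std_x), rng.gauss(mean_v, std_v))
            for _ in range(n)]


def simulate(robot_speed, ped, horizon=30, dt=0.1, progress_weight=1.0):
    """Forward-simulate robot and pedestrian together; return the rollout cost."""
    r, p, cost = 0.0, ped.x, 0.0
    for _ in range(horizon):
        gap = p - r
        # Assumed reactive model: both agents slow down as the gap closes,
        # so the outcome depends on their coupled closed-loop behavior.
        slow = 1.0 - math.exp(-gap * gap)
        r += dt * robot_speed * slow
        p += dt * ped.v * slow
        cost += dt * math.exp(-(p - r) ** 2)  # proximity penalty
    return cost - progress_weight * r  # reward the robot's progress


def evaluate_policy(robot_speed, samples):
    """Monte Carlo estimate of the expected cost over the sampled states."""
    return sum(simulate(robot_speed, s) for s in samples) / len(samples)


def choose_policy(policies, samples):
    """Switch to the candidate policy with the lowest expected cost."""
    return min(policies, key=lambda name: evaluate_policy(policies[name], samples))


if __name__ == "__main__":
    samples = sample_agents(random.Random(0), n=50)
    for name, speed in POLICIES.items():
        print(f"{name:>5}: {evaluate_policy(speed, samples):+.3f}")
    print("chosen:", choose_policy(POLICIES, samples))
```

With only a handful of samples, the estimated costs of two policies can easily swap order. This variance is the reliability problem with few roll-outs that motivates gradient-based evaluation.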
The robot's emergent behavior is directly affected by the quality of policy evaluation. Reliably evaluating a policy based on only a few forward roll-outs (due to real-time constraints) is difficult, especially given the large space of possible pedestrian configurations. By representing a forward simulation as a recurrent network, which enables the quick computation of accurate gradients, we radically improve the reliability and expressivity of MPDM. We reformulate the traditional motion planning problem and present it in a very different light: as a bilevel optimization problem in which the robot adapts its policy now to avoid potentially dangerous future outcomes.
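The bilevel idea can be illustrated with a toy sketch. It assumes a 1-D corridor, one pedestrian, a hand-made reactive motion model, and hypothetical policies reduced to cruise speeds. The inner problem runs projected gradient ascent on the pedestrian's start position to find a nearby configuration that makes the rollout costly. The outer problem picks the policy whose worst discovered outcome is least costly. The gradients come from forward-mode automatic differentiation with dual numbers, a swapped-in technique chosen only to keep the sketch dependency-free. Treating the unrolled simulation as a recurrent network would typically let reverse-mode backpropagation compute the same derivatives. The names (`Dual`, `rollout_cost`, `worst_case_cost`) and all parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode autodiff number: a value and its derivative w.r.t. one input."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(float(o))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)  # product rule on the derivative part
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.der)


def dexp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def rollout_cost(robot_speed, ped_x0, ped_v, horizon=30, dt=0.1):
    """Toy coupled robot-pedestrian simulation, unrolled step by step like an RNN."""
    r, p, cost = Dual(0.0), ped_x0, Dual(0.0)
    for _ in range(horizon):
        gap = p - r
        slow = 1.0 - dexp(-(gap * gap))  # both agents slow as the gap closes
        r = r + dt * robot_speed * slow
        p = p + dt * ped_v * slow
        d = p - r
        cost = cost + dt * dexp(-(d * d))  # proximity penalty
    return cost - r  # reward the robot's progress


def cost_and_grad(robot_speed, x0, ped_v):
    """Rollout cost and its exact derivative w.r.t. the pedestrian's start position."""
    out = rollout_cost(robot_speed, Dual(x0, 1.0), ped_v)
    return out.val, out.der


def worst_case_cost(robot_speed, x0, ped_v, steps=20, lr=0.5, radius=0.5):
    """Inner problem: projected gradient ascent on the start position, kept within
    `radius` of the sampled value, to find a costly (dangerous) outcome."""
    x, worst = x0, -math.inf
    for _ in range(steps):
        val, g = cost_and_grad(robot_speed, x, ped_v)
        worst = max(worst, val)
        x = min(max(x + lr * g, x0 - radius), x0 + radius)
    return max(worst, cost_and_grad(robot_speed, x, ped_v)[0])


def choose_policy(policies, samples):
    """Outer problem: pick the policy whose worst discovered outcome is least bad."""
    def score(speed):
        return max(worst_case_cost(speed, x0, v) for x0, v in samples)
    return min(policies, key=lambda name: score(policies[name]))


if __name__ == "__main__":
    policies = {"stop": 0.0, "slow": 0.5, "go": 1.2}  # hypothetical
    samples = [(3.0, -1.0), (2.6, -1.2), (3.4, -0.8)]  # (start x, velocity) pairs
    print("chosen:", choose_policy(policies, samples))
```

Forward mode costs about one extra pass per input dimension. With many agents and state variables, reverse mode (backpropagation) is the usual choice because a single backward pass yields the gradient with respect to all inputs.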
Finally, through extensive experiments on a physical robot platform operating in a semi-crowded environment, evaluated with both objective metrics and subjective feedback, we show that MPDM produces emergent behavior that is more reliable than that of a state-of-the-art trajectory planning algorithm.