
Learning Vehicular Dynamics, with Application to Modeling Helicopters



Presentation Transcript


  1. STANFORD

Models in Prior Work
• Predict velocities and angular rates:
• f: learned from data.
• Obtain position and orientation from numerical integration.

Shortcomings
• From physics we have:
• Body coordinate frame is different at every time step. This makes inertia highly non-linear in the state and very difficult to capture/learn from data.
• For most physical systems, forces and torques have a fairly simple relation to inputs and current state. This simplicity is lost by the change of coordinate frame.

First Autonomous Funnel
• Aerobatic maneuver.
• Method: model-based reinforcement learning.
• Simulator:
  • Acceleration prediction.
  • Longer time-scale criterion.
• Acknowledgments: control is joint work with Adam Coates, Ben Tse. (Paper forthcoming.)
• Video available.

[Equation labels: Rotation between body coordinate frames at times t and t+1; Accelerations]

Simulator Accuracy

Our acceleration prediction model
• Predict accelerations:
• f: learned from data.
• Obtain velocity, angular rates, position and orientation from numerical integration.

Advantages
• No need to learn inertia from data. Constraints from physics are incorporated explicitly.
• The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and thus easier to learn from data.

Standard learning criteria
• Frequency domain fitting: requires a linear model, used in CIFER (industry standard).
• Minimize one-step prediction error:
• For f linear in state s and inputs u: f can be found by linear regression.

Longer time-scale criterion
• Accuracy of simulation over longer time-scales is important for control. The following longer time-scale criterion was suggested in [Abbeel & Ng, 2004]: (H: time-scale of interest)
• EM-algorithm for maximization is expensive in our continuous state-action space setting. We present a simple and fast algorithm for (approximately) minimizing the average squared error over a certain duration.
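The two learning criteria named above (one-step prediction error and error over a longer time scale H) can be contrasted with a small sketch. It assumes a linear model of the form s[t+1] ≈ A s[t] + B u[t]; the function names `fit_one_step`, `simulate`, and `long_horizon_error` are hypothetical, and the error measure is only an illustrative "average squared error over H steps", not the exact criterion or algorithm from the paper.

```python
import numpy as np

def fit_one_step(S, U):
    """Least-squares fit of s[t+1] ~ A s[t] + B u[t] (one-step criterion).

    S: (T, n) array of states, U: (T, m) array of inputs.
    """
    X = np.hstack([S[:-1], U[:-1]])            # regressors [s_t, u_t]
    Y = S[1:]                                  # targets s_{t+1}
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # W has shape (n + m, n)
    n = S.shape[1]
    return W[:n].T, W[n:].T                    # A is (n, n), B is (n, m)

def simulate(A, B, s0, U):
    """Open-loop simulation: feed the model its own predictions."""
    s, out = s0, []
    for u in U:
        s = A @ s + B @ u
        out.append(s)
    return np.array(out)

def long_horizon_error(A, B, S, U, H):
    """Average squared error of H-step open-loop simulations (assumed measure)."""
    errs = []
    for t in range(len(S) - H):
        pred = simulate(A, B, S[t], U[t:t + H])
        errs.append(np.mean(np.sum((pred - S[t + 1:t + H + 1]) ** 2, axis=1)))
    return float(np.mean(errs))
```

On real flight data, a model with a small one-step error can still drift when simulated open-loop, so comparing `long_horizon_error` across candidate models is one simple way to see the difference between the two criteria.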
Sketch of algorithmic idea (see paper for full algorithm)
• Model:
• One step prediction at time t:
• One step prediction at time t+1:
• Two step prediction at time t:
• Therefore, can approximate multiple-step dynamics by linear combination of one-step dynamics.
• Our algorithm iterates the following two steps:
  • Compute estimate of s_{t+1} given s_t, u_t, u_{t+1} for current model A, B.
  • Estimate

[Plot panels: Bergen Industrial Twin; XCell Tempest]

Observations
• Acceleration prediction model significantly better. Reasons:
  • Captures gravity exactly.
  • Captures inertia, thus side-slip effects in the data.
• Longer time scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
• Differences more significant for Tempest than for Bergen, since Bergen data is mostly around hover.

Legend
• Linear model, one-step prediction error.
• Linear model, frequency domain fit with CIFER.
• Linear model, longer time scale prediction error.
• Acceleration model, one-step prediction error.
• Acceleration model, longer time scale prediction error.

Helicopter State and Inputs
• 12-D state:
• 8-D state:
• u1, u2: The longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
• u3: The tail rotor collective pitch control affects tail rotor thrust, and can be used to yaw (turn) the helicopter.
• u4: The main rotor collective pitch control affects the main rotor's thrust.

[State component labels: Orientation: roll, pitch, yaw; Angular rates; Position; Velocity]
• Encode symmetries using body (=robot-centric) coordinates.
• Body coordinate frame attached to helicopter.

STANFORD
Learning Vehicular Dynamics, with Application to Modeling Helicopters

Overview
• Model-based reinforcement learning has been very successful.
• State-of-the-art:
  • Reinforcement learning returns policies that fly well in simulation.
  • Remaining helicopter failures typically caused by inaccurate simulation.
• Key technical challenge: Building an accurate simulator.
• Our approach:
  • Encode all constraints known from physics. (Gravity, inertia, etc.) Learn only parts of model not determined by physics.
  • Explicitly learn simulation that is predictive at long time-scales.
• Result
  • Significantly improved helicopter model.
  • First autonomous funnel (aerobatic maneuver) using our model.

RC Helicopters: Bergen Industrial Twin, XCell Tempest

Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng

Conclusion
• Key technical challenge for model-based reinforcement learning applied to helicopters: building an accurate simulator.
• Our approach
  • By using acceleration-based approach, we can encode all constraints known from physics. (Gravity, inertia, etc.) Learn only parts of model not determined by physics.
  • Explicitly learn simulation that is predictive at long time-scales.
• Result
  • Significantly improved helicopter model.
  • First autonomous funnel (aerobatic maneuver) using our model.
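The acceleration-based idea summarized above (predict body-frame accelerations, add gravity from physics, then integrate and rotate into the next body frame) can be sketched as one simulation step. Everything named here is an assumption for illustration: the world frame has z pointing up, orientation is stored as a body-to-world rotation matrix `R`, `f` is a hypothetical stand-in for the learned model, and simple Euler integration is used. It is not the authors' implementation.

```python
import numpy as np

GRAVITY_WORLD = np.array([0.0, 0.0, -9.81])  # assumption: world z axis points up

def skew(k):
    """Cross-product matrix, so that skew(k) @ x == np.cross(k, x)."""
    return np.array([[0.0, -k[2], k[1]],
                     [k[2], 0.0, -k[0]],
                     [-k[1], k[0], 0.0]])

def rotation_from_axis_angle(w):
    """Rodrigues formula: rotation by angle |w| about axis w / |w|."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def step(p, R, v_body, omega_body, u, f, dt):
    """One Euler step of a hypothetical acceleration-prediction simulator.

    f(v_body, omega_body, u) returns (linear_acc, angular_acc) in the current
    body frame, excluding gravity; gravity is added explicitly from physics.
    """
    lin_acc, ang_acc = f(v_body, omega_body, u)
    acc_body = lin_acc + R.T @ GRAVITY_WORLD         # gravity expressed in body frame
    dR = rotation_from_axis_angle(omega_body * dt)   # body frame at t -> body frame at t+1
    p_next = p + dt * (R @ v_body)                   # position integrates world velocity
    R_next = R @ dR
    v_next = dR.T @ (v_body + dt * acc_body)         # re-express in the new body frame
    omega_next = dR.T @ (omega_body + dt * ang_acc)
    return p_next, R_next, v_next, omega_next
```

With a model that predicts zero acceleration, a tumbling body simulated this way is in exact free fall in world coordinates, which illustrates why gravity does not have to be learned from data in this formulation.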
