
Kill spill: the ML-detour for slow extracted spill control

Explore the use of a machine-learning detour for slow-extracted spill control in the North Area fixed-target experiments. Investigate the possibility of injecting a sinusoidal modulation to compensate for noise on the spill. Test the resilience of the trained agent to changes in the spill parameters.



Presentation Transcript


  1. Kill spill: the ML-detour for slow extracted spill control – S. Hirlander, V. Kain

  2. Spill quality to North Area fixed-target experiments • Resonant extraction: 2/3-integer resonance • Slow extraction over a ~5 s flattop • Constraints from the experiments in the NA

  5. Spill quality to North Area fixed-target experiments • Inject a sinusoidal modulation of the quadrupole current (QF current in the future) to compensate n × 50 Hz noise on the spill • Can inject at 50, 100, 150 and 300 Hz • Task: find the amplitude and phase of the correction signal at the various frequencies • And: the spill noise drifts in amplitude and phase
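The correction described above can be sketched as a parametrised sum of sinusoids at the four injectable frequencies. This is an illustrative sketch only: the function name, the parameter layout and all amplitude/phase values are assumptions, not the operational knobs.

```python
import numpy as np

def correction_signal(t, params):
    """Sum of sinusoidal modulations at the injectable harmonics of 50 Hz.

    params maps frequency (Hz) -> (amplitude, phase in rad); illustrative
    stand-ins for the amplitude and phase knobs mentioned on the slide.
    """
    out = np.zeros_like(t)
    for freq, (amp, phase) in params.items():
        out += amp * np.sin(2 * np.pi * freq * t + phase)
    return out

# One second sampled at 2 kHz (the spill-monitor rate quoted later in the talk).
t = np.linspace(0.0, 1.0, 2000, endpoint=False)
corr = correction_signal(t, {50: (0.1, 0.3), 100: (0.05, 1.2),
                             150: (0.02, 0.0), 300: (0.01, 2.0)})
```

A component injected in antiphase with a noise line of the same amplitude cancels it exactly, which is the whole point of the amplitude/phase search.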

  6. Another example… • Can be really bad… • Since 2016: measure the phase, "calibrate" the electronics for the phase response • Adjust the amplitude after a scan • Typically ~10 iterations

  7. Can we reduce the correction time to 1 step? • What about reinforcement learning? • Test case for the NAF algorithm • Ingredients: • An OpenAI gym environment simulating the spill • What is the state? • What is an action? • What is the reward?

  8. Simulated environment • Only 50 Hz • Measured signal: … • Goal: minimize the ripple with the optimal correction settings

  9. Simulated environment • "BSI spill monitor" (2 kHz) for the SHiP cycle • State: real and imaginary part of the FFT spectrum at 50 Hz • Reward: effective spill length; max = 1 s
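A minimal sketch of such a simulated environment, assuming a flat 1 s spill sampled at 2 kHz with a single 50 Hz ripple plus an (imperfect) 50 Hz correction. The signal model (unit intensity, additive sinusoids) and all parameter values are illustrative assumptions; only the state and reward definitions come from the slide.

```python
import numpy as np

FS = 2000          # "BSI spill monitor" sampling rate, Hz
T = 1.0            # spill window considered, s (reward max = 1 s)
t = np.arange(0, T, 1 / FS)

def spill(noise_amp, noise_phase, corr_amp, corr_phase):
    """Flat spill with a 50 Hz ripple and a 50 Hz correction (toy model)."""
    ripple = noise_amp * np.sin(2 * np.pi * 50 * t + noise_phase)
    corr = corr_amp * np.sin(2 * np.pi * 50 * t + corr_phase)
    return 1.0 + ripple + corr

def state(signal):
    """Agent observation: real and imaginary part of the 50 Hz FFT bin."""
    spec = np.fft.rfft(signal) / len(signal)
    bin50 = int(50 * T)  # frequency resolution is 1/T = 1 Hz
    return np.array([spec[bin50].real, spec[bin50].imag])

def reward(signal):
    """Effective spill length  T * <I>^2 / <I^2>; equals 1 s for a flat spill."""
    return T * signal.mean() ** 2 / (signal ** 2).mean()
```

For a perfectly corrected spill (equal amplitude, opposite phase) the state vanishes and the reward reaches its maximum of 1 s; any residual 50 Hz ripple lowers it.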

  10. Numerical optimizer versus RL • Goal: … • COBYLA: examples take between 25 and 40 iterations; could be accelerated by restricting the search space.
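For the optimizer baseline, a COBYLA run over the two correction parameters (amplitude and phase) can be sketched with SciPy on the same kind of toy model. The hidden noise parameters, the starting point and the optimizer options are assumptions; the objective is the negative effective spill length.

```python
import numpy as np
from scipy.optimize import minimize

FS, T = 2000, 1.0
t = np.arange(0, T, 1 / FS)
A_SPILL, PHI_SPILL = 0.2, 0.7   # hidden 50 Hz noise parameters (illustrative)

def neg_spill_length(x):
    """Negative effective spill length for correction (amplitude, phase) = x."""
    a_corr, phi_corr = x
    sig = (1.0 + A_SPILL * np.sin(2 * np.pi * 50 * t + PHI_SPILL)
               + a_corr * np.sin(2 * np.pi * 50 * t + phi_corr))
    return -T * sig.mean() ** 2 / (sig ** 2).mean()

res = minimize(neg_spill_length, x0=[0.1, 0.0], method="COBYLA",
               options={"rhobeg": 0.5, "maxiter": 200})
```

As the slide notes, restricting the search space (e.g. bounding the correction amplitude near its last good value) would reduce the iteration count.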

  11. Numerical optimizer versus RL

  12. Numerical optimizer versus RL • RL with NAF • Simon has prepared a wrapper in the spinup style for the NAF algorithm

  13. Simon’s NAF

  14. NAF and environment • Activation function of the final layer: tanh • Action: np.array → scale it to the correct values in the environment
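The rescaling of the tanh-bounded network output into physical knob values could look like the sketch below. The action ranges (an amplitude increment and a phase increment) are purely illustrative assumptions; only the tanh bound and the scaling step come from the slide.

```python
import numpy as np

# Illustrative knob ranges -- not the operational values.
ACTION_LOW = np.array([-0.05, -np.pi / 8])   # [dA_corr, dphi_corr] lower bounds
ACTION_HIGH = np.array([0.05, np.pi / 8])    # upper bounds

def scale_action(a_tanh):
    """Affinely map a tanh-activated output in [-1, 1]^n to physical units."""
    a_tanh = np.clip(a_tanh, -1.0, 1.0)
    return ACTION_LOW + 0.5 * (a_tanh + 1.0) * (ACTION_HIGH - ACTION_LOW)
```

The endpoints of the tanh range map exactly onto the bounds, so the agent can always reach, but never exceed, the allowed knob excursion.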

  15. How to train? • Ideally we want to learn the best policy to correct for changes of φ_spill and A_spill. • But: to train, one needs to be able to change those; we can only change φ_corr and A_corr • Training works in episodes: • For each new episode reset() is invoked in the OpenAI gym environment • Assume we start with a fairly well corrected spill • reset(): choose a random setting of φ_corr and A_corr • During the episode: step(): find the best Δφ_corr and ΔA_corr to maximise the effective spill length • Stop if going too far • Stop if the maximum number of allowed iterations per episode is reached • Stop if an acceptable effective spill length is reached (i.e. 0.995) • Test the resilience of the trained agent to changes of φ_spill and A_spill
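The episode structure above can be sketched as follows, with a toy 50 Hz spill model and a random placeholder policy standing in for the NAF agent. All numbers (noise parameters, reset ranges, increment sizes, the "going too far" guard) are illustrative assumptions; the stop conditions and the 0.995 target are from the slide.

```python
import numpy as np

rng = np.random.default_rng(42)
A_SPILL, PHI_SPILL = 0.2, 0.7        # fixed during training (illustrative)
TARGET, MAX_STEPS = 0.995, 300       # stop conditions from the slide

def eff_spill_length(a_c, phi_c):
    """Effective spill length of a 1 s flat spill with 50 Hz ripple + correction."""
    t = np.arange(0, 1.0, 1 / 2000)
    sig = (1.0 + A_SPILL * np.sin(2 * np.pi * 50 * t + PHI_SPILL)
               + a_c * np.sin(2 * np.pi * 50 * t + phi_c))
    return sig.mean() ** 2 / (sig ** 2).mean()

def run_episode(policy):
    # reset(): random correction setting around a fairly well corrected spill
    a_c = 0.2 + rng.uniform(-0.1, 0.1)
    phi_c = 0.7 + np.pi + rng.uniform(-0.5, 0.5)
    for step in range(MAX_STEPS):
        r = eff_spill_length(a_c, phi_c)
        if r >= TARGET:                      # reached acceptable spill length
            return step, r
        da, dphi = policy(a_c, phi_c)        # step(): apply increments
        a_c, phi_c = a_c + da, phi_c + dphi
        if abs(a_c) > 1.0:                   # "going too far" guard
            break
    return MAX_STEPS, eff_spill_length(a_c, phi_c)

def random_policy(a_c, phi_c):
    """Placeholder for the trained agent: small random increments."""
    return rng.uniform(-0.02, 0.02), rng.uniform(-0.1, 0.1)

steps, final_r = run_episode(random_policy)
```

A trained agent would replace `random_policy` and, ideally, end each episode in a single step.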

  16. Does it work? • Training (reset(): random initial correction) • Example for a given seed (654): 150 episodes, max episode length 300, 1293 iterations

  17. Does it work? • Training (reset(): random initial correction) • Example for a given seed (654): 150 episodes, max episode length 300, 1293 iterations – ~6 h of beam time with the production supercycle.

  18. Does it work? • Test (reset(): 150 random initial spills) • Only within a limited range: φ_spill = ±20°, and amplitude changes only up to ΔA_spill = 0.1; otherwise the agent cannot solve it anymore! • Resilience to phase changes is good, but to amplitude changes it is not good enough. • Would need to run an optimizer after a drift too large in amplitude…

  19. Example: larger amplitude drifts • Test (reset(): 150 random initial spills) • Only within a limited range: φ_spill = ±20°, and amplitude changes only up to ΔA_spill = 0.3; otherwise the agent cannot solve it anymore!

  20. What to do? Analytical solution • Can measure the amplitude of the oscillation with an FFT • Without correction: A_spill • With correction: A_sum • Need to calibrate the electronics: • amplitude- and phase-knob calibration factors and offsets versus what we set…

  21. Calibration – amplitude settings • Switch off the correction: measure A_spill • Switch on the correction: measure A_sum with two amplitude correction settings (A_1, A_2)

  22. Calibration – phase settings • Measure A_sum with two phase correction settings • Calculate δ according to the formula before

  23. Correction procedure • Do not want to switch off the correction – huge amplitudes • Two measurements with different correction amplitudes • Calculate A_spill • Adjust a_c such that … • Calculate cos δ as before, adjust p_c such that … • The sign is not uniquely defined, try both.
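The two-measurement step follows from phasor addition: for a correction amplitude a_i at a fixed correction phase, A_sum,i² = A_spill² + a_i² + 2 a_i A_spill cos δ, which is linear in the unknowns A_spill² and A_spill cos δ. The sketch below solves that linear system under exactly that assumption; the function name and the synthetic numbers are illustrative.

```python
import numpy as np

def reconstruct(a1, asum1, a2, asum2):
    """Recover A_spill and cos(delta) from two measurements taken with
    correction amplitudes a1 != a2 at the same correction phase.

    Phasor addition:  Asum_i^2 = A_spill^2 + a_i^2 + 2 a_i A_spill cos(delta),
    linear in x = A_spill^2 and y = A_spill cos(delta).
    """
    M = np.array([[1.0, 2 * a1], [1.0, 2 * a2]])
    b = np.array([asum1**2 - a1**2, asum2**2 - a2**2])
    x, y = np.linalg.solve(M, b)
    a_spill = np.sqrt(x)
    return a_spill, y / a_spill

# Synthetic check: ripple of amplitude 0.25 at phase offset delta = 1.0 rad.
a_spill_true, delta = 0.25, 1.0
def asum(a):
    return np.sqrt(a_spill_true**2 + a**2
                   + 2 * a * a_spill_true * np.cos(delta))

a_est, cosd_est = reconstruct(0.1, asum(0.1), 0.2, asum(0.2))
```

As the slide says, cos δ does not determine the sign of δ, so both signs of the phase adjustment have to be tried.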

  24. Conclusion • Numerical optimizers always work, but they take some time. • Training for RL is tricky: we cannot train on what is really drifting. • The training with the correction settings can, however, be used to compensate for spill amplitude and phase setting changes. • But only in a small range of drifts • If a drift is too large → use the optimizer to reset, then work again with the agent • RL training takes a long time!! • The analytic solution looks fine! • But it did not work in the past • The calibration seemed to drift… • …will have a new correction system. Need BEAM to check!

  25. SPARE

  26. Changes of the response

  27. MSWG 31st of July 2015

  28. Example PPO and spill Environment

  29. Example TD3
