
Kill spill: the ML-detour for slow extracted spill control

Explore the use of a machine-learning detour for slow-extracted spill control in the North Area fixed-target experiments. Investigate the possibility of injecting a sinusoidal modulation to compensate for noise on the spill. Test the resilience of the trained agent to changes in the spill parameters.



Presentation Transcript


  1. Kill spill: the ML-detour for slow extracted spill control – S. Hirlander, V. Kain

  2. Spill quality to North Area fixed-target experiments • Resonant extraction: 2/3-integer resonance • Slow extraction over a ~5 s flattop • Constraints from the experiments in the NA

  5. Spill quality to North Area fixed-target experiments • Inject a sinusoidal modulation of the quadrupole current (QF current in the future) to compensate n × 50 Hz noise on the spill • Can inject at 50, 100, 150 and 300 Hz • Task: find the amplitude and phase of the correction signal at the various frequencies • And: the spill noise drifts in amplitude and phase
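The correction described above can be sketched as a parametrised sum of sinusoids at the four injectable frequencies. This is an illustrative sketch only: the function name, the parameter layout and all amplitude/phase values are assumptions, not the operational knobs.

```python
import numpy as np

def correction_signal(t, params):
    """Sum of sinusoidal modulations at the injectable harmonics of 50 Hz.

    params maps frequency (Hz) -> (amplitude, phase in rad); illustrative
    stand-ins for the amplitude and phase knobs mentioned on the slide.
    """
    out = np.zeros_like(t)
    for freq, (amp, phase) in params.items():
        out += amp * np.sin(2 * np.pi * freq * t + phase)
    return out

# One second sampled at 2 kHz (the spill-monitor rate quoted later in the talk).
t = np.linspace(0.0, 1.0, 2000, endpoint=False)
corr = correction_signal(t, {50: (0.1, 0.3), 100: (0.05, 1.2),
                             150: (0.02, 0.0), 300: (0.01, 2.0)})
```

A component injected in antiphase with a noise line of the same amplitude cancels it exactly, which is the whole point of the amplitude/phase search.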

  6. Another example… • Can be really bad… • Since 2016: measure the phase, "calibrate" the electronics for the phase response • Adjust the amplitude after a scan • Typically ~10 iterations

  7. Can we reduce the correction time to 1 step? • What about reinforcement learning? • Test case for the NAF algorithm • Ingredients: • An OpenAI gym environment simulating the spill • What is the state? • What is an action? • What is the reward?

  8. Simulated environment • Only 50 Hz • Measured signal: … • Goal: minimize the ripple with the optimal correction settings

  9. Simulated environment • "BSI spill monitor" (2 kHz) for the SHiP cycle • State: real and imaginary part of the FFT spectrum at 50 Hz • Reward: effective spill length; max = 1 s
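A minimal sketch of such a simulated environment, assuming a flat 1 s spill sampled at 2 kHz with a single 50 Hz ripple plus an (imperfect) 50 Hz correction. The signal model (unit intensity, additive sinusoids) and all parameter values are illustrative assumptions; only the state and reward definitions come from the slide.

```python
import numpy as np

FS = 2000          # "BSI spill monitor" sampling rate, Hz
T = 1.0            # spill window considered, s (reward max = 1 s)
t = np.arange(0, T, 1 / FS)

def spill(noise_amp, noise_phase, corr_amp, corr_phase):
    """Flat spill with a 50 Hz ripple and a 50 Hz correction (toy model)."""
    ripple = noise_amp * np.sin(2 * np.pi * 50 * t + noise_phase)
    corr = corr_amp * np.sin(2 * np.pi * 50 * t + corr_phase)
    return 1.0 + ripple + corr

def state(signal):
    """Agent observation: real and imaginary part of the 50 Hz FFT bin."""
    spec = np.fft.rfft(signal) / len(signal)
    bin50 = int(50 * T)  # frequency resolution is 1/T = 1 Hz
    return np.array([spec[bin50].real, spec[bin50].imag])

def reward(signal):
    """Effective spill length  T * <I>^2 / <I^2>; equals 1 s for a flat spill."""
    return T * signal.mean() ** 2 / (signal ** 2).mean()
```

For a perfectly corrected spill (equal amplitude, opposite phase) the state vanishes and the reward reaches its maximum of 1 s; any residual 50 Hz ripple lowers it.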

  10. Numerical optimizer versus RL • Goal: … • COBYLA: examples take between 25 and 40 iterations; could be accelerated by restricting the search space.
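For the optimizer baseline, a COBYLA run over the two correction parameters (amplitude and phase) can be sketched with SciPy on the same kind of toy model. The hidden noise parameters, the starting point and the optimizer options are assumptions; the objective is the negative effective spill length.

```python
import numpy as np
from scipy.optimize import minimize

FS, T = 2000, 1.0
t = np.arange(0, T, 1 / FS)
A_SPILL, PHI_SPILL = 0.2, 0.7   # hidden 50 Hz noise parameters (illustrative)

def neg_spill_length(x):
    """Negative effective spill length for correction (amplitude, phase) = x."""
    a_corr, phi_corr = x
    sig = (1.0 + A_SPILL * np.sin(2 * np.pi * 50 * t + PHI_SPILL)
               + a_corr * np.sin(2 * np.pi * 50 * t + phi_corr))
    return -T * sig.mean() ** 2 / (sig ** 2).mean()

res = minimize(neg_spill_length, x0=[0.1, 0.0], method="COBYLA",
               options={"rhobeg": 0.5, "maxiter": 200})
```

As the slide notes, restricting the search space (e.g. bounding the correction amplitude near its last good value) would reduce the iteration count.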

  11. Numerical optimizer versus RL

  12. Numerical optimizer versus RL • RL with NAF • Simon has prepared a wrapper in the spinup style for the NAF algorithm

  13. Simon’s NAF

  14. NAF and environment • Activation function of the final layer: tanh • Action: np.array → scale it to the correct values in the environment
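The rescaling of the tanh-bounded network output into physical knob values could look like the sketch below. The action ranges (an amplitude increment and a phase increment) are purely illustrative assumptions; only the tanh bound and the scaling step come from the slide.

```python
import numpy as np

# Illustrative knob ranges -- not the operational values.
ACTION_LOW = np.array([-0.05, -np.pi / 8])   # [dA_corr, dphi_corr] lower bounds
ACTION_HIGH = np.array([0.05, np.pi / 8])    # upper bounds

def scale_action(a_tanh):
    """Affinely map a tanh-activated output in [-1, 1]^n to physical units."""
    a_tanh = np.clip(a_tanh, -1.0, 1.0)
    return ACTION_LOW + 0.5 * (a_tanh + 1.0) * (ACTION_HIGH - ACTION_LOW)
```

The endpoints of the tanh range map exactly onto the bounds, so the agent can always reach, but never exceed, the allowed knob excursion.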

  15. How to train? • Ideally we want to learn the best policy to correct for changes of φ_spill and A_spill. • But: to train, one needs to be able to change those; we can only change φ_corr and A_corr • Training works in episodes: • For each new episode reset() is invoked in the OpenAI gym environment • Assume we start with a fairly well corrected spill • reset(): choose a random setting of φ_corr and A_corr • During the episode: step(): find the best Δφ_corr and ΔA_corr to maximise the effective spill length • Stop if going too far • Stop if the maximum number of allowed iterations per episode is reached • Stop if an acceptable effective spill length is reached (i.e. 0.995) • Test the resilience of the trained agent to changes of φ_spill and A_spill
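The episode structure above can be sketched as follows, with a toy 50 Hz spill model and a random placeholder policy standing in for the NAF agent. All numbers (noise parameters, reset ranges, increment sizes, the "going too far" guard) are illustrative assumptions; the stop conditions and the 0.995 target are from the slide.

```python
import numpy as np

rng = np.random.default_rng(42)
A_SPILL, PHI_SPILL = 0.2, 0.7        # fixed during training (illustrative)
TARGET, MAX_STEPS = 0.995, 300       # stop conditions from the slide

def eff_spill_length(a_c, phi_c):
    """Effective spill length of a 1 s flat spill with 50 Hz ripple + correction."""
    t = np.arange(0, 1.0, 1 / 2000)
    sig = (1.0 + A_SPILL * np.sin(2 * np.pi * 50 * t + PHI_SPILL)
               + a_c * np.sin(2 * np.pi * 50 * t + phi_c))
    return sig.mean() ** 2 / (sig ** 2).mean()

def run_episode(policy):
    # reset(): random correction setting around a fairly well corrected spill
    a_c = 0.2 + rng.uniform(-0.1, 0.1)
    phi_c = 0.7 + np.pi + rng.uniform(-0.5, 0.5)
    for step in range(MAX_STEPS):
        r = eff_spill_length(a_c, phi_c)
        if r >= TARGET:                      # reached acceptable spill length
            return step, r
        da, dphi = policy(a_c, phi_c)        # step(): apply increments
        a_c, phi_c = a_c + da, phi_c + dphi
        if abs(a_c) > 1.0:                   # "going too far" guard
            break
    return MAX_STEPS, eff_spill_length(a_c, phi_c)

def random_policy(a_c, phi_c):
    """Placeholder for the trained agent: small random increments."""
    return rng.uniform(-0.02, 0.02), rng.uniform(-0.1, 0.1)

steps, final_r = run_episode(random_policy)
```

A trained agent would replace `random_policy` and, ideally, end each episode in a single step.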

  16. Does it work? • Training (reset(): random initial correction) • Example for a given seed (654): 150 episodes, max episode length 300, 1293 iterations

  17. Does it work? • Training (reset(): random initial correction) • Example for a given seed (654): 150 episodes, max episode length 300, 1293 iterations – ~6 h of beam time with the production supercycle.

  18. Does it work? • Test (reset(): 150 random initial spills) • Only within a limited range: φ_spill = ±20°, and amplitude changes only up to ΔA_spill = 0.1; otherwise the agent cannot solve it anymore! • Resilience to phase changes is good, but to amplitude changes it is not good enough. • Would need to run an optimizer after a drift too large in amplitude…

  19. Example: larger amplitude drifts • Test (reset(): 150 random initial spills) • Only within a limited range: φ_spill = ±20°, and amplitude changes only up to ΔA_spill = 0.3; otherwise the agent cannot solve it anymore!

  20. What to do? Analytical solution • Can measure the amplitude of the oscillation with an FFT • Without correction: A_spill • With correction: A_sum • Need to calibrate the electronics: • amplitude- and phase-knob calibration factors and offsets versus what we set…

  21. Calibration – amplitude settings • Switch off the correction: measure A_spill • Switch on the correction: measure A_sum with two amplitude correction settings (A_1, A_2)

  22. Calibration – phase settings • Measure A_sum with two phase correction settings • Calculate δ according to the formula before

  23. Correction procedure • Do not want to switch off the correction – huge amplitudes • Two measurements with different correction amplitudes • Calculate A_spill • Adjust a_c such that … • Calculate cos δ as before, adjust p_c such that … • The sign is not uniquely defined, try both.
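The two-measurement step follows from phasor addition: for a correction amplitude a_i at a fixed correction phase, A_sum,i² = A_spill² + a_i² + 2 a_i A_spill cos δ, which is linear in the unknowns A_spill² and A_spill cos δ. The sketch below solves that linear system under exactly that assumption; the function name and the synthetic numbers are illustrative.

```python
import numpy as np

def reconstruct(a1, asum1, a2, asum2):
    """Recover A_spill and cos(delta) from two measurements taken with
    correction amplitudes a1 != a2 at the same correction phase.

    Phasor addition:  Asum_i^2 = A_spill^2 + a_i^2 + 2 a_i A_spill cos(delta),
    linear in x = A_spill^2 and y = A_spill cos(delta).
    """
    M = np.array([[1.0, 2 * a1], [1.0, 2 * a2]])
    b = np.array([asum1**2 - a1**2, asum2**2 - a2**2])
    x, y = np.linalg.solve(M, b)
    a_spill = np.sqrt(x)
    return a_spill, y / a_spill

# Synthetic check: ripple of amplitude 0.25 at phase offset delta = 1.0 rad.
a_spill_true, delta = 0.25, 1.0
def asum(a):
    return np.sqrt(a_spill_true**2 + a**2
                   + 2 * a * a_spill_true * np.cos(delta))

a_est, cosd_est = reconstruct(0.1, asum(0.1), 0.2, asum(0.2))
```

As the slide says, cos δ does not determine the sign of δ, so both signs of the phase adjustment have to be tried.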

  24. Conclusion • Numerical optimizers always work, but they take some time. • Training for RL is tricky: we cannot train on what is really drifting. • The training with the correction settings can, however, be used to compensate for spill amplitude and phase setting changes. • But only in a small range of drifts • If a drift is too large → use the optimizer to reset, then work again with the agent • RL training takes a long time!! • The analytic solution looks fine! • But it did not work in the past • The calibration seemed to drift… • …will have a new correction system. Need BEAM to check!

  25. SPARE

  26. Changes of the response

  27. MSWG 31st of July 2015

  28. Example PPO and spill Environment

  29. Example TD3
