180 likes | 552 Vues
Pattern-Forming Instabilities in the Swift-Hohenberg Model. Micah Brodsky 6.338 Spring ‘08. Swift-Hohenberg Model. Scalar partial differential equation Can derive as approximation to Navier-Stokes at onset of convection But used generally as case study in pattern formation
E N D
Pattern-Forming Instabilities in the Swift-Hohenberg Model Micah Brodsky 6.338 Spring ‘08
Swift-Hohenberg Model • Scalar partial differential equation • Can derive as approximation to Navier-Stokes at onset of convection • But used generally as case study in pattern formation • Has a Lyapunov functional – relaxes towards a steady state, no turbulence • But does interesting things along the way
The Model • ∂u/∂t = εu - u3 – (∇2 – 1)2u + g2u2 • Expands to:(ε - 1)·u - ∇2u - 2∇4u + g2u2 - u3 Saturationterm! Linear instability High-frequencystabilization Low-ordernonlinearities Preferred length scale • One of the simplest possible pattern-forming equations
Let’s try it out… ε = 1.5, g2 = 0 Initialized with low-amplitude random perturbations about zero
What Do We Understand? • We can easily predict simple features • Instability • Characteristic wavelength • Perturbational analysis gives a bit more • Wavelength instabilities (e.g. Zigzag and “Eckman”) • Higher order wave effects (e.g. hexagons!) • But what about high-amplitude behavior?
Scales of order ε = 4 ε = 0.5 ε = 0.1
Competition Among Patterns ε = 0.1, g2 = 0.5 Initialized with low-amplitude random perturbations about zero,along with a strip of -1.
High-Amplitude Instabilities • Initially saturates to uniform steady state solutions • 0 is linearly unstable, but ±√(ε – 1) are linearly stable • But then… ε = 3, g2 = 0 Initialized with low-amplitude random perturbations plus a slight positive bias, along with a strip of -1.
Implementation • Explicit finite difference solver • I.e. ui+1[x] = ui[x] + dt * ( … ui[x-1] - 2ui[x] + ui[x+1] ... ) • Instability is the bottleneck (and ∇4makes it hit hard) • C++ / MPI on SiCortex • Domain split into vertical slabs • Processors exchange 2-cell-wide boundaries after every time step
Output • SDL for real-time visualization • Forwarding X to the rank-0 node is a pain but can be done • Raw data dump to file, imported into Matlab with fread()
Where’s The Bottleneck? • Worst case communication / computation:~ 4 doubles / ~ 66 flops • But SiCortex has lots of bandwidth • Most of the time is spent computing • Compiler effects • Memory, memory, memory…
Compiler • -O3 goes without saying • But wait, why is it generating such crummy machine code (sometimes you have to look!) • Pointer analysis! • Compiler can use registers and software pipelining efficiently only if it knows that it’s not tripping over its own calculations • Use __restrict__ pointers wherever possible • Means no other pointer points to the same data • 50% FLOPS improvement on simpler kernel (diffusion equation) • But… only about 10% here.
Memory Hierarchy • Profile with papiex -a • L1 cache easily overrun • Solution: block striping • Adds some overhead, but about 10% net savings on “tall” problems (why not more?) • L2 usually okay • (though not if we go 3D) Memory access stencil:
Communication • SiCortex has a lot of bandwidth • 1 double for every 2 clock cycles (without congestion) • This is plenty to keep us busy • But, communication is still up to 25% of runtime (use mpipex, subtract startup overhead) • We can still gain by overlapped communication & computation • Or can we?
Overlapped Communication • Solution: • Compute edges first, then start non-blocking transfers (Isend, Irecv) • Wait for results only whenwe need to compute nextedges • Problem! This kills our L1 cache performance gains • Communication is heaviest when middle block is narrowest • Net result: 5-7% gain on large problems • 13% on large, “thin” problems
The real problems • Kernel has too many instructions! • Processor has poor instruction-level parallelism (ILP) • Not to mention the time step is *really* slow • Implicit / Crank-Nicholson methods? • Spectral methods? • Scalability • Slab slicing limits to width / 2 processors • Switch to block decomposition? • 3D • I really want to see 3D results. In 3D. =)