ISSUES IN THE DESIGN AND ANALYSIS OF COMPUTER EXPERIMENTS

David M. Steinberg
Tel Aviv University
SAMSI Working Group March 2007

COLLABORATORS
Dennis Lin, Dizza Bursztyn, Ron Kenett, Henry Wynn, Ron Bates, Sigal Levy, Einat Neuman Ben Ari, Gideon Leonard, Tamir Reisin, Eyal Hashavia, Zeev Somer
THANK YOUS
Noga Alon, Ronit Steinberg

PREVIEW
• Some Applications
  • Nuclear Waste Repository
  • Ground Response to an Earthquake
  • Chemotherapy Simulator
  • Optimizing a Piston
• Designing Computer Experiments
  • Latin Hypercube Designs
  • Rotated Factorial Designs
  • LHD’s as Rotated Factorial Designs
  • Near LHD’s from Rotated Factorials
• Nuclear Waste Disposal: Quandaries
• Chemotherapy: Quandaries
• Ground Shaking: Quandaries
• GASP Models and Bayesian Regression

Example: Nuclear Waste Repository
RESRAD computes leaching of radioactive isotopes from the repository into the food and water supply. Time frame is thousands of years, so field study is impossible.

Example: Nuclear Waste Repository
• Inputs
  • Initial isotope concentrations
  • Distribution coefficients of the isotopes
  • Lithology of the repository
• Outputs
  • Maximal dose during 10,000 years

Example: Ground Shaking
What will be the ground response to an earthquake? An engineering simulator uses a finite element scheme to simulate ground motion. Shaking of the bedrock generates surface motion. We wish to study the output from the program to aid earthquake preparedness plans.

Example: Ground Shaking
• Inputs
  • Geometry of the ground surface
  • Layers of hard/soft soil below the surface
  • Shear velocity, density, elasticity of the soil in each layer
  • Amplitude and spectrum at bedrock
• Outputs
  • Displacement along the surface
  • Acceleration along the surface

Example: Chemotherapy Simulator
What is the effect of chemotherapy treatment? The treatment affects both cancerous and healthy cells in the body. The goal is to develop treatment protocols that will put the cancer into remission with minimal damage to healthy cells.

Example: Chemotherapy Simulator
• Inputs
  • Treatment protocol: dosage and timing
  • Rate of drug decay
  • Rate of cell death
  • Rate of cell regeneration
• Outputs
  • Number of healthy and malignant cells, as a fraction of the initial count

Example: Piston Performance
The piston simulator was written by Kenett and Zacks as a teaching tool for their textbook. The simulator describes the cycle time of a piston and is based on the physics governing the piston. Variation in output is related to tolerances in the inputs. The goal was to achieve a target cycle time with minimal variation.

Example: Piston Performance
• Output
  • Cycle time

Latin Hypercube Designs
Latin Hypercubes are the most popular class of experimental plans for computer experiments. LHD’s place the input levels for each factor on a uniform grid. The levels are then “mated” across factors by randomly permuting the column for each factor. McKay, Beckman and Conover, Technometrics, 1979.
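The construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `latin_hypercube`, the use of cell midpoints on [0, 1] as the "uniform grid", and the seed are all assumptions made for the example.

```python
import random

def latin_hypercube(n_runs, n_factors, seed=None):
    """Randomly mated Latin hypercube on [0, 1]^n_factors (illustrative sketch)."""
    rng = random.Random(seed)
    # Uniform grid: one level per run, taken here at the cell midpoints (an assumption).
    levels = [(i + 0.5) / n_runs for i in range(n_runs)]
    columns = []
    for _ in range(n_factors):
        col = levels[:]      # every factor uses each level exactly once
        rng.shuffle(col)     # "mate" the levels by a random permutation
        columns.append(col)
    # Transpose so that each row is one run of the simulator.
    return [list(row) for row in zip(*columns)]

design = latin_hypercube(n_runs=8, n_factors=3, seed=1)
for run in design:
    print(run)
```

Each column of `design` is a permutation of the same eight levels, which is the defining property of a Latin hypercube.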

Latin Hypercube Designs
Example of a Latin Hypercube design for 3 factors.

Latin Hypercube Designs
Some 2-factor projections from a 250-run LHD.

Latin Hypercube Designs
Other mating schemes have been suggested to obtain columns with low correlation. Ye showed how to get $2m-2$ fully orthogonal columns with $2^m$ runs. Butler showed how to get orthogonality with respect to a trigonometric regression model and 2m runs. How many orthogonal columns are possible?

Rotated Factorial Designs
Bursztyn and Steinberg developed experimental plans with many levels in which linear effects are orthogonal. Start with a “standard” first-order orthogonal design D, like a $2^{k-p}$ fractional factorial. “Rotate” the design using a rotation matrix R: $D \to DR$. Then, since $D'D = nI$ and $R'R = I$, $(DR)'(DR) = R'D'DR = nR'R = nI$.
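A small numerical check of this identity, as a sketch under stated assumptions: the base design is taken to be the full $2^2$ factorial, and the rotation angle `theta` is an arbitrary choice made for illustration (it is not taken from the talk). NumPy is assumed to be available.

```python
import itertools
import numpy as np

# Base design D: the full 2^2 factorial in +/-1 coding, n = 4 runs.
D = np.array(list(itertools.product([-1, 1], repeat=2)), dtype=float)
n = D.shape[0]

# A plane rotation; the angle is a hypothetical choice for this example.
theta = np.arctan(0.5)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rotated = D @ R                # the rotated design DR
gram = rotated.T @ rotated     # (DR)'(DR), which should equal n * I

print(np.round(gram, 10))
print(np.round(rotated, 4))    # each column now takes 4 distinct levels
```

The base design has two levels per factor, while each column of the rotated design has four, and the columns remain orthogonal.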

LHD’s as Rotated Factorial Designs
Steinberg and Lin showed how to rotate two-level factorials into Latin Hypercube designs with a large number of first-order orthogonal columns. This work combines a rotation idea in Bursztyn and Steinberg with another rotation idea developed by Lin and Beattie.

LHD’s as Rotated Factorial Designs
• Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs. The intuition:
  • The entries in each column of a LHD form an arithmetic sequence.
  • Each entry of DR is a weighted sum of the entries in the corresponding row of D (the $2^k$ design).
  • The rows of D are a binary expansion of the odd integers.
  • Using appropriate powers of 2 as the elements in R, each column in DR is an integer sequence.
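The binary-expansion intuition can be seen directly in a tiny example. The sketch below assumes k = 3 and the weights 4, 2, 1; both choices are illustrative and are not taken from the slides.

```python
import itertools

k = 3
# Rows of the full 2^k factorial in +/-1 coding.
D = list(itertools.product([-1, 1], repeat=k))

# Powers of 2 as weights: 4, 2, 1 for k = 3.
weights = [2 ** (k - 1 - i) for i in range(k)]

# Weighted sum of each row: a signed binary expansion of an odd integer.
sums = [sum(w * d for w, d in zip(weights, row)) for row in D]

for row, s in zip(D, sums):
    print(row, "->", s)
```

The eight rows map to the eight odd integers from -7 to 7, each exactly once, so the weighted column is an equally spaced sequence, as required of a Latin hypercube column.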

LHD’s as Rotated Factorial Designs
Weights and weighted sums.

LHD’s as Rotated Factorial Designs
• Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs.
• Can we organize weights for multiple columns in a rotation matrix R?
• Yes, provided R is t by t, where t is a power of 2.
• A simple recursive scheme gives the rotation matrices.
• Original proposal limited to full factorial designs $2^k$, where k is a power of 2.
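One recursive scheme with the stated properties is sketched below. The specific recursion in `rotation_weights` is an assumption on my part (the slide does not spell it out), and the matrix is left unnormalized, so it is an integer multiple of a rotation. The code checks the two properties that matter: every column of the rotated design is a Latin hypercube column, and the columns are mutually orthogonal.

```python
import itertools
import numpy as np

def rotation_weights(c):
    """Unnormalized 2^c x 2^c weight matrix built by doubling (assumed recursion)."""
    V = np.array([[1]])
    for step in range(1, c + 1):
        s = 2 ** (2 ** (step - 1))               # 2, 4, 16, ...
        V = np.block([[V, -s * V], [s * V, V]])  # double the size at each step
    return V

c = 2
t = 2 ** c                                       # R is t by t, with t a power of 2
V = rotation_weights(c)

# Full 2^t factorial in +/-1 coding (16 runs, 4 factors).
D = np.array(list(itertools.product([-1, 1], repeat=t)))
X = D @ V                                        # rotated, unscaled design

print(V)
print(np.sort(X[:, 0]))                          # odd integers -15, -13, ..., 15
print(X.T @ X)                                   # diagonal, so columns are orthogonal
```

Dividing `V` by the common length of its columns would turn it into a proper rotation; the integer form is kept here because it makes the Latin hypercube levels easy to read.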

LHD’s as Rotated Factorial Designs
Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs.

LHD’s as Rotated Factorial Designs
Bursztyn and Steinberg showed that fractional factorial designs can also be rotated. First, the design must be decomposed into sets of factors, each of which is a full factorial.

LHD’s as Rotated Factorial Designs
Steinberg and Lin combine the rotation ideas of Bursztyn & Steinberg and of Lin & Beattie. The resulting design is an orthogonal Latin hypercube.

LHD’s as Rotated Factorial Designs
The construction requires that each set of columns be a full factorial design. Suppose we start with a saturated fractional factorial with $2^m$ runs. How can we “group” the columns to achieve the maximum number of full factorials?

LHD’s as Rotated Factorial Designs
We can order the columns so that each set of m consecutive columns is a full factorial.
1. Identify the columns as the non-zero points in $GF(2^m)$.
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where p(x) is a primitive polynomial of $GF(2^m)$.
3. Order the columns by the order of the powers.
4. A set of m consecutive columns is not a full factorial if it has a linear dependency. It is easy to show that this implies a linear dependency in the first m columns.

LHD’s as Rotated Factorial Designs
1. Identify the columns as the non-zero points in $GF(2^m)$, the Galois Field of binary vectors of length m.
  • The column of 1’s is matched with (0,0,…,0).
  • The column for A is matched with (1,0,…,0).
  • The column for B is matched with (0,1,0,…,0).
  • The column for AB is matched with (1,1,0,…,0).
  • In general, the column for any interaction is matched with a vector with 1’s marking the factors involved in the interaction.

LHD’s as Rotated Factorial Designs
1. Identify the columns as the non-zero points in $GF(2^m)$, the Galois Field of binary vectors of length m.
  • Each binary vector is used to represent a polynomial with binary coefficients.
  • AC ↔ (1,0,1,0,0,0) ↔ $1 + x^2$
  • BDF ↔ (0,1,0,1,0,1) ↔ $x + x^3 + x^5$

LHD’s as Rotated Factorial Designs
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where p(x) is a primitive polynomial of $GF(2^m)$.
GF theory: there exists a primitive polynomial, p(x), that can be used to generate all the non-zero polynomials in $GF(2^m)$. The primitive polynomial is a binary polynomial of degree m. Recall that m is the number of factors, so we want to generate all polynomials of degree m-1 or less. All calculations are carried out modulo 2.

LHD’s as Rotated Factorial Designs
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where p(x) is a primitive polynomial of $GF(2^m)$.
For example, with m=4, a primitive polynomial is $1 + x + x^4$.
  $x^0 \equiv 1$ (A)
  $x^1 \equiv x$ (B)
  $x^2 \equiv x^2$ (C)
  $x^3 \equiv x^3$ (D)
  $x^4 \equiv 1 + x$ (AB)
  $x^5 \equiv x + x^2$ (BC)
  etc.
If we continue, we find all the non-zero polynomials. Every set of m successive columns is a full factorial.
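The m=4 example can be reproduced with integer bit masks. In this sketch, bit i of an integer stands for the coefficient of $x^i$ (and for factor A, B, C, D in that order); the helper names `powers_of_x`, `label` and `rank_gf2` are hypothetical and chosen only for this illustration.

```python
def powers_of_x(m, poly_mask):
    """List x^0, x^1, ..., x^(2^m - 2) reduced mod p(x) over GF(2).

    poly_mask holds the coefficients of p(x), including the x^m term.
    """
    elems = []
    v = 1                        # x^0
    for _ in range(2 ** m - 1):
        elems.append(v)
        v <<= 1                  # multiply by x
        if v & (1 << m):         # an x^m term appeared: subtract p(x) (XOR mod 2)
            v ^= poly_mask
    return elems

def label(v):
    """Effect name for a bit vector, e.g. 0b0011 -> 'AB'."""
    return "".join(chr(ord("A") + i) for i in range(v.bit_length()) if v >> i & 1)

def rank_gf2(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (Gaussian elimination)."""
    pivots = {}                  # highest set bit -> reduced vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

m = 4
p = 0b10011                      # 1 + x + x^4
elems = powers_of_x(m, p)
names = [label(v) for v in elems]
print(names)

# Every window of m consecutive columns (taken cyclically) has full rank.
n = len(elems)
window_ranks = [rank_gf2([elems[(j + i) % n] for i in range(m)]) for j in range(n)]
print(window_ranks)
```

A window of m columns with rank m over GF(2) has no linear dependency, which is the condition for those columns to form a full factorial.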

LHD’s as Rotated Factorial Designs
The rotated designs are a special class of Latin Hypercubes with an external orthogonal array structure (U-designs). For each pair of columns, ¼ of all the points are in each quadrant. For many pairs, finer divisions hold.

LHD’s as Rotated Factorial Designs
Some 2-factor projections from the design of the ground-shaking study.

LHD’s as Rotated Factorial Designs
Points may “clump” in low-dimensional projections. In high dimensions, points do not clump. The rotation is isometric, so the inter-point differences are like those in the original factorial, except for “shrinking” the final design back to a hypercube.

LHD’s as Rotated Factorial Designs
Steinberg and Lin show that these rotated designs have good statistical properties as screening designs. Main effects have low aliasing with second order effects (by comparison with randomly mated LHC designs or randomly chosen U-designs).

LHD’s as Rotated Factorial Designs
Suppose you use the design to fit a simple first-order regression model, to “screen” the most influential factors: $Y = X\beta + \varepsilon$. But the true dependence involves additional regression terms: $Y = X\beta + Z\gamma + \varepsilon$. Then $E(\hat{\beta}) = \beta + (X'X)^{-1}X'Z\gamma = \beta + A\gamma$. The matrix $A = (X'X)^{-1}X'Z$ is known as the alias matrix.
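To make the alias matrix concrete, here is a minimal sketch for a textbook case rather than for the designs in the talk: the 4-run $2^{3-1}$ fractional factorial with C = AB, a first-order screening model, and the three two-factor interactions as the extra terms. The variable names are illustrative.

```python
import itertools
import numpy as np

# 2^(3-1) fractional factorial: A and B form a full factorial, C = AB.
base = np.array(list(itertools.product([-1, 1], repeat=2)), dtype=float)
a, b = base[:, 0], base[:, 1]
c = a * b

# Screening model: intercept plus the three main effects.
X = np.column_stack([np.ones(4), a, b, c])
# Extra terms left out of the screening model: AB, AC, BC.
Z = np.column_stack([a * b, a * c, b * c])

# Alias matrix A = (X'X)^{-1} X'Z, computed with a linear solve.
alias = np.linalg.solve(X.T @ X, X.T @ Z)
print(alias)
```

Rows correspond to the intercept, A, B and C; columns to AB, AC and BC. The entries equal to 1 show complete aliasing (C with AB, B with AC, A with BC), the opposite of the small entries a good screening design aims for.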

LHD’s as Rotated Factorial Designs
The alias matrix depends on the design, the model used for screening, and the extra terms in Z. A good screening design should have small values in A for simple screening models and somewhat more complex extra terms. Bursztyn and Steinberg, JSPI (2006), 1103-1119.

LHD’s as Rotated Factorial Designs
We compared 16-run, 12-factor designs, with a first-order screening model and extra terms of second order. The alternatives: a standard LHD (best of 100 random choices) and an OA-based LHD (best of 100 random choices).

LHD’s as Rotated Factorial Designs
The percent of entries in A that were < 0.1:

LHD’s as Rotated Factorial Designs
For the standard and OA-based LHD’s, the results shown are the best found for 100 random designs. For the orthogonal LHD, all non-isomorphic groupings of columns into 3 sets of 4 columns were found. Results were very similar for all groupings.

LHD’s as Rotated Factorial Designs
A design with n/2 columns, all orthogonal to each other and to all possible second-order effects, can be constructed using the same ideas. The trick is in the choice of the starting design. We rotate the resolution IV “foldover” design. The rotation preserves the foldover property and that, in turn, guarantees the orthogonality properties. The $GF(2^m)$ structure again provides a way to group the columns into full factorials.
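The role of the foldover property can be checked numerically. The sketch below uses an arbitrary random half design and a random orthogonal matrix, both hypothetical stand-ins rather than the resolution IV design and rotation from the talk. It illustrates only two points: a foldover design has first-order columns orthogonal to every second-order term, and multiplying by R keeps the design a foldover. Orthogonality among the first-order columns themselves depends on the base design and is not shown here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical half design; the foldover appends its mirror image.
half = rng.uniform(-1, 1, size=(6, 3))
design = np.vstack([half, -half])

def second_order_terms(x):
    """All squares and two-factor products of the columns of x."""
    k = x.shape[1]
    pairs = itertools.combinations_with_replacement(range(k), 2)
    return np.column_stack([x[:, i] * x[:, j] for i, j in pairs])

# First-order columns against second-order terms: every inner product cancels,
# because each run x and its mirror -x share the same second-order values.
cross = design.T @ second_order_terms(design)
print(np.round(cross, 12))

# A random orthogonal matrix stands in for the rotation R.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = design @ Q
cross_rotated = rotated.T @ second_order_terms(rotated)
print(np.round(cross_rotated, 12))
```

Because (-D)R = -(DR), the second half of `rotated` is again the mirror image of the first half, so the same cancellation applies after rotation.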

Near LHD’s from Rotated Factorials
Orthogonal designs that are nearly LHD’s can be obtained by rotating other base designs. Example: use the 48-run Plackett-Burman design as the base design. Rotate 40 factors in 5 groups of 8. The rotated design has all columns orthogonal. It is also a U-design. It is nearly a Latin Hypercube.

Near LHD’s from Rotated Factorials
Below is a q-q plot for one of the factors against a uniform distribution.

Nuclear Waste Repository: Quandaries
• Main goal is to assess which input factors have greatest influence on output: Sensitivity Analysis.
• For example: given a proposed site, which factors should be measured?
• Output data are highly skewed, with many 0’s (configurations with no leaching into the drinking water).
• What is the best way to summarize the results?

RESRAD
• RESRAD is a computer model designed to estimate radiation doses and risks from RESidual RADioactive materials.
• RESRAD simulates radiation doses and cancer risks for a variety of pathways in the environment (e.g. drinking water, food chain, atmosphere).
Developed at Argonne National Laboratory. http://web.ead.anl.gov/resrad/

RESRAD
• Number of input parameters can reach hundreds.
• Most parameters are difficult/expensive to measure or control and are subject to wide ranges of uncertainty.

RESRAD
Typical RESRAD output.

Our Case Study
• Twenty-seven input parameters.
• Initial radionuclide is U238 buried at a depth of 2 meters.
• Lithology is one-dimensional, with contaminated, unsaturated and saturated layers above groundwater.

Our Case Study
• Wide uncertainties for inputs.
• Many have log-normal distributions as a reflection of scientific uncertainty.
• The distribution coefficients for U234 and U238 should be identical.
• Outcome: maximal annual dose during 10,000 years.

Our Case Study
Use RESRAD’s built-in capability for sensitivity analysis. Options include:
• One-factor-at-a-time analysis.
• Random samples of input settings.
• Latin Hypercube samples.
• Different input parameter distributions (e.g. uniform, normal, log-normal).
• Specified rank correlations of inputs.
