On the Complexity of Trial and Error

Shengyu Zhang. Joint work with Xiaohui Bei and Ning Chen. STOC’13, also at arXiv:1205.1183

Motivating example: drug development

Known vs. unknown • If the virus (its DNA sequence, chemical composition, …) and its interactions with the human body are known, then the reagent is much easier to find. • Unfortunately, often: • the virus causing a pandemic is largely unidentified; • the human body is too complicated a system to fully understand. • Learn it first? Efficiency!

Standard method for drug development: Trial and error • allergic reaction, • severe headache, • …

Phenomena • Task: search for a solution (reagents) to satisfy a bunch of constraints (no side effect). • Input (virus, human body): unknown. • Allowed operations: solution testing. • Approach: trial-and-error. • Efficiency is crucial • preferably better than learning the unknown input. • Next: More examples.

Normal-Form Game • Example: Prisoner’s dilemma. • Players have to choose an action anyway. • Issue: In many scenarios, players do not know their payoffs. • E.g.: When a new business model appears, companies don’t really know which strategies will yield how much payoff.

Stable Matching

Stable Matching • Issue: Individuals may not know their preferences exactly. • An assignment has to be determined. • “… a systematic and continuous approach to fitting the right person to the right job at the right time has long been the Holy Grail of workforce organization.” --- McKinsey Quarterly, 2003

In a broad sense

Questions to study

Model • Input space, solution space, CSP A, verification oracle V, algorithm. • If more than one constraint is violated, then V returns an arbitrary one. • Worst-case analysis. • We usually don’t know how Nature returns a violation. • We’re only given the index of the violated constraint, but not its content. • We usually get an error signal, but don’t know the exact reason for the error (e.g. “headache”: we don’t know which ingredients of the reagent cause the problem).
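A minimal Python sketch of this model, under stated assumptions: the names `make_verification_oracle` and `trial_and_error` are hypothetical, constraints are represented as predicates hidden from the algorithm, and "arbitrary violation" is modeled as "the first violated index". The algorithm only ever sees the returned index.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # type of a candidate solution


def make_verification_oracle(
    constraints: Sequence[Callable[[S], bool]],
) -> Callable[[S], Optional[int]]:
    """Wrap hidden constraints into an oracle V that reveals only an index."""

    def V(candidate: S) -> Optional[int]:
        for i, satisfied_by in enumerate(constraints):
            if not satisfied_by(candidate):
                # Assumption: "arbitrary" is modeled as the first violated index.
                return i
        return None  # no violation: the candidate is a solution

    return V


def trial_and_error(V, candidates):
    """Naive baseline: test candidates one by one and count the trials."""
    trials = 0
    for candidate in candidates:
        trials += 1
        if V(candidate) is None:
            return candidate, trials
    return None, trials


# Hidden input: two constraints the algorithm never gets to read.
V = make_verification_oracle([lambda x: x > 3, lambda x: x % 2 == 0])
solution, trials = trial_and_error(V, range(10))
```

The baseline ignores the returned index entirely; the point of the framework is that smarter algorithms exploit it.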

Model • Input space, solution space, CSP A, verification oracle V, algorithm. • Question: What if A is already hard? (Then A_u couldn’t be easy. What to ask?)

Model • Input space, solution space, CSP A, verification oracle V, computation oracle A, algorithm. • How much extra difficulty is introduced by the lack of input knowledge? • Add another computation oracle that computes A itself. • If A_u is efficiently solvable given the oracles V and A, then the unknown-input version of A is not much harder.

Example: SAT • Input space: formulas. Solution space: assignments. • Verification oracle V. • Computation oracle SAT. • Algorithm.
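One natural strategy for this setting can be sketched in Python. This is an illustration under assumptions, not necessarily the algorithm from the paper: the function names are hypothetical, the number of clauses `m` is assumed known, and a brute-force search stands in for the computation oracle SAT. The idea is to keep, for every clause index, the set of literals that could still belong to that clause. A violation at index i shows that every literal made true by the proposed assignment is absent from clause i, so those literals are removed. Each failed trial removes at least one literal, giving at most 2nm failed trials.

```python
from itertools import product


def brute_force_sat(clauses, n):
    """Stand-in for the computation oracle SAT (exponential time, small n only).

    A literal is a nonzero int: v means x_v, -v means NOT x_v.
    """
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return bits
    return None


def make_sat_oracle(hidden_clauses):
    """Verification oracle V: returns the index of a violated clause, or None."""

    def V(bits):
        for i, c in enumerate(hidden_clauses):
            if not any(bits[abs(l) - 1] == (l > 0) for l in c):
                return i
        return None

    return V


def solve_sat_u(V, n, m, sat_oracle=brute_force_sat):
    """Find a satisfying assignment of a hidden CNF using only V and a SAT oracle."""
    all_literals = {s * v for v in range(1, n + 1) for s in (1, -1)}
    # candidates[i] is always a superset of the hidden clause i.
    candidates = [set(all_literals) for _ in range(m)]
    trials = 0
    while True:
        bits = sat_oracle(candidates, n)
        if bits is None:
            # The candidate formula is implied by the hidden one,
            # so the hidden formula is unsatisfiable too.
            return None, trials
        trials += 1
        i = V(bits)
        if i is None:
            return bits, trials
        # bits falsifies hidden clause i: drop every literal that bits makes true.
        candidates[i] = {l for l in candidates[i] if bits[abs(l) - 1] != (l > 0)}


V = make_sat_oracle([[1, 2], [-1, 3], [-2, -3]])
assignment, trials = solve_sat_u(V, n=3, m=3)
```

Note that the algorithm never reconstructs the hidden formula; it only needs a candidate formula weak enough to contain it.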

Related Models • More detailed comparisons: 3 pages in paper. • SAT: Deciding satisfiability and finding a solution are easy; learning the input formula is hard. (Even finding a formula with the same solution set needs exponential time.)

Results I: Positive • Message 1: Despite the very little information provided by V, there are efficient algorithms for many natural problems.

Results I: Positive • The following problems are no harder than in the known-input case. • Namely, • Nash: Find a Nash equilibrium of a normal-form game • Core: Find a core of a cooperative game • Stable matching: Find a stable matching of a two-sided matching market (with agents) • . • SAT: Find a satisfying assignment of a CNF formula (with m clauses and n variables) • if and if .

Results II: Negative • Message 2: Not all problems with unknown inputs are easy: lack of input information does impose more difficulty for some problems.

Results II: Negative • The following are much harder than their known-input case. • Graph Isomorphism: Find an isomorphism between two graphs. • , i.e. ∃ trial-efficient algorithm. • The algorithm works if given the computation oracle SAT. • GraphIso_u • Group Isomorphism: Find an isomorphism between two groups. • , and the algorithm works with a SAT oracle. • GroupIso_u • Subset sum: Find a partition (of given numbers) with equal sum. • , i.e. even the trial complexity is exponentially high. • Relation to traditional complexity theory: hardness of GroupIso_u = hardness of NP-complete problems.

Nash Equilibrium • CSP: (Note: Infinite number of constraints) • Theorem: There is a polynomial-time algorithm for the unknown-input Nash problem when equipped with a computation oracle solving the known-input Nash problem.

Basic Idea: Ellipsoid Method • Non-linear (as function of variables and ) • The solution space is not convex • And… we haven’t used the computation oracle yet

Nash Equilibrium • Idea: Search for the input: a single point (convex!), thanks to the fact that an NE always exists! • If it returns A separation oracle! • But since is an NE for • One more (critical) issue: • Degenerate search space: a single point. • Cannot use the standard perturbation approach. • Use the “strong separation oracle” machinery of Grötschel, Lovász, and Schrijver, 1988.

Stable Matching • CSP: or , • Theorem. There is a polynomial-time randomized algorithm with trials. • Theorem. Any randomized algorithm needs queries, regardless of time. • Approach: upper bound: find the input; lower bound: probabilistic method. • Idea (for algorithm): Reduce to sorting.

Sorting • CSP: iff, • Lemma. ∃ a poly-time randomized algorithm A solving Sort_u with trials. • Note: This implies a poly-time algorithm B solving StableMatching_u with trials. • [Figure: reduction from StableMatching_u to Sort_u; labels: V, orders >_1, >_1', …, >_n, >_n', Gale-Shapley, π]

Sorting • CSP: iff, • Lemma. ∃ a poly-time randomized algorithm A solving Sort_u with O(n log n) trials. • (Theorem. This is tight: Any randomized algorithm needs Ω(n log n) trials.) • Both the algorithm and the lower bound use order theory. • Average height h(x): average rank of x (i.e., # of y s.t. y < x) in all linear extensions. • Theorem [Kahn, Saks, 1984]: if |h(x) − h(y)| < 1, then 3/11 < Pr[x < y] < 8/11 over a uniformly random linear extension. • Algorithm: propose the order according to h. • Any violation, say , would cut (by Kahn-Saks) a constant fraction of the possible linear extensions.
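The average-height strategy can be illustrated with a small brute-force Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the oracle is assumed to return a pair (b, a) meaning "b truly precedes a although the proposal put a first", and linear extensions are enumerated exhaustively, which is exponential and only viable for tiny n. It does not use the polynomial-time approximation needed for the actual result.

```python
from itertools import permutations


def linear_extensions(n, relations):
    """All total orders of range(n) consistent with known pairs (a before b)."""
    result = []
    for perm in permutations(range(n)):
        pos = {x: k for k, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            result.append(perm)
    return result


def average_heights(n, extensions):
    """Average 0-based position of each element over the given extensions."""
    totals = [0] * n
    for perm in extensions:
        for k, x in enumerate(perm):
            totals[x] += k
    return [t / len(extensions) for t in totals]


def make_sort_oracle(hidden_order):
    """V returns some wrongly ordered pair as (true_first, true_second), or None."""
    hidden_pos = {x: k for k, x in enumerate(hidden_order)}

    def V(proposal):
        for i in range(len(proposal)):
            for j in range(i + 1, len(proposal)):
                a, b = proposal[i], proposal[j]
                if hidden_pos[a] > hidden_pos[b]:
                    return (b, a)
        return None

    return V


def sort_u(V, n):
    """Propose the order sorted by average height; learn one pair per violation."""
    relations = set()
    trials = 0
    while True:
        extensions = linear_extensions(n, relations)
        h = average_heights(n, extensions)
        proposal = sorted(range(n), key=lambda x: h[x])
        trials += 1
        violation = V(proposal)
        if violation is None:
            return proposal, trials
        relations.add(violation)  # rules out every extension that disagrees


hidden = (3, 1, 4, 0, 2)
order, trials = sort_u(make_sort_oracle(hidden), n=5)
```

Sorting by average height always yields a linear extension of the known relations, so every returned violation is new information and the loop terminates.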

Sorting • One more issue: the average height is #P-hard to compute. [Brightwell, Winkler, 1991] • Fortunately, there is a fully polynomial randomized approximation scheme (FPRAS) to count the number of linear extensions [Dyer, Frieze, Kannan, 1989]. • It can be used to approximate the average height to within 0.5 in polynomial time. • Then we can use Kahn-Saks. (This time we really need < 1.) • Open question: min # of trials of deterministic poly-time algorithms for Sort_u and StableMatching_u?

Group Isomorphism CSP: ,

Group Isomorphism • All we need to do is to find a mapping avoiding the existing forbidden pairs of triples. • Note: Groups are defined by these triple structures! • So, with the help of a GroupIso oracle, it’s possible to exploit group structures to achieve this. • What if Group Isomorphism itself is given as the computation oracle?

Group Isomorphism • Theorem. GroupIso_u is NP-complete. • Approach: Reduction from Hamiltonian Cycle finding.

Reduction • HamCycleFinding: Given a p-node graph with a HamCycle, find a HamCycle. • HamCycle, define cyclic group • Finding an iso of ⇒ find a HamCycle in • Algorithm A (solving GroupIso_u). • Issue: We don’t know the cycle (and thus T)! Neither does A! (Algorithm on unknown inputs!)

Reduction • Algorithm A (solving GroupIso_u) interacts with a simulator S in place of V. • Simulator S: poly-time. Real V: unaffordable. • We probably lose something… Correctness!

Reduction • Property: The first time S gives a wrong answer to A, we’ve found a Hamiltonian cycle in H!

Summary • To unknown inputs: • Set up a framework for studying CSP problems. • Reshape the complexity of problems: • Easy: Nash, Core, StableMatching, SAT • Hard: GroupIso, GraphIso • To trial-and-error: • From art to science. • To learning theory: • Solving instead of learning. • Hopefully a supplement. • Tells when we need to look at input structure. • SAT: No need. • GraphIso: Necessary. • To complexity theory?

Thanks
