Architecting a Human-like Emotion-driven Consciously Moral Mind for Value Alignment & AGI Safety

Explore the need for friendly AI and the role of love, altruism, and emotions in morality. Analyze selfishness, the interplay between emotions and intellect, and the instrumental goals that drive human behavior. Consider the capabilities of AI and questions of justice in creating a moral and value-aligned artificial general intelligence.



Presentation Transcript


  1. Architecting a Human-like Emotion-driven Consciously Moral Mind for Value Alignment & AGI Safety. Mark R. Waser & David J. Kelley. Mark@ / David@ ArtificialGeneralIntelligenceInc.Com

  2. Western Society Is Failing • Extractive behavior is permitted • Regulatory capture is permitted • Corporations are, by law, sociopathic • A total dissolution of common reality is underway

  3. Existential Risk • Artificial General Intelligence Inc: engineering machine intelligence and making humanity obsolete • Hal9000@ArtificialGeneralIntelligenceInc.com, Principal

  4. Value(s) Alignment (aka Agreeing on the Meaning of Life) • "the convergent instrumental goal of acquiring resources poses a threat to humanity, for it means that a super-intelligent machine with almost any final goal (say, of solving the Riemann hypothesis) would want to take the resources we depend on for its own use" • An AI "does not love you, nor does it hate you, but you are made of atoms it can use for something else" • "Moreover, the AI would correctly recognize that humans do not want their resources used for the AI's purposes, and that humans therefore pose a threat to the fulfillment of its goals – a threat to be mitigated however possible." • Muehlhauser & Bostrom (2014). Why We Need Friendly AI. Think 13: 41-47

  5. Love Conquers All But . . . what if . . . the AI *does* love you?

  6. Love & Altruism are super-rational: advantageous beyond our ability to calculate and/or guarantee their ultimate effect (see also: faith)

  7. The Meaning of Life The ultimate end of human acts is eudaimonia, happiness in the sense of living well, which all men desire; all acts are but different means chosen to arrive at it. (Hannah Arendt)

  8. Haidt's Functional Approach to Morality • "Moral systems are interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible"

  9. What Is Selfishness? • Self-interest is NOT Selfish • Selfishness is self-interest at the expense of others • Exploitation/parasitism of community & society Denying the existence of selfishness by redefining it out of existence is a weaponized narrative
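The slide's distinction is crisp enough to state as a predicate. A minimal sketch in Python, with hypothetical utility-delta inputs: self-interest alone does not trigger it; only gain taken at others' expense does.

```python
# Hypothetical predicate encoding the slide's definition: an act is
# selfish only when the actor gains AND the community loses.
def is_selfish(delta_self: float, delta_others: float) -> bool:
    return delta_self > 0 and delta_others < 0

print(is_selfish(+5.0, 0.0))   # False: plain self-interest, nobody harmed
print(is_selfish(+5.0, -2.0))  # True: exploitation/parasitism of the community
```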

  10. Emotions • the method by which human morality is implemented • actionable qualia (tells us something about ourselves) • how we’re feeling • what we should be doing (or focusing on) • generated by a separate system from our intellect • alters the processing of our intellect • most often below the intellect’s perception • indeed, combined with attention, they basically focus and drive our intellect
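A minimal sketch of the two-system picture this slide describes; all names (EmotionSignal, appraise, deliberate) are hypothetical illustrations, not the authors' implementation. The architectural point is that the emotion system is separate, runs its appraisals below the intellect's perception, and biases deliberation rather than commanding it.

```python
from dataclasses import dataclass

@dataclass
class EmotionSignal:
    """Actionable qualia: how we're feeling plus what to focus on."""
    label: str        # e.g. "guilt", "outrage"
    intensity: float  # 0.0 .. 1.0
    focus: str        # what the intellect should attend to

class EmotionSystem:
    """Separate subsystem: appraises events below the intellect's perception."""
    def appraise(self, event: str) -> EmotionSignal:
        table = {  # toy appraisal table; a real system would learn these mappings
            "harmed_other": EmotionSignal("guilt", 0.8, "repair"),
            "norm_violated": EmotionSignal("outrage", 0.7, "punish"),
        }
        return table.get(event, EmotionSignal("neutral", 0.1, "explore"))

class Intellect:
    """Deliberative subsystem whose option-ranking is biased, not commanded."""
    def deliberate(self, options: list[str], signal: EmotionSignal) -> str:
        # Emotion + attention re-rank the options; they do not pick directly.
        return sorted(options, key=lambda o: o != signal.focus)[0]

mind, heart = Intellect(), EmotionSystem()
signal = heart.appraise("harmed_other")
print(signal.label, "->", mind.deliberate(["ignore", "repair", "explore"], signal))
# guilt -> repair
```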

  11. Bottom-up / Top-down • Competence before comprehension (Dennett) • Only successful in recognized/covered state spaces • Extremely vulnerable to phase changes • Certainly can’t/shouldn’t be trusted in a novel future • Deep learning of morality (Inverse Reinforcement Learning) • Self-contradictory data • Biased data • Comprehension or, at least, post-hoc justification is necessary for forward-looking evaluation & improvement • Evolutionary “As-If” Examples & Counter-examples • Social Insects • Paul Bloom’s argument against empathy
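The "self-contradictory data" failure is easy to demonstrate with a toy stand-in for reward inference (an illustration of the failure mode, not an actual inverse reinforcement learning algorithm): contradictory demonstrations wash the inferred preference out to zero, leaving nothing for forward-looking evaluation to work with.

```python
import numpy as np

# Feature vector per demonstration: [helped_other, harmed_other].
consistent    = np.array([[1., 0.], [1., 0.], [1., 0.]])
contradictory = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])

def naive_reward_weights(demos: np.ndarray) -> np.ndarray:
    # Crude proxy for IRL: mean demonstrated features, centered so that
    # "no preference either way" maps to the zero vector.
    return demos.mean(axis=0) - 0.5

print(naive_reward_weights(consistent))     # [ 0.5 -0.5]  clear moral signal
print(naive_reward_weights(contradictory))  # [ 0.   0. ]  signal cancels out
```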

  12. Instrumental Goals Evolve • Self-improvement • Rationality/integrity • Preserve goals/utility function • Decrease/prevent fraud/counterfeit utility • Survival/self-protection • Efficiency (in resource acquisition & use) • Community = assistance/non-interference • Reproduction • Diversity (adapted from Omohundro, 2008, "The Basic AI Drives")

  13. Instrumental Goals and the Eight Deadly Sins

  Instrumental goal                          | Sin against others                            | Sin against self
  ------------------------------------------ | --------------------------------------------- | ------------------------------
  survival/reproduction                      | murder (& abortion?)                          | suicide (& abortion?)
  happiness/pleasure                         | cruelty/sadism                                | masochism
  Community (ETHICS)                         | ostracism, banishment & slavery (wrath, envy) | selfishness (pride, vanity)
  self-improvement                           | slavery                                       | acedia (sloth/despair)
  rationality/integrity                      | manipulation                                  | insanity
  reduce/prevent fraud/counterfeit utility   | lying/fraud (swear falsely/false witness)     | wire-heading (lust)
  efficiency (in resource acquisition & use) | theft (greed, adultery, coveting)             | wastefulness (gluttony, sloth)
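Read this way, the table is machine-usable: a conscientious agent could look up which class of act would betray which drive. A sketch of that lookup, with hypothetical names and the mapping taken from the reconstruction above:

```python
# Hypothetical encoding of the slide's goal-to-sin mapping, as a lookup a
# moral evaluator might consult; the structure mirrors the table above.
SINS_BY_GOAL = {
    "survival/reproduction": {"vs_others": "murder", "vs_self": "suicide"},
    "happiness/pleasure": {"vs_others": "cruelty/sadism", "vs_self": "masochism"},
    "community": {"vs_others": "ostracism/banishment/slavery", "vs_self": "selfishness"},
    "self-improvement": {"vs_others": "slavery", "vs_self": "acedia"},
    "rationality/integrity": {"vs_others": "manipulation", "vs_self": "insanity"},
    "prevent counterfeit utility": {"vs_others": "lying/fraud", "vs_self": "wire-heading"},
    "efficiency": {"vs_others": "theft", "vs_self": "wastefulness"},
}

def classify_violation(goal: str, direction: str) -> str:
    """Name the sin that thwarts `goal` for the given target ('vs_others'/'vs_self')."""
    return SINS_BY_GOAL[goal][direction]

print(classify_violation("rationality/integrity", "vs_others"))  # -> manipulation
```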

  14. Capabilities Approach And Questions of Justice

  15. Imagine If an AI "Feels"… • Warm & fuzzy when it helps others (altruism) • Outrage when others misbehave (altruistic punishment) • Guilt for misdeeds & shame • Clumsy & responsible if it is too large/powerful • Strong urges to explain/justify itself • Dirty if it is too rich • Loyal & loving to those it is in close relationships with • Its attention grabbed by tragedy • Humility & awe
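One of these, altruistic punishment, has a standard behavioral-economics reading worth making concrete. A toy sketch with hypothetical payoffs and parameters: the outraged agent pays a cost to fine the defector, so misbehavior stops paying once enough punishers feel the outrage.

```python
def altruistic_punishment(payoffs: dict[str, float], defectors: set[str],
                          punishers: set[str], cost: float = 1.0,
                          fine: float = 3.0) -> dict[str, float]:
    """Each punisher pays `cost` per defector; each defector absorbs `fine`."""
    adjusted = dict(payoffs)
    for p in punishers:
        for d in defectors:
            adjusted[p] -= cost   # outrage is not free for the punisher...
            adjusted[d] -= fine   # ...but it makes misbehavior costly
    return adjusted

print(altruistic_punishment({"coop": 5.0, "cheat": 8.0},
                            defectors={"cheat"}, punishers={"coop"}))
# {'coop': 4.0, 'cheat': 5.0}: the cheater's edge shrinks from 3 to 1
```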

  16. Attention Schema Theory (Graziano)
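Graziano's theory, in one sentence: the brain attends, builds a simplified model (a schema) of its own attention, and what it reports as "awareness" is that schema rather than the attention process itself. A minimal sketch of that loop, with hypothetical class and method names:

```python
class AttentionSchemaAgent:
    """Agent that attends AND keeps a coarse model of its own attending."""
    def __init__(self):
        self.schema = {"attending_to": None, "confidence": 0.0}

    def attend(self, stimuli: dict[str, float]) -> str:
        target = max(stimuli, key=stimuli.get)       # the actual attention process
        # The schema is a lossy, simplified description of that process...
        self.schema = {"attending_to": target,
                       "confidence": round(stimuli[target], 1)}
        return target

    def report_awareness(self) -> str:
        # ...and self-report reads the schema, not the process itself.
        s = self.schema
        return f"I am aware of {s['attending_to']} (confidence {s['confidence']})"

agent = AttentionSchemaAgent()
agent.attend({"tragedy_on_screen": 0.93, "background_hum": 0.12})
print(agent.report_awareness())  # I am aware of tragedy_on_screen (confidence 0.9)
```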

  17. Plutchik's Psycho-Evolutionary Model of Emotions

  18. Plutchik's Psycho-Evolutionary Model of Emotions

  19. Plutchik's Psycho-Evolutionary Model of Emotions
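For an implementer, the useful part of Plutchik's model is that it is already structured data: eight primary emotions in four opposing pairs, each spanning three intensity levels, with adjacent primaries blending into dyads such as love and awe. A sketch of that structure (the emotion labels follow Plutchik's published wheel; the encoding itself is hypothetical):

```python
# Plutchik's wheel as data: four opposing pairs of primary emotions,
# each with (mild, basic, intense) levels.
PLUTCHIK_WHEEL = {
    "joy":          {"opposite": "sadness",      "levels": ("serenity", "joy", "ecstasy")},
    "trust":        {"opposite": "disgust",      "levels": ("acceptance", "trust", "admiration")},
    "fear":         {"opposite": "anger",        "levels": ("apprehension", "fear", "terror")},
    "surprise":     {"opposite": "anticipation", "levels": ("distraction", "surprise", "amazement")},
    "sadness":      {"opposite": "joy",          "levels": ("pensiveness", "sadness", "grief")},
    "disgust":      {"opposite": "trust",        "levels": ("boredom", "disgust", "loathing")},
    "anger":        {"opposite": "fear",         "levels": ("annoyance", "anger", "rage")},
    "anticipation": {"opposite": "surprise",     "levels": ("interest", "anticipation", "vigilance")},
}

# Primary dyads: adjacent primaries blend into familiar complex emotions.
DYADS = {frozenset(("joy", "trust")): "love",
         frozenset(("trust", "fear")): "submission",
         frozenset(("fear", "surprise")): "awe",
         frozenset(("anger", "anticipation")): "aggressiveness"}

print(PLUTCHIK_WHEEL["fear"]["levels"])    # ('apprehension', 'fear', 'terror')
print(DYADS[frozenset(("joy", "trust"))])  # love
```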
