Anthropological Informatics

Presentation Transcript


  1. Anthropological Informatics Reality Measures or Reality bytes

  2. Measurement and Perception “Take away number in all things and all things perish. Take calculation from the world and all is enveloped in dark ignorance, nor can he who does not know the way to reckon be distinguished from the rest of the animals.” St. Isidore of Seville “And still they come, new from those nations to which the study of that which can be weighed and measured is a consuming love.” W.H. Auden

  3. Causality “In causal terms the presence of oxygen is a necessary but not a sufficient condition for fire. Oxygen plus combustibles plus the striking of a match would illustrate a sufficient condition for fire” William L. Reese

  4. A Necessary and Sufficient Condition • Oxygen • Combustibles • Matches

  5. Visualization: The Match? “Science and technology have advanced in more than direct ratio to the ability of men to contrive methods by which phenomena which otherwise could be known only through the senses of touch, hearing, taste, and smell have been brought within the range of visual recognition and measurement and thus become subjects to that logical symbolization without which rational thought and analysis are impossible.” William N. Ivins

  6. Mentalite “One of the fundamental traits of the mind of the declining middle ages is the predominance of the sense of sight, a predominance which is closely connected with the atrophy of thought. Thought takes the form of visual images. Really to impress the mind a concept has first to take the visible shape.” Johan Huizinga

  7. Dissonance • Modern: we feel that quantities are set and transactions are fair and equivalent • Past: on inspection, vagaries and unfairness • In Roger Bacon’s 13th century, quanta differed from region to region and transaction to transaction • A bushel of oats was neither more nor less than as many oats as a bushel basket contained, but a bushel for the lord would be heaped and a bushel for the peasant was no more than level with the rim (the differential was not cheating but a proper negotiation)

  8. Greek metrological relief; conical sundial with hours in Greek letters; Greek multiplication wax tablet

  9. Roman measuring tools; Egyptian measuring gold rings against a bull’s head weight; Egyptian alabaster vase with volume marked as 8 1/2 hennu

  10. Roman milestone; facsimile of the Peutinger Table, a copy of a Roman road map (Rome is at the center)

  11. Ptolemy’s “Geography”

  12. Changes in Vision • A shift to the visual in the Middle Ages was the match that ignited the flame of quantification • Change was marked in several main fields of human exertion: - LITERACY - MUSIC - PAINTING - BOOKKEEPING

  13. Literacy • There was a shift in conduits of authority from the ear to the eye • In the 14th century, a new cursive script with word separation and punctuation was devised for easier writing and reading • Reading became swift and silent • Literacy spread to classes beneath poets and philosophers: composers, painters and bookkeepers

  14. Music • Renaissance Europeans considered music to be an emanation of the basic structure of reality (harmony guided the heavens) • Gregorian chants were performed from memory • By c. 10th century, the accumulation of chants exceeded apprentices’ abilities to memorize • Monks developed a system of “neumes” or signs to indicate highs and lows without a musical staff • The musical staff was standardized by Guido of Arezzo, an 11th-century Benedictine choirmaster • Ut … re … mi … fa … sol … la … cut the training of a good singer from 10 years to 1 year

  15. Quadrivium • 4 of the liberal arts considered essential for a solid education • Arithmetic • Geometry • Astronomy • Music • Music and science: Galileo, Descartes, Kepler and Huygens were all accomplished musicians and published on measurement in musical subjects

  16. Painting • Medieval artists were more concerned with the rank of their subjects than with the faces of individuals (size = importance; space was to be filled by altering perspectives) • In the 14th century, geometry begins to guide compositions (scenes were to be viewed by an observer at a single point in time; perspective was adhered to)

  17. Bookkeeping “We shall ever give ground to honor. It will stand to us like a public accountant, just, practical, and prudent in measuring, weighing, considering, evaluating, and assessing, everything we do, achieve, think and desire.” Leon Battista Alberti (1440) “Inasmuch as all things in the world have been made with a certain order, in like manner they must be managed … of the greatest importance, such as the business of merchants, which … is ordered for the preservation of the human race.” Benedetto de Cotrugli (15th c.)

  18. The merchant struggling to make sense of his books was a theme • Blizzards of transactions, scrambled by • Bills of exchange • Promissory notes • Credit practices • Axiom: production preceded delivery • Reality: payments could precede delivery or production • Payments were undulatory, with currencies and bills of exchange billowing and plunging in value in relation to one another

  19. RECORDS … My God, we need records, or what will we know? • By the end of the 14th c. Hindu-Arabic numerals were beginning to appear in merchants’ account books • Double-entry accounting systems were developed (ingoing and outgoing values; plus and minus); a great improvement over narrative accounts • By the 15th century, an accounting lexicon and guides to practice were being developed

  20. Visions and Models “I often say that when you can measure what you are speaking about and express it in numbers you know something about it; but when you cannot measure it, and when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” William Thomson, Lord Kelvin (1891)

  21. Our Information Age: • All information is incomplete. There is always more to know, always another way to reframe what is already known. Our leaders must make important decisions on the basis of incomplete information

  22. Information does not narrow the range of choices; it widens it. Further information is likely to make any decision-making process more meaningful and effective. It may not make the decision easier.

  23. Information is always subject to multiple interpretations and constructions. “Data” is nothing until it is given meaning and assembled in a narrative.

  24. Information comes in many forms: data, stories, myths, visual images, and meta-theories. Information theorists do not regard data as information at all. It is potential information. Information is data endowed with relevance and purpose. • Data are undigested facts. • Information is facts organized for you by someone else but not yet absorbed into your own thinking. • Knowledge is information that you have internalized.

  25. Different people speak different information languages even when they are speaking the same language.

  26. Information leaks. In our information society nobody keeps secrets. There is an erosion of confidentiality that accompanies the inundation of information through media.

  27. Information once distributed is almost impossible to destroy. Information has its own survival skills.

  28. Information Production • about 10 exabytes • 90% digital • 55% personal • print .003% of bytes • email is 4 PB/y • www is about 50 TB • growth at 50%/y (Gray and Szalay 2003)

  29. The First Disk 1956 • IBM 305 RAMAC • 4 MB • 50 X 24” disks • 1200 rpm • 100 ms access • $35K/y rent • Included computer and accounting software

  30. 10 Years Later 30 MB

  31. Cost of Storage

  32. Storage Capacity Outstrips Moore’s Law • Improvements: capacity 60%/y, bandwidth 40%/y, access time 16%/y • $1000/TB today • $100/TB in 2007 • Moore’s Law: 58.7%/y • TB growth: 112.3%/y • Price decline: 50.7%/y

  33. Moore’s Law • Performance/price doubles every 18 months • 100 X per decade • Progress in next 18 months will outstrip all previous progress (new storage sums all previous storage and new processing will outstrip all old processing)
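
The arithmetic behind this slide can be checked with a short sketch. The 18-month doubling period is taken from the slide; the function name and the choice of ten past periods are illustrative assumptions.

```python
# Sketch: convert a doubling period into annual and per-decade growth.
# The 18-month period comes from the slide; names here are illustrative.

def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after elapsed_months, given a doubling period."""
    return 2.0 ** (elapsed_months / doubling_months)

annual = growth_factor(18, 12)    # growth over one year
decade = growth_factor(18, 120)   # growth over ten years

print(f"annual growth: {100 * (annual - 1):.1f}%")   # prints "annual growth: 58.7%"
print(f"growth per decade: {decade:.1f}X")           # prints "growth per decade: 101.6X"

# "Progress in the next period outstrips all previous progress":
# with doubling, the next period (2**n) exceeds the sum of all earlier ones.
previous = sum(2 ** k for k in range(10))  # ten past doubling periods (assumed count)
next_period = 2 ** 10
print(previous, "<", next_period)
```

Doubling every 18 months therefore corresponds to roughly 58.7% growth per year and about 100X per decade, and each new doubling period adds more than all earlier periods combined.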

  34. Rules of Thumb for Data Engineering • Moore’s Law: an address bit per 18 months • Storage grows 100 X/decade (1000X in last decade!) • Disk data of 10 years ago now fits in RAM • Device bandwidth grows 10X/decade (need for parallelism) • RAM:disk:tape price is 1:10:30 and will go to 1:10:10 • Gilder’s Law: aggregate bandwidth 2X/8 months • Web Rule: cache everything

  35. Filling A Terabyte In A Year (Gray and Szalay 2003)
      Item                          Items/TB   Items/day
      300 KB JPEG                   3 M        9,800
      1 MB Doc                      1 M        2,900
      1 hour 256 kb/s MP3 audio     9 K        26
      1 hour 1.5 Mb/s MPEG video    290        0.8
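
The items-per-day figures on this slide can be approximated with a few lines of arithmetic. This is a rough sketch, assuming a binary terabyte (2**40 bytes), a 365-day year, and the item sizes below. The slide does not state its unit conventions, so the results only approximate its rounded numbers. The video row is left out because its figure depends on encoding assumptions the slide does not give.

```python
# Sketch: how many items of a given size fill one terabyte in one year.
# Assumptions (not stated on the slide): binary units and a 365-day year.

TB = 2 ** 40  # bytes in one (binary) terabyte

def items_per_tb(item_bytes: float) -> float:
    """Number of items of the given size that fit in one terabyte."""
    return TB / item_bytes

def items_per_day(item_bytes: float, days: int = 365) -> float:
    """Items that must be stored each day to fill a terabyte in `days` days."""
    return items_per_tb(item_bytes) / days

item_sizes = {
    "300 KB JPEG": 300 * 1024,
    "1 MB Doc": 2 ** 20,
    "1 hour 256 kb/s MP3 audio": 256_000 / 8 * 3600,  # bits/s -> bytes per hour
}

for name, size in item_sizes.items():
    print(f"{name}: {items_per_tb(size):,.0f} items/TB, "
          f"{items_per_day(size):,.0f} items/day")
```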

  36. Schematized Storage • File metaphor too primitive: just a “blob” • Table metaphor too primitive: just “records” • Need metadata describing data context • Format • Provenance (author, publisher, citations) • Rights • History • Related documents • in a standard format • XML and XML schema • Data Set is a great example • World is defining standard schema
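
A minimal sketch of what such a metadata record might look like, built with Python's standard `xml.etree.ElementTree`. The element names (`dataset`, `provenance`, `event`, and so on) and the sample values are hypothetical; the slide names the kinds of metadata (format, authorship and publisher, rights, history, related documents) but not a concrete schema.

```python
import xml.etree.ElementTree as ET

def describe_dataset(fmt, author, publisher, rights, history, related):
    """Build an XML metadata record; the element names are illustrative only."""
    root = ET.Element("dataset")
    ET.SubElement(root, "format").text = fmt
    provenance = ET.SubElement(root, "provenance")
    ET.SubElement(provenance, "author").text = author
    ET.SubElement(provenance, "publisher").text = publisher
    ET.SubElement(root, "rights").text = rights
    history_el = ET.SubElement(root, "history")
    for event in history:
        ET.SubElement(history_el, "event").text = event
    related_el = ET.SubElement(root, "related")
    for document in related:
        ET.SubElement(related_el, "document").text = document
    return root

# Hypothetical sample values, for illustration only.
record = describe_dataset(
    fmt="CSV",
    author="A. Author",
    publisher="Example Archive",
    rights="public",
    history=["captured", "calibrated"],
    related=["survey-notes.txt"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the record is plain XML, any consumer can parse it back without knowing how the data file itself is laid out, which is the point of keeping the description separate from the "blob".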

  37. Keys for Storage • Schematized storage can help organization and research • Schematized XML data sets are a universal way to exchange data • Data are objects, and so, need standard representation for classes and methods

  38. Access Variable and Increasing

  39. Stages in Science • Observational Science: scientist gathers data by direct observation; scientist analyzes data • Analytical Science: scientist builds analytical model; makes predictions • Computational Science: simulate analytical model; validate model and make predictions • Data Exploration Science: data captured by instruments or generated by simulator, processed by software, and placed in a database or files; scientist analyzes the database or files

  40. Data Avalanche • Better observational instruments and better simulations are producing an avalanche of data

  41. Discoveries Booming • Conceptual discoveries (relativity, quantum mechanics) are theoretical and may be inspired by observations • Phenomenological discoveries (dark matter, obscured universe) are made through advances in empirical rigor; they inspire theories and are motivated by them

  42. Discovery Cycle • New technical capabilities • Observational discoveries • Advances in theory • Application of new theories Phenomenological discoveries: exploring parameter space; making new connections Maxim: understanding complex phenomena requires complex, information rich data and simulations

  43. How to Keep Up • We are looking for “needles in haystacks” (the Higgs particle in dark matter) • Needles are easier than haystacks • Global statistics have poor scaling • As data and computers grow at the same rate, we can only keep up with N log N algorithms • Discard the notion of optimal: data are fuzzy and solutions are approximations • Requires a combination of statistics and computer science
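
The N log N remark can be made concrete with a small sketch. It assumes that the data volume and the available compute both grow by the same factor, and compares how the running time changes for algorithms of different cost. The cost functions, the starting size of 1e9 items, and the growth factor of 100 are illustrative assumptions.

```python
import math

def relative_runtime(cost, n: float, growth: float) -> float:
    """Runtime after data and compute both grow by `growth`, relative to before."""
    return cost(n * growth) / (growth * cost(n))

def linear(n: float) -> float:
    return n

def n_log_n(n: float) -> float:
    return n * math.log2(n)

def quadratic(n: float) -> float:
    return n ** 2

n, growth = 1e9, 100  # assumed starting size and growth factor
for cost in (linear, n_log_n, quadratic):
    print(f"{cost.__name__}: {relative_runtime(cost, n, growth):.2f}x slower")
```

Under these assumptions a linear algorithm keeps pace exactly, an N log N algorithm slows down only slightly, and a quadratic algorithm becomes slower by the full growth factor.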

  44. Analysis of Databases • Create uniform samples • Filter data • Assemble subsets • Estimate completeness • Censor bad data • Count and build histograms • Generate Monte Carlo subsets • Perform likelihood calculations • Test hypotheses These tasks are best done inside databases (“bring Mohamed to the mountain”)
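
As an illustration of doing such steps inside the database, the sketch below filters out flagged rows and builds a histogram with a single SQL query. It uses Python's built-in `sqlite3` module as a stand-in for a real science archive. The table `obs`, its columns, and the synthetic values are assumptions made for the example.

```python
import random
import sqlite3

random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (id INTEGER PRIMARY KEY, magnitude REAL, quality INTEGER)"
)

# Synthetic measurements: quality = 0 marks bad data to be censored.
rows = [(random.uniform(10, 20), random.choice([0, 1])) for _ in range(10_000)]
conn.executemany("INSERT INTO obs (magnitude, quality) VALUES (?, ?)", rows)

# Filter, count, and histogram in one pass inside the database;
# only the small summary table travels back to the client.
histogram = conn.execute(
    """
    SELECT CAST(magnitude AS INTEGER) AS bin, COUNT(*) AS n
    FROM obs
    WHERE quality = 1
    GROUP BY CAST(magnitude AS INTEGER)
    ORDER BY bin
    """
).fetchall()

for bin_start, count in histogram:
    print(f"[{bin_start}, {bin_start + 1}): {count}")
```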

  45. Go for Smart Data • Too much data to move around, so take analysis to the data • Do all data manipulations inside the database (build custom procedures and functions in the database) • Guaranteed automatic parallelism • Easy to build custom functionality (pixel processing, temporal and spatial indexing, unified databases and procedures) • Easy to reorganize data (multiple views make optimal analyses) • Scalable to petabyte data sets
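
A minimal sketch of a custom function registered inside the database, again using `sqlite3` as a stand-in. SQLite does not provide the automatic parallelism or petabyte scale the slide refers to, so this shows only the idea of running analysis code next to the data. The table `points` and the function `dist` are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (3.0, 4.0), (0.0, 1.0)],
)

def dist(x, y, x0, y0):
    """Euclidean distance between (x, y) and (x0, y0)."""
    return math.hypot(x - x0, y - y0)

# Register the Python function so SQL queries can call it by name.
conn.create_function("dist", 4, dist)

# The spatial filter runs inside the query; only the count is returned.
(nearby,) = conn.execute(
    "SELECT COUNT(*) FROM points WHERE dist(x, y, 0.0, 0.0) <= 1.0"
).fetchone()
print(nearby)  # prints 3
```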

  46. Data Mining Images We can discover new types of phenomena using automated pattern recognition; multiscale analyses

  47. Optimal Statistics • Statistics algorithms scale poorly • Even if data and computers grow at the same rate, computers can do at most N log N algorithms • Solutions: assume infinite computational resources; assume only source of error is statistical; there is a finite sample size • Solutions will require combinations of statistics and CS • New algorithms will not be worse than N log N

  48. Make Clever Data Structures • Use of tree structures • Fast, approximate algorithms • Must account for computation costs • Scale level of accuracy • Shoot for “best” results given …
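
One common tree structure for this kind of work is a k-d tree, which answers nearest-neighbour queries without scanning every point. The slide does not name a specific structure, so the choice of a k-d tree is an assumption, and the sketch below is an exact search rather than an approximate one. Loosening the pruning test would be one way to trade accuracy for speed.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        ordered[mid],
        axis,
        build(ordered[:mid], depth + 1),
        build(ordered[mid + 1:], depth + 1),
    )

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node: Optional[Node], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to target, pruning distant branches."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if it could still hold a closer point.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best

random.seed(1)
points = [(random.random(), random.random()) for _ in range(1000)]
tree = build(points)
query = (0.5, 0.5)
found = nearest(tree, query)
print(found)
```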

  49. Hyperdimensionality • Explore parameter spaces in catalog domains through • Clustering analysis (different types and outliers) • Multivariate correlations (find significant, nontrivial correlations in the data) Visualization becomes the key; include interactive visualization and data mining processes
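
A small sketch of the multivariate-correlation step: compute the Pearson correlation for every pair of columns and keep only the strong ones. The column names, the synthetic data, and the 0.5 threshold are assumptions for illustration. A real catalog would have many more parameters and would need a significance test as well as a threshold.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def strong_correlations(columns, threshold=0.5):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Synthetic catalog: y depends on x, z is independent noise.
random.seed(2)
x = [random.gauss(0, 1) for _ in range(500)]
columns = {
    "x": x,
    "y": [2 * v + random.gauss(0, 0.1) for v in x],
    "z": [random.gauss(0, 1) for _ in range(500)],
}
strong = strong_correlations(columns, threshold=0.5)
for a, b, r in strong:
    print(f"{a} ~ {b}: r = {r:.3f}")
```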

  50. Publishing Data • expectations and standards must change • there will be exponential growth • projects must become more responsible
