
Tópicos Especiais em Aprendizagem


Presentation Transcript


  1. Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012

  2. Lecture 1, Part B

  3. Objectives of this lecture • Present the basic concepts of Machine Learning: • Introduction. • Basic definitions. • Application areas. • Statistical Machine Learning. • Today's class: Chapter 1 of Mitchell, Chapter 1 of Nilsson, and Chapters 1 and 2 of Hastie + Wikipedia.

  4. Main Approaches according to Statistics — diagram placing learning methods within the overlap of AI, Statistics, and Neural Networks: Explanation-Based Learning, Decision Trees, Case-Based Learning, Inductive Learning, Bayesian Learning, Nearest Neighbors, Neural Networks, Support Vector Machines, Genetic Algorithms, Regression, Clustering, Reinforcement Learning, Classification.

  5. Main Approaches according to Statistics — diagram build highlighting: Nearest Neighbors, Support Vector Machines, Regression, Clustering, Classification.

  6. Main Approaches according to Statistics — diagram build highlighting: Nearest Neighbors, Regression, Clustering, Classification.

  7. First lecture, part B • Introduction to Statistical Machine Learning: • Basic definitions. • Regression. • Classification.

  8. Textbook • The Elements of Statistical Learning: Data Mining, Inference, and Prediction

  9. Why Statistical Learning? • “Statistical learning plays a key role in many areas of science, finance and industry.” • “The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines.”

  10. SML problems • Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient. • Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.

  11. SML problems • Identify the numbers in a handwritten ZIP code, from a digitized image. • Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood. • Identify the risk factors for prostate cancer, based on clinical and demographic variables.

  12. Examples of SML problems • Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements.

  13. Examples of supervised learning problems

  14. Other examples of learning problems • DNA Microarrays: expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data. The display is a heat map, ranging from bright green (negative, under-expressed) to bright red (positive, over-expressed). Missing values are grey.

  17. Other examples of learning problems • DNA Microarrays (continued). • Task: describe how the data are organised or clustered. • (unsupervised learning)
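To make the unsupervised task concrete, here is a minimal sketch (not from the slides) that clusters the samples of a gene-expression matrix with k-means. The matrix below is random stand-in data with the same shape as the human tumor dataset, not the actual measurements, and the choice of 4 clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in expression matrix: 6830 genes (rows) x 64 samples (columns).
# Random data here; the real study used measured expression levels.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6830, 64))

# Cluster the 64 samples: transpose so each sample (column) is one observation.
samples = expression.T  # shape (64, 6830)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(samples)

print(kmeans.labels_)   # cluster assignment for each of the 64 samples
```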

  18. Overview of Supervised Learning (Chapter 2 of Hastie)

  19. Variable Types and Terminology • In the statistical literature the inputs are often called the predictors, inputs, and more classically the independent variables. • In the pattern recognition literature the term features is preferred, which we use as well. • The outputs are called the responses, or classically the dependent variables.

  20. Variable Types and Terminology • The outputs vary in nature among the examples: • Prostate Cancer prediction example: the output is a quantitative measurement. • Handwritten digit example: the output is one of 10 different digit classes: G = {0, 1, ..., 9}

  21. Naming convention for the prediction task • The distinction in output type has led to a naming convention for the prediction tasks: • Regression when we predict quantitative outputs. • Classification when we predict qualitative outputs. • Both can be viewed as a task in function approximation.

  22. Examples of SML problems • Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements. • Regression problem

  23. Examples of supervised learning problems • Classification problem

  24. Qualitative variables representation • Qualitative variables are represented numerically by codes: • Binary case: when there are only two classes or categories, such as “success” or “failure,” “survived” or “died.” • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.

  25. Qualitative variables representation • When there are more than two categories, the most commonly used coding is via dummy variables: • a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time. • These numeric codes are sometimes referred to as targets.
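As an illustration of the dummy-variable coding just described, the sketch below one-hot encodes a K-level qualitative variable with plain numpy; the level names and observations are made up for the example.

```python
import numpy as np

# A 3-level qualitative variable (K = 3) and four observations of it.
levels = ["setosa", "versicolor", "virginica"]
g = ["versicolor", "setosa", "virginica", "setosa"]

# K-bit dummy coding: exactly one bit is "on" per observation.
index = {level: k for k, level in enumerate(levels)}
targets = np.zeros((len(g), len(levels)), dtype=int)
for i, label in enumerate(g):
    targets[i, index[label]] = 1

print(targets)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [1 0 0]]
```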

  26. Variables • We will typically denote an input variable by the symbol X. • If X is a vector, its components can be accessed by subscripts Xj. • Observed values are written in lowercase: hence the i-th observed value of X is written as xi. • Quantitative outputs will be denoted by Y and qualitative outputs will be denoted by G (for group).

  27. Two Simple Approaches to Prediction: Least Squares (método dos mínimos quadrados) and Nearest Neighbors (método dos vizinhos mais próximos)

  28. Linear Methods for Regression • “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)

  29. Linear Methods for Regression • For prediction purposes they can sometimes outperform non-linear models, especially in situations with: • small sample size, • low signal-to-noise ratio, or • sparse data. • Linear methods can also be applied to transformations of the inputs, which expands their scope.

  30. Linear Models and Least Squares • The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools. • Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output Y via the model: $\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$

  31. Linear Models • The term $\hat{\beta}_0$ is the intercept, also known as the bias in machine learning. • Often it is convenient to include the constant variable 1 in X, include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product: $\hat{Y} = X^T \hat{\beta}$
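A minimal numpy sketch of this inner-product form, with the constant 1 prepended to x and $\hat{\beta}_0$ absorbed into the coefficient vector (the numbers are arbitrary):

```python
import numpy as np

beta_hat = np.array([0.5, 2.0, -1.0])  # [beta0, beta1, beta2]
x = np.array([1.0, 3.0, 4.0])          # leading 1 is the constant variable

# Inner-product form: y_hat = x^T beta_hat
y_hat = x @ beta_hat
print(y_hat)                           # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```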

  32. Positive Linear Relationship — plot of E(y) versus x: regression line with intercept b0 and positive slope b1.

  33. Negative Linear Relationship — plot of E(y) versus x: regression line with intercept b0 and negative slope b1.

  34. No Relationship — plot of E(y) versus x: regression line with intercept b0 and slope b1 = 0.

  35. Fitting the data: Least Squares • How do we fit the linear model to a set of training data? • By far the most popular method is least squares. • Pick the coefficients β to minimize the Residual Sum of Squares: $RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$

  36. Least Squares Method • Least Squares Criterion: $\min \sum_i (y_i - \hat{y}_i)^2$ • where: • $y_i$ = observed value of the dependent variable for the i-th observation • $\hat{y}_i$ = estimated value of the dependent variable for the i-th observation

  37. Fitting the data: Least Squares • RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique. • The solution is easiest to characterize in matrix notation: $RSS(\beta) = (y - X\beta)^T (y - X\beta)$ • where X is an N × p matrix with each row an input vector • y is an N-vector of the outputs

  38. Fitting the data: Least Squares • Differentiating with respect to β we get: $\frac{\partial RSS}{\partial \beta} = -2 X^T (y - X\beta)$

  39. Fitting the data: Least Squares • Assuming that X has full column rank, we set the first derivative to zero: $X^T (y - X\beta) = 0$ • If $X^T X$ is nonsingular, then the unique solution is given by: $\hat{\beta} = (X^T X)^{-1} X^T y$
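The closed-form solution is straightforward to check in numpy. This is a sketch with synthetic data; in practice np.linalg.solve (or np.linalg.lstsq) is preferred over forming the explicit inverse, since it is more numerically stable.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
# Design matrix with the constant variable 1 as its first column.
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Normal equations: solve (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true
```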

  40. Example: height x shoe size • We wanted to explore the relationship between a person's height and their shoe size. • We asked ten individuals their height and corresponding shoe size. • We believe that a person's shoe size depends upon their height. • Height is the independent variable, x. • Shoe size is the dependent variable, y.

  41. Example: height x shoe size • The following data was collected:

              Height, x (inches)   Shoe size, y
  Person 1           69                 9.5
  Person 2           67                 8.5
  Person 3           71                11.5
  Person 4           65                10.5
  Person 5           72                11
  Person 6           68                 7.5
  Person 7           74                12
  Person 8           65                 7
  Person 9           66                 7.5
  Person 10          72                13

  42. Example: height x shoe size (scatter plot of the collected data)

  43. Least Squares Method (matrix form) • The unique solution is given by: $\hat{\beta} = (X^T X)^{-1} X^T y$ • Often it is convenient to include the constant variable 1 in X, and include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$.

  44. X without bias β0 — the design matrix containing only the height column.

  45. X with bias β0 — the design matrix with a constant column of 1s prepended to the heights.

  46. Xᵀ — the transpose of the design matrix.

  47. XᵀX — for this example, the 2 × 2 matrix [[n, Σxᵢ], [Σxᵢ, Σxᵢ²]].

  48. XᵀX — note that the top-left entry is n, the number of observations (here 10).

  49. Xᵀy — the 2-vector [Σyᵢ, Σxᵢyᵢ].

  50. Xᵀy (continued).
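Putting the pieces together for the height/shoe-size data, the sketch below builds X with the bias column, forms XᵀX and Xᵀy as in the slides above, and solves for $\hat{\beta}$:

```python
import numpy as np

height = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
shoe   = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

# Design matrix with the constant variable 1 prepended (the bias column).
X = np.column_stack([np.ones_like(height), height])
y = shoe

XtX = X.T @ X   # [[n, sum(x)], [sum(x), sum(x^2)]]
Xty = X.T @ y   # [sum(y), sum(x*y)]

beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)                         # [beta0_hat, beta1_hat]

# Predicted shoe size for a hypothetical 70-inch-tall person:
print(np.array([1.0, 70.0]) @ beta_hat)
```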
