
Soft Computing & Computational Intelligence


Presentation Transcript


  1. Soft Computing & Computational Intelligence • Biologically inspired computing models • Compatible with human expertise/reasoning • Intensive numerical computations • Data and goal driven • Model-free learning • Fault tolerant • Real world/novel applications

  2. Soft Computing & Computational Intelligence • Artificial Neural Networks (ANNs) • Fuzzy Logic (FL) • Genetic Algorithms (GAs) • Fractals/Chaos • Artificial life • Wavelets • Data mining [Venn diagram of overlapping ANNs, GAs, and FL]

  3. Biological Neuron [diagram: signal flows from a hair cell (sensory transducer) along the dendrites to the cell body, past the axon hillock and down the axon to the synapses]

  4. Artificial Neuron [diagram: inputs i1, i2, i3 with weights w1, w2, w3 feed a weighted sum w1·i1 + w2·i2 + w3·i3, which is passed through a sigmoid nonlinear transfer function to produce an output o between 0 and 1]
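The artificial neuron above can be sketched in a few lines: a weighted sum of the inputs passed through a sigmoid. The weight and input values below are purely illustrative.

```python
import math

def sigmoid(x):
    # logistic transfer function: squashes any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    # weighted sum of the inputs, passed through the nonlinearity
    s = sum(w * i for w, i in zip(weights, inputs))
    return sigmoid(s)

# three inputs i1..i3 with weights w1..w3 (illustrative values)
print(neuron([1.0, 0.5, -0.5], [0.8, 0.2, 0.4]))
```

Note that a weighted sum of zero lands exactly in the middle of the sigmoid's range, giving an output of 0.5.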

  5. Neural Net Yields Weights to Map Inputs to Outputs [diagram: molecular descriptors (molecular weight, boiling point, H-bonding, hydrophobicity, electrostatic interactions) feed through hidden nodes with weights (w11, w23, w34, …) to an observable projection such as biological response] There are many algorithms that can determine the weights for ANNs

  6. Neural Networks in a Nutshell • A problem can be formulated and represented as a mapping problem from an input space to an output space • Such a map can be realized by an ANN, which is a framework of basic building blocks of McCulloch-Pitts neurons • The neural net can be trained to conform with the map based on samples of the map and will reasonably generalize to new cases it has not encountered before

  7. Neural Network as a Map

  8. Poisonous/Edible Mushroom Classification Problem
  1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  4. bruises?: bruises=t, no=f
  5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
  6. gill-attachment: attached=a, descending=d, free=f, notched=n
  7. gill-spacing: close=c, crowded=w, distant=d
  8. gill-size: broad=b, narrow=n
  9. gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
  10. stalk-shape: enlarging=e, tapering=t
  11. stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
  12. stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
  13. stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
  14. stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
  15. stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
  16. veil-type: partial=p, universal=u
  17. veil-color: brown=n, orange=o, white=w, yellow=y
  18. ring-number: none=n, one=o, two=t
  19. ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
  20. spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
  21. population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
  22. habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
  Relevant Information: This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy. Sources: (a) Mushroom records drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf (b) Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu) (c) Date: 27 April 1987. Number of Instances: 8124; Number of Attributes: 22 (all nominally valued). Mushroom: original data were alphanumeric; replace the alphanumeric attributes, in the order mentioned, by 1, 2, 3, etc.
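The last remark on the slide, replacing each letter code by 1, 2, 3, … in the order listed, can be sketched as below. The `attribute_codes` table covers only three of the 22 attributes, as an illustration.

```python
# hypothetical mini-version of the coding tables: for each attribute, the
# letter codes in the order they appear on the slide
attribute_codes = {
    "cap-shape":   "bcxfks",  # bell, conical, convex, flat, knobbed, sunken
    "cap-surface": "fgys",    # fibrous, grooves, scaly, smooth
    "bruises?":    "tf",      # bruises, no
}

def encode(attribute, letter):
    # replace each alphanumeric code by 1, 2, 3, ... in the order listed
    return attribute_codes[attribute].index(letter) + 1

print(encode("cap-shape", "x"))  # convex is listed third -> 3
```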

  9. McCulloch-Pitts Neuron [diagram: inputs x1, …, x3, …, xN with weights w1, w2, w3, …, wN feed a summation Σ followed by a transfer function f(), producing the output y = f(Σ wi·xi)]

  10. Neural Network As Collection of M-P Neurons [diagram: inputs x1, x2, x3 feed a first hidden layer and a second hidden layer of M-P neurons (weights w11, w12, w13, w21, w22, w23, w32, …), each neuron computing a weighted sum Σ followed by f(), leading to the output y1]
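The layered network above amounts to applying the M-P neuron repeatedly: each layer multiplies its input vector by a weight matrix and applies the transfer function. The layer sizes and weight values below are illustrative, not taken from the slide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_matrix):
    # one layer of M-P neurons: each row of weights feeds one neuron
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_matrix]

def forward(x, layers):
    # pass the input through each layer in turn
    for W in layers:
        x = layer(x, W)
    return x

# 2 inputs -> 3 neurons (first hidden layer) -> 2 neurons (second) -> 1 output
net = [
    [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]],
    [[0.2, -0.4, 0.7], [0.9, 0.1, -0.3]],
    [[1.2, -0.8]],
]
print(forward([1.0, 0.5], net))
```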

  11. Kohonen SOM for text retrieval on WWW newsgroups [WEBSOM screenshot, node u21: "Click arrows to move to neighboring nodes on the map." Sample postings: Re: Fuzzy Neural Net References Needed (Derek Long, 27 Oct 1995, Lines: 24); Distributed Neural Processing (Jon Mark Twomey, 28 Oct 1995, Lines: 12); Re: neural-fuzzy (TiedNBound, 11 Dec 1995, Lines: 10); New neural net C library available (Simon Levy, 2 Feb 1996, Lines: 15); Re: New neural net C library available (Michael Glover, Sun, 04 Feb 1996, Lines: 25)]

  12. From Guido De Boeck, SOMs for Data Mining, to be published (Springer-Verlag)
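The SOM training rule behind WEBSOM-style maps can be sketched in one dimension: find the best-matching node for each input, then pull that node and its grid neighbors toward the input. Node count, learning rate, and neighborhood radius below are arbitrary choices for illustration.

```python
import math
import random

def train_som(data, n_nodes=5, epochs=50, lr=0.5, radius=1.0):
    # minimal 1-D Kohonen SOM sketch with scalar codebook vectors
    random.seed(0)
    nodes = [random.random() for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            # best-matching unit: node closest to the input
            bmu = min(range(n_nodes), key=lambda j: abs(nodes[j] - x))
            for j in range(n_nodes):
                # neighborhood function decays with grid distance from the BMU
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                nodes[j] += lr * h * (x - nodes[j])
        lr *= 0.95  # shrink the learning rate each epoch
    return sorted(nodes)

print(train_som([0.1, 0.12, 0.5, 0.52, 0.9, 0.92]))
```

After training, neighboring nodes end up representing nearby inputs, which is what makes "click arrows to move to neighboring nodes" meaningful on the WEBSOM map.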

  13. The Data Mining Process (data prospecting and surveying) [flow: database → select → selected data → preprocess & transform → transformed data → make model → interpretation & rule formulation]

  14. Santa Fe Time Series Prediction Competition • 1994 Santa Fe Institute Competition: 1000 data points of chaotic laser data; predict the next 100 points • The competition is described in Time Series Prediction: Forecasting the Future and Understanding the Past, A. S. Weigend & N. A. Gershenfeld, eds., Addison-Wesley, 1994 • Method: K-PLS with σ = 3 and 24 latent variables; used records with 40 past data points to train for the next point; predictions bootstrap on each other for the 100 real test data • Entry “would have won” the competition
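The "predictions bootstrap on each other" scheme means each one-step prediction is appended to the input window and fed back to predict the next point. The sketch below shows that feedback loop; the one-step model is a toy stand-in (a window average), not the K-PLS model used on the slide.

```python
def predict_next(window):
    # hypothetical one-step model: here just the average of the window
    return sum(window) / len(window)

def forecast(history, horizon, window_size=40):
    # iterated multi-step forecasting: each prediction is appended to the
    # series and becomes part of the input window for the next prediction
    series = list(history)
    for _ in range(horizon):
        series.append(predict_next(series[-window_size:]))
    return series[len(history):]

preds = forecast([float(i % 7) for i in range(1000)], horizon=100)
print(len(preds))
```

Because later predictions depend on earlier ones, any one-step error compounds over the 100-point horizon, which is what made the laser series a hard benchmark.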

  15. [pyramid, from base to apex: DATA → INFORMATION → KNOWLEDGE → UNDERSTANDING → WISDOM]

  16. Docking Ligands is a Nonlinear Problem — DDASSL (Drug Design and Semi-Supervised Learning)

  17. Electron Density-Derived TAE-Wavelet Descriptors [figure: histograms and wavelet coefficients of PIP (local ionization potential)] • Surface properties are encoded on a 0.002 e/au³ surface (Breneman, C.M. and Rhem, M. [1997] J. Comp. Chem., Vol. 18 (2), pp. 182-197) • Histograms or wavelet encodings of surface properties give Breneman’s TAE property descriptors • 10×16 wavelet descriptors

  18. Feature Selection (data strip mining) • PLS, K-PLS, SVM, ANN • Fuzzy Expert System Rules • GA or sensitivity analysis to select descriptors

  19. Binding affinities to human serum albumin (HSA): log K’hsa • Gonzalo Colmenarejo, GlaxoSmithKline • J. Med. Chem. 2001, 44, 4370-4378 • 95 molecules, 250-1500+ descriptors • 84 training, 10 testing (1 left out) • 551 Wavelet + PEST + MOE descriptors • Widely different compounds • Acknowledgements: Sean Ekins (Concurrent), N. Sukumar (Rensselaer)

  20. Microarray Gene Expression Data for Detecting Leukemia • 38 data for training • 36 data for testing • Challenge: select ~10 out of 6000 genes • used sensitivity analysis for feature selection (with Kristin Bennett)
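Sensitivity analysis for feature selection, as used on the gene-selection slide, can be sketched as: perturb one input at a time, measure how much the trained model's output changes, and keep the inputs with the largest effect. The model below is a hypothetical linear stand-in with made-up weights, not the actual leukemia model.

```python
def model(x):
    # hypothetical trained model: a linear map with illustrative weights
    weights = [0.01, 2.0, -0.03, 1.5]
    return sum(w * v for w, v in zip(weights, x))

def sensitivities(x, delta=1e-3):
    # finite-difference sensitivity of the output to each input
    base = model(x)
    out = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += delta
        out.append(abs(model(xp) - base) / delta)
    return out

def top_features(x, k=2):
    # indices of the k inputs the output is most sensitive to
    s = sensitivities(x)
    return sorted(range(len(s)), key=lambda j: -s[j])[:k]

print(top_features([1.0, 1.0, 1.0, 1.0]))
```

Ranking ~6000 genes this way and keeping the top ~10 is the same procedure at larger scale, typically averaging the sensitivities over many training points.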

  21. WORK IN PROGRESS [slide background: a long genomic DNA sequence (GATCAATGAGGTGGACACCAGAGG…), shown twice] DDASSL Drug Design and Semi-Supervised Learning

  22. Direct Kernel with Robert Bress and Thanakorn Naenna

  23. with Wunmi Osadik and Walker Land (Binghamton University) Acknowledgement: NSF

  24. Magneto-cardiogram Data with Karsten Sternickel (Cardiomag Inc.) and Boleslaw Szymanski (Rensselaer) Acknowledgement: NSF SBIR phase I project

  25. Direct Kernel PLS with 3 Latent Variables
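The "direct kernel" idea is to replace the descriptor matrix by a kernel matrix and then feed it to a linear method such as PLS. The sketch below builds a Gaussian kernel matrix only; the kernel width `sigma` is an assumed parameter, and the PLS step itself is not shown.

```python
import math

def gaussian_kernel_matrix(X, sigma=1.0):
    # K[a][b] = exp(-||x_a - x_b||^2 / (2 sigma^2)); rows of X are samples
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            d2 = sum((xa - xb) ** 2 for xa, xb in zip(X[a], X[b]))
            K[a][b] = math.exp(-d2 / (2 * sigma ** 2))
    return K

K = gaussian_kernel_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(K[0][0])  # diagonal entries are always 1.0
```

Each row of K then serves as the feature vector for one sample, so a linear PLS on K acts as a nonlinear model on the original descriptors.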

  26. SVMLib Linear PCA SVMLib Direct Kernel PLS

  27. www.drugmining.com Kristin Bennett and Mark Embrechts
