
Grid Computing at Texas Tech University using SAS

Ron Bremer, Jerry Perez, Phil Smith, Peter Westfall* (*Director, Center for Advanced Analytics and Business Intelligence), Texas Tech University


Presentation Transcript


  1. Grid Computing at Texas Tech University using SAS • Ron Bremer, Jerry Perez, Phil Smith, Peter Westfall* (*Director, Center for Advanced Analytics and Business Intelligence) • Texas Tech University

  2. What is Grid Computing? • Grid computing means using multiple computing resources connected over a network to perform demanding calculations. • Example:

  3. Economies of High Performance Computing • Current fastest machine: ~40 Tflops ($300M) • 10 Tflops machines: ~$50M • Fastest cluster at TTU: 0.1 Tflops (~$0.1M) • Speed of a PC: 0.003 Tflops (~$0.001M)

  4. Underused Resources • Computers are everywhere, mostly idle! • Grid computing leverages unused resources to create an effective “Supercomputer” • Teraflops = (N computers) x (Tflops per computer) • For Free! (Almost)
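
The formula on this slide can be worked through with the figures quoted on slide 3. The sketch below is illustrative Python, not part of the original presentation. It assumes ideal scaling (no communication or scheduling overhead), which a real grid will not reach, and the function and variable names are invented for this example.

```python
import math

# Figures quoted on slide 3 (speed in Tflops, cost in millions of dollars).
PC_TFLOPS = 0.003
PC_COST_M = 0.001

targets = {
    "fastest machine": (40.0, 300.0),
    "10 Tflops machine": (10.0, 50.0),
    "fastest TTU cluster": (0.1, 0.1),
}

def aggregate_tflops(n_computers, tflops_per_computer=PC_TFLOPS):
    """Teraflops = (N computers) x (Tflops per computer), assuming ideal scaling."""
    return n_computers * tflops_per_computer

def pcs_needed(target_tflops, tflops_per_computer=PC_TFLOPS):
    """Smallest number of PCs whose ideal aggregate speed reaches the target."""
    return math.ceil(target_tflops / tflops_per_computer)

for name, (tflops, cost_m) in targets.items():
    n = pcs_needed(tflops)
    print(f"{name}: {n} PCs (~${n * PC_COST_M:.3f}M of hardware) vs ${cost_m}M")
```

Under this idealization, 34 PCs match the 0.1 Tflops cluster and 100 PCs total about 0.3 Tflops. The "for free (almost)" point is that the PCs are already bought and sitting idle, so the hardware cost printed above is largely a sunk cost.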

  5. Grid Initiatives at TTU and in Texas • HipCAT – High Performance Computing Across Texas • TIGRE – Texas Internet Grid for Research and Education • SORCER – Service-ORiented Computing EnviRonment (TTU CS dept.) • SAS/Connect grid

  6. HipCAT • Consortium of Texas institutions working together to use • High performance computing • Clusters • Massive data storage • Scientific visualization • Grid computing. • Director: Phil Smith, Texas Tech University • Members: • Baylor College of Medicine • Rice University • Texas A&M University • Texas Tech University • University of Houston • University of Texas • University of Texas at Austin • University of Texas at Arlington • University of Texas at El Paso • University of Texas Southwestern Medical Center

  7. TIGRE • Texas Internet Grid for Research & Education • Two-year project involving UT, TTU, UH, Rice, and TAMU • Funding announced by the Governor in September • TIGRE will develop a grid software stack, along with policies and procedures, to facilitate Texas grid computing efforts.

  8. Grid Software Products Used at TTU • AVAKI • Globus • Jini Networking Technology • SAS/Connect (MPConnect), %Distribute macro

  9. Benefits of SAS • Ease of use (relative to other grid products) • Available and applicable for many scientists in their respective fields • Flexibility • Database (DATA step, PROC SQL) • Math/Optimization (SAS/IML, SAS/OR) • Statistics (SAS/STAT, SAS/ETS)

  10. Problems Amenable to SAS Grid • Many replicates of a fundamental task • Each fundamental task is time-consuming, and there are lots of replicates • Examples • Simulation • Astrophysics • Bioinformatics • Ensembles of predictive models

  11. Success Story • Financial Event Studies • Developed simulation tool to detect events • Simulated its performance • 25 hours finished in 40 minutes • Published in J. Fin. Econometrics • Old system: “Sneaker grid”

  12. Another Success Story: Portfolio Analysis • 300 portfolios of 50 securities each, formed by randomly sampling securities from the CRSP daily database (7.23 Gigabytes) • 15 models created for each of the 50 securities (PROC AUTOREG of SAS/ETS), under 169 treatment settings. • 126,750 models and associated DATA steps per portfolio. • 500 days of continuous computing time reduced to two weeks.
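
The counts and run times on this slide and the previous one can be checked with a few lines of arithmetic. This is an illustrative Python sketch, not code from the presentation. It assumes "two weeks" means 14 days and that every one of the 300 portfolios receives the full set of models; the names are invented for the example.

```python
# Workload size quoted on the slide.
models_per_security = 15
securities_per_portfolio = 50
treatment_settings = 169
portfolios = 300

models_per_portfolio = models_per_security * securities_per_portfolio * treatment_settings
total_models = models_per_portfolio * portfolios  # assumes all portfolios get every model

def speedup(serial_hours, grid_hours):
    """Ratio of serial run time to grid run time."""
    return serial_hours / grid_hours

# Portfolio analysis: 500 days vs. two weeks (taken here as 14 days).
portfolio_speedup = speedup(500 * 24, 14 * 24)
# Event study from the previous slide: 25 hours vs. 40 minutes.
event_study_speedup = speedup(25, 40 / 60)

print(models_per_portfolio)   # 126750
print(total_models)           # 38025000
print(round(portfolio_speedup, 1), round(event_study_speedup, 1))
```

Both studies come out at a speedup of roughly 35 to 40 times.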

  13. Notoriety • Web articles appeared in SAS, Grid today, Next-Gen Data forum • Interviewed by DataBase Trends and Applications

  14. SAS Grid Structure • Client connects to host machines • Client sends replicates of fundamental task (“chunks”) to hosts • Hosts process chunks, send back to client • Client combines chunks and summarizes
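
The four steps above follow a scatter/gather pattern. The sketch below shows that pattern in Python only as an analogy: it is not SAS/Connect or the %Distribute macro, worker threads stand in for remote hosts, and `run_chunk` is a made-up fundamental task.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def run_chunk(seed, n_replicates):
    """Hypothetical fundamental task: sum n_replicates pseudo-random draws for one chunk."""
    rng = random.Random(seed)
    total = sum(rng.random() for _ in range(n_replicates))
    return total, n_replicates

def run_on_grid(n_chunks, n_replicates, n_hosts=4):
    """Client side: send chunks to hosts, collect partial results, combine them."""
    with ThreadPoolExecutor(max_workers=n_hosts) as hosts:
        # Scatter: one chunk per submission, each with its own seed.
        futures = [hosts.submit(run_chunk, seed, n_replicates) for seed in range(n_chunks)]
        # Gather: results are read back in submission order.
        partials = [f.result() for f in futures]
    grand_total = sum(t for t, _ in partials)
    grand_count = sum(c for _, c in partials)
    return grand_total / grand_count

if __name__ == "__main__":
    print(run_on_grid(n_chunks=20, n_replicates=10_000))
```

Because each chunk carries its own seed and the partial results are combined in submission order, the summary does not depend on how many hosts took part.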

  15. The SAS Grid

  16. SAS Farm • 100 SAS machines in student lab • 2.66 GHz per node • All have SAS software installed • SAS “Spawner” must be started on all • Avaki also installed; it helps diagnose problems

  17. Student Lab

  18. Load Balancing • Automatically supports load balancing by farming out independent tasks to the next available resource. • Students never noticed that their machines were being used!
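
The idea of farming each task out to the next available resource can be simulated in a few lines. This Python sketch is an assumption-laden illustration, not the scheduler used by SAS/Connect: chunk durations are known in advance here, and `schedule` is a hypothetical helper.

```python
import heapq

def schedule(chunk_times, n_hosts):
    """Assign each chunk, in order, to whichever host becomes free first.

    Returns (makespan, assignment), where assignment[i] is the host that ran chunk i.
    """
    # Heap of (time the host becomes free, host id); all hosts start idle.
    free_at = [(0.0, host) for host in range(n_hosts)]
    heapq.heapify(free_at)
    assignment = []
    for duration in chunk_times:
        start, host = heapq.heappop(free_at)
        assignment.append(host)
        heapq.heappush(free_at, (start + duration, host))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment

makespan, assignment = schedule([5, 1, 1, 1, 1, 1], n_hosts=2)
print(makespan, assignment)
```

In this example one slow chunk occupies host 0 while host 1 absorbs the five short ones, so the run finishes after 5 time units. A fixed alternating split of the same chunks would leave host 0 with 7 time units of work.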

  19. Simulation-Based Methods • PROC MULTTEST of SAS/STAT (first hard-coded bootstrap?)

  20. Simulation-Based Methods, II • ADJUST=SIMULATE in PROC GLM and PROC MIXED • Posterior simulation in PROC MIXED

  21. Toy Example – Testing Random Number Generators • Random number generators often fail to provide independent numbers. • Test case: U1, U2 are Uniform on (0,1). • If independent, then E{6(U1 - U2)^2} = 1.00. • Check: Generate many pairs, report average (should be 1.000000)
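
The expected value follows from Var(U) = 1/12 for a Uniform(0,1) variable: for independent U1 and U2, E{(U1 - U2)^2} = 1/12 + 1/12 = 1/6, so six times that is 1. The check can be sketched in Python as below. This is not the SAS code from the talk (that slide is not in the transcript); it tests Python's built-in generator rather than SAS's, and the chunking with one seed per chunk is an assumption meant to mirror how the work would be split across hosts.

```python
import random

def chunk_sum(seed, n_pairs):
    """Sum of 6 * (U1 - U2)**2 over n_pairs pairs drawn from one seeded generator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u1 = rng.random()
        u2 = rng.random()
        total += 6.0 * (u1 - u2) ** 2
    return total

def estimate(n_chunks=10, n_pairs=20_000):
    """Average over all chunks; should be close to 1 for a well-behaved generator."""
    total = sum(chunk_sum(seed, n_pairs) for seed in range(n_chunks))
    return total / (n_chunks * n_pairs)

print(f"{estimate():.6f}")
```

Giving every chunk a different seed matters on a grid: if all hosts started from the same seed they would return identical streams, and the combined average would carry no more information than a single chunk.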

  22. Code

  23. Results

  24. Startup (Windows) 1. Start Spawner: C:\Program Files\SAS\SAS 9.1>spawner -i -comamid tcp 2. Activate Spawner: 3. Set batch log in permissions:

  25. The %Distribute Macro • Written by Cheryl Doninger and Randy Tobias • File: http://support.sas.com/rnd/scalability/papers/distribute.zip • Supporting document: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf

  26. Problems We Have Experienced • Random crashes (client as well as hosts) • Diagnosing errors • I/O problems • Windows Service Pack 2 Firewall • Social issues (grid involves people!)

  27. Future Plans • Support from business and government: • grid-enabled bioinformatics • business intelligence/data mining • Support HPC at TTU and in Texas
