
Ongoing Computer Engineering Research Projects at the Lucian Blaga University of Sibiu


Presentation Transcript


  1. Ongoing Computer Engineering Research Projects at the Lucian Blaga University of Sibiu. Prof. Lucian VINTAN, PhD, Director, Advanced Computer Architecture & Processing Systems Research Lab, http://acaps.ulbsibiu.ro/research.php

  2. The Research Team
     • Prof. Lucian VINTAN, PhD, Research Chair
     • Assoc. Prof. Adrian FLOREA, PhD
     • Senior Lecturer Daniel MORARIU, PhD
     • Senior Lecturer Ion MIRONESCU, PhD
     • Lecturer Arpad GELLERT, PhD
     • Radu CRETULESCU, PhD student
     • Horia CALBOREAN, PhD student
     • Ciprian RADU, PhD student

  3. Computing hardware
     • 14 Intel compute nodes (2-processor HS21 blades with quad-core Intel Xeon)
     • 2 Cell compute nodes (2-processor QS22 blades with IBM PowerXCell 8i processors)

  4. Our current research topics
     • Anticipatory Techniques in Advanced Processor Architectures
     • An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations
     • Optimizing Application Mapping Algorithms for NoCs through a Unified Framework
     • Optimal Computer Architecture for CFD calculation
     • Adaptive Meta-classifiers for Text Documents

  5. Anticipatory Techniques in Advanced Processor Architectures. Prof. Lucian VINTAN, PhD; Assoc. Prof. Adrian FLOREA, PhD; Lecturer Arpad GELLERT, PhD

  6. Fetch Bottleneck
     • Fetch rate is limited by the basic-block size (7-8 instructions in SPEC 2000).
     Solutions
     • Trace cache and multiple (M-1) branch predictors;
     • Branch prediction increases ILP by predicting branch directions and targets and speculatively processing multiple basic blocks in parallel;
     • As instruction issue width and pipeline depth grow, accurate branch prediction becomes more essential.
     Some Challenges
     • Identifying and solving some difficult-to-predict branches (unbiased branches);
     • Helping the computer architect to better understand branch predictability, and whether the predictor should be improved for difficult-to-predict branches.

  7. Difficult-to-predict unbiased branches
     • A difficult-to-predict branch, in a certain dynamic context, is:
       • unbiased;
       • "highly shuffled".
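
The idea on slide 7 can be sketched in code. The sketch below assumes that the "dynamic context" is the branch address plus a few bits of global outcome history, and that a branch counts as unbiased in a context when its dominant outcome accounts for less than some threshold of the occurrences. The function names, the history length and the 0.95 threshold are illustrative assumptions, not values taken from the slides.

```python
from collections import defaultdict

def polarization_by_context(trace, history_bits=4):
    """Return {(pc, history): share of the dominant outcome} for a branch trace.

    trace is an iterable of (pc, taken) pairs; history is the last
    history_bits global outcomes packed into an integer (assumed context).
    """
    counts = defaultdict(lambda: [0, 0])      # [not taken, taken] per context
    history = 0
    mask = (1 << history_bits) - 1
    for pc, taken in trace:
        counts[(pc, history)][int(taken)] += 1
        history = ((history << 1) | int(taken)) & mask
    return {ctx: max(c) / sum(c) for ctx, c in counts.items()}

def unbiased_contexts(trace, history_bits=4, threshold=0.95):
    """Contexts whose dominant outcome share is below threshold (assumed value)."""
    shares = polarization_by_context(trace, history_bits)
    return {ctx for ctx, share in shares.items() if share < threshold}
```

With no history bits, a strictly alternating branch looks unbiased (share 0.5), which illustrates why the choice of context matters: a longer history can make the same branch fully biased.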

  8. Predicting Unbiased Branches
     • State-of-the-art branch predictors are unable to accurately predict unbiased branches.
     The problem:
     • Finding new relevant information that could reduce their entropy, instead of developing new predictors.
     Challenge:
     • Adequately representing unbiased branches in the feature space!
     • Accurately predicting unbiased branches is still an open problem!

  9. Random Degree Metrics
     Based on:
     • Hidden Markov Model (HMM): a strong method to evaluate the predictability of the sequences generated by unbiased branches;
     • Discrete entropy of the sequences generated by unbiased branches;
     • Compression rate (Gzip, Huffman) of the sequences generated by unbiased branches.
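
Two of the metrics on slide 9 are easy to sketch for a sequence of branch outcomes. The code below is only an illustration: it computes the zero-order Shannon entropy of the outcomes and uses Python's `zlib` (DEFLATE, the algorithm Gzip is built on) as a stand-in for the compressors named on the slide. Storing one byte per outcome is an assumption made for simplicity.

```python
import math
import zlib
from collections import Counter

def shannon_entropy(outcomes):
    """Zero-order entropy, in bits per outcome, of a sequence of 0/1 values."""
    n = len(outcomes)
    counts = Counter(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compression_ratio(outcomes):
    """Compressed size divided by raw size (one byte per outcome)."""
    raw = bytes(outcomes)
    return len(zlib.compress(raw, 9)) / len(raw)
```

An alternating sequence has the maximum zero-order entropy of 1 bit per outcome yet compresses very well, so the two metrics capture different aspects of randomness: the entropy ignores ordering, while the compressor exploits it.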

  10. Issue Bottleneck (Data-flow)
      Conventional processing models are limited in their processing speed by the dynamic program’s critical path (Amdahl).
      2 Solutions
      • Dynamic Instruction Reuse (DIR) is a non-speculative technique.
      • Value Prediction (VP) is a speculative technique.
      Common issue
      • Value locality
      Challenges
      • Selective Instruction Reuse (MUL & DIV)
      • Selective Load Value Prediction (“Critical Loads”)
      • Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar / Simultaneous Multithreaded (SMT) Architecture to anticipate the results of long-latency instructions
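
The difference between the two techniques on slide 10 can be sketched as follows. A reuse buffer returns a stored result only when the same instruction is seen again with the same operand values, so it never needs recovery; a value predictor guesses the result before the operands are known and must be checked later. The table organisation, class names and confidence counter below are illustrative assumptions and do not describe the structures used in the authors' simulator.

```python
class ReuseBuffer:
    """Non-speculative: a hit requires matching PC and operand values."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = {}                     # index -> (pc, src1, src2, result)

    def lookup(self, pc, src1, src2):
        entry = self.table.get(pc % self.entries)
        if entry is not None and entry[:3] == (pc, src1, src2):
            return entry[3]                 # safe to reuse, no verification needed
        return None

    def update(self, pc, src1, src2, result):
        self.table[pc % self.entries] = (pc, src1, src2, result)


class LastValuePredictor:
    """Speculative: predicts the last seen value once confidence is high enough."""

    def __init__(self, entries=1024, threshold=2, max_conf=3):
        self.entries = entries
        self.threshold = threshold
        self.max_conf = max_conf
        self.table = {}                     # index -> [pc, last_value, confidence]

    def predict(self, pc):
        entry = self.table.get(pc % self.entries)
        if entry is not None and entry[0] == pc and entry[2] >= self.threshold:
            return entry[1]                 # must be verified when the load completes
        return None

    def train(self, pc, actual):
        index = pc % self.entries
        entry = self.table.get(index)
        if entry is not None and entry[0] == pc and entry[1] == actual:
            entry[2] = min(entry[2] + 1, self.max_conf)
        else:
            self.table[index] = [pc, actual, 0]
```

Both structures rely on value locality: they only pay off when an instruction tends to produce the same result repeatedly.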

  11. Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture
      • Selective Instruction Reuse (MUL & DIV)
      • Selective Load Value Prediction (Critical Loads)

  12. Selective Instruction Reuse and Value Prediction in Simultaneous Multithreaded Architectures
      [Figure: SMT architecture (M-Sim) enhanced with per-thread RB and LVPT structures. Blocks shown: PC, Fetch Unit, I-Cache, Branch Predictor, Decode, Rename Table, Issue Queue, Physical Register File, Functional Units, ROB, LSQ, D-Cache, RB, LVPT.]

  13. Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture: The M-SIM Simulator
      [Figure: simulator block diagram. Blocks shown: SPEC Benchmark, Hardware Configuration, Cycle-Level Performance Simulator, Hardware Access Counts, Power Models, Performance Estimation, Power Estimation.]

  14. Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture
      [Figure: relative IPC speedup and relative energy-delay product gain with a Reuse Buffer of 1024 entries, the Trivial Operation Detector, and the Load Value Predictor.]

  15. Conclusions and Further Work
      • Indexing the SLVP table with the memory address instead of the instruction address (PC);
      • Exploiting an N-value locality instead of 1-value locality;
      • Generating the thermal maps for the optimal superscalar and SMT configurations (and, if necessary, developing a run-time thermal manager);
      • Understanding and exploiting instruction reuse and value prediction benefits in a multicore architecture.

  16. Anticipatory multicore architectures
      • Anticipatory multicores would significantly reduce the pressure on the interconnection network's performance and energy;
      • There are subtle, not well-understood relationships between value prediction, multithreading and the cache coherence/consistency mechanisms;
      • Data consistency errors: consistency violation detection and recovery;
      • The cause of inconsistency: VP might execute some dependent instructions out of order;
      • Dynamic Instruction Reuse in a multicore system: Reuse Buffer coherence problems, cache coherence mechanisms;
      • Details at http://webspace.ulbsibiu.ro/lucian.vintan/html/#11

  17. An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations. Horia CALBOREAN, PhD student; Prof. Lucian VINTAN, PhD

  18. Multiobjective optimization
      • The number of (heterogeneous) cores in the processor keeps growing, so systems become more and more complex
      • More configurations have to be simulated (NP-hard problem)
      • The time needed to simulate all configurations is prohibitive
      • Performance evaluation has become a multiobjective evaluation
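
In a multiobjective evaluation there is usually no single best configuration, only a set of non-dominated trade-offs (the Pareto front). A minimal sketch, assuming every objective is to be minimised and each configuration is summarised as a tuple of objective values (for example, execution time and energy):

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For instance, among the hypothetical results `(1.0, 5.0)`, `(2.0, 3.0)`, `(3.0, 4.0)` and `(4.0, 1.0)`, only `(3.0, 4.0)` is dropped, because `(2.0, 3.0)` beats it on both objectives.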

  19. Solutions
      • Reducing simulation time
        • parallel & distributed simulation
        • sampling simulation
      • Reducing the number of simulations
        • intelligent multiobjective algorithms

  20. Proposed framework
      • We developed FADSE (Framework for Automatic Design Space Exploration)
      • Compatible with most existing simulators
      • Portable: implemented in Java
      • Includes many well-known multiobjective algorithms
      • Able to run simulators as well as well-known test problems

  21. Existing tools
      • Bound to a certain simulator (Magellan)
      • Lack portability: bound to a certain operating system (M3Explorer, Magellan)
      • Perform design space exploration of only small parts of the system (only the cache: Archexplorer)

  22. FADSE: application architecture

  23. Features
      • Parallel simulation (client-server model)
      • Ability to introduce constraints through an XML interface
      • Easily configurable through XML files:
        • change the DSE algorithm,
        • specify input parameters and their possible values,
        • specify desired output metrics, etc.

  24. Our target
      • Perform an evaluation of the existing algorithms on different simulators
      • Find out which one performs best
      • Improve the algorithms: map them onto the specific problem of design space exploration

  25. Conclusions
      • We have developed a framework that is able to perform automatic design space exploration
      • Extensible, portable
      • Many implemented multiobjective algorithms (through the use of jMetal)
      • Reduces time through parallel & distributed execution of simulators

  26. Optimizing Application Mapping Algorithms for NoCs through a Unified Framework. Ciprian RADU, PhD student; Prof. Lucian VINTAN, PhD

  27. Outline
      • Introduction
        • The application mapping problem for NoCs
        • The relation between application mapping and routing
      • Evaluating application mapping algorithms for Networks-on-Chip
        • The framework design
        • The ns-3 NoC simulator
      • Automatic Design Space Exploration for Networks-on-Chip
        • The framework

  28. The application mapping problem for NoCs

  29. Application mapping & routing

  30. Evaluating application mapping algorithms for Networks-on-Chip
      • Existing application mapping algorithms are currently evaluated on specific NoCs
        • e.g., NoCs with 2D mesh topology
      • Existing comparisons between the algorithms are not made on the same NoC architecture
      • We propose a unified framework for the evaluation and optimization of application mapping algorithms on different NoC designs
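
To compare mapping algorithms, each mapping needs a score. The slides do not state which objective the framework uses; the sketch below assumes a commonly used one for 2D meshes, the sum over all communicating core pairs of traffic volume times hop (Manhattan) distance. The function names and the row-major tile numbering are illustrative assumptions.

```python
def hop_distance(tile_a, tile_b, width):
    """Manhattan distance between two tiles numbered row-major on a 2D mesh."""
    row_a, col_a = divmod(tile_a, width)
    row_b, col_b = divmod(tile_b, width)
    return abs(row_a - row_b) + abs(col_a - col_b)

def mapping_cost(comm, mapping, width):
    """Sum of volume * hops over all (source core, destination core) pairs.

    comm maps (src_core, dst_core) -> traffic volume;
    mapping maps core -> tile index.
    """
    return sum(volume * hop_distance(mapping[src], mapping[dst], width)
               for (src, dst), volume in comm.items())
```

On a 2x2 mesh with traffic `{("a", "b"): 10, ("b", "c"): 5}`, placing the heavily communicating cores `a` and `b` on adjacent tiles costs 15, while placing them diagonally costs 25, so the first mapping is preferred under this assumed metric.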

  31. The framework design
      • 3 major components:
        • A module that contains the implementation of different application mapping algorithms;
        • A network traffic generator;
        • A Network-on-Chip simulator.

  32. The framework design flow

  33. The ns-3 NoC simulator
      • Based on ns-3, an event-driven simulator for Internet systems
      • Aims for a good accuracy/speed trade-off
      • Flexible and scalable
      • Current parameters:
        • Packet size, packet injection rate, packet injection probability;
        • Buffer size;
        • Network size;
        • Switching mechanism (SAF, VCT, Wormhole);
        • Routing protocol (XY, YX, SLB, SO);
        • Network topology (2D mesh, Irvine mesh);
        • Traffic patterns (bit-complement, bit-reverse, matrix transpose, uniform random).
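
Some of the parameters listed on slide 33 have simple textbook definitions, sketched below. This is not code from the ns-3 NoC simulator: it shows dimension-ordered XY routing on a 2D mesh and the usual definitions of three synthetic traffic patterns, where a source node index of `bits` bits is turned into a destination index. All function names are illustrative.

```python
def xy_route(src, dst):
    """Nodes visited from src to dst, moving along X first and then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def bit_complement(node, bits):
    """Destination is the bitwise complement of the source index."""
    return ~node & ((1 << bits) - 1)

def bit_reverse(node, bits):
    """Destination is the source index with its bits in reverse order."""
    return int(format(node, f"0{bits}b")[::-1], 2)

def matrix_transpose(node, bits):
    """Destination swaps the upper and lower halves of the index (bits is even)."""
    half = bits // 2
    low = node & ((1 << half) - 1)
    high = node >> half
    return (low << half) | high
```

YX routing is the same walk with the two loops swapped; because each packet finishes one dimension before starting the other, both variants are deterministic.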

  34. Automatic Design Space Exploration for Networks-on-Chip
      • Motivation
        • There is no NoC suitable for all kinds of workload
        • There is an exponential number of possible NoC architectures
        • Exhaustive DSE is no longer suitable
      • Automatic DSE uses a heuristic-driven exploration of the design space
        • Disadvantage: near-optimal solutions
        • Advantage: speed

  35. The framework
      [Figure: the Design Space Exploration module configures the Network-on-Chip simulator; the simulation results are fed back to the DSE module.]
      • Components:
        • DSE module
        • NoC simulator
      • The DSE module determines the parameters of the NoC architecture
        • Uses algorithms from Artificial Intelligence
      • The NoC simulator (ns-3 NoC) is automatically configured to simulate the network architecture determined by the DSE module
      • The simulation results (network performance) help the DSE module generate a better NoC architecture
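
The feedback loop on slide 35 can be sketched as a simple search in which a candidate configuration is proposed, simulated, and kept if it improves the result. The slides only say that algorithms from Artificial Intelligence are used; the hill climber below is a deliberately simple stand-in, and `simulate` is a hypothetical callback that would wrap a real simulator run and return a cost to minimise.

```python
import random

def explore(space, simulate, iterations=200, seed=0):
    """Heuristic search over a parameter space given as {name: [allowed values]}.

    simulate(config) is a hypothetical callback returning a cost to minimise.
    Returns the best configuration found and its cost.
    """
    rng = random.Random(seed)
    current = {name: rng.choice(values) for name, values in space.items()}
    best_cost = simulate(current)
    for _ in range(iterations):
        candidate = dict(current)
        name = rng.choice(sorted(space))          # mutate one parameter
        candidate[name] = rng.choice(space[name])
        cost = simulate(candidate)                # one simulator run
        if cost < best_cost:                      # feedback steers the search
            current, best_cost = candidate, cost
    return current, best_cost
```

The number of simulator runs is fixed by `iterations` rather than by the size of the design space, which is the speed advantage of heuristic exploration; the price is that the result is only near-optimal in general.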

  36. Optimal computer architecture for CFD calculation. Senior Lecturer Ion Dan MIRONESCU, PhD; Prof. Lucian VINTAN, PhD

  37. Practical application (HPC/CFD)
      • Modelling and simulation of multiscale, multicomponent, multiphase flow in complex geometry (ongoing projects) for:
        • optimisation of sugar crystallisation
        • prediction of the flow properties of polymer-based disperse systems (starch and starch fractions, microbial polysaccharides)

  38. Goals
      • Speed-up of this application on the given architecture
      • Finding the optimal manycore architecture for the CFD application (e.g. NoC)

  39. Method: Lattice Boltzmann (Chirila, 2010)

  40. Method advantages
      • easy discretization of complex geometry
      • easy incorporation of “multi” models
      • easy parallelisation
      • easy coupling to other scale models (Molecular Dynamics)

  41. Computational model
      [Figure: a grid of COMPUTE blocks, each holding local values; neighbouring blocks EXCHANGE ghost data.]
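
The compute/exchange pattern on slide 41 can be sketched in one dimension. Each block stores its local values plus one ghost cell at each end; before every compute step the ghost cells are refreshed from the neighbouring blocks. The three-point average used as the update is only a placeholder for the real Lattice Boltzmann collision and streaming steps, and the periodic boundary is an assumption.

```python
def exchange_ghosts(blocks):
    """Copy each block's edge values into its neighbours' ghost cells (periodic).

    Every block is a list laid out as [ghost_left, local values..., ghost_right].
    """
    n = len(blocks)
    for i, block in enumerate(blocks):
        block[0] = blocks[(i - 1) % n][-2]    # last local value of left neighbour
        block[-1] = blocks[(i + 1) % n][1]    # first local value of right neighbour

def compute(block):
    """Placeholder local update: three-point average over the local values."""
    block[1:-1] = [(block[j - 1] + block[j] + block[j + 1]) / 3
                   for j in range(1, len(block) - 1)]

def step(blocks):
    """One time step: exchange ghost data, then update every block independently."""
    exchange_ghosts(blocks)
    for block in blocks:
        compute(block)
```

After the exchange, each `compute` call touches only its own block, so the blocks can be updated in parallel; the exchange is the only point where blocks communicate.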

  42. General-purpose manycore platform
      What can be used and what must be accounted for:
      • ILP (superscalar, out-of-order, branch prediction)
      • Task- and thread-level parallelism (multicore/multiprocessor)
      • Mixed programming model (shared memory on a blade, message passing between blades)
      • Cache system

  43. Special-purpose manycore platform
      What can be used and what must be accounted for:
      • SIMD
      • Task- and thread-level parallelism (hardware multithreading, multicore/multiprocessor)
      • Message passing
      • Local store model: full user control

  44. Charm++
      • provides a high-level abstraction of a parallel program
      • cooperating message-driven objects called chares
      • support for load balancing, fault tolerance, automatic checkpointing
      • support for all architectures through a specific low-level tier
      • NAMD (molecular dynamics) is implemented in Charm++

  45. Charm++ LB implementation

  46. Charm++ LB implementation

  47. DSE
      Search for optimal values of:
      • sites/block
      • blocks (chares)/core, /thread, /blade
      • communication patterns

  48. Adaptive Meta-classifiers for Text Documents. Prof. Lucian VINTAN, PhD; Daniel MORARIU, PhD; Radu CRETULESCU, PhD student

  49. Introduction
      • We investigated a way to create a new adaptive meta-classifier for classifying text documents in order to increase the classification accuracy.
      • During the first processing phase (pre-classification) the meta-classifier uses a non-adaptive selector.
      • In the second phase (classification) we use a feed-forward neural network based on the back-propagation learning method.

  50. The architecture of the adaptive meta-classifier M-BP
