dataviseai
Uploaded by
25 SLIDES
14 VUES
0LIKES

LLMs and SQL Agents in Action

DESCRIPTION

This presentation explores:<br><br>u2022 An overview of SQL agents and challenges related to LLMs<br>u2022 Practical applications of SQL agents<br>u2022 Live demos and an implementation roadmap<br><br>Join our webinars to learn more!

1 / 25

Télécharger la présentation

LLMs and SQL Agents in Action

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. From Questions to Queries: LLMs and SQL Agents in Action Speaker: Ihor Protsenko AI Expert at Datavise Moderator: Polina Poliakova CMO at Datavise Hosted by Question Answer

  2. About Datavise We offer end-to-end AI & Data services, from consulting and development to deployment and ongoing support Our Mission Ethics To be the trusted partner in AI adoption, guiding businesses through ethical and impactful AI integration for long-term success Collaboration Security

  3. Introducing Ihor Protsenko Researcher at the MLO Lab at EPFL Data Science & AI Expert at Datavise ex-Grammarly Current research: LLMs and their applications for automated analytics

  4. What to Expect Today Overview of SQL agents and LLM related challenges Practical Implementations Live demos and Q&A

  5. SQL Agents: The Future of Data Interaction Bridging Natural Language and Database Operations Smart Data Interface Contextual Understanding Interactive Analysis Transforms natural language into precise SQL queries Comprehends data relationships and business context Enables dialogue-based data exploration and insights

  6. Practical Applications of SQL Agents Healthcare & Life Sciences Financial & Risk Analysis Legal Analytics - Instant patient cohort analysis - Trial outcome pattern identification - Real-time care pattern monitoring - Statistical significance reporting - Market trend analysis - Semantic grouping of instruments - Natural language market insights - Multi-jurisdiction precedent analysis - Outcome trend identification - Settlement value prediction

  7. Beyond Traditional Analysis SQL Agents VS Human Analysts Semantic Understanding Pattern Recognition Automated Insights Real-Time Processing Cross-Domain Integration - Domain-specific terminology - Multi-dimensional analysis - Data narratives - Continuous monitoring - Multi-domain insights - Intelligent grouping - Temporal patterns - Hypothesis generation - Instant updates - Pattern correlation - Ambiguity resolution - Correlation detection - Statistical significance - Alert generation - Interdisciplinary analysis

  8. The Limitations of Raw LLMs Why We Need Hybrid Approaches Core Limitations GPT-4 MATH Dataset Performance Hybrid Approach Benefits Neural Networks Exact numerical computations Provide intuitive understanding and pattern recognition Raw LLM Accuracy With Code Execution Complex aggregations 53.9% 84.3% Statistical calculations Symbolic Components Enable rigorous proof verification and precise computations Logical operations

  9. SQL as a Symbolic Component Bridging Human Intent and Computational Precision Mathematical Precision Execution Guarantees Optimization Capabilities Enhanced Interpretability Exact numerical computations and Deterministic operations with robust Automatic query optimization and Transparent analytical process and strict decimal precision type checking resource management error diagnosis

  10. Powerful Synergy SQL Provide precise, verifiable computation Result Flexible interface with LLMs mathematical reliability Interpret human intent and handle natural language

  11. Natural Language Understanding Challenges Term Ambiguity Entity Reference "List patients with RA" "Find trials using Advil" ? Rheumatoid Arthriti? ? Refractory Anemia ? Right Atriu? ? Respiratory Acidosis ? Advil (brand name? ? Alternative brands (Motrin, Brufen) ? Ibuprofen (generic? ? Different formulations (oral, topical) Temporal Ambiguity Context-Dependent Time "Show recent transactions" "Get Q4 performance" ? Last 24 hours? ? This month? ? This week? ? Last N transactions? ? Fiscal vs Calendar Q? ? Local vs Global timezone ? Which year? ? Including current period?

  12. Database Complexity Challenges Patient Demographics Lab Orders Medication Administration pt_id ord_id med_cd VARCHAR(10) NUMBER VARCHAR(8) Non-standard ID format Auto-increment Legacy coding dob_dt res_val dos_amt DATE VARCHAR(20) NUMBER(8,2) MM/DD/YYYY Mixed units Various units adm_ts ord_sts adm_dt TIMESTAMP CHAR(1) DATE ISO format Status codes Different timezone

  13. Database Complexity Challenges Scale Challenges Schema Understanding Data Quality ? 500+ interconnected table? ? Non-intuitive namin? ? Inconsistent format? ? Decades of historical dat? ? Complex relationship? ? NULL value semantic? ? Real-time data streams ? Time-dependent associations ? Unit conversions

  14. Query Generation Complexity User Request Generated SQL SELECT 
 p.patient_id,
 p.age,
 bp.reading_value as blood_pressure,
 bp.reading_date,
 m.medication_name,
 m.dosage,
 l.test_name,
 l.result_value,
 l.reference_range
 FROM patients p
 INNER JOIN diagnoses d 
 ON p.patient_id = d.patient_id
 AND d.diagnosis_code IN ('E11.9', 'E11.8') -- Diabetes codes
 INNER JOIN vital_signs bp 
 ON p.patient_id = bp.patient_id
 AND bp.type = 'BP'
 AND bp.reading_date >= DATEADD(MONTH, -3, GETDATE())
 AND bp.systolic > 140
 LEFT JOIN medications m
 ON p.patient_id = m.patient_id
 AND m.active_flag = 'Y'
 LEFT JOIN lab_results l
 ON p.patient_id = l.patient_id
 AND l.result_date >= DATEADD(MONTH, -3, GETDATE())
 WHERE 
 p.age >= 65
 AND bp.reading_value IS NOT NULL
 GROUP BY 
 p.patient_id
 HAVING 
 COUNT(DISTINCT bp.reading_date) >= 2 “Show me all elderly diabetic patients who had elevated blood pressure readings in the last quarter, along with their current medications and recent lab results”

  15. Database Complexity Challenges Structural Complexity Performance Impact Correctness Guarantees ? 5+ table joins require? ? Large patient history table? ? Medical code validatio? ? Temporal conditions across table? ? Multiple date range scan? ? Time range integrit? ? Mixed aggregation requirement? ? Aggregation over wide time span? ? NULL handling for labs/med? ? Complex medical code filtering ? Memory-intensive operations ? Unit consistency checks Future Direction This complexity suggests the need for strategic denormalization - trading some data redundancy for simpler queries and better performance.

  16. Improving Natural Language Understanding Bridging Human Intent and Computational Precision Using embeddings for similarity grouping Unified medical terminology Raw Input Standardized Raw Input Standardized Advil, Nurofen, Brufen Ibuprofen (parent category) BP, blood pressure, pressure blood_pressure (standard term)

  17. Improving Natural Language Understanding Entity-Aware Search Strategy Taxonomy Building Hierarchical organization of related terms Medications → NSAIDs → Ibuprofen → (Brand Names) Index Enhancement Pre-computed relationships and variations RA → {Rheumatoid Arthritis, Right Atrium, Related Terms}

  18. Simplifying Database Complexity Denormalization Example Before: Complex Joins After: Simplified Structure Single Denormalized Table: Multiple Tables: ? patient_health_summary ? patient_demographic? ? vital_sign? ? medication? ? lab_results SELECT 
 patient_id,
 blood_pressure_latest,
 current_medications,
 recent_lab_results
 FROM patient_health_summary
 WHERE has_blood_pressure = true SELECT p.patient_id, v.bp_reading, m.med_name, l.result
 FROM patient_demographics p
 JOIN vital_signs v ON p.id = v.patient_id
 JOIN medications m ON p.id = m.patient_id
 JOIN lab_results l ON p.id = l.patient_id
 WHERE v.measure_type = 'BP'

  19. Simplifying Database Complexity Practical Solutions for Better LLM Integration Schema RAG Implementation Descriptive Naming Convention Strategic Denormalization Embed table and column descriptions Use full words instead of abbreviations Pre-compute common joins Create semantic search for schema elements Include unit information in column names Create purpose-specific flat tables Generate context-aware schema suggestions Add temporal context in names Maintain data freshness timestamps Dynamic schema documentation retrieval Maintain consistent naming patterns Implement efficient update strategies

  20. Entity-Aware Query Generation User Query Processing ?? Entity Extraction Identify key entities from user query “Find all patients who took Advil for chronic pain in last quarter” entities = extract_entities(llm, question)
 entities = entities.split('$') Retrieved Entity Variations ?? Entity Enrichment Retrieve related terms from vector DB medication: proper_nouns = [retriever.get_relevant_documents(entity) 
 for entity in entities] Advil,Ibuprofen,Motrin,Nurofen ?? Query Generation condition: Create comprehensive query covering all variations chronic pain, persistent pain, long-term pain SELECT patient_id, medication_name, administration_date FROM medication_administrations WHERE medication_name IN ('Advil', 'Ibuprofen', 'Motrin') AND indication LIKE '%chronic pain%' AND administration_date >= DATEADD(MONTH, -3, GETDATE()) timeframe: last quarter

  21. Simplifying Complex Queries Structural Solutions for Query Patterns Complex Pattern Simplified Approach Multiple joins and aggregations Using pre-aggregated view -- Complex nested query with multiple joins
 SELECT 
 department,
 AVG(monthly_sales) as avg_sales,
 SUM(transaction_count) as total_transactions
 FROM (
 SELECT 
 d.name as department,
 DATE_TRUNC('month', t.sale_date) as month,
 COUNT(*) as transaction_count,
 SUM(t.amount) as monthly_sales
 FROM transactions t
 JOIN products p ON t.product_id = p.id
 JOIN departments d ON p.dept_id = d.id
 GROUP BY d.name, DATE_TRUNC('month', t.sale_date)
 ) subquery
 GROUP BY department -- Simple query using pre-aggregated view
 SELECT 
 department,
 avg_sales,
 total_transactions
 FROM department_monthly_stats
 WHERE month = CURRENT_MONTH

  22. Simplifying Complex Queries Practical Solutions for Better LLM Integration Pre-Aggregation Strategy Optimized Data Structures Query Templates Monthly sales summaries Denormalized tables Standardized filters Daily performance metrics Materialized views Common joins Rolling averages Summary tables Reusable aggregations Categorized totals Time-based partitions Parameter placeholders

  23. SQL Agent Success Recipes Key Ingredients for Effective Implementation Entity Intelligence Smart Data Structure Query Strategy Robust entity handling and standardizatio? Optimized for LLM interactio? Decomposed and templated approache? ? Comprehensive entity taxonom? ? Vector-based similarity matchin? ? Domain-specific terminology mappin? ? Automated entity expansion ? Strategic denormalizatio? ? Pre-aggregated view? ? Simplified join path? ? Performance-optimized schemas ? Progressive filtering step? ? Reusable query pattern? ? Intermediate result cachin? ? Modular query building

  24. SQL Agent: Implementation Roadmap Start with Entities Optimize Structure Template Patterns Iterate & Improve

  25. Future Directions Technical Evolution of SQL Agents The Path Forward Self-Improving Systems Immediate Focus ? Learning from query pattern? ? Automated schema optimizatio? ? Self-adjusting entity recognition ? Strengthen entity recognitio? ? Refine query strategie? ? Develop feedback loops Long-term Vision ? Autonomous optimizatio? ? Self-improving pattern? ? Predictive performance tuning Advanced Optimization ? Dynamic view materializatio? ? Adaptive query decompositio? ? Intelligent cache management

More Related