
ESPDS NDE Build 4 and 5 Design Description Build Content Review (BCR)


  1. ESPDS NDE Build 4 and 5 Design Description Build Content Review (BCR)

    NDA-SE13-DOC-1.0, December 13, 2011. Meet Me: 1-866-762-5577, pc 173812#. WebEx: https://mmancusa.webex.com/mmancusa/j.php?ED=175042952&UID=493176402&PW=NNmY4ZjMwNjJk&RT=MiMxMQ%3D%3D
  2. Introductions. Government Team: Geof Goodrum, NDE Architect; Tom Schott, Polar Products Manager. Contractor Team: Dan Beall, Program Manager; Richard Sikorski, Solers Project Manager; Peter MacHarrie, Chief Architect; Lou Fenichel, Software Development Lead.
  3. Purpose of this BCR. Why are we here? To obtain approval to create the Customer Services and Monitoring and Control Builds that complete NDE’s Data Handling System (DHS). What are we going to see? Functional descriptions, graphical user interface (GUI) mockups, operational scenarios, and programmatic details.
  4. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  5. Background. BCR focus. Who uses NDE? What do Builds 4/5 add? What does it look like?
  6. Background. Initial NPP launch support: Build 1, Ingest; Build 2, Product Generation; Build 3, Distribution; supporting software/configurations. Builds 4 and 5: deliver concurrently; contain the Customer Services and Monitoring and Control capabilities plus the Build 1, 2, and 3 GUIs; complete the NDE DHS.
  7. Background. Paradigm shift: NDE is a FRAMEWORK for integrating multiple product systems into a single enterprise. Major activities: Product System integration into NDE (NUCAPS, MiRS, etc.); configurable Ingest data providers (NPP, DDS, JPSS, etc.); customer registration and configurable distribution across multiple product systems; operational monitoring and control across multiple product systems.
  8. BCR Focus. What does NDE do for the customer? It gets customers the products they want in a highly reliable fashion. What does NDE do for OSPO? It makes providing this data from NPP and JPSS possible, efficient, accountable, and reliable.
  9. Who Uses NDE?
  10. Who uses NDE? [Diagram: NDE users Sys Admins, Help Desk, NDE TALs, PALs, and Customers; the key distinguishes OSPO staff, person-to-person interactions, and person-to-machine interactions.]
  11. What Does Build 4/5 Add? A portal for customers (Customer Services); an internal portal for TALs, PALs, and the Help Desk (Monitor & Control); a data warehouse for statistics and reporting; metrics gathering and GUIs for Builds 1, 2, and 3.
  12. What Does It Look Like? Internal users only (i.e., not customers). VPN access via Firefox (if outside of NSOF). Will be tested on IE8+ and Firefox 3.6+. [GUI mockup: NDE_OPS "NDE Data Handling System" banner with user menu (Annie User | Help | Logout).]
  13. Menu Contents: System, Product Management, Ingest, Product Generation, Distribution, Reports.
  14. Generic Status Summary Table [GUI mockup: generic table with columns sortable A to Z or Z to A, check-mark selectable rows, and paging (page 1 of 4).]
  15. Generic Status Summary Table [GUI mockup: the same generic table with a filter field under each column ("Enter filter text…", "Enter filter value…") and paging (page 1 of 4).]
  16. Sample List and View Details [GUI mockup: Sample Details Viewer page. Selected fields are editable; Save stores the edited values and Reset restores the original values.]
  17. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  18. Customer Services: Register Customer, Distribute, Inform.
  19. Register Customer. Customer registration follows the ESPC CCB and CCR process for approving customer access. After access is approved, the Technical Area Lead (TAL) configures a Customer Host for FTPS push, if needed, and enters the Customer Profile and Points of Contact. The customer is notified of activation and is required to change the password.
  20. Register Customer [GUI mockup: Customer Registration Request form. Basic Information: Customer ID (nco_ops); Password and Confirm Password (masked); Trusted (checked; Trusted Users are not affected by Data Denial); Approval Threshold (100 GBs/Day). Point of Contact List: Daytime Operator nco_operator@noaa.gov and Nighttime Operator nco_operator_night@noaa.gov (each removable). New-contact fields: Role (Lead Operator), Email (nco_lead_ops@noaa.gov), Alt Email, Organization (NCO), Phone ((301) 555-5555), Fax ((301) 555-5556), Primary POC. Buttons: Add to List, Reset, Submit, Cancel.]
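A minimal sketch of how the Trusted flag on the registration form might gate delivery while Data Denial is on. The slide states only that trusted users are not affected by Data Denial; the assumption here is that Data Denial suspends delivery to every non-trusted customer. The class and function names, and the second customer ID, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    trusted: bool  # the "Trusted" checkbox on the registration form


def eligible_for_delivery(customer: Customer, data_denial_on: bool) -> bool:
    """Return True if distribution to this customer may proceed.

    Assumption: while Data Denial is toggled on, only trusted customers
    keep receiving data; with it off, every registered customer does.
    """
    if not data_denial_on:
        return True
    return customer.trusted


customers = [Customer("nco_ops", trusted=True), Customer("example_cust", trusted=False)]
print([c.customer_id for c in customers if eligible_for_delivery(c, data_denial_on=True)])
# ['nco_ops']
```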
  21. Register Customer [GUI mockup: Customer Profiles table with Reset, Save, and Add buttons.]
  22. Distribute: Standing Subscription Requests. The customer requests data from the PAL/Customer Liaison, and the functional characteristics of the data flow are defined (products, data volume, latency and timeliness targets). The DDAM (Data Distribution Accounts Manager) approves the customer request, initiates the ESPC approval process, defines the interface (SLA, external data host, firewall rules, username/authentication), defines the subscription (protocol, push/pull, notification, etc.), and verifies testing in NDE POP mode. ESPC approves promotion to OPS. An authorized user (NOT the customer) enters the data.
  23. Distribute: Ad Hoc Subscription Requests. The customer must be registered. The customer and Customer Liaison work out the subscription details; the subscription includes an end date. ESPC approval is not required: requests are preapproved up to a preset threshold. An authorized user (NOT the customer) enters the data. Retroactive delivery is supported: a subscription can be backdated, and those products still available will be sent.
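A minimal sketch of two ad hoc rules from this slide: the preapproval threshold and retroactive (backdated) delivery. It assumes the threshold is a daily volume, like the 100 GBs/Day Approval Threshold on the registration form, and that "still available" means granules not yet purged from the system. `Granule`, `needs_espc_approval`, and `backfill` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Granule:
    product: str
    obs_time: datetime


def needs_espc_approval(estimated_gb_per_day: float, threshold_gb_per_day: float) -> bool:
    """Ad hoc requests are preapproved up to a preset threshold (assumed: daily volume)."""
    return estimated_gb_per_day > threshold_gb_per_day


def backfill(held, product, start, end, now):
    """Pick granules to send for a backdated subscription.

    'held' stands for the granules still available on the system (assumed to be
    whatever retention has not yet purged); only those observed between the
    subscription start and the earlier of its end date and now are sent.
    """
    stop = min(end, now)
    return [g for g in held if g.product == product and start <= g.obs_time <= stop]


now = datetime(2011, 12, 13, 12, 0)
held = [Granule("ATMS_BUFR", now - timedelta(hours=h)) for h in (1, 5, 30)]
sent = backfill(held, "ATMS_BUFR", start=now - timedelta(hours=6),
                end=now + timedelta(days=7), now=now)
print(len(sent))                      # 2
print(needs_espc_approval(1.5, 100))  # False
```

Only the granules observed 1 and 5 hours ago fall inside the backdated window; the 30-hour-old granule predates the subscription start.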
  24. Add Customer Push Host [GUI mockup: Add Customer Push Host form (Basic Information). Host name: nde1.star.nesdis.noaa.gov; Address: 140.90.47.10; Access Type: FTP; Description: STAR Test Machine; Login ID: st@rU$er; Password and Confirm Password (masked); Max Connections: 5. Buttons: Reset, Submit, Cancel.]
  25. Distribute – Customer Data GUIs [GUI mockup: Customer Push Hosts table with Reset, Save, and Add buttons.]
  26. Distribute – Customer Data GUIs [GUI mockup: Customer Subscription Request form. Basic Information fields: Customer, Product, Type, Primary Host, Primary Dir, Secondary Host, Secondary Dir, Start date time, End date time, Priority, Size, Compression, Checksum. Additional Parameters: Obs Latency Target, Sys Latency Target, Do Not Deliver (minutes after observation), Spatial Qualifier, Other filters, Estimated files/day, Estimated xfer/day, Estimated MB/day, and Notifications (Type, Frequency, Address, Delivery options). Sample values: customer nco_ops; product ATMS_BUFR; type PUSH; hosts primary.host.external.cust and secondary.host.external.cust; directories /tmp/primary_pushdir and /tmp/secondary_pushdir; start 28-OCT-11 05:48:00.00000 AM, no end date; High and Medium for priority/size; zip compression; crc32 checksum; observation and system latency targets of 120 and 30 minutes; do not deliver 480 minutes after observation; spatial qualifier (50.7834,-94.5339); (21.0307,-63.948); filter fqs:N_Percent_Missing_Data:range:0,50;; email notification every 50 files to Annie.user@noaa.gov; estimates 1200, 1200, and 1500. Buttons: Reset, Save, Cancel.]
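A sketch of how the latency settings on this form could be applied to a pending delivery. The slide does not define the two latencies; this assumes observation latency is measured from observation time and system latency from the time NDE ingested the data. The function name and return labels are hypothetical; the defaults mirror the 120, 30, and 480 minute values in the mockup.

```python
from datetime import datetime, timedelta


def delivery_decision(obs_time, ingest_time, now,
                      obs_target_min=120, sys_target_min=30, do_not_deliver_min=480):
    """Classify a pending delivery against the subscription's latency settings.

    Assumed definitions (not stated on the slide):
      observation latency = now - observation time
      system latency      = now - time NDE ingested the data
    """
    if now - obs_time > timedelta(minutes=do_not_deliver_min):
        return "skip"  # past "Do Not Deliver": too old to send
    late_obs = now - obs_time > timedelta(minutes=obs_target_min)
    late_sys = now - ingest_time > timedelta(minutes=sys_target_min)
    return "late" if (late_obs or late_sys) else "on_time"


t0 = datetime(2011, 10, 28, 5, 48)
ingested = t0 + timedelta(minutes=60)
for minutes in (80, 200, 500):
    print(minutes, delivery_decision(t0, ingested, t0 + timedelta(minutes=minutes)))
# 80 on_time
# 200 late
# 500 skip
```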
  27. Distribute – Customer Data GUIs: Geospatial Filters [GUI mockup: the Customer Subscription Request form with the Spatial Qualifier map dialog open. Dialog options: Gazetteer (Eastern United States selected), AOI, Polar Map Toggle. Selected area: Top-left: (50.7834,-94.5339); Bottom-right: (21.0307,-63.948). Dialog buttons: Cancel, Done. On this version of the form the fields read Observation Latency Target, System Latency Target, and Do Not Deliver After.]
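The Spatial Qualifier is stored as top-left and bottom-right corners. A sketch of turning those corners into a box and testing a point against it, assuming latitude/longitude in degrees and a box that does not cross the International Dateline. Testing a single point rather than a granule footprint is a simplification, and the names are hypothetical.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    north: float
    west: float
    south: float
    east: float


def bbox_from_corners(top_left, bottom_right):
    """Build a box from (lat, lon) corners as entered in the map dialog."""
    (lat1, lon1), (lat2, lon2) = top_left, bottom_right
    return BBox(north=lat1, west=lon1, south=lat2, east=lon2)


def contains(box, lat, lon):
    # Assumes the box does not cross the International Dateline (west <= east).
    return box.south <= lat <= box.north and box.west <= lon <= box.east


east_us = bbox_from_corners((50.7834, -94.5339), (21.0307, -63.948))
print(contains(east_us, 38.9, -77.0))   # True  (Washington, DC area)
print(contains(east_us, 34.0, -118.2))  # False (Los Angeles area)
```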
  28. Distribute – Customer Data GUIs [GUI mockup: the Customer Subscription Request form with the Add filters dialog open. Current Filters: fqs:N_Percent_Missing_Data:range:0,50; and fm:DayNightFlag:list:Day; (each with a remove link). New-filter inputs shown: DayNightFlag, fm, list. Dialog button: Done.]
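The filter strings appear to follow a `source:attribute:operator:values;` layout. A sketch of parsing and applying them, assuming `range` bounds are inclusive numbers and `list` means membership in a comma-separated list. The grammar is inferred from the two examples on the slide, and the function names are hypothetical.

```python
def parse_filters(spec: str):
    """Split 'source:attribute:operator:values;' clauses into tuples (assumed grammar)."""
    filters = []
    for clause in spec.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        source, attribute, op, values = clause.split(":", 3)
        filters.append((source, attribute, op, values.split(",")))
    return filters


def passes(filters, metadata: dict) -> bool:
    """Return True only if the granule metadata satisfies every filter."""
    for _source, attribute, op, values in filters:
        value = metadata.get(attribute)
        if value is None:
            return False  # assumption: a missing attribute fails the filter
        if op == "range":
            low, high = float(values[0]), float(values[1])
            if not low <= float(value) <= high:
                return False
        elif op == "list":
            if str(value) not in values:
                return False
        else:
            raise ValueError(f"unsupported filter operator: {op}")
    return True


rules = parse_filters("fqs:N_Percent_Missing_Data:range:0,50; fm:DayNightFlag:list:Day;")
print(passes(rules, {"N_Percent_Missing_Data": 12.5, "DayNightFlag": "Day"}))    # True
print(passes(rules, {"N_Percent_Missing_Data": 75.0, "DayNightFlag": "Day"}))    # False
print(passes(rules, {"N_Percent_Missing_Data": 12.5, "DayNightFlag": "Night"}))  # False
```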
  29. Distribute – Customer Data GUIs [GUI mockup: Subscription Definitions table with Reset, Save, and Add buttons.]
  30. Distribute – Customer Data GUIs [GUI mockup: Subscription Definitions table with Reset, Save, and Add buttons.]
  31. Inform. Gives customers the ability to check the status of their data subscriptions: system-level status; system/customer notifications; customer-specific subscription status; details of delivered and in-process product deliveries. The Customer Portal is in the public DMZ. User authentication is managed via HTTPS. The portal complies with the NOAA web policies identified at: http://www.cio.noaa.gov/Policy_Programs/webpolicies.html
  32. NDE Customer Portal Login [GUI mockup: Portal Login page with Customer ID and Password fields and a Log In button.]
  33. Welcome Page (Nominal Ops) [GUI mockup: NDE Customer Portal welcome page with user menu (Some User | Log Out | Help), a "Click here to update password" link, NDE Status: Nominal, and panels with View All links for Suspended Subscriptions, Delivery Status, Failed Deliveries, Late Deliveries, Delivery Notifications, Failed Delivery Notifications, and Host Status (Down Hosts).]
  34. Customer Password Change [GUI mockup: Update Password dialog over the welcome page, with Current Password, New Password, and Confirm New Password fields and Cancel and Submit buttons; the page behind shows NDE Status: Nominal and panels including Inactive Subscriptions.]
  35. All Subscription View [GUI mockup: Subscriptions list with NDE Status: Nominal, a Back button, and navigation to Delivery Status, Late Deliveries, Delivery Notifications, and Host Status.]
  36. Product Drill Down [GUI mockup: Subscriptions list with an expanded Product Description panel; NDE Status: Nominal.]
  37. Product Drill Down [GUI mockup: Subscriptions list with an expanded Product Description panel; NDE Status: Nominal.]
  38. Welcome Page (Degraded Ops) [GUI mockup: welcome page with NDE Status: Degraded and a customer notice. ID: 1; Date: 2011-12-25 12:00:00; Title: Customer Notice; Description: "Unexpected Hardware Maintenance has resulted in one available PGS Processor node. Many subscriptions have been suspended to accommodate the outage. Call the ESPC Help Desk (301-817-9999) for any urgent matters." Panels: Suspended Subscriptions, Delivery Status, Late Deliveries, Failed Deliveries, Delivery Notifications, Host Status.]
  39. All Distribution Jobs View [GUI mockup: Deliveries list with NDE Status: Nominal and a Back button.]
  40. All Notification Jobs View [GUI mockup: Delivery Notifications list with NDE Status: Nominal and a Back button.]
  41. Customer Push Hosts View [GUI mockup: Hosts list with NDE Status: Nominal and a Back button.]
  42. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  43. Product Management. Product Management encompasses products received from external data providers and products generated in-house: NOAA Unique Products (NUPs) and Tailored Products (TPs). Products are specified via CM-controlled XML files, tested extensively, and propagated to Operations via patch. NDE Sustainment performs all Product Management (DPA and NALI functionality).
  44. Product Management: XML file development and CM process. Product System integration. Ingest: every product has an XML file; define the external data provider if required. Product Generation (NUPs and TPs): algorithms and production rules. Operational GUIs. Product Quality Monitoring: Gap Check/World Wind, Reconciliations, Product Quality Summary Report.
  45. NUP/TP via DAP Transition to Ops: STAR integration (SADIE, IPT); functional test (ALGTEST, Test); performance test (POP, Test); CCB (approve promotion to OPS); XML files install into OPS via NDE patch. The DAP package contains all SPSRB documentation (build instructions, release notes). The patch is created from ‘built’ software.
  46. NDE Patch Contents for Product System [Diagram: DAP → Development → Test → OPS, with CCB Approval before OPS. Elements shown: XML Authoring Tool, NDE Patch, XML File Validation, Install steps, and a DB and PGS Nodes in each environment. DB tables populated: Product Description, Algorithm/Production Rule, External Product Data Source.]
  47. Product Management GUIs. Product Management GUIs are primarily read-only, with limited capability to alter selected operational attributes. Recommend modification access for TALs only. Read-only access is available to all roles (not including customers).
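A sketch of the recommended access rules, using the role names from the "Who uses NDE?" slide. Whether NDE models roles this way is not stated; the enum and function names are hypothetical.

```python
from enum import Enum


class Role(Enum):
    SYS_ADMIN = "Sys Admin"
    HELP_DESK = "Help Desk"
    TAL = "TAL"
    PAL = "PAL"
    CUSTOMER = "Customer"


def can_view_product_management(role: Role) -> bool:
    # Read-only access for every internal role; customers are excluded.
    return role is not Role.CUSTOMER


def can_modify_product_management(role: Role) -> bool:
    # Recommended: only TALs may alter the selected operational attributes.
    return role is Role.TAL


for role in Role:
    print(role.value, can_view_product_management(role), can_modify_product_management(role))
```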
  48. Product Management [GUI mockup: Product Data Source Hosts table with Reset, Save, and Add buttons.]
  49. Product Management [GUI mockup: Product Description page with Reset and Save buttons.]
  50. Product Management [GUI mockup: Product Description page with Reset and Save buttons.]
  51. Product Management [GUI mockup: Product Description page with Reset and Save buttons.]
  52. Product Management [GUI mockup: Algorithm Description page with Reset and Save buttons.]
  53. Product Management [GUI mockup: Algorithm Description page with Reset and Save buttons.]
  54. Product Management [GUI mockup: Algorithm to Node mapping (Algorithm and Node columns) with Reset and Save buttons.]
  55. Product Management [GUI mockup: Node to Algorithm mapping (Algorithm and Node columns) with Reset and Save buttons.]
  56. Product Management [GUI mockup: Production Rules page with Reset and Save buttons.]
  57. Product Management [GUI mockup: Production Rules page with Reset and Save buttons.]
  58. Product Management [GUI mockup: Production Rules page with Reset and Save buttons.]
  59. Enterprise Measures [GUI mockup: All Measurements table with links to Product Content Detail; paging (page 1 of 4).]
  60. Enterprise Measures [GUI mockup: Product Content Detail.]
  61. Product Quality Management. Gap Check, to identify missing granule coverage: Check Chart, World Wind. Reconciliations: IDPS, CLASS. Product Quality Summary Report.
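Gap Check looks for missing granule coverage. A minimal sketch, assuming each granule has a known start time and a fixed duration, and that a gap is any uncovered stretch longer than a small tolerance; `find_gaps` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(granule_starts, granule_duration, window_start, window_end,
              tolerance=timedelta(seconds=1)):
    """Return (gap_start, gap_end) intervals in the window that no granule covers."""
    gaps = []
    cursor = window_start  # end of the coverage found so far
    for start in sorted(granule_starts):
        if start >= window_end:
            break
        if start - cursor > tolerance:
            gaps.append((cursor, start))
        cursor = max(cursor, start + granule_duration)
    if window_end - cursor > tolerance:
        gaps.append((cursor, window_end))
    return gaps


t0 = datetime(2011, 12, 13, 0, 0)
starts = [t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=6)]
for gap_start, gap_end in find_gaps(starts, timedelta(minutes=2), t0, t0 + timedelta(minutes=10)):
    print(gap_start.time(), "to", gap_end.time())
# 00:04:00 to 00:06:00
# 00:08:00 to 00:10:00
```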
  62. Gap Check Chart [GUI mockup: Product Coverage chart with a Back button; paging (page 1 of 96).]
  63. NASA World Wind Gap Checker [GUI mockup: World Wind view with the VIIRS M15 EDR and CONUS Production Rule Spatial Area layers selected.]
  64. Reconciliations. IDPS: Data Delivery Report files are received hourly (per the NPOESS to NOAA ICD); the catalog is checked for each file listed, and warning alerts are generated. CLASS: a Daily File Reconciliation Report is generated (per the NDE to CLASS ICD); CLASS personnel notify ESPC Operators of ‘missing’ files; ESPC Operations takes action based on the number/type of missing files.
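At its core, reconciliation is a set difference between the files a delivery report lists and the files NDE has cataloged. A sketch under that assumption; the function name and sample filenames are hypothetical, and a real check would parse the report formats defined in the ICDs.

```python
import logging


def reconcile(reported, cataloged, log=None):
    """Return files a delivery report lists that are absent from the catalog.

    Each missing file also produces a warning, standing in for the warning
    alerts described on the slide.
    """
    if log is None:
        log = logging.getLogger("reconcile")
    missing = sorted(set(reported) - set(cataloged))
    for name in missing:
        log.warning("listed in delivery report but not cataloged: %s", name)
    return missing


logging.basicConfig(level=logging.WARNING)
report = ["granule_001.h5", "granule_002.h5", "granule_003.h5"]
catalog = {"granule_001.h5", "granule_003.h5"}
print(reconcile(report, catalog))  # ['granule_002.h5']
```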
  65. Product Management [GUI mockup: Product Quality Summary.]
  66. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  67. Operations Roles. Help Desk (Operator): read-only access to system status. TALs (NDE Admins/Engineers): start and stop; modify system element configuration. Process: the Help Desk logs a Trouble Ticket (TT) and posts it in the Ops Log; in a critical situation, the Help Desk also calls the TAL. TALs can change system element configuration without direct CCB approval in emergencies; otherwise approval is required. All changes made via the GUI are logged by time and user ID.
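A sketch of the change-control rules on this slide: only TALs change configuration, CCB approval is required except in emergencies, and every change is logged with time and user ID. `ConfigStore`, `ChangeRecord`, the user ID, and the sample setting name are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRecord:
    timestamp: datetime
    user_id: str
    element: str
    setting: str
    new_value: str


@dataclass
class ConfigStore:
    values: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def change(self, user_id, role, element, setting, new_value,
               *, ccb_approved=False, emergency=False):
        """Apply a configuration change and record who made it and when."""
        if role != "TAL":
            raise PermissionError("only TALs may modify system element configuration")
        if not (ccb_approved or emergency):
            raise PermissionError("CCB approval is required outside emergencies")
        self.values[(element, setting)] = new_value
        self.audit_log.append(
            ChangeRecord(datetime.now(timezone.utc), user_id, element, setting, new_value))


store = ConfigStore()
store.change("tal01", "TAL", "ingest", "max_connections", "8", emergency=True)
print(len(store.audit_log), store.audit_log[0].user_id)  # 1 tal01
```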
  68. Operations: Starting the System. Is it working? (Operators’ Log). Scenarios: Is something wrong? (Ingest, Product Generation, Distribution, Infrastructure). Quiesce/Stop the System. Toggle Data Denial.
  69. NDE Dashboard [GUI mockup: dashboard with panels for Ingest, Product Generation, Distribution, and Clean-up (last run "--m ago"); SAN 41.6/86.2 TBs (48% used); Backlog and Throughput bars for Ingest, Product Generation, and Distribution (all 0) over the most recent stats period (1 hr); alert counts FATAL 0, SEVERE 0, WARNING 0; Customer Hosts; Notices ("NPP Maneuver at 11/27 14:55 EST. No data 14:45 to 15:30 EST", 1 of 4); Log/Alert Summaries.]
  70. NDE Resource Control Panel [GUI mockup: Resource Control panel listing servers and processing loops, each with STOP Server/START Server and STOP Loop/START Loop controls.]
  71. NDE Resource Control Panel [GUI mockup: Resource Control panel with a mix of running and stopped servers and loops (STOP Server/START Server, STOP Loop/START Loop).]
  72. NDE Resource Control Panel [GUI mockup: Resource Control panel with STOP Loop, START Server, and START Loop controls.]
  73. NDE Resource Control Panel [GUI mockup: Modify Job Boxes dialog (n06apn2) with Save and Cancel buttons, over the Resource Control panel.]
  74. NDE Resource Control Panel [GUI mockup: Resource Control panel with STOP Server, STOP Loop, and START Loop controls.]
  75. NDE Resource Control Panel [GUI mockup: Resource Control panel with Run Manually buttons.]
  76. Operator – Is It Working? The main dashboard is read primarily by the Help Desk. TALs have the option to use it but will often skip to other screens. The dashboard provides an at-a-glance view of the health and status of the NDE system.
  77. NDE Dashboard - Nominal [GUI mockup: dashboard with Clean-up last run 24m ago; SAN 41.6/86.2 TBs (48% used); Backlog and Throughput figures for Ingest, Product Generation, and Distribution over the most recent stats period (1 hr), with values 10,449, 9,731, 9,297, 1,394, 837, and 639; alert counts FATAL 0, SEVERE 0, WARNING 18; Customer Hosts; Notices ("NPP Maneuver at 11/27 14:55 EST. No data 14:45 to 15:30 EST", 1 of 4); Log/Alert Summaries.]
  78. Operators’ Log. Used by Ops/Help Desk to record any incidents, shift reports, etc.; exact rules per EPDS policy. Log entries automatically include the time and user. Entries can have tags to aid later searching: select one or more existing tags or create a new one. Entries are stored in the database. The Ops Log GUI provides searching and sorting options: by tags, by time range, by user, by text found in the log, or any combination of these.
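A sketch of the Ops Log search combinations listed above. It assumes a tag search requires all selected tags, text search is case-insensitive, and results come back newest first; the record layout, function name, and sample entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    time: datetime   # filled in automatically when the entry is logged
    user: str        # filled in automatically from the login
    text: str
    tags: frozenset


def search(entries, tags=None, start=None, end=None, user=None, text=None):
    """Return entries matching every criterion given, newest first."""
    hits = []
    for e in entries:
        if tags and not set(tags) <= e.tags:
            continue
        if start is not None and e.time < start:
            continue
        if end is not None and e.time > end:
            continue
        if user is not None and e.user != user:
            continue
        if text and text.lower() not in e.text.lower():
            continue
        hits.append(e)
    return sorted(hits, key=lambda e: e.time, reverse=True)


log = [
    LogEntry(datetime(2011, 12, 13, 8, 0), "annie", "Push host timeout for nco_ops",
             frozenset({"Push Host"})),
    LogEntry(datetime(2011, 12, 13, 9, 30), "annie", "DIS job failures after restart",
             frozenset({"DIS Error", "Fatal Error"})),
    LogEntry(datetime(2011, 12, 13, 10, 15), "bob", "Shift report: nominal", frozenset()),
]
print([e.time.hour for e in search(log, user="annie")])                   # [9, 8]
print([e.text for e in search(log, tags=["DIS Error"], text="RESTART")])  # ['DIS job failures after restart']
```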
  79. Operators’ Log [GUI mockup: Operators’ Log with expandable items labeled Fatal Error, NWS, DIS Error, and Push Host; a Log message field, a Tags selector, and a "Log it!" button.]
  80. Operations – Is Something Wrong? Example scenario: Ingest (the remaining scenarios are in the backup slides). Failed Count is high; old granule files in the incoming input directory; not getting ancillary data (DDS); Ingest is slow; Ingest resource down. Available drill-downs.
  81. Ingest Scenarios - Background. The Help Desk is watching the Dashboard. Indicators show the status of the Ingest servers (2), the retransmit application, and any product data sources (ESPC/DDS). Annotated status bars show Backlog and Throughput. Failed Count: Ingest requests receiving a checksum mismatch, corrupt data file, corrupt crc file, duplicate, or unknown product message. Old Count: files older than 1 orbit still residing in the Landing Zone (crc/notification not received, crc corrupt, duplicate, or other ingest exception). Drill-downs: Backlog/Throughput, by data source, shows status and ‘last hour’ granule file counts by Ingest server; Failed/Old Granule Files shows the time/type of failure, source, and checksum (if applicable); Product Granule File Counts by Orbit shows critical product counts, latest displayed first.
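A minimal sketch of the two Dashboard counts defined above. The reason labels and data shapes are hypothetical, and the 101-minute orbit passed in the usage lines is an approximate NPP orbital period used only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Failure reasons counted toward "Failed Count" (labels are illustrative).
FAILURE_REASONS = {"checksum_mismatch", "corrupt_data_file", "corrupt_crc_file",
                   "duplicate", "unknown_product"}


def failed_count(ingest_results):
    """ingest_results: (filename, reason) pairs; reason is None on success."""
    return Counter(reason for _, reason in ingest_results if reason in FAILURE_REASONS)


def old_files(landing_zone, now, orbit_period):
    """landing_zone maps filename -> arrival time; return files older than one orbit."""
    return sorted(name for name, arrived in landing_zone.items() if now - arrived > orbit_period)


now = datetime(2011, 12, 13, 12, 0)
results = [("a.h5", None), ("b.h5", "duplicate"), ("c.h5", "checksum_mismatch"), ("d.h5", "duplicate")]
counts = failed_count(results)
print(sum(counts.values()), counts["duplicate"])  # 3 2
zone = {"e.h5": now - timedelta(minutes=20), "f.h5": now - timedelta(minutes=150)}
print(old_files(zone, now, orbit_period=timedelta(minutes=101)))  # ['f.h5']
```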
  82. Ingest Scenario #1 – Failed Count. The Help Desk determines that the count on the Dashboard is rising (any non-zero count warrants investigation). The Failed/Old drill-down is reached by clicking the Failed Count hyperlink. The Help Desk writes a Trouble Ticket (TT) and informs the TAL. The TAL investigates and takes corrective action.
  83. IS#1 - NDE Dashboard [GUI mockup: Ingest Scenario #1 dashboard with Clean-up last run 24m ago; SAN 41.6/86.2 TBs (48% used); Backlog and Throughput values 9,324, 9,831, 7,731, 3,019, 2,742, and 2,503 over the most recent stats period (1 hr); alert counts FATAL 2, SEVERE 3, WARNING 18; Customer Hosts; Notices ("NPP Maneuver at 11/27 14:55 EST. No data 14:45 to 15:30 EST", 1 of 4); Log/Alert Summaries.]
  84. IS#1 - Failed Ingest Drill Down [GUI mockup: Failed Granules table; paging (page 1 of 4).]
  85. IS#1 - TAL Actions. If checksum mismatch: if the retransmit count is > 0, writes a TT and calls IDPS; if the retransmit count is 0, writes a TT and reorders the file. If unknown product: checks the filename against registered file patterns for similarities; if unregistered, writes a TT and assigns it to DPA; if unmatched but similar, writes a TT, and further investigation (Pop the Hood) is warranted. If duplicate: examines counts/times and determines whether to Pop the Hood. If corrupt file/crc: examines counts/times and determines whether to Pop the Hood.
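The TAL decision rules above can be written as a small triage function, together with one possible way to judge whether an unknown filename is "similar" to a registered pattern. The glob pattern, the difflib similarity measure, and the 0.75 cutoff are illustrative assumptions, not NDE's matching logic.

```python
import difflib
import fnmatch


def classify_filename(name, registered_patterns, similarity=0.75):
    """Return 'registered', 'similar', or 'unregistered' for an incoming filename."""
    if any(fnmatch.fnmatch(name, p) for p in registered_patterns):
        return "registered"
    best = max((difflib.SequenceMatcher(None, name, p).ratio() for p in registered_patterns),
               default=0.0)
    return "similar" if best >= similarity else "unregistered"


def tal_action(failure, retransmit_count=0, filename_class=None):
    """Map an ingest failure to the TAL response listed on the slide."""
    if failure == "checksum_mismatch":
        return "write TT; call IDPS" if retransmit_count > 0 else "write TT; reorder file"
    if failure == "unknown_product":
        if filename_class == "unregistered":
            return "write TT; assign to DPA"
        return "write TT; pop the hood"  # unmatched but similar to a registered pattern
    if failure in ("duplicate", "corrupt_file", "corrupt_crc"):
        return "examine counts/times; decide whether to pop the hood"
    raise ValueError(f"unhandled failure type: {failure}")


patterns = ["SATMS_npp_*.h5"]  # hypothetical registered pattern
print(classify_filename("SATMS_npp_d1.h4", patterns))       # similar
print(tal_action("checksum_mismatch", retransmit_count=2))  # write TT; call IDPS
```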
  86. Old Granule Drill Down [GUI mockup: Old Granules in Incoming Input table, Total Count: 257; paging (page 1 of 4).]
  87. Backlog & Throughput Drill Down [GUI mockup: Backlog & Throughput.]
  88. Granule File Counts By Orbit [GUI mockup: Granule File Counts By Orbit.]
  89. PGS Scenarios - Background. Product Generation scenarios and drill-downs (scenarios in the backup slides). The Help Desk is watching the Dashboard. Indicators show the status of the PGS factory servers (2) and PGS Processors (n). Annotated status bars show Backlog and Throughput. Available counts: Failures (non-zero return code); Excessive CPU (job exceeds CPU threshold); Old (job remains in Queued state); Expired (Job Spec exceeds ‘wait for input’ time). Primary drill-downs: Product Rule Summary (status/job counts summarized by product type); Job/Job Spec (specific details of Jobs and Job Specs); Product Coverage (product file indicator by coverage time).
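A sketch of the four PGS counts as predicates over a job or job spec. The thresholds (CPU limit, queue age, wait-for-input time) are passed in as hypothetical parameters, and the field and state names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Job:
    state: str                  # e.g. "QUEUED", "RUNNING", "COMPLETE" (assumed names)
    return_code: Optional[int]  # None while the job has not finished
    cpu_seconds: float
    queued_at: datetime


def job_flags(job, now, cpu_limit_s, max_queue_age):
    """Return the dashboard counts this job contributes to."""
    flags = []
    if job.return_code not in (None, 0):
        flags.append("Failures")
    if job.cpu_seconds > cpu_limit_s:
        flags.append("Excessive CPU")
    if job.state == "QUEUED" and now - job.queued_at > max_queue_age:
        flags.append("Old")
    return flags


def job_spec_expired(created_at, inputs_complete, now, wait_for_input):
    """A Job Spec expires if its inputs are still incomplete after the wait time."""
    return not inputs_complete and now - created_at > wait_for_input


now = datetime(2011, 12, 13, 12, 0)
stuck = Job("QUEUED", None, 0.0, now - timedelta(hours=3))
crashed = Job("COMPLETE", 2, 5400.0, now - timedelta(hours=2))
print(job_flags(stuck, now, cpu_limit_s=3600, max_queue_age=timedelta(hours=1)))    # ['Old']
print(job_flags(crashed, now, cpu_limit_s=3600, max_queue_age=timedelta(hours=1)))  # ['Failures', 'Excessive CPU']
```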
  90. Product Generation Summary [GUI mockup: Product Generation Status Summary table with a Back button and paging (page 1 of 4); links to the Job Box Configuration drill-down, to the PGS Job/Job Spec drill-down filtered on Production Rule and Status, and to the Production Rule Definition drill-down.]
  91. PGS Job/Job Spec Drill Down [GUI mockup: PGS Job Spec & Job Status table with a Back button; paging (page 1 of 4).]
  92. PGS Job/Job Spec Drill Down [GUI mockup: PGS Job Spec & Job Status table with a Back button; paging (page 1 of 4).]
  93. PGS Job/Job Spec Drill Down [GUI mockup: Job Spec & Job Details dialog over the PGS Job Spec & Job Status table, showing Job Process ID 56778 and the actions Update Priority, Requeue Job, Terminate, Expire Job, and Done.]
  94. Product Coverage Drill Down [GUI mockup: Product Coverage chart with a Back button; paging (page 1 of 96).]
  95. Operations – Is Something Wrong? Distribution. Distribution resources are down; customer not getting data (push or pull); customer not being notified; distribution throughput is low.
  96. Distribution Summary [GUI mockup: Distribution Summary table with a Back button; links to the Distribution Request drill-down filtered on the selected status and to the Subscription Coverage drill-down.]
  97. Distribution Request [GUI mockup: Distribution Request table with a Back button; links to the Subscription drill-down and to the Distribution Request Details modal window.]
  98. Distribution Request [GUI mockup: Distribution Request table with a Back button; links to the Notification drill-down.]
  99. Distribution Request [GUI mockup: Distribution Request Details modal window showing DPR Status, Priority, Size, and DJ Status (COMPLETE, FAILED), with Cancel and Save buttons.]
  100. Subscription Coverage [GUI mockup: NCO_OPS Subscription Coverage chart with a Back button and paging (page 1 of 96); a missing job can be created by clicking on the chart; links to the Gap Checker drill-down filtered on that product.]
  101. Notification [GUI mockup: Notification Jobs table with a Back button.]
  102. Notification [GUI mockup: Notification Job Details dialog showing Notification Status FAILED, with Cancel and Save buttons.]
  103. Operations – Is Something Wrong? Infrastructure. Cleanup: SAN cleanup not working; internal RAID cleanup on processing nodes not working; database cleanup not working. All cases require the TAL to Pop the Hood. SAN Storage Alert: Pop the Hood. RAID Processing Node Alert: Pop the Hood.
  104. Operations – Quiesce/Stop System. To cleanly quiesce the system: suspend the IDPS subscription(s) to NDE; allow Ingest to work off its current backlog; shut down Ingest (loop and JBoss Server) in the Resource Control Panel; restart the IDPS subscriptions to ensure no data loss from IDPS; as PGS and DIS work off their backlogs, shut them down as well; perform any necessary maintenance; bring the system back up per standard procedure.
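The quiesce order can be sketched as a script against a hypothetical controller object whose methods stand in for the Resource Control Panel actions. Maintenance and restart are left out, and the polling interval and timeout are illustrative values.

```python
import time


def wait_until_drained(backlog, poll_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll a backlog count until it reaches zero or the timeout expires."""
    waited = 0.0
    while backlog() > 0:
        if waited >= timeout_s:
            raise TimeoutError("backlog did not drain in time")
        sleep(poll_s)
        waited += poll_s


def quiesce(ctl, sleep=time.sleep):
    """Follow the slide's shutdown order using a hypothetical controller 'ctl'."""
    ctl.suspend_idps_subscriptions()
    wait_until_drained(lambda: ctl.backlog("ingest"), sleep=sleep)
    ctl.stop("ingest")               # loop and JBoss server
    ctl.resume_idps_subscriptions()  # restarted so no IDPS data is lost
    for subsystem in ("pgs", "dis"):
        wait_until_drained(lambda s=subsystem: ctl.backlog(s), sleep=sleep)
        ctl.stop(subsystem)


class FakeController:
    """Stand-in that records calls; each backlog shrinks by one per poll."""

    def __init__(self):
        self.calls = []
        self.backlogs = {"ingest": 2, "pgs": 1, "dis": 0}

    def suspend_idps_subscriptions(self):
        self.calls.append("suspend IDPS")

    def resume_idps_subscriptions(self):
        self.calls.append("resume IDPS")

    def backlog(self, name):
        count = self.backlogs[name]
        self.backlogs[name] = max(0, count - 1)
        return count

    def stop(self, name):
        self.calls.append(f"stop {name}")


ctl = FakeController()
quiesce(ctl, sleep=lambda s: None)
print(ctl.calls)  # ['suspend IDPS', 'stop ingest', 'resume IDPS', 'stop pgs', 'stop dis']
```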
  105. NDE System Notices [GUI mockup: System Status with Turn ON and Turn OFF controls; Internal Notices and Customer Notices lists, each with an add (+) control; notice fields Title, Description, Internal, and Customer.]
  106. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  107. Management Reports. This section presents a recommendation for an initial set of reports. Recommend that the Management Reporting IPT finalize the reports by February 28, 2012. Topics: report creation and access; data sources; reporting requirements (throughput, performance, latency, availability).
  108. Report Creation. Reports are developed and delivered from the NDE Development Environment as a software patch. The initial report selection is delivered with Build 4/5. Report design is based on XML inputs. Reports will be generated and distributed automatically on a set schedule. Running a given report on demand and/or changing a report's schedule will be handled via Jaspersoft.
  109. Report Access. Standard periodic reports will be rendered in PDF and/or Excel and automatically sent to the appropriate stakeholders. On-demand reports are available, read-only, to all internal NDE users via the NDE GUI.
  110. Data Sources. The Dashboard and drill-downs read from the NDE Operational Database. Reports read from hourly summaries extracted to a separate data warehouse. The Operational Database holds 39 days' worth of transaction data to support resolution of issues identified in the reports. The data warehouse holds the history of all summarized data. Data can be summarized hourly, daily, weekly, monthly, quarterly, yearly, etc.; by product type, product subtype, product, and customer; and by processing host, etc.
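A sketch of the two data paths: hourly summaries rolled up to daily totals per product in the warehouse, and the 39-day retention window on the operational database. The row layout and function names are hypothetical; coarser periods (weekly, monthly) follow the same grouping pattern with a different key.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rollup_daily(hourly_rows):
    """Sum hourly summaries into daily totals per product.

    hourly_rows: (hour_start, product, file_count, megabytes) tuples.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for hour_start, product, file_count, megabytes in hourly_rows:
        key = (hour_start.date(), product)
        totals[key][0] += file_count
        totals[key][1] += megabytes
    return {key: tuple(val) for key, val in sorted(totals.items())}


def purge_operational(rows, now, retention=timedelta(days=39)):
    """Keep only transaction rows inside the operational retention window."""
    cutoff = now - retention
    return [row for row in rows if row[0] >= cutoff]


day = datetime(2011, 12, 13)
hourly = [
    (day, "ATMS_BUFR", 10, 1.5),
    (day + timedelta(hours=1), "ATMS_BUFR", 20, 2.5),
    (day + timedelta(days=1), "ATMS_BUFR", 5, 0.5),
]
for (date, product), (files, mb) in rollup_daily(hourly).items():
    print(date, product, files, mb)
# 2011-12-13 ATMS_BUFR 30 4.0
# 2011-12-14 ATMS_BUFR 5 0.5
```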
  111. Product Type By Hour
  112. Product SubType By Hour
  113. Product SubType by Day
  114. Jobs vs. Response Time Hourly
  115. PGS Response Time
  116. DIS Response Time
  117. Data Age prior to Ingest
  118. Latency Report
  119. Availability Data Number of Minutes
  120. Agenda: Background; Customer Services; Product Management; Operations; Management Reporting; Architecture; Security; Testing; Schedule; Stakeholder Requirements Met; Government Approval to Proceed.
  121. Architecture. COTS evaluations: XML editors, GUI frameworks, reporting/visualization tools, metrics gathering and reporting. Implementation: GUI framework, reporting, metrics gathering and reporting.
  122. COTS Evaluation - XML Editors. May use the Solers recommendation of XMLSpy if needed.
  123. Candidate Frameworks: Apache Struts, Spring MVC, JBoss Seam, Google Web Toolkit, Sencha Ext GWT. Slide annotations on two eliminated candidates: "Eliminated due to high configuration overhead." "Eliminated b/c required repositories not available offline."
  124. Seam vs. Ext GWT
  125. Decision: Sencha Ext GWT. Works with existing software platforms (Hibernate, JBoss AS). Interactive, customizable charting eliminates the need for a separate BI tool for dashboards. Extensive, customizable widgets: out-of-the-box grids, filtering, data binding, etc. Provides consistency across ESPDS: code reuse, knowledge transfer, and lessons learned between NDE and PDA.
  126. Candidate Reporting Engines: Crystal Reports, JasperSoft Enterprise, Actuate BIRT, JasperReports, Business Intelligence and Reporting Tools (BIRT). Slide annotation on eliminated candidates: "Eliminated due to high cost ($20k and up) with limited added value for NDE (Sencha provides for the Dashboard)."
  127. BIRT vs. JasperReports
  128. Decision: JasperReports. Meets all requirements at the right price (free). Works with existing software platforms (Hibernate); superior integration with Java. Larger community and user base: increased chance of finding quick solutions to roadblocks, and reduced chance of hitting roadblocks due to product maturity. Superior analytical tools in the open source product: Jaspersoft OLAP and ETL provide room to grow in analytics. Most cost-effective path to a commercial support license if required in the future.
  129. Visualization Tools. Google Earth and NASA World Wind (open source) evaluation: World Wind is better at handling large polygon shapes crossing the International Dateline, and a Google Earth Server behind a firewall requires a commercial license. Decision: NASA World Wind.
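The slide reports that World Wind handled large polygons crossing the International Dateline better. A common general technique for such shapes is longitude unwrapping, sketched below; it is not a description of World Wind's internals, and the function names are hypothetical. Vertices are treated as an open list, so the closing edge of a ring is not checked.

```python
def crosses_dateline(lons):
    """True if any edge of an open vertex list jumps more than 180 degrees of longitude."""
    return any(abs(b - a) > 180.0 for a, b in zip(lons, lons[1:]))


def unwrap_longitudes(lons):
    """Shift longitudes by multiples of 360 so consecutive vertices stay within 180 degrees."""
    if not lons:
        return []
    out = [float(lons[0])]
    for lon in lons[1:]:
        prev = out[-1]
        delta = (lon - prev + 180.0) % 360.0 - 180.0  # shortest signed step in [-180, 180)
        out.append(prev + delta)
    return out


ring = [170.0, 179.0, -179.0, -170.0]
print(crosses_dateline(ring))    # True
print(unwrap_longitudes(ring))   # [170.0, 179.0, 181.0, 190.0]
```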
  130. Metrics – COTS Evaluation. Orion was selected over Hyperic after further evaluation. JBoss Operations Network was selected for application-level monitoring of JBoss servers. Internal OS utilities such as sar, nmon, top, and timex are used for real-time performance tuning and evaluation.
  131. Portal Architecture JBoss Application Server Presentation Layer UI Components (GWT Widgets) UI Process Components User Browser Business Layer Business Domain Components (DTO, Models) Business Logic Components (EJB) Data Layer Oracle DB Data Access Component (Hibernate)
  132. Jasper Report Server Architecture
  133. Build 4/5 Architecture (diagram). NDE/Production Zone: Monitor and Control Services on web/app servers n25rpn07 and n25rpn08 (http), with the Sencha GUI libraries, Jasper Reports Server, Jasper ETL, NASA World Wind, NDE monitors, the JON server, and the Oracle DB; used by NDE internal users (Helpdesk Operator, TAL, PAL). DMZ: Customer Services on web/app servers n26rpn07 and n26rpn08 (https), with the Sencha GUI libraries; used by customers.
  134. Monitoring Framework (diagram). Components: NDE Host n (JVM running JBoss with NDE Class 1 through NDE Class n, NDE Apps, NDE Logs, JON Agent, Orion Agent); Oracle Host (Oracle Server, NDE Database (Operational), NDE Data Warehouse (Historical), ETL); NDE and NOAA Management (JBoss Operations Network (JON) Server, Orion Server, Oracle Enterprise Manager); NDE Info Services (NDE Service Dashboard, Reports). Users: NDE Administrator, System Engineering Staff, PAL Staff, Customers, and Admins (Database, System, Network, JBoss).
  135. Metrics. JBoss Operations Network (JON): monitors resources (memory, CPU, application storage) at the application level; alerts appear on the application dashboard. Orion / Oracle EM / Cisco SAN: infrastructure monitoring from the admin perspective; alerts go to the admins. NDE algorithm execution: measured with the AIX timex command (CPU, memory, and I/O per execution).
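For readers unfamiliar with timex, a comparable per-execution measurement can be approximated in Python on Unix hosts. This is a swapped-in technique (the standard-library `resource.getrusage` call with `RUSAGE_CHILDREN`), not the AIX timex command and not NDE tooling. The `resource` module is unavailable on Windows, and the function name `run_and_measure` is hypothetical.

```python
import resource
import subprocess
import sys
from typing import Dict, List


def run_and_measure(cmd: List[str]) -> Dict[str, float]:
    """Run one command to completion and report the CPU and block I/O it consumed.

    RUSAGE_CHILDREN covers all terminated, waited-for children of this process,
    so the difference between two snapshots isolates the command run in between
    (assuming no other children finish concurrently).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(cmd)  # waits for the child, so its usage is counted afterward
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": proc.returncode,
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "system_cpu_s": after.ru_stime - before.ru_stime,
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
        # Peak RSS of the largest child so far (KB on Linux, bytes on macOS);
        # this is a running maximum, not a per-command delta.
        "peak_rss": after.ru_maxrss,
    }
```

For example, `run_and_measure([sys.executable, "-c", "sum(range(10**6))"])` reports the user and system CPU seconds spent by that one child interpreter.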
  137. Security: System Security; Security Testing; Customer Access Control
  138. System Security and Testing. System security: NDE resides within the OSPO/ESPC networks and conforms to ESPC security procedures. All external connections pass through ESPC firewalls to the NDE DMZ and must pass ESPC firewall rules. All servers comply with ESPC security policies. All user IDs require ESPC approval. The Customer Portal will be designed for security, with tightly regulated database access and no access to most NDE systems. Security testing: NDE portals will undergo security testing (see also the Testing section). Scans will be run to find and eliminate SQL injection and other vulnerabilities.
  139. Customer Access Control. OS-layer user IDs: “Pull” customers have ftps/ftp-only IDs. These IDs are granted and controlled through standard ESPC procedures and work only for ftp(s) on the pull servers in the DMZ; these connections must also pass ESPC firewall rules, including source IP. “Push” customers have no OS-level IDs. Data access control: subscriptions control which products the customer receives and set the External Host Definitions for “Push” customers. The Customer Profile in NDE defines points of contact for any necessary direct verifications and the trust level for Data Denial. Application-layer user IDs: customers have login access to a DMZ-hosted site (the Customer Portal). The site is read-only, except that customers can change their own passwords; it displays only data relevant to their subscriptions (including general notices), and it is encrypted.
  140. Internal User Access Control. OS-layer user IDs: sys admins and engineers (TALs, developers) will have direct login to many or all NDE machines as befits their duties, controlled by standard ESPC procedures. PALs will have limited read access to the SAN and the processing node forensics directories, and will be able to VPN in to a server running Firefox for remote access to the NDE internal site. Data access control: internal permissions are set to match role responsibilities. Application-layer user IDs (Internal Portal): Internal Portal user IDs are granted via standard ESPC procedures, in addition to any OS-level IDs granted. The internal site can be accessed only from within the ESPC network.
  141. NDE Internal Portal Data Access. NDE application-layer access is controlled by NDE. Each user in the database is associated with one or more roles. The permissions associated with each role are centralized in one Java class, simplifying changes if needed. Upon login, the user's session stores all the accesses provided by the user's roles; the NDE Security Agent (a Java class) provides the user permissions. Access checks are therefore very fast as the user moves through the portal. Accesses will be defined at the page, widget, or even data-item level as needed at design time. There are a limited number of updatable items, so the number of role-controlled items is reasonably small. Changing a user's role is handled via an NDE User Management GUI, which handles the database updates.
  142. Access Control – NDE Security Agent (diagram). Roles shown: NDE_OPS and NDE_ADMIN. At login time, the NDE Security Agent reads the user's roles from the DB, and the user's login session stores a list of all accesses. At page access time, one page design serves multiple roles based on a single access check: an Admin user sees the page as editable, an Ops user sees it view-only. Example access-controlled item list (Item: Allowed?): UpdateUser: No; ViewMngUser: No; UpdateJobBox: Yes; ViewDashbrd: Yes; …
  143. Manage Users (GUI mockup: Manage Users screen with Reset and Save buttons)
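Because the security scans target SQL injection, the standard defense is worth showing: bind user input as query parameters instead of splicing it into the SQL text. This sketch uses Python's built-in `sqlite3` as a stand-in for the Oracle/Hibernate stack, and the `subscription` table and its columns are hypothetical.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical subscription table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscription (subscription_id INTEGER, customer TEXT)")
    conn.executemany(
        "INSERT INTO subscription VALUES (?, ?)",
        [(1, "CUSTOMER_A"), (2, "CUSTOMER_B")],
    )
    return conn


def subscriptions_for(conn: sqlite3.Connection, customer: str) -> List[int]:
    """Return the subscription IDs owned by `customer`.

    The value is passed as a bound parameter (?), so input such as
    "x' OR '1'='1" is compared as a literal string and cannot change the query.
    Building the SQL with string formatting instead would be injectable.
    """
    cur = conn.execute(
        "SELECT subscription_id FROM subscription WHERE customer = ? ORDER BY subscription_id",
        (customer,),
    )
    return [row[0] for row in cur.fetchall()]
```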
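The login-time resolution described for the NDE Security Agent can be sketched briefly. This Python sketch is a stand-in for the Java class on the slides, not its actual code. The role-to-permission mapping is hypothetical, with item names taken from the slide's example list, and `login`, `is_allowed`, and `page_mode` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Hypothetical role-to-permission map, kept in one place as the slide describes.
# Item names follow the slide's example list; which role gets which item is assumed.
ROLE_PERMISSIONS = {
    "NDE_OPS": frozenset({"ViewDashbrd", "UpdateJobBox"}),
    "NDE_ADMIN": frozenset({"ViewDashbrd", "UpdateJobBox", "ViewMngUser", "UpdateUser"}),
}


@dataclass(frozen=True)
class UserSession:
    user_id: str
    accesses: FrozenSet[str]


def login(user_id: str, roles: Iterable[str]) -> UserSession:
    """Resolve every access granted by the user's roles once, at login time."""
    accesses: FrozenSet[str] = frozenset()
    for role in roles:
        accesses |= ROLE_PERMISSIONS.get(role, frozenset())  # unknown roles grant nothing
    return UserSession(user_id, accesses)


def is_allowed(session: UserSession, item: str) -> bool:
    """Page-time check: a set lookup with no database round trip."""
    return item in session.accesses


def page_mode(session: UserSession, update_item: str) -> str:
    """One page design serves several roles: editable or view-only."""
    return "editable" if is_allowed(session, update_item) else "view-only"
```

Resolving permissions once at login is what keeps each page-time check cheap; the trade-off is that a role change takes effect at the user's next login.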
  145. Builds 4&5 Integration Test: Develop an integration plan (schedule builds of design items, i.e., aggregations of units associated with one or more CRs). Develop thread diagrams of Customer Services, Monitoring and Control, etc. Develop detailed integration test procedures and review them. Execute the integration test procedures. Develop success criteria for requirements verification.
  146. Builds 4&5 Verification Test: Regression-test Build 1-3 capabilities using the Builds 1, 2, and 3 SIT procedures. Develop the Requirements Verification Test Plan. Develop Requirements Verification Test Procedures from the integration procedures. Present the Test Readiness Review. Execute the Requirements Verification Test Procedures.
  147. Test Tools: web automation test tools. Benefits: speed; regression (batch) testing; compatibility/portability testing. Drawbacks: cannot test human factors; GUI widget label updates can break recorded scripts. Examples: Selenium, HTT, TEST
  149. Build 4/5 Schedule
      Development: 12/11 to 09/12 (develop detailed requirements; IPT engagements, complete NLT 03/31; prototyping; coding; unit/integration testing)
      Test: 06/11 to 12/12 (establish infrastructure; develop test plan/procedures; integration testing; regression testing; verification testing)
      Documentation: 12/11 to 03/13
  151. Stakeholder Requirements
  152. Stakeholder Requirements
  153. Stakeholder Requirements
  155. Backup Slides

  156. Team Solers Contact Information. Address: 7474 Greenway Center Drive, Suite 400, Greenbelt, MD 20770. Phone: 240-473-0080 x100. Fax: 240-473-0080 x161. Email: daniel.beall@solers.com, jamison.hawkins@lmco.com, rsikorski@innovim.com
  157. Acronym List
  158. Product Management
  159. XML File Development (diagram): XML files are created in the Development Environment with an XML authoring tool, then validated on SADIE (database validation).
  160. NDE Patch Contents for Ingest (diagram): XML files produced with the XML authoring tool are validated and packaged into an NDE patch, which moves from Development to Test to OPS (OPS install requires CCB approval). Installing the patch loads each environment's database (Product Description tables, External Product Data Source tables).
  161. Product Definition – Ingest. Product Definition XML files include: identifying file pattern; data provider; short and long descriptions; metadata extraction function/additional validation steps; optional detail for automated tailoring services; SAN storage location; test or operational status; categorization by product area and type.
  162. Product Definition XML
      <ProductDefinition ingestDirectoryName="incoming_input"
          productShortName="VIIRS-M5TH-EDR"
          productLongName="VIIRS 5th M Band Imagery EDR"
          productDescription="The VIIRS Moderate Band Imagery EDRs are characterized by a 750m Horizontal Reporting Interval (HRI). All M-Band imagery products are re-sampled from the VIIRS moderate resolution SDR geolocation to a GTM projection."
          productMetadataLink="/opt/apps/nde/pds/data/VIIRS-M5TH-EDR.xml"
          productProfileLink=""
          productType="XDR"
          productIDPSMnemonic="EDRE-VMOD-C0030"
          productAvailabilityDate=""
          productCIPPriority=""
          productStatus=""
          productFilenamePattern="VM05O_npp_d________t________e_______b_____c_________noaa____.h5"
          productFilenamePrefix="VM05O"
          productHomeDirectory="products/VIIRS-M5TH-EDR"
          productSpatialArea=""
          productIDPSRetransmitLimit="5"
          productArea=""
          …
      </ProductDefinition>
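The filename pattern above appears to use single-character wildcards. The following is a minimal Python sketch, assuming SQL LIKE semantics (`_` matches exactly one character, `%` matches any run), of how an ingest check could apply `productFilenamePrefix` and `productFilenamePattern`. The function names are hypothetical, and the NDE ingest code may match patterns differently.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate an SQL LIKE-style pattern into a compiled regex.

    Assumed semantics: '_' matches exactly one character, '%' matches any run
    (possibly empty), and every other character is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_product(filename: str, prefix: str, pattern: str) -> bool:
    """Cheap prefix test first, then a full-length match against the pattern."""
    return filename.startswith(prefix) and like_to_regex(pattern).fullmatch(filename) is not None
```

Under this reading each `_` fixes the length of one field, so a file whose date or time field is one character too long is rejected by `fullmatch`.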
  163. Product Generation: Receive the Delivered Algorithm Package (DAP). Integrate in SADIE. Promote to ALGTEST for functional testing. Promote to POP for performance testing. Approve for promotion to OPS. Install into OPS via a delivered software patch, including the associated XML files.
  164. NUP/TP via DAP Transition to Ops (diagram): STAR Integration (SADIE, IPT) → Functional Test (ALGTEST, Test) → Performance Test (POP, Test) → CCB (approve promotion to OPS) → install the NDE patch, with its XML files, into OPS. The DAP contains all SPSRB documentation (build instructions, release notes). The patch is created from the ‘built’ software.
  165. NDE-TP Transition to Operations (diagram): Integration (SADIE, IPT) → Functional Test (ALGTEST, Test) → Performance Test (POP, Test) → CCB (approve promotion to OPS) → install the NDE patch into OPS. The NDE-TP is encapsulated within a Production Rule (DSS), a script (IDL), or both.
  166. NDE Patch Contents for Product Sys (diagram): the DAP and the validated XML files (from the XML authoring tool) form an NDE patch, which moves from Development to Test to OPS (OPS install requires CCB approval). Installing the patch updates each environment's database (Product Description tables, Algorithm/Production Rule tables, External Product Data Source tables) and its PGS nodes.
  167. NDE Patch Contents for Product Sys: the NDE patch XML files are installed into the database (Product Description tables, Algorithm/Production Rule tables, External Product Data Source tables); the algorithms are installed on the processing nodes.
  168. NDE XML Configuration Files: Product Definitions; Product Profile Definitions; Enterprise Measures Definitions; Algorithm/Rule Definitions; Priorities (PGS/DIS); Job Size (PGS/DIS); Gazetteer XML (spatial AOI); NDE Support Functions: Product Group, Ingest Incoming Directory, Platform, Platform Sensor, Deliver Notification Type, Data Denial Flag
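The promotion path on these slides is a strict sequence with one approval gate. A small Python sketch can make that ordering explicit. It is illustrative only: the `Stage` enum and `promote` function are hypothetical names, not part of NDE, and the only rule encoded is the one stated above (CCB approval before install into OPS).

```python
from enum import Enum


class Stage(Enum):
    SADIE = "Integration (SADIE)"
    ALGTEST = "Functional test (ALGTEST)"
    POP = "Performance test (POP)"
    OPS = "Operations (OPS)"


# Stages in the order a DAP or NDE-TP moves through them.
PIPELINE = [Stage.SADIE, Stage.ALGTEST, Stage.POP, Stage.OPS]


def promote(current: Stage, ccb_approved: bool = False) -> Stage:
    """Return the next stage, enforcing strict order and the CCB gate before OPS."""
    idx = PIPELINE.index(current)
    if idx == len(PIPELINE) - 1:
        raise ValueError("package is already installed in OPS")
    nxt = PIPELINE[idx + 1]
    if nxt is Stage.OPS and not ccb_approved:
        raise PermissionError("CCB approval is required before install into OPS")
    return nxt
```

Modeling the gate as an exception rather than a silent no-op means a skipped approval surfaces immediately.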
  169. Production Rule XML
      <ProductionRule prRuleName="NUCAPS Retrieval Granule" prRuleType="Granule"
          prStartBoundaryTime="null" prProductCoverageInterval_DS="null" prRunInterval_DS="null"
          prTemporarySpaceMB="1" prEstimatedRAM_MB="1" prEstimatedCPU="120"
          prProcessingStart="60" prCoverage="60" prActiveFlag="1"
          jobPriority="Medium" jobSize="Small"
          prWaitForInputInterval="interval '101' minute" gzFeatureName="null">
        <Algorithm algorithmName="NUCAPS_Retrieval.pl" algorithmVersion="1.0"/>
        <ProductionRuleParameters>
          <PRParameter algoParameterName="OPS_BIN" prParameterValue="~/NUCAPS/OPS/Common_Bin"/>
          …
      </ProductionRule>
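The `prWaitForInputInterval` attribute holds an Oracle-style interval literal rather than a plain number. This Python sketch shows one way a tool could read a Production Rule and turn that literal into a duration. It is a sketch under stated assumptions: only the simple single-unit form (`interval '101' minute`) is handled, the sample XML is a trimmed copy of the slide with the remaining attributes omitted, and `load_rule` and `parse_interval` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta
from typing import Dict

# Trimmed, well-formed copy of the Production Rule XML (other attributes omitted).
RULE_XML = """<ProductionRule prRuleName="NUCAPS Retrieval Granule" jobPriority="Medium"
    jobSize="Small" prWaitForInputInterval="interval '101' minute">
  <Algorithm algorithmName="NUCAPS_Retrieval.pl" algorithmVersion="1.0"/>
  <ProductionRuleParameters>
    <PRParameter algoParameterName="OPS_BIN" prParameterValue="~/NUCAPS/OPS/Common_Bin"/>
  </ProductionRuleParameters>
</ProductionRule>"""

# Only the single-unit literal form, e.g. "interval '101' minute", is supported.
_INTERVAL = re.compile(r"interval\s+'(\d+)'\s+(second|minute|hour|day)", re.IGNORECASE)


def parse_interval(text: str) -> timedelta:
    m = _INTERVAL.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported interval literal: {text!r}")
    amount, unit = int(m.group(1)), m.group(2).lower()
    return timedelta(**{unit + "s": amount})  # "minute" -> timedelta(minutes=...)


def load_rule(xml_text: str) -> Dict[str, object]:
    root = ET.fromstring(xml_text)
    alg = root.find("Algorithm")
    if alg is None:
        raise ValueError("Production Rule has no <Algorithm> element")
    return {
        "name": root.get("prRuleName"),
        "wait_for_input": parse_interval(root.get("prWaitForInputInterval", "")),
        "algorithm": (alg.get("algorithmName"), alg.get("algorithmVersion")),
        "parameters": {
            p.get("algoParameterName"): p.get("prParameterValue")
            for p in root.iter("PRParameter")
        },
    }
```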
  170. Algorithm XML
      <Algorithm algorithmName="NUCAPS_Retrieval.pl" algorithmVersion="1.0"
          algorithmCommandPrefix="/usr/bin/perl -w"
          algorithmExecutableLocation="NUCAPS/OPS/scripts" algorithmNotifyOpSeconds="60"
          algorithmType="Science"
          algorithmPcf_Filename="NUCAPS_Retrieval.pl.PCF" algorithmPsf_Filename="NUCAPS_Retrieval.pl.PSF"
          algorithmLogFilename="NUCAPS_Retrieval.pl.log" algorithmLogMessageContext="."
          algorithmLogMessageWarn="WARNING:" algorithmLogMessageError="Error in">
        <AlgorithmParameters>
          <AlgoParameter algoParameterName="OPS_BIN" algoParameterDataType="string"/>
          …
      </Algorithm>
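The Algorithm XML carries per-algorithm log markers (`algorithmLogMessageWarn`, `algorithmLogMessageError`). Here is a minimal Python sketch of how those markers could drive a log scan. It assumes a plain substring match, and that a line containing both markers counts as an error; NDE's actual matching rules (including the role of `algorithmLogMessageContext`) are not described on the slide, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple


def log_markers(algorithm_xml: str) -> Tuple[str, str]:
    """Read the warning and error markers configured on an <Algorithm> element."""
    alg = ET.fromstring(algorithm_xml)
    return alg.get("algorithmLogMessageWarn", ""), alg.get("algorithmLogMessageError", "")


def scan_log(lines: Iterable[str], warn: str, error: str) -> Dict[str, List[int]]:
    """Return 1-based line numbers of warning and error lines.

    Assumes a plain substring match; a line containing both markers counts as
    an error. Empty markers are ignored so they cannot match every line.
    """
    found: Dict[str, List[int]] = {"warnings": [], "errors": []}
    for number, line in enumerate(lines, start=1):
        if error and error in line:
            found["errors"].append(number)
        elif warn and warn in line:
            found["warnings"].append(number)
    return found
```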
  171. Enterprise Measures XML
      <ndeEnterpriseSchema>
        <EnterpriseDimensions>
          <EnterpriseDimension name="AlongTrack-1536-1" start="0" interval="1" end="*" storageSize="1536" storageMaxSize="1536"/>
          <EnterpriseDimension name="AlongTrack-1541-1" start="0" interval="1" end="*" storageSize="1541" storageMaxSize="1541"/>
          <EnterpriseDimension name="BeamPosition-96-0" start="0" interval="1" end="96" storageSize="96" storageMaxSize="96"/>
          <EnterpriseDimension name="Bulk SST Scale/Offset-2-1" start="0" interval="1" end="*" storageSize="2" storageMaxSize="2"/>
          <EnterpriseDimension name="Channel-22-0" start="0" interval="1" end="22" storageSize="22" storageMaxSize="22"/>
          …
      </ndeEnterpriseSchema>
  172. Product Profile XML <NPOESSDataProduct> <ProductName>ATMS Science SDR: Channels 1 through 22</ProductName> <CollectionShortName>ATMS-SDR</CollectionShortName> <DataProductID>SATMS</DataProductID> <ProductData> <DataName>ATMS SDR Product Profile</DataName> <Field> <Name>BeamTime</Name> <Dimension> <Name>Scan</Name> <GranuleBoundary>1</GranuleBoundary> <Dynamic>0</Dynamic> <MinIndex>12</MinIndex> <MaxIndex>12</MaxIndex> … </NPOESSDataProduct>
  173. Gazetteer XML
      <Gazetteer gzFeatureName="AWIPS ATLANTIC"
          gzLocationSpatial="SDO_GEOMETRY(2003, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 1), SDO_ORDINATE_ARRAY(-120.000, 75.000, 5.000, 75.000, 10.000, 75.000, 10.000, 10.000, 10.000, -50.000, 5.000, -50.000, 5.000, -50.000, -120.000, -50.000, -120.000, 10.000, -120.000, 75.000))"
          gzSourceType="" gzLocationElevationMeters="0" gzDesignation="AOI">
      </Gazetteer>
  174. Product Quality Summary Report. Requirement: “Detect and report to operators and management the acceptability of [input and output product] metadata according to configurable thresholds…” Product quality summary capture (detect): at Ingest, xDR metadata names/values are extracted and cataloged, with a framework to capture ancillary metadata if necessary; in Product Generation, production rules incorporate quality threshold conditionals; in Distribution, subscriptions incorporate quality threshold conditionals. Product quality summary (report): an online and printed by-product summary of file counts and percentages for value ranges.
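The gazetteer stores an AOI as an Oracle Spatial geometry: SDO_GTYPE 2003 is a two-dimensional polygon, and SDO_ELEM_INFO_ARRAY(1, 1003, 1) marks an exterior ring of straight-line segments starting at the first ordinate. A Python sketch of an "is this point in the AOI?" test follows. It is not how Oracle evaluates spatial queries; it simply pairs the ordinates into vertices and applies even-odd ray casting in the plane, which matches the NULL SRID (no geodetic edges). The function names are hypothetical, and the test data assumes the `-50,000` in the original slide is a typo for `-50.000`.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float]  # (x, y) = (longitude, latitude)


def ordinates_to_ring(ordinates: Sequence[float]) -> List[Vertex]:
    """Pair up an SDO_ORDINATE_ARRAY for a 2-D polygon (SDO_GTYPE 2003) into vertices."""
    if len(ordinates) % 2 != 0:
        raise ValueError("a 2-D ordinate array must have an even length")
    ring = list(zip(ordinates[0::2], ordinates[1::2]))
    if ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed (first vertex equals last)")
    return ring


def point_in_ring(x: float, y: float, ring: Sequence[Vertex]) -> bool:
    """Even-odd ray casting in the plane (a NULL SRID implies planar edges)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Horizontal and zero-length edges (such as the repeated vertex at (5, -50)) never straddle the test line, so they are skipped safely.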
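The report side of the quality summary reduces to binning one metadata value per file into ranges and reporting counts and percentages. This Python sketch shows that arithmetic. The choice of metadata field and the range edges are hypothetical; the slide specifies only that ranges and thresholds are configurable.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def quality_summary(values: Sequence[float], edges: Sequence[float]) -> List[Tuple[str, int, float]]:
    """Count values per range and give each range's share of the total as a percentage.

    `edges` must be sorted ascending; the ranges are: below the first edge,
    [edge_i, edge_i+1) for consecutive edges, and at or above the last edge.
    """
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    labels = (
        [f"< {edges[0]}"]
        + [f"[{lo}, {hi})" for lo, hi in zip(edges, edges[1:])]
        + [f">= {edges[-1]}"]
    )
    total = len(values)
    return [
        (label, count, 100.0 * count / total if total else 0.0)
        for label, count in zip(labels, counts)
    ]
```

For example, values `[5, 15, 15, 25, 95]` with edges `[10, 20, 90]` give one file below 10 (20%), two in [10, 20) (40%), one in [20, 90) (20%), and one at or above 90 (20%).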
  175. Detailed Scenarios
  176. Ingest Scenario #2 – “Old” granules: HelpDesk determines that the Old count is rising (an old granule file is more than 1 orbit, i.e., 102 minutes, old). HelpDesk ignores this only if NDE is working off a known backlog. The Failed/Old Drill-Down is reached by clicking the Old Count hyperlink. HelpDesk writes a Trouble Ticket (TT) and informs the TAL. The TAL investigates and takes corrective action.
  177. IS#2 - Old Granule Drill Down (GUI mockup: Old Granules in Incoming Input)
  178. IS#2 - TAL Actions. If “No crc file received” (potential problem with the data provider): determine product validity. Valid: write a TT and “fix” by generating the crc file. Invalid: write a TT, contact the data source, and remove the file. If “crc file processed” (potential DHS problem): rename/move the existing crc file, determine the extent of the problem, write a TT, and determine whether to Pop the Hood.
  179. Ingest Scenario #3 – Not Getting Data: HelpDesk sees the Dashboard indicating Backlog low and Throughput down; products are not being created. HelpDesk ignores this only during a known anomaly (ESPC/DDS down, indicator is red). The Backlog & Throughput Drill-Down is reached by clicking either the Backlog or the Throughput status bar. Product Granule File Counts By Orbit is reached by clicking the Critical Files Late Count or the B & T Drill-Down link. HelpDesk investigates, writes a TT, and informs the TAL. The TAL investigates and takes corrective action.
  180. IS#3 - Backlog & Throughput Drill Down (GUI mockup)
  181. IS#3 - Backlog & Throughput Drill Down (GUI mockup)
  182. IS#3 - Granule File Counts By Orbit (GUI mockup)
  183. IS#3 - TAL Actions: Using the Drill-Downs, the TAL determines which product is missing; data source status is also shown. If DDS: when Up, write a TT and contact the DDS TAL to determine why ancillary products are unavailable; when Down, write a TT and contact the DDS TAL for further info. If IDPS: write a TT and call IDPS. If NDE-PGS: DHS ‘internal’ cause; check PGS, Pop the Hood, and take corrective action.
  184. Ingest Scenario #4 – Ingest is Slow: Help Desk determines from the backlog and throughput status bars that Backlog is increasing, Throughput is decreasing, or both. Help Desk ignores this only during a known anomaly (maneuver, instrument downtime, other outage such as communications). Help Desk determines that the count of Fatal messages in the Alert Summary is nonzero (or has gone up). The Backlog & Throughput Drill-Down is reached by clicking a status bar, and Product Granule File Counts By Orbit by clicking the Critical Files Late Count or the B & T Drill-Down link (as in Scenario #3). Help Desk writes a Trouble Ticket (TT) and informs the TAL. The TAL investigates (determines scope) and takes corrective action.
  185. IS#4 - TAL Actions: Scope is determined using both Drill-Downs (which products/data sources). If no ingest resource is down (both servers green, all data sources available), check the Alert Summary log on the Dashboard. If the servers are ‘balanced’, investigate further (JBoss server logs, host resources, etc.). If the servers are ‘unbalanced’, write a TT and call the JBoss Admin.
  186. Ingest Scenario #5 – Resource Down: HelpDesk determines which resource is causing the problem via Dashboard indicators (red) and/or alert messages. HelpDesk ignores this only during a known anomaly (maneuver, instrument downtime, other outage such as communications). Depending on the resource, HelpDesk writes a TT and notifies the TAL and/or the SME for that resource.
  187. PGS Scenarios – Background: Help Desk is watching the Dashboard. Indicators show the status of the PGS factory servers (2) and the PGS processors (n). Annotated status bars show Backlog and Throughput. Available counts: Failures (non-zero return code); Excessive CPU (job exceeds CPU threshold); Old (job remains in Queued state); Expired (Job Spec exceeds ‘wait for input’ time). Drill-Downs: Product Rule Summary (status/job counts summarized by product type); Job/Job Spec (specific details of Jobs and Job Specs); Product Coverage (product file indicator by coverage time).
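The “Old” rule in this scenario is a simple age threshold of one orbit. A Python sketch of that check over an incoming directory follows. It assumes that a file's modification time is an acceptable stand-in for its arrival time, which may not match how NDE records arrival; the function names are hypothetical.

```python
import os
from datetime import datetime, timedelta
from typing import Dict, List

ORBIT = timedelta(minutes=102)  # one orbit, per the scenario above


def arrival_times(directory: str) -> Dict[str, datetime]:
    """Use each file's modification time as a stand-in for its arrival time (an assumption)."""
    return {
        entry.name: datetime.fromtimestamp(entry.stat().st_mtime)
        for entry in os.scandir(directory)
        if entry.is_file()
    }


def old_granules(arrivals: Dict[str, datetime], now: datetime, limit: timedelta = ORBIT) -> List[str]:
    """Names of files waiting longer than `limit`, oldest first."""
    old = [(arrived, name) for name, arrived in arrivals.items() if now - arrived > limit]
    return [name for _, name in sorted(old)]
```

Sorting oldest first puts the most overdue granules at the top, which is where a drill-down would want them.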
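The four dashboard counts listed above each correspond to a simple predicate over jobs or job specs. The following Python sketch computes them. The `Job` and `JobSpec` shapes, the state names, the CPU threshold, and the “old” age limit are all hypothetical; the slide gives only the definition of each count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


@dataclass
class Job:
    state: str                       # hypothetical values, e.g. "QUEUED", "RUNNING", "FAILED"
    queued_at: datetime
    return_code: Optional[int] = None
    cpu_seconds: float = 0.0


@dataclass
class JobSpec:
    created_at: datetime
    wait_for_input: timedelta        # the rule's wait-for-input interval
    inputs_complete: bool = False


def dashboard_counts(jobs: Iterable[Job], specs: Iterable[JobSpec], now: datetime,
                     cpu_threshold_s: float, old_after: timedelta) -> Dict[str, int]:
    counts = {"failures": 0, "excessive_cpu": 0, "old": 0, "expired": 0}
    for job in jobs:
        if job.return_code not in (None, 0):                           # non-zero return code
            counts["failures"] += 1
        if job.cpu_seconds > cpu_threshold_s:                          # exceeds CPU threshold
            counts["excessive_cpu"] += 1
        if job.state == "QUEUED" and now - job.queued_at > old_after:  # stuck in Queued
            counts["old"] += 1
    for spec in specs:
        # Still waiting for input after the wait-for-input interval has passed.
        if not spec.inputs_complete and now - spec.created_at > spec.wait_for_input:
            counts["expired"] += 1
    return counts
```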
  188. PGS Scenarios (Placeholder to repeat the Dashboard slide)
  189. PGS Scenario #1 – NUP/TP Not Created: Help Desk notices an anomaly: a resource is not available; the Backlog status bar is increasing (the Throughput status bar may be decreasing); Job Specs are incomplete (waiting for an input file); Jobs are Queued (awaiting execution); or any of the available counts are increasing. Help Desk writes a TT and informs the TAL. The Production Rule Summary Drill-Down is reached by clicking the Backlog/Throughput status bar(s). The TAL investigates and takes corrective action.
  190. PS#1 - Production Rule DD (GUI mockup: Production Rule Status Summary)
  191. PS#1 - Production Rule DD (GUI mockup: Product Generation Status Summary, with links to the Job Box Configuration Drill Down, to the PGS Job/Job Spec Drill Down filtered on Production Rule and Status, and to the Production Rule Definition Drill Down)
  192. PS#1 - PGS Job/Job Spec DD (GUI mockup: PGS Job Spec & Job Status)
  193. PS#1 - TAL Actions: If a PGS resource is down, navigate to the Resource Control Panel Drill-Down and fix it. Determine the extent of the problem (single product or all products). If a single product, see Scenario #3. If PGS is operating slowly, see Scenario #2. If all products, check Ingest.
  194. PGS Scenario #2 – PGS is too Slow: Help Desk determines that Backlog is increasing/Throughput decreasing, writes a TT, and informs the TAL. The TAL investigates and takes corrective action. To determine which products are not being made, use the Check Chart drill-down. The Production Rule Summary Drill-Down is reached by clicking the Backlog/Throughput status bar(s).
  195. PS#2 - Product Coverage DD (GUI mockup: Product Coverage)
  196. PS#2 - Production Rule DD (GUI mockup: Production Rule Status Summary)
  197. PS#2 – PGS Job/Job Spec DD (GUI mockup: PGS Job Spec & Job Status)
  198. PS#2 - PGS Job/Job Spec Details (GUI mockup: Job Spec & Job Details modal window over PGS Job Spec & Job Status, with Update Priority, Requeue Job, Terminate, and Expire Job actions; example Alg Process ID: 56778)
  199. PS#2 - TAL Actions: Correct jobs using excessive CPU time (which slows everything else down): terminate the particular jobs using the Job Spec/Job Details page, then retry the terminated jobs, monitor them, and update the TT. Determine whether enough job boxes are available (Update Job Box Configuration Drill-Down).
  200. PGS Scenario #3 – Some Products Not Created: Help Desk sees the Expired or Old counts increasing, or is contacted by a customer; Help Desk writes a TT and informs the TAL. The TAL investigates and takes corrective action. The Check Chart is reached from the menu; the Job/Job Spec Drill-Down by clicking the Expired/Old counts; the Production Rule Summary Drill-Down by clicking the Backlog/Throughput status bar(s); and the Production Rule Details by clicking the Production Rule Name.
  201. PS#3 - PGS Product Coverage DD (GUI mockup: Product Coverage)
  202. PS#3 - PGS Job/Job Spec DD (GUI mockup: PGS Job Spec & Job Status)
  203. PS#3 - Production Rule DD (GUI mockup: Production Rule Status Summary)
  204. PS#3 - TAL Actions: Determine whether Job Specs are expiring or waiting for input: click the Expired count to determine the rule; click the Product Coverage Drill-Down to determine which input product is missing (the reason for expiration); check Ingest for the input product's arrival; update the TT. Determine whether the Production Rule is active: click the Production Rule Name to display the Production Rule Details; if the rule is inactive, update it and update the TT.
  205. PGS Scenario #4 – Jobs Failing/Not Picked Up: Help Desk sees the Failed or Old counts, or the Backlog status bar, increasing; Help Desk writes a TT and informs the TAL. The TAL investigates and takes corrective action. The Job/Job Spec Drill-Down is reached by clicking the Failed/Old counts, and the Production Rule Summary Drill-Down by clicking the Backlog/Throughput status bar(s). Check the Log/Alerts Summary for processing node device utilization.
  206. PS#4 - PGS Job/Job Spec DD (GUI mockup: PGS Job Spec & Job Status)
  207. PS#4 - PGS Job/Job Spec Details (GUI mockup: Job Spec & Job Details modal window over PGS Job Spec & Job Status, with Update Priority, Requeue Job, Terminate, and Expire Job actions; example Alg Process ID: 56778)
  208. PS#4 - TAL Actions: Determine the type/extent of failures from the Production Rule Summary Drill-Down. Determine job error code details by clicking a specific Job ID. Pop the Hood (examine forensics, logs, etc.) and update the TT. If processing node device alerts show out of space, clean up old data in the forensics directories to allow processing, and update the TT. Determine why Jobs are not getting picked up (they stay in the QUEUED state), and whether enough job boxes are available (Update Job Box Configuration Drill-Down).
  209. Operations – Is something wrong? Distribution: Distribution resources are down; Customer not getting data (push or pull); Customer not being notified; Distribution throughput is low
  210. Distribution Scenarios (Placeholder to repeat the dashboard slide for visual reference)
  211. DIS Scenario #1 - Resources Down: The operator notices one or more DIS resources shown red on the resource status indicators, or a resource-related alert has been triggered; the operator writes a TT and informs the TAL. Throughput may not be affected. The TAL resolves a red indicator as follows, per the indicator: Factory: restart. Processor node: restart. ftps pull server: notify the Sys Admin and update the TT. Customer push server: contact the customer to resolve the connection issue, involving the Network Administrator if needed. To resolve a resource status alert, the TAL will Pop the Hood.
  212. DIS Scenario #2 – Customer Not Getting Data: A customer calls and informs the operator/Help Desk that data is not being received. The operator/TAL initiates one or more of the following to resolve the problem. The operator checks the status of the Distribution Resources (see Scenario #1). The TAL checks whether Distribution Jobs/Requests are failing and/or not being created: the TAL drills down to Subscription Coverage and examines it; if Jobs/Requests fail, the TAL investigates and attempts to resolve the failures; if Jobs/Requests are absent, the TAL checks Ingest/PGS coverage and, if coverage is OK, investigates further. The TAL checks the Subscription's active flag and job boxes: the TAL drills down to Subscription Details and sets the Active flag (if deactivated), and checks the Job Boxes on the Resource Control Panel.
  213. DS#2 - Distribution Summary (GUI mockup)
  214. DS#2 - Distribution Summary (GUI mockup, with links to the Subscription Coverage Drill Down and to the Distribution Request Drill Down filtered on the selected status)
  215. DS#2 - Distribution Request (GUI mockup)
  216. DS#2 - Distribution Request (GUI mockup, with links to the Subscription Drill Down and to the Distribution Request Details modal window)
  217. DS#2 - Distribution Request (GUI mockup)
  218. DS#2 - Distribution Request (GUI mockup, with a link to the Notification Drill Down)
  219. DS#2 - Distribution Request (GUI mockup: Distribution Request Details modal window with Cancel and Save buttons; statuses shown include COMPLETE and FAILED)
  220. DS#2 - Subscription Coverage (GUI mockup: NCO_OPS Subscription Coverage)
  221. DS#2 - Subscription Coverage (GUI mockup: NCO_OPS Subscription Coverage, with links to the Gap Checker Drill Down filtered on that product)
  222. DS#2 - Customer Data GUIs (GUI mockup: Subscription Definitions, with Reset and Save buttons)
  223. DIS Scenario #3 – Customer Not Notified: A customer calls and informs the operator/Help Desk that data delivery notifications are not being received. The operator/TAL performs one or more of the following to resolve the problem. The operator checks the Notification Processors' status from the Distribution Resource indicator, or sees that the Customer Notification Server indicator is red (see Scenario #1). The operator checks whether notification is specified for the Subscription; if it is not, the TAL sets the Subscription's notification mechanism. The TAL checks whether the products are being created/delivered in the first place (see Scenario #2). The TAL drills down to that customer's Distribution Job details, checks the Notification Job Status, and investigates any Notification Job failures and attempts to resolve them.
  224. DS#3 - Distribution Request (GUI mockup, with a link to the Notification Drill Down)
  225. DS#3 - Notification (GUI mockup: Notification Jobs)
  226. DS#3 - Notification (GUI mockup: Notification Job Details modal window with Cancel and Save buttons; status FAILED)
  227. DIS Scenario #4 - Low Throughput: The operator notices low DIS throughput on the status bars and informs the TAL, who performs one or more of the following to resolve the problem. Examine whether Ingest/PGS throughput is low (see the appropriate Ingest/PGS scenarios). Drill down to Subscriptions and ensure their active flags are set correctly (Jobs are never created for inactive Subscriptions). Drill down to Distribution Job Details and examine the Distribution Request details to see whether the Distribution factory is slow in completing Distribution Requests/Jobs; investigate and attempt to resolve the problem, informing the Sys/Network Admin if the investigation suggests a network issue. Check the Distribution processors' status on the Resource Status indicators and in the alerts on the Log/Alerts Summary portion of the Dashboard; if they are down, investigate and attempt to bring them up (as in Scenario #1).
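The note that jobs are never created for inactive Subscriptions is why checking the active flag is a standard troubleshooting step. A short Python sketch of that request-creation rule follows. The `Subscription` fields and the `distribution_requests` function are hypothetical; real subscriptions also carry host definitions, notification settings, and quality thresholds that are omitted here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Subscription:
    subscription_id: str
    customer: str
    product: str       # product short name, e.g. "VIIRS-M5TH-EDR"
    active: bool


def distribution_requests(product: str, filename: str,
                          subscriptions: Iterable[Subscription]) -> List[Tuple[str, str]]:
    """One (subscription_id, filename) request per active subscription to the product.

    Inactive subscriptions are skipped, so no job is ever created for them.
    """
    return [
        (s.subscription_id, filename)
        for s in subscriptions
        if s.active and s.product == product
    ]
```

A deactivated subscription therefore produces no failed jobs to investigate, only an absence of jobs, which is easy to mistake for an upstream Ingest/PGS problem.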
  228. Management Report Detail
  229. Latency Data Collected
  230. Availability Data