
The IBM Accelerator for Social Data Analytics

Raghuram Velega (rvelega@us.ibm.com), Vijay Bommireddipalli (vijayrb@us.ibm.com). Topics: Introduction, Architecture, Conceptual Diagram, Install, Pre-requisites, Post-install tasks, Running the Accelerator.


Presentation Transcript


  1. The IBM Accelerator for Social Data Analytics Raghuram Velega rvelega@us.ibm.com Vijay Bommireddipalli vijayrb@us.ibm.com

  2. IBM Accelerator for Social Data Analytics • Introduction • Architecture • Conceptual Diagram • Install • Pre-requisites • Post install tasks • Running the Accelerator • Inputs required • Outputs expected • Demo • Troubleshooting

  3. Big Data Accelerators - Summary
     • Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
     • Provide business logic, data processing, and UI/visualization, tailored for a given use case
     • Bundled with Big Data platform components: InfoSphere BigInsights and InfoSphere Streams
     Key Benefits
     • Time to value
     • Leverage best practices around implementation of a given use case
     [Diagram labels: IBM Big Data Platform; Analytic Applications; BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics; Visualization & Discovery; Applications & Development; Systems Management; Accelerators; Contextual Search; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance; Cloud | Mobile | Security]

  4. Big Data Accelerator Types
     • Analytic Accelerators
       • Address specific data types or operations (e.g. Text Analytics, Geo Spatial, etc.)
     • Application Accelerators
       • Address specific applications (example: Log Analysis)
       • Address specific industry functions (example: Cyber Security, Health care, etc.)
       • Address use cases or specific business processes (example: Customer insights from Social Media)
       • Can be industry-specific or cross-industry
     [Diagram labels: IBM Big Data Platform; Analytic Applications; BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics; Visualization & Discovery; Applications & Development; Systems Management; Accelerators; Contextual Search; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance; Cloud | Mobile | Security]

  5. Application Accelerators Improve Time to Value
     • Telecommunications: CDR streaming analytics, deep customer event analytics
     • Social Data Analytics: sentiment analytics, intent to purchase
     • Machine Data Analytics: operational data, including logs, for operations efficiency

  6. Social Data Analytics - Using social media as a rich source of information
     Sample posts (annotated on the slide with the labels Behavior, Interest, Location, Age, Consumption, Intent to consume, and Prediction):
     • Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts http://ity.com/wfUsir
     • I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http://4sq.com/gbsaYR
     • @silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;) btw happy birthday Sylvia ;)
     • @silliesylvia I <3 your leather leggings!! Its so katniss!!
     • dear redbox please have kings speech for my new tv colin firth movie marathon
     • @bamagirl can't wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing
     • @silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey
     • OMG OMG. just dropped my new ipad3 crappola!!!
     360 degree profile
     Personal Attributes
     • Sylvia Campbell, Female, In a Relationship
     • 32 years old, birthday on 7/17
     • Lives near Raleigh, NC
     • College graduate; Income of 80-120k
     Buzz/Sentiment
     • Retweets BF’s comments
     • Interest in BBC shows: Downton Abbey, Sherlock, Fringe, (P&P?)
     • Sherlock Holmes, Robert Downey, Jr.
     • Hunger Games, Katniss/J. Lawrence
     Interests/Behavior
     • Watch movies, tv shows
     • Romance plots, “hero types”, strong women
     • Uses iPad 3, Redbox, Hulu
     • Shopping, interest in sales/deals
     • Duke/UNC basketball

  7. Social Data Analytics - Comprehensive Entity Extraction and Integration
     Example profiles shown on the slide (all names are fictitious):
     • Name: Jane Doe; Id: jaydee; Address: Home of the Buccaneers; Interests: running, yoga, football…
     • Name: Jane Doe; Address: Tampa, FL; Twitter: jaydee; Blog Topic: food; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
     • Name: Jane Doe, Cava; Address: Tampa, Fl; Twitter: @maryguida; Blog Topic: politics; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
     • Name: J Doe; Blog Topic: food
     • Name: jane; Address: Tampa, FL; Relationships: Tony C (brother)., …
     Entity Integration Challenges:
     • Scale: thousands of sites, hundreds of millions of users
     • Complex matching decisions
     • Partial, noisy and incomplete profile attributes
     • Only 3% of consumers have sufficient attribute information in their profiles

  8. Social Media Analytics Architecture
     Diagram components, grouped by flow:
     • Input: Social Media Data; Data Ingest and Prep
     • Online flow (data-in-motion analysis): Stream Computing and Analytics; Entity Analytics: Profile Resolution; Extract Buzz, Intent, Sentiment; Social Media Dashboard with pre-defined views and charts
     • Offline flow (data-at-rest analysis): BigInsights System and Analytics; Entity Analytics and Integration; Extract Buzz, Intent, Sentiment and Consumer Profiles; Comprehensive Social Media Customer Profiles; pre-defined Workbooks and Dashboards; ad hoc access
     • Optional: Indexed Search (Data Explorer; index using Push API)

  9. Conceptual Flow • As of GA (general availability), the conceptual diagram is available in the Information Center and from the BigInsights Web console

  10. Install – Pre-requisites
      • Install InfoSphere BigInsights Version 2
      • Install InfoSphere Streams Version 3
      • Optional: to use the Data Explorer Indexing application in InfoSphere BigInsights after the data analysis, ensure that you have access to the IBM Data Explorer product
      • See the "Planning to install" topic in the Information Center for the full list (e.g. install the perl-Time-HiRes RPM package)
      • BigInsights and Streams must be operational (a Streams instance must exist)

  11. Install

  12. Install – Troubleshooting
      • Log file: sda_install_debug.log
      • Note: until the install finishes, logs are under the installer’s home directory
      • After the user clicks ‘Done’, logs are copied to /opt/ibm/accelerators/SDA/logs/install
      • Make sure the Console logon password for the BigInsights administrator was specified (not the OS logon password)
      • More under security considerations <SDA_HOME>

  13. Post Install
      Follow the “Configuring IBM Accelerator for Social Data Analytics” topic in the Install and configuration section of the Information Center.
      • Update the Java™ heap size and other Hadoop parameters in mapred-site.xml (platform defaults do not work well for SDA)
      • Update GNIP using http://console.gnip.com
      • Optional: Prepare real-time visualization
      • Optional: Set up Data Explorer Indexing
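The mapred-site.xml update above can be scripted. The following is a minimal sketch, assuming the standard Hadoop `<configuration>`/`<property>` layout. The property names `mapred.child.java.opts` and `mapred.reduce.tasks` are Hadoop 1.x settings chosen for illustration, and the values are placeholders, not the values recommended in the Information Center.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Set the <property> called `name` to `value`, creating it if absent."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def get_property(root, name):
    """Return the value of the <property> called `name`, or None."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Stand-in for the contents of mapred-site.xml (illustrative only).
SAMPLE = """<configuration>
  <property><name>mapred.child.java.opts</name><value>-Xmx200m</value></property>
</configuration>"""

root = ET.fromstring(SAMPLE)
set_property(root, "mapred.child.java.opts", "-Xmx1024m")  # placeholder heap size
set_property(root, "mapred.reduce.tasks", "8")             # placeholder reducer count
print(ET.tostring(root, encoding="unicode"))
```

Against a real cluster, the same functions could be applied to a tree loaded with `ET.parse(path)` and saved with `tree.write(path)`, after backing up the original file.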

  14. Uninstall
      • Location: /opt/ibm/accelerators/SDA/uninstall
      • Run: [biadmin@hdtest121 uninstall]$ ./uninstall
      • Provide the username / password of biadmin (console password)
      • Uninstalled state
        • Apps are removed
        • Sample config files are removed from HDFS
        • Flows on Streams are removed
        • Files are removed from the local file system where SDA is installed
        • Installed files are removed from the Streams node local file system

  15. SDA Applications
      • Import applications
      • Brand Management: Retail, Finance, Media and Entertainment
      • Lead Generation: Retail, Finance
      • Generic: starting point for all others
      • Data Explorer Indexing app
      Tip: Use tree view to navigate

  16. Example sequence of running SDA for the Brand Management Retail scenario
      (1) Import data with appropriate keywords (Decahose + Powertrack + Boardreader blogs + forums)
      (2) Provide inputs for SDA using the right schema for retail
      (3) Pick the appropriate applications, e.g. Brand Management Retail
      (4) Run Configuration, Local Analysis, and Global Analysis
      (5) View visualizations using Dashboards and Collections

  17. SDA Import Applications
      • Data Import
        • Gnip for Tweets
        • Boardreader: Boards, Blogs
      • Input/Considerations
        • Credentials to connect
        • Keywords (CSV file)
        • Date ranges
      • Troubleshooting / Common mistakes
        • Make sure no existing jobs are running; if previous jobs are still running, cancel them first

  18. Configuration Application • Takes user input to generate Text Analytics rules (TAMs) for running analysis • Specify Objects to be analyzed • Specify Alias objects • Specify Positive and Negative filters • Remember: Everything is tied together by <scenario name> • Note: Current version limitation: the scenario name cannot contain spaces
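Because every application is keyed by the scenario name and the current version rejects spaces, it can help to check the name before launching anything. The following is a minimal sketch; `validate_scenario_name` is a hypothetical helper, not part of SDA, and it enforces only the no-whitespace limitation stated above.

```python
def validate_scenario_name(name):
    """Return `name` unchanged if it is usable as a scenario name.

    Hypothetical pre-flight check: rejects empty names and names
    containing any whitespace, per the limitation noted on the slide.
    """
    if not name:
        raise ValueError("scenario name must not be empty")
    if any(ch.isspace() for ch in name):
        raise ValueError("scenario name cannot contain spaces: %r" % name)
    return name


print(validate_scenario_name("brand_retail"))
```

A name such as "brand retail" raises `ValueError`; replacing the space with an underscore is one way to keep the name readable.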

  19. Configuration Application - continued

  20. SDA BigInsights Offline flow Applications: Local Analysis • The Local Analysis application runs all the extractors that were compiled by the Configuration application on one tweet, board, or blog at a time. • Remember: Everything is tied together by <scenario name>

  21. SDA BigInsights Offline flow Applications: Global Analysis • The Global Analysis application performs cross-document analysis to extract meaningful insights and builds consumer profiles from the context accumulated across multiple Local Analysis runs. • Remember: Everything is tied together by <scenario name>
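The split between Local Analysis (one document at a time) and Global Analysis (across documents) can be illustrated with a toy sketch. This is not how SDA's extractors work internally: simple keyword matching stands in for the compiled Text Analytics rules, and the field names `author` and `text` are assumptions made for the example.

```python
from collections import defaultdict


def local_analysis(doc, keywords):
    """Per-document step: return (author, keyword) mentions found in one document."""
    text = doc["text"].lower()
    return [(doc["author"], kw) for kw in keywords if kw in text]


def global_analysis(local_results):
    """Cross-document step: accumulate mentions into one profile per author."""
    profiles = defaultdict(lambda: defaultdict(int))
    for mentions in local_results:
        for author, keyword in mentions:
            profiles[author][keyword] += 1
    return {author: dict(counts) for author, counts in profiles.items()}


# Fictitious documents for illustration only.
docs = [
    {"author": "jaydee", "text": "Can't wait to watch Sherlock tonight"},
    {"author": "jaydee", "text": "Sherlock marathon on my new iPad"},
    {"author": "tonyc", "text": "Dropped my iPad"},
]
keywords = ["sherlock", "ipad"]

local_results = [local_analysis(doc, keywords) for doc in docs]
profiles = global_analysis(local_results)
print(profiles)
```

The per-document step needs no shared state, while the aggregation step needs the combined output of every local run, which mirrors the ordering of the two applications.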

  22. SDA BigInsights Application chaining • Chained applications are provided for convenience

  23. SDA Offline Applications – Troubleshooting / usual suspects
      • Local Analysis: specify the correct date range and time stamp; otherwise the run may end without any Local Analysis output
      • Global Analysis: make sure to ‘Update WorkBook’. Select the workbook/collection to be updated and then click the Add Row button (users often forget to click Add Row even after selecting the workbooks); otherwise there is no output
      • Out of Memory: make sure the post-install recommendations are followed and that the number of reducers is changed from the default
      • Out of Memory (right at the start): Technote to follow
      • Limitations: spaces in the scenario name are not allowed

  24. SDA Real-time Application • Note: At least one run of Global Analysis per scenario must be complete before running the Real-time application. • Feedback is captured in real time as data streams in. • Remember: Everything is tied together by <scenario name>

  25. SDA Data Explorer Indexing • Currently designed only to work with SDA output • Input directory is output of Global Analysis application • Only available if Data Explorer is installed

  26. SDA Outputs • Pre-defined Workbooks • Dashboards • LA/GA CSV outputs • ‘Understanding the output schema’ in Information Center

  27. Application Dependency
      • Within a scenario, the Local Analysis, Global Analysis, and Real-time Analysis apps depend on the Configuration app.
      • Implicit locking of dependent applications is implemented in HDFS; it prevents the Configuration app from running while any dependent app is running for the same scenario.
      • A lock error shows up on the console: [Could not get needed locks]
      • Removal of locks
        • Local Analysis and Global Analysis apps: stop them explicitly from the BigInsights Console
        • Real-time analysis: stop it explicitly from the BigInsights Console; if that fails, stop the job from the command line using the streamtool utility in Streams
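The dependency rule above can be modeled in a few lines. This is an in-memory sketch only: the actual locks live in HDFS and their layout is not described in this deck, so the class and method names below are hypothetical. Only the error text is taken from the slide.

```python
class ScenarioLocks:
    """Toy model of per-scenario locking between the Configuration app
    and its dependent apps (Local, Global, and Real-time Analysis)."""

    def __init__(self):
        self._running = {}  # scenario name -> set of running dependent apps

    def start_dependent(self, scenario, app):
        self._running.setdefault(scenario, set()).add(app)

    def stop_dependent(self, scenario, app):
        self._running.get(scenario, set()).discard(app)

    def run_configuration(self, scenario):
        # The Configuration app is blocked while any dependent app is
        # running for the same scenario; other scenarios are unaffected.
        if self._running.get(scenario):
            raise RuntimeError("Could not get needed locks")
        return True


locks = ScenarioLocks()
locks.start_dependent("brand_retail", "Local Analysis")
try:
    locks.run_configuration("brand_retail")
except RuntimeError as err:
    print(err)
locks.stop_dependent("brand_retail", "Local Analysis")
print(locks.run_configuration("brand_retail"))
```

Stopping the dependent app is what releases the lock in this model, which matches the removal procedure listed on the slide.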

  28. SDA Install, configuring and running in secure mode (1)
      • Pre-install
        • The BigInsights Console admin needs to be a member of the BigInsightsApplicationAdmin role, because the SDA installer deploys apps to the BigInsights cluster using the secure REST API
        • Error if not done: app deployment fails, so the installer fails
      • Install
        • The BigInsights Console admin user / password must be provided as part of the installation input, because SDA applications are deployed to the BigInsights cluster using the BigInsights secure REST APIs
        • Error if not done: the install fails

  29. SDA Install, configuring and running in secure mode (2)
      • Post-install
        • Download the sample bi.properties file from /user/applications/SDA/sample_bi.properties
        • Specify the BigInsights admin username/password in the file and copy it to the BigInsights admin user's public credstore location as bi.properties. For example, if the BigInsights admin user is "biadmin", the location of the properties file on DFS is /user/biadmin/credstore/public/bi.properties
        • Error if not done: the Configuration app fails to execute
      • Uninstall
        • The BigInsights Console admin user / password must be provided as part of the uninstall input, because SDA applications are undeployed from the BigInsights cluster using the BigInsights secure REST APIs
        • Error if not done: undeployment of apps fails, so the uninstall fails

  30. SDA Install, configuring and running in secure mode (3)
      • Running SDA apps as a non-admin BigInsights user
        • The BigInsights Application Admin needs to grant the non-admin user permission to execute the apps; refer to the BigInsights Application security documentation
        • Error if not done: the non-admin user cannot see the SDA apps in the BigInsights Console
        • Make sure the non-admin user has write privileges to /accelerators/SDA/*.* on the Distributed File System
        • Error if not done: the Configuration app fails to execute
        • Set up passwordless SSH between the Streams master node and all BigInsights nodes for the non-admin user
        • Error if not done: the Import apps and the Real-time app fail to execute

  31. SDA Install, configuring and running in secure mode (4)
      • Miscellaneous possible errors
        • The BigInsights admin user forgets to create the bi.properties file, runs SDA applications, and encounters an error (missing bi.properties file)
        • The BigInsights username/password is specified incorrectly in the bi.properties file
        • The BigInsights admin user creates the bi.properties file in the private folder instead of the public one, which prevents other users from accessing the file for validation
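Several of the errors above come down to bi.properties being in the wrong place. The following is a minimal sketch of a location check, built from the example path given in the deck (/user/biadmin/credstore/public/bi.properties). Both function names are hypothetical, and the assumption that the private folder is a directory named `private` is mine, not taken from the slides.

```python
import posixpath


def public_bi_properties_path(admin_user):
    """Expected DFS location of bi.properties for the given admin user."""
    return posixpath.join("/user", admin_user, "credstore", "public", "bi.properties")


def check_bi_properties_location(actual_path, admin_user):
    """Return 'ok' or a short hint describing what looks wrong."""
    expected = public_bi_properties_path(admin_user)
    if actual_path == expected:
        return "ok"
    # Assumption: the private folder appears as a path component named "private".
    if "private" in actual_path.split("/"):
        return "file is in the private folder; move it to " + expected
    return "unexpected location; expected " + expected


print(public_bi_properties_path("biadmin"))
print(check_bi_properties_location("/user/biadmin/credstore/private/bi.properties", "biadmin"))
```

This checks only the path; it does not validate the username/password inside the file, which is the other failure mode listed above.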

  32. SDA 1.1
      • Social Media Sources Supported
        • Gnip, Boardreader
        • Tweets, Boards, Blogs
      • Streaming data as well as data at rest
        • Streams for processing of streaming data
        • BigInsights/Hadoop for input, output and configuration data
      • Key Micro-segmentation Attributes (out-of-box)
        • Personal Info: Gender, Location, Parental status, Marital status, Employment
        • Interests: Movie interest, Comic book fan, Product interest, Current customer of, Products owned
      • Entity resolution across the different social media sources

  33. SDA 1.1
      • Outputs/Measures (out-of-box)
        • Buzz
        • Sentiment
        • Intent to buy/start service
        • Intent to attend/see
      • Example use cases
        • Retail: Lead generation, Brand management
        • Financial: Lead generation, Brand management
        • Media & Entertainment: Brand management
        • Generic
      • Visualization using BigSheets
      • Extendable/Customizable Solution

  34. Available documentation • This deck • Information Center • SDA IOD lab

  35. Thank You!
