
Search Stack Secrets

Search Stack Secrets. Ryan Gehring - Indiegogo. Practical Search for Rubyists. Elasticsearch / SOLR / alternatives roundup. Essential plugins you need to install today. Semi SOA search design. Schemaless is for amateurs! Mappings = friend. Problem solving with analyzers.


Presentation Transcript


  1. Search Stack Secrets. Ryan Gehring, Indiegogo.

  2. Practical Search for Rubyists. Elasticsearch / SOLR / alternatives roundup. Essential plugins you need to install today. Semi-SOA search design. Schemaless is for amateurs! Mappings = friend. Problem solving with analyzers. Avoiding the Tire DSL: the ingredients of a JSON query.

  3. Elasticsearch vs. SOLR vs. … Horizontal scalability. Great API. Developer support (analyzers, etc.). Downside: a slightly less great Ruby client.

  4. Awesome Plugins
     • elasticsearch-head: a web front end for an ElasticSearch cluster. http://mobz.github.com/elasticsearch-head
     • ElasticSearch Paramedic: a simple yet sexy tool to monitor and inspect ElasticSearch clusters.
     • Elasticsearch JDBC river: https://github.com/jprante/elasticsearch-river-jdbc

  5. One solid, service-y, Rails 4-approved design
     • A web form in the view supplies GET parameters and submits to a search controller.
     • The search controller whitelists the permitted parameters via strong parameters and instantiates a search object.
     • The search model translates the parameters into a query, using either Tire (the Ruby client) or raw JSON.
     • The query is fired and the results are served!
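The flow on this slide can be sketched without Rails at all. The following is a minimal illustration, not the speaker's code: the class name `ProjectSearch`, the parameter names `q` and `page`, and the page size are all assumptions, and a plain key whitelist stands in for Rails strong parameters.

```ruby
require 'json'

# Hypothetical search object: turns whitelisted GET parameters into a JSON query body.
class ProjectSearch
  PERMITTED = %w[q page].freeze # assumed parameter names
  PER_PAGE = 10                 # assumed page size

  def initialize(params)
    # Stand-in for strong parameters: keep only the permitted keys.
    @params = params.each_with_object({}) do |(key, value), kept|
      kept[key.to_s] = value if PERMITTED.include?(key.to_s)
    end
  end

  # Builds the request body as a plain Hash, with no client DSL involved.
  def to_query
    page = [@params.fetch('page', 1).to_i, 1].max
    {
      'query' => { 'query_string' => { 'query' => @params.fetch('q', '*') } },
      'from' => (page - 1) * PER_PAGE,
      'size' => PER_PAGE
    }
  end
end

search = ProjectSearch.new('q' => 'solar charger', 'page' => '3', 'admin' => 'true')
puts JSON.pretty_generate(search.to_query)
```

Keeping the query construction in one object means the controller never needs to know how the query is scored, which is what lets the scoring model change later without touching the rest of the app.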

  6. Mappings + Analyzers: Ingredients for Success! Elasticsearch is schemaless by default, but you can optimize by providing a schema (a mapping): which fields to index, and how to analyze and tokenize each field. These analyzers help a lot!
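As a rough illustration of an explicit mapping, the sketch below builds one as a Ruby Hash and prints the JSON. The type name `project` and every field name are made up for the example, and the `string` type with `index: not_analyzed` / `index: no` reflects the Elasticsearch versions of this talk's era; later versions use different type names.

```ruby
require 'json'

# Hypothetical mapping: says which fields are indexed and how each is analyzed.
MAPPING = {
  'project' => { # assumed type name
    'properties' => {
      # Full-text field, stemmed and lowercased by the built-in snowball analyzer.
      'title' => { 'type' => 'string', 'analyzer' => 'snowball' },
      # Exact-match field, stored as a single untokenized term for filtering.
      'category' => { 'type' => 'string', 'index' => 'not_analyzed' },
      # Field kept in the document source but not searchable at all.
      'internal_notes' => { 'type' => 'string', 'index' => 'no' }
    }
  }
}.freeze

puts JSON.pretty_generate(MAPPING)
```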

  7. Problem solving with analyzers
     • My search isn’t robust to misspellings!
        • N-gram
        • Edge n-gram
     • My search isn’t robust to plurals / caps / whitespace / etc.
        • Snowball (standard + lowercase + some English-language stemming + stopwording)
     • I can only solve one of these at once!
        • Multi-field analysis.
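To see why n-grams help with misspellings, it can be useful to generate them by hand. The sketch below is plain Ruby that imitates what an n-gram and an edge n-gram token filter emit for one token; it is not Elasticsearch code, and the helper names and gram sizes are chosen for the example.

```ruby
# All substrings of length min..max, as an n-gram filter would emit them.
def ngrams(token, min, max)
  (min..max).flat_map do |n|
    (0..token.length - n).map { |start| token[start, n] }
  end
end

# Only the prefixes of length min..max, as an edge n-gram filter would emit them.
def edge_ngrams(token, min, max)
  (min..[max, token.length].min).map { |n| token[0, n] }
end

p ngrams('drone', 2, 3)
p edge_ngrams('drone', 1, 3)

# A misspelled query still shares at least one gram with the indexed token,
# so it can match where an exact-term lookup would not.
p ngrams('drone', 2, 3) & ngrams('dorne', 2, 3)
```

Edge n-grams only cover prefixes, which suits search-as-you-type; full n-grams tolerate errors anywhere in the word at the cost of a larger index.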

  8. Problem solving with boosts
     • Boosts are a concept from Lucene; they are multipliers on scores.
     • You can set the relative importance of matching fields. Example: title -> 10 vs. free_text -> 1.
     • You can set the relative importance of matching on ANALYZED fields. Example: ngram_title -> 6, snowball_title -> 10.
     • Bonus for fields with exact token matches.
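One way to express these boosts in raw JSON is the `field^boost` syntax accepted by the `query_string` query's `fields` list. The sketch below reuses the field names and weights from the slide; the helper name `boosted_query` is an assumption, and the fields are assumed to exist in the mapping.

```ruby
require 'json'

# Per-field score multipliers, taken from the examples on the slide.
BOOSTS = { 'snowball_title' => 10, 'ngram_title' => 6, 'free_text' => 1 }.freeze

# Hypothetical helper: builds a query_string query that searches every
# boosted field, weighting each one with the "field^boost" syntax.
def boosted_query(text, boosts)
  {
    'query' => {
      'query_string' => {
        'query' => text,
        'fields' => boosts.map { |field, boost| "#{field}^#{boost}" }
      }
    }
  }
end

puts JSON.pretty_generate(boosted_query('solar charger', BOOSTS))
```

Because the boosts live in one Hash, they can later be replaced by learned coefficients without changing the query-building code.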

  9. Key queries in Elasticsearch
     • Filtered Query: apply binary filters to an arbitrary query; try it with the query_string query type for full-text, analyzed search queries plus filters.
     • Custom Score Query: provide the exact equation for scoring. You can take mathematical transforms of variables using MVEL, or even Python with the right plugin.
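The two query types can be combined by wrapping a filtered query in a custom score query. The sketch below builds that body as Ruby Hashes; the helper names, the `category` and `popularity` fields, and the `weight` parameter are invented for illustration. Both query types belong to the Elasticsearch versions current when this talk was given and were replaced in later releases, so treat this as a period example rather than current API.

```ruby
require 'json'

# Hypothetical helper: full-text query_string query narrowed by exact-term filters.
def filtered_query(text, filters)
  {
    'filtered' => {
      'query' => { 'query_string' => { 'query' => text } },
      'filter' => {
        'and' => filters.map { |field, value| { 'term' => { field => value } } }
      }
    }
  }
end

# Hypothetical helper: wraps any query and rescores it with a script equation.
def custom_score(query, script, params = {})
  { 'custom_score' => { 'query' => query, 'script' => script, 'params' => params } }
end

inner = filtered_query('solar charger', 'category' => 'technology')
# The script multiplies the text score by a popularity term; "weight" is a script parameter.
body = { 'query' => custom_score(inner, "_score * (1 + weight * doc['popularity'].value)", 'weight' => 0.1) }

puts JSON.pretty_generate(body)
```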

  10. Theoretical Section. Integrating models via custom scoring. Learning models: a qualitative and quantitative process. Data sources and paradigms. Key metrics for search. Monitoring statistical model performance.

  11. Custom score queries are regression equations. You can train them over time with supervised learning methods, like Google does.

  12. Statistical learning & search.
     • Clickstream models
        • Logistic regression
        • Binary target: click vs. no click
        • Learn boosts, coefficients, etc.
     • Paired comparison models
        • Logistic regression
        • Binary target: A > B
        • Learn boosts, coefficients, etc.
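A clickstream model of this kind can be sketched with a few lines of batch gradient descent. The example below is a toy, not the speaker's training pipeline: the two features (a title-match signal and a free-text-match signal), the four observations, and the learning settings are all invented, and a real model would be trained on logged data with a proper statistics library.

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

def dot(weights, features)
  weights.zip(features).sum { |w, x| w * x }
end

# Fits logistic regression by batch gradient descent on the mean log-loss.
# rows: feature vectors per search result; labels: 1 = clicked, 0 = not clicked.
def train_logistic(rows, labels, epochs: 1000, rate: 0.5)
  weights = Array.new(rows.first.length, 0.0)
  bias = 0.0
  count = rows.length.to_f
  epochs.times do
    grad_w = Array.new(weights.length, 0.0)
    grad_b = 0.0
    rows.each_with_index do |features, i|
      error = sigmoid(dot(weights, features) + bias) - labels[i]
      features.each_with_index { |x, j| grad_w[j] += error * x }
      grad_b += error
    end
    weights = weights.each_with_index.map { |w, j| w - rate * grad_w[j] / count }
    bias -= rate * grad_b / count
  end
  [weights, bias]
end

# Made-up observations: [title_match, free_text_match] => clicked?
ROWS = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]].freeze
CLICKS = [1, 1, 0, 0].freeze

weights, bias = train_logistic(ROWS, CLICKS)
puts "learned weights: #{weights.inspect}, bias: #{bias}"
```

The learned coefficients play the same role as hand-set boosts: a larger weight on a feature means matches on it should count for more in the scoring equation. For a paired comparison model, the same routine applies with the feature difference between results A and B as the input and "A preferred" as the label.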

  13. Search model training is a qualitative-first process. Review search algorithms before you push them. Have other people review search results before you push them. Make your app robust to new search query models – abstract the regression to a query model. Do side-by-side qualitative search QA.

  14. Search success metrics… any Googlers here? Items consumed per session, for browse pages. 1 minus the abandoned-search percentage, for search pages. Conversion rate of sessions originating from the search page.
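These three metrics are simple ratios once sessions are logged. The sketch below assumes a hypothetical session record with `page`, `clicked_result`, `converted`, and `items_viewed` keys, and treats a search session with no result click as abandoned; both the record shape and that definition are assumptions made for the example.

```ruby
# Safe division that returns 0.0 when there are no sessions to count.
def ratio(numerator, denominator)
  denominator.zero? ? 0.0 : numerator.to_f / denominator
end

# Computes the three metrics from a list of session Hashes (hypothetical shape).
def search_metrics(sessions)
  search = sessions.select { |s| s[:page] == :search }
  browse = sessions.select { |s| s[:page] == :browse }
  abandoned = search.count { |s| !s[:clicked_result] }
  {
    items_per_browse_session: ratio(browse.sum { |s| s[:items_viewed] }, browse.length),
    search_success_rate: search.empty? ? 0.0 : 1.0 - ratio(abandoned, search.length),
    search_conversion_rate: ratio(search.count { |s| s[:converted] }, search.length)
  }
end

# Made-up sessions for illustration only.
SESSIONS = [
  { page: :search, clicked_result: true,  converted: true,  items_viewed: 2 },
  { page: :search, clicked_result: false, converted: false, items_viewed: 0 },
  { page: :search, clicked_result: true,  converted: false, items_viewed: 1 },
  { page: :search, clicked_result: false, converted: false, items_viewed: 0 },
  { page: :browse, items_viewed: 4 },
  { page: :browse, items_viewed: 2 }
].freeze

p search_metrics(SESSIONS)
```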

  15. Search model learning. Explain output: the ultimate training data, in a nasty, semi-structured mess. Built an AST parser for Lucene explain output so you can get clean rows of observations. Every query’s intimate scoring details are logged into a DB as lines of training data.
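The speaker's parser is not shown in the slides. As a much simpler stand-in, the sketch below flattens the nested `value` / `description` / `details` tree that Elasticsearch returns for an explanation into one row per node, which is the general shape of "clean rows of observations". The sample tree is hand-written for the example and is not real explain output.

```ruby
# Walks the explanation tree depth-first and emits one flat row per node.
def flatten_explanation(node, depth = 0, rows = [])
  rows << { depth: depth, value: node['value'], description: node['description'] }
  (node['details'] || []).each { |child| flatten_explanation(child, depth + 1, rows) }
  rows
end

# Hand-written sample in the nested explain shape (illustrative values only).
EXPLANATION = {
  'value' => 1.5, 'description' => 'sum of:',
  'details' => [
    { 'value' => 1.0, 'description' => 'weight(title:solar), product of:',
      'details' => [
        { 'value' => 2.0, 'description' => 'boost' },
        { 'value' => 0.5, 'description' => 'fieldWeight' }
      ] },
    { 'value' => 0.5, 'description' => 'weight(free_text:solar)' }
  ]
}.freeze

flatten_explanation(EXPLANATION).each do |row|
  puts "#{'  ' * row[:depth]}#{row[:value]}  #{row[:description]}"
end
```

Each row can then be written to a database table keyed by query and document, giving one observation per scoring component.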

  16. Search model monitoring You can calculate stability metrics for thousands of queries between two models and highlight the least stable queries. You can monitor prediction accuracy on clickstream data for performance degradation.
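The slide does not say which stability metric was used. One simple option, shown here purely as an assumption, is the Jaccard overlap of the top-k result IDs that two models return for the same query, sorted so the least stable queries come first. The result lists below are made up.

```ruby
# Jaccard similarity of two ID lists: size of intersection over size of union.
def jaccard(ids_a, ids_b)
  union = ids_a | ids_b
  return 1.0 if union.empty?

  (ids_a & ids_b).length.to_f / union.length
end

# Scores every query by top-k overlap between two models, least stable first.
def least_stable(results_a, results_b, k: 3)
  scores = results_a.keys.map do |query|
    top_a = results_a[query].first(k)
    top_b = results_b.fetch(query, []).first(k)
    [query, jaccard(top_a, top_b)]
  end
  scores.sort_by { |_query, score| score }
end

# Made-up ranked document IDs per query for an old and a new model.
MODEL_A = { 'solar' => [1, 2, 3], 'drone' => [4, 5, 6] }.freeze
MODEL_B = { 'solar' => [1, 2, 3], 'drone' => [4, 7, 8] }.freeze

least_stable(MODEL_A, MODEL_B).each do |query, score|
  puts "#{query}: #{score.round(2)}"
end
```

Jaccard overlap ignores ordering within the top k; a rank-aware measure would be a natural refinement when position matters.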
