Demonstration of Prototype

Presentation Transcript


  1. Demonstration of Prototype

  2. Some challenges (and some solutions…)
     • Classification – self-selection vs. categorisation
       • Solution, for now, is a combination of approaches (more in a second)
     • Expectation management
       • Might have been handled better from the outset: making our expectations clear is probably important
       • ‘Prototype’ status has its issues
     • Relating themes to specific events/projects
       • Have begun incorporating events & projects into the system, using the same sort of vocabulary as that used for themes & researchers

  3. Classification – the solution (?)
     Mixture of controlled classification schemes:
     • RCUK research classification scheme
       • Cross-disciplinary
       • Hierarchical
       • Tied to funding
       • Relational MySQL version of the scheme created, and shared on the blog
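One common way to hold a hierarchical scheme in a relational database is an adjacency list, where each row points at its parent. The sketch below is only an illustration of that idea, not the schema shared on the blog: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `classification`, its columns, the helper names, and the placeholder labels are all assumptions.

```python
import sqlite3


def build_demo_scheme():
    """Create an in-memory adjacency-list table with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE classification (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES classification(id),
            label     TEXT NOT NULL
        )
        """
    )
    # Placeholder labels only; these are NOT real RCUK categories.
    rows = [
        (1, None, "Placeholder area"),
        (2, 1, "Placeholder subject"),
        (3, 2, "Placeholder topic"),
        (4, 1, "Another placeholder subject"),
    ]
    conn.executemany("INSERT INTO classification VALUES (?, ?, ?)", rows)
    return conn


def lineage(conn, node_id):
    """Return labels from the top of the hierarchy down to node_id."""
    query = """
        WITH RECURSIVE path(id, parent_id, label, depth) AS (
            SELECT id, parent_id, label, 0
            FROM classification WHERE id = ?
            UNION ALL
            SELECT c.id, c.parent_id, c.label, path.depth + 1
            FROM classification AS c
            JOIN path ON c.id = path.parent_id
        )
        SELECT label FROM path ORDER BY depth DESC
    """
    return [row[0] for row in conn.execute(query, (node_id,))]
```

Calling `lineage(build_demo_scheme(), 3)` walks from the topic up to its top-level area, which is the kind of lookup needed when a theme tagged at a fine level should also match searches at a broader level.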

  4. Classification – the solution (?)
     Some of the other classification schemes we considered include:
     • The University’s own College/School structure
       • Lacked granularity. Recently re-structured...
     • Eurostat’s Classifications metadata
       • Focus on economic activity
     • The EU’s Nomenclature for the Analysis and Comparison of Scientific Programmes and Budgets (NABS) classification
       • Largely science-based
     • The Universal Decimal Classification Summary (udcS)
       • Probably closest to our needs
       • Perhaps lacked familiar nomenclature

  5. Classification – the solution (?)
     • ESRC National Centre for Research Methods (NCRM)
       • Degree of top-down approval (Research Council)
       • Provides an implicit hierarchy
     • We found none of the potential schemes to be exhaustive
     • The Social Sciences-focused NCRM scheme actually includes a pretty comprehensive list of qualitative and quantitative methods

  6. Some technical points
     • Text extraction (from PDF) was less trivial than expected
       • Decoding streams, dealing with odd characters, etc.
     • Authentication was somewhat problematic
       • More of an institutional hurdle than a technical challenge
     • Search and comparison algorithms have been improved by incorporation of stemming and fuzzy search
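To give a sense of why "decoding streams" makes PDF text extraction fiddly, here is a deliberately naive sketch. It is not the prototype's code: it assumes every stream of interest is Flate-compressed, ignores the `/Filter` entry, fonts, and character encodings, and only picks out simple literal strings shown with the `Tj` operator. The function names are hypothetical.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A simple literal string followed by the Tj (show text) operator.
# Escapes and nested parentheses are deliberately not handled.
TJ_RE = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def decode_flate_streams(pdf_bytes):
    """Try to inflate every stream; skip those that are not zlib data."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            decoded.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue
    return decoded


def extract_literal_strings(pdf_bytes):
    """Collect the raw byte strings shown via Tj in all decoded streams."""
    found = []
    for content in decode_flate_streams(pdf_bytes):
        found.extend(TJ_RE.findall(content))
    return found
```

Real documents add other filters, `TJ` arrays, hex strings, and font-specific encodings, which is where the "odd characters" tend to come from; a dedicated PDF library is usually the more robust choice.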

  7. Stemming
     • Using a version of the Porter stemming algorithm
     • Used to suggest keywords from publications and project descriptions
     • Much more useful (in my opinion!) when used to conflate search results
     • Can optionally allow for stemming in search engine
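The conflation idea can be shown with a toy example: terms that reduce to the same stem are grouped so that a search for one form also finds the others. The stemmer below is NOT the Porter algorithm; it is a much cruder suffix stripper with an assumed suffix list, used only so the example stays short. The names `toy_stem` and `conflate` are hypothetical.

```python
from collections import defaultdict

# Assumed suffix list, checked in order (longer suffixes first).
# This is a toy rule set, far simpler than Porter's.
SUFFIXES = ("ations", "ation", "ing", "ies", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflate(terms):
    """Group terms by stem, preserving the order in which they appear."""
    groups = defaultdict(list)
    for term in terms:
        groups[toy_stem(term)].append(term)
    return dict(groups)
```

For example, `conflate(["funding", "funded", "funds"])` puts all three forms under the single key `"fund"`, so a search index keyed on stems would return the same results for any of them. A crude stripper like this over- and under-stems far more often than Porter does (it maps "theme" and "themes" to different stems, for instance), which is one reason to prefer an established implementation.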

  8. Fuzzy Search
     • Experimented with an implementation of Jaro–Winkler distance
     • Also tried PHP’s built-in similar_text function
     • Finally settled on Levenshtein distance
       • Wrapped up the native PHP function with some additional parameters for acceptable distances
     • Fuzzy search is another option in the search engine
       • But quite useful as another conflation tool behind the scenes
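The "wrapper with acceptable distances" approach might look roughly like the sketch below. The prototype wrapped PHP's native function; this is a Python illustration with a hand-written Levenshtein distance, and the parameter names `max_distance` and `max_ratio`, along with their default values, are assumptions rather than the project's actual settings.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query, candidate, max_distance=2, max_ratio=0.34):
    """Accept a candidate only if it is close in absolute AND relative terms."""
    q, c = query.lower(), candidate.lower()
    distance = levenshtein(q, c)
    longest = max(len(q), len(c), 1)
    return distance <= max_distance and distance / longest <= max_ratio


def fuzzy_search(query, candidates, **limits):
    """Return the candidates that fuzzy_match accepts, in their original order."""
    return [c for c in candidates if fuzzy_match(query, c, **limits)]
```

The relative limit matters for short words: "ab" and "cd" are only two edits apart, but that is every character, so `fuzzy_match("ab", "cd")` is rejected, while "categorisation" and "categorization" differ by one edit in fourteen characters and match.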

  9. Demo…
