Improving Interpretive Interfaces for Math Entry
This project aims to improve the theory and tools for constructing and evaluating pattern recognition systems, and to apply them to problems in document recognition and pen-based computing. Under the leadership of Richard Zanibbi at the Rochester Institute of Technology, the lab develops game-theoretic models and machine learning algorithms, and maintains open-source systems such as the Freehand Formula Entry System. We invite new collaborators to join us in addressing the recognition challenges posed by complex mathematical symbols and layouts.
Improving Interpretive Interfaces for Math Entry Richard Zanibbi Department of Computer Science Rochester Institute of Technology
RIT Document and Pattern Recognition Lab (DPRL) • Goals: • Improve theory and tools for constructing and evaluating pattern recognition systems • Apply these to problems in document recognition and pen-based computing • Members: • Richard Zanibbi • Kurt Kluever (Master’s student) • New members welcome! • http://www.cs.rit.edu/~rlaz/dprl.html
Current Directions: • 1. Theory and Tools: • Tools for recognition module integration and evaluation, such as the Recognition Strategy Language (RSL; Zanibbi et al.) • Game-theoretic models of recognition problems and systems (e.g., for classifier combination) • Machine learning algorithms for system optimization • 2. Applications: • Pen- and image-based math entry (the lab maintains the open-source Freehand Formula Entry System; Smithies, Novins, Arvo, Zanibbi et al.) • Optical character recognition (OCR) • Image- and text-based document retrieval • “CAPTCHAs” (for distinguishing humans from ‘bots’) • Table recognition, etc.
Pen-Based Math Entry • Recognition Challenges • Large number of symbols (e.g., more than 500 in LaTeX), many similar in structure (e.g., 0 and O) • Layout of symbols on baselines can be ambiguous • Little redundancy (unlike natural-language text, few cues for correcting a misrecognized symbol) • Context influences symbol identity and layout interpretation
Example: Freehand Formula Entry System/DRACULAE • Contributors: • FFES was first developed in 1998 as an MSc project at the University of Otago, New Zealand (Smithies, Novins), using the CIT tools of Jim Arvo et al. • Since then, contributors have come from Queen’s University (CA), Concordia University (CA), and around the world (CMU, UC Berkeley, and companies and non-profits in California and France)
DRACULAE (Zanibbi, 2002) • “Diagram Recognition Application for Computer Understanding of Large Algebraic Expressions”
DRACULAE: Layout Classes for Symbols • Symbol name defines class membership.
DRACULAE Layout Analysis: Sketch • Algorithm: • Assign each symbol a layout type (class) based on its identity • Sort symbols left-to-right on the leftmost edge of their bounding boxes • Create a baseline structure tree with root region node “Expression” • Recursively: • a. Search right-to-left to locate the leftmost (“start”) symbol of the baseline (using dominance rules for pairs of symbol layout classes) • b. From the start symbol, search left-to-right in the symbol list for symbols adjacent on the baseline (Zhang: fuzzy version) • c. Add baseline symbols as children of the parent region node • d. Place non-baseline symbols in lists associated with region nodes (e.g., superscript, subscript, and bottom-left (bleft) regions) • e. Apply steps a to d to each new region, until no new regions are created
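The recursive baseline search above can be sketched in Python as follows. This is a simplified illustration, not DRACULAE's own code: bounding boxes use image coordinates (y grows downward); the only layout class consulted is a hypothetical "hline" (fraction line) class; the single dominance rule (a fraction line dominates symbols centred within its horizontal extent) and the 25% superscript/subscript thresholds are assumptions; and all helper names (`Sym`, `Node`, `dominates`, `relation`, `layout`, `parse`, `to_str`) are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Sym:
    """A recognized symbol with a bounding box (image coordinates, y grows downward)."""
    name: str
    minx: float
    miny: float
    maxx: float
    maxy: float

    @property
    def cx(self) -> float:
        return (self.minx + self.maxx) / 2

    @property
    def cy(self) -> float:
        return (self.miny + self.maxy) / 2

    @property
    def h(self) -> float:
        return self.maxy - self.miny

@dataclass
class Node:
    """Baseline tree node: a symbol plus its non-baseline region lists."""
    sym: Sym
    regions: dict = field(default_factory=dict)  # e.g. "SUPER" -> list

def dominates(a: Sym, b: Sym) -> bool:
    # Hypothetical single dominance rule: a fraction line ("hline")
    # dominates symbols whose horizontal centre lies within its extent.
    return a is not b and a.name == "hline" and a.minx <= b.cx <= a.maxx

def start_symbol(syms: list) -> Sym:
    # Step a: scan right-to-left; the candidate moves left unless it dominates the next symbol.
    cand = syms[-1]
    for s in reversed(syms[:-1]):
        if not dominates(cand, s):
            cand = s
    return cand

def relation(b: Sym, s: Sym) -> str:
    """Region of s relative to the current baseline symbol b (assumed thresholds)."""
    if b.name == "hline":
        if b.minx <= s.cx <= b.maxx:
            return "ABOVE" if s.cy < b.miny else "BELOW"
        return "BASELINE"  # simplification: no scripts on fraction lines
    if s.cy < b.miny + 0.25 * b.h:
        return "SUPER"
    if s.cy > b.maxy - 0.25 * b.h:
        return "SUBSC"
    return "BASELINE"

def layout(syms: list) -> list:
    """Steps a to e for one region; syms must already be sorted by minx."""
    if not syms:
        return []
    start = start_symbol(syms)
    nodes = [Node(start)]
    for s in syms:  # step b: left-to-right scan of the symbol list
        if s is start:
            continue
        rel = relation(nodes[-1].sym, s)
        if rel == "BASELINE":
            nodes.append(Node(s))  # step c: new baseline child
        else:
            nodes[-1].regions.setdefault(rel, []).append(s)  # step d
    for n in nodes:  # step e: recurse into every region created above
        n.regions = {r: layout(ss) for r, ss in n.regions.items()}
    return nodes

def parse(syms: list) -> list:
    # Sort on the leftmost edge of each bounding box, then build the tree.
    return layout(sorted(syms, key=lambda s: s.minx))

def to_str(nodes: list) -> str:
    """Render the baseline tree as a LaTeX-like string."""
    out = []
    for n in nodes:
        if n.sym.name == "hline":
            above = to_str(n.regions.get("ABOVE", []))
            below = to_str(n.regions.get("BELOW", []))
            out.append("\\frac{%s}{%s}" % (above, below))
        else:
            out.append(n.sym.name)
        if "SUBSC" in n.regions:
            out.append("_{%s}" % to_str(n.regions["SUBSC"]))
        if "SUPER" in n.regions:
            out.append("^{%s}" % to_str(n.regions["SUPER"]))
    return "".join(out)
```

For example, boxes for x at (0, 10, 10, 20), a raised 2 at (11, 4, 16, 10), + at (18, 12, 24, 18) and 1 at (26, 10, 32, 20) render as `x^{2}+1`. The right-to-left start search matters for fractions: a numerator that begins left of its fraction line is dominated by the line, so the line, not the numerator, starts the baseline.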
Expanding the View… • Integration of scanned and pen-based expressions • Infty system, FFES prototype (implemented by Josh Zimler, 2006) • Long-Term Goal: Flexible input and combination • Allow users to easily combine, then reformat or interpret, expressions from: • LaTeX, eqn, etc. • MATLAB, Mathematica, etc. • Handwritten expressions (tablet/mouse) • Scanned images of handwritten or typeset expressions • “Vector drawing” interface input, e.g., as in Xpress (Pollanen et al.)
Other Math Entry Interfaces • Natural Log by Matsakis, Miller, and Viola (MIT) • JIMHR: (Java-Based) Interactive Math Handwriting Recognizer, a merge and port of FFES/DRACULAE and the Natural Log system, by Joy-Gong Ho (Acuitus Corp., USA) • JMathNotes by Ernesto Tapia Rodriguez (Free University of Berlin) • Infty by M. Suzuki et al. (Kyushu University, Japan) • MathJournal by XThink Inc.: the first commercial pen-based math recognition system • MathPad by Joseph LaViola • Links available: http://www.cs.rit.edu/~rlaz
Motivation: A high-level language for pattern recognition algorithms • Table Recognition Survey (Zanibbi et al. 2004) • Summarizes the literature in terms of observations, transformations, and inferences • Techniques studied are characterized as making the following types of inferences (decisions): • Parameter values (e.g., thresholds) • Interpretation Model Operations: • Segmentation (identifying regions of interest in data) • Classification (assigning types to regions) • Relating regions (e.g., topology (adjacencies)) • Rejecting segments, classes, and region relationships • (Unanswered) Question: • How should we combine recognition modules in a complex math entry system?
Example: Simple Table Structure Recognition Algorithm (Part 1)
model regions
    Image Word Cell   % default: 'Region'
    Row Column
end regions
model relations
    % default: 'contains'
    adjacent_right adjacent_below
end relations
recognition parameters
    sMaxRowSeparation 2      % millimetres
    sMaxColumnSeparation 2   % millimetres
    aResolution 300          % dpi; default
end parameters
strategy main
    adapt aResolution using getScanResolution() observing {Image} regions
    classify {Word} regions as {Cell}
    relate {Cell} regions with {adjacent_right} using defineRightAdjacency(sMaxRowSeparation, aResolution)
    segment {Cell} regions into {Row} regions using relationClosure() observing {adjacent_right} relations
    relate {Cell} regions with {adjacent_below} using defineLowerAdjacency(sMaxColumnSeparation, aResolution)
    segment {Cell} regions into {Column} regions using relationClosure() observing {adjacent_below} relations
    accept interpretations
end strategy
Slide callouts label the parts of these statements: decision type, decision function (including external decision functions), decision function parameters, observation specification, and trivial decisions.
Input: parameters, and a graph with Image and Word regions (bounding boxes)
Output: Cells, Rows, Columns
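To make the data flow of `strategy main` concrete, here is a minimal Python analogue. It is a sketch under assumptions, not RSL's runtime: `define_right_adjacency`, `define_lower_adjacency`, and `relation_closure` are hypothetical stand-ins for the external decision functions named in the program, their geometry (a gap of at most the separation parameter, converted from millimetres to pixels at `a_resolution` dpi, plus overlap on the other axis) is assumed, and the scan resolution is passed in directly instead of calling `getScanResolution()`.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4

@dataclass(frozen=True)
class Region:
    """A Word region with a bounding box in pixels (y grows downward)."""
    id: str
    minx: float
    miny: float
    maxx: float
    maxy: float

def mm_to_px(mm: float, dpi: float) -> float:
    return mm * dpi / MM_PER_INCH

def overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    return a0 < b1 and b0 < a1

def define_right_adjacency(cells, max_sep_mm, dpi):
    # Assumed geometry: b starts within max_sep_mm right of a, and they overlap vertically.
    limit = mm_to_px(max_sep_mm, dpi)
    return {(a.id, b.id) for a in cells for b in cells
            if a is not b and 0 <= b.minx - a.maxx <= limit
            and overlaps(a.miny, a.maxy, b.miny, b.maxy)}

def define_lower_adjacency(cells, max_sep_mm, dpi):
    # Assumed geometry: b starts within max_sep_mm below a, and they overlap horizontally.
    limit = mm_to_px(max_sep_mm, dpi)
    return {(a.id, b.id) for a in cells for b in cells
            if a is not b and 0 <= b.miny - a.maxy <= limit
            and overlaps(a.minx, a.maxx, b.minx, b.maxx)}

def relation_closure(ids, pairs):
    """Segment regions into connected components of the relation (union-find)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

def main_strategy(words, a_resolution=300, s_max_row_sep=2, s_max_col_sep=2):
    """Python analogue of 'strategy main'; a_resolution stands in for getScanResolution()."""
    cells = list(words)  # classify {Word} regions as {Cell}: every word becomes a cell
    ids = [c.id for c in cells]
    rows = relation_closure(ids, define_right_adjacency(cells, s_max_row_sep, a_resolution))
    cols = relation_closure(ids, define_lower_adjacency(cells, s_max_col_sep, a_resolution))
    return {"cells": ids, "rows": rows, "columns": cols}  # accept interpretations
```

At the default 300 dpi, the 2 mm separation parameters become a threshold of about 23.6 pixels. Treating `relationClosure()` as connected components means that a chain of adjacent cells becomes a single row or column even when its end cells are far apart, which is the usual reading of a closure over an adjacency relation.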
Running RSL Programs • Translate the RSL program into a TXL program (using TXL) • Pass the input graph (a text file) to the program • Output (text files): • Accepted structures (interpretations) • Log of all decisions and their outcomes
New Metrics Based on Hypothesis Histories: Historical Recall and Precision • Figure: Venn diagram relating recognition targets, generated hypotheses (A ∪ R), correct hypotheses, and false negatives (F)
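A minimal Python sketch of how the historical metrics differ from conventional ones, under stated assumptions: A and R are taken to be the sets of accepted and rejected hypotheses, historical recall and precision are assumed to apply the usual recall and precision definitions to every generated hypothesis (A ∪ R) rather than only to the accepted ones, and F is read as the targets that were never generated. The function names are hypothetical.

```python
def precision_recall(targets, accepted):
    """Conventional metrics: only the final, accepted hypotheses count."""
    targets, accepted = set(targets), set(accepted)
    hits = targets & accepted
    recall = len(hits) / len(targets) if targets else 1.0
    precision = len(hits) / len(accepted) if accepted else 1.0
    return recall, precision

def historical_recall_precision(targets, accepted, rejected):
    """Assumed historical metrics: every hypothesis ever generated (A ∪ R) counts."""
    targets = set(targets)
    generated = set(accepted) | set(rejected)  # A ∪ R
    correct = targets & generated              # correct hypotheses, whether kept or not
    false_negatives = targets - generated      # F: targets never generated
    h_recall = len(correct) / len(targets) if targets else 1.0
    h_precision = len(correct) / len(generated) if generated else 1.0
    return h_recall, h_precision, false_negatives
```

With four target cells c1 to c4, accepted hypotheses {c1, x1}, and rejected hypotheses {c2, c3, x2}, conventional recall is 0.25 and precision 0.5, while historical recall is 0.75 and historical precision 0.6, with F = {c4}. The gap exposes correct hypotheses that a strategy generated and later discarded, like the correct cells rejected at step 35 in the table ‘a038’ results.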
Cell Detection Results (Handley, 2001): RSL Re-implementation on Table ‘a038’ (UW-III)* • 0: Input (words and lines) • 1: Classify words as cells • 16: Merge ‘horizontally close’ cells • 35: Merge cells sharing column and row assignments. Nearly 50% of correct cells rejected; new correct cells also detected • 47: Two cells merged, producing the column header ‘Total pore space (percent)’ • 51: Merge header cells bounded by two horizontal lines • 83: Merge cells sharing line and white-space separators • *Inference times shown are those affecting cells
RSL and Math Entry • Proposal: “MIN” System • New interface for math entry and offline experiments • Use RSL to define recognition strategies and capture their results • In effect, a testbed for studying recognition algorithms and their intelligent combination, organization, and deployment in practice • Goals: • Compare different approaches to recognizing mathematical expressions (from input to output), each represented in RSL • Allow flexible training, combination, and alteration of recognition strategies • Extend RSL to accommodate math and other problem domains more effectively, while keeping it abstract
(Some) Relevant Journals and Conferences • Journals • IEEE Trans. Pattern Analysis and Machine Intelligence • Machine Learning • Pattern Recognition • Pattern Recognition Letters • Artificial Intelligence • Int’l J. Document Analysis and Recognition • … • Conferences • Int’l Conf. Machine Learning • IEEE Computer Vision and Pattern Recognition • Computational Learning Theory (COLT) • Int’l Conf. Document Analysis and Recognition • Int’l Workshop on Document Analysis Systems • …
Thank you. • Questions? • Support: GCCIS Department of Computer Science