CMo: When Less Is More. Context-Directed Browsing for Mobiles. Yevgen Borodin, Jalal Mahmud, I.V. Ramakrishnan. Miniaturization and Mobility. Mobile Web. Regular Web Sites. Happy Scrolling. Browsing Example. Mobile Browsing Problems. Data Transfer Cost is High. Connection is Slow.
Mobile Browsing Problems • Data Transfer Cost is High • Connection is Slow • Small Screens • Lots of Scrolling • Time-Consuming • Strenuous • Tiring
Architecture: [Figure: CMo architecture diagram; components: Interface Manager, Context Analyzer, Browser Object, Geometric Analyzer, CMo Proxy Server]
First Problem: identifying significant frames • CMo is an HTTP proxy • Uses Mozilla to parse the DOM • Gets a tree of “frames” • Tags these by content: “link”, “text”, “image link” … • Identifies “maximal semantic blocks” • Discards leaves • Looks for all X- or Y-aligned blocks
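The alignment step can be sketched as follows. This is a minimal illustration, not CMo's code: `Frame`, `is_aligned`, and `maximal_blocks` are hypothetical names, each frame carries only its top-left corner, and "aligned" is assumed to mean that all children share a left edge (X-aligned) or a top edge (Y-aligned). The search returns the highest such frames and drops leaves that do not belong to one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """Hypothetical rendered frame: top-left corner, a content tag, and child frames."""
    name: str
    x: int
    y: int
    tag: str = "container"  # e.g. "link", "text", "image link"
    children: List["Frame"] = field(default_factory=list)


def is_aligned(frame: Frame) -> bool:
    """True if all children share a left edge (X-aligned) or a top edge (Y-aligned)."""
    if not frame.children:
        return False  # leaves are discarded; they never form a block on their own
    xs = {c.x for c in frame.children}
    ys = {c.y for c in frame.children}
    return len(xs) == 1 or len(ys) == 1


def maximal_blocks(frame: Frame) -> List[Frame]:
    """Return the highest aligned frames, descending only into non-aligned ones."""
    if is_aligned(frame):
        return [frame]
    blocks: List[Frame] = []
    for child in frame.children:
        blocks.extend(maximal_blocks(child))
    return blocks


if __name__ == "__main__":
    header = Frame("header", 0, 0, children=[Frame("logo", 0, 0, "image link"),
                                             Frame("home", 100, 0, "link")])
    sidebar = Frame("sidebar", 0, 50, children=[Frame("a", 0, 50, "link"),
                                                Frame("b", 0, 70, "link")])
    main = Frame("main", 200, 50, children=[Frame("p1", 200, 50, "text"),
                                            Frame("p2", 200, 120, "text")])
    page = Frame("page", 0, 0, children=[header, sidebar, main])
    print([b.name for b in maximal_blocks(page)])  # ['header', 'sidebar', 'main']
```

Here the page root is neither X- nor Y-aligned, so the sketch descends and returns the header, sidebar, and main column as three blocks. A real segmenter would likely also compare widths and heights from the rendered layout.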
Context Collection: [Figure: the page is segmented into 5 blocks]
Next Problem: identifying context of links • The user has clicked a link somewhere • What is its context? • Possible ideas: • The text of the link itself • The surrounding text (in the HTML stream) • The surrounding text (on the rendered page) • CMo uses the nearby text on the page • …but only if it is related to the link text
Next Problem: identifying context of links • The link text is parsed into 1-, 2-, and 3-grams • “Rice not ruling out talks with Iranians” -> • Rice, not, ruling, out, talks, with, Iranians • Rice ruling, ruling out, … • Rice ruling out, ruling out talks, …
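A short sketch of the n-gram step, assuming simple lowercase alphanumeric tokenization (the slide does not specify a tokenizer). The optional `stopwords` parameter is a hypothetical addition: the slide's bigram and trigram examples appear to skip “not”, and passing it as a stop word reproduces them, though the slide does not say this is what CMo does.

```python
import re
from typing import Dict, FrozenSet, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def link_ngrams(text: str, max_n: int = 3,
                stopwords: FrozenSet[str] = frozenset()) -> Dict[int, List[str]]:
    """Map n -> list of n-grams for n = 1..max_n, after dropping (lowercase) stop words."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return {n: ngrams(tokens, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    grams = link_ngrams("Rice not ruling out talks with Iranians",
                        stopwords=frozenset({"not"}))
    print(grams[2][:2])  # ['rice ruling', 'ruling out']
```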
Next Problem: identifying context of links • Perform a similar analysis on sibling blocks • Calculate the cosine similarity between m-sets: • The cardinality of the intersection • Divided by the product of the square roots of the two sets’ cardinalities • USA, news, sports | USA, world -> .4 (1/√6 ≈ 0.41) • USA, news | USA, world -> .5 • USA, news | USA, news -> 1
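The similarity formula written out, as a sketch over plain Python sets (the slide defines it through set cardinalities, so multiset counts are ignored here); `set_cosine` is a hypothetical name. The three slide examples come out as 1/√6 ≈ 0.41, 0.5, and 1.0.

```python
import math
from typing import Set


def set_cosine(m1: Set[str], m2: Set[str]) -> float:
    """|M1 ∩ M2| / (sqrt(|M1|) * sqrt(|M2|)); returns 0.0 if either set is empty."""
    if not m1 or not m2:
        return 0.0
    # sqrt(|M1|) * sqrt(|M2|) == sqrt(|M1| * |M2|)
    return len(m1 & m2) / math.sqrt(len(m1) * len(m2))


if __name__ == "__main__":
    print(round(set_cosine({"USA", "news", "sports"}, {"USA", "world"}), 2))  # 0.41
    print(set_cosine({"USA", "news"}, {"USA", "world"}))  # 0.5
    print(set_cosine({"USA", "news"}, {"USA", "news"}))  # 1.0
```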
Context Collection: [Figure: link set M1 compared with sibling-block sets M2; cases Cos(M1, M2) > T and Cos(M1, M2) < T]
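Combining the n-grams and the similarity gives a sketch of context collection along the lines the figure suggests: the link's n-gram set M1 is compared with each sibling block's set M2, and the block is kept when Cos(M1, M2) > T. The function names and the example threshold are hypothetical; the slides do not give a value for T.

```python
import math
import re
from typing import List, Set


def ngram_set(text: str, max_n: int = 3) -> Set[str]:
    """All 1..max_n-grams of the lowercased text, collected into one set."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def set_cosine(m1: Set[str], m2: Set[str]) -> float:
    """|M1 ∩ M2| / sqrt(|M1| * |M2|); 0.0 if either set is empty."""
    if not m1 or not m2:
        return 0.0
    return len(m1 & m2) / math.sqrt(len(m1) * len(m2))


def collect_context(link_text: str, sibling_blocks: List[str],
                    threshold: float) -> List[str]:
    """Keep sibling blocks whose n-gram set M2 satisfies Cos(M1, M2) > threshold."""
    m1 = ngram_set(link_text)
    return [block for block in sibling_blocks
            if set_cosine(m1, ngram_set(block)) > threshold]


if __name__ == "__main__":
    siblings = [
        "Secretary of State Condoleezza Rice said she is not ruling out "
        "talks with Iranian officials",
        "Sports scores and weather",
    ]
    kept = collect_context("Rice not ruling out talks with Iranians", siblings, 0.1)
    print(len(kept))  # 1
```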
Last Problem: where to zoom in on the target • Break the target page into frames • Compare each frame with the context • Metrics used: • Words, 2-, and 3-grams matched exactly • Words, 2-, and 3-grams whose stems match
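A sketch of computing the six match counts for one target block. The slides' stems (“offici”, “confer”) look like Porter-stemmer output; to stay self-contained, this sketch swaps in a hypothetical `toy_stem` that strips one common suffix, which is much cruder than Porter. It also counts stemmed matches inclusively (exact matches count again at the stem level), which is an assumption.

```python
import re
from typing import List, Set, Tuple

# Hypothetical toy suffix list; NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def grams(tokens: List[str], n: int) -> Set[str]:
    """Set of contiguous n-grams joined with spaces."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def match_features(context: str, block: str) -> Tuple[int, ...]:
    """(exact 1-, 2-, 3-gram matches, stemmed 1-, 2-, 3-gram matches)."""
    ctx = re.findall(r"[a-z0-9]+", context.lower())
    blk = re.findall(r"[a-z0-9]+", block.lower())
    ctx_s = [toy_stem(t) for t in ctx]
    blk_s = [toy_stem(t) for t in blk]
    exact = [len(grams(ctx, n) & grams(blk, n)) for n in (1, 2, 3)]
    stemmed = [len(grams(ctx_s, n) & grams(blk_s, n)) for n in (1, 2, 3)]
    return tuple(exact + stemmed)


if __name__ == "__main__":
    print(match_features("Rice not ruling out talks with Iranians",
                         "Rice rules out talks with Iranian officials"))
    # (4, 2, 1, 6, 4, 3)
```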
Last Problem: where to zoom in on the target • Each target block ends up with a 6-tuple of match counts • How to rank them? Machine learning! • Supervised learning using an SVM • Linear classifier • Maximizes the margin (distance from the separating hyperplane) by solving a QP • 900 labeled training examples, 100 held out for testing
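Once an SVM has been trained, ranking blocks only needs its linear decision function w·x + b. The sketch below scores each block's 6-tuple and sorts by that value; `WEIGHTS` and `BIAS` are made-up placeholders for learned parameters, not values from CMo, and training itself (the QP) is left to an SVM library.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholders for a trained linear SVM. Order matches the 6-tuple:
# (exact 1-, 2-, 3-gram, stemmed 1-, 2-, 3-gram) match counts.
WEIGHTS = (1.0, 2.0, 3.0, 0.5, 1.0, 1.5)
BIAS = -2.0


def decision_value(x: Sequence[float], w: Sequence[float] = WEIGHTS,
                   b: float = BIAS) -> float:
    """Signed score w·x + b; larger means farther on the 'relevant' side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rank_blocks(blocks: List[Tuple[str, Sequence[float]]]) -> List[Tuple[str, float]]:
    """Score each (block_id, features) pair and sort by decreasing decision value."""
    scored = [(block_id, decision_value(x)) for block_id, x in blocks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_blocks([("nav", (0, 0, 0, 1, 0, 0)),
                           ("story", (4, 2, 1, 6, 4, 3)),
                           ("sidebar", (1, 0, 0, 2, 0, 0))])
    print(ranking[0])  # ('story', 20.5)
```

Presumably the browser then zooms to the top-ranked block. This sketch does not calibrate the raw decision values into a 0 to 1 range.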
[Figure: Features -> SVM Rank; the target page is segmented into 3 blocks, scored 0.1, 0.4, and 0.8]
Example Features • Exact Match of Context Words: Rice • Exact Bigram Match: ruling talks • Exact Trigram Match: Secretary State Condoleezza • Match of Word Stems: rule • Match of Stemmed Bigrams: talk Iranian • Match of Stemmed Trigrams: Iranian offici confer
Experimental Setup • Website Domains (5 Websites in Each) • News, Books, Consumer Electronics • Office Supplies, Informational • Participants: 30 Graduate Students
Training SVM for Block Relevance • Data Collection • Collected 1000+ Pairs of Pages from 25 Websites • Labeled Each Pair with Link, Context, and Relevant Block • Training SVM • Computed Features for 900 Pairs of Pages • Trained SVM Model with Feature Vectors • Used the Remaining 100 Pairs for Cross-Validation
Training is a somewhat complicated procedure • Classification of blocks on link target pages • Feeds back into the link-context threshold
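The slide does not spell out the feedback loop. One hypothetical way it could work is a grid search: try candidate values of the context threshold T, run block classification downstream, and keep the T with the best accuracy. `tune_threshold` and the `evaluate` callback are illustrative names, not CMo's.

```python
from typing import Callable, Iterable


def tune_threshold(candidates: Iterable[float],
                   evaluate: Callable[[float], float]) -> float:
    """Return the candidate threshold with the highest downstream accuracy.

    `evaluate(t)` is a hypothetical callback that collects link contexts with
    threshold t, classifies target blocks, and returns the resulting accuracy.
    Ties go to the earliest candidate; an empty candidate list raises ValueError.
    """
    return max(candidates, key=evaluate)


if __name__ == "__main__":
    toy_accuracy = {0.1: 0.6, 0.2: 0.8, 0.3: 0.7}  # made-up numbers
    print(tune_threshold([0.1, 0.2, 0.3], toy_accuracy.__getitem__))  # 0.2
```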
Evaluation • Accuracy of Context Identification • Accuracy of Relevant Block Identification • Browsing Time with CMo vs. Regular Browser • Number of Pen Taps with CMo
Evaluation: Context Collection • Using 500 Web Pages from 25 Websites
Evaluation: Relevancy Detection • SVM Model Trained Using 900 Page Pairs • Testing Done with the Remaining 100 Page Pairs
Evaluation • Users performed news tasks such as T1: • In Google News, find a given story • Click the link to the New York Times article • Provide a specific piece of information contained in that story • Other tasks were shopping-like, such as T8: • Go to Amazon • Click on “Pink iPod” • Determine its sales rank
Future Work • Porting CMo to Client Side • Expand SVM Features • Use Partitioning to Improve Segmentation • Explore Navigation Options
Contributions • Using Context to Find Relevant Information • Saving Users’ Browsing Time • Reducing the Number of Stylus Taps • Conveying the Richness of Web Pages