Data Mining at work
Data Mining at work
Krithi Ramamritham
Dynamics of Web Data
Dynamically created web pages (built using scripting languages) are assembled from components: an ad component, several headline components, a navigation component, and a personalized component.
1. What to deliver?
Page content may be based on:
• queries on dynamically changing data (e.g., sports scores, stock prices, the environment)
• the type of access device
• the time and location of access, and the user
Existing sites may contain new information, and new sites (URLs) may come into being.
2. How to deliver?
[Figure: data sources (servers, sensors) deliver data across the network, through proxies/caches, to end-hosts (wired hosts, mobile hosts).]
Keep Data Up-to-date
Source S(t) → Proxy / DB P(t) → User U(t)
Example: update the Mumbai temperature whenever it changes by 2 degrees.
The proxy obtains data from the source(s) and maintains |U(t) − S(t)| ≤ 2.
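The coherence requirement above can be sketched as a filter at the proxy. It forwards a source reading to the user only when that reading would otherwise leave the user's copy more than 2 degrees stale. This is a minimal sketch under stated assumptions: the function name `push_updates` and the sample readings are hypothetical, and every source reading is assumed to be visible to the proxy.

```python
from typing import Iterable, List, Tuple


def push_updates(source: Iterable[Tuple[float, float]],
                 tolerance: float = 2.0) -> List[Tuple[float, float]]:
    """Return the (time, value) pairs the proxy forwards to the user.

    A reading is forwarded only if it differs from the last forwarded value
    by more than `tolerance`, so |U(t) - S(t)| <= tolerance holds at every
    observed sample. (Hypothetical helper, for illustration only.)
    """
    pushed: List[Tuple[float, float]] = []
    last_sent = None  # U(t): the value the user currently holds
    for t, value in source:
        if last_sent is None or abs(value - last_sent) > tolerance:
            last_sent = value
            pushed.append((t, value))
    return pushed


# Hypothetical Mumbai temperature readings: (hour, degrees C)
readings = [(0, 30.0), (1, 31.0), (2, 32.5), (3, 31.9), (4, 29.0)]
print(push_updates(readings))
```

With these readings, only three of the five samples reach the user. The rest stay within the 2-degree tolerance of what the user already has.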
When to poll the source?
Pull-based: Server → Proxy → User
Poll after a specific interval. The interval is determined by temporal data mining (time series analysis) and a prediction of when the change will exceed 2 degrees.
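One way to turn "predict when the change will exceed 2 degrees" into a polling interval is to estimate the recent rate of change and extrapolate it linearly. This is a minimal sketch under that assumption. The least-squares slope, the clamping bounds, and the name `next_poll_interval` are illustrative choices, not necessarily the technique used in the talk.

```python
from typing import Sequence


def next_poll_interval(times: Sequence[float], values: Sequence[float],
                       tolerance: float = 2.0, min_interval: float = 1.0,
                       max_interval: float = 60.0) -> float:
    """Time to wait before the next poll, clamped to [min_interval, max_interval].

    The rate of change is the least-squares slope of the recent
    (time, value) observations. Assuming the value keeps changing at that
    rate, it drifts by `tolerance` after tolerance / rate time units.
    """
    if len(times) < 2:
        return min_interval  # not enough history: poll again soon
    t_mean = sum(times) / len(times)
    v_mean = sum(values) / len(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        return min_interval  # all observations at the same instant
    rate = abs(num / den)  # degrees per time unit
    if rate == 0:
        return max_interval  # no observed change: back off as far as allowed
    return max(min_interval, min(max_interval, tolerance / rate))


# Hypothetical observations: minutes since start, temperature in degrees
print(next_poll_interval([0, 10, 20], [30.0, 31.0, 32.0]))
```

Here the temperature rises about 0.1 degrees per minute, so the sketch waits roughly 20 minutes before polling again. Clamping keeps a flat series from disabling polling entirely and keeps a fast-changing series from flooding the source.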
Where to do the work?
• Diverse client devices
  • differ in hardware, software, network connectivity, and form factor
• Web content needs to be tailored for each client type
• Each response depends
  • not only on the requested URL
  • but also on the capabilities of the client
Transcoding
Conversion of one version of data into another:
• Decreasing image quality (JPEG quality level) and size, e.g., with the ImageMagick “convert” utility on Linux
• Summarizing text
• Transcoding ⇒ information extraction / retrieval / classification
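Text summarization as a form of transcoding can be illustrated with frequency-based extractive summarization. This technique keeps the sentences whose words occur most often in the whole text. It is a minimal sketch of that standard technique, not the summarizer used in the talk. The stop-word list, the scoring, and the name `summarize` are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list)
STOP = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "on", "for", "it", "by"}


def summarize(text: str, max_sentences: int = 2) -> str:
    """Extractive summary: keep the highest-scoring sentences in original order.

    A sentence's score is the average corpus frequency of its non-stop words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


print(summarize("Monsoon rains reached Mumbai. The rains were heavy in Mumbai today. Stocks rose.", 1))
```

For a small-screen client, `max_sentences` plays the same role that the JPEG quality level plays for images: it trades fidelity for size.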
Who should transcode?
• Download the desired version from the server, or
• Transcode a higher-quality version locally (at the proxy)
Factors influencing the decision:
• Transcoding complexity
• Proxy-server network connection
• Load on the proxy
(Multiple linear) regression: predict the overheads based on a (linear) model
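The regression-based decision can be sketched as follows. Fit one multiple linear regression model of overhead per option, predict both overheads for the current conditions, and pick the cheaper option. This is a minimal sketch under assumptions. The feature vector (for example: transcoding complexity, inverse proxy-server bandwidth, proxy load), the training data, and the helper names `fit_linear`, `predict`, and `transcode_locally` are all hypothetical. The fit uses plain ordinary least squares via the normal equations.

```python
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept term.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting. Returns [intercept, coef_1, ..., coef_k].
    Collinear features give a zero pivot (ZeroDivisionError).
    """
    rows = [[1.0, *map(float, x)] for x in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the linear model: intercept + sum(coef_i * x_i)."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def transcode_locally(transcode_model: Sequence[float],
                      download_model: Sequence[float],
                      x: Sequence[float]) -> bool:
    """True if the predicted overhead of transcoding at the proxy is lower
    than that of downloading the desired version from the server."""
    return predict(transcode_model, x) < predict(download_model, x)


# Hypothetical training data generated from overhead = 1 + 2*a + 3*b
coef = fit_linear([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
print(coef)
```

In practice, each model would be trained on logged overheads of its own option. A richer model (non-linear terms, per-format models) may fit better, but the decision rule stays the same: compare predictions and take the smaller one.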
What is new on the Web?
Example: How is the monsoon progressing?
Time series analysis: change prediction, pattern mining
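A simple form of the change analysis mentioned above flags points that depart sharply from their recent history. This is a minimal sketch using a trailing-window mean and standard deviation, which is one common change-detection heuristic and not necessarily the method used in the talk. The window size, the threshold `k`, the name `detect_changes`, and the sample rainfall values are hypothetical.

```python
import statistics
from typing import List, Sequence


def detect_changes(series: Sequence[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Indices whose value departs from the mean of the preceding `window`
    values by more than k sample standard deviations (window must be >= 2)."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.mean(recent)
        sd = statistics.stdev(recent)
        deviation = abs(series[i] - mean)
        # A flat window has sd == 0: any departure from it counts as a change.
        if (sd == 0 and deviation > 0) or (sd > 0 and deviation > k * sd):
            flagged.append(i)
    return flagged


# Hypothetical daily rainfall (mm): a sudden jump on the last day
print(detect_changes([10, 11, 10, 11, 10, 11, 30]))
```

Only the last day is flagged. Its value lies far outside the small day-to-day variation of the preceding five days.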
‘Bhav Puchiye’
[Figure: interface for Bhav Puchiye, www.broadmoor.com]
Inverted Pyramid Interfaces
[Figure: the inverted pyramid approach, reordering the sections Conclusion, Discussions, Findings, and Background & related information.]
Bhav Poochiye
Pricing module developed for selected commodities, in selected markets, for selected areas.
DEMO
Building Usage Profiles
Estimate access probabilities based on:
• Current user/community navigational patterns over site contents (in the form of click streams)
• Historical user/community access patterns over site contents (in the form of association rules)
Cluster needs based on the user's location, income/age, and time of day.
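Access probabilities from click streams can be sketched with a first-order Markov model. It estimates P(next page | current page) from the page-to-page transitions observed in user sessions. This is a minimal sketch under that modelling assumption, not the talk's method. The session data and the helper names `transition_probabilities` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def transition_probabilities(sessions: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next | current) from consecutive page pairs in click streams."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for page, nexts in counts.items()
    }


def most_likely_next(probs: Dict[str, Dict[str, float]], page: str) -> Optional[str]:
    """The page most likely to be requested after `page`, if any was observed."""
    options = probs.get(page)
    return max(options, key=options.get) if options else None


# Hypothetical click streams from three sessions
sessions = [["home", "sports", "scores"], ["home", "sports", "news"], ["home", "news"]]
probs = transition_probabilities(sessions)
print(probs["home"], most_likely_next(probs, "home"))
```

A server or proxy could use such estimates to prefetch or personalize the most probable next component. The historical association rules and the demographic clustering from the slide would refine these estimates further.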
Data Mining
From data to information to knowledge to money!