Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products

Jianfu Chen, David S. Warren
Stony Brook University

Classification is a fundamental problem in information management.
[Figure: two example tasks. A product description is assigned to a node of the UNSPSC taxonomy (Segment, Family, Class, Commodity); email content is classified as Spam or Ham.]

How should we design a classifier for a given real world task?

Method 1. No design
[Diagram: a classifier f(x) is learned from a training set and applied to a test set.]
Try off-the-shelf classifiers:
• SVM
• Logistic regression
• Decision tree
• Neural network
• ...
Implicit assumption: we are trying to minimize error rate, or equivalently, maximize accuracy.

Method 2. Optimize what we really care about
• What’s the use of the classifier?
• How do we evaluate the performance of a classifier according to our interests?
• Quantify what we really care about.
• Optimize what we care about.

Hierarchical classification of commercial products
Input: textual product description. Output: a node in the UNSPSC taxonomy.
[Figure: example nodes at the four UNSPSC levels]
• Segment: Vehicles and their Accessories and Components (25); Food Beverage and Tobacco Products (50); Office Equipment and Accessories and Supplies (44)
• Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
• Class: Product and material transport vehicles (16); Safety and rescue vehicles (17); Passenger motor vehicles (15)
• Commodity: Buses (02); Automobiles or cars (03); Limousines (06)

Product taxonomy helps customers to find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
Example: looking for gift ideas for a kid? Browse Toys & Games: dolls, puzzles, building toys, ...

We assume misclassification of products leads to revenue loss.
[Figure: the textual product description of a mouse is classified into the product taxonomy. Placed correctly under "Desktop computer and accessories" (next to keyboard), the product realizes its expected annual revenue; placed wrongly under "pet", it loses part of its potential revenue.]

What do we really care about? A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss

Observation 1: the misclassification cost of a product depends on its potential revenue.

Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.
[Figure: the textual product description of a mouse, with candidate classes "mouse" and "keyboard" under "Desktop computer and accessories" and "pet" in a distant part of the taxonomy.]

The proposed performance evaluation metric: average revenue loss
The revenue loss of product x combines an example weight with an error function:
• The example weight is the potential annual revenue of product x.
• The error function is the loss ratio: the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y’.
• The loss ratio is a monotonically non-decreasing function of the hierarchical distance between y and y’, f(d(y, y’)).
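The metric can be illustrated with a small sketch. It is not the authors' code and rests on several assumptions: classes are 8-digit UNSPSC codes (two digits per level), the hierarchical distance d(y, y’) is taken to be the number of edges between the two leaves, and the `LOSS_RATIO` table is a hypothetical choice of f. The names `tree_distance` and `average_revenue_loss` are made up for this example.

```python
LEVELS = 4  # segment, family, class, commodity (two digits each)


def split_code(code: str) -> list:
    """Split an 8-digit UNSPSC code into its four 2-digit level codes."""
    return [code[i:i + 2] for i in range(0, 2 * LEVELS, 2)]


def tree_distance(y: str, y_pred: str) -> int:
    """Assumed d(y, y'): number of edges on the path between two leaves."""
    shared = 0
    for a, b in zip(split_code(y), split_code(y_pred)):
        if a != b:
            break
        shared += 1
    return 2 * (LEVELS - shared)


# Hypothetical loss ratios f(d); any non-decreasing function of d would do.
LOSS_RATIO = {0: 0.0, 2: 0.2, 4: 0.4, 6: 0.6, 8: 0.8}


def average_revenue_loss(revenues, y_true, y_pred) -> float:
    """Mean over products of (potential revenue) * (loss ratio)."""
    total = sum(
        v * LOSS_RATIO[tree_distance(y, p)]
        for v, y, p in zip(revenues, y_true, y_pred)
    )
    return total / len(revenues)


if __name__ == "__main__":
    # Illustrative data only: revenues are invented, and "44000001" is a
    # placeholder code standing for some commodity in another segment.
    revenues = [100.0, 50.0, 10.0]
    y_true = ["25101503", "25101503", "25101502"]
    y_pred = ["25101503", "25101506", "44000001"]
    print(average_revenue_loss(revenues, y_true, y_pred))
```

Under these assumptions a high-revenue product sent to a distant branch of the taxonomy dominates the average, while an error between sibling commodities costs little.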

Learning: minimizing average revenue loss
Minimize a convex upper bound of the average revenue loss.

Multi-class SVM with margin re-scaling

Multi-class SVM with margin re-scaling
The objective is a convex upper bound of the empirical loss, so we can plug in any loss function.
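A minimal sketch of the per-example margin re-scaling loss, in its standard structured-SVM form, shows why it upper bounds the plugged-in loss. The slide's own formula is not preserved in this transcript, so the linear scoring model and the names `class_scores` and `margin_rescaling_loss` are assumptions made for illustration.

```python
def class_scores(W, x):
    """Linear score w_k . x for every class k (W is a list of weight vectors)."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]


def margin_rescaling_loss(W, x, y, delta):
    """max over y' of delta[y'] + score(y') - score(y).

    delta[k] is the loss of predicting class k when the truth is y,
    with delta[y] == 0, so the result is never negative.
    """
    s = class_scores(W, x)
    return max(d + s_k - s[y] for d, s_k in zip(delta, s))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three classes, two features
    x = [1.0, 2.0]
    y = 0                                     # true class
    delta = [0.0, 3.0, 1.0]                   # any plugged-in loss values

    s = class_scores(W, x)
    predicted = max(range(len(s)), key=s.__getitem__)
    surrogate = margin_rescaling_loss(W, x, y, delta)
    # The predicted class scores at least as high as the true class,
    # so the surrogate is at least the loss actually incurred.
    print(predicted, delta[predicted], surrogate)
```

Because the surrogate is a maximum of functions that are linear in the weights, it is convex in W whatever values `delta` holds, which is what allows an arbitrary loss function to be plugged in.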

Dataset
• UNSPSC (United Nations Standard Product and Service Code) dataset
• Product revenues are simulated
• revenue = price * sales

Experimental results
[Table: average revenue loss (in K$) of different algorithms]

What’s wrong? Revenue loss ranges from a few K to several M

Loss normalization
• Linearly scale the loss function to a fixed range, say [1, 10].
• The objective now upper bounds both the 0-1 loss and the average normalized loss.
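The linear scaling step could look like the sketch below. It assumes min-max scaling applied to the misclassification losses only (a correct prediction keeps loss 0); the slides do not spell out these details, and `normalize_losses` is a hypothetical helper name.

```python
def normalize_losses(losses, low=1.0, high=10.0):
    """Linearly map misclassification losses onto the range [low, high]."""
    l_min, l_max = min(losses), max(losses)
    if l_max == l_min:
        # All losses equal: no spread to preserve, map everything to low.
        return [low for _ in losses]
    scale = (high - low) / (l_max - l_min)
    return [low + (l - l_min) * scale for l in losses]


if __name__ == "__main__":
    # Invented revenue losses spanning a few thousand to a few million.
    raw = [2_000.0, 50_000.0, 3_000_000.0]
    print(normalize_losses(raw))
```

With every misclassification costing at least 1 after scaling, the normalized loss is never below the 0-1 loss, which is consistent with the slide's claim that the objective bounds both.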

Final results
7.88% reduction in average revenue loss!
[Table: average revenue loss (in K$) of different algorithms]

Conclusion
• What do we really care about for this task? Minimize error rate? Minimize revenue loss?
• Performance evaluation metric: empirical risk, i.e. average misclassification cost.
• How do we approximate the performance evaluation metric to make it tractable?
• Model + tractable loss function: regularized empirical risk minimization. A general method: multi-class SVM with margin re-scaling and loss normalization.
• Optimization: find the best parameters.

Thank you! Questions?
