This talk outlines a framework for automatic main subject detection in photography, emphasizing the integration of domain-specific knowledge through Bayesian networks. It addresses key challenges in identifying the main subject within images, which often lack ground truth due to the subjective nature of photography. The evidence fusion framework combines data from various feature detectors, leveraging metadata (like camera orientation and subject distance) to enhance detection accuracy. The research proposes innovative methodologies for segmentation and recognition, and explores future directions for improving image processing applications.
Adding Domain-Specific Knowledge
Amit Singhal & Jiebo Luo
Research Laboratories, Eastman Kodak Company
FUSION 2001, Montreal, August 7-10, 2001
Outline of Talk
• Problem Statement
• Background
• Relevant Prior Art
• Evidence Fusion Framework
• Automatic Main Subject Detection System
• Injecting Orientation Information
• Feature Detectors
• Conclusions
• Future Work
Main Subject Detection
• What is the main subject in a picture?
• 1st-party truth (the photographer): in general not available, due to the specific knowledge the photographer may have about the setting
• 3rd-party truth: in general there is good agreement among 3rd-party observers if the photographer successfully used the picture to communicate his interest in the main subject to the viewers
Related Prior Art
• Main subject (region-of-interest) detection
  • Milanese (1993): Uses biologically motivated models for identifying regions of interest in simple pictures containing highly contrasting foreground and background.
  • Marichal et al. (1996), Zhao et al. (1996): Use a subjective fuzzy modeling approach to describe semantic interest in video sequences (primarily video-conferencing).
  • Syeda-Mahmood (1998): Uses a color-based approach to isolate regions in an image likely to belong to the same object. Main application is reduction of the search space for object recognition.
• Evidence fusion
  • Pearl (1988): Provides a theory and evidence propagation scheme for Bayesian networks.
  • Rimey & Brown (1994): Use Bayesian networks for control of selective perception in a structured spatial scene.
  • Buxton et al. (1998): Use a set of Bayesian networks to integrate sensor information to infer behaviors in a traffic-monitoring application.
The Evidence Fusion Framework
• Region-based representation scheme.
• Virtual belief sensors map the output of physical sensors and algorithmic feature detectors into a probabilistic space.
• Domain knowledge is used to generate the network structure.
• Expert knowledge and ground-truth-based training methodologies generate the priors and the conditional probability matrices.
• A Bayesian network combines the evidence generated by the sensors and feature detectors using a very fast message-passing scheme.
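The idea of a virtual belief sensor can be sketched minimally: a raw feature-detector score is squashed into a belief value in (0, 1) so that every sensor speaks the same probabilistic language to the network. The sigmoid form, the `midpoint`/`steepness` parameters, and the numbers below are illustrative assumptions, not the talk's actual mapping.

```python
import math

def belief_sensor(raw_value, midpoint, steepness=1.0):
    """Hypothetical virtual belief sensor: maps a raw feature-detector
    score onto a belief in (0, 1) via a sigmoid. `midpoint` is the raw
    value that maps to belief 0.5; `steepness` controls the transition."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw_value - midpoint)))

# A region's raw skin-color score of 0.9, against an assumed midpoint
# of 0.5, yields a high belief that the region contains human flesh.
skin_belief = belief_sensor(0.9, midpoint=0.5, steepness=10.0)
```

Because every detector is normalized this way, the Bayesian network can fuse heterogeneous evidence (color, texture, camera metadata) without caring about each sensor's native units.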
Bayesian Networks
• A directed acyclic graph
• Each node represents an entity (random variable) in the domain
• Each link connects two nodes and represents a causality relationship
• The direction of the link represents the direction of causality
• Each link encodes the conditional probability between the parent and child nodes
• Evaluating the Bayesian network is equivalent to knowing the joint probability distribution
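The last point can be made concrete with a toy two-node network, MainSubject -> Centrality: the joint distribution factors into the prior times the link's conditional probability, and a posterior follows by enumeration. The node names echo the talk's features, but the probabilities are invented for illustration.

```python
# Toy two-node Bayesian network: MainSubject -> Centrality.
# All probability values are illustrative, not from the talk.
p_main = 0.3                   # prior P(MainSubject = true)
p_central_given = {True: 0.8,  # P(Centrality = high | MainSubject = true)
                   False: 0.2} # P(Centrality = high | MainSubject = false)

def joint(main, central_high):
    """P(MainSubject = main, Centrality = central_high) by the chain
    rule: prior on the parent times the link's conditional probability."""
    p_m = p_main if main else 1.0 - p_main
    p_c = p_central_given[main] if central_high else 1.0 - p_central_given[main]
    return p_m * p_c

# Posterior by enumeration: P(MainSubject = true | Centrality = high)
posterior = joint(True, True) / (joint(True, True) + joint(False, True))
```

In the full system the same factorization spans many feature nodes, and Pearl's message-passing scheme computes such posteriors without enumerating the whole joint.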
Automatic Main Subject Detection System
• An interesting research problem
  • Conventional wisdom (or how a human performs such a task): Object Segmentation -> Object Recognition -> Main Subject Determination
  • Object recognition is an unconstrained problem in consumer photographs
  • Inherent ambiguity: 3rd-party probabilistic ground truth
  • Large number of camera sensors and feature detectors
  • Speed and performance scalability concerns
• Of extreme industrial interest for digital photofinishing
  • Allows automatic image enhancements to produce better photographic prints
  • Other applications such as:
    • Image compression, storage, and transmission
    • Automatic image recompositing
    • Object-based image indexing and retrieval
Overview
• Methodology
  • Produce a belief map of regions in the scene being part of the main subject
  • Utilize a region-based representation of the image derived from image segmentation and perceptual grouping
  • Utilize semantic features (human flesh and face, sky, grass) and general saliency features (color, texture, shape, and geometric features)
  • Utilize a Bayes Net-based architecture for knowledge representation and evidence inference
• Dealing with intrinsic ambiguity
  • Ground truth is “probabilistic,” not “deterministic”
  • Limitations in our understanding of the problem
• Dealing with “weak” vision features
  • Reality of the state of the art in computer vision
  • Limited accuracy of current feature extraction algorithms
Injecting Metadata into the System
• Sources of metadata
  • Camera: flash fired, subject distance, orientation, etc.
  • IU algorithms: indoor/outdoor, scene type, orientation, etc.
  • User annotation
• The Bayesian network is very flexible and can be quickly adapted to take advantage of available metadata
• Metadata-enabled knowledge can be injected into the system using:
  • Metadata-aware feature detectors
  • Metadata-enhanced Bayesian networks
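A metadata-aware feature detector can be sketched as a function that behaves differently depending on whether a piece of metadata is present. The sketch below is hypothetical: the function name, the normalized coordinates, and the downward bias value 0.65 are my assumptions, chosen only to show the pattern of conditioning a feature on camera orientation.

```python
def centrality_weight(region_center, orientation=None):
    """Hypothetical metadata-aware centrality feature. Without
    orientation metadata, centrality is symmetric about the image
    center; with orientation known, the vertical reference point is
    shifted toward the lower half of the upright frame, where main
    subjects tend to sit. Coordinates are normalized to [0, 1]."""
    x, y = region_center
    if orientation is None:
        # Orientation-unaware: plain distance from the exact center.
        return 1.0 - ((x - 0.5) ** 2 + (y - 0.5) ** 2) ** 0.5
    # Orientation-aware: bias the vertical reference point downward
    # (0.65 is an illustrative value, not a trained parameter).
    return 1.0 - ((x - 0.5) ** 2 + (y - 0.65) ** 2) ** 0.5
```

When the camera reports no orientation tag, the detector simply falls back to its symmetric form, which is what makes the framework easy to adapt to whatever metadata happens to be available.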
Orientation
• The main difference between orientation-aware and orientation-unaware systems is in the location features
Borderness Feature
• Orientation-unaware: a = b = c = d = e
• Orientation-aware: a < b < c < d < e
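One plausible reading of the slide is that a through e are beliefs assigned to regions depending on which image borders they touch: without orientation the borders are indistinguishable, so all configurations get the same value, while with orientation known, e.g., a bottom-touching region can be treated as more promising than a top-touching one. The sketch below follows that reading; the belief values and the worst-border rule are illustrative assumptions, not the talk's trained numbers.

```python
def borderness_belief(touches, orientation_known=True):
    """Hypothetical borderness sensor. `touches` is the set of image
    borders a region touches, a subset of {"top","bottom","left","right"}.
    Orientation-unaware, every border carries the same belief; orientation-
    aware, distinct borders get distinct beliefs (the slide's a < b < ... < e)."""
    if not touches:
        return 0.9      # interior regions: high main-subject belief
    if not orientation_known:
        return 0.4      # all borders treated alike
    aware = {"bottom": 0.6, "left": 0.5, "right": 0.5, "top": 0.2}
    return min(aware[b] for b in touches)  # worst touched border dominates
```

The ordering constraint a < b < c < d < e then simply says that once orientation is known, the distinct border configurations are forced apart rather than collapsed onto one value.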
Orientation-Aware Bayesian Network
• Use orientation-aware centrality and borderness features
• Other feature detectors (sky, grass) are affected by orientation but are not retrained
  • Not retrained when the BN is used for main subject detection, since the location features already account for the orientation information
  • Using orientation information to compute the sky and grass evidence would, however, improve a dedicated sky or grass detection system
• Retrain the links in the Bayesian network for each feature affected by orientation information:
  • BorderA-Borderness
  • BorderD-Borderness
  • Borderness-Location
  • Centrality-Location
  • Location-MainSubject
Conclusions and Future Work
• Bayesian networks offer the flexibility to easily incorporate domain-specific knowledge, such as orientation information, into the system
• This knowledge can be added by:
  • Modifying the feature detectors
  • Using new feature detectors
  • Changing the structure of the Bayesian network
  • Retraining the conditional probability matrices associated with the Bayesian network
• Directions for future work
  • Use of additional metadata such as indoor/outdoor, urban/rural, day/night
  • A single super BN versus a library of metadata-aware BNs?