Visualizing Symbolic Data by closed shapes

Antonio Irpino*, N. Carlo Lauro**, Rosanna Verde*
* Second University of Naples, Italy
** University of Naples Federico II, Italy

Problem
• In symbolic object (SO) visualization on factorial planes, the MCAR (Minimum Covering Area Rectangle) shape gives rise to an overgeneralization problem, even though it makes SO images easy to interpret in terms of symbolic assertions.
Aims
• Visualize SO’s by new shapes (convex hull, connected shapes)
Tools
• A fast algorithm for the generation of a 2D convex hull, and PECS (parallel edges connected shapes)
• Some indexes for evaluating the overgeneralization induced by the new visualization shapes

Visualization of symbolic data in the original space: where the symbolic descriptors are intervals, each SO has a hypercube representation
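As a minimal sketch of the hypercube representation, an SO described by p interval descriptors can be expanded into its 2^p vertices by taking one bound from each interval. The function name `hypercube_vertices` and the sample intervals are illustrative assumptions, not part of the original slides.

```python
from itertools import product

def hypercube_vertices(intervals):
    """Return all vertices of the hypercube defined by (min, max) intervals.

    `intervals` is a list of (m, M) pairs, one per symbolic descriptor
    (hypothetical input format chosen for this sketch).
    """
    # One bound is picked from each interval, giving 2**p combinations.
    return list(product(*intervals))

# Hypothetical SO with three interval descriptors Y1, Y2, Y3.
so = [(0, 1), (2, 5), (10, 20)]
vertices = hypercube_vertices(so)
print(len(vertices))  # 2**3 vertices
```

Each tuple corresponds to one of the (m, m, m) ... (M, M, M) configurations of the descriptor bounds.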

Visualization of symbolic data in reduced subspaces (factorial planes)
• Factorial analysis performs an orthogonal projection of SO’s onto factorial subspaces.
• Each factorial axis is a linear combination of the symbolic descriptors
• The space spanned by the factorial variables is an affine transformation of the original space
• An affine transformation preserves the linear properties of the original space, e.g. a linear projection of a convex shape is a convex shape, and so on.

Choice of the most suitable SO visualization(1)
• A linear projection of a convex shape is a convex shape
[Figure: hypercube with vertices (m,m,m), (M,m,m), (m,M,m), (M,M,m), (m,m,M), (M,m,M), (m,M,M), (M,M,M) in the original space (axes Y1, Y2, Y3) and its projection on the factorial plane (axes V1, V2), showing the MCAR and the CH]
The convex hull of the vertices is the best visualization shape (i.e. no overfitting)
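The comparison between MCAR and CH can be illustrated numerically. The sketch below projects the vertices of a unit cube with a hypothetical pair of loadings, then computes the area of the axis-parallel covering rectangle and of the convex hull. The hull routine is Andrew's monotone chain, a standard general-purpose algorithm used here only as a reference, not the algorithm proposed in the slides. The loadings, the helper names, and the ratio `1 - area(CH)/area(MCAR)` are assumptions made for this example.

```python
from itertools import product

def project(vertex, loadings):
    # loadings: two rows (axes V1 and V2), each with p coefficients.
    return tuple(sum(a * v for a, v in zip(row, vertex)) for row in loadings)

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def polygon_area(poly):
    # Shoelace formula over consecutive vertex pairs.
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2

def mcar_area(points):
    # Rectangle with sides parallel to the factorial axes covering all points.
    xs, ys = zip(*points)
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

intervals = [(0, 1), (0, 1), (0, 1)]            # hypothetical SO
loadings = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.5)]   # hypothetical, not orthonormal
points = [project(v, loadings) for v in product(*intervals)]
hull = convex_hull(points)
overgeneralization = 1 - polygon_area(hull) / mcar_area(points)
print(len(hull), polygon_area(hull), mcar_area(points), overgeneralization)
```

In this toy case the hull is a hexagon strictly inside the rectangle, so the MCAR covers area that no point of the projected SO can occupy.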

A quick algorithm for 2D convex-hull(1)
Given n distinct points, a classical 2D convex hull algorithm can require on the order of n^2 comparison operations

A quick algorithm for 2D convex-hull(2)
• The proposed algorithm (2dPPCH, 2d Parallelotope Projected Convex Hull) is based on the following principles:
• In the original descriptor space the hypervolume is a parallelotope symmetrical about its barycentre
• A linear projection of a convex shape is a convex shape, and a symmetrical shape remains symmetrical in the reduced subspace
• The edges of the 2D projected convex shape are always projections of edges of the original shape
Given p interval variables, each object with no degenerate intervals is represented in the original space by n = 2^p points and p·2^(p-1) edges. Classical 2D CH algorithms break down!
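The vertex and edge counts quoted above can be checked by brute force for small p. This is only an illustrative sketch: vertices are coded as 0/1 tuples (0 for the lower bound m, 1 for the upper bound M), and two vertices are joined by an edge when they differ in exactly one coordinate. The function name is hypothetical.

```python
from itertools import product

def parallelotope_counts(p):
    """Count vertices and edges of a p-dimensional parallelotope by enumeration."""
    vertices = list(product((0, 1), repeat=p))  # 0 -> m, 1 -> M
    edges = [
        (u, v)
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
        if sum(a != b for a, b in zip(u, v)) == 1  # differ in one descriptor
    ]
    return len(vertices), len(edges)

for p in range(1, 7):
    n_vertices, n_edges = parallelotope_counts(p)
    print(p, n_vertices, n_edges)
```

The exponential growth of both counts is what makes a general-purpose hull algorithm on all vertices impractical as p increases.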

A quick algorithm for 2D convex-hull(3)
• It can be shown that the 2D convex hull of a parallelotope has 2p extreme points.
• Thanks to the symmetry about the barycentre, it is sufficient to determine only p points; the other p are easily identified.
• First step
• Identification of an extremal point
• Point_1
• its symmetric point (Point_{p+1}) is determined
[Figure: factorial plane]

A quick algorithm for 2D convex-hull(4)
• Second step
• i=1
• While (i ≤ p)
• {
• Search the p-i+1 candidate points linked to Point_i
• (for example, when p=3 and i=1, if mMm is the configuration of Point_i, the linked points are MMm, mmm, mMM)
• Calculate the angles between V1 and the edges to which Point_i belongs
• The edge with the minimum angle is the external edge; Point_{i+1} is determined, and so is its symmetric Point_{p+i+1}
• i=i+1 }
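The two steps can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' implementation. It assumes that each descriptor j contributes a 2D "generator" vector (half the interval range times the loadings of descriptor j on V1 and V2), that no generator is zero, and that no two generators are parallel. The names `projected_hull`, `center` and `generators` are hypothetical.

```python
import math

def projected_hull(center, generators):
    """Walk the boundary of a projected parallelotope (sketch of steps 1 and 2).

    center: (x, y) projection of the barycentre.
    generators: p vectors (gx, gy), one per descriptor (assumed input format).
    Returns the 2p extreme points in counter-clockwise order.
    """
    # Orient every generator upward so all walking angles lie in [0, pi).
    edges = []
    for gx, gy in generators:
        if gy < 0 or (gy == 0 and gx < 0):
            gx, gy = -gx, -gy
        edges.append((gx, gy))

    # Step 1: the lowest point is the barycentre minus all oriented generators.
    x = center[0] - sum(e[0] for e in edges)
    y = center[1] - sum(e[1] for e in edges)
    first_half = [(x, y)]

    # Step 2: among the remaining candidate edges, follow the one that makes
    # the minimum angle with V1; p - i + 1 candidates are examined at step i.
    remaining = edges[:]
    while remaining:
        best = min(remaining, key=lambda e: math.atan2(e[1], e[0]))
        remaining.remove(best)
        x, y = x + 2 * best[0], y + 2 * best[1]
        first_half.append((x, y))

    # The last point reached is the symmetric of the first one; drop it and
    # obtain the other p points by symmetry about the barycentre.
    first_half.pop()
    second_half = [(2 * center[0] - px, 2 * center[1] - py)
                   for px, py in first_half]
    return first_half + second_half

# Hypothetical example with p = 3 descriptors.
gens = [(1.0, 0.0), (0.5, 1.0), (-0.5, 0.5)]
print(projected_hull((0.0, 0.0), gens))
```

Under these assumptions the walk examines p + (p-1) + ... + 1 candidate edges in total, instead of touching all 2^p projected vertices.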

A quick algorithm for 2D convex-hull(5)
Computational complexity
Given p>4 variables, the operations (in terms of point comparisons) done in steps 1 and 2 are
p
• 1 1 2
• 3 4 4
• 6 9 8
• 10 16 16
• 15 25 32

A SO quality visualization index based on the Potential Descriptor, according to the analysis technique
The Potential descriptor (De Carvalho, 92) of an object is the volume of the SO in the original representation space, so it is a multiplicative function
Using a log transformation we obtain an additive function
In Symbolic Factorial Analysis it corresponds to the volume of the convex hull of the vertices projected in the space defined by the new n factors, which can easily be computed as the determinant of a square matrix of dimension n.
Given a subspace of k<n factors, the % of SO variability explained can be computed as follows
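For interval descriptors, the volume of the SO in the original space is the product of the interval ranges, and its logarithm is the sum of the log ranges. The sketch below shows only this multiplicative and additive pair. It does not reproduce the slide's formula for the explained variability, which is not available in the transcript. Function names and sample intervals are assumptions.

```python
import math

def potential(intervals):
    """Volume of the hypercube: product of the interval ranges."""
    return math.prod(hi - lo for lo, hi in intervals)

def log_potential(intervals):
    """Additive version: sum of the log ranges (requires non-degenerate intervals)."""
    return sum(math.log(hi - lo) for lo, hi in intervals)

so = [(0, 2), (1, 4)]  # hypothetical SO with two interval descriptors
print(potential(so), log_potential(so), math.log(potential(so)))
```

The additive form is convenient because contributions of individual descriptors can be compared or summed directly.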

Interpretation: MCAR vs CH
[Figure: factorial plane 1-2, axes v1 and v2, with the overfitting region marked]

Visualization vs Interpretation (Parallel Edges Connected Shapes)
Aim: to build a connected shape with edges parallel to the factorial axes that is contained in the MCAR and contains the CH of the projected vertices
[Figure: axes V1, V2]

Searching step 1
For each of the p/2 edges:
• Area (objective)
• Straight-line equation (constraint)
• Segment bounds (constraint)
Being a quadratic optimization problem, it admits a global optimum
We choose the point which generates the maximum cut area among the p/2 edges
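One way to read this step: for a hull edge facing a corner of the MCAR, pick the point on the edge that maximizes the area of the axis-parallel rectangle between that point and the corner. Along the segment the area is a quadratic function of the position, so the optimum is either the parabola's vertex or a segment endpoint. The sketch below follows this reading and is an assumption, not the authors' formulation. It also assumes the whole edge lies in a single quadrant relative to the corner. All names are hypothetical.

```python
def best_cut_on_edge(corner, start, end):
    """Return (t, point, area) maximizing the rectangle cut from `corner`.

    The edge is parametrized as start + t * (end - start), t in [0, 1].
    """
    cx, cy = corner
    dx0, dy0 = start[0] - cx, start[1] - cy
    ux, uy = end[0] - start[0], end[1] - start[1]

    # area(t) = |(dx0 + t*ux) * (dy0 + t*uy)| is quadratic in t.
    a = ux * uy
    b = dx0 * uy + dy0 * ux

    def area(t):
        return abs((dx0 + t * ux) * (dy0 + t * uy))

    candidates = [0.0, 1.0]          # segment bounds
    if a != 0:
        t_star = -b / (2 * a)        # stationary point of the parabola
        if 0.0 < t_star < 1.0:
            candidates.append(t_star)
    t = max(candidates, key=area)
    return t, (start[0] + t * ux, start[1] + t * uy), area(t)

# Hypothetical edge from (1, 0) to (1.5, 0.5) facing the MCAR corner (1.5, 0).
print(best_cut_on_edge((1.5, 0.0), (1.0, 0.0), (1.5, 0.5)))
```

Repeating this for every candidate edge and keeping the largest area gives the first cutting rule under this reading.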

Searching step 2
We perform the same optimization as in step 1 for zones 1, 2 and 3, and so on for the other steps
[Figure: axes V1, V2, with Zone 1, Zone 2 and Zone 3]

MCAR vs PECS
[Figure: two panels on axes v1, v2, showing the convex hull of projected vertices and the overfitting region]

Example
[Figure: four panels on axes V1, V2, labelled 2r, 4r, 6r and 8r]

How many rules?
• It is important to choose a criterion to stop the generation of cutting rules. In fact, in order to have no overfitting the number of rules is ∞
• Given the following % of overfitting
• Two criteria can be chosen
• Given a threshold in [0,1] on the % of overfitting (Overfitting criterion)
• The user can choose a maximum number of rules, and only those rules that allow the maximum decrease of overfitting are generated (Interpretability criterion)
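The two stopping criteria can be sketched as a greedy loop. The slide's formula for the % of overfitting is not available in the transcript, so this example assumes the ratio (shape area - CH area) / shape area. It also simplifies by treating the candidate cut areas as a fixed list, whereas in the procedure described earlier later cuts depend on earlier ones. All names and numbers are hypothetical.

```python
def select_rules(mcar_area, hull_area, cut_areas, threshold=None, max_rules=None):
    """Apply the largest cuts first until a stopping criterion is met.

    threshold: stop once the assumed overfitting ratio is at or below it.
    max_rules: stop once this many cutting rules have been generated.
    Returns the chosen cut areas and the final overfitting ratio.
    """
    shape_area = mcar_area
    chosen = []
    for cut in sorted(cut_areas, reverse=True):
        overfit = (shape_area - hull_area) / shape_area  # assumed definition
        if threshold is not None and overfit <= threshold:
            break  # overfitting criterion satisfied
        if max_rules is not None and len(chosen) >= max_rules:
            break  # interpretability criterion satisfied
        shape_area -= cut
        chosen.append(cut)
    return chosen, (shape_area - hull_area) / shape_area

cuts = [0.0625, 0.0625, 0.03, 0.03]  # hypothetical cut areas
print(select_rules(2.25, 2.0, cuts, threshold=0.08))
print(select_rules(2.25, 2.0, cuts, max_rules=1))
```

The threshold favours fidelity to the convex hull, while the rule cap keeps the symbolic description short and readable.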

Open problems
• generalization of the CH algorithm to 3D and to n dimensions
• generalization of PECS to 3D and to n dimensions
• new visualization indexes based on the explained internal variability of the SO’s according to the analysis
• reduction of over-generalization problems in a classification context by closed shapes (Rasson, 2002)

Ichino oils data set
