CIS 2033

Based on the textbook: F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics: Understanding Why and How.
Instructor: Dr. Longin Jan Latecki

Chapter 15 Exploratory data analysis: graphical summaries
The set of observations is called a dataset. By exploring the dataset we can gain insight into which probability model suits the phenomenon. To graphically represent univariate datasets, which consist of repeated measurements of one particular quantity, we discuss the classical histogram, the more recently introduced kernel density estimate, and the empirical distribution function. To represent a bivariate dataset, which consists of repeated measurements of two quantities, we use the scatterplot.

15.2 Histograms
The term histogram appears to have been used first by Karl Pearson.

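Histogram construction and pdf
Denote a generic (univariate) dataset of size $n$ by $x_1, x_2, \ldots, x_n$. First we divide the range of the data into intervals. These intervals are called bins and are denoted by $B_1, B_2, \ldots, B_m$. The length of an interval $B_i$ is denoted by $|B_i|$ and is called the bin width. We want the area under the histogram on each bin $B_i$ to reflect the number of elements in $B_i$. Since the total area 1 under the histogram then corresponds to the total number of elements $n$ in the dataset, the area under the histogram on a bin $B_i$ is equal to the proportion of elements in $B_i$:
$$\text{area under the histogram on } B_i = \frac{\text{number of } x_j \text{ in } B_i}{n}.$$
The height of the histogram on bin $B_i$ must therefore be equal to
$$\frac{\text{number of } x_j \text{ in } B_i}{n\,|B_i|}.$$
As we know from Ch. 13.4, the histogram approximates the pdf $f$; in particular, for a bin centered at a point $a$, $B_a = (a-h, a+h]$, we have
$$\text{height of the histogram on } B_a = \frac{\text{number of } x_i \text{ in } B_a}{2hn} \approx \frac{P(a-h < X \le a+h)}{2h} \approx f(a).$$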
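A minimal Python sketch of the height rule, height on $B_i$ = (number of $x_j$ in $B_i$) / $(n|B_i|)$, assuming half-open bins $(a, b]$ given by a list of edges. The function name `histogram_heights` and the toy data are hypothetical, chosen only to show that the bin areas add up to 1 even when the bins have different widths.

```python
def histogram_heights(data, edges):
    """Density-scaled histogram heights for the bins (edges[i], edges[i+1]].

    Height on bin B_i = (number of x_j in B_i) / (n * |B_i|), so the area on
    each bin equals the proportion of the data in that bin.
    Points outside all bins are simply not counted.
    """
    n = len(data)
    heights = []
    for a, b in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if a < x <= b)  # number of x_j in (a, b]
        heights.append(count / (n * (b - a)))
    return heights


data = [0.3, 1.1, 1.4, 2.2, 2.9, 3.5, 3.6, 4.8]
edges = [0, 1, 2, 3, 5]  # the last bin (3, 5] is twice as wide as the others
heights = histogram_heights(data, edges)
area = sum(h * (b - a) for h, a, b in zip(heights, edges[:-1], edges[1:]))
print(heights)  # [0.125, 0.25, 0.25, 0.1875]
print(area)     # 1.0
```

The wide last bin holds 3 of the 8 points, yet its height is only 3/16: dividing by $|B_i|$ is what keeps the area, not the height, proportional to the count.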

In Matlab (x holds the n = 200 samples):
binwidth = 0.5;
bincenters = 0.5:binwidth:9.5;
hx = hist(x, bincenters) / (numel(x)*binwidth);   % counts / (n * binwidth)
The function g in blue is a mixture of two Gaussians. We draw 200 samples from it, which are shown as blue dots. We use the samples to generate the histogram (yellow) and its kernel density estimate f (red). The Matlab script is twoGaussKernelDensity1.m.
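The Matlab line normalizes bin counts by $n \cdot$ binwidth so that the histogram can be compared with the pdf $g$. The following Python sketch repeats that experiment and measures how far the heights at the bin centers are from $g(a)$. The mixture weights, means and standard deviation are hypothetical (the values used in twoGaussKernelDensity1.m are not given), and unlike Matlab's `hist`, this sketch simply ignores points that fall outside the outer bins.

```python
import math
import random

# Hypothetical mixture parameters; the original script's values are not given.
W, MU1, MU2, SIGMA = 0.5, 3.0, 7.0, 1.0


def mixture_pdf(t):
    """pdf g of the mixture W*N(MU1, SIGMA^2) + (1 - W)*N(MU2, SIGMA^2)."""
    def normal_pdf(z, mu):
        return math.exp(-0.5 * ((z - mu) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))
    return W * normal_pdf(t, MU1) + (1 - W) * normal_pdf(t, MU2)


def sample_mixture(n, rng):
    """Choose component 1 with probability W, otherwise component 2, then draw."""
    return [rng.gauss(MU1 if rng.random() < W else MU2, SIGMA) for _ in range(n)]


def heights_at_centers(x, centers, binwidth):
    """Counts in (c - binwidth/2, c + binwidth/2] divided by n * binwidth."""
    n = len(x)
    half = binwidth / 2
    return [sum(1 for v in x if c - half < v <= c + half) / (n * binwidth)
            for c in centers]


def max_height_error(n, seed=0, binwidth=0.5):
    """Largest |histogram height - g(a)| over the centers 0.5, 1.0, ..., 9.5."""
    rng = random.Random(seed)
    x = sample_mixture(n, rng)
    centers = [0.5 + binwidth * k for k in range(19)]
    hx = heights_at_centers(x, centers, binwidth)
    return max(abs(h - mixture_pdf(c)) for h, c in zip(hx, centers))


print(max_height_error(200))     # n = 200, as in the figure
print(max_height_error(50_000))  # a much larger sample
```

The error typically shrinks as $n$ grows, which is the approximation height on $B_a \approx f(a)$ at work; a small remaining bias comes from the bin width itself.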

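Choice of the bin width
Consider a histogram with bins of equal width. In that case the bins are of the form
$$B_i = (r + (i-1)b,\; r + ib] \quad \text{for } i = 1, 2, \ldots, m,$$
where $r$ is some reference point smaller than the minimum of the dataset and $b$ denotes the bin width. Mathematical research has provided some guidelines for a data-based choice of $b$ or $m$. Sturges' rule chooses the number of bins
$$m = 1 + \log_2 n,$$
and the normal reference rule chooses the bin width
$$b = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3} s \approx 3.49\, s\, n^{-1/3},$$
where $s$ is the sample standard deviation.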
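A small Python sketch of two standard data-based rules, Sturges' rule $m = 1 + \log_2 n$ and the normal reference rule $b = (24\sqrt{\pi}/n)^{1/3} s \approx 3.49\, s\, n^{-1/3}$, together with the equal-width bins $(r + (i-1)b, r + ib]$. Rounding $m$ up to an integer is an assumption of this sketch, and the function names and data are hypothetical.

```python
import math
import statistics


def sturges_number_of_bins(n):
    """Sturges' rule m = 1 + log2(n), rounded up here to an integer."""
    return math.ceil(1 + math.log2(n))


def normal_reference_bin_width(data):
    """b = (24 * sqrt(pi) / n) ** (1/3) * s, roughly 3.49 * s * n ** (-1/3)."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return (24 * math.sqrt(math.pi) / n) ** (1 / 3) * s


def equal_width_bins(r, b, m):
    """The bins (r + (i-1)b, r + ib] for i = 1, ..., m, as (left, right) pairs."""
    return [(r + (i - 1) * b, r + i * b) for i in range(1, m + 1)]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sturges_number_of_bins(len(data)))      # 4, since log2(8) = 3
print(normal_reference_bin_width(data))
print(equal_width_bins(r=1.5, b=2.0, m=4))    # [(1.5, 3.5), (3.5, 5.5), (5.5, 7.5), (7.5, 9.5)]
```

The reference point $r = 1.5$ is chosen just below the minimum 2 so that every observation lands in exactly one bin.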

15.3 Kernel density estimates

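A kernel $K$ is a function $K: \mathbb{R} \to \mathbb{R}$, and a kernel typically satisfies the following conditions:
(K1) $K$ is a probability density, that is, $K(u) \ge 0$ and $\int_{-\infty}^{\infty} K(u)\,du = 1$;
(K2) $K$ is symmetric around zero, that is, $K(u) = K(-u)$;
(K3) $K(u) = 0$ for $|u| > 1$.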

Examples of Kernel Construction
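Some commonly used kernels, written out in Python and checked numerically against the usual conditions: (K1) $K \ge 0$ with integral 1, (K2) $K(u) = K(-u)$, and (K3) $K(u) = 0$ for $|u| > 1$. These are standard textbook examples, not necessarily the ones drawn on the original slide; the helper `kernel_properties` is hypothetical and uses a midpoint rule on $[-6, 6]$.

```python
import math


def triangular(u):
    return max(0.0, 1.0 - abs(u))


def cosine(u):
    return math.pi / 4 * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def triweight(u):
    return 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0


def normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_properties(K, lo=-6.0, hi=6.0, steps=24_000):
    """Numerically check (K1)-(K3) on a midpoint grid over [lo, hi]."""
    width = (hi - lo) / steps
    us = [lo + (k + 0.5) * width for k in range(steps)]
    values = [K(u) for u in us]
    return {
        "K1_nonnegative": all(v >= 0 for v in values),
        "K1_integral": sum(values) * width,  # midpoint rule; close to 1
        "K2_symmetric": all(abs(K(u) - K(-u)) < 1e-12 for u in us),
        "K3_zero_outside": all(v == 0 for u, v in zip(us, values) if abs(u) > 1),
    }


for kernel in (triangular, cosine, epanechnikov, triweight, normal):
    print(kernel.__name__, kernel_properties(kernel))
```

The normal kernel satisfies (K1) and (K2) but not (K3), since it is positive everywhere; this is why (K3) is only a typical, not a required, condition.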

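Scaling the kernel K
Scale the kernel $K$ into the function
$$K_h(t) = \frac{1}{h} K\!\left(\frac{t}{h}\right),$$
where $h > 0$ is the bandwidth. Then put a scaled kernel around each element $x_i$ in the dataset,
$$K_h(t - x_i) = \frac{1}{h} K\!\left(\frac{t - x_i}{h}\right),$$
and average these $n$ functions to obtain the kernel density estimate
$$f_{n,h}(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - x_i}{h}\right).$$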
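A minimal Python sketch of the two steps: scale the kernel to $K_h(t) = K(t/h)/h$, then average the scaled kernels centered at the data points, $f_{n,h}(t) = \frac{1}{nh}\sum_i K((t - x_i)/h)$. The Epanechnikov kernel and the toy data are illustrative choices, and the function names are hypothetical.

```python
def epanechnikov(u):
    """A kernel satisfying (K1)-(K3); used here only as an example choice."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def scaled_kernel(K, h):
    """K_h(t) = K(t / h) / h: the kernel stretched to bandwidth h > 0."""
    return lambda t: K(t / h) / h


def kernel_density_estimate(data, h, K=epanechnikov):
    """f_{n,h}(t) = (1/n) sum_i K_h(t - x_i) = (1/(n h)) sum_i K((t - x_i)/h)."""
    K_h = scaled_kernel(K, h)
    n = len(data)
    return lambda t: sum(K_h(t - x) for x in data) / n


data = [1.0, 2.0, 2.5, 6.0]
f = kernel_density_estimate(data, h=1.0)
print(f(2.0))  # (0 + 0.75 + 0.5625 + 0) / 4 = 0.328125
```

Because each scaled kernel $K_h(\cdot - x_i)$ is itself a density, their average is a density too, so $f_{n,h}$ integrates to 1 for any bandwidth.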

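[Figure: kernel density estimates of the same dataset, with panels titled "The bandwidth is too big" and "The bandwidth is too small".] A bandwidth that is too big oversmooths the data and can hide features such as a second peak; a bandwidth that is too small produces a spiky estimate with a separate bump around almost every observation.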
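To see the effect of the bandwidth $h$ without a plot, this Python sketch counts the local maxima of $f_{n,h}$ on a grid for a hypothetical toy dataset with two clusters. The normal kernel, the grid, and the crude bump counter `count_local_maxima` are all assumptions made for the illustration.

```python
import math


def normal_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(data, h):
    """f_{n,h}(t) = (1/(n h)) sum_i K((t - x_i)/h) with a normal kernel."""
    n = len(data)
    return lambda t: sum(normal_kernel((t - x) / h) for x in data) / (n * h)


def count_local_maxima(f, grid):
    """Number of local maxima of f on the grid (a crude bump counter)."""
    v = [f(t) for t in grid]
    return sum(1 for i in range(1, len(v) - 1) if v[i - 1] < v[i] >= v[i + 1])


data = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]  # hypothetical toy dataset: two clusters
grid = [i * 0.01 for i in range(1001)]  # t from 0 to 10
for h in (0.1, 1.0, 10.0):
    print(h, count_local_maxima(kde(data, h), grid))
```

With $h = 0.1$ the estimate has one bump per observation (6 maxima), while with $h = 10$ the two clusters merge into a single bump; an intermediate bandwidth is needed to show the two-cluster structure.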


15.4 The empirical distribution function
Another way to graphically represent a dataset is to plot the data in a cumulative manner. This can be done by using the empirical cumulative distribution function $F_n$ of the data, defined by
$$F_n(x) = \frac{\text{number of elements in the dataset} \le x}{n}.$$
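A short Python sketch of $F_n(x)$ = (number of $x_i \le x$)/$n$: after sorting the data once, `bisect.bisect_right` returns the number of sorted elements that are $\le x$. The function name and data are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return F_n, where F_n(x) = (number of elements x_i <= x) / n."""
    xs = sorted(data)
    n = len(xs)
    # bisect_right(xs, x) is the number of elements of xs that are <= x
    return lambda x: bisect_right(xs, x) / n


F = empirical_cdf([3, 1, 4, 1, 5])
print([F(x) for x in (0, 1, 3.5, 5)])  # [0.0, 0.4, 0.6, 1.0]
```

$F_n$ is a step function: it jumps by $1/n$ at each observation, and by $k/n$ at a value that occurs $k$ times (here 1 appears twice, so $F_n$ jumps from 0 to 0.4 at $x = 1$).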

Empirical distribution function Continued

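Example 15.6. Given is the following information about a histogram; compute the value of the empirical distribution function at the point $t = 7$.
Bin (0, 2]: height 0.245
Bin (2, 4]: height 0.130
Bin (4, 7]: height 0.050
Bin (7, 11]: height 0.020
Bin (11, 15]: height 0.005
Because $(2-0)\cdot 0.245 + (4-2)\cdot 0.130 + (7-4)\cdot 0.050 + (11-7)\cdot 0.020 + (15-11)\cdot 0.005 = 1$, there are no data points outside the listed bins. Hence
$$F_n(7) = \frac{\text{number of } x_i \le 7}{n} = (2-0)\cdot 0.245 + (4-2)\cdot 0.130 + (7-4)\cdot 0.050 = 0.49 + 0.26 + 0.15 = 0.9.$$
By: Wanwisa Smith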
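The computation in Example 15.6 can be checked with a small Python sketch: the area $(b - a)\cdot$height of each bin $(a, b]$ is the proportion of data in that bin, so $F_n(t)$ at a bin edge $t$ is the total area of the bins to the left of $t$. The bins are (0, 2], (2, 4], (4, 7], (7, 11], (11, 15] with heights 0.245, 0.130, 0.050, 0.020, 0.005. The helper `ecdf_from_histogram` is hypothetical, and it only handles $t$ at a bin edge, where the histogram fully determines $F_n(t)$.

```python
def ecdf_from_histogram(bins, t):
    """F_n(t) from a density-scaled histogram, for t equal to a bin edge.

    bins: list of ((a, b), height) describing bins (a, b].
    Assumes the bin areas sum to 1, i.e. no data lie outside the bins.
    """
    edges = {a for (a, _), _ in bins} | {b for (_, b), _ in bins}
    if t not in edges:
        raise ValueError("this sketch only handles t at a bin edge")
    # Area on (a, b] = proportion of data in (a, b]; add the bins left of t.
    return sum((b - a) * height for (a, b), height in bins if b <= t)


bins = [((0, 2), 0.245), ((2, 4), 0.130), ((4, 7), 0.050),
        ((7, 11), 0.020), ((11, 15), 0.005)]
total_area = sum((b - a) * height for (a, b), height in bins)
print(round(total_area, 10))                     # 1.0
print(round(ecdf_from_histogram(bins, 7), 10))   # 0.9
```

Inside a bin, such as at $t = 5$, the histogram does not say how the points in $(4, 7]$ are spread out, so $F_n(5)$ cannot be determined from the histogram alone.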

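Relation between histogram and empirical cdf
15.11. Given is a histogram and the empirical distribution function $F_n$ of the same dataset. Show that the height of the histogram on a bin $(a, b]$ is equal to
$$\frac{F_n(b) - F_n(a)}{b - a}.$$
The height of the histogram on a bin $B_i = (a, b]$ is
$$\frac{\text{number of } x_j \text{ in } (a, b]}{n(b - a)}.$$
The number of $x_j$ in $(a, b]$ equals the number of $x_j \le b$ minus the number of $x_j \le a$, which is $nF_n(b) - nF_n(a)$. Hence
$$\text{height on } (a, b] = \frac{nF_n(b) - nF_n(a)}{n(b - a)} = \frac{F_n(b) - F_n(a)}{b - a}.$$
By: Wanwisa Smith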
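A quick numerical check of the identity height on $(a, b]$ = $(F_n(b) - F_n(a))/(b - a)$, as a Python sketch with hypothetical data that includes a point exactly on a bin edge. Both sides use the half-open convention $(a, b]$, which is what makes the count equal $nF_n(b) - nF_n(a)$.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """F_n(x) = (number of x_i <= x) / n."""
    xs = sorted(data)
    return lambda x: bisect_right(xs, x) / len(xs)


def height_by_counting(data, a, b):
    """(number of x_j in (a, b]) / (n (b - a))."""
    count = sum(1 for x in data if a < x <= b)
    return count / (len(data) * (b - a))


def height_from_ecdf(data, a, b):
    """(F_n(b) - F_n(a)) / (b - a)."""
    F = empirical_cdf(data)
    return (F(b) - F(a)) / (b - a)


data = [0.3, 1.1, 1.4, 2.0, 2.2, 2.9, 3.5, 3.6, 4.8]  # 2.0 sits on a bin edge
for a, b in [(0, 1), (1, 2), (2, 3), (3, 5)]:
    print((a, b), height_by_counting(data, a, b), height_from_ecdf(data, a, b))
```

The point 2.0 is counted in $(1, 2]$ by both methods: it is included in $F_n(2)$ but not in $F_n(1)$.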

15.5 Scatterplot
In some situations we might want to investigate the relationship between two or more variables. In the case of two variables $x$ and $y$, the dataset consists of pairs of observations:
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n).$$
We call such a dataset a bivariate dataset, in contrast to a univariate dataset. The plot of the points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$ is called a scatterplot.
