# Program 3

Télécharger la présentation

## Program 3

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. BCNF vs 3NF Program 3

2. BCNF • Written recursively • Pseudo code: doBCNF(schema r , fds) Let fd = BCNF violating dependency in fds if fd is null then return else split into two tables table1 = fd[domain] U fd[range] table2 = r – fd[range] doBCNF(table2, fds)

3. BCNF Results • Average time taken to do decomposition • 36ms • Pros: • Easy to code • Fast • Cons: • Fds lost (see sample inputs)

4. BCNF Sample Inputs • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {C,A}, {D,E,G}, {B,C,D} } • Lost Fds: • ACD->B, AB->C,BE->C,GC->BD,CE->AG

5. Another BCNF Example • From Class Also • Input: • R = {C,T,H,R,S,G} • F = {CS->G, C->T, HR->C, HS->R, HT->R} • Output: • { {C,S,G}, {C,T}, {H,R,C}, {H,R,S} } • Lost Fds • HT->R

6. Beers Example (from book) • Input: • R = {beer, manf, bar, bar_addr, license, person, phone, person_addr, price} • F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } } • Output: • { {beer, manf}, {bar, bar_addr, license}, {person, phone, person_addr}, {beer, bar, price} } • Lost Fds • None

7. Movies Example • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress} • Lost Fds • LeadActress,Year->Movie

8. 3NF Synthesis • Implementation • Broke into modules that performed a separate step in the algorithm • MinimalCover, MergeLHS, FormSubSchema, MergeSubShema, AddMissing, AddKey • Average Time Taken: 200ms • Pros: • Lossless • Cons: • Slow and complex

9. 3NF Example 1 • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {A,B,C}, {B,C,D}, {D,E,G}, {B,E,C}, {C,G,B}, {C,E,G} }

10. 3NF Example 2 • From Class Also • Input: • R = {C,T,H,R,S,G} • F = {CS->G, C->T, HR->C, HS->R, HT->R} • Output: • { {S,C,G}, {C,T}, {R,H,C}, {S,H,R}, {T,H,R} }

11. 3NF Example 3 • Input: • R = {beer, manf, bar, bar_addr, license, person, phone, person_addr, price} • F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } } • Output: • { {beer, manf}, {bar,bar_addr,license}, {person,phone,person_addr}, {beer,bar,price} } • Same as BCNF

12. 3NF Example 4 • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress}, {Year,LeadActress,Movie}

13. Hybrid Approach • First use 3NF to generate temporary schemas • Next, use BCNF on each of the temporary schemas to further decompose • Eliminate possible redundancies • Average Time: 227ms (which is to be expected)

14. Hybrid Example 1 • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {C,A}, {B,C,D}, {D,E,G}, {E,B,C}, {G,C,B}, {E,C,G} } • Lost Fds • ?

15. Hybrid: Movies • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress}, {Year,LeadActress,Movie} • Same as 3NF

16. Data Mining Program 4

17. Data Mining • Artificial Neural Network • Weka Implementation • ANN with error back propagation • Oracle-XE Database • Java Connectivity • Oracle Thin Driver • Get database data with java and use Weka to build ANN and classify

18. Data Set • Census Data • Predict Income Class of US citizens given census data. • Income Class: <=\$50,000 or >\$50,000 • For simplicity (difficult to predict exact income) • > 30,000 tuples • CSV: • http://cs.uga.edu/~mcknight/nlp/data.csv • Database: (login required) • http://128.192.101.74:9090/apex

19. Results • ANN with 27 internal nodes, 55 input nodes • Training Data: • 66% of original data used for training • the remaining 33% used as test set for classification • Training Time: 2050.91 seconds • Error: Correctly classified: 84.0665 % Mean absolute error: 0.1698 Root mean squared error: 0.3558 Relative absolute error: 46.6323 % Root relative squared error: 83.9026 %