1 / 37

PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochen shan Yigang Wang Guo Yu

Center for African Studies The MasterCard Foundation Scholars Program Database Project Final Presentation. PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochen shan Yigang Wang Guo Yu Ying Yang Ye Zhong. Client Overview. UC Berkeley Center for African Studies (CAS)

Télécharger la présentation

PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochen shan Yigang Wang Guo Yu

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Center for African StudiesThe MasterCard Foundation Scholars ProgramDatabase Project Final Presentation PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochenshan Yigang Wang Guo Yu Ying Yang Ye Zhong

  2. Client Overview • UC Berkeley Center for African Studies (CAS) • The MasterCard Foundation Program • Provide comprehensive support for Education • For economically disadvantaged students from developing countries in Africa • Financial, academic, social, and career counseling • 2012-2020, 25 new students per year • Client Need • A new database help better manage the program • Specialized in tracking financial transactions • Also store data andinformationrelated to the program • Financial, academic, social, career

  3. Project Review DP I DP II • Familiarized with clients and understood their needs • Created 1st version of Simplified EER diagram • Set up schedule for the project • Expanded and Revised EER diagram • Conceptualized 5 queries • Developed Relational Schema DP III Final Presentation • Finalized EER Diagram • Created database in Access and implemented relations into Access • Altered Queries and developed SQLcode • ImplementedQueries into Access • Modified relational Schema • Did Normalization Analysis

  4. EER Diagram

  5. Access Relationships

  6. Relational Schema 1. Person (PID, Lname, Fname, MI, Birth_Date, Nationality, Gender, Primary_Phone_No, Address) 1a. Alumni (Alumini_PID1, Occupation, Company, Degree, Graduation_Date, Admission_Date) 1b. Student (Student_PID1, SID) 1bb. Pre_Student(Student_PID1, Financial_Background, SAT_Score, TOFEL_Score, High_School_GPA, No_HighSchool_Honor, No_HighSchool_Awards, HID9) 1bc. Student_current_status(SID, Mentor_Employee_PID1, Class_Level, Expected_Graduation_Date, App_Date, College_GPA) 1c. Employee (Employee_PID1, Position) 2. TransitionAccount (TID, Amount, Time, Year, Month, Date, PC_ID3) 2a. Withdrawal (W_TID2) 2aa. Check (Check_TID2, Description, Responsible_Person, Account_No) 2aaa. Account_Detail (Account_No, Routing_No) 2ab. Bearbuy (Bear_TID2 ) 2ac. Cash (C_TID2, Responsible_Person) 2ad. BlueCard (BC_TID2) 2b. Deposit (D_TID2) 3. Personal_Card (PC_ID, Year, Month, Day, Time, Amount, Reimbursed, Employee_PID1) 4. Expense (EID, Amount, Year, Month, Day, Description, PMID5) 4a. InternalExpense (I_EID4) 4ab. Tuition (T_EID4, Semester, Degree_Class) 4ac. CourseMaterial (C_EID4, Course_Name, ISBN) 4b. ExternalExpense (E_EID4, OID5) 4ba. Summer_Winter_Housing (SWH_EID4, Address, Start_Date, Finish_Date) 4baa. Summer_Winter_Housing_Timeline (Start_Date, Finish_Date, Duration) 4bab. Housing_Information(Address, Landlord, Agent,Contact_Phone_No) 4bb. External_Course_Material (EC_EID4, Course_Name, ISBN) 4bc. SEVIS_Fee (S_EID4) 4bd. Travel_Airline_Ticket (TAT_EID4, Flight_No, Airline_Name) 4bda. Ticket_Detail (Flight_No, Airline_Name, Date, Departure_Location, Destination, Class) 4be. Office_Supplies (O_EID4, Supply_Name, Quantity, Lead_Time, Price, Discount) 5. Order (OID, Total_Amount, Description, Product_Name, Quantity, Price, Date,Req_Student_PID1) 5a. OnlineOrder (OOID5, Tracking_No, Discount, OSID) 5b. NormalOrder (NOID5) 6. Item (NOID5, IID, Iname, Description) 7. OnlineShop(OSID7, Website, Shipping_Method) 7a.Website_Email_Address (Website, Supplier_Email_Address ,Description)

  7. Relational Schema (Continue) 8. Course (CID, Semester, Professor, Final_Date, Description, Ctitle) 8a. GE (GE_CID8, Category) 8b. LowerDivision (L_CID8) 8c. UpperDivision (U_CID8) 8d. Sections(CID, Section_Number) 8e. Ctitle (CID, Ctitle) 9.High_School(HID, Name, School_Type, Year_Founded, Relidious_Affiliation, Academic_Calender, Setting, Student_Faculty_Ratio, Total_Enrollment_No, Gender_Ratio, College_Enrollment_Rate) 10.Job (JID, Position, No_Employee, Start_Date, Finish_date, Work_On_Alumini_PID1, Work_on_Student_PID1) 10a.Job_Detail (Position, Job_function, Location, Salary, Major_Req, Skill_Need, Description, Hour_Per_Week, Language_Need, Visa_Req) 10b.Job-Timeline (Start_Date, Finish_date, Duration) 10c. Full_1Time (F_JID10, Degree_Level, Exp_Req, Pre_I_JID10c) 10d. Part_Time (P_JID10, Degree_Level, Exp_Req) 10e. Internship (I_JID10, Class_Level, Referrer) 11.Company (Company_ID, Company_Name, Description, Company_Type, Contact_Phone_No, Size, Industry_Type, Email_Address, Website, City, Street, Zip_Code) 12.Country (Name, No_Student) 13.Event (Event_ID, title, Start_Date, Finish_date, Duration, No_People, Sponsor, City) 13a. CAS_Event (CAS_Event_ID13, Description, Cost) 13b. Other_Event (O_Event_ID13, Description) 13ba. Recruiting_Event. (R_Event_ID13, Industry_Type, Job_Class, Major_Preferred) 14. Grade (Student_PID1, CID8, Grade_Option, Grade, Semester) 15. Survey (Survey_ID, title, date, description) 15a. Alumni_Survey (A_Survey_ID15, Alumini_PID1) 15b. Student_Survey (S_Survey_ID15, Student_PID1) N to N Relationships: 16.Withdrawal_Pay_Expense (W_TID2, EID4) 17.Employee_Check_Inventory (Employee_PID1, OID5) 18.Student_Take_Course (Student_PID1, CID8, Grade, Grade_Option, Semesters) 18a. Ctitle(CID8, Ctitle) 19.Student_Participate_Event (Student_PID1, Event_ID13) 20.Company_Provide_Job (Company_ID11, JID10) 21.Company_Participate_Recuriting_Event (Company_ID11, Recuriting_Event_ID13) 22.Company_Located_Country (Company_ID11, Country_Name12) 23.Event_Located_Country (Event_ID13, Country_Name12) Multivalue: 24.Person_Email (PID1, E-mail_Address) 25.Alumni_Major (Alumni_PID1a, Major) 26.Student_Major (Student_PID1b, Major) 27.Course_Midterm_Date (CID8, Year, Month, Day)

  8. Normalization Analysis

  9. Decomposing to 1NF and 2NF Course (CID, Semester, Professor, Final_Date, Section_Number, Description, Ctitle) To1NF Course (CID, Semester, Professor, Final_Date, Description, Ctitle) Sections(CID, Section_Number) Student_Take_Course Student_PIDCID Grade Grade_Option Semester Ctitle To2NF Student_Take_Course (Student_PID, CID, Grade, Grade_Option, Semesters) CourseTite(CID, Ctitle)

  10. Decomposing to 3NF Check Check_TID2 Description Responsible_PersonAccount_NoRouting_No To3NF Check (Check_TID2, Description, Responsible_Person, Account_No) Account_Detail (Account_No, Routing_No)

  11. Decomposing to BCNF Course CID Semester ProfessorFinal_Date DescriptionCtitle ToBCNF Course (CID, Semester, Professor, Final_Date, Section_Number, Description) Ctitle (CID,Ctitle)

  12. Queries

  13. Query 1: Demand Forecasting and EOQ Object Benefits

  14. Query 1: Demand Forecasting - SQL SELECT Order.ProductName AS Product, sum(Order.Quantity) AS SepTotalQuantity FROM [Order] WHERE Order.Date like "8/*/2013” GROUP BY Order.ProductName; Step 1: Extract the Data from Access. Get the order quantity of a specificproduct over a period. Sampleoutput: A 3*3 matrix include all the product’s order quantity for a Specific period of a specific year

  15. Query 1: Demand Forecasting - Process Step 2: Calculate the seasonalfactor and monthlydemandforecast with a calculator program wrote by Java. Step 3: Putthe result backinto SQL to get the Economic order quantity (EOQ) model.

  16. Query 1: Demand Forecasting - Process Part of Java code

  17. Query 1: EOQ - SQL Checkwhether the inventory is stockedout or not SELECT DISTINCT Order.ProductName, IIF(Item.Quantity=0,“Yes”,“No”) AS StockOut, Round(Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))) AS OptimalOrderQ, IIf(OnlineOrder.ShippingTime>0,Round((OnlineOrder.ShippingTime)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price)),0) AS ReorderPointQuantity, Round(((Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS OrderCycleDays, OnlineShop.Website AS Website, Order.Date AS OrderDate, Order.Date+Round(((Sqr(2*(OnlineOrder.ShippingFee) *(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS NextOrderDate FROM Order, OnlineOrder, OnlineShop, Item WHERE Order.OID=OnlineOrder.OID AND OnlineOrder.OSID=OnlineShop.OSID AND Item.OID=Order.OID ORDER BY Order.Date; Calculate optimalorderquantity Calculate Reorderpoint Calculate ordercycleand dates Calculate Next order date

  18. Query 1: EOQ - Output

  19. Query 2: Academic Performance Object Benefits

  20. Query 2: Academic Performance - SQL Select specific variable and combine with coefficients to obtain result by linear regression model SELECT Student.SID,1.714+0.589*A.Indicator-0.00632*IIf(Student.Degree=“Undergraduate”,1,0)+0.0165*IIf(Student.Gender=“Male”,1,0)+0.000644*Student.SATScore-0.0147*B.NumberEventAttend+0.0528*IIF(Student.Research=“Yes”,1,0) AS ExpGPA From Student, (SELECT Student.SID,Count(internship.JID) AS Indicator FROM Student, Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID Group BY Student.SID Union Select Student.SID,0 From Student Where Student.PID NOT IN(SELECT student.PID From Student,Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID))As A, Determine whether a student have ever attend any internships or not (binary variable) Defined as table A

  21. Query 2: Academic Performance – SQL (Cont.) (SELECT Student.SID,Count(StudentParticipateEvent.EventID) AS NumberEventAttend From Student,StudentParticipateEvent Where Student.PID=StudentParticipateEvent.PID Group BY Student.SID) AS B Where A.SID=Student.SID AND B.SID=Student.SID; Determine how many events a student have attend (numerical value) Defined as table B Step 2: Run linear regression over all the variables, then useAkaike Information Criterion to reduce the model to the most efficient model. Implement with R

  22. Query 3: High School Comparison Object Benefits

  23. Query 3: High School Comparison - SQL SELECT Student.PID, Student.HID, Student.CollegeGPA, Student.SATScore, Student.TOFELScore, Student.[HighSchoolEvents#], Student.[HighSchoolAward#] FROM Student WHERE (((Student.HID)=1)) OR (((Student.HID)=2)); Step 1: Extract the data from Access by SQL Step 2: Calculate the mean of each corresponding category of all admitted students from these two high schools, then use t-test with unequal variances to get the p-value Implement with Excel

  24. Query 3: High School Comparison - Output Step3: Use Holm-Bonferroni method to find out if each difference is significant. Implement with MATLAB. Step4: Sample output from MATLAB [corrected_p, h]=bonf_holm([0.38 0.414 0.0513 0.334 0.257] ,0.5) corrected_p =1.0020 0.7600 0.2565 1.0280 1.0280 h =0 0 1 0 0

  25. Query 4: Category Expense Object Benefits

  26. Query 4: Category Expense - SQL SELECT Student.PID, Sum(Expense.Amount) AS AmountOfSum FROM (OfficeSupply INNER JOIN Expense ON OfficeSupply.EID = Expense.EID) INNER JOIN Student ON Expense.PID = Student.PID GROUP BY Student.PID ORDER BY Student.PID; Step 1: Find the total expense of each student SELECT Count([OfficeSupply Query].PID) AS CountOfPID, Partition([AmountOfSum],0,1100,100) AS Expr1 FROM [OfficeSupply Query] GROUP BY Partition([AmountOfSum],0,1100,100); Step 2: Generate the data for histogram Step 3: Use Report function to generate graphs, and use Access toolbox to generate the Mean, and Standard Deviation of the distribution. Step 4: If, in most cases, the distribution is bell-distributed, we could use 68-95,99.7 rule, aka Three-sigma rule, to set up expense constraints for students.

  27. Query 4: Category Expense - Output Query Results after the first step & The toolbox that could be used to calculate average and variance quickly Results after the second step Count the number of PID to generate a histogram

  28. Query 4: Category Expense - Output Expense Summary Report (Based on Sample Data)

  29. Query 5: Alumni & Employment Object Benefits

  30. Query 5: Alumni & Employment - SQL SELECT Alumni.PID, IIF(Alumni.Degree="Undergraduate",1,0) AS Degree,count(StudentParticipateEvent.EID) AS NumEventAttend, IIF(AlumniMajor.school_of_medicine = "Yes", 1, 0) As Sch_Medicine, IIF(AlumniMajor.school_of_law= "Yes", 1, 0) As Sch_Law, IIF(AlumniMajor.college_of_engineering = "Yes", 1, 0) As Sch_Engi, IIF(AlumniMajor.school_of_optometry = "Yes", 1, 0) As Sch_Opt IIF(AlumniMajor.college_of_natural_resource = "Yes", 1, 0) As Sch_Nat,IIF(AlumniMajor.college_of_letter_science = "Yes", 1, 0) As Sch_Science, IIF(AlumniMajor.school_of_information = "Yes", 1, 0) As Sch_Inf, IIF(AlumniMajor.school_of_social_welfare = "Yes", 1, 0) As Sch_welfare, IIF(AlumniMajor.haas_business_school = "Yes", 1, 0) As Sch_has, Company.c_latitude, Company.c_longitude FROM Alumni, StudentParticipateEvent, AlumniMajor, Company, FullTime WHERE FullTime.JID = Job.JID AND Company.CompanyID=Job.CompanyID AND Job.PID=Alumni.PID AND Alumni.PID=StudentParticipateEvent.PID AND AlumniMajor.PID=Alumni.PID; Step 1: Extract the data from Access by SQL

  31. Query 5: Alumni & Employment - Process • Step 2: Fitting Logistic Regression Model with R • this model will give the result of the probability for predicted variable to be 1 ( which means this person will get employed or not) • Potentially 20 candidate models • Step 3: Model Selection • Cross-Validation • Employment = GPA + Event + Degree • AIC ( Akaike Information Criterion) • Employment = GPA + Event + Degree + School of Information • BIC ( Bayesian Information Criterion) • Employment = GPA • Deviance Selection • Employment = GPA + Event + Degree

  32. Query 5: Alumni & Employment - Process • Step 4: Cut-off Selection • Find a cut of the predicted probability which will let us judge if the predicted value is 1 or 0 • Method: Building Confusion Matrix • Choosing a cut off probability first • Using confusion Matrix to find the best cut off • Base on the result we choose 0.58

  33. Query 5: Alumni & Employment - Process Step 6: Plot

  34. Query 5: Alumni & Employment - Process • Step 7: Creating XML concatenate with KML plot Alumni’s Company on Google Earth • Creating an KML plot for Alumni who graduated from different college • Plot those coordinates on Google earth • Get intuition employment status geographically for each college in UC Berkeley, which will give us an intuition where has higher employment rate for corresponding college’s current student. • Example for School of Medicine

  35. Query 5: Alumni & Employment - Output Step 8: Implement into GoogleEarth

  36. Future Work & Improvements • Create forms & reports • Make our database more user-friendly • Create additional queries: • Other useful Queries • Such as monthly balance check

  37. Q&A Thank you for listening.

More Related