1 / 14

CS143 Project 1

CS143 Project 1. Due: Oct 24 th , 11:59 PM All the materials will be posted in courseweb . Before we start. Two things to do: Find your partner At most 2 students

clyde
Télécharger la présentation

CS143 Project 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS143 Project 1 Due: Oct 24th, 11:59 PM All the materials will be posted in courseweb.

  2. Before we start • Two things to do: • Find your partner • At most 2 students • Send team information (you and your partner's name, UID, email, expected password for your MySQL account) to one of TAs by Oct 12th • Note: This team is for your project, NOT homework. You need to finish your homework individually. • Get familiar with Linux and MySQL

  3. Linux • A Unix-like, open source operating system • All the projects will be done on the SEASNET linux server. • lnxsrv03.seas.ucla.edu • How to access the server? • If you are using SEASNET machine, all SEASNET machines already have a secure shell client installed, so you simply need to run the client. • If you need to access from a personal machine that does not have a secure shell client, you can download a windows secure shell client http://www.filewatcher.com/m/SSHSecureShellClient-3.2.9.exe.5517312.0.0.html Or Putty http://www.chiark.greenend.org.uk/~sgtatham/putty/ • Mac OS X or Unix machines have a secure shell client preinstalled. Simply type "ssh -l <userid> lnxsrv03.seas.ucla.edu" within your command line interface

  4. Linux • Account for Linux Server • Apply SEASNET account if you don’t have one • Frequently used Linux command: • http://linuxcommand.org/learning_the_shell.php • Try it!

  5. MySQL • Already installed on lnxsrv03 server • Username & password will be assigned after you submit your team request. • MySQL document: http://dev.mysql.com/tech-resources/articles/mysql_intro.html

  6. Project 1 • Step 1: Loading the data • Step 2: Running easy queries • Step 3: Applying some constraints • Step 4: Join operation • Step 5: A more complicated query • Step 6: Putting all together

  7. Step 1: Loading the data • There are 5 data files located at /u/cs/class/cs143/cs143ta/proj1/data/ • Author.csv, Coauthored.csv, Authored.csv, Paper.csv, Cites.csv • Also available at the courseweb • Task: Load these 5 data files to MySQL using the “load data” command • Before Loading, you should: • 1. Log in MySQL • 2. Use your own database • 3. Create 5 tables: Author, Coauthored, Authored, Paper, Cites

  8. Step 1: Loading the data Table Cites: • ID (INTEGER) • paper1ID (INTEGER) • paper2ID (INTEGER) • is_self_citation(INTEGER) Table Authored: • ID (INTEGER) • AuthorID(INTEGER) • paperID(INTEGER) • Email (VARCHAR) • rank_in_author_list(INTEGER) • original_name_str(VARCHAR) • email_domain(VARCHAR) • email_country(VARCHAR) • affiliation_str(VARCHAR) • affil(VARCHAR) Table Paper: • ID (INTEGER) • paper_id(INTEGER) • title_str(VARCHAR) • authors_str(VARCHAR) • area (VARCHAR) • num_abstract_wds(INTEGER) • num_authors(INTEGER) • num_kb(INTEGER) • num_pages(INTEGER) • num_revisions(INTEGER) • num_title_wds(INTEGER) • comments_str(VARCHAR) • submit_date(DATE) • submitter_email(VARCHAR) • submitter_name(VARCHAR) Table Author: • ID (INTEGER) • author_name(VARCHAR) • first_name(VARCHAR) • last_name(VARCHAR) • preferred_name(VARCHAR) Table CoAuthored: • ID (INTEGER) • author1ID (INTEGER) • author2ID (INTEGER) • paper_ID(INTEGER)

  9. Step 2: Running some easy queries • Write queries that return the answers to these questions: • “Give me the author_name of all the Authors with first_name ‘Kevin’.” • “Return author_name and preferred_name of all the Authors who have different author_name and preferred_name.” Sort your results first by author_name then by preferred_name.

  10. Step 3: Applying some constraints • Add a unique key constraint to the CoAuthored table in which the combination of (author1ID, author2ID, paper_ID) should be unique. • Add foreign key constraints for author1ID and author2ID. • More details in project description.

  11. Step 4: Join operation • Write queries that return the answers to these questions: • “Return the author_name of all co-authors of the author with ID ‘42673’.” • “Return the author_name of all authors who have more than 10 co-authors.”

  12. Step 5: A more complicated query • Write one query that returns the answer to following question: • “Give me the author_name of all the authors with the number of papers they co-authored in, in the decreasing order of the number of paper.”

  13. Step 6: Putting all together • Create a script named P1 that shows every step in this part of the project. You can use the ‘--' tag to make comments within your SQL script. Make sure you give adequate comments documenting each part of each step. • Execute the script and save all outputs in a file call P1_Output. • Add one README file, which includes you and your partner's name, UID, email, and any other information you think is useful. • Make a zip file and submit through courseweb.

More Related