
Stylometry System Midterm Presentation

Team 2 – Gregory Shalhoub, Robin Simon, Jayendra Tailor, Ramesh Iyer. http://utopia.csis.pace.edu/cs691/2009-2010/team2/index.html. Seidenberg School of Computer Science and Information Systems.


Presentation Transcript


  1. Stylometry System Midterm Presentation
     Team 2 – Gregory Shalhoub, Robin Simon, Jayendra Tailor, Ramesh Iyer
     http://utopia.csis.pace.edu/cs691/2009-2010/team2/index.html
     Seidenberg School of Computer Science and Information Systems

  2. Agenda
     • Team introduction
     • Description of project
     • Team cadence
     • Project management tools
     • Work Breakdown Structure
     • Gantt chart
     • Analyses accomplished (tool assessment)
     • Decision – change in project scope
     • Remainder of semester
     • What has been easy/difficult to date?

  3. Team Introduction
     • Stylometry team: 4 members
       • 2 in the United States
       • 2 in India
     • All working professionals (AOL, Bloomberg, IBM)
     • All taking their final course to complete the MS at Pace

  4. Description of Project
     • Stylometry is the study of the unique linguistic styles and writing behaviors of individuals in order to determine authorship
     • Part I: Search to determine an interesting and unique application of stylometry for research
     • Part II: Feasibility study on existing tools/applications for email authorship (250 words or less)

  5. Team Cadence
     • Team cadence
       • Meet once a week on Fridays
       • Conference call: team found Skype to be better than AOL VoIP
       • Correspondence: email, Pace Blackboard, team website, phone, file exchange
       • Set agenda and keep meeting minutes on Blackboard
     • Customer cadence
       • Met first time via a conference call
       • Weekly updates via email
       • Dr. Tappert copied on all customer communication

  6. Project Management Tools
     • Microsoft Excel
     • Work Breakdown Structure (WBS)
       • Used to track all aspects of key tasks
     • Gantt chart
       • Used to track project schedule
       • Start and finish dates of all tasks
       • Uses tasks from WBS

  7. Analyses accomplished (tool assessment)
     • Existing tool/application search
       • Yielded 5 potential tools
       • Initial tool evaluation to eliminate inferior tools
       • 3 tools remaining to conduct feasibility study
         • StyleTool – Ruby: https://launchpad.net/styletool
         • Signature – C (need confirmation): http://www.philocomp.net/?pageref=humanities&page=signature
         • JGAAP – Java: http://server8.mathcomp.duq.edu/jgaap/w/index.php/Main_Page
     • Delivered evaluation paper to Dr. Tappert and customer

  8. Additional accomplishments
     • Website
       • Developed team website
     • Interesting and unique application of stylometry for research
       • Military/Terrorism
       • Email
       • Social networking (Twitter, Facebook)
       • Chat
     • Developed use case template

  9. Decision – Change in project scope
     • Original project scope
       • Build a stylometry system to help identify authors of short messages (less than 250 words)
     • Change in project scope
       • Team found 3 existing powerful stylometry tools
       • Feasibility study to see if any of the 3 tools/applications can identify email authorship
       • Existing tool on open platform (Java) may be foundation for future modification

  10. Remainder of semester
      • Tool/application feasibility study
        • Experiment
          • Gathered 100 email samples
          • Built testing template for testing and data consistency
          • Match check and no-match testing
            • Match check: input 9 samples from the subject as known and the 10th sample as an unknown
            • No-match check: input 10 samples from a subject as known and 1 from another subject as unknown; the system should respond with a no-match
          • Run sample emails through each tool
        • Determine if any of the tools can successfully identify email author(s)
        • Determine which tool seems to be a logical candidate for future research and/or modification
      • Detailed write-up of experiment
      • Final technical paper

  11. What has been easy/difficult to date?
      • Difficult
        • Initially, the team had trouble with conference calls using AOL VoIP
          • Team changed to Skype: much better voice quality, no latency issues
        • Currently experiencing an issue with one of the tools
          • Issue running under Linux
      • Easy
        • Team seems to have good chemistry
        • Customer has been involved and responsive to communications
        • Dr. Tappert has been available when team needed guidance
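
The match check and no-match check described on slide 10 can be sketched roughly as follows. This is only an illustration of how the test inputs might be assembled, not the team's actual testing template. The names `Trial` and `build_trials` are hypothetical, the split of the 100 emails into 10 subjects with 10 emails each is an assumption (the slides give only the total), and picking the "other subject" as the next one in sorted order is an arbitrary choice made for the example.

```python
from typing import Dict, List, NamedTuple


class Trial(NamedTuple):
    """One test input for a stylometry tool (hypothetical structure)."""
    known: List[str]     # samples attributed to the known subject
    unknown: str         # single sample whose authorship is being tested
    expect_match: bool   # True for a match check, False for a no-match check


def build_trials(samples: Dict[str, List[str]]) -> List[Trial]:
    """Build one match check and one no-match check per subject.

    Assumes `samples` maps a subject name to that subject's emails.
    Subjects with fewer than 10 emails are skipped for the affected check.
    """
    trials: List[Trial] = []
    subjects = sorted(samples)
    for i, subject in enumerate(subjects):
        own = samples[subject]
        # Match check: 9 samples as known, the 10th sample as the unknown.
        if len(own) >= 10:
            trials.append(Trial(own[:9], own[9], True))
        # No-match check: 10 samples as known, 1 from another subject as
        # the unknown. The tool is expected to report no match.
        if len(own) >= 10 and len(subjects) > 1:
            other = subjects[(i + 1) % len(subjects)]  # assumption: next subject
            if samples[other]:
                trials.append(Trial(own[:10], samples[other][0], False))
    return trials


if __name__ == "__main__":
    # Assumed layout: 10 subjects with 10 placeholder emails each.
    demo = {f"subject{i}": [f"s{i}-email{j}" for j in range(10)]
            for i in range(10)}
    for trial in build_trials(demo)[:2]:
        print(len(trial.known), trial.unknown, trial.expect_match)
```

Each `Trial` would then be run through each candidate tool, and the tool's answer compared with `expect_match`.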
