
Graduate School of Information Science and Technology, Osaka University, Japan

20th Workshop on Methodologies and Tools for Complex System Modeling and Integrated Policy Assessment (CSM2006) 2006.8.28-30: Laxenburg, Austria. A support tool for composing questionnaires in social survey data archive SRDQ. Graduate School of Information Science and Technology,




Presentation Transcript


  1. A support tool for composing questionnaires in social survey data archive SRDQ
     20th Workshop on Methodologies and Tools for Complex System Modeling and Integrated Policy Assessment (CSM2006), 2006.8.28-30, Laxenburg, Austria
     Norihisa Komoda, Shingo Tamura, Yoshitomo Ikkai, Koichi Higuchi
     Graduate School of Information Science and Technology, Osaka University, Japan

  2. 1. Social Survey Data Archive
     A social survey data archive collects, stores and disseminates large amounts of social survey data, such as the "Social Network Survey". Each survey contains various types of items: question items, the dataset (answers of respondents), the sample design, and papers/reports about the survey.
     Objectives
     • Maintaining the quality of social surveys: when composing questionnaires for new surveys, it is imperative to review the question items and datasets of existing surveys in order to maintain quality.
     • Effective use of existing data: it reduces the need to conduct repetitious surveys for similar purposes, so large survey costs can be avoided.
     • Education: the archive makes it possible to develop social survey methodology lessons using high-quality survey data.

  3. 1. Social Survey Data Archive: "SRDQ"
     "SRDQ": the Social Research Database on Questionnaires (http://srdq.hus.osaka-u.ac.jp/en)
     One of the most advanced social survey data archives in Japan, developed by the Graduate School of Human Science, Osaka University, in 2003.

  4. 1. Social Survey Data Archive: "SRDQ"
     "SRDQ": the Social Research Database on Questionnaires
     Specifications: 119 surveys, 17,232 items
     • Hierarchical textual data
     • Searching system (simple string search of question items or surveys)
     • Dataset analysis system (crosstab, etc.)
     • Subjects, sample designs, papers & reports of each survey are also stored
     Examples of stored surveys:
     • "Information Society Survey": subjects Information, Social Psychology, etc.; question items (question item 1, question item 2, ...), dataset, papers & reports
     • "Social Network Survey": subjects Information, Family, etc.; question items, dataset, papers & reports

  5. 1. Social Survey Data Archive: "SRDQ"
     SRDQ allows the direct analysis of datasets over the web pages. Example: crosstab analysis.
     In this window, the row and column items of the crosstab are selected.
     [Figure: a crosstab whose rows are the alternatives of the row item (male, female) and whose columns are the alternatives of the column item (A, B). A value of 30 in the cell (male, A) means that 30 males answered "A".]

  6. 1. Social Survey Data Archive: "SRDQ"
     How to execute a crosstab analysis using SRDQ:
     • The analyst selects the variables he/she wants to use, then pushes ">>".
     • Finally, the analyst pushes "crosstabs" to start the analysis.

  7. 1. Social Survey Data Archive: "SRDQ"
     The result (column: PC, row: Gender): 47% of males use a PC, while only 29% of females do.
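A crosstab with row percentages, as in the result above, can be sketched in a few lines of Python. This is only an illustration: the function name `crosstab_row_percent`, the field names and the respondent records are hypothetical, and the records are sized so that the percentages match the 47% / 29% quoted on the slide. It does not reproduce SRDQ's actual implementation.

```python
from collections import Counter


def crosstab_row_percent(records, row_key, col_key):
    """Return {row_value: {col_value: percent}}, percentages taken within each row."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    row_totals = Counter()
    for (row, _col), n in counts.items():
        row_totals[row] += n
    table = {}
    for (row, col), n in counts.items():
        table.setdefault(row, {})[col] = 100.0 * n / row_totals[row]
    return table


# Hypothetical respondents (not SRDQ data): 100 males and 100 females.
records = (
    [{"gender": "male", "pc": "use"}] * 47
    + [{"gender": "male", "pc": "not use"}] * 53
    + [{"gender": "female", "pc": "use"}] * 29
    + [{"gender": "female", "pc": "not use"}] * 71
)

table = crosstab_row_percent(records, "gender", "pc")
print(table["male"]["use"], table["female"]["use"])
```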

  8. 2. Purpose of the Study
     To make SRDQ more useful, we planned to add a new function that helps researchers compose new questionnaires.
     Procedure to compose a new questionnaire (a survey containing 200 - 300 questions): intermittent discussion by research group members (approx. 10 researchers), continuing for 3 - 12 months.
     • Decide the purpose and the design
     • Search for related surveys
     • Summarize existing question items
     • Select existing surveys or question items to compare with new ones
     • Create new question items
     • Decide the order of question items
     Effort figures shown in the flow chart: 3 man-month, 0.5 man-month, 0.75 man-month, 0.75 man-month, 0.25 man-month.
     Search for related surveys / summarize existing question items: a tool to support this process has been developed.

  9. 2. Purpose of the Study
     Procedure to compose a new questionnaire:
     • Decide the purpose and the design
     • Summarize existing question items (a tool to support this process has been developed)
     • Select existing surveys or question items to compare with new ones (in this process, the "Summary of Question Items" is used)
     • Create new question items
     • Decide the order of question items

  10. 2. Summary of Question Items
      The "Summary of Question Items" is a synopsis of similar question items included in particular surveys. The surveys are searched in SRDQ with keywords and names of surveys.
      Example source question items (Information Society Survey 2001, Information Society Survey 2002):
      • q3. Do you use e-mail on your cell phone or PC ... q45. Do you use Home Page on your cell phone or PC
      • q1. Do you use the following items? a. E-Mail b. Fax ... f. Home Page (1. yes 2. no)
      Procedure: break down the question items into their minimum units (underlined in red on the slide), then summarize the similar items/units.
      Resulting summary (surveys: ISS 2002, JGSS 2003, ISS 2001):
      Question item                             | Matching items
      Do you use the following items? E-Mail    | q22a, q1a, q3
      Do you use the following items? Fax       | q1b
      Do you use the following items? Home Page | q1f, q45, q22b
      Trends of the question items and differences between the surveys become clear.

  11. 2. Support System for the Summarization
      Summary of Question Items (surveys: ISS 2002, JGSS 2003, ISS 2001):
      Question item                             | Matching items
      Do you use the following items? E-Mail    | q22a, q1a, q3
      Do you use the following items? Fax       | q1b
      Do you use the following items? Home Page | q1f, q45, q22b
      It takes approx. 1 week to process only 3 or 4 surveys manually.
      Goal
      • Automatic creation of a summary that is sufficiently accurate to meet the demands of social survey specialists.
      • Provision of an editing interface to correct the errors and to produce a final, completed summary in less time.
      Evaluation of accuracy: E = W * (number of non-detection items) + (number of miss detection items), with W > 1.
      The number of rows that include detection errors should be under 10%.

  12. 3. Overview of the System
      Input: surveys + keywords (ex. Survey A, B, C, D + "mail"). The search target is several surveys.
      For similarity judgments of question items, the "Jaccard Coefficient" is used.
      Output: 19 question items about "mail", summarized across surveys A, B, C and D. Excerpt, with the matching item codes as shown on the slide:
      1. How often do you use e-mail for each of the purposes listed below? (q2, q2, q2)
      1.1 Business communication (a. every day b. 3 or more days a week)
      1.2 Personal communication with friends (q3, q3) (a. every day b. 3 or more days a week)
      1.3 Personal communication with family (q15, q23) (a. every day b. 3 or more days in a week)

  13. 3. Similarity Judgments by Jaccard Coefficient
      Original method: Jaccard coefficient J = a / (a + b + c)
      • a: number of words common to the 2 question items
      • b, c: numbers of words that appear in only one of the 2 question items
      Similarity judgment
      • Calculate the similarity for all combinations of question items in the target surveys.
      • The pair that has the maximum similarity value is judged as "similar". (Repeat this step while the similarity values are higher than the threshold.)
      [Diagram: among the question items Q.A1, Q.B1, Q.C1, Q.A2, Q.B2, Q.C2, ..., the pair Q.A1 and Q.B1 has the maximum similarity and is grouped as (Q.A1, Q.B1).]
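The Jaccard-based judgment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer (lowercased words, hyphenated words kept whole), the function names, the restriction to pairs from different surveys and the use of union-find to merge groups are all choices made for this example; the slides do not specify them.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercased word set; hyphenated words such as 'e-mail' stay whole (assumption)."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def jaccard(text_a, text_b):
    """J = a / (a + b + c): common words divided by all distinct words."""
    wa, wb = tokenize(text_a), tokenize(text_b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def group_similar(items, threshold=0.5):
    """items: {item_id: (survey_id, text)}.

    Visit cross-survey pairs from the highest similarity downwards and merge
    them while the similarity is higher than the threshold.
    """
    parent = {k: k for k in items}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    pairs = [
        (jaccard(items[x][1], items[y][1]), x, y)
        for x, y in combinations(items, 2)
        if items[x][0] != items[y][0]
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    for score, x, y in pairs:
        if score <= threshold:
            break
        parent[find(x)] = find(y)

    groups = {}
    for k in items:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


items = {
    "A1": ("A", "Do you use e-mail on your pc?"),
    "B1": ("B", "Do you use e-mail on your cell phone or pc?"),
    "C1": ("C", "How many hours do you sleep?"),  # hypothetical item
}
print(group_similar(items))
```

With these three items, A1 and B1 share 7 of 10 distinct words (J = 0.7) and are grouped, while C1 stays alone.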

  14. 3. Difficulty of Similarity Judgments
      1. Partial match in juxtaposed words (causes non-detection)
         • Survey A: How often do you do the things on this list? Practice flower arranging, tea ceremony, or calligraphy
         • Survey B: Do you practice cooking, sewing, or calligraphy?
         Countermeasure: treat juxtaposed words as a group.
      2. Almost all words are the same except one core word, but the intended purposes of the questions are different (causes miss detection)
         • Survey A: How often do you use e-mail for personal communication with friends?
         • Survey B: How often do you use e-mail for personal communication with family?
         Countermeasure: apply a penalty if core words don't match.
      3. Different expressions, but asking the same thing (causes non-detection)
         • Survey A: Do you perform following actions in your everyday life? Reuse bathwater for laundering to conserve water.
         • Survey B: Do you try to do things in this list? Saving resources such as water.
         Countermeasure: apply a "neighborhood bonus" for word matches.

  15. 3. Similarity Judgments (1/2)
      A new similarity measure that uses structural characteristics of surveys: under specific conditions, the values of the existing Jaccard coefficient are adjusted.
      1. Treat juxtaposed words as a group
         • Juxtaposed words can be viewed as a group.
         • If one or more words match among the juxtaposed words, treat those words as a group and ignore the unmatched words when calculating the similarity.
      2. Apply a penalty if core words don't match
         • For pairs of similar question items within one survey, if only a few words differ, those words are recognized as core words.
         • If a pair within one survey has a similarity value higher than 0.6, its unmatched words are recognized as core words.
         • Detect the core words before calculating the similarity, and decrease the similarity value if the core words don't match.
      Example: within one survey, the items "Q1-a. How often do you use e-mail for each of the purposes? communication with family", "... communication with friends" and "... business communication" differ only in their core words. When items of Survey A and Survey B are compared and the core words don't match, the penalty is applied.
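The two adjustments above can be sketched as follows. This is a hedged illustration, not the authors' code: how juxtaposed words are detected is not stated in the slides, so they are passed in by hand here; representing a matched group by a single placeholder token, and applying the penalty by subtraction, are assumptions. The penalty value 0.5 and the within-survey threshold 0.6 are the values quoted in the slides; all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased word set; hyphenated words such as 'e-mail' stay whole (assumption)."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def jaccard_sets(wa, wb):
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def grouped_jaccard(words_a, juxt_a, words_b, juxt_b):
    """Rule 1: if the juxtaposed word lists share a word, count them as one
    matching unit and ignore the unmatched juxtaposed words."""
    group = "<juxtaposed-group>"  # placeholder token standing for the whole group
    if juxt_a & juxt_b:
        return jaccard_sets(words_a | {group}, words_b | {group})
    return jaccard_sets(words_a | juxt_a, words_b | juxt_b)


def core_words(items_in_one_survey, within_threshold=0.6):
    """Rule 2, detection: unmatched words of highly similar pairs inside one survey."""
    word_sets = [tokenize(t) for t in items_in_one_survey]
    core = set()
    for i in range(len(word_sets)):
        for j in range(i + 1, len(word_sets)):
            if jaccard_sets(word_sets[i], word_sets[j]) > within_threshold:
                core |= word_sets[i] ^ word_sets[j]
    return core


def penalized_similarity(text_a, text_b, core, penalty=0.5):
    """Rule 2, adjustment: subtract the penalty when the core words differ."""
    wa, wb = tokenize(text_a), tokenize(text_b)
    score = jaccard_sets(wa, wb)
    if (wa & core) != (wb & core):
        score = max(0.0, score - penalty)
    return score


# Rule 1 on the calligraphy example; the split into plain and juxtaposed words is done by hand.
plain = jaccard_sets(
    {"practice", "flower", "arranging", "tea", "ceremony", "calligraphy"},
    {"do", "you", "practice", "cooking", "sewing", "calligraphy"},
)
grouped = grouped_jaccard(
    {"practice"}, {"flower", "arranging", "tea", "ceremony", "calligraphy"},
    {"do", "you", "practice"}, {"cooking", "sewing", "calligraphy"},
)

# Rule 2 on the family / friends example.
core = core_words([
    "How often do you use e-mail for each of the purposes? communication with family",
    "How often do you use e-mail for each of the purposes? communication with friends",
])
adjusted = penalized_similarity(
    "How often do you use e-mail for personal communication with friends?",
    "How often do you use e-mail for personal communication with family?",
    core,
)
print(plain, grouped, sorted(core), round(adjusted, 3))
```

In this sketch the grouping raises the calligraphy pair from 2/10 to 2/4, and the penalty lowers the family / friends pair from 10/12 to about 0.33, which is below both thresholds used in the evaluation.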

  16. 3. Similarity Judgments (2/2)
      3. Apply a "neighborhood bonus" for word matches
         • The order of the question items is significant: question items having the same meaning tend to be arranged in the same order.
         • Increase the similarity values if highly similar pairs are found in the neighborhood, that is, among question items in the same hierarchical positions.
      Example, Surveys A and B:
         • Q2. Do you try to do things in this list? 1. Always turn off lights not in use. (a. yes b. no) 2. Saving resources such as water. (a. yes b. no)
         • Q7. Do you perform following actions in your daily life? 1. Turn off lights not in use 2. Reuse bathwater for laundering to conserve water.
         The first sub-items have a high similarity value, so the pair of second sub-items receives the bonus.
      Example, Surveys C and D:
         • 1. Do you use e-mail on your cell phone or pc? 2. How often do you use e-mail? 2-a. Gathering info for daily life
         • 1. Do you use e-mail on your pc? 2. How many times do you send/receive e-mails? 2-1. To get info about everyday life
         The first items have a high similarity value.
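A minimal sketch of the neighborhood bonus, under assumptions the slides leave open: "neighborhood" is taken to mean the immediately preceding and following item pair at the same position offset, the bonus is added (and capped at 1.0), and a neighbor counts as "highly similar" when its base similarity is at least `high`. The bonus value 0.3 is the one quoted in the evaluation slides; the function name and the value of `high` are hypothetical.

```python
def apply_neighborhood_bonus(base, bonus=0.3, high=0.6):
    """base[i][j]: similarity between item i of one survey and item j of another,
    both listed in questionnaire order. Returns an adjusted copy."""
    rows = len(base)
    cols = len(base[0]) if rows else 0
    adjusted = [row[:] for row in base]
    for i in range(rows):
        for j in range(cols):
            # Pairs directly before and directly after (i, j) in both surveys.
            neighbours = [(i - 1, j - 1), (i + 1, j + 1)]
            if any(
                0 <= p < rows and 0 <= q < cols and base[p][q] >= high
                for p, q in neighbours
            ):
                adjusted[i][j] = min(1.0, base[i][j] + bonus)
    return adjusted


# Hypothetical base similarities: item pair (0, 0) is the "turn off lights" pair,
# item pair (1, 1) is the "save water" / "reuse bathwater" pair.
base = [
    [0.8, 0.1],
    [0.1, 0.25],
]
adjusted = apply_neighborhood_bonus(base)
print(adjusted)
```

Only the pair (1, 1) is raised here, because its preceding pair (0, 0) is highly similar; the bonus is computed from the unadjusted values so that bonuses do not cascade.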

  17. 4. Evaluation of Similarity Judgments (1/2)
      • The manually prepared correct result is compared with the result of the proposed measure and with the result of the Jaccard coefficient alone.
      • Material: 36 question items about environmental protection (from 3 surveys)
      • Parameters: penalty 0.5, "neighborhood bonus" 0.3; threshold values of similarity judgments T = 0.6 and T = 0.5
      • Evaluation: E = W * Non-Detection + Miss Detection, with W = 3 (the number of surveys), because non-detections are more problematic than miss detections
      • Non-detection: a pair was judged as not similar although it should have been judged as similar
      • Miss detection: a pair was judged as similar although it should have been judged as not similar
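The evaluation value E can be computed from a set of predicted pairs and the manually prepared correct pairs, roughly as below. The function name and the item identifiers are hypothetical; counting errors per pair follows the definitions of non-detection and miss detection given above.

```python
def evaluation_score(predicted_pairs, correct_pairs, num_surveys):
    """E = W * (non-detections) + (miss detections), with W = number of surveys.

    Pairs are unordered, so (x, y) and (y, x) are treated as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    correct = {frozenset(p) for p in correct_pairs}
    non_detections = len(correct - predicted)    # similar, but judged not similar
    miss_detections = len(predicted - correct)   # not similar, but judged similar
    return num_surveys * non_detections + miss_detections


# Hypothetical judgments over 3 surveys.
correct = [("A.q1", "B.q2"), ("A.q3", "C.q4")]
predicted = [("B.q2", "A.q1"), ("A.q5", "C.q6")]
score = evaluation_score(predicted, correct, num_surveys=3)
print(score)  # 1 non-detection * 3 + 1 miss detection = 4
```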

  18. 4. Evaluation of Similarity Judgments (2/2)
      • Material: 113 question items about leisure (from 10 surveys)
      • Parameters: penalty 0.5, "neighborhood bonus" 0.3; threshold values of similarity judgments T = 0.6 and T = 0.5
      • Evaluation: E = W * Non-Detection + Miss Detection, with W = 10 (the number of surveys)
      • Non-detection: a pair was judged as not similar although it should have been judged as similar
      • Miss detection: a pair was judged as similar although it should have been judged as not similar
      Results
      • Non-detections and miss detections are reduced, and thus E is improved.
      • The number of rows containing detection errors is under 10%.
      The efficiency of the proposed method has been confirmed.

  19. 5. Editing Interface
      • The prototype tool has been developed.
      • The editing interface is built as a CGI script (Perl).
      • The summary table (10 surveys in total) can be scrolled; clicking an item opens an editing window, in which the user selects the item to move and specifies the destination.
      • Possible miss detection: the value exceeds the threshold but is close to it (0.5 to 0.6).
      • Possible non-detection: the value does not exceed the threshold but is close to it (0.4 to 0.5).
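The marking of "possible" errors near the threshold can be sketched as below. The function name is hypothetical, the threshold 0.5 and the width 0.1 are read off the ranges on the slide, and the handling of values exactly on a boundary is an assumption; the actual tool is a Perl CGI script, so this Python fragment only illustrates the rule.

```python
def flag_possible_error(similarity, threshold=0.5, margin=0.1):
    """Return a warning label for similarity values close to the threshold, else None."""
    if threshold < similarity <= threshold + margin:
        # Judged similar, but only barely: may be a miss detection.
        return "possible miss detection"
    if threshold - margin <= similarity <= threshold:
        # Judged not similar, but only barely: may be a non-detection.
        return "possible non-detection"
    return None


for value in (0.9, 0.55, 0.45, 0.2):
    print(value, flag_possible_error(value))
```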

  20. 5. Editing Interface
      An item (from Survey E) has been moved to a new row (the last row) to correct a detection error.
      [Screenshot: the scrollable summary table (10 surveys in total), with the moved item and the marks for possible miss detection and possible non-detection.]

  21. 5. Evaluation Test of the Editing Interface
      • Evaluation test: compare the time taken to create the summary by hand with the time taken using the proposed system and interface.
      • Material: 113 question items about leisure (from 10 surveys); the automatic result contains 1 non-detection and 9 miss detections (T = 0.5).
      Time taken to create a correct summary:
      Method          | Time
      Manual          | 3 hours
      Proposed system | 20 minutes (view & check the question items: 15 min.; move the items to correct errors: 5 min.)
      • 3 rows contain detection errors; 10 question items are moved.
      • Possible miss detection: 6 items. Possible non-detection: 22 items. (All detection errors are displayed as these "possible errors".)

  22. 6. Conclusions
      • Using structural characteristics of social survey questionnaires, we have developed a support tool for generating the "summary of question items".
      • The proposed method is capable of automatically creating a summary that is sufficiently accurate to meet the demands of specialists.
      • With the man-machine interface system, final, completed summaries can be generated in less time than by manual means.

  23. Thank you for your kind attention.
