
Problems with Non-roman Character (Korean) Searching

Prepared by Young Ki Lee, Senior Cataloging Specialist.


Presentation Transcript


  1. Problems with Non-roman Character (Korean) Searching. Prepared by Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, RCCD, Library of Congress

  2. Topics to be covered:
      1. Non-roman script (Korean) searching in CJK data fields without word spacing
      2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
      3. Microsoft Korean IME
      4. Display of search results
      5. CJK Compatibility Database

  3. Title Word Search for 국경
      Search 국경 (國境: the border):
      - This ‘ti:’ search returns 363 hits.
      - Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
      - The system retrieves every record that has the string 국경 anywhere in the title fields (including across subfields), for example (word divisions marked with /):
        ‘한국경제’ : 한국/경제
        ‘중국경기공’ : 중국/경기공
        ‘미국경제’ : 미국/경제
        ‘한국경문대전집’ : 한국/경문/대전집
        ‘약국경영’ : 약국/경영, etc.
      - In Voyager (where the data currently has word spacing), the same search (tkey 국경) retrieves only 9 hits.

  4. [Slide image: Search9]

  5. Title Word Search for 국경
      Search 국경 (國境: the border):
      - This ‘ti:’ search returns 360 hits.
      - Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
      - The system retrieves every record that has the string 국경 anywhere in the title fields (including across subfields), for example (word divisions marked with /):
        ‘한국경제’ : 한국/경제
        ‘중국경기공’ : 중국/경기공
        ‘미국경제’ : 미국/경제
        ‘한국경문대전집’ : 한국/경문/대전집
        ‘약국경영’ : 약국/경영, etc.
      - In Voyager (where the data currently has word spacing), the same search (tkey 국경) retrieves only 9 hits.

  6. Title Word Search for 국경
      Search 국경 (國境: the border):
      - This ‘ti:’ search returns 360 hits.
      - Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
      - Every record that has the string 국경 anywhere in the title fields (including across subfields) is retrieved, for example (word divisions marked with /):
        ‘한국경제’ = 한국/경제
        ‘중국경기공’ = 중국/경기공
        ‘미국경제’ = 미국/경제
        ‘한국경문대전집’ = 한국/경문/대전집
        ‘약국경영’ = 약국/경영, etc.
      - In Voyager (where the data currently has word spacing), the same search (tkey 국경) retrieves only 9 hits.

  7. Title Word Search for 국경
      Search 국경 (國境: the border):
      - This ‘ti:’ search returns 360 hits.
      - Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
      - Every record that has the string 국경 anywhere in the title fields (including across subfields) is retrieved, for example (shown with word spacing):
        ‘한국경제’ = ‘한국 경제’
        ‘중국경기공’ = ‘중국 경기공’
        ‘미국경제’ = ‘미국 경제’
        ‘한국경문대전집’ = ‘한국 경문 대전집’
        ‘약국경영’ = ‘약국 경영’, etc.
      - In Voyager (where the data currently has word spacing), the same search (tkey 국경) retrieves only 9 hits.


  11. [Slide image: 국경7]

  12. Title Word Search for 국경
      Search 국경 (國境: the border):
      - This ‘ti:’ search returns 360 hits.
      - Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
      - Every record that has the string 국경 anywhere in the title fields (including across subfields) is retrieved, for example (word divisions marked with /):
        ‘한국경제’ = 한국/경제
        ‘중국경기공’ = 중국/경기공
        ‘미국경제’ = 미국/경제
        ‘한국경문대전집’ = 한국/경문/대전집
        ‘약국경영’ = 약국/경영, etc.
      - In the LC Online Catalog (where the data currently has word spacing), a title word search retrieves only 9 hits.
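The false hits on the 국경 slides come from matching the query as a substring of unspaced titles. The minimal Python sketch below compares two tests. The first is a plain substring test, which is the only option when words are not spaced. The second is a whole-word test on spaced text. It illustrates the point under those assumptions and does not describe how OCLC or Voyager actually index titles. The first five titles are the slides' examples. 국경분쟁 ("border dispute") is a hypothetical relevant title added for contrast.

```python
# Titles stored without word spacing, mapped to their spaced (word-divided) forms.
# The first five are the examples quoted on the slides; "국경분쟁" is a
# hypothetical relevant title added for contrast.
TITLES = {
    "한국경제": "한국 경제",
    "중국경기공": "중국 경기공",
    "미국경제": "미국 경제",
    "한국경문대전집": "한국 경문 대전집",
    "약국경영": "약국 경영",
    "국경분쟁": "국경 분쟁",
}


def substring_hits(query: str, titles: list[str]) -> list[str]:
    """Return titles containing the query anywhere (the only option without spacing)."""
    return [t for t in titles if query in t]


def word_hits(query: str, titles: list[str]) -> list[str]:
    """Return titles containing the query as a whole space-delimited word."""
    return [t for t in titles if query in t.split()]


unspaced = list(TITLES)
spaced = list(TITLES.values())

print(len(substring_hits("국경", unspaced)))  # 6: every title matches
print(word_hits("국경", spaced))               # ['국경 분쟁']
```

With spacing, the whole-word test keeps the relevant title and drops the five false hits. This is consistent with the much smaller hit count in the spaced catalog.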

  13. Title Word Search for 어학
      Search 어학 (語學: philology):
      - In OCLC, this ‘ti:’ search returns 308 hits.
      - Only 37% of the hits are relevant (36 out of 95) in the first group (Books, 1900-1991).
      - The results include (word divisions marked with /):
        ‘국어학적’ = 국어학적
        ‘단어학습’ = 단어/학습
        ‘언어학’ = 언어학
        ‘일본어학교’ = 일본어/학교
        ‘영어학원’ = 영어/학원, etc.
      - In Voyager (where the data currently has word spacing), the same search (tkey 어학) retrieves 32 hits.

  14. Title Word Search for 고조선
      Searching 고조선 (古朝鮮: name of an ancient Korean country) retrieves irrelevant records, such as (word divisions marked with /):
        ‘일본가와사끼쇼와댄고조선인숙사’ = 일본/가와사끼/쇼와/댄고/조선인/숙사
        ‘CD-ROM을타고조선에가다’ = CD-ROM/을/타고/조선/에/가다
        ‘중국그리고조선족’ = 중국/그리고/조선족
        ‘하멜일지그리고조선국에관한기술’ = 하멜/일지/그리고/조선국/에/관한/기술
        ‘조선도자명고’ = 조선/도자/명고
        ‘조선로동당제5차대회에서한중앙위원회사업총화보고’ = 조선/로동당/제/5차/대회/에서/한/중앙/위원회/사업/총화/보고
        ‘조선아동문학문고’ = 조선/아동/문학/문고, etc.
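The last three titles on the 고조선 slide do not contain 고조선 as one continuous string. This suggests that the match runs across two title elements, as the 국경 slides note for subfields; that is an inference, not something the slide states. The hypothetical Python sketch below shows how joining subfields without a separator creates such a match. The second subfield value, 조선편, is invented for illustration.

```python
def field_matches(query: str, subfields: list[str], separator: str) -> bool:
    """Join a field's subfields with `separator`, then test for the query as a substring."""
    return query in separator.join(subfields)


# Hypothetical title field split into two subfields; neither contains 고조선 on its own.
subfields = ["조선도자명고", "조선편"]

print(field_matches("고조선", subfields, ""))   # True: "...명고" + "조선편" gives "...명고조선편"
print(field_matches("고조선", subfields, " "))  # False: the space breaks the cross-boundary match
```

A separator only prevents matches that span a boundary. Within a single unspaced subfield, the substring problem shown for 국경 remains.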

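  15. [Slide image: 고조선2]

  16. [Slide image: 고조선4]

  17. [Slide image: 고조선7]

  18. [Slide image: Kochoson8]

  19. [Slide image: komunso1]

  20. [Slide image: Komunso2]

  21. [Slide image: Komunso3]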

  22. Title Word Search for 한국경제
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 300 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,490 hits
      Title Phrase search for 한국경제: ‘ti=’ search

  23. Title Word Search for 韓國經濟
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 295 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,490 hits
      Title Phrase search for 한국경제: ‘ti=’ search

  24. Title Word Search for 한국經濟
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 295 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,490 hits
      Title Phrase search for 한국경제: ‘ti=’ search

  25. Title Word Search for 韓國경제
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 295 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,490 hits
      Title Phrase search for 한국경제: ‘ti=’ search

  26. Title Word Search for 韓國경제
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 295 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,499 hits
      Title Phrase search for 한국경제: ‘ti=’ search

  27. Title Phrase Search for 한국경제
      한국경제 (韓國經濟: Korean Economy), ‘ti:’ search:
      - search 한국경제: 295 hits
      - search 韓國經濟: 652 hits
      - search 한국經濟: 3 hits
      - search 韓國경제: 0 hits
      - search ‘Hanguk kyongje’: 1,490 hits
      - search 한국#경제: 461 hits (ti: 한국 AND ti: 경제)
      Title Phrase search for 한국경제: ‘ti=’ search
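The 한국경제 counts differ because the Hangul and Hancha forms of the same word are indexed separately. The minimal Python sketch below builds a unified index key. It assumes a hypothetical Hancha-to-Hangul reading table (`HANCHA_TO_HANGUL`) with only four entries. A production table would be far larger and would need to handle characters with more than one reading.

```python
# Hypothetical reading table (four entries only). A real table would need
# thousands of entries and a way to choose among multiple readings (e.g. 李: 이/리).
HANCHA_TO_HANGUL = {"韓": "한", "國": "국", "經": "경", "濟": "제"}


def index_key(text: str) -> str:
    """Normalize a heading or query to Hangul so mixed-script forms share one key."""
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in text)


queries = ["한국경제", "韓國經濟", "한국經濟", "韓國경제"]
print({q: index_key(q) for q in queries})
print(len({index_key(q) for q in queries}))  # 1: all four forms collapse to 한국경제
```

Mapping to Hangul rather than to Hancha is assumed here because most Hancha have a single Hangul reading. A single Hangul syllable, by contrast, can stand for many different Hancha.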

  28. Search ti: nodongja or 勞動者 or 노동자 or 로동자

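The nodongja search ORs together 노동자 and 로동자 because North Korean spelling does not apply the initial-sound rule (두음법칙). In South Korean spelling, that rule changes a word-initial ㄹ to ㄴ, or to ㅇ before ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ. It also changes a word-initial ㄴ to ㅇ before ㅕ ㅛ ㅠ ㅣ. The simplified Python sketch below derives the South Korean form by rewriting the first syllable. It uses the standard Hangul syllable arithmetic, code point = 0xAC00 + (initial × 21 + vowel) × 28 + final. The function name is hypothetical, and the many lexical exceptions to the rule are ignored.

```python
HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = 11172     # 19 initials x 21 vowels x 28 finals
PER_INITIAL = 21 * 28      # 588 syllables per initial consonant

RIEUL, NIEUN, IEUNG = 5, 2, 11             # initial-consonant indices for ㄹ, ㄴ, ㅇ
YA_YEO_YE_YO_YU_I = {2, 6, 7, 12, 17, 20}  # vowel indices for ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ
YEO_YO_YU_I = {6, 12, 17, 20}              # vowel indices for ㅕ ㅛ ㅠ ㅣ


def apply_initial_sound_rule(word: str) -> str:
    """Rewrite the first syllable of `word` by a simplified initial-sound rule."""
    if not word:
        return word
    code = ord(word[0]) - HANGUL_BASE
    if not 0 <= code < SYLLABLE_COUNT:
        return word  # not a precomposed Hangul syllable
    initial, rest = divmod(code, PER_INITIAL)  # rest = vowel * 28 + final
    vowel = rest // 28
    if initial == RIEUL:
        initial = IEUNG if vowel in YA_YEO_YE_YO_YU_I else NIEUN
    elif initial == NIEUN and vowel in YEO_YO_YU_I:
        initial = IEUNG
    return chr(HANGUL_BASE + initial * PER_INITIAL + rest) + word[1:]


variants = {"로동자", apply_initial_sound_rule("로동자")}
print(sorted(variants))  # ['노동자', '로동자']
```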

  30. Korean IME Problems
      1. Personal name search with an invalid character from the Korean IME
         - Search 李光洙 in ‘pn:’: 0 hits. 李 (U+F9E1) is an invalid character from the Korean IME.
         - Search 李光洙 in ‘pn:’: 157 hits. 李 (U+674E) is a valid MARC 21 character.
      2. Title search with an invalid character from the Korean IME
         - Search 論文集 in ‘ti:’: 0 hits. 論 (U+F941) is an invalid character from the Korean IME.
         - Search 論文集 in ‘ti:’: 21,393 hits. 論 (U+8AD6) is a valid MARC 21 character.
      3. Korean family name 曺
         - No MARC 21 equivalent
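The invalid characters on the IME slide are CJK compatibility ideographs. Each has a canonical decomposition to the corresponding unified ideograph. Standard Unicode NFC normalization of the query would therefore map U+F9E1 to U+674E and U+F941 to U+8AD6. The minimal Python sketch below uses the standard library's `unicodedata.normalize`. It makes no claim that any particular catalog normalizes queries this way.

```python
import unicodedata

# Code points from the slide: compatibility ideographs produced by the Korean IME
# and the unified ideographs used in MARC 21 records.
IME_LI, MARC_LI = "\uF9E1", "\u674E"    # 李
IME_NON, MARC_NON = "\uF941", "\u8AD6"  # 論


def normalize_query(text: str) -> str:
    """Apply Unicode NFC, which maps CJK compatibility ideographs to unified ones."""
    return unicodedata.normalize("NFC", text)


ime_name = IME_LI + "光洙"
print(ime_name == MARC_LI + "光洙")                            # False: different code points
print(normalize_query(ime_name) == MARC_LI + "光洙")           # True
print(normalize_query(IME_NON + "文集") == MARC_NON + "文集")  # True
```

NFC does not help with 曺. It is already a unified ideograph, so normalization leaves it unchanged; the slide's point is that it has no MARC 21 equivalent at all.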

  31. Display Order
      1. Browse search: sorted by Unicode value (numbers, roman, Japanese, Hancha, Hangul)
      2. Keyword search: sorted alphabetically by romanized form (numbers, then romanization)
      3. Display order: character by character on the designated value
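Sorting by Unicode value yields the browse order listed on the display-order slide. Digits (U+0030-U+0039) come first, then Latin letters, then kana (from U+3040), then CJK unified ideographs (from U+4E00), and finally Hangul syllables (from U+AC00). Here is a quick Python check with one sample of each; Python compares strings by code point:

```python
samples = ["가", "銀", "あ", "a", "1"]

for ch in sorted(samples):  # str comparison uses code point values
    print(ch, f"U+{ord(ch):04X}")
# 1 U+0031
# a U+0061
# あ U+3042
# 銀 U+9280
# 가 U+AC00
```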

  32. Unicode value, total strokes, and radical:
      Character   Unicode   Total strokes   Radical (#: strokes)
      銀          9280      14              167 (gold): 8
      門          9580      8               169 (gate): 8
      養          990A      15              184 (eat): 6
      魂          9B42      14              194 (ghost): 10
      가          AC00
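The CJK Unified Ideographs block is arranged by radical and then by residual stroke count. Sorting by code point therefore reproduces the radical order on the Unicode/strokes/radical slide, but not the total-stroke order. A Python check using the slide's own values:

```python
# (character, total strokes, radical number) as listed on the slide
HANCHA = [("魂", 14, 194), ("門", 8, 169), ("養", 15, 184), ("銀", 14, 167)]

by_code_point = sorted(HANCHA, key=lambda row: ord(row[0]))
print([row[0] for row in by_code_point])  # ['銀', '門', '養', '魂']
print([row[2] for row in by_code_point])  # [167, 169, 184, 194]: ascending radicals
print([row[1] for row in by_code_point])  # [14, 8, 15, 14]: not ascending strokes
```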

  33. [Slide image: sort3]

  34. Display Order
      1. Browse search: sorted by Unicode value (numbers, roman, Japanese, Hancha, Hangul)
      2. Keyword search: sorted alphabetically by romanized form (numbers, then romanization)
      3. Display order: character by character on the designated value, NOT word by word
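The slide does not define the two filing methods. This sketch assumes the usual library-filing sense: character-by-character filing ignores spaces, while word-by-word filing compares one word at a time. Under that assumption, the difference can be shown in Python with hypothetical titles:

```python
# Hypothetical titles; spaces mark word divisions.
titles = ["한국 경제", "한국경영", "한국 문화"]

# Character by character: spaces are ignored, so the comparison runs across words.
char_by_char = sorted(titles, key=lambda t: t.replace(" ", ""))
# Word by word: compare the first words, then the second words, and so on.
word_by_word = sorted(titles, key=lambda t: t.split())

print(char_by_char)  # ['한국경영', '한국 경제', '한국 문화']
print(word_by_word)  # ['한국 경제', '한국 문화', '한국경영']
```

Under word-by-word filing, the short first word 한국 files before the longer single word 한국경영. Under character-by-character filing, 한국경영 and 한국 경제 interfile.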

  35. Unicode values:
      진: C9C4
      침: CE68
      중: C911
      인: C778
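Precomposed Hangul syllables (U+AC00-U+D7A3) are laid out by initial consonant, then vowel, then final consonant. Sorting by code point is therefore the same as sorting by those three jamo indices. The Python sketch below decomposes the syllables on the Unicode-values slide; the function name is illustrative.

```python
HANGUL_BASE = 0xAC00


def jamo_indices(syllable: str) -> tuple[int, int, int]:
    """Split a precomposed Hangul syllable into (initial, vowel, final) indices."""
    code = ord(syllable) - HANGUL_BASE
    if not 0 <= code < 11172:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    return code // 588, (code % 588) // 28, code % 28


syllables = ["진", "침", "중", "인"]  # as listed on the slide
for s in sorted(syllables):
    print(s, f"U+{ord(s):04X}", jamo_indices(s))
# 인 U+C778 (11, 20, 4)
# 중 U+C911 (12, 13, 21)
# 진 U+C9C4 (12, 20, 4)
# 침 U+CE68 (14, 20, 4)
```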


  37. CJK Compatibility Database
      • The CJK Compatibility Database includes more than 450 non-MARC 21 Chinese, Japanese, and Korean characters, Hangul syllables, and diacritic marks, each matched with its MARC 21 equivalent.
      • The database is intended to let catalogers quickly and conveniently replace a non-MARC 21 character with its MARC 21 equivalent.
      • The list of characters in the database was initially identified by LC staff and was supplemented by entries from a similar database at Yale University.
      • The database is a cooperative undertaking intended for the use of all CJK catalogers. If you encounter a non-MARC 21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at ylee@loc.gov.
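Here is a Python sketch of the replacement step the database supports. It assumes a hypothetical in-memory table that holds only the two pairs from the Korean IME slide. Characters from the CJK Compatibility Ideographs block (U+F900-U+FAFF) that the table does not cover are collected so they can be reported. The real database also covers Hangul syllables and diacritic marks, which this block check would not catch.

```python
# Hypothetical excerpt of a compatibility table: only the two pairs cited on the IME slide.
COMPAT_TO_MARC21 = {"\uF9E1": "\u674E", "\uF941": "\u8AD6"}


def is_cjk_compatibility_ideograph(ch: str) -> bool:
    """True for characters in the CJK Compatibility Ideographs block (U+F900-U+FAFF)."""
    return 0xF900 <= ord(ch) <= 0xFAFF


def replace_non_marc21(text: str) -> tuple[str, list[str]]:
    """Swap in known MARC 21 equivalents; collect unlisted compatibility ideographs."""
    output: list[str] = []
    to_report: list[str] = []
    for ch in text:
        if ch in COMPAT_TO_MARC21:
            output.append(COMPAT_TO_MARC21[ch])
        else:
            if is_cjk_compatibility_ideograph(ch):
                to_report.append(f"U+{ord(ch):04X}")
            output.append(ch)
    return "".join(output), to_report


fixed, unknown = replace_non_marc21("\uF941文集")
print(fixed == "\u8AD6文集", unknown)  # True []
```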

  38. Thank you
