
Terminology Organization in Terminology Management Systems

Terminology Organization in Terminology Management Systems. Angela Boll, Marina Kaneva, Claudia Himmler, Chiara Huber, Annika Meinhardt, Patrick Johnson. COMPILATION OF TERMINOLOGY. the most practical way to process lexical data is by computer


Presentation Transcript


  1. Terminology Organization in Terminology Management Systems Angela Boll, Marina Kaneva, Claudia Himmler, Chiara Huber, Annika Meinhardt, Patrick Johnson

  2. COMPILATION OF TERMINOLOGY • the most practical way to process lexical data is by computer • benefits: speed, flexibility and storage capacity • growing trend towards the automation of terminological data processing • from now on, all aspects of terminology compilation, storage and retrieval will be assisted by or directly carried out by computers

  3. PRINCIPLES OF COMPILATION • automation fundamentally affects the compilation of terminology • necessity to evolve completely new principles for compilation

  4. PRINCIPLES OF COMPILATION • systematic terminology compilation is now firmly corpus-based • text corpora reinforce the principle that terminology compilation is an ongoing and repeated activity

  5. PRINCIPLES OF COMPILATION • many technical texts can now be preserved in or converted into a suitable format for terminological analysis • texts which are to be processed by translators can be analysed and compared with current machine-readable terminology holdings and a machine-readable general dictionary in order to produce a listing of items not contained in either
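The comparison described in slide 5 can be sketched in Python as a simple set difference. This is a minimal illustration only: the function name `unlisted_items`, the tokenisation rule and the sample word lists are assumptions, not part of the presentation.

```python
import re


def unlisted_items(text, term_holdings, general_dictionary):
    """List words of `text` found neither in the terminology holdings
    nor in the general dictionary (both given as collections of words)."""
    # Naive tokenisation (assumption): lower-cased alphabetic words, hyphens allowed.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    known = {w.lower() for w in term_holdings} | {w.lower() for w in general_dictionary}
    candidates = []
    for word in words:
        # Keep first-occurrence order and drop duplicates.
        if word not in known and word not in candidates:
            candidates.append(word)
    return candidates


# Hypothetical sample data for illustration.
text = "The toner cartridge feeds the fuser."
print(unlisted_items(text, {"toner", "cartridge"}, {"the", "feeds"}))
```

A real system would work on multi-word terms and lemmatised forms; the sketch only shows the principle of listing items contained in neither resource.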

  6. PRINCIPLES OF COMPILATION • running text can be used totally independently of user requirements • terminology compilation is becoming increasingly text-oriented

  7. PRINCIPLES OF COMPILATION • the second major innovation affecting principles of compilation is the division which is now possible between • the raw data as they are found in the corpus, • the database which contains all the information that is collected in suitably structured form, and • all the various subsets of information which are created for specific purposes and uses

  8. PRINCIPLES OF COMPILATION

  9. PRINCIPLES OF COMPILATION • the terminologist now has appropriate tools which lift his work from a craft to a scientifically supported activity • automatic processing and computer-assisted terminology compilation is therefore qualitatively superior to conventional methods • terminologist is freed from the limitations of the past with respect to size of individual records and total quantity of records

  10. PRINCIPLES OF COMPILATION • however, there is also a danger: private term collections of individual translators can become widely known • instead, there should be only one major database of terminological information for each language community, to which all users would refer and contribute • communication across all industrial and institutional barriers would be facilitated

  11. The nature and type of terminological information • The information used to construct a terminological record is varied and subject to change • This affects the nature of the database system • Items of information in the database must be treated as independent of one another • Information can be entered at different times and from different sources

  12. The nature and type of terminological information • Full bibliographical information for each item is provided separately • Limitation of human manipulation of lexical data to the specific interpretative tasks the computer cannot perform • Concept is explained by indication of linguistic forms: antonyms, broader and narrower generic terms (refer to a whole class of terms), broader and narrower partitive terms (relate to a part of a whole)

  13. The nature and type of terminological information • Exemplification of the usage of technical terms: example sentences (context) and usage notes • The meaning of terms is semantically more changeable than that of items of the general lexicon of a language

  14. The nature and type of terminological information • In conceptually-based terminological data banks definitions are given in one language only • Bilingual terminology is directional and non-reversible > translation equivalents cannot be converted into entries of the source language • Translation equivalents do not refer to an authentic concept because they introduce new concepts

  15. Methodological considerations • Thanks to modern techniques of computational linguistics, terminologists do not need to be concerned with how the data is stored in the computer • A computer can store a multi-dimensional semantic network • The physical size limitations of non-magnetic media no longer apply • Definitions can be as long as is necessary to properly define the term

  16. Methodological considerations • Terminology compilation can be distributed physically and temporally • Information can be collected and stored in stages • As long as each item of data satisfies the controls (e.g. bibliographical reference) > as much data as available can be entered at any time • Information can be collected on a distributed basis > work can be distributed among various people and locations > it is particularly important for the compilation of multilingual terminology
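Staged, distributed data entry with a control on each item can be sketched as follows. This is an illustrative assumption about how such a control might look: the class `TermStore`, its field names and the sample sources are hypothetical.

```python
class TermStore:
    """Collects data items for entry terms in stages, from any contributor."""

    def __init__(self):
        self.records = {}  # entry term -> list of data items

    def add_item(self, term, category, value, source):
        # Control named on the slide: every item carries a bibliographical reference.
        if not source or not source.strip():
            raise ValueError("each data item needs a bibliographical reference")
        self.records.setdefault(term, []).append(
            {"category": category, "value": value, "source": source.strip()}
        )


store = TermStore()
# Items may arrive at different times and from different people (hypothetical data).
store.add_item("laser printer", "subject field", "computing", "Hypothetical Glossary, p. 3")
store.add_item("laser printer", "equivalent (fr)", "imprimante laser", "Hypothetical Manual, p. 12")
print(len(store.records["laser printer"]))
```

Because each item is validated on its own, as much data as is available can be entered at any time without waiting for the record to be complete.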

  17. Quality of data • Computer usage for input control and validation resulted in a trend to terminology of a higher quality • Increased dangers of spreading terminology of low quality • Increase in quality is very important • Far-reaching effect of computerised terminology processing on terminology spreading

  18. Quality of data • Distinction between original source texts and translated texts • Terms taken from texts in their original language are genuine terms and as such have full validity • Terms taken from translated texts may either be valid terms or translation equivalents

  19. Quality of data • Trend towards the use of genuine original texts for extraction of terms and contexts • There is no exact match of concepts for many terms across languages • Several possible equivalents together with context and usage information are needed for a correct choice

  20. Principles of data collection • Set of basic principles for the compilation of terminological data: • Certain consistency of criteria • Sources must be stated • Distinction between original and translated texts • Linguistic behaviour of terms should be documented by contexts so that all relevant textual variants are covered

  21. Terminological Data Banks-A Definition- • Automated collection of vocabularies of special areas that serve a particular user group • Used for large translation services • Enhanced but still conventional glossaries transferred to a new medium

  22. Terminological Data Banks-A Definition- • Designed to give response to the same questions a good dictionary is supposed to answer • But these questions only elicit direct responses from the various parts of the conventional dictionary

  23. Terminological Data Banks-A Definition- • Examples (ENTRY PART: QUESTION > ANSWER) • equivalent: what is the French word for ‘laser printer’? > imprimante laser • gender: what is the gender of ‘imprimante’? > feminine

  24. Terminological Data Banks-A Definition- • These responses are not sufficient for a wide range of dictionary users • Answers may be ambiguous • Full potential of a lexical database was not exploited by existing term banks

  25. Terminological Data Banks-A Definition- • Reasons: • Information was not unified in a suitable manner in order to retrieve it • Lack of coherent structure • Existing system failed to exploit new and additional techniques for ordering and representing the data

  26. Terminological Data Banks-A Definition- • There was an increasing demand for a system that can answer complex queries • Example: QUERY: „what do you call a machine that performs X?“ > SEARCH OF FIELD: definition or conceptual links
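A query of the kind shown in slide 26 searches the definition and the conceptual links rather than the entry term. The following Python sketch illustrates the idea under stated assumptions: the record layout, the function `search` and the sample definitions are hypothetical.

```python
# Hypothetical records: definition text plus labelled conceptual links.
records = [
    {
        "term": "laser printer",
        "definition": "machine that prints pages using a laser beam",
        "links": [("isa", "printer")],
    },
    {
        "term": "scanner",
        "definition": "machine that digitises documents",
        "links": [("isa", "input device")],
    },
]


def search(records, keyword):
    """Return entry terms whose definition or link targets mention `keyword`."""
    kw = keyword.lower()
    hits = []
    for rec in records:
        in_definition = kw in rec["definition"].lower()
        in_links = any(kw in target.lower() for _label, target in rec["links"])
        if in_definition or in_links:
            hits.append(rec["term"])
    return hits


print(search(records, "prints"))
print(search(records, "input device"))
```

Substring matching is the simplest possible stand-in here; the point is only that the answer is reached through fields other than the entry term.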

  27. Terminological Data Banks-A Definition- „a collection, stored in a computer, of special language vocabularies, including nomenclatures, standardised terms and phrases, together with the information required for their identification, which can be used as a mono- or multilingual dictionary for direct consultation, as a basis for dictionary production, as a control instrument for consistency of usage and term creation and as an ancillary tool in information and documentation.“

  28. Terminological Data Banks-A Definition- • Term banks are supposed to be used by people with varying degrees of expertise and different purposes

  29. Semantic Networks • Complex storage of data to represent terminological relationships • First developed in artificial intelligence research for the formal representation of human knowledge • Have no intrinsic meaning – they are basically directed graphs • They have superficial similarity

  30. Semantic Networks • Example:

  31. Semantic Networks • The relationships between concepts are expressed through abbreviations • Generic relationship = "is a type of" = "isa" • Partitive relationship = "is a part of" / "consists of" = "ispart-of" / "has-part" • Nodes = different concepts • Arcs = labelled links
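The node and arc model of slide 31 maps directly onto a labelled directed graph. The sketch below is one possible representation, not the one used by any particular term bank: the class `SemanticNetwork`, its method names and the sample concepts are assumptions.

```python
from collections import defaultdict


class SemanticNetwork:
    """Directed graph: nodes are concepts, arcs are labelled links."""

    def __init__(self):
        self.arcs = defaultdict(set)  # (source concept, label) -> target concepts

    def add(self, source, label, target):
        self.arcs[(source, label)].add(target)

    def targets(self, source, label):
        return set(self.arcs.get((source, label), set()))

    def broader_generic(self, concept):
        """All concepts reachable from `concept` by following "isa" arcs."""
        found, stack = set(), [concept]
        while stack:
            for target in self.targets(stack.pop(), "isa"):
                if target not in found:
                    found.add(target)
                    stack.append(target)
        return found


net = SemanticNetwork()
net.add("laser printer", "isa", "printer")            # generic relationship
net.add("printer", "isa", "output device")
net.add("laser printer", "has-part", "toner cartridge")  # partitive relationship
print(net.broader_generic("laser printer"))
```

Keeping the label as part of the key enforces a single method of description per relationship type: an arc is either "isa" or "has-part", never a free-text description.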

  32. Semantic Networks • A wide variety of relationships between concepts • To create semantic networks it is necessary to define a specific number of relationships and a coherent internal structure • System must allow only one single method of description for each type of relationship • Networks have to be subject field-specific

  33. Semantic Networks • To obtain a result, the end-user poses questions to the system in the form of network fragments • The fragments are matched against the network database • Variable nodes in the fragments are bound to the values they must have in order to make the match perfect
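The matching step in slide 33 can be sketched as a small pattern matcher. This is an illustrative sketch under assumptions: arcs are stored as (source, label, target) triples, variable nodes are written with a leading "?", and the names `match` and `unify` are hypothetical.

```python
def unify(node, value, binding):
    """Bind a variable node to `value`, or compare a constant node with it."""
    if node.startswith("?"):
        if node in binding:
            return binding[node] == value
        binding[node] = value
        return True
    return node == value


def match(fragment, network, binding=None):
    """Yield every variable binding under which all arcs of `fragment`
    occur in `network` (both are lists of (source, label, target) triples)."""
    binding = binding or {}
    if not fragment:
        yield dict(binding)
        return
    (source, label, target), rest = fragment[0], fragment[1:]
    for net_source, net_label, net_target in network:
        if net_label != label:
            continue
        trial = dict(binding)  # work on a copy so failed attempts leave no trace
        if unify(source, net_source, trial) and unify(target, net_target, trial):
            yield from match(rest, network, trial)


# Hypothetical network and query: "which printer has a toner cartridge?"
network = [
    ("laser printer", "isa", "printer"),
    ("inkjet printer", "isa", "printer"),
    ("laser printer", "has-part", "toner cartridge"),
]
query = [("?x", "isa", "printer"), ("?x", "has-part", "toner cartridge")]
print(list(match(query, network)))
```

The variable `?x` must take the same value in every arc of the fragment, which is what restricts the answer to the one concept that satisfies both arcs.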

  34. Semantic Networks • The success of term banks depends on several factors: • The semantics of the network arcs must be carefully defined • System must be easy to implement and user-friendly • Danger of over-complicated system that is too detailed

  35. Compilation of Terminology: Terminological information • What terms are used in a terminological tool? • The selection of the most effective terms is assisted by reference to terminological information which is collected in dictionaries/glossaries/term banks • Principal factor of effectiveness: type and quality of information

  36. Terminological information • International consensus on basic categories for terminological records: • entry term • a reference number • a subject field • a definition • an indication of the usage

  37. Terminological information • Customary to add indication of the sources of the term(s), definition, context or any foreign language equivalents • It is up to the user to decide on appropriateness of terms

  38. [Diagram of a term record] Blocks: Corpora of raw data containing definitions, terms, contexts • Source information (origin, type, No., page) • Conceptual Specification • Linguistic Specification • Pragmatic Specification • FL equivalent Specification • Housekeeping information. Field labels shown: language, term, equivalent term, definition, context, grammatical information, links to other concept, usage note or example, synonyms, scope notes, abbreviation, usage, subject field, variants, date, type, pool number, record number, terminologist

  39. Terminological information: Basic data categories • What information is included in a multifunctional term record? • The information is complex and consists of a number of subsets which can be compiled and processed quite separately.

  40. Terminological information: Basic data categories • Into which categories is the term record structured? • 1. source information: links the term record to the raw data files • 2. entry term: either linguistic item or a label of a concept, or both • 3. semantic and conceptual specification: definition, a subject attribution, scope notes, set of links to other concepts • 4. linguistic specification: e.g. variants, abbreviations

  41. Terminological information: Basic data categories • 5. pragmatic specification: examples of the context in which term occurs, usage notes • 6. housekeeping or administrative information: record number, name of terminologist, dates of first processing, up-dating of the record • 7. foreign language equivalent specification in translation-orientated databases
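The seven categories listed in slides 40 and 41 can be modelled as one record whose parts are filled and read independently. The following is a minimal sketch, assuming a Python dataclass with one field per category: the class `TermRecord`, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    entry_term: str                                      # 2. entry term
    source: dict = field(default_factory=dict)           # 1. source information
    conceptual: dict = field(default_factory=dict)       # 3. definition, subject field, scope notes, links
    linguistic: dict = field(default_factory=dict)       # 4. variants, abbreviations
    pragmatic: dict = field(default_factory=dict)        # 5. contexts, usage notes
    housekeeping: dict = field(default_factory=dict)     # 6. record number, terminologist, dates
    fl_equivalents: dict = field(default_factory=dict)   # 7. language code -> equivalent

    def subset(self, *categories):
        """Extract only the named categories, e.g. for a bilingual glossary."""
        return {name: getattr(self, name) for name in categories}


record = TermRecord(
    entry_term="laser printer",
    conceptual={"subject_field": "computing", "links": [("isa", "printer")]},
    housekeeping={"record_number": 1},
    fl_equivalents={"fr": "imprimante laser"},
)
print(record.subset("entry_term", "fl_equivalents"))
```

Categories left empty at creation time can be filled later, and `subset` shows how a purpose-specific view is derived from the full record without altering it.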

  42. Terminological information: Basic data categories • Now let us take a closer look at the information categories • Entry Term - most common search item - presented in the most relevant form (e.g. sing. for nouns) - the distinction between concept- or term-orientation affects the treatment of homographs/synonyms > a decision is needed on whether the entry term represents the concept or is the linguistic form

  43. Terminological information: Basic data categories - In concept-orientated term banks: primary importance is placed on the definition of the concept, and all terms matching the definition are grouped together > imposes a difficult choice of the order in which terms are listed - Exclusive concept orientation (e.g. NORMATERM) is feasible in mono- and bilingual term banks which deal with subject fields of similar conceptual structures

  44. Terminological information: Basic data categories - For multilingual term banks explanatory notes are required which indicate in every case the scope and degree to which a term matches the concept defined in another language - Three types of entry: 1. simple, compound or complex terms 2. phrases regardless of lexicalisation 3. sentences

  45. Terminological information: Basic data categories • Conceptual Specification • Definition - first item that links entry term to the concept - can be in a style specific to the term bank, or extracted from an authoritative source - term banks can be classified by the way an entry is identified or explained - there are two major schools of thought:

  46. Terminological information: Basic data categories - The first can refer to a definition which is strictly limited in its validity to the range of texts which represent the source material for the term collection - In the second there is no restricted corpus > no single valid definition in the first place

  47. Terminological information: Basic data categories • Relationships - most controversial and least defined category of information; it may indicate no more than the most obvious broader term - information could be a reference to another record • Subject Field - terminology is divided by subject field before being ordered in another way - because of the large quantities of terms it is advisable to introduce a classification of terms by subject areas

  48. Terminological information: Basic data categories • Scope Note - can be considered a further specification of subject or register - is intended to indicate a special field of application • Linguistic specification • Grammatical Information - can consist of: spelling, pronunciation, gender for nouns, parts of speech (e.g. n, v, adj.), principal parts of verbs (e.g. infinitive, past)

  49. Terminological information: Basic data categories • Language: - is important in term banks where it is combined with an indication of the country where it is used • Parallel information categories to the entry term: - usually have no separate record but are listed in an index with a reference to the record of the entry term - comprise information such as: spelling, expanded forms or reduced forms or synonyms - several overlapping categories exist: variants, full synonyms, abbreviated forms
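The index described in slide 49, in which variants, synonyms and abbreviated forms point to the record of the entry term instead of having records of their own, can be sketched as follows. The function `build_index`, the record layout and the sample record are assumptions made for illustration.

```python
def build_index(records):
    """Map every surface form (entry term, variants, synonyms, abbreviations)
    to the numbers of the records it refers to."""
    index = {}
    for record_no, rec in records.items():
        forms = [
            rec["entry_term"],
            *rec.get("variants", []),
            *rec.get("synonyms", []),
            *rec.get("abbreviations", []),
        ]
        for form in forms:
            # Case-insensitive lookup key; the record itself keeps the original spelling.
            index.setdefault(form.lower(), set()).add(record_no)
    return index


# Hypothetical record collection keyed by record number.
records = {
    1: {
        "entry_term": "cathode ray tube",
        "variants": ["cathode-ray tube"],
        "abbreviations": ["CRT"],
    },
}
index = build_index(records)
print(index["crt"])
```

A set of record numbers is used as the value because one form, such as an abbreviation, may belong to several entry terms.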

  50. Terminological information: Basic data categories • Pragmatic specification ▪ Context • Gives examples of the way that the entry term is used in a language • Is considered a successful way of showing any unusual features of wordform, inflection or collocation • The context should make the definition and the usage note complete
