Project CLiMB: Computational Linguistics for Metadata Building. Columbia University. Funded by the Andrew W. Mellon Foundation, 2002-2004. Using Computational Linguistic Techniques to Harvest Image Descriptors. Photograph courtesy of the Council of Industrial Design's Design Archive.
CLiMB: Interdisciplinary Research at Columbia University • Libraries • Computer Science Department • Center for Research on Information Access (CRIA) Funded by the Andrew W. Mellon Foundation 2002-2004
CLiMB Project Members Judith Klavans, PI Stephen Davis Angela Giral Patricia Renfro Bob Wolven Roberta Blitz Rebecca Passonneau Veronika Horvath David Elson
Problems in Image Access Traditional approach: • labor-intensive • expensive
Project CLiMB Help image catalogers provide subject access? Harvest image descriptors from existing literature?
Can we harvest image descriptors? angled porch v-shaped plan sandstone boulders
CLiMB Technical Contribution • CLiMB will identify and extract proper nouns, terms, and phrases from text related to an image:
"By September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes." (Edward R. Bosley. Greene & Greene. London: Phaidon, 2000. p. 127.)
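The extraction idea on this slide can be illustrated with a minimal Python sketch. It is not the CLiMB implementation: the stopword list, the two-word phrase window, the capitalization heuristic, and the function names `candidate_phrases` and `candidate_proper_nouns` are all assumptions made for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not CLiMB's).
STOPWORDS = {"a", "an", "and", "all", "be", "been", "by", "had",
             "it", "of", "that", "the", "under", "would"}


def candidate_phrases(text):
    """Return two-word candidate phrases that contain no stopwords."""
    phrases = []
    # Punctuation ends a chunk, so phrases never span commas or brackets.
    for chunk in re.split(r"[,.;:()]", text):
        words = [w.lower() for w in re.findall(r"[A-Za-z][A-Za-z'-]*", chunk)]
        for first, second in zip(words, words[1:]):
            if first in STOPWORDS or second in STOPWORDS:
                continue
            phrase = f"{first} {second}"
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def candidate_proper_nouns(text):
    """Return runs of capitalized words, ignoring sentence-initial capitals."""
    names = []
    for match in re.finditer(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text):
        words = match.group().split()
        before = text[:match.start()].rstrip()
        if not before or before[-1] in ".!?":
            # The first word opens a sentence, so its capital proves nothing.
            words = words[1:]
        name = " ".join(words)
        if name and name not in names:
            names.append(name)
    return names


excerpt = ("It featured a radically informal, V-shaped plan (that maintained "
           "the original angled porch). The chimneys and foundation would be "
           "constructed of the sandstone boulders that comprised the local geology.")
print(candidate_phrases(excerpt))
```

On the Bosley excerpt this heuristic surfaces the three descriptors highlighted earlier (angled porch, v-shaped plan, sandstone boulders) alongside noisier candidates, which is why a person still has to choose among them.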
CLiMB Overall Goals The essence of CLiMB: • Use scholars themselves as “catalogers” by employing scholarly publications • Enhance existing descriptive metadata The CLiMB project: • Research: Development of richer retrieval through increased numbers of descriptors • Practice: Development of CLiMB ToolKit
Squeezing Metadata out of Scholarly Texts • Image collection • Associated text • Target object identification (TOI) • CLiMB ToolKit
Greene & Greene Architectural Records and Papers Collection Drawings and Archives Avery Architectural and Fine Arts Library Columbia University Libraries
NYDA.1960.001.00023 All Saints Episcopal Church (Pasadena, Calif.). Alterations 1902-1903
Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.] Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.
Subjects: Houses Alterations Architecture--Designs and plans--United States. Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] floor plan, part plan of basement.
Greene & Greene Bibliography (associated texts) • Bosley, Edward R. Greene & Greene. London : Phaidon, 2000. • Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974] • Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977. • Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998. • Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998. • Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G. Dahlstrom, 1974]
Squeezing Metadata out of Scholarly Texts • Image collection • Associated text • Target object identification (TOI) • CLiMB ToolKit
Target Object Identification (TOI) • “Authority” list • Varies from collection to collection • Greene & Greene – Project Names • North Carolina Museum – Creator/Title
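As a rough illustration of how an authority list can tie passages of associated text to target objects, the sketch below indexes paragraphs by the names they mention. The function name `index_target_objects`, the plain case-insensitive substring match, and the sample paragraphs are assumptions; the slides do not describe how the ToolKit actually matches names.

```python
def index_target_objects(paragraphs, authority_list):
    """Map each authority-list name to the paragraph numbers that mention it."""
    index = {name: [] for name in authority_list}
    for number, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        for name in authority_list:
            # Hypothetical matching rule: case-insensitive substring search.
            if name.lower() in lowered:
                index[name].append(number)
    return index


# Project names taken from the catalog records; the paragraphs are invented.
authority_list = ["All Saints Episcopal Church", "Mrs. Dudley P. Allen house"]
paragraphs = [
    "The alterations to All Saints Episcopal Church date from 1902-1903.",
    "A later commission was the Mrs. Dudley P. Allen house in Pasadena.",
    "Both projects were documented in drawings.",
]
print(index_target_objects(paragraphs, authority_list))
```

Because the list varies from collection to collection, the same function would take project names for Greene & Greene and creator/title pairs for the North Carolina Museum.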
North Carolina Museum of Art • Museum Catalog (Associated Text) • Images (Catalog Records) • image descriptors. Source: North Carolina Museum of Art: Handbook of the Collections. Ed. Rebecca Martin Nagy. Raleigh, NC: North Carolina Museum of Art, Hudson Hills Press, 1998.
Georgia O'Keeffe (American, 1887-1986) Cebolla Church, 1945 Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.) Purchased with funds from the North Carolina Art Society (Robert F. Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1 North Carolina Museum of Art <http://ncartmuseum.org/collections/highlights/20thcentury/20th/1910-1950/038_lrg.shtml>
MARC format
100 O’Keeffe, Georgia, ≠d 1887-1986.
245 Cebolla church ≠ h [slide] / ≠ c Georgia O’Keeffe.
260 ≠c2003
300 1 slide : ≠ b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ≠ b Raleigh, N.C.
650 Painting, American ≠ y 20th century.
650 Women artist ≠ z United States
650 Church buildings in art.
MARC format with CLiMB subject terms
100 O’Keeffe, Georgia, ≠d 1887-1986.
245 Cebolla church ≠ h [slide] / ≠ c Georgia O’Keeffe.
260 ≠c2003
300 1 slide : ≠ b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ≠ b Raleigh, N.C.
650 Painting, American ≠ y 20th century.
650 Women artist ≠ z United States
650 Church buildings in art.
CLiMB New Mexican highlands
CLiMB village of Cebolla
CLiMB adobe Church of Santo Niño
CLiMB sagging, sun-bleached walls
CLiMB rusted tin roof
CLiMB isolation
CLiMB human endurance
CLiMB window
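The enrichment step shown in this record can be sketched in a few lines of Python. The representation of a record as a list of `(tag, value)` pairs and the helper name `add_climb_terms` are assumptions for illustration; the "CLiMB" label simply mirrors the slide and is not a real MARC tag.

```python
def add_climb_terms(record, terms, tag="CLiMB"):
    """Return a copy of the record with one (tag, term) field per new term."""
    # Terms already present under the same tag are not added twice.
    existing = {value.lower() for field_tag, value in record if field_tag == tag}
    enriched = list(record)
    for term in terms:
        if term.lower() not in existing:
            enriched.append((tag, term))
            existing.add(term.lower())
    return enriched


record = [
    ("245", "Cebolla church"),
    ("650", "Church buildings in art."),
]
selected = ["rusted tin roof", "isolation", "Rusted tin roof"]
for field_tag, value in add_climb_terms(record, selected):
    print(field_tag, value)
```

Keeping the harvested terms under their own label leaves the cataloger-assigned 650 headings untouched, so the two sources of subject access stay distinguishable.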
Squeezing Metadata out of Scholarly Texts • Image collection • Associated text • Target object identification (TOI) • CLiMB ToolKit
The CLiMB ToolKit • Software prototype • For large image collections • Semi-automated metadata • Subject access terms • Human intervention at all steps • Iterative development cycle
The CLiMB ToolKit • Web Browser • Help Menus • Projects A Graphical User Interface (GUI)
CLiMB TOOLKIT: Process Flow 1. Load Text 2. Load TOI List 3. Analyze Text 4. Select Subject Access Terms 5. Review
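One way to picture the five steps as a pipeline is the hedged sketch below. The function `run_toolkit_flow` and its callable parameters `analyze`, `select`, and `review` are hypothetical; in the ToolKit itself, selection and review are done by a person through the browser interface rather than by code.

```python
def run_toolkit_flow(text, toi_list, analyze, select, review):
    """Run the five process-flow steps in order and return approved terms."""
    # Steps 1 and 2: Load Text and Load TOI List (already in memory here).
    document = text.strip()
    targets = [name.strip() for name in toi_list if name.strip()]
    # Step 3: Analyze Text, producing candidate terms for each target object.
    candidates = {target: analyze(document, target) for target in targets}
    # Step 4: Select Subject Access Terms (a human choice in the real tool).
    selected = {target: select(target, terms) for target, terms in candidates.items()}
    # Step 5: Review, keeping only the targets whose selection is approved.
    return {target: terms for target, terms in selected.items() if review(target, terms)}


# Stand-in callables, purely for demonstration.
result = run_toolkit_flow(
    "Cebolla Church is an adobe building with one window.",
    ["Cebolla Church", " ", "Unknown Work"],
    analyze=lambda doc, target: ["adobe", "window"] if target.lower() in doc.lower() else [],
    select=lambda target, terms: terms[:1],
    review=lambda target, terms: bool(terms),
)
print(result)
```

Passing the selection and review steps in as callables keeps the human decisions separate from the automated analysis.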
CLiMB DocViewer http://www1.cs.columbia.edu/~delson/cni/
Thank you! Any further questions? www.columbia.edu/cu/cria/climb