The Computational Linguistics for Metadata Building (CLiMB) research project produced a Cataloger’s Toolkit for enhancing subject access to digital image collections. CLiMB-2, under the leadership of Judith Klavans, was funded by the Mellon Foundation from 2005 to 2008. CLiMB-1, also funded by the Mellon Foundation, was based at Columbia University under Dr. Klavans' direction from 2002 to 2004.
Addressing the pervasive problem of inadequate subject indexing for electronic images, the CLiMB Toolkit applies computational linguistic techniques to mine scholarly texts for metadata terms. Part-of-speech taggers, noun-phrase finders, and disambiguation techniques are combined to identify potential subject terms and match them to normalized terms in the Getty Vocabularies.
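The pipeline described above can be illustrated with a minimal sketch. This is not the CLiMB Toolkit's code: the tiny part-of-speech lexicon, the sample vocabulary, and the function names (`tag`, `noun_phrases`, `normalize`, `match_terms`) are all hypothetical stand-ins invented for this example. A real system would use a trained tagger and the actual Getty Vocabularies. The disambiguation step is omitted here for brevity.

```python
import re

# Hypothetical toy lexicon; a real part-of-speech tagger would be trained on a corpus.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "gothic": "ADJ", "stained": "ADJ", "ribbed": "ADJ",
    "cathedral": "NOUN", "glass": "NOUN", "windows": "NOUN",
    "vaults": "NOUN", "nave": "NOUN",
    "features": "VERB", "and": "CONJ", "above": "PREP",
}

# Hypothetical controlled vocabulary: lookup key -> preferred term.
# It stands in for a thesaurus and is not real vocabulary data.
VOCABULARY = {
    "cathedral": "cathedrals",
    "stained glass": "stained glass",
    "nave": "naves",
    "vault": "vaults",
}


def tag(text):
    """Lowercase and tokenize the text, then tag each token ("X" if unknown)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, POS_LEXICON.get(tok, "X")) for tok in tokens]


def noun_phrases(tagged):
    """Collect maximal runs of adjectives and nouns that end on a noun."""
    phrases, current = [], []
    # The sentinel tuple forces a final flush of any pending phrase.
    for token, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "NOUN"):
            current.append((token, pos))
            continue
        while current and current[-1][1] != "NOUN":
            current.pop()  # drop trailing adjectives
        if current:
            phrases.append([tok for tok, _ in current])
        current = []
    return phrases


def normalize(words):
    """Naively singularize the last word of a phrase (illustrative only)."""
    head = words[-1]
    if head.endswith("s") and not head.endswith("ss"):
        head = head[:-1]
    return " ".join(words[:-1] + [head])


def match_terms(text):
    """Return preferred vocabulary terms found in the text's noun phrases."""
    matches = []
    for phrase in noun_phrases(tag(text)):
        n = len(phrase)
        # Try every contiguous sub-span of the phrase, longest first.
        for length in range(n, 0, -1):
            for start in range(n - length + 1):
                span = phrase[start:start + length]
                for key in (" ".join(span), normalize(span)):
                    term = VOCABULARY.get(key)
                    if term and term not in matches:
                        matches.append(term)
    return matches


sentence = "The Gothic cathedral features stained glass windows and ribbed vaults above the nave."
print(match_terms(sentence))
```

In this sketch, candidate phrases such as "stained glass windows" are matched against the vocabulary both as written and in a crudely singularized form, so that a plural in the text can still reach a vocabulary entry keyed on the singular.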
For a full description of the CLiMB Toolkit’s functionality and the evaluations of cataloger satisfaction, please see the Final Report.
Through related research initiatives, the CLiMB Toolkit continues to undergo refinements and experiments with extended functionality. The National Digital Information Infrastructure and Preservation Program (NDIIPP), through the University of Illinois, Urbana-Champaign and OCLC, is integrating the CLiMB Toolkit with a Named Entity Recognizer to improve consistency across collective catalog records. The Institute of Museum and Library Services is currently funding a research and development project that will integrate the CLiMB Toolkit with the social tagging applications developed under the Steve.museum project and the trust inferencing techniques implemented under the FilmTrust project. More information on these cross-disciplinary, integrated technologies will be posted on this page as it becomes available.