Authors
Corinne Jörgensen, Alejandro Jaimes, Ana B. Benitez, Shih‐Fu Chang
Abstract
This article presents exploratory research evaluating a conceptual structure for describing the visual content of images. The structure, developed from empirical research in several fields (e.g., Computer Science, Psychology, and Information Studies), classifies visual attributes into a “Pyramid” containing four syntactic levels (type/technique, global distribution, local structure, composition) and six semantic levels (generic, specific, and abstract levels of both object and scene). Several experiments address the Pyramid's ability to support three tasks: (1) classifying terms describing image attributes generated in a formal and an informal description task, (2) classifying terms that result from a structured approach to indexing, and (3) guiding the indexing process. Descriptions generated by naive users and by indexers are used in experiments on two image collections: a random Web sample and a set of news images. To test descriptions generated in a structured setting, an Image Indexing Template (developed independently over several years of this project by one of the authors) was also used. The experiments suggest that the Pyramid is conceptually robust (i.e., it can accommodate a full range of attributes) and that it can be used to organize visual content for retrieval, to guide the indexing process, and to classify descriptions obtained manually and automatically.
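As a reading aid, the ten levels named in the abstract can be written down as a small data structure. The following is only a minimal sketch, not the authors' implementation: the names `PyramidLevel`, `levels_of`, and `tally_by_level` are hypothetical, the ordering of the semantic levels is an assumption, and the example terms and their level assignments are invented for illustration.

```python
from collections import Counter
from enum import Enum


class PyramidLevel(Enum):
    """The ten levels named in the abstract, stored as (kind, label)."""

    # Four syntactic levels
    TYPE_TECHNIQUE = ("syntactic", "type/technique")
    GLOBAL_DISTRIBUTION = ("syntactic", "global distribution")
    LOCAL_STRUCTURE = ("syntactic", "local structure")
    COMPOSITION = ("syntactic", "composition")
    # Six semantic levels (ordering here is an assumption)
    GENERIC_OBJECT = ("semantic", "generic object")
    GENERIC_SCENE = ("semantic", "generic scene")
    SPECIFIC_OBJECT = ("semantic", "specific object")
    SPECIFIC_SCENE = ("semantic", "specific scene")
    ABSTRACT_OBJECT = ("semantic", "abstract object")
    ABSTRACT_SCENE = ("semantic", "abstract scene")

    @property
    def kind(self):
        return self.value[0]

    @property
    def label(self):
        return self.value[1]


def levels_of(kind):
    """Return all levels of one kind: 'syntactic' or 'semantic'."""
    return [level for level in PyramidLevel if level.kind == kind]


def tally_by_level(assignments):
    """Count how many description terms were assigned to each level."""
    return Counter(level for _term, level in assignments)


# Hypothetical term-to-level assignments, invented for illustration only.
example = [
    ("black and white photograph", PyramidLevel.TYPE_TECHNIQUE),
    ("woman", PyramidLevel.GENERIC_OBJECT),
    ("street", PyramidLevel.GENERIC_SCENE),
    ("sadness", PyramidLevel.ABSTRACT_SCENE),
    ("man", PyramidLevel.GENERIC_OBJECT),
]
counts = tally_by_level(example)
```

A tally like `counts` is one simple way to see which levels a set of descriptions populates, which is the kind of question the classification experiments ask.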