Sunday, August 8, 2010

Controlled Vocabularies and Representation

An information space is defined by a set or collection of documents. A list of terms is built to represent the information elements in that space so that the terms, either alone or in combination, can represent every document in the set. This list is known as the controlled vocabulary. A short selection of its terms can serve as a surrogate for each document. The list is assumed to convey an accurate representation not only of the information space but also of the particular properties of the document set, which implies that a finite list of descriptors can represent all the information in the documents, or at least the most relevant information. In addition, the representation has to capture accurately the purpose, scope, audience, and level of expertise found in the documents.
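
To make the idea of a surrogate concrete, here is a minimal sketch in Python. The vocabulary, the document texts, and the function name build_surrogate are all hypothetical, invented only for illustration. Matching whole words against the list is a simplifying assumption; real indexing usually relies on human judgment or more careful text processing.

```python
# Hypothetical controlled vocabulary: the only terms allowed as descriptors.
CONTROLLED_VOCABULARY = {"indexing", "metadata", "retrieval", "taxonomy"}

# Hypothetical document collection (the "information space").
documents = {
    "doc1": "a taxonomy helps indexing and retrieval of records",
    "doc2": "metadata schemas support retrieval",
}


def build_surrogate(text, vocabulary):
    """Return the sorted vocabulary terms that occur as whole words in the text."""
    words = set(text.lower().split())
    return sorted(words & vocabulary)


# Each document is now represented by a short list of controlled terms.
surrogates = {
    doc_id: build_surrogate(text, CONTROLLED_VOCABULARY)
    for doc_id, text in documents.items()
}

for doc_id, terms in surrogates.items():
    print(doc_id, terms)
```

Anything in a document that has no matching term in the list simply disappears from its surrogate, which is exactly the assumption the controlled vocabulary asks us to accept.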

The parallel with the use of letters is clear. After all, words and their myriad meanings are the result of combining a finite number of letters. Likewise, a finite number of carefully selected terms can represent the universe of ideas and concepts in the document collection. This powerful argument lies behind the creation of keyword lists as the representational building blocks of complex information concepts.

But is it true that words can represent everything?
