Zen of Information: Document description and categories

Categorizing information, and using categories as surrogates for objects, dates back to early history. Documents of one type, such as bills of sale, would be kept together in one location, separate from documents of other types, such as cargo inventories. This principle can still be seen in libraries all over the world. Organizing documents into categories facilitates access to them.

As the number and variety of documents grew, the categories also grew in number and complexity. Creating additional categories and subcategories made sense, and in turn increased the descriptive specificity of each scheme. Categories had always been considered access points to documents, but the increased specificity of the new schemes changed the perception of what their role could be. In time, subcategories went from being access or entry points that described what documents were about (their aboutness) to being used as representations of the information in the documents.

If documents are analogous to capsules of information, then information is inside documents. Categories and subcategories were useful for describing the capsules, that is, the documents. Today, their use has expanded to describing the capsules' contents. The problem is that these representations cannot capture all of the information in the document set, only a limited part of it.

Users of such systems never know what information has been left out of the set of surrogate representations.

This problem is a fundamental failure of document retrieval systems that use this type of representation alone. If some information is not represented, it will not be found.
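
The failure can be illustrated with a minimal sketch in Python. Everything in it is hypothetical: the two documents, their category labels, and the function names search_by_category and search_full_text are invented for this example and do not describe any particular retrieval system. The sketch only compares a lookup that sees category labels with one that sees the documents' contents.

```python
# Hypothetical documents; the texts and category labels are invented for illustration.
documents = {
    "doc1": {
        "categories": {"bills of sale"},
        "text": "Sale of forty amphorae of olive oil, delivered by the ship Delphin.",
    },
    "doc2": {
        "categories": {"cargo inventories"},
        "text": "Cargo of the ship Delphin: grain, timber, and olive oil.",
    },
}


def search_by_category(term):
    """Return ids of documents whose category labels contain the term."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if any(term in label.lower() for label in doc["categories"])
    )


def search_full_text(term):
    """Return ids of documents whose text contains the term."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if term in doc["text"].lower()
    )


print(search_by_category("cargo"))      # ['doc2']
print(search_by_category("olive oil"))  # []
print(search_full_text("olive oil"))    # ['doc1', 'doc2']
```

Both documents mention olive oil, but neither category label does, so the category-only search returns nothing and gives the user no hint that anything was missed.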

Monday, August 9, 2010