This content is licensed under the CC 4.0 BY-SA license.

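This article examines key terms in information organization, such as taxonomies, directories, and clustering, and explains how they apply to information retrieval. It also discusses the challenges of automation and the importance of human editors.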

http://www.searchtools.com/info/classifiers.html

Taxonomies, Categorization, Classification, Categories, and Directories for Searching

The terms taxonomy, ontology, directory, cataloging, categorization and classification are often confused and used interchangeably. All of them are ways of organizing information (or things or animals) into categories.

A number of applications can help people create taxonomies and place information objects within their categories, though the degree of automation varies. Some programs simply allow anyone to manually add a URL to a specific category by submitting a site. Others allow human catalogers to create sophisticated rules that specify the words and phrases which will place a page in a category. Others accept a "training set" within an existing taxonomy, and place new documents in categories based on their similarity to the examples. Still others attempt to automate the entire process, grouping pages into topics based on programmatic evaluation of the contents.
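The rule-based approach described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the `RULES` table, its category names and phrases, and the `categorize` function are all hypothetical.

```python
import re

# Hypothetical rules written by a human cataloger:
# each category maps to words or phrases that place a page in it.
RULES = {
    "Shopping/Clothing/Footwear/Athletic": ["running shoes", "cross trainer", "sneakers"],
    "Health/Sports Medicine": ["injury", "physical therapy", "sprain"],
}

def categorize(text, rules=RULES):
    """Return every category with at least one matching phrase, sorted by name."""
    # Lowercase and collapse whitespace so phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    matches = []
    for category, phrases in rules.items():
        # \b keeps "injury" from matching inside a longer word.
        if any(re.search(r"\b" + re.escape(p) + r"\b", normalized) for p in phrases):
            matches.append(category)
    return sorted(matches)
```

A page can land in more than one category, and a page that matches no rule gets an empty list, which is a natural point to hand it to a human editor.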

When evaluating these applications, remember that they are simply software. No matter how elegant the algorithms, a computer program cannot truly understand the concepts in a page as a human can, and it will sometimes place pages in the wrong categories. For example, one highly automated system had an "Arts and Humanities" category which included links to an Internet services consulting company and a singer-songwriter's personal home page (along with many more appropriate pages). To serve your site or intranet users, plan for a significant amount of human cataloging and editing.

Glossary and Definitions

A directory is an organized set of links, like those on Yahoo or the Open Directory Project, which allows a web site to display the scope and focus of its content. A directory can cover a single host, a large multi-server site, an intranet or the Web. At each level, the category names provide instant context information to users. Unlike a simple list, such as the results of a search, a directory lets users drill down into more and more specific categories (for example Shopping > Clothing > Footwear > Athletic), and that path explains how the pages fit into the larger set of information.
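One way to picture a directory is as a tree of categories in which each node knows its parent, so the drill-down path can be rebuilt for any page. The sketch below is an assumption for illustration only; the `Category` class, the `add_link` helper, and the example URL are hypothetical names, not part of any directory product.

```python
class Category:
    """A node in a hypothetical directory tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # child name -> Category
        self.links = []      # URLs filed directly under this category

    def child(self, name):
        """Return the named subcategory, creating it on first use."""
        if name not in self.children:
            self.children[name] = Category(name, parent=self)
        return self.children[name]

    def breadcrumb(self):
        """Build the path from the top level down to this category."""
        names = []
        node = self
        # Walk upward, stopping before the unnamed root.
        while node.parent is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))

def add_link(root, path, url):
    """File a URL under the category reached by following path from root."""
    node = root
    for name in path:
        node = node.child(name)
    node.links.append(url)
    return node

root = Category("root")
athletic = add_link(root, ["Shopping", "Clothing", "Footwear", "Athletic"],
                    "http://example.com/cross-trainers")
print(athletic.breadcrumb())  # Shopping > Clothing > Footwear > Athletic
```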

Categorization is the process of associating a document with one or more subject categories. For example, the entry for a page on cross trainer shoes could go into Running, Manufacturing, Sports Medicine, or Rushkoff, Douglas! All of these are legitimate, depending on the context.

Cataloging and Classification come from libraries, where specialists enter the metadata (such as author, date, title and edition) for a document, apply subject categories to it, and place it into a class (such as a call number) for later retrieval. Both terms tend to be used interchangeably with Categorization.

Clustering is the process of grouping documents based on similarity of words, or the concepts in the documents as interpreted by an analytical engine. These engines use complex algorithms including Natural Language Processing, Latent Semantic Analysis, Bayesian statistical analysis, and so on.
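To make the idea of grouping by word similarity concrete, here is a deliberately simple sketch. It does not use Natural Language Processing, Latent Semantic Analysis, or Bayesian methods; instead it swaps in plain word counts, cosine similarity, and a greedy single pass. The function names and the `threshold` value are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Turn a document into a bag of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy clustering: each document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster.
    Returns lists of document indices."""
    vectors = [word_counts(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [
    "running shoes for trail running",
    "trail running shoes review",
    "guitar chords for folk songs",
    "folk songs and guitar tabs",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Even this toy version shows why automated grouping can misfire: it sees only shared words, so two pages about unrelated topics that happen to use the same vocabulary would end up together.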

A Thesaurus is a set of related terms describing a set of documents. It is not primarily a hierarchy: it defines the standard terms for concepts in a controlled vocabulary. Thesauri include synonyms and more complex relationships, such as broader or narrower terms, related terms and other forms of words.
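A thesaurus entry can be modeled as a preferred term plus its relationships. The following is a minimal sketch under assumed names (`Term`, `Thesaurus`, `normalize`); real thesaurus standards define more relationship types than are shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One entry in a hypothetical controlled vocabulary."""
    preferred: str
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)

class Thesaurus:
    def __init__(self):
        self.terms = {}    # preferred term -> Term
        self.lookup = {}   # any known form (lowercase) -> preferred term

    def add(self, preferred, synonyms=(), broader=(), narrower=(), related=()):
        """Register a preferred term together with its relationships."""
        term = Term(preferred, set(synonyms), set(broader), set(narrower), set(related))
        self.terms[preferred] = term
        for form in (preferred, *synonyms):
            self.lookup[form.lower()] = preferred
        return term

    def normalize(self, word):
        """Map a word to its standard term, or None if it is not in the vocabulary."""
        return self.lookup.get(word.lower())

thesaurus = Thesaurus()
thesaurus.add("Footwear", synonyms=["shoes", "sneakers"],
              broader=["Clothing"], narrower=["Athletic footwear"])
print(thesaurus.normalize("Sneakers"))  # Footwear
```

Mapping every variant to one preferred term is what lets a cataloger, or a search engine, treat "shoes" and "sneakers" as the same concept.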

Taxonomy is the organization of a particular set of information for a particular purpose. It comes from biology, where it's used to define the single location for a species within a complex hierarchy. Biologists have arguments about where various species belong, although DNA analysis can resolve most of the questions. In informational taxonomies, by contrast, items can fit into several taxonomic categories.

Ontology is the study of the categories of things within a domain. It comes from philosophy and provides a logical framework for academic research on knowledge representation. Work on ontologies involves schemas and diagrams that show relationships, such as Venn diagrams, trees, lattices and so on.