Knowledge Management Briefs for Your Business
LWM Technology Services
LWM Technology Services of Harvard, Mass. publishes and distributes this bimonthly newsletter by e-mail to individuals and organizations having an interest in any aspect of Knowledge Management processes. Each issue presents a mini-report on a KM topic and links to interesting reading on the featured topic.
See KMPro, Knowledge and Innovation Management Professional Society Newsletter for recent activities
Why do You Need a Taxonomy Anyway? And How to Get Started
Have you ever typed a company's or person's name into a search engine, only to see results along with a suggestion that you might be looking for something else? You may be relieved that you don't have to try a dozen variants to get at what you were seeking. On the other hand, you may want exactly the version you typed, and the suggestions are just an annoyance. Or you may find that automatic cross-references exist only selectively, so you can't depend on all synonyms being included. There are times when the ability to browse through a list would be welcome. Perhaps you prefer the option of picking a letter and seeing all the terms that begin with it, as in the phone book's index of last names, where you can browse the list to find the person whose first name you have forgotten. When the phone book shows you the variant spellings of a name like Crouse (Kraus, Krause, or Krauss), that helps too, particularly if you have only heard the name spoken. Cross-references like these are a helpful tool when searching.
Taxonomies are really just specialized indexes of terms that guide you to the information you seek. What distinguishes a taxonomy from an index like the phone book, or the index in the back of a book, is the control it places on which terms are permitted, coupled with the possibility of relationships among the terms in the list. Expanding opportunities for searching on the Web prompt me to share some information about why the concept of taxonomy is important. Librarians have long used taxonomic lists to define how information content is organized, making it easier to retrieve.
In this article, I will define several types of taxonomies, why they are built, and when and where it would be useful for you to be savvy about their existence. I will also address a few basic ideas about how knowledge managers should approach building, maintenance, and application of taxonomy. References to Web sites and articles to expand your understanding are included.
There are many terms that express concepts similar to or related to taxonomy. I may use some of these in the article, or you may encounter other writings that seem to be talking about similar concepts. So that you can understand how I use the terminology in this article, here are some very simple definitions that I use, backed up by the unabridged Random House Dictionary of the English Language:
Philosophers, linguists, and semantic theoreticians give much more depth to the importance and use of the terms listed, plus their many offshoots. For your interest, here are two authoritative Web sites that give more information.
http://www.geocities.com/pribond/bioinfo/glossary/information.htm
http://www.jelem.com [for Jessica Milstead thesaurus expert]
Why Build a Taxonomy?
In the knowledge management field, this is best answered in relation to the context of some body of content that you want to make searchable. For example, to organize a collection of documents produced by a working group in an organization (e.g. technical call center reports, or scientific research), it would be best to classify them uniformly to facilitate retrieval. Otherwise, if content can be retrieved only by the language of the author, or by any possible synonymous term, retrieval becomes lengthier and more problematic. Building a list of terms that controls the way the documents will be classified or indexed is the first step to bringing efficiency to searching. The net content of these documents will help determine the uniform terminology to be selected and, simultaneously, where cross-references belong. If five authors in a telecommunications company use the term cell phone and one uses wireless phone to mean the same thing, the first term is the obvious choice based on frequency of use. It is also preferable because wireless phone may also refer to a cordless phone, which is a different type of device in the industry. This suggests the need for the taxonomist to have semantic competency in the specialty the taxonomy will cover.
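The cell phone example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the author names, the `author_usage` data, and the `choose_preferred` helper are all hypothetical, and frequency of use is treated as the sole selection rule.

```python
from collections import Counter

# Hypothetical sample: the term each author uses for the same concept.
author_usage = {
    "author_1": "cell phone",
    "author_2": "cell phone",
    "author_3": "cell phone",
    "author_4": "cell phone",
    "author_5": "cell phone",
    "author_6": "wireless phone",
}


def choose_preferred(usage):
    """Return (preferred term, cross-references) based on frequency of use."""
    counts = Counter(usage.values())
    preferred, _ = counts.most_common(1)[0]
    # Every other variant becomes a "USE" reference pointing at the preferred term.
    cross_refs = {term: preferred for term in counts if term != preferred}
    return preferred, cross_refs


preferred, cross_refs = choose_preferred(author_usage)
print("Preferred term:", preferred)
for variant, target in cross_refs.items():
    print(f"{variant} -> USE {target}")
```

A count like this is only a first pass. It cannot tell that wireless phone might also mean a cordless phone, which is why the taxonomist's knowledge of the specialty still has to confirm each choice.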
Deciding which term to use instead of another also depends on another form of context, that of the population that will be doing the searching. In building the taxonomy, the professional must think about the context in which the indexable content originates, as well as the target searching audience. Trying to help the searcher find material more efficiently and reliably implies a need for a business productivity improvement or outcome. The "why" must be defined partially by understanding the audience of searchers. If the audience is internal knowledge users, you may justify the effort based on time saved, better quality of work due to accurately finding all relevant materials, or even elimination of duplicated work, by ensuring that a researcher finds the answer to large or small problems without repeating work already documented by others. However, if the audience is external (e.g. customers), the justification may be the ability to scale back your total support staff over time by providing an easy-to-use and reliable self-help knowledgebase for common support questions, thus decreasing the need for human intervention.
How is the taxonomy going to be applied?
Building a taxonomy is not useful on its own. It must be used and applied consistently to be reliable for searchers. If it does not consistently return all the documents on a particular topic, searchers will quickly lose confidence in the tool and will again waste time looking for alternative ways to ensure that their search is comprehensive. Here are some fundamental questions to consider about application before you expend effort to build.
Novice searchers may be using a taxonomy list without knowing it. When they go to Yahoo, the home page gives a list of topics under Web Site Directory; these topics define the top tier of the taxonomic classification tree. If you click "Health," you will see a new list of about 50 topics, each with a count showing the number of citations for that topic. If you choose "Education," with just 60 citations, the content list appears immediately, along with six subcategories for narrowing. If you choose "All Diseases and Conditions," with over 10,000 citations, an alphabet appears for browsing more subcategories; in addition to the alphabet, about 50 general subcategories are presented. To bypass the subcategories, you can click again to force the selection of the entire topic of over 10,000 disease and condition resources. This is an example of a hierarchical, browsable taxonomy that is revealed in stages for the benefit of the searching audience. Through the category tree you will find information about high blood pressure under hypertension. Alternatively, to be directed to this term automatically, you can do a keyword search for "blood pressure," which leads you to the same content as that classified under hypertension, plus other material on blood pressure measuring devices.
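To make the idea of a category tree with cross-references concrete, here is a minimal Python sketch. The tree is a hypothetical miniature loosely modeled on the categories named above, not a copy of any real directory, and the `see_references`, `find_path`, and `lookup` names are assumptions made for illustration.

```python
# Hypothetical miniature of a hierarchical taxonomy (illustrative only).
taxonomy = {
    "Health": {
        "Education": {},
        "All Diseases and Conditions": {
            "Hypertension": {},
        },
    },
}

# Cross-references map a term the searcher might type to the preferred term.
see_references = {"high blood pressure": "Hypertension"}


def find_path(tree, term, path=()):
    """Return the tuple of category names leading to term, or None if absent."""
    for name, children in tree.items():
        current = path + (name,)
        if name.lower() == term.lower():
            return current
        found = find_path(children, term, current)
        if found:
            return found
    return None


def lookup(term):
    """Resolve a cross-reference first, then locate the preferred term in the tree."""
    preferred = see_references.get(term.lower(), term)
    return find_path(taxonomy, preferred)


print(" > ".join(lookup("high blood pressure")))
```

The searcher who types the everyday phrase is led to the same place in the tree as the one who browses down from the top, which is the point of maintaining cross-references alongside the hierarchy.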
Methods of Building: Human vs. automated
There are numerous software application products and commercial search engines that index a body of electronic content and, through proprietary technology, parse the content into taxonomic entries and then give the terms some form of relationship structure. Yahoo and Alta Vista are two that attempt to cover the entire corpus of World Wide Web content. Other products are intended for a much smaller domain, such as a corporation or academic institution. I have already suggested some reasons for building a taxonomy, advising you to be aware of the context of both the content and the searching audience. Context is also important when thinking about the technology that will be used to build the taxonomy and whether substantive human intervention is appropriate and supported.
Top commercial search engines use automatic processing to determine the index terms for the content, while also employing numerous professional thesaurus developers to refine and normalize the language to a consistent and reliable standard. This is a hybrid solution. Even if you employ a software application designed for your local body of internal content, you will need the option to change and refine terminology, to add new terms not appearing explicitly in the content, and to remove language inappropriate for your industry. You will also want to define the rules for when a term is used to index material. I once reviewed the automatic indexing of thousands of documents on project management. Needless to say, the term "project management" was applied to every document. In the context of the organization and the searchers, this was totally inappropriate, because a term attached to every document does nothing to distinguish one from another.
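One possible rule for when a term should not be used to index material is sketched below in Python. It flags any automatically assigned term that appears on more than a chosen share of the documents. The sample data, the function names, and the 0.8 threshold are all assumptions for illustration; a real rule would be set by someone who knows the collection and its searchers.

```python
# Hypothetical auto-assigned index terms per document (illustrative data only).
auto_index = {
    "doc1": {"project management", "scheduling"},
    "doc2": {"project management", "risk assessment"},
    "doc3": {"project management", "scheduling", "budgeting"},
    "doc4": {"project management", "budgeting"},
}


def overused_terms(index, max_share=0.8):
    """Return terms assigned to more than max_share of the documents."""
    total = len(index)
    counts = {}
    for terms in index.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term for term, n in counts.items() if n / total > max_share}


def apply_rule(index, max_share=0.8):
    """Drop overused terms, since they no longer distinguish one document from another."""
    drop = overused_terms(index, max_share)
    return {doc: terms - drop for doc, terms in index.items()}


print(sorted(overused_terms(auto_index)))
print(apply_rule(auto_index)["doc1"])
```

In this sample only "project management" is flagged, leaving the narrower terms that actually help a searcher tell the documents apart.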
Scope of content must be analyzed for quantity and diversity. A collection of hundreds, or even a few thousand, documents would benefit little from a totally automated process. With a small amount of content, human intervention is necessary to define subtleties and nuances, which will enable indexers to organize the material for very precise retrieval. If a small and highly specialized collection has high value because of the unique knowledge it represents, it deserves skilled human interpretation by people with expertise in the environment of its creators and potential searchers. It may be that no automatic processing will be needed until the volume becomes excessive for human indexers. By that point, the foundation taxonomy should already be defined, ready for further expansion and enhancement.
Availability of human and technological resources must always be considered. Cost is important. Going back to the reasons for building a taxonomy, you must reexamine your business justifications. Business reasons can justify both the human and technology resources you need to do a credible job. Building a taxonomy is clearly a case where there is little advantage in attempting the project without an expert to execute it, or without high-quality searching to put it to use.
What categories of terms are candidates for a taxonomy?
Most of us think of taxonomy in terms of topical categories, but there are many other ways that people in organizations seek content. For example, I may remember talking to a prospect in New Jersey last year. If the Date of all contact calls and the State are both indexed, my search is easy. At most I will need to scan a few hundred entries. While some of the ideas on this list are candidates for simple index entries (not taxonomy controlled), others clearly need vocabulary controls; many also require cross-references. When considering the following list, think about your audience, how they look for things, and the types of questions they ask.
Where do you start?
Starting is the difficult part, but experts recommend beginning small to get a feel for the methodology and effort needed. If you are developing a taxonomy for an enterprise, you may want to begin working with one group, project or product line, which may eventually evolve into a more substantive organization-wide effort. I find it useful to look at the organization chart, job titles and job descriptions of content producers to get a feel for the scope of content that the vocabulary must cover.
Next, look at the departments in your organization, both from the perspective of what they might produce in the way of content and what they might be researching. Gather lists of all products and product components your organization researches, designs, builds, manufactures or sells. If you are part of a service organization, do the same for the services you offer.
To understand the terminology that you have begun to accumulate, seek understanding of the context, and get a sense of the major topics around which you might group the important language in the taxonomy. Finally, seek an understanding of how the language used in one part of the organization might be related to other groups. For example, in a drug company, scientists doing pure research will be steeped in compound names, but in production, manufacturing and sales, the compounds will have become product names.
Having done this internal research, you now have a feel for the subject matter and disciplines represented by content producers and content users. One of my favorite reference tools is Gale's Encyclopedia of Associations. Use it to find professional associations that publish glossaries and thesauri and maintain Web sites. These will be valuable resources for verifying and adding to the taxonomy you plan to build. Collect term list publications from associations that match your content audiences' interests. If a list is relatively small, you may want to explore licensing an electronic version as a starting point for your own taxonomy. If it is more than a couple of thousand terms, you will spend entirely too much time stripping out terms to make it worthwhile to use a published list. It is far easier to add terms to a taxonomy when you begin to see the need for narrower or more specific concepts.
Finally, review the bibliography at the end of this article for other writing that will point you to products you might want to use for building a taxonomy, categorization, and search. Milstead's (JELEM) Web site, cited earlier, has a list of products for building thesauri to consider.
Ongoing maintenance
You need a commitment to ongoing maintenance because two types of change are inevitable:
New terms come in and others diminish in importance in the world of commerce and within organizations. The authoritativeness and worth of taxonomic content depend on vigilance, care and feeding. A program of ongoing edits, additions and modifications is crucial to build the trust of your audience in the reliability of search. This means that someone must be in charge - an authoritarian, if you will. It should be part of someone's regular job, a person with knowledge of the industry, good communication practices for gathering user knowledge, and attention to the details and nuances of language. A person trained in information science and indexing is often a fine candidate for this position.
Another suggestion is that the authoritarian, or a person involved in Web site content management, invest time in routinely evaluating logs of searches to build and sustain a sense of what people seek and what they do or do not find. It is also good to have a contact link, a kind of search assistant, clearly sited on each search page to enable transmission of both suggestions and search frustrations. Both of these tools will be invaluable to the authoritarian in maintaining the taxonomy and adding cross-references to aid searchers. It is also useful to be able to communicate with users who have a point of view about what would be helpful to them.
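As a rough illustration of what evaluating search logs might look like, the Python sketch below tallies the queries that returned nothing. The log format (a query string paired with a result count), the sample entries, and the `failed_queries` helper are all hypothetical; real search tools record their logs in their own formats.

```python
from collections import Counter

# Hypothetical search log: (query as typed, number of results returned).
search_log = [
    ("cell phone", 42),
    ("mobile phone", 0),
    ("Mobile Phone", 0),
    ("wireless phone", 3),
    ("handset", 0),
]


def failed_queries(log):
    """Count zero-result queries, normalized to lower case, most frequent first."""
    misses = Counter(query.strip().lower() for query, hits in log if hits == 0)
    return misses.most_common()


for query, count in failed_queries(search_log):
    print(f"{count}x no results: {query}")
```

Each term on such a list is a candidate for a new cross-reference or a new entry, subject to the judgment of whoever maintains the taxonomy.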
SUMMARY
Taxonomy building, support and maintenance is perhaps one of the most intellectually challenging tasks related to searching infrastructure. It requires a committed person or team if it is to be of any value. Basically, it comes down to a belief in advance effort to ensure that when information is needed it can be found in the least amount of time, and with search results that are comprehensive and accurate. The expense is usually justified because the labor to build and maintain such a system is small compared to the number and level of people whose productivity will benefit. Couple the added productivity of hundreds of employees with the avoidance of repeating work already done elsewhere, and you have a winning reason for doing it.
- Lynda W. Moulton ©2003
LWM Technology Services
Related Readings of Interest
Hapgood, Fred. Skills. Sleuthing Out Data; Categorization software helps search-tool users find what they seek. CIO 05/01/2003, 3p. http://www.cio.com/archive/050103/et_article.html
McCloskey, Paul. Knowledge Management Building Blocks, Tech Briefing. FCW.COM 04/14/2003, 4p. http://www.fcw.com/vendorsolutions/km/spec-building-04-14-03.asp
Milstead, Jessica. NISO Z39.19: Standard for Structure and Organization of Information Retrieval Thesauri. JELEM, Jessica L. Milstead ©1998, 9p. http://research.calacademy.org/taf/proceedings/milsteadtaf.html
Rao, Madanmohan. A Decade of KM; a Report on "Real-World Best Practices" from American Productivity and Quality Center's 8th KM Conference. Destination KM 06/11/2003, 3p. http://www.destinationkm.com/articles/default.asp?ArticleID=1065
Turocy, Pat. No More Information Overload; Companies must consider how they classify data so employees can find it fast. Information Week 12/16/2002, 2p. http://www.informationweek.com/story/IWK20021212S0007/
Warner, Amy. A Taxonomy Primer. Lexonomy ©2002, 6p. http://www.lexonomy.com/publications/aTaxonomyPrimer.html