HighWire

ASBMB News, March 2003
Concept Search: Cutting Large Keyword Searches down to Size

[In early 2002, ASBMB Today introduced the new "portal" site from Stanford's HighWire Press, which allows you to search all of Medline plus 350 journals' full-text at once -- including the JBC, of course! We began a monthly series of short articles highlighting tools or features of this new site for researchers' sore eyes. The new site is at http://highwire.stanford.edu ]

Searching for articles that cover several topics in combination is one of the hardest things to do in most keyword-search systems. And when a keyword turns out to describe astronomical features as well as biological ones (e.g., "mercury"), you would like to keep only the portion of your result that has to do with your topic. Both tasks just got easier with the HighWire Portal's new concept search feature, called Topic Search.

How Concept Search Works: Start with a Keyword as a "Seed"

Searching for topics can be hit or miss in some systems. After all, how do you know what concepts or topics to search for? In the HighWire Portal you start a topic search with a keyword search that will find some of the articles you'd like: whenever you do a keyword search, your search results show you what topics the resulting articles are indexed under. You can then use the topics shown (individually or in combination) in a topic search with just a few clicks. You can subset your keyword search to retain only articles about certain concepts, or you can start a new search based entirely on combinations of concepts in an article.
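The seed-then-subset idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny article list, the topic names, and the functions keyword_search and topics_in_result are hypothetical and are not part of the HighWire Portal, which works through its web pages rather than through code like this.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy index: every title and topic name here is invented.
ARTICLES: List[Dict] = [
    {"title": "Mercury toxicity in renal cells",
     "topics": ["Heavy Metals", "Toxicology"]},
    {"title": "The orbit of Mercury",
     "topics": ["Planets", "Astronomy"]},
    {"title": "Mercury exposure and neurodevelopment",
     "topics": ["Heavy Metals", "Neurology"]},
]


def keyword_search(keyword: str) -> List[Dict]:
    """Return articles whose title contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [a for a in ARTICLES if needle in a["title"].lower()]


def topics_in_result(result: List[Dict]) -> Counter:
    """Tally the topics that the articles in a result are indexed under."""
    return Counter(topic for a in result for topic in a["topics"])


# Step 1: the keyword search is the "seed".
seed = keyword_search("mercury")

# Step 2: the result shows which topics its articles are filed under.
counts = topics_in_result(seed)

# Step 3: keep only the articles filed under the topic we care about.
subset = [a for a in seed if "Heavy Metals" in a["topics"]]
print([a["title"] for a in subset])
```

In this toy data the astronomy article drops out of the subset because it is not filed under the chosen topic, which is the "mercury" situation described at the start of this article.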

Example: Concept Search at Work

Suppose you are interested in ubiquitin-mediated degradation by the proteasome. You begin a concept search by planting a seed. You could do a keyword search for articles that have all the words "ubiquitin-mediated degradation by the proteasome"; not surprisingly, that search returns zero results. A keyword search for any of those words, on the other hand, finds almost 5 million items! Thanks to "relevance ranking," the top items in that result are still good ones to use as seeds in a concept search. Perhaps best of all, a simple keyword search on the word "proteasome" retrieves "only" about 12,000 items. Let's start with this last result as the seed for searching by concept.

Notice the blue, right-most column of the search result page shown here - you can narrow or widen your browser window depending on whether you want to see the topics or not. Here we see a selection of the topics that each article is filed under. We're going to check the boxes for the topics that match the concepts we're interested in: Ubiquitin, Protein Degradation, and Proteasomes.

Next, choose the options ALL topics (meaning that each article in the new result must be filed under all three checked topics) and Within Current Result (meaning that our keyword search result on "proteasome" will be narrowed to articles on the three topics we've chosen). Then click the Search button toward the top of this right-most column.

The new result is "only" 233 articles, but each of these articles has something to say about all three topics.
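One way to picture the ALL topics and Within Current Result options is as a subset test applied to each article in the current result. The Python sketch below is a minimal illustration, not the Portal's implementation: the article ids, the ARTICLE_TOPICS table, and the topic_search function are hypothetical, and the match_all=False branch assumes an "any topic" alternative that this article does not describe.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: article id -> topics the article is filed under.
ARTICLE_TOPICS: Dict[str, Set[str]] = {
    "a1": {"Ubiquitin", "Protein Degradation", "Proteasomes"},
    "a2": {"Proteasomes", "Antigen Presentation"},
    "a3": {"Ubiquitin", "Proteasomes"},
    "a4": {"Ubiquitin", "Protein Degradation", "Proteasomes", "Cell Cycle"},
}


def topic_search(current_result: Iterable[str],
                 checked_topics: Set[str],
                 match_all: bool = True) -> List[str]:
    """Narrow current_result to articles filed under the checked topics.

    match_all=True mirrors the ALL topics option: every checked topic
    must be present. match_all=False is an assumed "any topic" variant.
    """
    kept = []
    for article_id in current_result:
        topics = ARTICLE_TOPICS.get(article_id, set())
        if match_all:
            matches = checked_topics <= topics       # subset test
        else:
            matches = bool(checked_topics & topics)  # non-empty overlap
        if matches:
            kept.append(article_id)
    return kept


# "Within Current Result": start from the keyword result, not the whole index.
seed_result = ["a1", "a2", "a3", "a4"]
checked = {"Ubiquitin", "Protein Degradation", "Proteasomes"}

print(topic_search(seed_result, checked, match_all=True))   # ['a1', 'a4']
print(topic_search(seed_result, checked, match_all=False))  # ['a1', 'a2', 'a3', 'a4']
```

Requiring all checked topics is what shrinks the result so sharply: an article filed under only one or two of the three topics is dropped.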

FIGURE 1

Notice that the first article is a review article covering these topics. With the "one click" options described in an earlier article in this series, you can quickly limit the result to review articles only, or to the top-ranked HighWire-hosted articles (for which full text is likely to be online), or sort the results so that the newest articles are first. You can narrow your topic search even further by checking more topics and clicking the Search button again.

The Limitations of Concept Searching

Concept search is a good way to "find what you are missing" when you have been relying on the more traditional author and keyword searches. It pairs well with other exploration tools like Citation Map and Instant Index (both described in previous articles in this series), and MatchMaker (described in an upcoming article).

You should not use concept search as your only tool when you need to conduct an exhaustive review of a topic. While the "taxonomy" of concepts is extensive - there are almost 30,000 concepts we have indexed - and was developed and tested by working scientists and editors using a real-world scientific and medical vocabulary, the actual assignment of individual articles to specific topics is done by computer programs. (The programs analyze text in articles and extract concepts by looking for frequent phrases that match the phrases that editors have said are associated with the topics.) The computer assignment is generally very reliable, but not perfect: a few topic assignments are made that shouldn't be, and a few are not made that should be. We're sure you'll spot some of each!
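To see why automatic assignment can miss a topic, consider this simplified Python sketch of phrase-based indexing. It is an illustration only: the TOPIC_PHRASES table, the min_hits threshold, and the assign_topics function are assumptions made for the example, and the actual indexing programs are certainly more sophisticated.

```python
import re
from typing import Dict, List, Set

# Hypothetical phrase lists: topic -> phrases an editor might associate with it.
TOPIC_PHRASES: Dict[str, List[str]] = {
    "Ubiquitin": ["ubiquitin", "ubiquitination"],
    "Proteasomes": ["proteasome", "proteasomal"],
    "Protein Degradation": ["protein degradation", "proteolysis"],
}


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-word, case-insensitive occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def assign_topics(text: str, min_hits: int = 2) -> Set[str]:
    """Assign each topic whose phrases occur at least min_hits times.

    The threshold stands in for "frequent"; its value is an assumption.
    """
    assigned = set()
    for topic, phrases in TOPIC_PHRASES.items():
        hits = sum(count_phrase(text, p) for p in phrases)
        if hits >= min_hits:
            assigned.add(topic)
    return assigned


sample = (
    "The proteasome degrades proteins tagged with ubiquitin. "
    "Ubiquitin chains target substrates to the proteasome."
)
print(sorted(assign_topics(sample)))  # ['Proteasomes', 'Ubiquitin']
```

The sample text is clearly about protein degradation, yet that topic is not assigned because none of its listed phrases appears verbatim. That is the kind of missed assignment described above.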

We'd welcome suggestions for improvement from you whenever you see an error or omission. Just click the Contact button to send us a pointer to an article we should analyze further.

The 2002 and 2003 issues of ASBMB Today covered topics about the new HighWire Portal. The articles are online at

http://highwire.stanford.edu/inthepress/asbmb/index.dtl