The loud clapping you may have heard yesterday coming from the direction of San Francisco had nothing to do with us proudly supporting our favorite football team after a tough loss. It was my reaction to this interview with Judge Andrew Peck about keywords, given at last week’s LegalTech New York conference following a session titled “Are Search Terms Dead.” Not only did Judge Peck give a resounding “no” to the demise of keywords, but he also articulated the key elements that go into developing a strong and defensible keyword list.
Of course keywords aren’t dead! I often refer to the search for evidence in electronic documents as a linguistic treasure hunt. It would be absurd to argue that our knowledge and intuitions about how other humans talk about things weren’t useful tools in that hunt. But if you want keywords to work well, you have to get them—and the process around them—right.
Testing 1-2-3, testing 1-2-3…if you don’t test your keywords before settling on a final list, you WILL be in for unwelcome surprises. (By the way, I use “keyword” as shorthand to mean something more like “list of words, phrases, Boolean expressions, and proximity-restricted word sets.”) You don’t need access to a full population to do this—a genuinely representative sample of even a few thousand documents is far better than nothing, although I generally like to have at least 30,000 to play with. Here’s what you may discover:
• Keywords that don’t hit anything, or hit far fewer documents than you would intuitively expect. This may be due to something as straightforward as a typo or faulty logic in a search term, or as complicated as a realization that people don’t actually talk about your targeted subject matter in the way you predicted.
• Keywords that hit far more than you expected and/or far more than can possibly be precise or useful. This too can have several causes: overzealous wildcards or Boolean operators (AND at the document level barely restricts; OR is a broadness troublemaker, since it widens a search rather than narrowing it); too-large proximity operators; polysemy (that word that seems so obviously on-target might have three other common meanings, or might be a last name of several employees); etc. Note that you will get far better (and more easily actionable) information about the performance of your keywords if you look at results on the individual keyword level.
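To make the per-keyword idea concrete, here is a minimal Python sketch of the kind of report I have in mind. Everything in it is an assumption for illustration: the function name `keyword_hit_report`, the `broad_share` threshold, the toy sample, and the use of plain regular expressions as a stand-in for whatever search syntax your review platform actually supports.

```python
import re

def keyword_hit_report(docs, keywords, broad_share=0.6):
    """Count hits per keyword over a sample and flag suspicious results.

    docs        -- list of document texts (the test sample)
    keywords    -- dict mapping a label to a regex pattern (hypothetical format)
    broad_share -- illustrative cutoff: share of the sample above which a
                   keyword is flagged as possibly too broad
    """
    report = {}
    for label, pattern in keywords.items():
        rx = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for text in docs if rx.search(text))
        share = hits / len(docs) if docs else 0.0
        if hits == 0:
            flag = "no hits"        # typo? faulty logic? wrong vocabulary?
        elif share >= broad_share:
            flag = "too broad?"     # wildcard, OR, or polysemy problem?
        else:
            flag = "ok"
        report[label] = {"hits": hits, "share": share, "flag": flag}
    return report

# Toy sample; a real test would use a representative set of documents.
sample = [
    "Project Bluebird pricing call moved to Friday.",
    "Please send the pricing sheet to Mr. Price.",
    "Lunch menu prices for the week are attached.",
    "Bluebird rebate terms are still under discussion.",
]
keywords = {
    "bluebird": r"\bbluebird\b",
    "pric*": r"\bpric\w*",          # wildcard also catches the surname "Price"
    "rebait (typo)": r"\brebait\b",
}
report = keyword_hit_report(sample, keywords)
for label, row in report.items():
    print(label, row["hits"], row["flag"])
```

The flags are only prompts to go look at the documents; the threshold is arbitrary and should be tuned to the matter at hand.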
Furthermore, the documents hit by an initial set of keywords can be a rich source of information in the development and refinement of your keyword list as a whole. Don’t just look at numbers of hits, because that information per se means almost nothing (though it can be a red flag suggesting either of the problems noted above). Browse through the content of the documents you’re hitting, because this is where you’ll discover things like relevant jargon and code names. It’s how you’ll discover if one of your keywords is actually hitting mostly irrelevant public newsletters. It’s how you’ll start to get a better sense of which players have most zealously been discussing the topic of interest. And so forth.
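Browsing is something only a human reader can really do, but a quick script can suggest where to look first. The sketch below is one assumed approach, not a standard method: it lists words that appear in documents hit by a keyword and never in the rest of the sample, which is a crude way to surface possible jargon or code names. The function name `candidate_terms`, the `min_len` filter, and the toy documents (including the made-up code name "Kestrel") are all hypothetical.

```python
import re
from collections import Counter

def candidate_terms(docs, pattern, top_n=5, min_len=4):
    """Suggest terms that occur only in documents hit by `pattern`.

    Counts each term once per document, then keeps terms seen in hit
    documents but never in non-hit documents. A rough heuristic only.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hit_counts, other_counts = Counter(), Counter()
    for text in docs:
        tokens = {
            t for t in re.findall(r"[a-z0-9']+", text.lower())
            if len(t) >= min_len
        }
        target = hit_counts if rx.search(text) else other_counts
        target.update(tokens)
    scored = {
        term: count
        for term, count in hit_counts.items()
        if other_counts[term] == 0 and not rx.fullmatch(term)
    }
    # Most widespread terms first; ties broken alphabetically.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

docs = [
    "Bluebird update: the Kestrel rebate schedule is ready.",
    "Per the Bluebird team, Kestrel rebate goes out Monday.",
    "Company newsletter: holiday schedule and parking update.",
]
suggestions = candidate_terms(docs, r"\bbluebird\b", top_n=2)
print(suggestions)
```

Anything this turns up is a lead to check by reading the documents, not a keyword to add blindly.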
I think Judge Peck would back me up on this one, too: if you want robust, defensible results, hire a search consultant to help you. Keywords done right can be a beautiful thing.