Search: One Size Does Not Fit All

Some colleagues were recently asking me how I approach search, and I found myself repeating the same points. The light bulb went off, so I wrote this blog post.

1. Utopian Search Model

We don’t live in Utopia, so why include this model? Because at first pass many people build a search interface with the hidden assumption that they are using the Utopian Search Model, and then get frustrated when search doesn’t work. The hidden assumption that kills us is that a user can enter the exact keywords needed to retrieve the specific documents they need, every time, and that search works “auto-magically.”

The Utopian Search Model can succeed, but only by accident: rarely does a user know which keywords map exactly to the documents required.

2. Item Search Model

An effective user interface supporting the Item Search Model provides a map between what the user knows and how the system describes the objects. Examples include searching for articles, books, and other specific information objects. Often the user knows enough to uniquely identify the desired object. This may seem very similar to the Utopian Search Model, but there are key differences. The main difference is that a user may know specific information about a document, yet that information might not be unique to that document. For example, there may be different versions of the same document, or different documents with the same title or author.

Some assumptions:

    The system uniquely identifies each item, either through a single field or a matrix (combination) of fields.

    The user knows how to submit a description that narrows the result set enough for the specific item to be identified.
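To make the two assumptions concrete, here is a minimal sketch, assuming a toy in-memory catalog with hypothetical `title`, `author` and `version` fields (a real repository defines its own identifying fields). It shows why knowing one field is often not enough, and how a matrix of fields narrows the result set to a single item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical identifying fields for illustration only.
    title: str
    author: str
    version: int


def item_search(docs: list[Document], **known: object) -> list[Document]:
    """Return every document whose fields match all the values the user knows."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in known.items())]


catalog = [
    Document("Search Architecture", "Smith", 1),
    Document("Search Architecture", "Smith", 2),
    Document("Search Architecture", "Jones", 1),
]

# Title alone is not unique: three candidates come back.
print(len(item_search(catalog, title="Search Architecture")))  # → 3
# Title + author + version together identify exactly one item.
print(item_search(catalog, title="Search Architecture", author="Smith", version=2))
```

The interface's job in this model is to ask for enough of those fields, in terms the user recognizes, that the second kind of query is easy to form.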

3. Untagged or Hidden Information Search Model

This model applies when people search for information without knowing whether it actually exists and without knowing for certain what form it might take. An effective search tool in this space gives the user a means of “chunking” information in meaningful ways and then browsing the resulting chunks.

It is an iterative process.

This is NOT the model followed by Google, except at the coarsest of levels.

It occurs when an information repository does not uniquely or adequately identify “knowledge” or “information,” or when the end user is unable to derive the right matrix of attributes and corresponding descriptions to retrieve the desired information.

A Methodology for Locating “Hidden” information

Basic problems faced by people seeking “hidden” information:

    How to find information not explicitly/adequately described by system?
    How to effectively describe what is only partially understood or seen?

A user needs to divide information into meaningful chunks and explore the resulting “Klondike Space” to see whether the desired information is available in that space.

    The phrase “Klondike Space” comes from the book The Eureka Effect. In the book the phrase evokes the image of a prospector looking for gold. The gist is that a prospector could look at topographical clues and identify places more likely to contain gold than others. Some prospectors were good; many were not. Here I use the phrase to compare a person searching with the prospector and the full set of documents as the area being mined. A good searcher looking for “hidden” or untagged information knows how to chunk the documents in a way that creates a set of documents where an answer is more likely to be found.

    Doing a keyword search is just one way of creating a chunk of documents. Often that works, but when it doesn’t, more sophisticated ways of “chunking” need to be employed, beyond guessing at a series of keywords. There are other ways to create potentially rich chunks that increase the likelihood of finding the information needed.
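Here is a minimal sketch of chunking beyond keywords. It assumes a handful of hypothetical document records with `dept` and `year` metadata. Each chunking strategy is just a predicate, and a chunk is the set of documents that satisfies every predicate chosen. The keyword chunk misses a relevant document that uses different vocabulary, while a metadata chunk still surfaces it.

```python
from typing import Callable

# Hypothetical records standing in for "the full set of documents being mined".
DOCS = [
    {"id": 1, "text": "quarterly pricing review", "dept": "Sales", "year": 2008},
    {"id": 2, "text": "pricing exception memo", "dept": "Finance", "year": 2009},
    {"id": 3, "text": "holiday party notes", "dept": "HR", "year": 2009},
    {"id": 4, "text": "discount approval process", "dept": "Sales", "year": 2009},
]

Chunker = Callable[[dict], bool]


def keyword(term: str) -> Chunker:
    # The familiar strategy: exact, case-sensitive word match on the text.
    return lambda d: term in d["text"].split()


def facet(field: str, value: object) -> Chunker:
    # Chunk by a metadata value, independent of the words in the text.
    return lambda d: d[field] == value


def year_range(start: int, end: int) -> Chunker:
    # Chunk by an inclusive range of years.
    return lambda d: start <= d["year"] <= end


def chunk(docs: list[dict], *chunkers: Chunker) -> list[int]:
    """Return ids of documents inside every requested chunk (their intersection)."""
    return [d["id"] for d in docs if all(c(d) for c in chunkers)]


print(chunk(DOCS, keyword("pricing")))  # → [1, 2]
print(chunk(DOCS, facet("dept", "Sales"), year_range(2009, 2009)))  # → [4]
```

The second chunk finds the discount document, which no guess at the keyword "pricing" would have returned. Combining predicates is one simple way to stake out a smaller “Klondike Space” to browse.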

Creating ways for a user to meaningfully chunk information for their business needs is the goal of search architecture. Different information types often require different chunking strategies. That shouldn’t be a surprise. If you have ever gone into a store that sells hats and watched people try them on, you know that although the tag reads “One Size Fits All,” it doesn’t.

What’s the next step in the process? Identifying what metadata is required to support the Item Search and Untagged Search requirements.

And the step after? Identifying what metadata is available and developing a strategy to create additional metadata (Spoiler alert: Crowd-sourcing).


Why Custom Search Interfaces?

I am an advocate for custom search interfaces because they let me develop the search interface within the context of the users rather than the context of the search platform.

Depending on the search platform, this can also enable me to create multiple interfaces, each tuned to a specific business task or scenario, while reusing much of the code that talks to the search service. An additional benefit is that I can use the business context to add important contextual information to query objects before they are submitted to the search platform. This query enrichment improves the quality of the query, which in turn improves the quality of the result set.
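Query enrichment can be sketched in a platform-neutral way. The `Query` and `UserContext` classes below are hypothetical, as are the field names (`acl`, `department:…`, `doctype:…`) and the boost weights. Every search platform defines its own query format. The point is that the custom interface, not the user, adds the context before the query leaves the application.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    # Hypothetical, platform-neutral query object.
    text: str
    filters: dict[str, list[str]] = field(default_factory=dict)
    boosts: dict[str, float] = field(default_factory=dict)


@dataclass
class UserContext:
    # Business context the custom interface already knows about the user.
    role: str
    department: str
    groups: list[str]


def enrich(query: Query, ctx: UserContext) -> Query:
    """Return a copy of the query with context-derived filters and boosts."""
    enriched = Query(query.text, dict(query.filters), dict(query.boosts))
    # Security filter: restrict results to documents readable by the user's groups.
    enriched.filters["acl"] = list(ctx.groups)
    # Boost documents from the user's own department (the weight is an assumption).
    enriched.boosts[f"department:{ctx.department}"] = 2.0
    if ctx.role == "support":
        # Hypothetical rule: support staff mostly need troubleshooting content.
        enriched.boosts["doctype:troubleshooting"] = 1.5
    return enriched


base = Query("printer jam")
q = enrich(base, UserContext(role="support", department="EMEA", groups=["staff", "support"]))
print(q.filters)  # → {'acl': ['staff', 'support']}
print(q.boosts)   # → {'department:EMEA': 2.0, 'doctype:troubleshooting': 1.5}
```

Returning a copy keeps the user's original query intact. Several task-specific interfaces can then share the same submission code and differ only in their `enrich` rules.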

Who to Ask if Search is Working?

If you want to know whether the application components are working, ask the IT department.

If you want to know whether the right results are coming back, ask a line of business user.

Why bother line of business users? 
      It’s possible for the application components to work correctly and for search still to be broken. There is more to setting up search than installation and configuration. Well-designed search contributes to a well-run company. If you wait for the complaint box to fill up before evaluating whether your search is effective, you are losing time and money.
I can’t point you to intranet enterprise search sites on the web, but I can point you to a retail site that uses well-designed search to drive category navigation. The principles used to design the search functionality and related metadata for this site are the same ones used to build enterprise search applications.
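One way to act on line of business feedback before the complaint box fills up is to turn it into a number you can track. The sketch below uses precision at k, a standard retrieval measure, which is swapped in here as one reasonable choice and is not a method the post prescribes. The test queries, document ids and judgments are hypothetical. Business users mark which results are relevant, and the score is the fraction of the top k results they approved.

```python
def precision_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that a business user judged relevant.

    Positions past the end of a short result list count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in results[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs: dict[str, list[str]],
                        judgments: dict[str, set[str]], k: int) -> float:
    """Average precision@k over a set of test queries.

    runs maps each test query to the ranked ids the engine returned;
    judgments maps each query to the ids a business user marked relevant.
    """
    if not runs:
        raise ValueError("need at least one test query")
    scores = [precision_at_k(ranked, judgments.get(q, set()), k)
              for q, ranked in runs.items()]
    return sum(scores) / len(scores)


# Hypothetical test queries and relevance judgments from business users.
runs = {
    "expense policy": ["d7", "d2", "d9", "d4", "d1"],
    "travel approval": ["d3", "d8", "d5", "d6", "d0"],
}
judgments = {
    "expense policy": {"d2", "d4", "d5"},
    "travel approval": {"d3"},
}
print(mean_precision_at_k(runs, judgments, k=5))
```

Re-running the same judged queries after each tuning change shows whether results are improving for the people who actually use them, which component health checks alone cannot tell you.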

Pragmatic First Steps for Implementing Search

Starting a new search project can be a trying experience. There are many pieces to consider (see my blog post on the layers of search). In other posts and conversations I have said that the best place to start is to identify your business objectives and always keep them in mind. That advice still holds, but those goals might be expressed in abstract terms that are hard to connect with what you want to accomplish with search.

A pragmatic way to begin is by designing the search query interface and the corresponding search results page. These two pages define a) what you expect the end user to be able to do and b) what metadata is needed to drive the search experience. As you look at these pages, you can begin to get a sense of what data and metadata you need, and then you can see what you have available. The gap will need to be filled through some data enrichment process. As you focus on the organic result set portion of the search results page, you can start to see what special filters are needed, such as security, pricing, and access filters. You may also be able to start identifying search boosting scenarios based on user role, context and state.
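The gap analysis described above is simple set arithmetic once each page element is written down with the metadata it depends on. Here is a minimal sketch, assuming hypothetical page designs and a hypothetical inventory of available fields. It reports each missing field together with the page elements that need it, so the enrichment work can be prioritized.

```python
# Hypothetical designs: each page element mapped to the metadata fields it needs.
QUERY_PAGE = {
    "keyword box": {"title", "body"},
    "product line facet": {"product_line"},
    "date filter": {"published"},
}
RESULTS_PAGE = {
    "result title": {"title"},
    "snippet": {"summary"},
    "price": {"price"},
    "security trimming": {"acl"},
}

# Fields the current content inventory says we actually have (assumed).
AVAILABLE = {"title", "body", "published", "acl"}


def metadata_gap(pages: list[dict[str, set[str]]],
                 available: set[str]) -> dict[str, list[str]]:
    """Map each missing field to the page elements that depend on it."""
    gap: dict[str, list[str]] = {}
    for page in pages:
        for element, fields in page.items():
            for f in sorted(fields - available):
                gap.setdefault(f, []).append(element)
    return gap


for missing, needed_by in sorted(metadata_gap([QUERY_PAGE, RESULTS_PAGE], AVAILABLE).items()):
    print(f"{missing}: needed by {', '.join(needed_by)}")
# → price: needed by price
# → product_line: needed by product line facet
# → summary: needed by snippet
```

Each line of that report is a candidate for the data enrichment process, or a signal that a page element should be redesigned around metadata you already have.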

There is more to implementing search than designing the initial query interface and the search results page, but starting there can help new projects get off the ground and get old projects back on track.

Let me know your experiences in this area and send me an email at

Blog Moving from DiscoveryArchitecture

When I started my blog on Discovery Architecture, I was emphasizing that search is much more than merely typing in one or two keywords and scrolling through a list of results (read: Internet search engines). Search is more of a Discovery process.

But, alas! The world has decided to use the word “Discovery” for the process of legal discovery.

So I have created a new blog called Search Architecture, again to emphasize that there is more to search than entering keywords, blah, blah, blah, and that for search to work, a solution has to be architected intentionally, which involves quite a number of different layers. You can read an entry I wrote identifying the different layers.

Managing Search Complexity through Simplicity

At the heart of any search solution is a good understanding of the business problem to be solved as well as knowledge of the available content and metadata. You have to work within the confines of the content you have (or can add with content enhancement). You have to analyze that content and be able to describe how you can identify some documents as being relevant for solving a business problem and why other documents are not relevant. That is just the beginning.

Developing an effective search solution is complex by its very nature. It brings many different pieces together to accomplish specific business objectives, and managing that complexity is one of the top challenges businesses face. The key to managing solution complexity and delivering an effective solution is to simplify the scope by dividing the space into meaningful layers:

Business Problem to Solve
     The end goal is to provide a space where employees, customers and contractors can search for and find accurate and timely information. After all, their job is not to search but to complete some other business process or transaction. Search is a tool for helping people complete their tasks successfully. Understanding the business problem to be solved and how it relates to the larger strategic plan of the business is vital for creating a successful search space.

User Environment
    Understanding how many people will be using the system, how they will access it (web, intranet, internet, mobile devices, embedded applications, etc.), and the business processes to be supported.

Search Application
    Consists of the Query and Results sub-layers. It is the user interface that exposes the various search tooling functions. People’s impression of the effectiveness of a search solution is often based on the search interface alone. After all, that is what people see and use. Too often businesses use an out-of-the-box interface without evaluating whether it is designed to solve the business problems driving the need to update search.

Search Tooling
    Search tooling is the tool set that a search platform provides for building search applications. Tooling may include search word boosting, relevance tuning, thesauri, synonyms, stopword lists, facets, taxonomies, search analytics, knowledge extraction tooling, and more. It also covers how content sources are crawled, parsed and indexed. Different search platforms provide different tool sets. Understanding what tooling is available and how it works is key to architecting an effective search solution.

Search Platform/Engine
    The software that provides the search tooling, whether it is IBM OmniFind Enterprise Edition, Endeca, Autonomy, FAST, Lucene/SOLR or some proprietary solution. In addition to search tooling, you also want to pay attention to a platform’s scalability, fail-over, disaster recovery, system management, configuration management, system security and availability.

Content Enhancement
    Content enhancement or enrichment is sometimes required to develop a search solution that solves a specific business need. This may mean that third-party data needs to be added to existing data. It may also mean that knowledge extraction tools need to be applied to unstructured (that is, non-fielded) data, like that found in emails, reports and memo fields in databases.

Content & Metadata
    It is important to know the number of documents, content types (email, reports, database records, etc.), average document size, annual growth of your data stores, multilingual requirements, and governance strategy. It is also important to know the types of information that are explicitly available or can be extrapolated via content enhancement. This information will be the basis for building an effective search solution.

    Security is a vital piece of any enterprise project. In the case of search, there is user authentication and authorization for accessing the search application and for the data that will be displayed in search results lists. (You don’t want just anybody to view HR data.) Search engine crawlers also require authentication and authorization to access different data stores at the collection level. And then there is security and digital rights management at the individual document level within a given data store. In many cases this is the most complex IT piece of the search puzzle. However, solving this layer alone does not guarantee an effective search solution.

Storage Environment
    Knowing the number and types of data stores (portal, file system, FileNet8, Domino, Quickr, Documentum, etc.) is vital. Knowing how frequently the stores are updated, and how often they will need to be searched, is another key piece of information. It is also important to know the format the data is stored in (PDF, database, flat files, etc.).

IT Infrastructure
    All of the above occurs within an IT framework of many layers in its own right and can include the following layers: network, hardware, operating system, system software and more depending on the environment.
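The Search Tooling layer lists synonyms, thesauri and stopword lists among the tools a platform provides. To make two of them concrete, here is a platform-neutral sketch of what they do to a query. The word lists are hypothetical, and the boolean output is generic notation rather than any particular engine's query syntax. In a real deployment the platform applies this tooling itself, and it often does so at index time as well as query time.

```python
# Hypothetical tooling configuration; real platforms manage these lists themselves.
STOPWORDS = {"the", "a", "an", "of", "for", "to"}
SYNONYMS = {"pto": ["vacation", "leave"], "laptop": ["notebook"]}


def analyze_query(text: str) -> list[list[str]]:
    """Lowercase the query, drop stopwords, and expand each term with synonyms."""
    groups = []
    for term in text.lower().split():
        if term in STOPWORDS:
            continue
        groups.append([term, *SYNONYMS.get(term, [])])
    return groups


def to_boolean(groups: list[list[str]]) -> str:
    """Render term groups as a generic boolean query: OR within, AND across."""
    return " AND ".join("(" + " OR ".join(g) + ")" for g in groups)


print(to_boolean(analyze_query("PTO policy for the new laptop")))
# → (pto OR vacation OR leave) AND (policy) AND (new) AND (laptop OR notebook)
```

A user who types “PTO” can now match documents that only say “vacation.” Closing vocabulary gaps like this is part of what the tooling layer is for. Which words belong in these lists is a business question, not an installation setting.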

Search and WorkForce Integration Initiatives: Two Paths

Increasingly I am seeing projects focused on “Search and WorkForce Integration.” Search plays a key part in these initiatives: its goal is to provide accurate and timely information. The gist of the business problem is this:
               If workers can’t find the information needed, 
               they either have to reinvent it or decide without it.
At some stage in the process of implementing a WorkForce Integration initiative, a company may find itself at a point where people are not using search because it “just doesn’t work right.”
At this point, companies respond by taking one of two paths. One path leads them out of the woods and the other gets them lost deeper in the forest.
Path 1. The first path is a common IT response: throw more hardware and software at the problem. This response can be valid if search is slow or unable to handle indexing multiple file formats, meeting security needs, and similar issues. However, the problem is often not an IT problem but a business problem. If you treat search as an IT problem, search will likely never work right. You will just get lost deeper in the forest.
Path 2. The second path is to understand that people’s job is NOT to search for information. Their jobs are to complete various tasks such as analysis, evaluation, support, processing, etc. They search for information to find the latest information, identify reusable resources, solve problems, and provide answers. If you want to know why people aren’t using search, the answer is straightforward:
       “An information retrieval system will tend not to be used whenever it is
       more painful and troublesome for a customer to have information than
       for him not to have it.” Calvin Mooers (aka Mooers’ Law, 1959)
To solve the search problem, you must understand the business problem and how search can be used to meet those needs, and then make it easy for people to use the tool.
This is the area in which Davalen shines.  Our experience goes much further than the ability to install, configure and manage the OmniFind Enterprise Edition search application.  We know how to use the tooling to solve business problems.