Friday, October 12, 2007

Low bar vs. high bar usability tests

Here are three usability test situations that I've encountered in the past.

  • An interaction designer looks at the graphical page headers she's been sent by a graphic designer and something looks wrong: the words on the images seem blurry and hard to read. She tries to enhance the images in Photoshop and asks me for my opinion of the enhanced images. We quickly design a comparative usability test of the images. The original images (A) are placed next to the enhanced images (B), and three questions appear below each pair: (1) On a 1-5 scale, which image appears sharper (1 = image A is much sharper, 5 = image B is much sharper)? (2) Which image is easier to read? (3) Which do you prefer? The test materials are put in a Word document and sent to 12 office co-workers. The tests are returned and tabulated in less than three hours (see the tabulation sketch after this list); the enhanced images are judged sharper, easier to read, and are preferred by a clear margin. The results are sent to the business lead with the recommendation to consider using the enhanced images until further testing is completed. The business lead who employed the graphic designer rejects the recommendation, saying that the data weren't valid because the test participants weren't real customers.


  • I'm designing a main menu for a touchtone IVR. I'm not familiar with the terms used by the business for the menu options, but the client assures me that callers will understand the options, even though callers have no more knowledge about that business than I do. I type up some alternative wordings for the menu options and mail them out to some coworkers with this question below each set of options: please list the types of services or products you would expect to find associated with each option. I collect responses over the next two days, and the results aren't encouraging. I show my results to the client and suggest a different strategy for naming the menu options, with follow-up testing on the proposed menus. The test results are rejected because the methodology is too dissimilar to an actual IVR experience.


  • I'm designing a menu for a speech IVR, and I'm not sure the menu options the business wants are going to be discriminable to the speech recognizer. I ask that a one-menu prototype be created so I can test the discriminability of the options. The request is refused by a manager because, I'm informed, I'm not a real customer, so it's not a valid test.
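
For the curious, here's a minimal sketch of how responses from a comparative test like the first one might be tabulated. The ratings and choices below are hypothetical placeholders, not the actual data from that test.

    # Tabulate a small A-vs-B comparison test: one row per respondent.
    # These responses are made-up placeholders, not real study data.
    from statistics import mean

    # (sharpness 1-5, easier-to-read choice, preference)
    # Sharpness scale: 1 = image A much sharper ... 5 = image B much sharper.
    responses = [
        (4, "B", "B"), (5, "B", "B"), (3, "A", "A"),
        (4, "B", "B"), (5, "B", "B"), (4, "B", "B"),
    ]

    print(f"Mean sharpness: {mean(r[0] for r in responses):.2f} (3.0 = no difference)")

    for label, idx in (("Easier to read", 1), ("Preferred", 2)):
        b_votes = sum(1 for r in responses if r[idx] == "B")
        print(f"{label}: B chosen by {b_votes} of {len(responses)} respondents")

Nothing fancy, and that's the point: a low bar test should be cheap enough to tabulate in an afternoon.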

In each case, the tests I've described barely qualify as usability tests. They don't follow the procedures that you might find in classic texts on usability testing like the Handbook of Usability Testing or A Practical Guide to Usability Testing. The first example looks more like the procedure for an eye exam: "which do you prefer, A or B? A or B?" The second example looks like a poorly conducted card sort. The third looks more like a QA procedure. So what's an experimental psychologist like me doing running these poorly designed tests?

I call these sorts of usability tests low bar usability tests. They are simple tests, easy to design and execute, that answer the question, "is it OK to proceed with this design or idea?" Since many interface designs take a good deal of time and effort to complete, and usability tests themselves can take a good deal of time to perform, a low bar usability test can tell the designer if he or she is on the right track. If the design passes the low bar test, design continues, and the next usability test, the high bar usability test, is run according to the canonical texts. If the design fails the simple low bar test, there's no point in taking the design in that particular direction and running a high bar test. It's time to step back and re-design.

To take one example, if I test the goodness of some menu options in an IVR I'm designing and I can't get the recognizer to reliably distinguish my utterances, should I continue with this design and get data from real customers? Of course not. I've worked with IVRs for years and I'm very good at getting them to do what I want them to do. If I can't use the menu, no one else will be able to. If the menu passes this low bar test, does it show that the menu is ready for prime time? Of course not. I still need to run a high bar usability test. But at least I can move forward with the knowledge that the menu has a chance of working.
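
To make that concrete, here's the kind of quick pre-check I have in mind. It flags pairs of menu options whose wordings overlap so heavily that a recognizer might confuse them, using plain string similarity as a crude stand-in for acoustic confusability. That substitution is my simplification, and the option names are invented; a real low bar test would still mean speaking the options at the actual recognizer.

    # Flag potentially confusable speech menu options. String similarity is
    # only a rough textual proxy for acoustic confusability; the option
    # names here are invented for illustration.
    from difflib import SequenceMatcher
    from itertools import combinations

    options = ["billing", "bill pay", "building services", "account balance"]

    for a, b in combinations(options, 2):
        similarity = SequenceMatcher(None, a, b).ratio()  # 0.0 to 1.0
        if similarity > 0.5:
            print(f"Potentially confusable: {a!r} vs {b!r} ({similarity:.2f})")

If pairs like "billing" and "bill pay" light up here, I know the option set needs rewording before a prototype is even worth building.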

You should add low bar usability testing to your discount usability bag of tricks, remembering to explain the rationale for running a less-than-perfect test procedure to those who learned usability testing by the book.
