End-user development (EUD) research has yielded a variety of novel environments and techniques, often accompanied by lab-based usability studies that test their effectiveness in completing representative real-world tasks. While lab studies play an important role in resolving frustrations and demonstrating the potential of novel tools, we show that they are insufficient to determine accurately whether a technology will be accepted in its intended context of use, because acceptance depends heavily on the diverse and dynamic requirements of its users. As such, usability in the lab is unlikely to represent usability in the field. To demonstrate this, we first describe the results of a think-aloud usability study of our EUD tool “Jeeves”, followed by two case studies in which psychologists used Jeeves in their work practices. Issues that were common in the artificial setting were seldom encountered in the real context of use, which instead unearthed new usability issues arising from unanticipated user needs. We conclude with considerations for the usability evaluation of EUD tools that enable development of software for other users, including planning for collaborative activities, supporting developers in evaluating their own tools, and incorporating longitudinal methods of evaluation.