I don’t know about you, but even though my work revolves around the digital, I am still an old-school, pen-and-paper kind of note-taker.
Which means my desk is host to the ever-shifting piles of notebooks, post-its, and sheets of doodled paper worthy of their own modern art exhibit.
Dig into these piles and you’ll find an endless source of half-finished thoughts, jumbled ideas in the making, and even an abandoned to-do list or two. To the outsider, it probably looks like I have the organizational capabilities of a lunatic. To me, everything is exactly where it should be.
Now, imagine if I asked you to look into the piles and find the specific client requests for my next speaking engagement. Although I swear there is a method to my madness, this request would most likely end in your own loss of sanity as you tried to sort through the haystack of notes to find the needle of information.
As is evident from the piles, I do clearly document critical information. However, it’s documented in accordance with how my own brain ticks, so the strategy for what information goes where is not obvious to anyone but me.
Thankfully, you will never have to sort through my notes.
However, let’s imagine that instead of sifting through notebooks, your boss has asked you to sift through hundreds of lines of code to find the single line where someone, at some point, made a critical decision that is now impacting the performance of your model.
There is very little model documentation, zero indication of who wrote the original code, and seemingly no standard of organization whatsoever. You know that the success of the entire AI system depends on fixing this model, and yet it feels like you would have better luck finding the post-it note with last week’s to-dos on my desk.
Maddening, right?
(P.S. Tune in Monday for a conversation with Jeff Gothelf on the unexpected metric for AI success. Register for the livestream here.)
TL;DR - Read the Document case study here
It may not be glamorous, but standard documentation practice is an essential component of a successful AI technical infrastructure. Documenting the data, algorithms, models, and experiments that go into a single AI project is essential to that project’s success; without it, you are crippling your ability to iterate on and refine your AI systems.
Sometimes this is even as simple as documenting what models are deployed within your own company.
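To make that concrete, here is a minimal sketch of what a single entry in such an inventory of deployed models might look like in Python. The field names and example values are my own illustrative assumptions, not a prescribed standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelRecord:
    """One entry in a company-wide inventory of deployed models (fields are illustrative)."""
    name: str                   # which model this is, e.g. "document-classifier"
    version: str                # which build is actually running in production
    owner: str                  # team or person accountable for the model
    training_data: str          # where the data came from and how it was labeled
    intended_use: str           # what the model is (and is not) meant to do
    known_limitations: List[str] = field(default_factory=list)
    ethics_decisions: List[str] = field(default_factory=list)  # judgment calls made along the way

# A hypothetical entry in the inventory:
inventory = [
    ModelRecord(
        name="document-classifier",
        version="2.3.1",
        owner="data-science-team",
        training_data="Internal documents labeled by an external vendor, two annotators per item",
        intended_use="Route incoming documents to the right review queue",
        known_limitations=["Struggles with documents that mix multiple topics"],
        ethics_decisions=["Definitions of sensitive categories recorded in the design doc"],
    )
]
```

Even a record this lightweight answers the questions that matter months later: who owns the model, what data it was trained on, and which judgment calls were made along the way.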
This week we are covering Document, the second of the Technology elements, in the Values Canvas case study series.
This element addresses the need for transparency around the ethical decisions made during the AI development lifecycle. A Document solution is the documentation of all ethics-related decisions taken over the lifecycle of an AI model.
Typically, I’ve been writing brief introduction stories to the Values Canvas case studies in this newsletter, but I thought I’d shake it up this week and give you a peek into the case study itself.
For this edition I had the opportunity to collaborate with Karin Golde, founder of West Valley AI and the brains behind the newsletter Good Judgment. I’ll let her writing speak for itself, as we dive straight in…
Excerpt from Document Case Study:
“NewsLens is a company that leverages cutting-edge AI technology to provide nuanced media categorization and analysis solutions. Their tools empower media outlets, educational institutions, and public sector agencies to access insights that enhance research and decision-making. Part of this involves classifying news content, enabling researchers to narrow their focus to specific areas of interest.
The leaders at NewsLens have set high standards for excellence, including industry-leading product performance and ethical product development. For the product owners, this means crafting clear specifications that align with the needs of the end user. For the data scientists on the software development team, this means creating highly accurate classification systems, and engaging with ethical vendors for any work that needs to be outsourced.
When classifying news content, some categories in the NewsLens platform are relatively straightforward to implement, such as various types of sports, technologies, or health issues. But NewsLens started getting customer requests for more political categories, particularly a category for terrorism. They recognized that this requires careful consideration, since what is considered “terrorism” can vary depending on perspective.
The product owner researched the issue, and landed on US law as the basis for the definition: “premeditated, politically motivated violence perpetrated against noncombatant targets by subnational groups or clandestine agents.” Following standard practices in software development, the product owner included this definition in the design document that the developers would follow. The data scientists then set off on their work to build a machine learning model that would assign the appropriate articles to the “terrorism” category.
The model needed to be trained on thousands of articles, each labeled as either “about terrorism” or “not about terrorism”. A standard practice for labeling at this scale is to outsource the work to a vendor who manages large teams of annotators, who are given instructions on how to read the articles and choose the correct label. The data scientists engaged a vendor known for ensuring fair pay for annotators, and provided the news data along with labeling instructions that included the definition of terrorism provided by the product owner.
Following another standard practice, the data scientists requested that at least two or three individual annotators read and label each article. This is a method for quality assurance, because if multiple annotators independently choose the same label, it is an indication that they are each performing well on the task.
As the annotators worked through the first batch of articles, they frequently assigned conflicting “terrorism” labels, prompting the data scientists to clarify the initial guidelines to better help annotators choose the correct label. Unfortunately, label consistency did not improve as much as the data scientists had hoped, but with pressure to meet hard deadlines, they moved forward to train the model with the data they had collected so far.
Once the model was in production, NewsLens users began to complain about poor quality, and pointed out some specific issues. For example, when they searched news using the “terrorism” category, they were not seeing some important articles on a recognized terrorist organization’s recent bombing of train tracks on a commuter rail line. What’s worse, the category included a number of articles about Islamic culture that clearly had nothing to do with terrorism.
The team knew they needed to comb through their processes to see how this disconnect arose between user expectations and product design, and how they could fix it. Unfortunately, when they reviewed the project documentation, there were no clear answers.
NewsLens’ Needs:
Immediate: Uncover the root of the problem, and reassure customers with a plan to fix it.
Medium term: Rerun the project with improved processes and release a new model.
Long term: Create a new system for documenting model building policies and decisions.”
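A quick aside on the quality-assurance step in the excerpt: one common way to put a number on label consistency before training is an inter-annotator agreement metric such as Cohen’s kappa. The case study doesn’t say how NewsLens measured agreement, so treat the following as an illustrative sketch using scikit-learn, with made-up labels.

```python
# Illustrative sketch: measuring agreement between two annotators on the same
# ten articles (1 = "about terrorism", 0 = "not about terrorism").
# Requires scikit-learn; the labels here are invented for demonstration.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
annotator_b = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Raw percent agreement overstates quality, because some agreement happens by chance
percent_agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

# Cohen's kappa corrects for chance agreement; values near zero suggest the
# labeling guidelines are too ambiguous to produce reliable training data
kappa = cohen_kappa_score(annotator_a, annotator_b)

print(f"Percent agreement: {percent_agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

If agreement stays low even after the guidelines are clarified, that is a signal the category definition itself needs rework, which is exactly the kind of decision point the project documentation should capture.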
To familiarize yourself with the Values Canvas, gain insight into my book Responsible AI, and get a visual on the categories of case studies to come, be sure to download your very own open-access copy below.
When it feels like everyone is doing AI, the fear of being left behind can create pressure to rush the adoption of these tools. However, there’s a big difference between doing AI and doing AI well. So how do you know when your AI initiatives are successful?
Tune in Monday the 29th at 10am PDT / 7pm CEST for a discussion with Jeff Gothelf on how to cut through the AI noise and get to the ultimate root of success.
It was a pleasure working with you, Olivia, and contributing to your valuable initiatives! The problems described in this case study are real ones, even if the company is fictional. I'm always happy to share lessons learned from the trenches, and hope this contributes to a more ethical future for AI!