The Washington Post Is Experimenting With Generative AI, But Setting Clear Boundaries

Playing around with generative AI is serious business for The Washington Post.

It knows there’s an inherent tension in what it’s trying to achieve with its gen AI projects.

Most people think of ChatGPT as an experimental tool and are growing accustomed to its eccentricities and blunders. But large language models (LLMs) like ChatGPT have an unfortunate track record of making up stories – a no-go for any reputable publisher concerned with journalistic integrity and accuracy.

“For us, as a news organization, that’s not acceptable,” said Sam Han, The Post’s director of AI and machine learning and head of Zeus Technology.

Still, as readers adapt to new ways of getting information, including through conversational interfaces, “we want to provide similar options,” he said.

Staying in bounds

To keep LLMs in check, The Post creates guidelines and boundaries.

For instance, if you ask ChatGPT about the impact of climate change on the economy, it could scan reams of information online and produce an answer – with a chance of hallucinating. Instead, The Post selectively feeds snippets or phrases from its own articles into the model so it can be sure of generating a trustworthy response.
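The article does not describe how The Post picks snippets or words its prompts, so the following is only a minimal sketch of the general idea of grounding a model in a publisher's own text. The function names, the stopword list and the word-overlap scoring are all illustrative assumptions, not The Post's method.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "its", "of", "on", "the", "to", "what"}

def tokenize(text):
    """Lowercase a string and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def select_snippets(question, snippets, k=2):
    """Return up to k snippets that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(s)), i, s) for i, s in enumerate(snippets)]
    scored = [t for t in scored if t[0] > 0]      # drop unrelated snippets
    scored.sort(key=lambda t: (-t[0], t[1]))      # best overlap first, stable
    return [s for _, _, s in scored[:k]]

def build_grounded_prompt(question, snippets, k=2):
    """Build a prompt that tells the model to answer only from the excerpts."""
    chosen = select_snippets(question, snippets, k)
    sources = "\n".join(f"[{n}] {s}" for n, s in enumerate(chosen, 1))
    return (
        "Answer using only the numbered excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"Excerpts:\n{sources}\n\nQuestion: {question}"
    )

# Made-up snippets standing in for passages from a publisher's archive.
archive = [
    "Climate change is raising insurance costs across the economy.",
    "The local team won its game on Friday.",
    "Economists expect climate policy to reshape energy markets.",
]
prompt = build_grounded_prompt(
    "What is the impact of climate change on the economy?", archive
)
```

The resulting prompt would then be sent to whichever model is in use. Keeping that step separate is one way to avoid being tied to a particular model.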

The Post is experimenting with commercial LLMs, such as OpenAI’s ChatGPT, Google Bard and GPT deployment through AWS. It’s also trying out open-source LLMs, including varieties of Meta’s LLaMA, to see if they can be refined “for our purposes,” Han said.

“We don’t want to be technically tied to a particular model,” he said.

But The Post has another reason for looking into open-source LLMs: “So we can have our version of it for confidential processes,” Han said.

His understanding is that, if a user accesses ChatGPT’s web interface, OpenAI can collect the conversation and use the data for training purposes. But if someone uses an API, OpenAI keeps the data for 30 days for debugging purposes, then discards it. Still, better safe than sorry, particularly when it comes to The Post’s content.

But “I’m talking as a technologist,” Han said. Legal concerns, such as whether LLMs are training on The Post’s data without permission, are policy questions that fall under the purview of The Post’s AI Task Force.

No innovation without experimentation

In late May, The Post announced it was creating two cross-functional teams focused on AI.

The AI Task Force establishes AI policy guidelines and priorities. This steering committee might say a human has to be in the loop before The Post publishes any AI-generated content or that AI-generated content must include a clear disclaimer or notification declaring itself as such.

Han leads The Post’s AI Hub, an operational team that collects AI-related ideas from across the organization and spins up proofs of concept (POCs) for the most promising ideas.

The team showcases the POCs to the AI Task Force. If there’s consensus around pushing certain POCs to production, the AI Hub assigns them to the appropriate teams.

The AI Hub has held a few AI-themed hackathons that yielded viable ideas, including a chatbot that could field reader questions, an automatic documentation tool and a headline generation tool. A small subset of newsroom editors is presently evaluating the feasibility of AI-powered headline generation.

But The Post is no stranger to AI.

Previously, it built machine learning models to execute a number of tasks, such as predicting subscription propensity and churn, moderating comments, recommending articles to readers and performing sentiment analyses.

And during the 2016 election cycle, The Post created an automatic content generation system called Heliograf that would grab real-time data from the Associated Press to automatically create updates for hundreds of governor and state races.

Heliograf subsequently expanded to other coverage areas, such as local sports, before The Post pulled the plug because “the technology was not there,” Han said. “The language was not good enough.”

The Post is currently testing generative AI models against its traditional machine learning models. Its sentiment analysis model, which The Post uses to fuel reader recommendations and match advertiser needs, is a good example.

Historically, a data scientist would spend three or four months building a model, after which The Post would collect data using Amazon Mechanical Turk. Between three and five reviewers would then manually review and rate the articles for sentiment.

Now, The Post hands an API key to a software developer, who can share an example of a positive article and a negative article with an LLM and ask the model to classify a new article’s sentiment.
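As a rough sketch of that workflow, the snippet below builds a prompt containing one positive and one negative example and reads back the model's label. The prompt wording, the function names and the `complete` callable (a stand-in for whatever LLM API the developer's key unlocks) are hypothetical. The article does not specify any of them.

```python
def build_sentiment_prompt(positive_example, negative_example, article):
    """Assemble a prompt with one labeled example per class plus the new article."""
    return (
        "Classify the sentiment of the final article as positive or negative.\n\n"
        f"Article: {positive_example}\nSentiment: positive\n\n"
        f"Article: {negative_example}\nSentiment: negative\n\n"
        f"Article: {article}\nSentiment:"
    )

def parse_label(reply):
    """Map a free-text model reply to 'positive', 'negative' or 'unknown'."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!:;\"'") if words else ""
    return first if first in ("positive", "negative") else "unknown"

def classify(article, positive_example, negative_example, complete):
    """Classify one article; `complete` is any function mapping prompt -> reply."""
    prompt = build_sentiment_prompt(positive_example, negative_example, article)
    return parse_label(complete(prompt))

# Stand-in for a real model call so the sketch runs without network access.
def fake_complete(prompt):
    return " Negative."

label = classify(
    "Flooding closed roads and damaged dozens of homes.",
    "Volunteers rebuilt the town playground in a weekend.",
    "A factory fire left hundreds of workers without jobs.",
    fake_complete,
)
```

Because the model call is passed in as a function, the same code could be pointed at a commercial or an open-source model, and its labels compared against those from the older, purpose-built classifier.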

The jury is out, however, on whether the new model can outperform the old one, because the testing is just getting started.

AI aspirations

Still, The Post has big ambitions.

In the newsroom of the future, Han said, each reporter could have an AI assistant that gathers, analyzes and summarizes information for stories. Throughout the writing and editing process, the AI agent could provide suggestions on copy or headlines, generate different summaries and translate the article into multiple languages.

“In the distribution phase, I see great potential as well,” Han said, such as generating different versions of an article for different audiences. AI could also repackage the original content in different formats for TikTok or Facebook.

Another area where AI might shine is facilitating more personalized interactions between readers and reporters. An avatar of a reporter that adopts the reporter’s voice could interact with readers in real time, collect information from these conversations and bring it back to the reporter, according to Han.

Han acknowledges that many practical and technical hurdles lie ahead for news organizations like The Post when it comes to AI technologies. Just as social media changed how people consume information, ChatGPT and its kin will upend reading habits and tastes in ways that are difficult to anticipate.

“A lot of tech companies right now, when they build models, are trying to filter out bad information in the training [process] – [and] that’s a good effort, but it’s not complete,” Han said. “You cannot read every article and fact before you feed it into the training set.”
