Artificial Intelligence is already being used in data journalism. For a field obsessed with automating tedious tasks, AI seems custom-made.
Artificial Intelligence in data journalism projects often showcases some of the most imaginative aspects of how to use new tools to perform analyses that just weren’t possible before.
Often the AI is used to categorise images and text, perhaps social media posts or thousands of news reports, at a scale no human could read through without running out of time or making mistakes. AI makes mistakes too, and it can be dumb, but the point is that it is getting better all the time.
My favourite quote from British journalist James Cameron really tells the story of the time we’re in:
Once upon a time the world was a realm of unanswered questions and there was room in it for poetry. Man stood beneath the sky and he asked “why?”. And his question was beautiful. The new world will be a place of answers and no questions, because the only questions left will be answered by computers, because only computers will know what to ask. Perhaps that is the way it has to be.
James Cameron, 1969
What Cameron didn’t know was that data journalists would be the ones answering those questions; they just couldn’t find the answers before AI was there to help them. That human factor leads to some really powerful work. Here is a global selection, many of them Sigma Data Journalism Award winners. Which ones would you highlight?
What happens when journalists use AI to investigate TikTok’s algorithms and how they affect videos of the war in Ukraine? The result is this project by NRK, which sent robots to investigate. They used AI to identify specific content in images they had collected, then programmed a bot to scan videos for the images the AI recognised. You can read more about the project here.
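The matching step described above can be sketched roughly like this. It assumes an image model has already produced labels for each video's frames; the code then flags videos whose labels overlap a keyword set. The labels, keywords and video ids are invented examples, not NRK's actual pipeline.

```python
# Invented war-related keywords for illustration only
KEYWORDS = {"tank", "soldier", "explosion", "rubble"}

def flag_videos(video_labels):
    """Return ids of videos whose detected labels overlap the keyword set."""
    return [vid for vid, labels in video_labels.items()
            if KEYWORDS & set(labels)]

# Hypothetical labels emitted by an upstream image classifier
detected = {
    "vid1": ["cat", "sofa"],
    "vid2": ["tank", "street"],
    "vid3": ["rubble", "building"],
}
print(flag_videos(detected))  # → ['vid2', 'vid3']
```

The interesting work in the real project sits upstream, in the model producing the labels; the bot itself only needs a simple filter like this.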
This project explained the horror of the Tulsa Massacre of 1921, when an entire black community was burnt to the ground by white rioters. The team used AI to reconstruct what this thriving community looked like before the attack, drawing on vintage maps and building height data to create a powerful experience. You can read more about how it was done here.
How do you search for something that isn’t there? That is the issue El Universal wrestled with in this project, which used natural language processing to analyse thousands of news stories and work out where the coverage gaps were in Mexico’s reporting of drug cartel murders. Full disclosure: this is a project I worked on.
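A minimal sketch of the coverage-gap idea: compare how often each state appears in murder-related coverage against official homicide counts, and rank states by the shortfall. All the figures and article snippets below are invented for illustration; the real project ran proper NLP over thousands of articles rather than simple substring matching.

```python
from collections import Counter

def coverage_gaps(articles, official_counts):
    """Rank states by (official homicides - mentions in coverage).

    A large positive gap suggests under-covered violence.
    """
    mentions = Counter()
    for text in articles:
        for state in official_counts:
            if state.lower() in text.lower():
                mentions[state] += 1
    return sorted(
        ((state, official_counts[state] - mentions[state])
         for state in official_counts),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Invented sample data
articles = [
    "Three bodies found in Guerrero linked to cartel violence",
    "Guerrero mayor speaks on wave of killings",
    "Homicide reported in Sonora",
]
official = {"Guerrero": 40, "Sonora": 35, "Tamaulipas": 30}
print(coverage_gaps(articles, official))
# → [('Guerrero', 38), ('Sonora', 34), ('Tamaulipas', 30)]
```

Note that Tamaulipas has official cases but zero mentions; a real analysis would normalise for article volume and use entity extraction rather than raw string matching.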
OjoPúblico in Peru built their own algorithm to identify the potential for corruption among public contracts in the country. The system was built by statisticians, developers and programmers to comb through thousands of public records looking for risk factors. There’s a detailed methodology here.
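The general shape of such a system can be sketched as rule-based risk scoring: flag contracts that trip known red flags and sum the weights. The factor names and weights below are entirely invented for illustration; OjoPúblico's actual algorithm was built by statisticians over thousands of public records.

```python
# Hypothetical red flags and weights (invented for illustration)
RISK_FACTORS = {
    "single_bidder": 3,    # only one company bid
    "direct_award": 2,     # no open tender held
    "repeat_winner": 2,    # same supplier wins repeatedly
    "amended_upward": 1,   # contract value raised after signing
}

def risk_score(contract):
    """Sum the weights of every risk factor flagged on a contract."""
    return sum(w for factor, w in RISK_FACTORS.items() if contract.get(factor))

# Invented sample contracts
contracts = [
    {"id": "C-001", "single_bidder": True, "direct_award": True},
    {"id": "C-002"},
    {"id": "C-003", "repeat_winner": True, "amended_upward": True},
]
ranked = sorted(contracts, key=risk_score, reverse=True)
print([(c["id"], risk_score(c)) for c in ranked])
# → [('C-001', 5), ('C-003', 3), ('C-002', 0)]
```

The score does not prove corruption; it simply ranks contracts so reporters know where to look first.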
Texty have become innovators in using AI for journalism – and you can hear journalist Anatoly Bondarenko interviewed here by us for the Data Journalism Podcast, live from the front lines of the war there. The team used natural language processing to analyse the propaganda war being fought in Europe, looking at over 3,000 pages or pieces of content a week. They built the tool themselves – and made it publicly available as an open-source download.
This massive open data project takes more than four million Mexican government contracts and puts them through a hefty algorithmic analysis, creating a searchable dataset powered by their very own ‘Groucho’ analysis engine. You can read more about the project here.
Organized Crime and Corruption Reporting Project (OCCRP)
You have 1.3 million leaked transactions from 238,000 companies – a hefty and almost impossible dataset. So, what do you do? The answer, if you’re the OCCRP, is to build your own AI data management system to look for patterns among the thousands of PDFs, CSVs and Excel files, revealing more than €26 billion in transfers out of Russia tracked over a seven-year period and exposing a complex financial system. The project brought together the OCCRP plus The Guardian – UK, Süddeutsche Zeitung – Germany, Newstapa – South Korea, El Periodico – Spain, Global Witness and 17 other partners who can be viewed here. You can find out more about the project itself here. You can play with the OCCRP’s Aleph system yourself here.
Data journalist, writer, speaker. Author of 'Facts are Sacred', from Faber & Faber, and a range of infographics for children's books from Candlewick. Edited and launched the Guardian Datablog. Now works for Google in California as Data Editor and is Director of the Sigma awards for data journalism.