Explaining the Prosecraft drama: This is why every author is mad at a website that counted words
People in 2023 just aren't going to be cool with someone feeding all of their work into a computer without permission
If you follow any authors on social media, you may have recently seen them railing against a platform called Prosecraft, a website that creator Benji Smith claimed was “dedicated to the linguistic analysis of literature, including more than 25,000 books by thousands of different authors.” What that basically meant was that Smith had created a database of books that could tell you all sorts of information about each one, based on the “vividness” of its language and the total number of words in it. What that also meant was that Smith had apparently fed 25,000 books into a database without the knowledge or consent of the original authors, allowing the text to be cataloged and analyzed and broken down into parts, treating the work less like art and more like… carburetors. (No offense to carburetors.)
A version of Prosecraft has been around for years, but this week an enormous backlash swelled up against the site after authors began to notice the staggering amount of data (their data) that Smith had fed into it, seemingly training an algorithm to recognize their work and the kind of language they use so it could pick out the “most vivid” passages and compare them to other books. Little Fires Everywhere author Celeste Ng went through it earlier today and counted 20 Stephen King books, 20 Jodi Picoult books, and plenty of other things. Author Hari Kunzru noticed that Prosecraft was not only taking books without permission but framing it as a service, asking for authors or publishers to get in touch to have their books included (as if the site wasn’t doing that already anyway).
It only took one day of pissing off the entire literary community for Benji Smith to back down and take Prosecraft offline, but a blog post announcing that decision seems to have raised a lot more red flags than it lowered. In it, Smith explained that he launched Prosecraft to help him write a memoir, figuring that the first step to writing a book was determining how many words are in most books. So he started making a spreadsheet, making a point to say how “precious” it was to him. From there he expanded to tracking the kinds of words in a book to chart the “emotional ups and downs of any story,” and when he ran out of books he owned, he “used web crawlers to find more books” on the internet.
Smith doesn’t really elaborate on that, but 25,000 books is a lot of books, and most books aren’t uploaded in their entirety on the internet by the author or publisher for free. Still, Smith said that he figured he was “honoring the spirit of the Fair Use doctrine” by only including “snippets” of the text and his statistics. And since he wasn’t sharing the text of these books with anyone, just his Prosecraft system, he “believed” he was “in compliance with the relevant laws.”
Finally, in the last few paragraphs of a post with dozens of paragraphs, Smith tells “the community of authors” that “I hear your objections,” adding, “I hope you’ll accept my sincerest apologies.” He closes his post by saying that he wants to someday “rebuild this library with the consent of authors and publishers,” which writers on social media have taken as an implication that he has not actually deleted his database of books, meaning the data could hypothetically still be used to train A.I. programs to churn out phony books with the same word counts and “vividness” as books by real writers.
This may seem like a niche situation that impacts one community, but it’s just going to get worse until we as a society stop people and companies from doing stupid shit like this to art and culture. If we don’t, it’s just a matter of time before every movie is a regurgitated series of “vivid” scenes from other movies written by an A.I. “writer” and starring A.I. “actors,” every book is just a repeat of the same plot points from other books, and every news article is just based around a list of words that people like to click on.