WASHINGTON — Language detectives say the key clues to who wrote the anonymous New York Times opinion piece slamming President Donald Trump may not be the odd and glimmering “lodestar,” but the itty-bitty words that people usually read right over: “I,” “of” and “but.”
Experts use a combination of language use, statistics and computer science to help figure out who wrote documents that are anonymous or possibly plagiarized. They’ve even solved crimes and historical mysteries that way. Some call the field forensic linguistics, others call it stylometry or simply doing “author attribution.”
The field is suddenly at center stage after an unidentified “senior administration official” wrote in the Times that he or she was part of a “resistance” movement working from within the administration to curb Trump’s most dangerous impulses.
“My phone has been ringing off the hook with requests to do that analysis and I just don’t have the time,” says Duquesne University computer and language scientist Patrick Juola.
Solving Murders by Examining Language
Robert Leonard, a Hofstra University linguistics professor who has helped solve murders by examining language, says if experts could get the right number of writing samples from officials whose identities are known, “an analysis could certainly be done.”
“Language is a set of choices. What to say, how to say and when to say it,” Juola says. “And there’s a lot of different options.”
One favorite technique of Juola and other experts is to look at what are called “function words.” These are words people use all the time but that are hard to define because they provide grammatical function more than meaning. Some examples are “of,” “with,” “the,” “a,” “over” and “and.”
Using Words in the Same Frequency
“We all use them but we don’t use them in the same way,” Juola says. “We don’t use them in the same frequency.” The same goes for apostrophes and other punctuation.
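To make the idea of function-word frequency concrete, here is a minimal sketch in Python. It is not the software Juola or any other expert uses. The word list is simply the six examples named above, and the function name and sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Illustrative list only: the six function words the article names.
# Real analyses would likely use a much longer list (assumption).
FUNCTION_WORDS = ["of", "with", "the", "a", "over", "and"]


def function_word_profile(text, words=FUNCTION_WORDS):
    """Return each function word's share of all word tokens in `text`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {w: counts[w] / total for w in words}


# Made-up sample sentence, not taken from any real author.
sample = "The cat sat on the mat with a hat and the dog."
profile = function_word_profile(sample)
for word, share in profile.items():
    print(f"{word:>5}: {share:.3f}")
```

Two writers covering the same topic would likely produce different numbers here, and that difference is the signal analysts look for.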
For example, do you say “different from” or “different than?” asks computer science and data expert Shlomo Argamon of the Illinois Institute of Technology.
Women tend to use first- and second-person pronouns (“I,” “me” and “you”) and the present tense more often, Argamon says. Men use “the,” “of,” “this” and “that” more often, he says.
There’s even a website based on Argamon’s research that tries to determine whether a writer is male or female: http://hackerfactor.com/GenderGuesser.php. Argamon calls it just a toy, and the site says it isn’t perfect. In fact, several female writers at The Associated Press were called male, as was the writer of the Times’ opinion piece.
“You look for clues and you try to assess the usefulness of those clues,” Argamon says. But he is less optimistic that the Trump opinion piece case will be cracked. His reasons include the New York Times’ editing for style and possible efforts to fool language detectives with words that someone else likes to use, such as “lodestar.” Mostly, he’s pessimistic because a proper comparison requires samples from all suspects, and those samples have to be similar in kind: all opinion columns, for example, as opposed to novels, speeches or magazine stories.
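The comparison Argamon describes, a disputed text set against samples from every suspect, can be sketched roughly as follows. This is an assumption-laden toy, not any expert’s actual method: the word list, the distance measure (a simple sum of absolute differences) and the candidate names and texts are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical word list: the article's examples plus "i" and "but".
FUNCTION_WORDS = ["of", "with", "the", "a", "over", "and", "i", "but"]


def profile(text):
    """Relative frequency of each function word, as a list of floats."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Sum of absolute differences between two profiles (smaller = closer)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_candidates(disputed, samples):
    """Return candidate names ordered from closest to farthest profile."""
    target = profile(disputed)
    scored = [(distance(target, profile(text)), name)
              for name, text in samples.items()]
    return [name for _, name in sorted(scored)]


# Made-up texts standing in for known writing samples.
samples = {
    "author_a": "the the the of of and",
    "author_b": "i but i but with a",
}
ranking = rank_candidates("the of the and the", samples)
print(ranking)
```

A ranking like this only says which sample is closest, not that the match is right. With texts this short, or with samples from different genres, the result would mean little, which is the caution Argamon raises.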
Trying to Throw off Investigators With Words
Rachel Greenstadt at Drexel University studies how people try to throw off investigators by using words they don’t normally use or by deliberately misspelling words. She says her first instinct is that the word “lodestar,” one Vice President Mike Pence has used several times, is “a red herring.” It seems too deliberate.
Greenstadt says language analysis “could kind of contribute to the picture” of who wrote the Times’ opinion piece, but she adds, “by itself, I’d be concerned to use it.”
Still, under the right conditions, words matter.
Juola has testified in about 15 trials and handled even more cases that never made it to court. His biggest case was in 2013, when a British newspaper got a tip that the book “The Cuckoo’s Calling” by Robert Galbraith was really written by Harry Potter author J.K. Rowling. In about an hour, Juola fed “The Cuckoo’s Calling,” two Rowling books and six other novels into his computer, analyzed the language patterns with four different systems and concluded that Rowling did it.
A couple of days later, Rowling confessed.
It was far from the first time that language use fingered the real culprit. The Unabomber’s brother identified him because of his distinctive writing style. Field pioneers helped find a kidnapper who used the unique term “devil strip” for the grassy area between the sidewalk and road. The phrase is used only in parts of Ohio.
Words Are Poker Tells
Even in politics, words are poker tells. In 1996, the novel “Primary Colors” about a Clintonesque presidential candidate set Washington abuzz trying to figure out who was the anonymous author. An analysis by a Vassar professor and other work pointed to Newsweek’s Joe Klein and he finally admitted it.
Juola says experts in the field can generally tell introverts from extroverts, men from women, education level, age, location, almost everything but astrological sign.
“The science is very good,” Juola said. “It’s not quite DNA. It’s actually considered by some scientists to be considered the second-most accurate form of forensic identification we have because it is so good.”