Texts as networks: How many words are sufficient to identify an author?

phys.org | 11/30/2016 | Staff

People are more original than they think. That is the suggestion of a stylometric method for analyzing literary texts, proposed by scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences. An author's individuality can be seen in the connections between no more than a dozen words in an English text. In Slavic languages, it turns out, authorship identification requires even fewer words and is more certain.

The researchers sought a way to verify the authorship of historical texts known only from fragments, to identify plagiarism, and to address similar problems. In many such cases, traditional stylometric methods fail or do not lead to sufficiently reliable conclusions. In the journal Information Sciences, scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow now present their own statistical tool for stylometric analysis. Built on graphs, it analyzes the structure of texts in a qualitatively new way.
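The article does not say how the graphs are built, so the following is only a minimal sketch under one common assumption: a word-adjacency network, in which each distinct word is a node and two words are linked whenever they appear next to each other in the text. The function names `word_adjacency_graph` and `degree` are hypothetical and are not taken from the researchers' tool.

```python
from collections import defaultdict


def word_adjacency_graph(text):
    """Build a weighted, undirected word-adjacency graph (an assumed model).

    Nodes are lowercase words; an edge's weight counts how many times
    the two words occur next to each other in the text.
    """
    # Crude tokenization: split on whitespace and trim common punctuation.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    words = [w for w in words if w]

    graph = defaultdict(lambda: defaultdict(int))
    for first, second in zip(words, words[1:]):
        if first != second:  # ignore immediate repetitions of the same word
            graph[first][second] += 1
            graph[second][first] += 1
    # Convert to plain dicts so that lookups do not create new nodes.
    return {word: dict(neighbors) for word, neighbors in graph.items()}


def degree(graph, word):
    """Number of distinct words that appear next to `word`."""
    return len(graph.get(word, {}))


sample = "the cat saw the dog and the cat ran"
network = word_adjacency_graph(sample)
print(degree(network, "the"), degree(network, "cat"))
```

In a sketch like this, the pattern of links around a handful of frequent words is the kind of structural feature that could then be compared between texts.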

"The conclusions of our research are, on the one hand, encouraging. They indicate that the individuality of any person manifests itself clearly in the way they use a surprisingly small number of words. But there is also a dark side. Since it turns out that people are so original, it will be easier to identify individuals by their statements," says Professor Stanislaw Drozdz of Cracow University of Technology.

Stylometry, the science dealing with the statistical characteristics of the style of texts, is based on the observation that each person uses the same language in a slightly different way. Some have a broader vocabulary, others a narrower one; some favor certain phrases and make mistakes, while others avoid repetition and are linguistic purists. In written text, people also differ in how they use punctuation. In the typical stylometric approach, the basic features of a text are usually examined, including the frequency of...