PhD Position Open for International Students at Nislabs, Norway

Computational approaches to authorship analysis usually focus on structural characteristics and linguistic patterns in a body of text (DeVel et al. 2001). While these approaches provide some forensic capabilities, there remains an urgent need for a way to discern the ideas and intentions conveyed in the text and to use those qualities to help determine authorship (Stokar and Franke 2008).

Recently developed text analysis techniques may offer a feasible way to automatically compute a semantic representation of text. Generative probabilistic models of text corpora, such as Latent Dirichlet Allocation, use mixtures of probabilistic “topics” to represent the semantic structure underlying a document (Blei et al. 2003). Each topic is a probability distribution over words, and the gist or theme of a document is represented as a probability distribution over those topics. Studies suggest that topic models give a better account of the properties of human semantic memory than latent semantic analysis models, which represent each word as a single point in a semantic space (Griffiths et al. 2007).

When considering how to use this capability for authorship analysis, it is important to recognise that many factors influence the ideas present in a document, even when that document has just a single author. In particular, factors related to social identity, such as age, gender, ideology, and beliefs, play an important role in communication behaviours. If the attributes of these factors could be teased out of a document, they might provide a valuable “fingerprint” to facilitate author analysis. With the proposed research we aim to extend our previous qualitative research (Stokar and Franke 2008): we will study a computational approach to extracting social identity group fingerprints and adapt it to forensic author attribution.
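
The topic-model representation described above, in which each topic is a probability distribution over words and each document is a probability distribution over topics, can be illustrated with a small sketch. This is a toy example only, not the method proposed for the project: the vocabulary, the topic names "finance" and "domestic", and all probabilities are invented for illustration, and no inference is performed. It simply shows how a document's word probabilities follow from its topic mixture.

```python
# Toy illustration only: vocabulary, topic names, and probabilities are invented.
vocabulary = ["money", "bank", "transfer", "family", "home", "children"]

# Each topic is a probability distribution over the vocabulary (each list sums to 1).
topics = {
    "finance": [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],
    "domestic": [0.02, 0.02, 0.01, 0.35, 0.30, 0.30],
}

# The gist of a document is a probability distribution over topics.
doc_topic_mixture = {"finance": 0.7, "domestic": 0.3}


def word_distribution(mixture, topics):
    """Return p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    n_words = len(next(iter(topics.values())))
    probs = [0.0] * n_words
    for name, weight in mixture.items():
        for i, p in enumerate(topics[name]):
            probs[i] += weight * p
    return probs


doc_word_probs = word_distribution(doc_topic_mixture, topics)
for word, p in zip(vocabulary, doc_word_probs):
    print(f"{word:10s} {p:.3f}")
```

In a real setting the topic and mixture distributions would be estimated from a document collection rather than written by hand, and the per-document topic mixture is the kind of quantity that could plausibly serve as part of a "fingerprint".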

The proposed research will study how social identity group “fingerprints” can be extracted from a document collection and how qualitative author attribution can be performed automatically in order to identify the writer or distributor of a questioned document. The research involves partners from forensic investigation services, namely Økokrim in Oslo (Dr. Thomas Walmann) and the Netherlands Forensic Institute (NFI) in The Hague (Dr. Cor Veenman).