
What Is a Corpus and Why Should I Care?

Submitted on December 7, 2009 – 7:18 pm | 3 Comments

Key to Forensic Linguistics is the idea that everyone’s language contains an identifiable set of words and habits, and that those features are more or less unique to each of us.
For example, I habitually misspell certain words by reversing letters. My i’s and my e’s in particular are always the wrong way round, so I have to spell-check before posting. But if you see something I’ve posted ‘on the fly’, you may find that I’ve spelled because as ‘becuase’ or their as ‘thier’, among other things.
You might say that it’s simply a spelling mistake, and a very common one at that. But if the mistake is a consistent element of someone’s written style, and they choose not to correct it with a spell checker, it can sometimes be enough to identify them.
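
To make the idea a little more concrete, here is a minimal Python sketch of how you might count habitual misspellings in a piece of writing and compare two texts. The marker list and the function names (`MARKERS`, `misspelling_profile`, `shared_markers`) are my own assumptions for illustration, not an established forensic method, and a real analysis would need far more markers and far more text.

```python
import re
from collections import Counter

# Hypothetical marker list for illustration: habitual misspelling -> intended word.
MARKERS = {"becuase": "because", "thier": "their"}


def misspelling_profile(text, markers=MARKERS):
    """Count how often each known misspelling appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in markers)
    return dict(counts)


def shared_markers(profile_a, profile_b):
    """Return the misspellings that occur in both profiles, sorted."""
    return sorted(set(profile_a) & set(profile_b))


if __name__ == "__main__":
    known = misspelling_profile("I left becuase thier car was late, becuase of traffic.")
    unknown = misspelling_profile("It was thier idea.")
    print(known)
    print(shared_markers(known, unknown))
```

A shared marker on its own proves very little, since common mistakes like these turn up in many people’s writing. It only becomes interesting alongside other consistent features.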

Other markers include substituted words: words mixed up with others of similar meaning, or swapped for their complete opposites.  That’s the basic idea, anyway ;)

Corpus = the internal dictionary we all use?

In some ways, you could consider the corpus as your internal dictionary.  Each of us should have a unique one, or at least a corpus with identifiably unique features.

The more widely accepted definition of a corpus is broader: a body of texts that serves as a sample of the language it is supposed to represent.  But I believe each writer has their own body of work and therefore, in some ways, their own comparable ‘corpus’.
My first paper on the concept is coming soon, but hopefully this basic definition will help ;)
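
If you want to play with the idea of a per-writer ‘corpus’, here is a rough Python sketch that treats each writer’s collected texts as a small corpus, turns it into relative word frequencies, and compares two such profiles with cosine similarity. The function names and the choice of cosine similarity are my own assumptions for illustration. This is a toy comparison, not a validated authorship test.

```python
import math
import re
from collections import Counter


def build_corpus_profile(texts):
    """Relative word frequencies across all of one writer's texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity of two frequency profiles (0.0 if either is empty)."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    writer_a = build_corpus_profile(["I left becuase it was late.", "It was thier idea."])
    writer_b = build_corpus_profile(["We stayed because it was early."])
    print(cosine_similarity(writer_a, writer_b))
```

Scores closer to 1 mean the two profiles use words in more similar proportions. With samples this small the number means very little, which is one reason a corpus needs to be large enough to represent the writer.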
