Research Guides: Introduction to Text Mining: Home

Text Mining Overview

What is Text Mining?

Text mining is the process of applying data mining methods to textual data to find patterns and connections that can lead to actionable insights. Several commercial and open-source text mining tools and applications are available on the web. This guide focuses mainly on text mining with the R statistical programming language. Programming languages are especially useful for working with large bodies of text, each of which is commonly referred to as a corpus (plural: corpora).

Why Mine Text?

Throughout human history, information has mainly been recorded in the form of text. The imagery of mining, distilling, or refining is often used to illustrate text mining processes. The general idea behind this imagery is that large amounts of raw data have little value until some kind of action has been performed on them to extract meaning. Data mining, like mining for precious minerals, is the act of sifting through a lot of unnecessary material to find the valuable parts. In the case of text mining, the sheer volume of text generated every day makes it extremely difficult for researchers to sift through it all using traditional knowledge-seeking methods. Text mining lets researchers work with huge amounts of text at once, pulling the desired information from a corpus without having to read through the text by hand. This speeds up the process dramatically, making it possible to analyze textual data on a scale that was previously impossible.

How is Text Mining Used?

Text mining methods are often used in conjunction with machine learning to analyze large volumes of text. These techniques help power everyday conveniences such as spam filters and recommendation systems. Businesses use text mining to inform their decision-making and to improve their customer service platforms. Law enforcement and intelligence agencies use text mining applications to help prevent cybercrime.