Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999). In other words, it is “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999).
The volume of information available today is growing at an incredible rate, and it holds vast potential in various economic fields.
TDM algorithms harbour vast potential for nearly all scientific fields and for a wide diversity of practical industrial and societal applications. However, this broad potential and the variety of TDM applications complicate any overview of the landscape, as even the technology itself is referred to by different terms in different fields. In the business and managerial context, TDM can be referred to as Business Intelligence Solutions or Qualitative Data Analysis.
To highlight the central position of vast quantities of data, the term Big Data (Processing) is used. When the focus is on the discovery of hitherto undiscovered information, Content Mining appears to be the term of choice, while Data Analytics and Text Analytics emphasise the data-driven aspect of performing analyses and the search for analytical solutions to challenges. In the academic world, TDM is often considered to be composed of Machine Learning methods coupled with Exploratory Data Analysis methods such as visualisation. Machine Learning, in turn, arose from the post-war fields of Artificial Intelligence, Information Theory, and Pattern Recognition.