In the Information Age, it is not how much information a firm maintains but how that information is managed, manipulated, and exploited that can make or break the firm. Data mining is the practice of ferreting out useful knowledge from the wealth of information stored in computer systems, databases, communications records, financial and sales data, and other sources. A staple of the so-called Information Economy, data mining has evolved into a standard (and often requisite) business practice, and is often as valuable to firms as their underlying products or services. As competition heats up and rivals draw on a growing array of new information technologies, the firms best able to exploit data mining to derive insights for a business model or strategy are often those with a competitive edge.
Data mining combines expertise in data analysis with sophisticated pattern-searching software to crunch diverse mountains of data and churn out information designed to capture market share and boost profit margins. As the sheer wealth of information available escalated through the 1990s and early 2000s, such techniques assumed paramount importance. The focus of data mining is on organizing data and identifying patterns that translate into new understandings and viable predictions. Companies thus try to use data mining to discover relationships between data and phenomena that ordinary operations and routine analysis would otherwise overlook, and thereby identify squandered opportunity, redundancy, and waste.
Data mining combines features of various disciplines, particularly computer science, database management, and statistics, to map low-level data into more advanced and meaningful forms. In its truest form, data mining is part of the broader knowledge discovery from data (KDD) process, although the terms are often used interchangeably. KDD refers to the entire process of data warehousing, organization, cleansing, analysis, and interpretation. Colloquially, however, data mining stands for this entire process of deriving useful knowledge, using computational systems, from massive amounts of data.
Data-mining software systems are generally based on a combination of mathematical algorithms designed to seek out and organize information by variables and relationships. For instance, one common algorithm is called recursive partitioning regression (RPR). RPR processes all the variables chosen for a particular set of data and parses them for their explanatory power, that is, for the degree to which they account for variations in the data. In sifting through customer profiles, for example, the algorithm would isolate information such as personal incomes, education levels, sex, and so on.
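As a rough illustration of the idea behind recursive partitioning, the following Python sketch ranks variables by how much of the variation in a target figure a single split on each can account for, then repeats the process within each resulting group. It is a simplified sketch under stated assumptions, not the RPR algorithm as implemented in any particular product: the customer records, the field names (income, education_years, spend), and the function names are all hypothetical, and only numeric variables are handled.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, variable, target):
    """Return (threshold, share of target variation explained) for one variable."""
    total = sse([r[target] for r in rows])
    best = (None, 0.0)
    # Try every observed value except the largest, so both sides stay non-empty.
    for threshold in sorted({r[variable] for r in rows})[:-1]:
        left = [r[target] for r in rows if r[variable] <= threshold]
        right = [r[target] for r in rows if r[variable] > threshold]
        explained = (total - sse(left) - sse(right)) / total if total else 0.0
        if explained > best[1]:
            best = (threshold, explained)
    return best

def grow(rows, variables, target, max_depth=2):
    """Recursively split rows on whichever variable explains the most variation."""
    node = {"n": len(rows), "mean": mean(r[target] for r in rows)}
    if max_depth == 0 or len(rows) < 2:
        return node
    (threshold, share), variable = max(
        ((best_split(rows, v, target), v) for v in variables),
        key=lambda candidate: candidate[0][1],
    )
    if threshold is None:  # no split reduces the variation any further
        return node
    left = [r for r in rows if r[variable] <= threshold]
    right = [r for r in rows if r[variable] > threshold]
    node.update(
        variable=variable, threshold=threshold, explained=share,
        left=grow(left, variables, target, max_depth - 1),
        right=grow(right, variables, target, max_depth - 1),
    )
    return node

# Hypothetical customer profiles (income in thousands, spend per year).
rows = [
    {"income": 30, "education_years": 12, "spend": 200},
    {"income": 35, "education_years": 16, "spend": 220},
    {"income": 40, "education_years": 12, "spend": 210},
    {"income": 80, "education_years": 16, "spend": 900},
    {"income": 85, "education_years": 12, "spend": 880},
    {"income": 90, "education_years": 16, "spend": 920},
]
tree = grow(rows, ["income", "education_years"], "spend")
print(tree["variable"], tree["threshold"], round(tree["explained"], 3))
```

On this toy data, the first split falls on income rather than education, because dividing customers at an income of 40 accounts for nearly all of the variation in spending.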
The data-mining process is divided into three stages: data preparation, data processing, and data analysis. In the first stage, the data to be mined is selected and cleared of superfluous elements in order to streamline mining. In the second stage, the data is run through the algorithms at the heart of data mining, and characteristics and variables are identified and categorized, thereby transforming the data into broader, more meaningful pieces of information. In the final stage, the extracted information is analyzed for useful knowledge that can be applied to a business strategy.
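The three stages can be sketched as a small Python pipeline. This is only an illustrative outline under assumed data: the records, the field names, the age bands, and the function names (prepare, process, analyze) are hypothetical, and real systems would use far more elaborate algorithms in the processing stage.

```python
from collections import defaultdict

# Hypothetical raw records; "session_cookie" stands in for a superfluous field.
RAW = [
    {"id": 1, "age": 23, "spend": 120.0, "session_cookie": "a1"},
    {"id": 2, "age": 41, "spend": 560.0, "session_cookie": "b2"},
    {"id": 3, "age": None, "spend": 300.0, "session_cookie": "c3"},
    {"id": 4, "age": 37, "spend": 640.0, "session_cookie": "d4"},
    {"id": 5, "age": 29, "spend": 80.0, "session_cookie": "e5"},
]

def prepare(records, fields=("age", "spend")):
    """Stage 1: keep only the selected fields and drop incomplete records."""
    cleaned = []
    for record in records:
        row = {field: record.get(field) for field in fields}
        if all(value is not None for value in row.values()):
            cleaned.append(row)
    return cleaned

def process(rows):
    """Stage 2: categorize rows by age band and average the spend per band."""
    bands = defaultdict(list)
    for row in rows:
        band = "under_35" if row["age"] < 35 else "35_and_over"
        bands[band].append(row["spend"])
    return {band: sum(values) / len(values) for band, values in bands.items()}

def analyze(summary):
    """Stage 3: identify the band with the highest average spend."""
    return max(summary, key=summary.get)

summary = process(prepare(RAW))
top_band = analyze(summary)
print(summary, top_band)
```

Keeping each stage as a separate function mirrors the description above: the preparation step can be changed (for example, to select different fields) without touching the processing or analysis steps.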
Data-mining software was first developed in the late 1960s and 1970s as a way of tracking consumer-purchasing habits. Over the years, the application of data mining extended beyond retail to encompass larger-scale business practices, and was combined with advances in database management, artificial intelligence, computers, and telecommunications to constitute extremely powerful tools for knowledge extraction.
Traditionally, data mining was used primarily for categorized information; in other words, techniques and tools were designed to find relationships and patterns in masses of data that were already segmented into different categories via structured databases, such as a customer's age and residence. Later techniques greatly expanded the power of data mining by allowing for mining of unstructured text documents, such as e-mails, customer requests, and Web pages. In this way, data mining applies structure to loosely organized data, and highlights valuable information that might otherwise be missed. Moreover, this allows for the relevant extraction of information from documents that were assembled for any purpose, rather than specifically for the issue at hand, thereby increasing the efficiency of data flow and preventing the waste of potentially valuable information. This technique, known as text mining, creates a database of words that can be categorized and a sophisticated search engine to seek out those words and related alternatives.
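A minimal Python sketch of the text-mining idea follows: it builds a word database (an inverted index mapping each word to the documents that contain it) and a search function that also looks for related alternatives. The documents, the RELATED table of alternative terms, and the function names are hypothetical; production text-mining tools rely on much richer linguistic processing than this.

```python
import re
from collections import defaultdict

# Hypothetical unstructured documents.
DOCS = {
    "email_17": "Customer asked for a refund after the late delivery.",
    "ticket_4": "Shipment arrived late; client requests reimbursement.",
    "page_9": "Our spring catalog features new garden tools.",
}

# Hypothetical table of related alternatives for selected search terms.
RELATED = {"refund": {"reimbursement"}, "customer": {"client"}}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids in which it appears."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, term, related=RELATED):
    """Return sorted ids of documents containing the term or a related word."""
    term = term.lower()
    terms = {term} | related.get(term, set())
    hits = set()
    for candidate in terms:
        hits |= index.get(candidate, set())
    return sorted(hits)

index = build_index(DOCS)
print(search(index, "refund"))
```

Here a search for "refund" also surfaces the support ticket that mentions only "reimbursement", which is the kind of match a plain keyword search would miss.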
Often the first step toward data mining is building a data warehouse, a vast electronic database that contains and organizes the wealth of information collected. Without a data warehouse, companies lack the infrastructure to mine useful knowledge out of the data available. Like word processing programs and computer operating systems, data mining has grown more user-friendly and graphics-based as its application has spread throughout society to less technically inclined users. Software programs increasingly feature visualization techniques to dramatize specified data, relationships, and patterns.
Data mining has become a crucial component of customer management. The most common form of data mining begins with the accumulation of various kinds of customer profiles. These can take the form of simple names and addresses derived from other firms' customer lists and used for purposes of mass mailing, or they can constitute more sophisticated and comprehensive reports on consumer tastes and buying habits. Over time, firms amass great quantities of customer profiles through their own sales and through arrangements with other firms, and apply data-mining techniques to sift through them for clues as to how to adjust their strategies.
Whether to attract, service, or retain customers, businesses position data mining as the cornerstone of customer relations. Using advanced data-mining techniques, companies can determine what level of spending can be expected from a particular customer, the range of his or her tastes, the customer's likelihood of churning, and a range of other information useful for customer relations. In these ways, companies are better able to assess the value of their individual customers and adjust their resources accordingly. More broadly, they can derive comprehensive information on demographic patterns, such as differences in purchasing patterns between age groups, income levels, and ethnic backgrounds, to discover additional retention and cross-selling possibilities. In this way they can segment their customer bases into specialized marketing focuses. By shifting outreach, advertising, and service resources to capitalize effectively on their diverse clientele, firms can realize cost savings, better conversion rates, and higher margins.
In the e-commerce world, data mining carries an additional range of benefits. In particular, as e-commerce merchants worked to create the maximum amount of value out of what the Web has to offer, they moved to personalize products and services. The extraction of personal information allowed by data mining greatly facilitated this process. By plugging data-mining analysis into customer-service databases and their Web applications, companies can tailor products and services to accord with individual customers' habits and preferences, thereby maximizing value.
Companies use such technology to mine data from within their own ranks as well. Company computer systems and intranets are increasingly searched to retrieve information on key subjects that may have passed between employees at an earlier date, via e-mail transmissions, word processor files, and Web page searches. In addition to harnessing the knowledge buried in these communications, sifting software can also be used to evaluate employees' strengths and weaknesses over a period of time for a comprehensive assessment of each employee's performance. However, while such techniques are attractive to companies, they make privacy advocates nervous because they allow personal communications to be retrieved and possibly reviewed out of context.
The software industry responsible for data-mining programs was enjoying solid sales growth, which was expected to remain brisk in the early 2000s. The market research firm International Data Corp. estimated that the market for analytic application software would grow from $1.9 billion worldwide in 1999 to $5.2 billion in 2003, while the market for data-mining applications specifically would grow from $343 million in 1999 to $1.4 billion in 2004.
Meanwhile, more and more of the world's leading businesses were building data mining into their core operations in one way or another. Companies may perform data mining and analysis internally or outsource the job to the growing number of data-mining solutions providers. Forrester Research reported that the percentage of Fortune 1,000 firms that planned to incorporate data mining into their marketing strategies grew from 18 percent in 1999 to 52 percent in 2001. Forrester's findings also indicated that the most successful applications of data mining were realized by those firms that most thoroughly embedded data mining into their daily operations.