Introduction: Establishing a business case for introducing and developing a data quality management program is often predicated on the extent to which data quality issues impact the organization and the return on the investment in data quality improvement. Today, most organizations use data in two ways: transactional/operational use (“running the business”), and analytic use (“improving the business”). When the results of analysis permeate the operational use, the organization can exploit discovered knowledge to optimize along a number of value driver dimensions.
Both usage scenarios rely on high quality information, suggesting the need for processes to ensure that data is of sufficient quality to meet all the business needs. Therefore, it is of great value to any enterprise risk management program to incorporate a program that includes processes for assessing, measuring, reporting, reacting to, and controlling different aspects of risks associated with poor data quality.
Summary: While we often resort to specific examples where flawed data has led to business problems, there is frequently real evidence of hard impacts directly associated with poor quality data. Anecdotes help to motivate and raise awareness of data quality as an issue. However, developing a performance management framework that helps to identify, isolate, measure, and improve the value of data within the business contexts requires correlating business impacts with data failures and then characterizing the loss of value that is attributable to poor data quality. This requires some exploration into assembling the business case, namely:
• Reviewing the types of risks and costs relating to the use of information,
• Considering ways to specify data quality expectations,
• Developing processes and tools for clarifying what data quality means,
• Defining data validity constraints,
• Measuring data quality, and
• Reporting and tracking data issues.
Given these aspects of measurement, one can materialize a data quality scorecard that measures data quality performance.
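As an illustration only, the relationship between validity constraints, measurement, and a scorecard might be sketched as follows in Python. Everything named here is an assumption made for the example and does not come from the source: the `ValidityRule` class, the `scorecard` function, the field names (`customer_id`, `email`, `age`), the three rules, and the sample records are all hypothetical. The sketch simply reports, for each rule, the fraction of records that satisfy it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A record is a plain dictionary of field name -> value (hypothetical layout).
Record = Dict[str, object]


@dataclass(frozen=True)
class ValidityRule:
    """A named data validity constraint (hypothetical helper, not from the source)."""
    name: str
    check: Callable[[Record], bool]


def scorecard(records: List[Record], rules: List[ValidityRule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that satisfy it."""
    total = len(records)
    scores: Dict[str, float] = {}
    for rule in rules:
        passed = sum(1 for record in records if rule.check(record))
        # With no records there are no violations, so the score is reported as 1.0.
        scores[rule.name] = passed / total if total else 1.0
    return scores


# Illustrative constraints; real expectations would come from the business.
RULES = [
    ValidityRule("customer_id present", lambda r: bool(r.get("customer_id"))),
    ValidityRule("email contains '@'", lambda r: "@" in str(r.get("email") or "")),
    ValidityRule(
        "age between 0 and 120",
        lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    ),
]

# Made-up sample data, including a few deliberately invalid values.
SAMPLE = [
    {"customer_id": "C1", "email": "a@example.com", "age": 34},
    {"customer_id": "", "email": "b@example.com", "age": 51},
    {"customer_id": "C3", "email": "not-an-email", "age": 240},
    {"customer_id": "C4", "email": None, "age": 19},
]

if __name__ == "__main__":
    for rule_name, score in scorecard(SAMPLE, RULES).items():
        print(f"{rule_name}: {score:.0%}")
```

Tracking such per-rule pass rates over time is one simple way the reporting and tracking step could feed a scorecard. Linking each rule to a business impact category would be a further step that this sketch does not attempt.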
Conclusion: The objective of designing a business impact hierarchy for data quality issues is twofold. First, incrementally classifying impacts into small pieces for analysis makes determining how poor data quality impacts our business processes a much more manageable task. Second, the categorical hierarchy of impact areas will naturally map to a future performance reporting structure for gauging improvement. As one identifies where poor data quality impacts the business, one also identifies where data quality improvement will improve the business, and this provides a solid framework for quantifying measurable performance metrics that will eventually be used to craft key data quality performance indicators.
openUP (July 2007)
When a new discipline emerges, it usually takes some time and a great deal of academic discussion before concepts and terms become standardized. Text mining is one such new discipline. In a groundbreaking article, Untangling text data mining, Hearst (1999) tackled the problem of clarifying text-mining concepts and terminology. This article is aimed at building on Hearst’s ideas by pointing out some inconsistencies and inaccuracies and suggesting an improved and extended categorization of data-mining and text-mining approaches. Until recently, computer scientists and information system specialists concentrated on the discovery of knowledge from structured, numerical databases and data warehouses. However, much, if not the majority, of available business data are captured in text files that are not overtly structured, for example memoranda and journal articles that are available electronically. Bibliographic databases may contain overtly structured fields, such as author, title, date and publisher, as well as free text, such as an abstract or even full text. The discovery of knowledge from database sources containing free text is called ‘text mining’.
Web mining is a wider field than text mining because the World-Wide Web also contains other elements, such as multimedia and e-commerce data. As the Web continues to expand rapidly, Web mining becomes more and more important (and increasingly difficult). Although text mining and Web mining are two different fields, it must be borne in mind that a great deal of the content on the Web is text-based. ‘It is estimated that 80% of the world’s online content is based on the text’ (Chen 2001:18). Therefore, text mining should also form an important part of Web mining.
References:
Albrecht, R. and Merkl, D. 1998. Knowledge discovery in literature data bases. Library and Information Services in Astronomy III. (ASP conference series, vol. 153.) Online. Available WWW: http://www.stsci.edu/stsci/meetings/lisa3/albrechtr1.html (Accessed 20 August 2002).
Berson, A. and Smith, S.J. 1997. Data warehousing, data mining, and OLAP. New York: McGraw-Hill.
Biggs, M. 2000. Resurgent text-mining technology can greatly increase your firm’s ‘intelligence’ factor. InfoWorld 11(2):52.
Chen, H. 2001. Knowledge management systems: a text mining perspective. Tucson, Arizona: University of Arizona (Knowledge Computing Corporation).
Cornford, T. and Smithson, S. 1996. Project research in information systems: a student’s guide. Houndmills: Macmillan. (Information system series.)
Halliman, C. 2001. Business intelligence using smart techniques: environmental scanning using text mining and competitor analysis using scenarios and manual simulation. Houston, TX: Information Uncover.
Han, J. and Kamber, M. 2001. Data mining: concepts and techniques. San Francisco, CA: Morgan Kaufmann.
Hearst, M.A. 1999. Untangling text data mining. In: Proceedings of ACL’99: the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, June 20–26 (invited paper). Online. Available WWW: http://www.ai.mit.edu/people/jimmylin/papers/Hearst99a.pdf (Accessed 20 August 2002).
Hovy, E. and Lin, C.Y. 1999. Automated text summarization in SUMMARIST. In Mani, I. and Maybury, M.T. (eds.) Advances in automated text summarization. MIT Press, MA:81–94. Online. Available WWW: http://www.isi.edu/~cyl/ (Accessed 24 June 2003).
Kontos, J., Malagardi, I., Alexandris, C. and Bouligaraki, M. 2000. Greek verb semantic processing for stock market text mining. In Proceedings of Natural