The information overload problem stems from the computerization of information in the information age, drawing on numerous data sources: health, insurance, finance, education and security, to name a few. Simply put, the data generated by everyday activity is of value to decision makers, analysts and individuals. Increasingly, however, we are faced with more information than we can handle.
In the face of this information flood, we become less informed, because the amount of data produced and stored has grown beyond our ability to extract meaningful information from it. According to Endsley (2000), there is an information gap: more data does not necessarily equal more information. To confront the information overload problem, there is a need to build systems that harness the remarkable perceptual and cognitive abilities of humans together with the computational power of modern computers.
The information overload problem is not, however, insurmountable. One way to address it is through visual analytics. The basic idea of visual analytics is to represent information visually, allowing the human to interact directly with the information, to gain insight, draw conclusions, and ultimately make better decisions. Involving the human in the loop, as opposed to relying on fully automatic techniques, reduces the risk of errors and biases and improves productivity and user acceptance (Endsley, Designing for Situation Awareness). The use of appropriate visual representations and metaphors to present information best supports the human cognitive process, as it reduces the complex cognitive work needed to perform certain tasks. Visual analytics is more than visualization alone; rather, it can be seen as an integral approach combining visualization, human-computer interaction and data analysis. Visual analytics tools and techniques are used to derive insight from massive, dynamic, and often conflicting datasets by providing timely, defensible, and comprehensible assessments.
For better-informed decisions, it is essential to include humans in the data analysis process, combining their flexibility, creativity and domain knowledge with the enormous storage capacity and computational power of today's computers. In general, visual analytics is defined as ``the science of analytical reasoning facilitated by interactive visual interfaces''. More precisely, visual analytics is an iterative process that involves information gathering, data pre-processing, knowledge representation, interaction and decision making. The ultimate goal is to gain insight into the problem at hand, which is described by vast amounts of scientific, forensic or business data from heterogeneous sources. To reach this goal, visual analytics combines the strengths of machines with those of humans: it unifies and integrates the previously independent approaches of visualization, statistics and data mining to address big data challenges in a holistic manner. The main idea of visual analytics is to create software that facilitates the analytical reasoning process by leveraging the human capacity to perceive, understand and reason about complex information and events.
The earliest known research following this idea was the research agenda for visual analytics, ``Illuminating the Path'' (Thomas and Cook, 2005), produced in the wake of the 9/11 terrorist attacks with a focus on US homeland security. Since then, visual analytics applications have been extended to solve complex big data problems in fields including health, government, astrophysics, cyber-security, education, transportation, business and finance.

% TODO: insert figure of the scope of visual analytics (Fig. 1) here

The scope of visual analytics (Fig. 1) combines approaches from information and scientific visualization, data management, statistical data mining and automated analysis, knowledge discovery, and human factors, which aid the communication between human and computer as well as the decision-making processes.

\section{Data Preparation}
In the modern world, we are faced with rapidly increasing amounts of data. This data is mostly stored in its raw state, without filtering or cleaning, which makes it unsuitable and error-ridden for the tasks ahead. The very need to clean data implies that we have dirty data.
Data often comes in a variety of forms that are difficult to work with, e.g. incomplete records, inconsistent formats and duplicated rows. Using data without cleaning it poses risks of poor data quality and uncertainty, as the data is inconsistent and messy. R. A. Fisher and C. R. Rao likewise stressed the cross-examination of data as an important stage before beginning analysis tasks. When dealing with incomplete data, novel methods are available which perform validation checks and apply advanced analytical techniques to help outline the quality of the data. Considering that the plan for the tasks ahead is drawn from this data, it is important for an analyst to check these steps. These methods are needed because they help spot missing values in the data and support the replacement of missing data (Svolba, 2015). Bad data quality is an enormous blow for an analyst: if the data is considered unfit for further processing (often a matter of data completeness), the project is postponed or cancelled. It also causes a lack of trust and a high level of uncertainty in the analysis results. Data quality can be improved using analytical and statistical methods; however, providing values for missing data can hide important facts, such as intentionally omitted data, or mask faulty sensors. A minimal sketch of such a validation-and-imputation step is given below.
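As an illustration, the validation and imputation steps above might look as follows in Python with pandas; the file name, its columns, and the idea of keeping an explicit missingness indicator are illustrative assumptions, not a prescription from the cited works.

\begin{verbatim}
# A minimal data-cleaning sketch using pandas; "records.csv" and its
# columns are hypothetical stand-ins for a real raw data source.
import pandas as pd

df = pd.read_csv("records.csv")

# Validation check: report the fraction of missing values per column.
print(df.isna().mean().sort_values(ascending=False))

# Remove exact duplicate rows, one common form of dirty data.
df = df.drop_duplicates()

# Replace missing numeric values with the column median. Because such
# imputation can hide important facts (e.g. a faulty sensor), the
# original missingness is preserved as an explicit indicator column.
for col in df.select_dtypes("number").columns:
    df[col + "_was_missing"] = df[col].isna()
    df[col] = df[col].fillna(df[col].median())
\end{verbatim}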
Tools such as OpenRefine support this kind of interactive data cleaning. As a result, analysts may be able to determine more easily when expected information is missing; sometimes the very fact that information is missing offers important clues in the assessment of a situation. More generally, there is an overlap between the fields in the visual analytics scope, which enables the seamless integration of infrastructure.

\section{Data Mining}
Large-scale data more often than not exceeds the limits of what can be shown on conventional desktop displays. To avoid issues such as cluttered displays or workspaces, data mining techniques such as filtering, aggregation, principal component analysis and other reduction methods are used to reduce or compress the amount of data viewed, since only a small portion can be displayed; a sketch combining reduction and aggregation is given below. Scalability in general is a key challenge in visual analytics, as it determines not only the computational techniques and algorithms but also the appropriate rendering techniques. Visual analytics is tasked with providing an overview of datasets while maximizing the amount of detail available at the same time, so that users can gain insights (Keim, Challenges in Visual Data Analysis).
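The reduction step can be sketched as follows: principal component analysis projects a wide synthetic table down to two displayable dimensions, and the projected points are then aggregated into grid cells so that only a bounded number of marks has to be rendered. The data here is synthetic, and scikit-learn is assumed to be available.

\begin{verbatim}
# Dimensionality reduction plus aggregation: PCA to 2-D, then a coarse
# 2-D histogram so that cells, not individual records, are rendered.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 50))     # 100k rows, 50 attributes

projected = PCA(n_components=2).fit_transform(data)  # shape (100000, 2)

counts, xedges, yedges = np.histogram2d(
    projected[:, 0], projected[:, 1], bins=64)
print(counts.shape)   # (64, 64): at most 4096 cells to draw
\end{verbatim}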
Using wall-sized displays, it is possible to compare several hundred such growth matrices side by side. However, some datasets, like those of Facebook, Google, IBM and various other organizations, contain billions of records, which could not be viewed even across many wall-sized displays. With no way to adequately explore these large datasets, which were collected because of their potential usefulness, the data becomes useless and the databases become data ``dumps''.

\section{Visualization Techniques}
Visual analytics aims to integrate the user in the exploration process by leveraging our perceptual and cognitive abilities to explore large datasets. Visual representations translate data into visual forms that highlight features in the data, such as anomalies and commonalities, and support user interaction so that these trends can be analyzed and understood. The goal is to present the user with appropriate visual representations or metaphors that closely match the information being represented, thereby enabling insight and hypothesis generation to aid the sensemaking process.
An important part of the process is not just analyzing data using different algorithms and combinations of them, but also determining which visualization components are best suited to analyzing a particular dataset.

\subsection{Geo-spatial Analysis}
This data type concerns the movement and/or position of objects on a map or chart. Sources include geographical measurements, GPS data and remote tracking sensors, and the data consists of two dimensions, longitude and latitude, plotted as x-y coordinates on a map or chart. Such data helps create a sense of spatial and situational awareness. Scalability poses a risk here too, as the number of data points to be visualized often causes cluttered views. To deal with this big data threat, data points are aggregated into units and then depicted by their density, encoded with colour or size, as sketched below.
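A sketch of this aggregation, assuming synthetic GPS points, is given below: the points are binned into a latitude/longitude grid, and each non-empty cell becomes one aggregated mark whose count can be encoded with colour or size.

\begin{verbatim}
# Density aggregation of point data: one million synthetic GPS points
# are reduced to a 40-by-40 grid of cell counts.
import numpy as np

rng = np.random.default_rng(1)
lat = rng.uniform(51.3, 51.7, size=1_000_000)   # synthetic latitudes
lon = rng.uniform(-0.5, 0.3, size=1_000_000)    # synthetic longitudes

density, lat_edges, lon_edges = np.histogram2d(lat, lon, bins=40)

cells = int((density > 0).sum())
print(f"{len(lat):,} points reduced to {cells} map cells")
\end{verbatim}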
\subsection{Temporal Analysis}
Temporal analysis seeks methods that exploit the temporal nature of real-world data to help identify patterns, trends and correlations of data elements over time. These data are animated against time to provide a narrative of a sequence of events and to show how certain elements evolve over time. Because temporal data is a function of time, we face complexities of scale: we may wish to look for trends at hourly, daily or monthly granularity, as well as trends that occur on a yearly basis. A sketch of such temporal re-scaling is given below.
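The multi-scale nature of temporal data can be illustrated with pandas resampling: the same synthetic hourly event series is summarised at daily and monthly granularity, with each resolution exposing different trends. The series itself is made up.

\begin{verbatim}
# Re-scaling a synthetic hourly event count to coarser time units.
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=24 * 365, freq="h")
events = pd.Series(np.random.default_rng(2).poisson(5.0, len(idx)),
                   index=idx)

daily = events.resample("D").sum()     # daily totals
monthly = events.resample("MS").sum()  # totals per calendar month
print(daily.head())
print(monthly.head())
\end{verbatim}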
\subsection{Network Analysis}
This type of data consists of objects, called actors or nodes, and connections between them, called edges, which together model real-life relationships. Such data is intended to show relationships between entities and is often displayed in a hierarchical format. Examples range from electric power grids and e-mail exchanges to social network communications, transportation networks and customer shopping behaviour. Basic measures of density, centrality and proximity help with the discovery of interesting insights in the data and can be used to make inferences, as in the sketch below.
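For instance, with the networkx library these measures are one call each; the small e-mail exchange graph below is invented for illustration.

\begin{verbatim}
# Basic network measures on a toy e-mail exchange graph.
import networkx as nx

g = nx.Graph([("ann", "bob"), ("ann", "carol"), ("bob", "carol"),
              ("carol", "dave"), ("dave", "erin")])

print("density:", nx.density(g))
print("degree centrality:", nx.degree_centrality(g))
print("closeness (proximity):", nx.closeness_centrality(g))
\end{verbatim}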
\subsection{Text Analysis}
Textual data consists of documents, multimedia web content and hypertext. Textual data types differ from most others in that they cannot easily be reduced to numbers, and therefore most standard visualization techniques cannot be applied directly. Word clouds, which help identify keywords, and document snippets are examples of ways to visualize the results of text-based information retrieval. The keyword weighting behind a word cloud can be sketched as below.
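A minimal sketch, with invented documents and an ad hoc stop-word list: term frequencies, minus common stop words, give the weights that a word-cloud renderer would map to font size.

\begin{verbatim}
# Keyword weights for a word cloud: stop-word-filtered term frequencies.
import re
from collections import Counter

docs = [
    "Visual analytics combines visualization and data analysis.",
    "Interactive visualization supports data exploration and insight.",
]
stop_words = {"and", "the", "a", "of", "to"}

tokens = [t for d in docs for t in re.findall(r"[a-z]+", d.lower())
          if t not in stop_words]
print(Counter(tokens).most_common(5))
\end{verbatim}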
\section{Interactive Techniques}
The path to exploratory information discovery was expressed by Shneiderman (1996) as: overview first, zoom and filter, then details on demand. These actions can also be described as the key framework of information foraging actions. Visual representations alone do not satisfy the user's analytical needs: interaction techniques are required to support the dialogue between the user and the data, for instance zooming in on particular subsets of the data or changing the underlying visual metaphor, and it is this dialogue that reveals insightful information. Interactive tools thus provide a mechanism of communication between users and visualization systems. Common interactive techniques in visual analytics applications include zooming and panning, dynamic filtering, brushing and linking, and details on demand; a minimal sketch of Shneiderman's mantra is given below.
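As a sketch, the mantra can be read as three operations over a tabular dataset; the column names below are illustrative assumptions.

\begin{verbatim}
# Shneiderman's mantra as three functions over a pandas DataFrame.
import pandas as pd

def overview(df):
    """Overview first: a coarse per-category summary of the dataset."""
    return df.groupby("category").size()

def zoom_and_filter(df, category, min_value):
    """Zoom and filter: restrict to one category above a threshold."""
    return df[(df["category"] == category) & (df["value"] >= min_value)]

def details_on_demand(df, row_id):
    """Details on demand: the full record for one selected item."""
    return df.loc[row_id]

df = pd.DataFrame({"category": ["a", "a", "b"],
                   "value": [1.0, 3.5, 2.2]})
print(overview(df))
print(zoom_and_filter(df, "a", 2.0))
print(details_on_demand(df, 1))
\end{verbatim}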
\section{Sensemaking}
% cf. Simon Attfield
Sensemaking is the cognitive process of developing interpretations of the world. The use of interactive visualizations in visual analytics is important in enabling the sensemaking process, as users continually interact with information systems to develop a mental model or picture of a problem domain or activity. Klein (1999) classified the human sensemaking process into two separate parts: naturalistic and normative.

\subsection*{Naturalistic Sensemaking}
Naturalistic sensemaking operates when the sensemaker reverses the perceived order of events, reasoning from one or more consequents to infer a possible outcome. The significance of this process lies in the ability to draw inferences from limited information. Observing certain visualizations can enable users to make predictions or hypotheses based on prior knowledge. Naturalistic sensemaking is, however, subject to false interpretations and biases.

\subsection*{Normative Sensemaking}
% N/A -- to be completed

The power of the sensemaking process comes from devising external aids that enhance our cognitive abilities. Visual analytics facilitates the reasoning process by visually representing information and allowing humans to interact directly with these representations to gain insights and conclusions that ultimately lead to better decision making.
Pirolli and Card (2005) explain that the sensemaking process takes place in two loops: the information foraging loop and the sensemaking loop. The information foraging loop is the process of manipulating and transforming data to reveal insights, whilst the sensemaking loop involves reviewing and organizing the insights generated in the information foraging loop for effective communication and action. During data analysis tasks, analysts engage in the derivation and confirmation of hypotheses by interactively exploring data using the various techniques listed above. Nowadays, the challenge is often not acquiring and analyzing data to derive new knowledge, but rather understanding and analyzing the results of our analyses
(Gladwell, 2009). The visualization pipeline model and the visual analytics process model focus on exploring and gaining insight into data. However, visual analytics systems offer little support for capturing findings (into evidence files), organizing those findings (into schemas), constructing arguments to validate hypotheses, and presenting these findings.