Data visualization is the process of representing data visually. These representations typically take the form of graphs or charts, which people use to interpret data and draw useful insights from it. While data has been represented visually for centuries, the technological advances of the late twentieth and early twenty-first centuries have yielded a large assortment of powerful programming tools and software applications that make visualizing data easier, less time-consuming, and more effective.
This TED Talk is a great introduction to how visualization can be used for practical purposes.
Charts and graphs can make it easier for humans to interpret large quantities of information. This allows us to draw meaningful insights from data much more quickly and with less effort than by scrolling through tables of data. Data visualization makes it possible for us to perceive trends in data as well as to understand how data are related.
Data can be visualized through the use of programming languages and software applications.
Programming languages are powerful tools that allow users to control virtually every aspect of their visualizations. That control comes at a price, as programming languages require users to understand at least the fundamentals of computer programming in addition to the specialized semantics and techniques unique to each separate language used to create visualizations.
Software applications, on the other hand, come with a much lower barrier to entry. They usually employ drag-and-drop and point-and-click graphical user interfaces that do not require specialized knowledge to be used effectively. Some visualization software is also quite powerful and can be used to create complex visualizations quickly and easily. Users should note, however, that visualization software offers a predefined set of options, which limits how much of a visualization can be customized.
It is possible to create visualizations that obscure rather than clarify, ultimately weakening a data scientist's ability to communicate insights drawn from data. Bad visualizations use a chart type that is poorly suited to the information when a better alternative is available, include too many variables or data types, or use inappropriate shapes and colors.
It is also important to keep in mind the intended audience of the visualization. When making visualizations for people who are unfamiliar with the source data or subject material, data scientists should always aim to create effective visuals that can be easily interpreted. This can be achieved with the right combination of aesthetic elements. The elements used will depend on the data being visualized and the purpose of the visualization.
Aesthetic Elements to Consider:
Examples: This example uses the popular "Iris" training data set, which contains measurements of three different species of iris flowers. Below is a scatter plot generated using the ggplot2 package in the R programming language.
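A scatter plot of this kind can be produced with a few lines of R. The following is a minimal sketch rather than the exact code behind the original figure: it assumes the ggplot2 package is installed and uses the `iris` data frame that ships with base R. The variable name `petal_plot` and the axis labels and title are illustrative choices.

```r
# Load ggplot2 (assumed to be installed, e.g. via install.packages("ggplot2"))
library(ggplot2)

# `iris` is built into R: 150 flowers, four measurements each, plus Species
petal_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +                       # one point per flower
  labs(
    x = "Petal length (cm)",
    y = "Petal width (cm)",
    title = "Iris petal dimensions"    # illustrative title
  )

# Draw the plot on the active graphics device
print(petal_plot)
```

Here `aes()` maps columns of the data to visual properties (the x and y positions), and `geom_point()` chooses points as the mark used to draw each row.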
The visualization reveals a clear positive relationship between the petal length and petal width of the iris flowers: generally, longer petals are also wider. Below is an improved visualization which colors the data points by species.
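Coloring by species requires only one additional aesthetic mapping. The sketch below again assumes ggplot2 is installed and uses R's built-in `iris` data frame. The name `species_plot` and the labels are illustrative, and the colors are ggplot2's defaults rather than those of the original figure.

```r
# Load ggplot2 (assumed to be installed)
library(ggplot2)

# Mapping Species to color gives each of the three species its own hue
species_plot <- ggplot(
  iris,
  aes(x = Petal.Length, y = Petal.Width, color = Species)
) +
  geom_point() +
  labs(
    x = "Petal length (cm)",
    y = "Petal width (cm)",
    color = "Species"                  # legend title
  )

# Draw the plot on the active graphics device
print(species_plot)
```

Because `Species` is a categorical column, ggplot2 assigns a discrete color scale and adds a legend automatically.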
As we can see, assigning a color to each species reveals that some species tend to have longer and wider petals than others. This is just one simple example of how effective visualization techniques can lead to new and significant insights from data.
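The pattern seen in the colored plot can also be checked numerically. As a rough sketch, the base R call below computes the mean petal length and width for each species in the built-in `iris` data frame. The name `species_means` is an illustrative choice.

```r
# Mean petal length and width per species, using base R only
species_means <- aggregate(
  cbind(Petal.Length, Petal.Width) ~ Species,
  data = iris,
  FUN = mean
)

# One row per species, with the two averaged measurements as columns
print(species_means)
```

A table like this confirms what the plot shows at a glance, which is part of the case for visualization: the same differences are visible in the chart without reading any numbers.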