The Role of Visualization in Cyber Intelligence
This is the first post in our blog series, “Visualizing Cyber Intelligence.”
The weapons have changed, the battlefield is different, but the war rages on. As technology has advanced, the war between cybersecurity professionals and cybercriminals has changed shape every few years. The criminals are becoming more sophisticated, forcing security professionals to continually step up their game.
In the early days, computer worms would attack thousands of systems and make headlines simply because of the scale of the attack. They are now passé. Laser-focused attacks on companies’ confidential data are now the norm. The recent Target data breach is one example, in which the personal information of about 70 million customers was compromised. In fact, no enterprise is completely secure. Every enterprise has dealt with a data breach at some time. And it’s not just large enterprises: Symantec reports that 31% of attacks are on companies with fewer than 250 employees. Businesses today can’t afford to compromise on the cyber intelligence that equips their security professionals.
Security professionals have a hard time foreseeing new attacks because each one is unique. The amount of data to be analyzed has also increased drastically. It is not only the volume: the variety of sources and data types has made analysis difficult too. Big data technologies are making it easier to capture and store data. However, analysis is still a challenge.
Security Visualization is the Dark Horse
Raffael Marty, author of Applied Security Visualization, sees data visualization as the solution to this problem. In his book, he begins by talking about the problems plaguing security visualization today. These visualizations are either the work of designers with no background in security, or of security professionals who don’t understand data visualization. One is beautiful but not effective in getting work done, and the other is effective, but rather clunky.
But what is it about visualizations that makes them ideal to solve the big problem of security? Let’s look at the example of Anscombe’s quartet to understand this. Below are four data series that can be analyzed by any statistics tool.
Now, these data sets have nearly identical summary statistics: the same means, variances, correlations, and fitted regression lines, to within a couple of decimal places. However, plotting them as charts makes their distinct patterns obvious.
You can effortlessly notice the different patterns in each data set. This simple example shows the power of a visual when analyzing data.
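To check the numbers yourself, here is a minimal Python sketch that computes the summary statistics for the four series. The values are the standard published ones for Anscombe’s quartet. The names `QUARTET` and `summarize` are ours, chosen purely for illustration, and the code uses only Python’s standard library.

```python
from statistics import mean, variance

# Anscombe's quartet: series I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the mean, sample variance, correlation, and least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
              f"corr={s['corr']:.2f} line: y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

The four printed rows should be almost indistinguishable, which is exactly the point: the summary numbers alone give no hint of how differently the four series behave.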
Security visualization, or SecViz as Marty calls it, lies at the intersection of four major fields of study: security, statistics, computer science, and data visualization. Security professionals are well-versed in the first three disciplines. However, their level of competence in data visualization can be surprisingly low.
Take, for example, the two security visualizations below.
Left: This happens when you try to use every shiny new feature of the visualization tool and ignore the purpose of the visual. Sometimes flashy looks become such a priority that they get in the way of good design.
Right: As is often the case, this dashboard seems like it’s been bolted on to a security tool as a trivial afterthought.
These examples break the elementary rules of data visualization (which we’ll be covering in this series), and as a result, hamper the work of a security professional. These approaches should be avoided, and replaced with a sound understanding of how data visualization works. That’s the goal of this series, “Visualizing Cyber Intelligence.” It will equip you with vital skills in data visualization that you can use daily as you fight cybercrime.
Edward Tufte’s Concept of “Chartjunk”
Let’s begin by looking at one of the core principles behind the work of Edward Tufte, author of a classic data visualization book, The Visual Display of Quantitative Information. In it, he defines graphical excellence as that which gives the viewer “the greatest number of ideas in the shortest time with the least ink in the smallest space.” Many charts use decoration and interactive features that distract the viewer from the actual data. Tufte frowned upon this approach and termed these distractions “chartjunk.” He devotes a major part of his book to fighting chartjunk.
Tufte suggests that an effective way to avoid chartjunk is to reduce the amount of “non-data ink.” Data ink is the part of a chart that represents the data itself. Non-data ink covers elements such as textures and patterns, background gridlines, 3D enhancements, garish font styles, and the like. He gives the following example of a chart loaded with chartjunk, and then a better representation of the same data without the chartjunk.
They both plot the same data, but the second chart is a lot easier to analyze. All of us come across charts like the first one regularly. Going forward, look out for charts with excessive chartjunk. Consider how you can prune the charts you use daily to remove chartjunk and highlight the data they represent.
Tufte’s Sparkline Charts
Tufte is considered a pioneer in the field of data visualization. Perhaps his most significant contribution to the field has been sparkline charts. Sparklines are tiny, word-sized charts that can be embedded within a paragraph of text. They are a great example of maximizing data ink and minimizing chartjunk. They’re widely used in stock market dashboards to plot the trend of many stock tickers in limited screen space.
Think about how you can use sparklines to replace bigger charts in your dashboards. They can save you time by reducing clicks within a dashboard, and give you more information more quickly.
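As a rough illustration of how little a sparkline needs, the Python sketch below renders one as plain text using Unicode block characters. This is a common shortcut for terminals and log views, not Tufte’s own method. The `sparkline` function and the hourly failed-login counts are made up for this example.

```python
# Eight Unicode block characters, from lowest to highest bar.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight bar heights, scaled to the series' own range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale against; draw a mid-height line.
        return BARS[len(BARS) // 2] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


if __name__ == "__main__":
    # Hypothetical data: failed logins per hour over half a day.
    failed_logins = [3, 4, 2, 5, 4, 6, 41, 38, 7, 5, 4, 3]
    print("Failed logins (last 12h):", sparkline(failed_logins))
```

Because each value becomes a single character, a whole column of these fits in a table cell or a log line. The trade-off is resolution: eight levels are enough to show a spike or a trend, but not to read off exact values.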
To conclude, the principle of maximizing data ink and minimizing non-data ink can save you from many elementary mistakes in data visualization.
Stick around for the next post in this series. We’ll be discussing Ben Shneiderman’s “information-seeking mantra.”
Twain Taylor is a guest blogger for Recorded Future. You can find more insight by Twain about the intersection between data visualization and big data on the FusionCharts blog.