Visual Communication for Data Scientists

Svitlana Glibova
Published in Analytics Vidhya · 4 min read · Jul 18, 2021

Photo by Giorgio Tomassetti on Unsplash

One of the most visible roles a data scientist can play is that of a liaison between their company and its stakeholders. As the field continues to evolve, it is critical to keep in mind both the technical and the human elements of collecting, transforming, and presenting data. However technically minded we might be, all the Python in the world may not be the best way to explain to a stakeholder what our model does, how we arrived at it, what improvements it made, and, most importantly, why it’s worth adopting. Below are some important considerations for designing and implementing visualizations, as well as some examples of ways to use (and avoid using) the more common visualization elements.

According to the Harvard Business Review, data visualization should meet the following three criteria:

  1. It is designed with a clear audience and goal in mind: a good data scientist is not only data- and code-minded but people-minded as well. To get the most out of our data, we need a clear understanding of its context, and we can build some domain knowledge by answering questions such as what the data represents and at what scale, what problem needs to be solved, and for whom it is being solved. When designing visualizations, the aim is to communicate information to a specific group of people in a relatively short amount of time. Each group is unique and requires its own approach to be reached as effectively as possible: just as you don’t use the same language with your siblings as you do with your boss, you shouldn’t expect to communicate with every client in exactly the same way.
  2. It has a clear framework: the messaging of a visualization should be representative of the data it illustrates. Design elements such as lines, icons, and colors are not only stylistic choices but visual tools. To use these tools well, consider how people perceive scale, weight, color, placement, and symbolism. Think about how information is being highlighted and which features of the visualization stand out. Are they aligned with the concepts being presented, or are they arbitrary and confusing to the viewer?
    For example, heart icons of various sizes may be appropriate for cardiology data but are probably not the best choice for representing a company’s churn. Similarly, plotting two separate elements of a feature in a single bar plot may suggest that the two elements have a causal relationship when, in fact, the goal was simply to save space. Before presenting data, consider the unintended impressions your chosen framework might create.
  3. It communicates a story or narrative: data visualization is “storytelling with a purpose.” Circling back to the concept of a goal, a data presentation and its visualizations should be memorable to the audience by giving the viewer a clear trajectory: what was the problem, how was it addressed, and what were the outcomes? The story might serve as an educational tool that informs the viewer of the current state of things, or it might aim to compel the viewer to take action, such as adopting a product or restructuring their current approach to the problem. Either way, as in any other form of storytelling, an effective story is remembered for its structure and narrative quality, so that it can be retold in the future.

Beyond keeping the above in mind, avoid the following pitfalls to preserve the value of your visualizations:

  1. Intentional misrepresentation of data: no matter the case, the worst thing you can do is purposefully misrepresent data to your audience. Not only can the fallout cast doubt on your abilities and values as a scientist, but it will also reflect poorly on the organization you represent and can compromise the integrity of the technologies being used.
  2. Clutter: unnecessary flourishes, colors, and general visual noise can not only distract the viewer but also accidentally misrepresent the data you are presenting. Avoid adding visual elements that do not contribute to the narrative of your presentation.
  3. Bad math: part-to-whole visualizations such as pie charts always represent 100%, so make sure your numbers add up and accurately match the visualization you have chosen. Likewise, make sure that scales and metrics are consistent across every element presented in the same visualization.
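As a minimal sketch of the "bad math" check, assuming your part-to-whole data arrives as a Python dictionary mapping labels to counts that all share one unit, the helpers below convert counts to percentages and refuse to proceed when the slices do not add up to 100%. The function names `to_percentages` and `check_pie_shares`, the one-point tolerance, and the sample churn numbers are hypothetical and chosen only for illustration.

```python
import math


def to_percentages(counts):
    """Convert raw counts into percentage shares of the whole.

    Assumes every count is measured in the same unit (e.g. number of
    customers); mixing units would make the resulting slices meaningless.
    """
    total = math.fsum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {label: 100.0 * count / total for label, count in counts.items()}


def check_pie_shares(shares, tolerance=1.0):
    """Return the sum of percentage shares, or raise ValueError if any
    slice is negative or the slices do not add up to 100% within
    `tolerance` percentage points (slack for rounded display labels)."""
    values = list(shares.values())
    if any(v < 0 for v in values):
        raise ValueError("pie chart slices cannot be negative")
    total = math.fsum(values)
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"slices sum to {total:.1f}%, not 100%")
    return total


# Hypothetical churn breakdown, used only for illustration.
survey = {"Retained": 820, "Churned": 130, "Paused": 50}
shares = to_percentages(survey)
print(shares)  # {'Retained': 82.0, 'Churned': 13.0, 'Paused': 5.0}
print(check_pie_shares(shares))  # 100.0

# Hand-entered shares that sum to 110% are rejected before plotting.
try:
    check_pie_shares({"A": 45, "B": 40, "C": 25})
except ValueError as err:
    print(err)  # slices sum to 110.0%, not 100%
```

Raising an error instead of silently renormalizing is a deliberate choice in this sketch: if the shares don't add up, the underlying data or its grouping is more likely at fault than the chart, and that is worth catching before anything reaches a stakeholder.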

Being a successful and well-rounded data scientist requires wearing many hats, and harnessing the power of a thorough, clear, and thoughtful narrative is just as important as having an arsenal of algorithms. If we can’t communicate what is important, how can we convince others that our tools are useful in the first place?

Svitlana Glibova
Python Engineer at Mantium | Developer Relations | Data Science | B.S. in Mathematics | Former Certified Sommelier | Seattle, WA