Storytelling With Data: A Data Visualization Gu...
This is an interactive map inspired by the Anti-Eviction Mapping Project, a data-visualization and digital storytelling collective working to document the housing crisis in the San Francisco Bay Area in California.
Abstract: In our digital age, data are generated constantly from public and private sources, social media platforms, and the Internet of Things. A significant portion of this information comes in the form of unstructured images and videos, such as the 95 million daily photos and videos shared on Instagram and the 136 billion images available on Google Images. Despite advances in image processing and analytics, the current state of the art lacks effective methods for discovering, linking, and comprehending image data. Consider, for instance, the images from a crime scene that hold critical information for a police investigation. Currently, no system can interactively generate a comprehensive narrative of events from the incident to the conclusion of the investigation. To address this gap in research, we have conducted a thorough systematic literature review of existing methods, from labeling and captioning to extraction, enrichment, and transforming image data into contextualized information and knowledge. Our review has led us to propose the vision of storytelling with image data, an innovative framework designed to address fundamental challenges in image data comprehension. In particular, we focus on the research problem of understanding image data in general and, specifically, curating, summarizing, linking, and presenting large amounts of image data in a digestible manner to users. In this context, storytelling serves as an appropriate metaphor, as it can capture and depict the narratives and insights locked within the relationships among data stored across different islands. Additionally, a story can be subjective and told from various perspectives, ranging from a highly abstract narrative to a highly detailed one.
Keywords: image processing and analytics; labeling; captioning; extraction; enrichment; contextualized information and knowledge; storytelling with image data; curating; summarizing
In 2017, I began compiling a three-part guide, intended for beginners with next to no technical knowledge, on how to create the sorts of data-driven, visual stories that we publish at The Pudding. This was going to consist of introductions and surveys of data analysis, design, and writing, and while I thought that the first two sections would be all that our readers really wanted to see, we've consistently received word that there remains substantial interest in the writing guide. I'd tried to compile this third section, which had maintained its "On Writing" heading in my mind, with little success, until I realized that the impasse lay in the fact that I was fixated on precisely that: writing. Compiling resources on how to write within the form of a visual essay while omitting the discussion of storytelling, in both visual and written terms, as well as the importance of story structure, is of next to no help. Thus, I decided to retroactively make good and compile a primer on storytelling with data.
That I worked on this outside of any sort of journalistic relationship with a larger outlet meant that I also had the opportunity to learn about many tools at my leisure: front-end development (JavaScript, HTML, and CSS, as well as web technologies like Amazon's cloud computing servers), data analysis in R, geospatial analysis using QGIS, typography, etc. (you can scroll down to the bottom of the piece for an in-depth explanation of the tools I used).
From the standpoint of story structure, building for others also prevents your story from meandering too much, and helps you prioritize working on the key takeaways. In this scenario, you've got a limited amount of time to do good work, and need to ensure that you use it as effectively as possible to create the best possible story; the bells and whistles are secondary. If you're a journalist, you may want to create a magnum opus that encompasses myriad data sources, coupled with video interviews and shoe-leather reporting about the opioid epidemic in America, but if you're working on a deadline and your editor needs you to crunch some numbers about the geographic concentrations of overdoses in your state, you'll have to prioritize those components and publish the story on deadline.
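Crunching numbers about geographic concentrations usually means normalizing counts by population, because raw counts mostly track where people live. The sketch below is a minimal illustration under that assumption: the functions `rate_per_100k` and `rank_by_rate`, the region names, and all figures are hypothetical placeholders, not real overdose data.

```python
from __future__ import annotations

# Hypothetical per-capita calculation. Region names and numbers are invented
# placeholders, not real overdose data.


def rate_per_100k(count: int, population: int) -> float:
    """Return events per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep round numbers exact in floating point.
    return count * 100_000 / population


def rank_by_rate(rows: list[tuple[str, int, int]]) -> list[tuple[str, float]]:
    """Rank (region, count, population) rows by per-capita rate, highest first."""
    rates = [(name, rate_per_100k(count, pop)) for name, count, pop in rows]
    return sorted(rates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("Region A", 120, 400_000),
        ("Region B", 45, 60_000),
        ("Region C", 300, 2_000_000),
    ]
    for name, rate in rank_by_rate(sample):
        print(f"{name}: {rate:.1f} per 100k")
```

In this toy sample, Region C has the most events but the lowest rate, which is the kind of distinction worth settling before any design work begins.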
Adopting a "minimum viable product first, everything else later" conception of storytelling with data is doubly useful: honing the core skills used in your work will allow you to fulfill the basic requirements of each project faster with every piece that you complete. This will afford you more time to iterate on the story if it doesn't quite come together on the first attempt (check out the bottom of our guide to design for some examples of iterating on visual work), and will give you more room to experiment; you can then put this time toward truly novel and innovative storytelling, be it in the form of analyses or visual representations of data.
These sorts of explorations are essential to getting a good sense of data, but should be thought of as no more than that: exploratory forays into a complicated topic, which help you situate yourself within the data set and give you a sense of some of the trends and relationships contained therein. The output of this exercise is not a story, but data and insights that you can then incorporate into the story you'll create.
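A first exploratory pass often amounts to little more than grouping records and computing summary statistics per group. The minimal sketch below assumes a hypothetical list of `(category, value)` pairs with placeholder numbers and a hypothetical `summarize` helper; its output is a table to inspect, not a story.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Hypothetical records: (category, value). Placeholder numbers standing in
# for whatever data set you are exploring.
RECORDS = [
    ("north", 12.0),
    ("north", 18.0),
    ("south", 5.0),
    ("south", 7.0),
    ("south", 9.0),
    ("west", 30.0),
]


def summarize(records: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Group values by category and report count, mean, min, and max."""
    groups: dict[str, list[float]] = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    # Sort categories so the summary prints in a stable order.
    return {
        category: {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for category, values in sorted(groups.items())
    }


if __name__ == "__main__":
    for category, stats in summarize(RECORDS).items():
        print(category, stats)
```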
Let's say that the world has returned to a more sensible place, and you're at a long-overdue evening of drinks with friends. For weeks, you've been busy working on your data-driven project. When your friend asks you what you've been up to, there are generally two types of responses: in the first, you clearly and succinctly explain what you've done and what you've found. In the second, you take a beat and issue a "well, it's kind of complicated, but" disclaimer before you begin.
If, however, the key insights you've unearthed through your data analyses are somewhat more difficult to communicate, and require some contextualization, there's another approach that often helps readers get acquainted with the necessary background.
In visual storytelling, we tend not to focus on individual experience, but we nevertheless need a way to subtly provide sufficient context while drawing reader interest to the work. The simplest way to translate character-driven prose narratives to the requirements of a visual story is to begin the story with a single data point. In our project about retraining workers from jobs that are likely to disappear due to automation, Jordan Dworkin and I began the story by discussing truck drivers, and which careers were most suitable to them, considering the skills they already possessed. Rather than discussing the idea of retraining in a more abstract sense, giving readers a specific data point to follow made it much easier to understand the larger point that we sought to make.
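One way to follow a single data point is to start from one record and rank everything else by how closely it relates. The sketch below uses Jaccard similarity over skill sets as a stand-in for "most suitable careers"; the occupations, skills, and the `jaccard` and `closest_careers` functions are hypothetical, and this is not the method or data used in the retraining project.

```python
from __future__ import annotations

# Hypothetical skill inventories, invented for illustration only.
SKILLS = {
    "truck driver": {"navigation", "vehicle operation", "logistics", "safety"},
    "delivery dispatcher": {"navigation", "logistics", "scheduling"},
    "forklift operator": {"vehicle operation", "inventory", "warehousing"},
    "graphic designer": {"typography", "illustration"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared skills divided by all skills across both sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_careers(start: str, skills: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank every other occupation by skill overlap with `start`, highest first."""
    base = skills[start]
    scores = [(job, jaccard(base, s)) for job, s in skills.items() if job != start]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for job, score in closest_careers("truck driver", SKILLS):
        print(f"{job}: {score:.2f}")
```

Anchoring the ranking on one named occupation gives readers a concrete thread before the story widens to every job in the data set.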
While it's important for your readers to understand the rough method you used to arrive at your conclusions as they're reading your work, it's often helpful to provide an additional, more detailed look at your process and reasoning. Readers are often sufficiently savvy to consider our work critically, and those who may have reservations about our findings will have more faith in our analyses if they have a clear sense of where we sourced our data and how we crunched the numbers. Most of our projects contain a Method section following the conclusion to deal with these questions, where we link to data sources (when available) and explain the calculations we performed in order to arrive at our conclusions.
Brain imaging techniques have become important tools in the research of chronic pain. They permit the in vivo study of complex brain structures, including their variations between groups of people and within an individual over time. Although narratives explaining brain imaging exist, they focus on the mechanisms behind the techniques. Few visualizations communicate the clinical applications of these techniques, especially in neuropathic pain.
In addition, within brain imaging fields such as white matter tractography, there are standardized and accepted methods of data visualization. Advances in imaging technology, however, are increasing the resolution and density of data, which makes visualizations more complex and cluttered. As a result, depth and spatial relationships within the brain become difficult to communicate, and important anatomical and functional structures become harder to see.
The project aims to serve two purposes: 1) to develop an educational animation communicating the significance of clinical research employing brain imaging techniques to study neuropathic pain, and 2) to expand upon established visualizations of brain imaging data to help improve communication while maintaining data fidelity.
Analytics techniques include model fitting, statistical methods, visualization, and data storytelling. This course will include structured programming with the R language, statistical computing, the use of models to make forecasts, data formatting, cleaning and manipulation of data, solving statistical and time series equations, building predictive models, using graphical applications, and applying appropriate machine learning methods and models. It is recommended that students know multivariate calculus, linear algebra, probability, and statistics at the undergraduate level. 3 credits.
Geographic Information Systems (GIS) are used as tools for describing, analyzing, managing, and presenting information about the locations, sizes, and shapes of geographic features and the spatial relationships among them. The descriptive, non-spatial information linked to each feature (for example, a district's population) is known as attribute data. GIS uses techniques that can represent social and environmental data as a map, with a significant number of applications including those in engineering, architecture, public health, environmental science, and business. GIS data will be created through a variety of methods including those offered by global positioning system (GPS) technologies. This course will assume knowledge of R and Python. 3 credits.
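The split between where a feature is (its geometry) and what is known about it (its attributes) shows up directly in a common GIS operation: joining an attribute table to spatial features by a shared key. The sketch below assumes GeoJSON-style dictionaries; the feature IDs, placeholder coordinates, `households` field, and `attribute_join` function are hypothetical, and in practice this step is usually done in a GIS tool or a dedicated library.

```python
from __future__ import annotations

# Hypothetical GeoJSON-style features: "geometry" holds the spatial data,
# "properties" holds attribute data. Coordinates are placeholders.
FEATURES = [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
     "properties": {"id": "site-1"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
     "properties": {"id": "site-2"}},
]

# Hypothetical attribute table keyed by the same ID (e.g. loaded from a CSV).
ATTRIBUTES = {
    "site-1": {"households": 250},
    "site-2": {"households": 90},
}


def attribute_join(features: list[dict], table: dict[str, dict], key: str = "id") -> list[dict]:
    """Left join: copy each matching attribute row into a feature's properties.

    Features without a match keep their original properties. Input features
    are not modified; new feature dicts are returned.
    """
    joined = []
    for feature in features:
        props = dict(feature["properties"])
        props.update(table.get(props.get(key), {}))
        joined.append({**feature, "properties": props})
    return joined


if __name__ == "__main__":
    for feature in attribute_join(FEATURES, ATTRIBUTES):
        print(feature["geometry"]["coordinates"], feature["properties"])
```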