Visualizing Pathology Data: Colin Megill Interview

Can data visualization be used to follow the evolution of COVID-19 and help scientists track the spread of the disease?

Yes! Nextstrain is a set of open-source tools for sharing virus genome datasets and for analyzing and presenting that data visually, in context.

On March 13, Shirley Wu interviewed Nextstrain co-creator Colin Megill. The goals were to help people without a biology background understand how the tool can be used, and to give interested open-source developers the background they need to get involved. As I recently started contributing to the project, I was extremely interested to learn more.

To make the recording more accessible to new developers (and because I lack video editing skills), I made a React component that skips the breaks and lets viewers jump to key segments of the raw video. Please reach out on Twitter if there are additional segments that would make sense to annotate.

Getting involved: Links




See the second dropdown for direct links to in-video discussions.

Other COVID Data Visualizations

Personal notes

  • Examine the “molecular clock” segment more closely to understand how sequencing helps estimate how many people were infected.
  • Each individual sequenced result is much more valuable when contextualized by metadata from other sequenced results.
  • Each dot on the tree represents a test, which costs ~$1000 plus lab technician time.
  • The “Narratives” workflow lets scientists write interactive reports just by modifying a Markdown file. This would be a useful workflow for other interactive data exploration tools. To keep the integration simple, most of the application’s configuration-related state (filters, layout settings, etc.) lives in URL parameters.
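
To make the last note concrete, here is a minimal sketch of keeping view state in URL parameters so that a report can point at a specific view with nothing more than a link. It is not Nextstrain's actual code: the `ViewState` shape, the parameter names (`layout`, `colorBy`, `filter`) and the default values are all assumptions made up for illustration. Only the standard `URLSearchParams` API is used.

```typescript
// Hypothetical view state. Field names are illustrative only and are
// not taken from Nextstrain's real URL scheme.
interface ViewState {
  layout: string;
  colorBy: string;
  filters: string[];
}

// Serialize the state into a query string such as
// "layout=radial&colorBy=country&filter=Asia&filter=Europe".
function encodeViewState(state: ViewState): string {
  const params = new URLSearchParams();
  params.set("layout", state.layout);
  params.set("colorBy", state.colorBy);
  // Repeat the "filter" key once per active filter.
  for (const f of state.filters) {
    params.append("filter", f);
  }
  return params.toString();
}

// Rebuild the state from a query string, falling back to
// (assumed) defaults when a parameter is missing.
function decodeViewState(query: string): ViewState {
  const params = new URLSearchParams(query);
  return {
    layout: params.get("layout") ?? "rect",
    colorBy: params.get("colorBy") ?? "region",
    filters: params.getAll("filter"),
  };
}
```

With state handled this way, a report author only has to paste a different query string into a link to show a different view, which is what makes a plain-text report workflow practical.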

Technical notes

  • The playback tool was built with the react-player API.
  • I used the Chrome Invideo extension to search the video transcript for keywords when building the timestamp list.
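
For anyone curious how a timestamp list like this can drive the "skip the breaks" behavior, below is a rough sketch of the logic, kept independent of any player library. The `Segment` type, the function names and the timestamps in `exampleSegments` are placeholders I made up for illustration; they are not the real values from the interview.

```typescript
// One annotated stretch of the video, in seconds.
// `start` is inclusive, `end` is exclusive.
interface Segment {
  label: string;
  start: number;
  end: number;
}

// Placeholder timestamps, NOT the real ones from the recording.
const exampleSegments: Segment[] = [
  { label: "Intro", start: 0, end: 100 },
  { label: "Molecular clock", start: 250, end: 400 },
  { label: "Narratives", start: 600, end: 700 },
];

// Return the segment that contains time `t`, if any.
function segmentAt(segments: Segment[], t: number): Segment | undefined {
  return segments.find((s) => t >= s.start && t < s.end);
}

// If `t` falls inside a segment, keep playing from `t`.
// If it falls in a break, return the start of the next segment.
// Return null when nothing is left to play.
function nextPlayableTime(segments: Segment[], t: number): number | null {
  if (segmentAt(segments, t)) {
    return t;
  }
  let next: number | null = null;
  for (const s of segments) {
    if (s.start > t && (next === null || s.start < next)) {
      next = s.start;
    }
  }
  return next;
}
```

In a component, the current playback time reported by the player would be passed to `nextPlayableTime`, and whenever the result differs from the current time the player would be told to seek there. With react-player 2.x that would presumably mean reading `playedSeconds` in `onProgress` and calling `seekTo` on the player ref, but that wiring depends on the library version and is left out here.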