Making friends with tidy data.

Lisa’s Public Journal – Week Six — Data Management

Tidy Data Cartoon
Mapping the meaning of a dataset to its structure.

Our team has had several meetings where we have touched on different aspects of data management. We are using GitHub for development, which provides good version control for the codebase. That system is self-contained and doesn’t require us to build anything from whole cloth. However, we have also agreed on a naming convention for files: names may contain only letters, numbers, and underscores, and must include the creation date.
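As a rough illustration of the file-naming rule, the Python sketch below checks and builds names. The exact layout is an assumption, because the convention above does not fix where the date goes or how it is written: a descriptive stem, an underscore, the creation date as YYYYMMDD, and an optional extension such as `.jpg`. `NAME_PATTERN`, `is_valid_name`, and `make_name` are hypothetical helpers, not part of any tool we use.

```python
import re
from datetime import datetime

# Hypothetical pattern (an assumption, not an agreed spec): a stem of letters,
# digits, and underscores, then "_", then an 8-digit creation date (YYYYMMDD),
# then an optional file extension.
NAME_PATTERN = re.compile(
    r"^(?P<stem>[A-Za-z0-9_]+)_(?P<date>[0-9]{8})(?:\.[A-Za-z0-9]+)?$"
)


def is_valid_name(filename: str) -> bool:
    """Return True if filename follows the assumed naming convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        # Reject impossible calendar dates such as 20240231.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True


def make_name(description: str, created: datetime, extension: str = "") -> str:
    """Build a conforming name from free text (hypothetical helper)."""
    # Replace every run of characters other than letters and digits with one
    # underscore, then trim underscores from both ends.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    name = f"{stem}_{created:%Y%m%d}"
    return f"{name}.{extension}" if extension else name
```

For example, `make_name("Splash page: hero image", datetime(2024, 3, 15), "jpg")` yields `Splash_page_hero_image_20240315.jpg`, which passes the check, while `Timeline Photo (1).jpg` fails because of its spaces and parentheses. Writing the date as YYYYMMDD also means files that share a stem sort chronologically.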

A bigger challenge is how to collaborate on content creation asynchronously. We have settled on a shared Google spreadsheet, with each tab covering a different part of the website: splash page, timeline page, about page, and so on. We have also agreed to use Tidy Data* standards to track all assets. Applied to a spreadsheet, the Tidy Data standard uses the structure below:

  1. Every column is a variable.
  2. Every row is an observation.
  3. Every cell is a single value.
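To make the three rules concrete, here is a small Python sketch using only the standard library. It assumes a hypothetical "messy" tab in which each page of the site gets its own column and one cell packs several asset names together, and it reshapes that tab into one row per asset with `page`, `asset`, and `host` columns. The column names, sample file names, and the extension-to-host mapping are illustrative assumptions, not our actual spreadsheet.

```python
import csv
import io

# Hypothetical "messy" tab: one column per page (so columns are values, not
# variables), and one cell packs two asset names into a single string.
MESSY_CSV = """splash_page,timeline_page,about_page
hero_image_20240315.jpg;intro_audio_20240316.mp3,map_20240320.jpg,team_photo_20240322.jpg
"""

# Assumed mapping from file extension to hosting service, following the plan
# of putting pictures on Wikimedia Commons and audio on SoundCloud.
HOSTS = {"jpg": "Wikimedia Commons", "png": "Wikimedia Commons", "mp3": "SoundCloud"}


def tidy_assets(messy_text: str) -> list[dict[str, str]]:
    """Reshape the messy tab into one row per (page, asset) observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(messy_text)):
        for page, cell in record.items():
            # Split packed cells so each output cell holds a single value;
            # filter(None, ...) drops empty strings left by stray separators.
            parts = (part.strip() for part in (cell or "").split(";"))
            for asset in filter(None, parts):
                extension = asset.rsplit(".", 1)[-1].lower()
                rows.append({
                    "page": page,  # the former column header becomes a variable
                    "asset": asset,
                    "host": HOSTS.get(extension, "unknown"),
                })
    return rows


def write_tidy(rows: list[dict[str, str]]) -> str:
    """Serialize tidy rows as CSV: every column a variable, every row an observation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["page", "asset", "host"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In the tidy output each row is one observation (one asset on one page), so adding an asset means appending a row rather than editing a packed cell, which is likely less error-prone when several people edit the sheet asynchronously.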

We expect to start populating the spreadsheet soon so that our developer has some content to push into our site’s wireframe, and we aim to have the bulk of our research completed by 15 April. While the spreadsheet will hold information about the different data sources, the assets we create will be hosted elsewhere: pictures on Wikimedia Commons and audio files on SoundCloud. In this way, we can extend the life of the assets without worrying about paying for their storage.

Our team leader is hosting all our shared Google files on their personal Google account, and archives are being saved to the Library on our team Commons site. It will be interesting to see how collaborating in a shared spreadsheet works over time. My biggest fear is that we inadvertently lose information. However, my hope is that by agreeing on these standards at the start of the project, we will spare our developer from losing time managing uploads during production.

*SOURCE: https://vita.had.co.nz/papers/tidy-data.html and https://vita.had.co.nz/papers/tidy-data.pdf.
