
Wrangling and Versioning with OpenRefine and GitHub

Materials for workshop 2A of the Canadian Data Curation Forum: Data Wrangling and Versioning with OpenRefine and GitHub

Introduction

Pre-event preparation

Step 0.1: Install git version control software

Step 0.2: Install and configure GitHub Desktop

Step 0.3: Install OpenRefine

In-Class Activity

Introduction

Follow along with the introductory slides.

During today’s workshop, you will accomplish the following tasks:

[Figure: flowchart of workshop activities]

For Windows and Mac users

Step 1.1: Fork a GitHub repository

In this step, you will ‘fork’ an existing (original) GitHub repository belonging to someone else, creating your own copy of it in your GitHub account. You can modify this copy as you like and, if you wish, propose that your changes be merged back ‘upstream’ into the original repository.
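Forking is normally done with the Fork button on the original repository's GitHub page. For reference, GitHub also exposes the same action through its REST API endpoint `POST /repos/{owner}/{repo}/forks`. The Python sketch below only builds that request with the standard library and does not send it; `original-owner`, `workshop-repo` and `YOUR_TOKEN` are placeholders, not names from this workshop.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_fork_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking GitHub to fork owner/repo
    into the account that owns `token` (a personal access token)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/forks"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # An empty JSON body forks into your personal account; optional fields
    # such as "organization" are omitted here.
    return urllib.request.Request(url, data=b"{}", headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values: substitute the real owner, repository and token.
    req = build_fork_request("original-owner", "workshop-repo", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # Sending the request would create the fork:
    # with urllib.request.urlopen(req) as resp: print(resp.status)
```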

Step 1.2: Clone to your local computer

In this next step, you will ‘clone’ your GitHub repository to your local computer so that you can work on files locally.

Step 1.3: Do things in OpenRefine!
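What you do here depends on your data. As one example of the kind of operation OpenRefine offers, its ‘Cluster and edit’ feature can merge spelling variants using a ‘key collision’ method whose default keying function, ‘fingerprint’, trims and lowercases a value, removes punctuation and accents, then de-duplicates and sorts its words. The Python sketch below approximates that keying function for illustration only; it is not OpenRefine's own implementation, and the sample names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's 'fingerprint' keying function."""
    value = value.strip().lower()
    # Remove punctuation (anything that is not a word character or whitespace).
    value = re.sub(r"[^\w\s]", "", value)
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Split into words, de-duplicate, sort, and rejoin.
    return " ".join(sorted(set(value.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep groups with 2+ variants."""
    groups: dict[str, set[str]] = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Made-up sample values for illustration.
    names = [
        "Université de Montréal",
        "universite de montreal ",
        "Montréal, Université de",
        "McGill University",
    ]
    for group in cluster(names):
        print(group)
```

In OpenRefine itself you would review each proposed cluster and choose the value to keep, rather than merging automatically.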

Step 1.4: Export/Save modified data and processing script from OpenRefine
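Besides the cleaned data, OpenRefine can export its processing script: in the Undo / Redo tab, Extract… produces the project's operation history as JSON, which can be committed alongside the data and later re-applied to another project with Apply…. The Python sketch below reads such a history and prints one summary line per step. The embedded JSON is a hand-written, abbreviated example of the format (a single `core/text-transform` operation); real exports depend on what you did in OpenRefine, and the file name `history.json` is hypothetical.

```python
import json

# Hand-written, abbreviated example of an OpenRefine operation history export.
SAMPLE_HISTORY = """
[
  {
    "op": "core/text-transform",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "Name",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column Name using expression value.trim()"
  }
]
"""


def summarize_history(history_json: str) -> list[str]:
    """Return one 'op: description' line per recorded operation."""
    operations = json.loads(history_json)
    return [f"{op['op']}: {op.get('description', '')}" for op in operations]


if __name__ == "__main__":
    # For a real export, read the saved file instead, e.g. (hypothetical name):
    # with open("history.json", encoding="utf-8") as f: text = f.read()
    for line in summarize_history(SAMPLE_HISTORY):
        print(line)
```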

Step 1.5: Add and commit new files and changes to your local repository

Step 1.6: Push changes to your GitHub repository

Step 1.7: View and edit your readme file in the GitHub Editor

Step 1.8: Pull GitHub changes to your local repository

Step 1.9: Make a Pull Request to the original repository

On occasion, you may want to ask the maintainer of the ‘upstream’ (original) repository to incorporate your changes. You can do this by opening a pull request.
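Pull requests are usually opened from the GitHub web page of your fork. The same request can be expressed with GitHub's REST API endpoint `POST /repos/{owner}/{repo}/pulls`, where `head` names the branch in your fork as `your-username:branch` and `base` names the branch in the original repository that should receive the changes. The Python sketch below builds, but does not send, such a request; all owner, repository, branch and token values are placeholders.

```python
import json
import urllib.request


def build_pull_request(upstream_owner: str, repo: str, fork_owner: str,
                       branch: str, base: str, title: str, body: str,
                       token: str) -> urllib.request.Request:
    """Build (but do not send) a request that opens a pull request from
    fork_owner:branch into upstream_owner/repo at `base`."""
    url = f"https://api.github.com/repos/{upstream_owner}/{repo}/pulls"
    payload = {
        "title": title,
        "head": f"{fork_owner}:{branch}",  # branch in your fork
        "base": base,                      # branch in the original repository
        "body": body,
    }
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                  headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values throughout.
    req = build_pull_request("original-owner", "workshop-repo", "your-username",
                             "main", "main", "Add cleaned data",
                             "Cleaned with OpenRefine; history included.", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
```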

For Linux users

Step 1.0: Configure your git account

Open a terminal (Git Bash, if you are using Git for Windows) and navigate to the directory where you want to keep your repository.
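Git records a name and email address with every commit, and these are usually set once with `git config --global user.name` and `git config --global user.email`. To keep all examples in one language, the sketch below wraps those real git commands in Python's `subprocess` module. By default it only prints them; pass `dry_run=False` to run them. The name and email are placeholders.

```python
import shlex
import subprocess


def git_config_commands(name: str, email: str) -> list[list[str]]:
    """Commands that set the identity git attaches to your commits."""
    return [
        ["git", "config", "--global", "user.name", name],
        ["git", "config", "--global", "user.email", email],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder identity: use the name and email linked to your GitHub account.
    run_all(git_config_commands("Your Name", "you@example.com"))
```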

Step 1.1: Fork a GitHub repository

In this step, you will ‘fork’ an existing (original) GitHub repository belonging to someone else, creating your own copy of it in your GitHub account. You can modify this copy as you like and, if you wish, propose that your changes be merged back ‘upstream’ into the original repository.

Step 1.2: Clone to your local computer

In this next step, you will ‘clone’ your GitHub repository to your local computer so that you can work on files locally.
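On the command line, cloning is `git clone` followed by your fork's URL. It is also common, though optional, to register the original repository as a second remote named `upstream` with `git remote add`, so you can later fetch changes from it. The sketch below prints these commands and runs them only when `dry_run=False`; the user and repository names are placeholders.

```python
import shlex
import subprocess


def clone_commands(fork_owner: str, upstream_owner: str, repo: str) -> list[list[str]]:
    """Commands that clone your fork and remember the original as 'upstream'."""
    fork_url = f"https://github.com/{fork_owner}/{repo}.git"
    upstream_url = f"https://github.com/{upstream_owner}/{repo}.git"
    return [
        # Creates a folder named after the repository in the current directory.
        ["git", "clone", fork_url],
        # `git -C <dir>` runs the command inside the newly cloned folder.
        ["git", "-C", repo, "remote", "add", "upstream", upstream_url],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names for your account, the original owner and the repository.
    run_all(clone_commands("your-username", "original-owner", "workshop-repo"))
```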

Step 1.3: Do things in OpenRefine!

Step 1.4: Export/Save modified data and processing script from OpenRefine

Step 1.5: Add and commit new files and changes to your local repository
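On the command line, this step is `git add` (stage the new or changed files) followed by `git commit -m` (record them with a message); `git status --short` beforehand shows what has changed. The sketch below wraps those commands and only prints them unless `dry_run=False`. The file names `data/cleaned-data.csv` and `openrefine-history.json` are hypothetical stand-ins for whatever you exported in Step 1.4.

```python
import shlex
import subprocess


def commit_commands(paths: list[str], message: str) -> list[list[str]]:
    """Commands that show changes, stage `paths`, and commit them."""
    return [
        ["git", "status", "--short"],      # list changed and untracked files
        ["git", "add", *paths],            # stage the exported files
        ["git", "commit", "-m", message],  # record a snapshot with a message
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; run from inside your local repository.
    run_all(commit_commands(
        ["data/cleaned-data.csv", "openrefine-history.json"],
        "Add cleaned data and OpenRefine operation history",
    ))
```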

Step 1.6: Push changes to your GitHub repository
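Pushing uploads your local commits to the copy on GitHub. `origin` is the remote name that `git clone` gives the repository you cloned from; the branch name `main` is an assumption, since older repositories often use `master`. As before, the sketch prints the command and runs it only when `dry_run=False`.

```python
import shlex
import subprocess


def push(branch: str = "main", remote: str = "origin", dry_run: bool = True) -> list[str]:
    """Upload local commits on `branch` to `remote` (printed only by default)."""
    cmd = ["git", "push", remote, branch]
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    push()  # pass dry_run=False inside your repository to actually push
```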

Step 1.7: View and edit your readme file in the GitHub Editor

Step 1.8: Pull GitHub changes to your local repository
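Pulling is the reverse of pushing: `git pull` downloads commits made on GitHub (such as the readme edit from Step 1.7) and merges them into your local branch, and `git log --oneline` then lets you confirm the new commit arrived. The branch name `main` is again an assumption; the sketch prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess


def pull_commands(branch: str = "main", remote: str = "origin") -> list[list[str]]:
    """Commands that fetch and merge remote changes, then show recent history."""
    return [
        ["git", "pull", remote, branch],
        ["git", "log", "--oneline", "-n", "3"],  # the three most recent commits
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(pull_commands())
```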

Step 1.9: Make a Pull Request to the original repository

On occasion, you may want to ask the maintainer of the ‘upstream’ (original) repository to incorporate your changes. You can do this by opening a pull request.

More information

git
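Under the hood, git identifies every version of a file by a hash of its contents, which is how it can tell exactly which files changed between commits. The Python sketch below reproduces the ID that `git hash-object` computes in a repository using the default SHA-1 object format: the SHA-1 of a `blob <size>` header, a NUL byte, and the file's bytes. It illustrates the idea and is not a replacement for git itself.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID git assigns to a file with these exact bytes."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b"test content\n"))
```

Changing even one byte of a file produces a completely different ID, so git can detect modifications without comparing files line by line.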

Markdown

What is Markdown?

Borrowed shamelessly from GitHub’s Mastering Markdown page:

Markdown is a way to style text on the web. You control the display of the document; formatting words as bold or italic, adding images, and creating lists are just a few of the things we can do with Markdown. Mostly, Markdown is just regular text with a few non-alphabetic characters thrown in, like # or *.

Markdown uses simple notation to apply simple formatting rules. Since it’s pretty much just plain text, it’s transferable and much simpler than marked-up text like HTML or even Word or Google documents. For much of the writing that you do for the web, Markdown is good enough. GitHub uses Markdown for its documents (this document was created in Markdown), as do a variety of other web platforms (Trello, for example).

Using Markdown in GitHub lets you create readme files that are better formatted than a plain text file yet still readable as plain text: the best of both worlds.
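To make the ‘plain text plus a few characters like # and *’ idea concrete, the toy Python converter below turns a tiny subset of Markdown (`#` headings, `*` or `-` list items, `**bold**` and `*italic*`) into HTML, one line at a time. It is an illustrative assumption, not how GitHub renders Markdown: GitHub uses GitHub Flavored Markdown, a much fuller specification built on CommonMark.

```python
import html
import re


def inline(text: str) -> str:
    """Apply bold before italic so '**' is not mistaken for two '*'."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def render_line(line: str) -> str:
    """Convert one line of a tiny Markdown subset to an HTML fragment."""
    text = html.escape(line.strip())  # escape <, >, & so they display literally
    heading = re.match(r"(#{1,6})\s+(.*)", text)
    if heading:
        level = len(heading.group(1))
        return f"<h{level}>{inline(heading.group(2))}</h{level}>"
    if text.startswith(("* ", "- ")):
        return f"<li>{inline(text[2:])}</li>"
    return f"<p>{inline(text)}</p>" if text else ""


if __name__ == "__main__":
    for line in ["# Workshop notes", "Some **bold** and *italic* text", "- a list item"]:
        print(render_line(line))
```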

More information and references for Markdown: