Data Curation Training Event

Workshop Session 1 (Wednesday, 16-October | 0915 - 1200)

Workshop 1a: Introduction to R for Social Scientists

Facilitator: John Simpson
Room: OJN 111

This introduction to R is designed for participants with no programming experience. It is a modified form of the Data Carpentry workshop “R for Social Scientists”. The full workshop is designed to take about 6 hours; because only 3 hours are available, it will be shortened to fit. The traditional workshop starts with basic information about R syntax and the RStudio interface, then moves through how to import CSV files, the structure of data frames, how to deal with factors, how to add and remove rows and columns, and how to calculate summary statistics from a data frame, and ends with a brief introduction to plotting.

Participants should bring their own laptop to the session and follow these setup instructions in advance:

Workshop 1b: Making Ethical and Effective Data Management Work for You

Facilitators: Chandra Kavanagh, James Doiron
Room: OJN 212

Data are growing in complexity and magnitude, and the research data management (RDM) policy landscape is evolving, including an impending requirement for formalized data management plans (DMPs). Against this backdrop, it can be difficult to know how to manage your data effectively, safely, and securely. This workshop will highlight best practices in data management to help you employ strategies for safely and ethically managing your data throughout their lifecycle, from collection to curation, and beyond! Specific supports and services available to you, including the freely available Portage DMP Assistant platform, will be presented.

Workshop Session 2 (Wednesday, 16-October | 1300 - 1545)

Workshop 2a: Data Wrangling and Versioning with OpenRefine and GitHub

Facilitators: Lee Wilson, Jay Brodeur
Room: OJN 111

In this workshop, participants will take a hands-on approach to building experience and expertise in two important aspects of data curation: data wrangling and versioning. Participants will use GitHub to acquire, document, and version research data and code, while learning how to use OpenRefine to explore, clean, and transform data into interoperable and reusable forms.
Link to workshop materials

Workshop 2b: Curating Data Sets for Reproducibility

Facilitators: Shahira Khair, Sandra Sawchuk
Room: OJN 212

This workshop will cover key considerations for preparing and sharing data and software to improve the reproducibility of published results. Participants will have the opportunity to put into practice lessons learned while exploring published data sets. Openly available tools that support reproducibility will also be examined.
Link to workshop materials

Workshop Session 3 (Thursday, 17-October | 0900 - 1145)

Workshop 3a: The DCN CURATE Model

Facilitators: Lisa Johnston, Cynthia Hudson Vitale
Room: OJN 111

Data curation skills span a wide variety of data types and discipline-specific data formats such as spatial data, code, databases, chemical spectra, 3D images, and genomic sequencing data. No single repository can reasonably account for all the curation expertise needed. One of the goals of the Data Curation Network is to create and openly share data curation procedures and best practices with the growing professional community of data curators. In this workshop we will introduce a set of specialized curation procedures developed by the Data Curation Network called CURATE steps. These standardized checklists help ensure that all datasets submitted to the repository receive consistent treatment. Additionally, we will show how the community has put this general framework into practice for specific file types and formats via “data curation primers”. These are interactive, living documents that detail a specific subject, disciplinary area, or curation task and that can be used as a reference when curating research data.

Workshop 3b: Curating Data in Repositories

Facilitators: Lee Wilson, Reyna Jenkyns, Meghan Goodchild, Brad Covey, Kaitlin Newson
Room: OJN 212

Data repositories have tended to be viewed simply as infrastructure for data deposit, supporting discovery and appropriate access, but from a curatorial perspective, deposit into a repository is only the beginning. Curation services in support of FAIR datasets are increasingly recognized as an essential part of data deposit and the broader data management life cycle. This session will describe how domain and general repositories are meeting researcher needs. Various repository options will be discussed, along with their networks, registries and certifications. With that landscape in mind, specific data curation topics like metadata, persistent identifiers, data licensing and other interoperability considerations will be featured.

Community-Building Forum

Session 1 (Thursday, 17-October | 1300 - 1700)

Opening Remarks and Icebreaker

Facilitators: Lee Wilson, Jay Brodeur


  • Official welcome
  • Statement of goals and objectives, agenda, and expectations for participants
  • Summary of the training event
  • Ice-breaker activity: What is data curation and how does it support RDM and research, in general?

The Canadian Digital Research Infrastructure Landscape

Speaker: Jeff Moon

A summary of developments in the Canadian Digital Research Infrastructure (DRI) landscape, the state of national support for RDM, and the role of data curation in this ecosystem.

Panel & Audience Discussion: DRI, RDM, & The Role of Data Curation

Panellists: Alex Clark (UofA), Mark Leggott (RDC), Jeff Moon (CARL Portage)
Moderator: Lisa Goddard


  • Each panellist will provide a 5-7 minute response to the question: “How does your organization view the role of data curation, and how is it addressing needs in this area?”
  • Audience participation (Q&A) and open discussion session

The Data Curation Network

Speakers: Cynthia Hudson Vitale, Lisa Johnston

An overview of the origins, development, and growth of the Data Curation Network – a coordinated cross-institution network of data curators and data curation services.

Day 1 Wrap-up Discussion

Facilitators: Lee Wilson, Jay Brodeur

Participants will work in small- to medium-sized breakout groups to discuss and respond to the following prompts:

  • What are the main barriers/challenges to data curation activities within your organization and in your work?
  • Articulate a vision for data curation activity and support in Canada.

Session 2 (Friday, 18-October | 0845 - 1530)

Day 2 Organization

Facilitators: Lee Wilson, Jay Brodeur

Activity Description:

  • Summary of day 1 wrap-up discussion
  • Schedule and goals for day 2

National Repository Updates

Speakers: Lee Wilson, Meghan Goodchild, Kaitlin Newson, Brad Covey, Reyna Jenkyns

A series of updates on national general-purpose (SP Dataverse, FRDR) and domain-based (CIOOS) repositories, focusing on data curation-related requirements and challenges. An audience discussion period will follow.

Breakout Session 1: Conceptualizing a National Approach to Data Curation

Facilitators: Lee Wilson, Jay Brodeur, Organizing Committee Members


  • Brief discussion of the vision articulated on day 1.
  • Participants will work in medium-sized breakout groups to discuss and respond to the following prompts:
    1. What are some approaches or models for achieving our vision for data curation in Canada?
    2. What are the challenges of, or barriers to, the identified models?
    3. What are the linkages or relationships between your curation model and other RDM services and systems? E.g., how does the plurality of Canadian data repositories fit into a curation network?
    4. Provide a short summary (3-4 sentences) and sketch of your preferred model, including a timeline for roll out (diagrams encouraged!).
  • Regroup and presentation

Breakout Session 2: Model Evaluation & Discussion

Facilitators: Lee Wilson, Jay Brodeur, Organizing Committee Members


  • Participants will work in medium-sized breakout groups to evaluate each of the proposed models using the following prompts:
    1. What are the strengths and weaknesses of this model?
    2. What resources and support would be required to implement this model?
    3. What parts should be coordinated at a national level? What should be addressed at a local or regional level?
    4. How might the model or timeline be revised based on this analysis?
  • Regroup and discussion.
    1. What were the major discussion points and general outcomes of the exercise?