Open Science: Fun & Easy!

Welcome to our Compendium Checklist!

The first few questions check whether the computer you are using for this research project has all the applications needed to work with the Standardized Research Compendium.
Feel free to skip the first two phases if you have worked with our Standardized Research Compendium before or feel that you do not need any information or help with those parts. You can always go back to earlier phases.

Is this the first time you are working with version control?

Version control is a system that records changes to a file or set of files over time so that you can recall or restore specific versions later. Working with a version control system makes your project more transparent, because every change is recorded and can be inspected later.

Yes No

I know version control and have worked with it before.

If you want to (re)visit or look up some principles of working with version control, you can find comprehensive documentation here.

Version control

When working on large, collaborative scientific projects, keeping track of different versions of your work can be a burden. Still, it is important to store older versions: not only because you might want to revert changes later in the process, but also because you want to see what changes your coworkers made to your precious files.

Luckily, digital version control systems now make this easy and understandable. Such systems save not only the new files you add, but also the changes between each old and new version. This way, you can easily keep track of changes and revert to old versions (or even combine different "branches" of a file) as you please.

Currently, one of the most widely used of these systems among scientists is Git, a program that manages "repositories" (the "folder" containing your project), which store your files together with each file's revision history. GitHub.com is a cloud hosting service where you can store your repository remotely and easily collaborate on projects with colleagues. Comprehensive documentation on getting started with Git can be found here.
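To make the idea of "recording changes and restoring older versions" concrete, here is a toy sketch in Python. It is only an illustration of the concept and is not how Git stores data internally. The hypothetical `ToyHistory` class keeps every committed version of a single text file, shows a line-by-line diff between two versions (using the standard-library `difflib` module), and can hand back an older version.

```python
import difflib


class ToyHistory:
    """Toy version history for one text file (illustration only, not Git)."""

    def __init__(self):
        self.versions = []  # list of (message, text) tuples, oldest first

    def commit(self, text, message):
        """Record a new version and return its index."""
        self.versions.append((message, text))
        return len(self.versions) - 1

    def diff(self, old, new):
        """Return the changes between two versions as unified-diff lines."""
        old_lines = self.versions[old][1].splitlines(keepends=True)
        new_lines = self.versions[new][1].splitlines(keepends=True)
        return list(difflib.unified_diff(old_lines, new_lines,
                                         fromfile=f"v{old}", tofile=f"v{new}"))

    def restore(self, index):
        """Return the full text of an earlier version."""
        return self.versions[index][1]


history = ToyHistory()
v0 = history.commit("n = 100\nalpha = 0.05\n", "first draft")
v1 = history.commit("n = 250\nalpha = 0.05\n", "increase sample size")
print("".join(history.diff(v0, v1)))  # only the changed line is marked
print(history.restore(v0))            # the first draft is still available
```

Git does the same job for whole folders, many files, and many collaborators, and adds branching and merging on top.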

Is this the first time you are working with Git, Python, and doit on your current computer?

To work with the Standardized Research Compendium, you need Python version XX, the doit package, and Git installed on your current computer.

Yes No

I have all of this installed.

To learn about the advantages of combining Git, Python, and doit, have a look at our GitHub page.
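If you are unsure whether everything is already installed, a small script can check for you. The sketch below is an assumption-based helper, not part of the compendium. It uses only the Python standard library: `shutil.which` looks for `git` on your PATH, `importlib.util.find_spec` checks whether the `doit` package can be imported, and `sys.version_info` is compared against a minimum version. Because the checklist leaves the required version as "XX", `REQUIRED_PYTHON` is a hypothetical placeholder that you should set yourself.

```python
import importlib.util
import shutil
import sys

# Hypothetical placeholder: set this to the Python version the
# compendium requires (the checklist gives it as "XX").
REQUIRED_PYTHON = (3, 6)


def check_prerequisites(min_python=REQUIRED_PYTHON):
    """Return a dict mapping each prerequisite to True (OK) or False."""
    return {
        # Compare (major, minor) of the running interpreter to the minimum
        "python": tuple(sys.version_info[:2]) >= tuple(min_python),
        # Is a git executable available on the PATH?
        "git": shutil.which("git") is not None,
        # Can the doit package be imported by this interpreter?
        "doit": importlib.util.find_spec("doit") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:7} {'OK' if ok else 'missing (or wrong version)'}")
```

Run it with the same Python interpreter you plan to use for the compendium. A different interpreter may have different packages installed.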

Working from a Terminal

Add info for Windows, Mac and Linux.

Is this the first time you are working with Git on this computer?

Git is a distributed version-control system for tracking changes in source code during software development.

Yes No

I have Git installed on this computer.

You can find comprehensive documentation on working with Git here.


Installing Git

Installing the version control system Git is easy, but the procedure differs depending on your operating system. Visit this webpage for the installation instructions for your operating system: https://gist.github.com/derhuerst/1b15ff4652a867391f03

Is this the first time you are working with Python version x on your current computer?

Python has changed over time, and using a different version can cause conflicts with the workflow proposed by the Standardized Research Compendium.

Yes No

I have this Python version installed.

You can find elaborate documentation on working with Python here.


Installing Python

Python is a programming language used by professionals in a wide variety of fields to write software. Many Python packages (reusable collections of code) offer a specific type of functionality, like the doit package that we use in the compendium. Visit this webpage for the installation instructions for your operating system: https://wiki.python.org/moin/BeginnersGuide/Download

Is this the first time you will be working with the Python package doit on this computer?

doit is a modern, open-source build tool written in Python, designed to be simple to use and flexible enough to handle complex workflows.

Yes No

I have installed doit on my computer (for example, by running pip install doit).

You can find more information on doit here.


Installing doit

doit is a powerful Python package for automation. Simply put, it lets you define tasks for a program to perform (like building and maintaining a research compendium!). To install doit, type "pip3 install doit" (Windows) or "sudo pip3 install doit" (Linux / Mac) in the terminal. Note: pip is the package installer for Python and usually comes pre-installed with Python. If, for whatever reason, you have to (re)install pip, you can find installation guides for different systems at https://pip.pypa.io/en/stable/ and more information about doit at https://pydoit.org/index.html
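To show what "defining tasks" looks like, here is a minimal sketch of a `dodo.py` file, the file doit reads by default. doit treats every function whose name starts with `task_` as a task. The returned dictionary lists the `actions` to run, the `file_dep` files the task depends on, and the `targets` it produces. The task names, script paths, and the `data/intermediate` folder below are hypothetical examples and are not taken from the compendium.

```python
# dodo.py: a minimal sketch of a doit task file.
# All task names and paths below are hypothetical examples.


def task_process_data():
    """Turn the raw survey data into a cleaned data file."""
    return {
        # Shell command doit runs for this task
        "actions": ["python src/data-processing/clean_survey.py"],
        # doit reruns the task when one of these files changes...
        "file_dep": ["src/data-processing/clean_survey.py",
                     "data/raw/survey.csv"],
        # ...or when this output file is missing
        "targets": ["data/intermediate/survey_clean.csv"],
    }


def task_analyse():
    """Run the analysis on the cleaned data."""
    return {
        "actions": ["python src/analysis/analyse.py"],
        # Depending on the other task's target makes doit run that task first
        "file_dep": ["src/analysis/analyse.py",
                     "data/intermediate/survey_clean.csv"],
        "targets": ["results/figures/main_effect.png"],
    }
```

Save the file in the root of your repository, then run `doit list` to see the tasks or `doit` to run them. Tasks whose inputs have not changed are skipped as up to date.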

Is this the first time you are creating a version control repository, or does this project not have a version control repository yet?

Create a version control repository on GitHub or a similar provider, such as Bitbucket. Creating it at the start of your project and using it to collaborate internally is a good strategy to ensure that all needed data and scripts are always accessible. GitHub and other providers offer private repositories for academic use.

Yes No

I have created a version control repo.

You can find comprehensive documentation on working with a Git repository here.


The compendium structure

To make your project easy for others to understand and interpret, we advise you to use our Standardized Research Compendium to structure your project. In the next step, we explain how to copy the blueprint for this structure. If you would rather create your own folder structure, visit this webpage to learn how to create a GitHub repository: https://help.github.com/en/github/getting-started-with-github/create-a-repo

Is this the first time you are going to fork, copy, or clone the Version Control Structure proposed by the Standardized Research Compendium?

You can find the Version Control Structure proposed by the Standardized Research Compendium on the authors’ GitHub page.

Yes No

I have forked, copied, or cloned the Version Control Structure from the authors’ GitHub page.

Here you can find comprehensive documentation on forking, copying, and cloning.


Forking the compendium structure

By 'forking' a repository on GitHub, you make your own copy of the repository, which you can then edit as you see fit. We advise you to fork the basic folder structure template from GitHub to ensure that your project adheres to the standardized folder structure. To fork the template, go to its page and click the 'Fork' button in the top right corner of the screen. You can find the template here: https://github.com/ccs-amsterdam/compendium

Do you need help or would you like more information about running doit install on the command line to set up the necessary Python and/or R environment for your research project?

Running doit install is essential for the workflow of the Standardized Research Compendium: it sets up the necessary Python and/or R environment.

Yes No

I have run doit install or know how to do it.

You can find more information on installing doit here.


Run doit install

To get your repository up and running, you use doit to install all necessary Python and/or R components for your project. Simply open your terminal, navigate to your repository, and run "doit install".
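This checklist does not show the compendium's own install task. The sketch below is a hypothetical illustration of what a doit task named `install` (the one `doit install` runs) could look like. It assumes that Python dependencies are listed in a `requirements.txt` file and that R dependencies, if any, are managed with packrat, whose lock file is stored in `packrat/packrat.lock`. The compendium's real `dodo.py` may be organized differently.

```python
# dodo.py (excerpt): hypothetical sketch of an "install" task.
# The compendium's real dodo.py may be organized differently.
import os
import sys


def task_install():
    """Install the project's Python (and, if present, R) dependencies."""
    # Use the interpreter that runs doit, so packages land in that environment.
    actions = [f'"{sys.executable}" -m pip install -r requirements.txt']
    # Assumption: R dependencies are managed with packrat, whose lock file
    # lives in packrat/packrat.lock.
    if os.path.exists(os.path.join("packrat", "packrat.lock")):
        actions.append('Rscript -e "packrat::restore()"')
    return {
        "actions": actions,
        # Rerun the task only when the list of Python requirements changes.
        "file_dep": ["requirements.txt"],
    }
```

Calling pip through `sys.executable -m pip` is a deliberate choice. It avoids the common problem of a `pip` command on your PATH belonging to a different Python installation than the one running doit.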

Have you considered pre-registering your study?

Preregistration is the practice of documenting your research plan at the beginning of your study and storing that plan in a read-only public repository. For an overview of registry services, see OSF Registries.

Yes No

Pre-registration is not applicable for my study (anymore).

For research that uses existing datasets, there is an increased risk of analysts being biased by preliminary trends in the dataset. This page provides a framework for preregistering studies that use existing data.


Set Up Your Project's Version Control!

These questions allow you to set up the version control for the program and/or language you are working with when conducting your research. If you think you do not need any explanations regarding this part, you can just skip it.

Does your project use Python?

Python is an interpreted, high-level, general-purpose programming language.

Yes No

Do you need help with activating the environment for version control of the package?

Explain dodo.py

Yes No

I need help with it.

Does your project use R?

R is a programming language for statistical computing and graphics. If you are a beginning user of R, please visit the CCS GitHub page for tutorials.

Yes No

Is this the first time you will be working with the packrat package?

packrat is a dependency management system for R packages.

Yes No

I have installed packrat or worked with it before.

You can find information about the packrat package here, or you can visit the short online tutorial here to revisit some of the packrat principles.


Does your project use another analysis language or program that is not open source?

For example, SPSS, Stata, SAS, MATLAB, ATLAS.ti, or anything else.

Yes No

Other programs and Compendium

Add what to do to make the compendium reproducible

Do you install any extensions when cleaning, processing and/or analyzing the data in your research project?

Proprietary data refers to any set of statistics, information, or documentation which is controlled solely by you or a third party. The data is typically protected by copyright law, trade secret laws, and even by patent agreement, in some cases.

Yes No

I have not installed any extensions.

Set Up Your Project's Folder Structure!

These questions allow you to set up the folder structure for your project.

Are multiple researchers involved in this project?

Regardless of whether you work alone or in a team: commit early, commit often. Don’t put off adding code or material to GitHub. At a minimum, push your changes at the end of every day, but ideally after every coherent change, even if it is work in progress.

Yes No

If you work on a project on your own, it is crucial to commit the changes you make early and often. This way you can keep track of what you are doing, make sure no data or code is lost, and work on your project from anywhere. You can find more information on what committing is and how to do it here and here.
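To make "commit early, commit often" a habit, it can help to check quickly what you have not committed yet. The sketch below is a hypothetical helper, not part of the compendium. It runs the real command `git status --porcelain`, which prints one line per changed or untracked file (a two-character status code, a space, then the path), and turns that output into a list of paths.

```python
import subprocess


def parse_porcelain(output):
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line: two-character status code, a space, then the path
        path = line[3:]
        # Renames are shown as "old -> new"; keep the new name
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def uncommitted_changes(repo="."):
    """List files with uncommitted changes in the Git repository at `repo`."""
    result = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                            capture_output=True, text=True, check=True)
    return parse_porcelain(result.stdout)
```

Calling `uncommitted_changes()` from inside your repository at the end of the day lists what still needs a commit. It does not show commits that you have made but not yet pushed. Paths containing spaces are shown in quotes by Git, and this simple parser leaves those quotes in place.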

Do you work with proprietary data?

Proprietary data is owned by an individual or organization and is valuable enough to give them a competitive advantage. Examples are usage data from a social network site or text from newspaper articles. Such data is not open to the public, so you need to take extra measures before making your GitHub repository public to avoid violating any laws.

Yes No

I do not work with proprietary data.

Do you want to encrypt your data?

Data encryption translates data into another form, or code, so that only people with access to a secret key (formally called a decryption key) or password can read it. Encrypted data is commonly referred to as ciphertext, while unencrypted data is called plaintext. Currently, encryption is one of the most popular and effective data security methods used by organizations. Source: The Guardian

Your data should be encrypted if it is not possible to share your data openly, either because it is proprietary or because it includes sensitive information that cannot be made public as plain text. If you are not sure about matters of encryption and what data need to be encrypted, you should first consult the ethical board of your research institution.

Yes No

Encryption is not needed for the type of data I work with.


Does your project involve data processing or cleaning?

Raw data is the data that you collected e.g. in your survey, your experiment, the data you scraped or retrieved from an API. There are many different forms of raw data - the important thing is that it has not been through any form of processing yet. It is crucial to preserve the raw data of your project if possible. Processed data is raw data that was modified by certain forms of data cleaning and processing such as dealing with missing values, making scales, aggregating data etc. In a research compendium, you should put the raw data in the "raw" folder (possibly encrypted).

Yes No

I work only with the raw data. This means my paper/report does not produce any visualization or analysis of the data. Remember that data visualizations make it easier to communicate your ideas and findings to other people.
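The key rule above is that processing never changes the raw data: a script reads the raw file and writes the processed result to a separate file. Here is a minimal sketch of that pattern using Python's standard `csv` module. It handles missing values by dropping incomplete rows. The function names, file paths (for example `data/raw/survey.csv`), and column names are hypothetical, since the checklist only names a "raw" folder.

```python
import csv
import os


def clean_rows(rows, required):
    """Drop rows where any of the `required` columns is empty or missing."""
    return [row for row in rows
            if all((row.get(col) or "").strip() for col in required)]


def process(raw_path, out_path, required):
    """Read raw data, clean it, and write the result to a separate file.

    The raw file is only opened for reading, so it is never modified.
    Returns the number of rows written.
    """
    with open(raw_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = clean_rows(reader, required)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `process("data/raw/survey.csv", "data/intermediate/survey_clean.csv", ["age"])` would keep only the respondents with a recorded age. Dropping rows is just one way to handle missing values. Whatever method you choose, keeping it in a script means others can see exactly what was done.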

Have you written functions yourself on which the data-processing or analysis depend?

Often you use existing functions (reusable pieces of code) from packages; this is very common in R and Python, for example. Sometimes, however, you will need your own specialized functions to handle your data or run analyses. It is then important to preserve that code in the compendium so that other researchers can re-run your analyses and adapt them where necessary.

Yes No

I did not write any functions myself.
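As an example of the kind of self-written function that belongs in src/lib, here is a sketch of a small helper module. The module name `scales.py` and both functions are hypothetical. They reverse-code Likert-type items and average a respondent's answers into a scale score. Keeping such helpers in one module means every processing and analysis script uses the same definition.

```python
# src/lib/scales.py: hypothetical module with reusable helper functions.
from statistics import mean


def reverse_code(value, scale_min=1, scale_max=5):
    """Reverse-code a Likert-type answer, e.g. 1 -> 5 and 2 -> 4 on a 1-5 scale."""
    return scale_max + scale_min - value


def scale_score(items, reversed_items=()):
    """Average a respondent's item answers, reverse-coding the named items.

    `items` maps item names to numeric answers on a 1-5 scale;
    `reversed_items` lists the items that are worded in the opposite direction.
    """
    values = [reverse_code(v) if name in reversed_items else v
              for name, v in items.items()]
    return mean(values)
```

How scripts in src/data-processing import this module depends on how the compendium sets up its paths (for example, by adding src to `PYTHONPATH`), so treat the import setup as something to check in your own project.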

Have you saved the functions in a script in the folder src/lib?

Yes No

I did not save them.

Have you saved the data cleaning/processing scripts that generate any intermediate or temporary data files needed by the analyses in src/data-processing?

Yes No

I did not save them.

Have you saved the files that contain the actual analysis?

Yes No

I did not save them.

Have you saved the finished data sets that are of general interest in results/datasets?

Yes No

I did not save them.

Have you saved the written descriptions of results, i.e. a final paper in results/report?

Yes No

I did not save them.

Are your analyses in the paper visualized?

Graphs and charts let you explore and learn about the structure of the information you collect. Good data visualizations also make it easier to communicate your ideas and findings to other people.

Yes No

The results are not visualized.

Have you saved the figures in results/figures?

Yes No

I did not save them.
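Before moving on to the final checks, you can verify that the folders named in this phase exist. The sketch below is a hypothetical helper built on `pathlib`. It reports which of the folders mentioned in this checklist are missing under the repository root. The checklist does not say where the "raw" data folder or the analysis scripts live, so those folders are left out of the list. Adjust `EXPECTED_FOLDERS` if your compendium differs.

```python
from pathlib import Path

# Folders named in this checklist; adjust if your compendium differs.
EXPECTED_FOLDERS = [
    "src/lib",
    "src/data-processing",
    "results/datasets",
    "results/report",
    "results/figures",
]


def missing_folders(root="."):
    """Return the expected compendium folders that do not exist under `root`."""
    root = Path(root)
    return [folder for folder in EXPECTED_FOLDERS
            if not (root / folder).is_dir()]


if __name__ == "__main__":
    missing = missing_folders()
    print("All folders present." if not missing
          else "Missing: " + ", ".join(missing))
```

Note that Git does not track empty folders, so a folder that exists on your computer may still be missing from a fresh clone until it contains at least one file.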

Finish Your Project's Compendium!

These four questions allow you to conduct the final checks for your online compendium before submitting your manuscript.

Does the code in your compendium produce all statistics, tables, figures, etc. in your manuscript?

Open science, i.e. “making the content and process of producing evidence and claims transparent and accessible to others” (Munafo et al., 2017, p. 5) improves the robustness of scientific research by contributing to reproducibility, replicability, and generalizability of findings.

Yes No

The code does not (yet) produce everything. Clone your repository (git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY), run doit in the cloned folder, and check where your code is not fully reproducible.

Does the manuscript refer to the Standardized Research Compendium article?

We appreciate your use of our checklist! Please reference our article.

Yes No

Please reference our article:

APA:
Chicago:
Bibtex:
Endnote:

Have you had another person, or another computer, run the compendium, and have you checked the results?

At each major milestone or release, check your compendium by cloning it and running doit on a new computer or virtual server. Doing this ensures your scripts are fully reproducible, transparent and open.

Yes No

I did not check my scripts for my compendium on another computer.

Clone your compendium (git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY) and run doit on a new computer or virtual server.
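The clone-and-run check can be scripted so that you repeat it the same way at every milestone. The sketch below is a hypothetical helper. It clones the repository into a fresh temporary folder, runs `doit install` (the setup step described earlier) and then `doit`, and stops at the first failing step. Run on your own machine, it only approximates a new computer. A clean clone catches files you forgot to commit, but not software that is installed only on your machine. For that, run it on another computer or a virtual server, as recommended above.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(repo_url, workdir):
    """Return (command, working directory) pairs for the check, in order."""
    clone_dir = str(Path(workdir) / "compendium")
    return [
        (["git", "clone", repo_url, clone_dir], workdir),
        (["doit", "install"], clone_dir),  # set up the Python/R environment
        (["doit"], clone_dir),             # rebuild everything from scratch
    ]


def run_check(repo_url):
    """Clone `repo_url` into a new temporary folder and build it there.

    Raises subprocess.CalledProcessError if any step fails. The folder is
    kept afterwards so you can compare its results with your own.
    """
    workdir = tempfile.mkdtemp(prefix="compendium-check-")
    for cmd, cwd in build_commands(repo_url, workdir):
        print("$", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)
    print("Finished; compare the outputs in", workdir)
    return workdir
```

For example: `run_check("https://github.com/YOUR-USERNAME/YOUR-REPOSITORY")`.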

Did you anonymize your repository before submitting?

You have presumably removed your name from the text of the article. However, sending a link to your GitHub repository will reveal your identity to the reviewers, which defeats the purpose of a blind review.

Yes No

Anonymous GitHub is a system to anonymize GitHub repositories so that you can refer to your online compendium during the double-blind paper submission process.
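Names also hide inside the files themselves, for example in author fields, code comments, or a LICENSE file. As an extra precaution before you share a link, you can search the repository for identifying strings. The sketch below is a hypothetical helper. The entries in `IDENTIFIERS` are placeholders for your own names, usernames, e-mail addresses, or institution. It scans readable text files (skipping the `.git` folder) and reports every file and line that contains one of them. It does not inspect the Git commit history, which also records author names.

```python
from pathlib import Path

# Hypothetical placeholders: replace with your own names, usernames,
# e-mail addresses, institution, etc.
IDENTIFIERS = ["Jane Doe", "jdoe", "University of Example"]


def find_identifiers(root, identifiers=IDENTIFIERS):
    """Return (file, line number, identifier) for every match under `root`."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for ident in identifiers:
                if ident.lower() in line.lower():  # case-insensitive match
                    hits.append((str(path), lineno, ident))
    return hits
```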