Welcome to our Compendium Checklist!    

The first few questions check whether the computer you are using for this research project has all the applications needed to work with the Standardized Research Compendium.
Feel free to skip the first two phases if you have worked with our Standardized Research Compendium before or feel that you do not need information or help with those parts. You can always go back to earlier phases.

Version control is a system that records changes to a file or set of files over time so that you can recall or restore specific versions later. Working with a version control system increases the transparency of your project.

I know version control and have worked with it before.

If you want to (re)visit or look up some principles of working with version control, you can find comprehensive documentation here.

To work with the Standardized Research Compendium, you need to have Python Version XX, the package doit, and Git installed on your current computer.

I have all of this installed.

If you want to look into the advantages of the combination of Git, Python and doit, have a look at our GitHub page.
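If you are unsure whether everything is installed, a short script can check for you. The sketch below is one possible way to do this and is not part of the Standardized Research Compendium itself. It assumes that Git counts as installed when a git executable is found on your PATH, and that doit counts as installed when the Python interpreter running the script can import it. The function name check_prerequisites is made up for this example.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # Git counts as installed if a `git` executable is on the PATH.
        "git": shutil.which("git") is not None,
        # doit counts as installed if this interpreter can import it.
        "doit": importlib.util.find_spec("doit") is not None,
    }


if __name__ == "__main__":
    # Python is present if this script runs at all; just report its version.
    print("Python version:", sys.version.split()[0])
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Note that the doit check only covers the interpreter you run the script with. If you have several Python installations, run it with the one you plan to use for the project.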

Git is a distributed version-control system for tracking changes in source code during software development.

I have Git installed on this computer.

You can find comprehensive documentation on working with Git here.

    Show me how

Python has developed over time, and using a different version can cause conflicts with the proposed workflow of the Standardized Research Compendium.

I have this Python version installed.

You can find elaborate documentation on working with Python here.

    Show me how
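One way to compare your installed Python with the required version is sketched below. The checklist leaves the required version open, so the script does not hard-code one: you pass it on the command line as MAJOR.MINOR. The function names version_matches and parse_version are hypothetical, and comparing only the major and minor parts is an assumption about what counts as "the same version".

```python
import sys


def version_matches(required, current=None):
    """Return True if the major.minor of `current` equals `required`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required[:2])


def parse_version(text):
    """Turn a string of the form 'MAJOR.MINOR' into a tuple of two integers."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The required version is supplied by you, not assumed by this script.
    required = parse_version(sys.argv[1])
    installed = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "matches" if version_matches(required) else "differs from"
    print(f"Installed Python {installed} {verdict} the required {sys.argv[1]}")
```

Saved as, say, check_version.py, it would be run as python check_version.py MAJOR.MINOR with the version the compendium asks for.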

doit is a modern open-source build tool written in Python, designed to be simple to use and flexible enough to deal with complex workflows.

I have run pip install doit and thereby have installed doit on my computer.

You can find more information on doit here.

    Show me how

Create a version control repository on GitHub or a similar provider, such as Bitbucket. Doing this from the start and using a version control repository to collaborate internally is a good strategy to ensure all needed data and scripts are always accessible. GitHub and other providers allow for private repositories for academic use.

I have created a version control repo.

You can find comprehensive documentation on working with a Git repo here.

    Show me how

You can find the Version Control Structure proposed by the Standardized Research Compendium on the authors’ GitHub page.

I have forked, copied, or cloned the Version Control Structure from the authors’ GitHub page.

Here you can find comprehensive documentation on forking, copying, or cloning.

    Show me how

Running doit install is essential for the workflow of the Standardized Research Compendium. This will set up the necessary Python and/or R environment.

I have run doit install or know how to do it.

You can find more information on installing doit here.

    Show me how
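It may help to know what happens behind the command. doit collects every function named task_<name> from a file called dodo.py, and doit <name> runs that task, so doit install runs a task called install. The sketch below shows what such a task could look like. It is a hypothetical illustration, not the compendium's actual install task: the use of pip and the file name requirements.txt are assumptions.

```python
# dodo.py (hypothetical sketch, not the compendium's actual file)
import sys


def task_install():
    """Install the Python packages listed in requirements.txt."""
    return {
        # A list of strings is run as a command without a shell.
        # sys.executable makes pip use the same interpreter that runs doit.
        "actions": [
            [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
        ],
        # With a file dependency, doit re-runs the task only when
        # requirements.txt has changed since the last successful run.
        "file_dep": ["requirements.txt"],
        # Show the command's output while it runs.
        "verbosity": 2,
    }
```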

Preregistration is the practice of documenting your research plan at the beginning of your study and storing that plan in a read-only public repository. For an overview of registry services, see OSF Registries.

Preregistration is not applicable for my study (anymore).

For research that uses existing datasets, there is an increased risk of analysts being biased by preliminary trends in the dataset. This page provides a framework for preregistering studies that use existing data.

    Show me how

Set Up Your Project's Version Control!

These questions allow you to set up the version control for the program and/or language you are working with when conducting your research. If you think you do not need any explanations regarding this part, you can just skip it.

Python is an interpreted, high-level, general-purpose programming language.

Explain dodo.py

I need help with it.
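As a starting point, here is a minimal sketch of a dodo.py. It is an illustration under stated assumptions, not the file shipped with the Standardized Research Compendium: the task name clean, the file names, and the "processed" folder are all invented for this example. Each task function returns a dictionary in which actions says what to run, file_dep lists the input files, and targets lists the files the task produces. doit uses the last two to decide whether a task needs to run again.

```python
# dodo.py (illustrative sketch; task name and paths are assumptions)
from pathlib import Path

RAW = Path("raw") / "survey.csv"
PROCESSED = Path("processed") / "survey_clean.csv"


def clean_data(dependencies, targets):
    """Copy the raw file to the target, dropping blank lines.

    doit fills in `dependencies` and `targets` from the task dictionary
    because the parameters carry exactly these names.
    """
    source, destination = Path(dependencies[0]), Path(targets[0])
    destination.parent.mkdir(parents=True, exist_ok=True)
    lines = [line for line in source.read_text().splitlines() if line.strip()]
    destination.write_text("\n".join(lines) + "\n")


def task_clean():
    """Create the processed data from the raw data."""
    return {
        "actions": [clean_data],
        "file_dep": [str(RAW)],
        "targets": [str(PROCESSED)],
    }
```

With this file in the project folder, running doit executes all tasks, and doit clean_data is not a valid command: tasks are addressed by the part after task_, so this one is run with doit clean.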

R is a programming language for statistical computing and graphics. If you are a beginning user of R, please visit the CCS GitHub page for tutorials.

packrat is a dependency management system for R packages.

I have installed packrat or worked with it before.

You can find information about the packrat package here, or you can visit the short online tutorial here to revisit some of the packrat principles.

    Show me how

If you used SPSS, Stata, SAS, MATLAB, ATLAS.ti, or any other software.

Proprietary data refers to any set of statistics, information, or documentation which is controlled solely by you or a third party. The data is typically protected by copyright law, trade secret laws, and even by patent agreement, in some cases.

I have not installed any extensions.

Set Up Your Project's Folder Structure!

These questions allow you to set up the folder structure for your project.

Regardless of whether you work alone or in a team, commit early, commit often: Don’t put off adding code or material to GitHub. At a minimum, push changes at the end of every working day, but ideally after every coherent change, even if it is work in progress.

If you work on a project on your own, it is crucial that you commit the changes you make to your project early and often. This way you can keep track of what you are doing, make sure no data or code is lost, and work on your project from anywhere. You can find more information on what committing is and how to do it here and here.
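The commit routine described above comes down to three Git commands: stage, commit, and push. The sketch below wraps them in Python so they can be run in one step. It is a convenience example only. The function names are hypothetical, and it assumes the current folder is a Git repository with a remote already configured.

```python
import subprocess


def commit_commands(message, paths=(".",)):
    """Build the git commands for staging, committing, and pushing."""
    return [
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def commit_and_push(message, paths=(".",)):
    """Run the commands in order and stop at the first one that fails."""
    for command in commit_commands(message, paths):
        # check=True raises CalledProcessError on a non-zero exit code,
        # for example when there is nothing new to commit.
        subprocess.run(command, check=True)
```

For example, commit_and_push("Add data cleaning script") stages everything in the current folder, commits it with that message, and pushes it to the remote.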

The data that you used for your project is owned by an individual or organization and is important enough to give them a competitive advantage. This could, for example, be usage data from a social network site or text from newspaper articles. This data is not open to the public, so you need to take extra measures before you make your GitHub repository public in order not to violate any laws.

I do not work with proprietary data.

Data encryption translates data into another form, or code, so that only people with access to a secret key (formally called a decryption key) or password can read it. Encrypted data is commonly referred to as ciphertext, while unencrypted data is called plaintext. Currently, encryption is one of the most popular and effective data security methods used by organizations. Source: The Guardian

Your data should be encrypted if it is not possible to share your data openly, either because it is proprietary or because it includes sensitive information that cannot be made public as plain text. If you are not sure about matters of encryption and what data need to be encrypted, you should first consult the ethical board of your research institution.

Encryption is not needed for the type of data I work with.

    Show me how

Raw data is the data that you collected, e.g., through your survey or experiment, by scraping, or by retrieving it from an API. There are many different forms of raw data; the important thing is that it has not been through any form of processing yet. It is crucial to preserve the raw data of your project if possible. Processed data is raw data that has been modified by data cleaning and processing, such as dealing with missing values, constructing scales, or aggregating data. In a research compendium, you should put the raw data in the "raw" folder (possibly encrypted).
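One way to verify that your raw data really stays untouched is to record a checksum for every file in the "raw" folder and compare the checksums later. The sketch below illustrates the idea. It is not something the Standardized Research Compendium prescribes, and the function names checksum_manifest and changed_files are invented for this example.

```python
import hashlib
from pathlib import Path


def checksum_manifest(folder):
    """Map each file under `folder` to the SHA-256 digest of its contents."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """List files whose digest differs or that exist in only one manifest."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

You could store the result of checksum_manifest("raw") when the data is first collected and call changed_files with a fresh manifest before each release. An empty list means no raw file was added, removed, or modified.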

I work with the raw data. This means my paper/report does not produce any visualization or analysis of the data. Remember that data visualizations make it easier to communicate your ideas and findings to other people.

Often you use already existing functions (pieces of code to be reused) from packages. This is very common for example for R or Python. Sometimes, however, you will need your own specialized functions to deal with your data or do analyses. It is then important to preserve the code that you used in the compendium so that other researchers can re-run your analyses and adapt them where necessary.

I did not write any functions myself.

I did not save them.

I have saved them.

I did not save them.

Graphs and charts let you explore and learn about the structure of the information you collect. Good data visualizations also make it easier to communicate your ideas and findings to other people.

The results are not visualized.

I did not save them.

Finish Your Project's Compendium!

These four questions allow you to conduct the final checks for your online compendium before submitting your manuscript.

Open science, i.e., “making the content and process of producing evidence and claims transparent and accessible to others” (Munafò et al., 2017, p. 5), improves the robustness of scientific research by contributing to the reproducibility, replicability, and generalizability of findings.

The code does not (yet) produce everything. Run git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY followed by doit and check where your code is not fully reproducible.

We appreciate your use of our checklist!

Please reference our article:

  • APA:
  • Chicago:
  • Bibtex:
  • Endnote:

At each major milestone or release, check your compendium by cloning it and running doit on a new computer or virtual server. Doing this ensures your scripts are fully reproducible, transparent and open.

I did not check my scripts for my compendium on another computer.

Clone your compendium (git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY) and run doit on a new computer or virtual server.
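The clone-and-run check can also be scripted. The sketch below clones the repository into a fresh temporary folder and runs doit inside it. It is a rough illustration with hypothetical function names, and it assumes that git and doit are available on the machine where it runs. Keep in mind that a temporary folder on your own computer is not a new computer: the script will reveal files that are missing from the repository, but not software that happens to be installed only on your machine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def reproduction_commands(repo_url, checkout_dir):
    """Return (command, working directory) pairs for a clone and a doit run."""
    return [
        # Clone into the given folder; no special working directory needed.
        (["git", "clone", repo_url, str(checkout_dir)], None),
        # Run all doit tasks from inside the fresh clone.
        ([sys.executable, "-m", "doit"], str(checkout_dir)),
    ]


def check_reproducibility(repo_url):
    """Clone into a temporary folder and run doit; raise if a step fails."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "compendium"
        for command, cwd in reproduction_commands(repo_url, checkout):
            subprocess.run(command, cwd=cwd, check=True)
```

You would call check_reproducibility with the address of your own repository, i.e. https://github.com/YOUR-USERNAME/YOUR-REPOSITORY with your user name and repository name filled in.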

Obviously, you are hiding your name in the text of the article. However, sending a link to your GitHub repository will make your identity known to the reviewers, which in turn defeats the purpose of a blind review.

Anonymous GitHub is a system to anonymize GitHub repositories so you can refer to your online compendium during the double-blind paper submission process.