Analyse PsyToolkit data with R

This document is still under development during 2024

In short

If you have completed your survey (with or without experiments), you can use our PsyToolkit R code to quickly and easily read in all data. The R code can be downloaded (see the link toward the end of this document).

Why use R?

R is popular for statistical analyses in psychology
R is free of cost and easy to install (no hassles with licenses)
R is flexible

Of course, you first need to learn R coding, and that can take some time. But if you are a researcher and/or getting a degree in psychology and not afraid of statistics, this might well be right for you.

Do I need PsyToolkit raw datafiles at all?

You can also analyze your PsyToolkit data without R and just use the built-in options of PsyToolkit. This lesson is only for those interested in R and flexible data analysis.

What do PsyToolkit datafiles look like?

All PsyToolkit data files are simple text files that you can open with any text editor or with Word. This allows you to process them in any programming language you like. Here we use R, which is ideal for statistical coding.

The basic idea of PsyToolkit data is that all survey data of a participant are stored in a survey data file (its name starts with "s", followed by a long unique identifier string), with optional additional files for each embedded experiment. Thus, for a survey with one experiment, each participant has one survey data file as well as one experiment data file.

Survey data files

For each participant, there is one survey data file. If your survey uses experiments, there is also an additional data file for each experiment.

Survey data file names start with the letter "s", followed by a long alphanumerical string, followed by the extension "txt". For example:

s.05060a15-d2be-4ed2-a026-73bd684b4772.txt

The long string is called a Universally Unique Identifier (UUID). Each participant has such a file. If the participant completed the survey, the last line will be something like this (starting with the word "end"; the rest is the date and time in year, month, day, hour, minute format):

end 2024-07-02-14-26
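The completeness rule described above can also be checked by hand. The following is a minimal sketch in base R, assuming only what is stated here: survey data file names start with "s." and a completed file ends with a line starting with "end". The helper name isCompleteSurveyFile and the two tiny demo files are made up for illustration; they are not part of the PsyToolkit R code, and real data files contain many more lines.

```r
# Hypothetical helper (not part of the PsyToolkit R code):
# TRUE if the last non-empty line of a survey data file starts with "end".
isCompleteSurveyFile = function(filename) {
  lines = readLines(filename, warn = FALSE)
  lines = lines[nzchar(trimws(lines))]        # ignore empty lines
  length(lines) > 0 && startsWith(lines[length(lines)], "end")
}

# Two made-up, heavily shortened demo files in a temporary folder
demoDir = file.path(tempdir(), "psytk_demo_survey")
dir.create(demoDir, showWarnings = FALSE)
writeLines(c("l: age", "a: 23", "end 2024-07-02-14-26"),
           file.path(demoDir, "s.05060a15-d2be-4ed2-a026-73bd684b4772.txt"))
writeLines(c("l: age"),
           file.path(demoDir, "s.11111111-2222-3333-4444-555555555555.txt"))

# Survey data files: names start with "s." and end with ".txt"
surveyFiles = list.files(demoDir, pattern = "^s\\..*\\.txt$", full.names = TRUE)
complete    = sapply(surveyFiles, isCompleteSurveyFile)
print(data.frame(file = basename(surveyFiles), complete = unname(complete)))
```

On your own data, you would point list.files() at your survey data folder instead of the temporary demo folder.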

The file starts with some information about the participant’s computer and the time on that computer. Then, for each question, there is a line starting with "l:" followed by a space and the label name, followed by several lines with details such as the type of question. Importantly, there is an answer line (a:) and a score line (s:). Some questions will have more than one "a:" line.

You do not need to fully understand this, because the R code takes care of reading these files in.

Experiment data files

Experiment data file names start with the name of the experiment, followed by the date and then the UUID of the participant. The UUID is important because it is the only way of matching the experiment file to the participant.
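To illustrate the matching by UUID, here is a small sketch in base R. It searches a file name for the standard UUID layout (8-4-4-4-12 hexadecimal characters), so it does not depend on how the other parts of the name are separated. The helper extractUUID and the experiment file name below are invented for this example; the real file names in your data folder may look different.

```r
# A UUID consists of 8-4-4-4-12 hexadecimal characters
uuidPattern = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

# Hypothetical helper: the UUID found in each file name (NA if there is none)
extractUUID = function(filenames) {
  found = regexpr(uuidPattern, filenames)
  out   = rep(NA_character_, length(filenames))
  out[found > 0] = regmatches(filenames, found)
  out
}

surveyFile     = "s.05060a15-d2be-4ed2-a026-73bd684b4772.txt"
# Made-up experiment file name; the real separators and date format may differ
experimentFile = "stroop.2024-07-02-1426.data.05060a15-d2be-4ed2-a026-73bd684b4772.txt"

# TRUE if both files belong to the same participant
extractUUID(surveyFile) == extractUUID(experimentFile)
```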

The information in the experiment file is simply what you chose to save via the "save" lines in your PsyToolkit experiment code.

The basic idea of reading raw datafiles into R

The PsyToolkit R code needs three pieces of information from you:

The path + name of the survey file. The survey file is typically called "survey.txt".
The path + name of the folder where your survey data files are
The path + name of the folder where your experiment data files are

In the template R files, there is a file called psytkConfig.r. You need to copy this to your own project and make sure that the info there is correct. The default values might be correct (in which case you do not need to change it).

For example, it can look like this:

psytkConfig.r

psytkSurveyFile    = "data/survey.txt"
psytkSurveyDir     = "data/survey_data"
psytkExperimentDir = "data/experiment_data"

We removed all the comment lines starting with a #

If you "source" the above file in R, which means you just "run" the content of the file, the three variables psytkSurveyFile, psytkSurveyDir and psytkExperimentDir are set. If you have no experiments, you can ignore the psytkExperimentDir line. You do not need to remove it.

You need to source this file with

source("psytkConfig.r")
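If the later steps cannot find your data, a wrong path in the configuration is a likely cause. The following sketch is not part of the PsyToolkit R code; it simply uses the base R functions file.exists() and dir.exists() to check the three settings. The values are the example values shown above and are set here only so that the sketch runs on its own. Paths are interpreted relative to R's current working directory.

```r
# The three settings from the example psytkConfig.r
# (normally set by source("psytkConfig.r"))
psytkSurveyFile    = "data/survey.txt"
psytkSurveyDir     = "data/survey_data"
psytkExperimentDir = "data/experiment_data"

# Check each path relative to the current working directory
pathsFound = c(
  psytkSurveyFile    = file.exists(psytkSurveyFile),
  psytkSurveyDir     = dir.exists(psytkSurveyDir),
  psytkExperimentDir = dir.exists(psytkExperimentDir)  # may be FALSE if you have no experiments
)

print(getwd())       # the folder R is currently working in
print(pathsFound)    # FALSE means R cannot find that file or folder from here
```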

After this, you need to run 2 more files. The first one "parses" the PsyToolkit survey file.

source("psytkParseSurvey.r")

After doing this, you will have a few more variables. The most important one can be inspected by typing its name on the R console:

psytkSurvey

It contains labels, the types, and the number of items for each question in your survey.

PsyToolkit now can use this information to read in all the results from your survey (and experiment) data files. This is done as follows:

source("psytkReadAnswers.r")

Now you will have a number of new variables which you can use in further analyses. Some of these variables might be empty. For example, if you have no textline questions, there will be no data in your variable tlAnswers.

radioAnswers and radioScores

For each radio question, you get an answer for each participant. The radioScores differ from the radioAnswers only if you use special scoring.

setAnswers

For each set item in your questionnaire, you get a score here.

tlAnswers

For each textline, you get your entries here.

scaleAnswers and scaleScores

As above, but for scale questions.

checkAnswers and checkScores

As above, but for check questions.

country

If IP data were collected, you get for each participant a country code.

startDate and startTime

For each participant, you get the start date and time. That is when the participant started the survey.
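As a rough illustration of what you can do with these variables, the sketch below uses small simulated matrices with one row per participant. The numbers and the column names ("gender", "mood_1", "mood_2") are invented, and it is an assumption that your own matrices carry column names; check this with colnames() on your real data.

```r
# Simulated stand-ins for variables created by psytkReadAnswers.r
# (5 participants; values and column names are invented)
radioAnswers = matrix(c(1, 2, 2, 1, NA), ncol = 1,
                      dimnames = list(NULL, "gender"))
scaleScores  = matrix(c(3, 4, 5, 2, 4,
                        1, 2, 2, 3, NA), ncol = 2,
                      dimnames = list(NULL, c("mood_1", "mood_2")))

dim(radioAnswers)                                   # participants x questions
table(radioAnswers[, "gender"], useNA = "ifany")    # how often each option was chosen
colMeans(scaleScores, na.rm = TRUE)                 # mean score per scale item
rowMeans(scaleScores, na.rm = TRUE)                 # mean score per participant
```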

How to add to the basic R code of PsyToolkit to do more

How to remove and/or include raw data files

There are different ways to do this. Of course, if you remove the raw data files, they won’t be read. But you can also exclude data based on specific criteria. Because each of the variables is basically an R "data.matrix" with one row per participant, it is best to create a logical vector in R that marks which rows you want to use.

For example, imagine we only want to analyze data files collected in the month of June 2024. I would simply create a new variable called mySelection using the PsyToolkit startDate variable.

mySelection = startDate >= "2024-06-01" & startDate <= "2024-06-30"

And then you can later use mySelection to select the data you need. For example, if you have the ages of your participants in a variable called "age", you can do this:

mean( age[ mySelection ] , na.rm=TRUE )
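The same selection vector can be applied to the answer matrices and combined with further criteria. The sketch below uses simulated data so that it runs on its own. It assumes that startDate holds text in "YYYY-MM-DD" form, which you should verify on your own data; the values of age and radioAnswers are invented.

```r
# Simulated stand-ins; in a real project these come from psytkReadAnswers.r
# and from your own code (age). Four participants.
startDate    = c("2024-05-30", "2024-06-03", "2024-06-17", "2024-07-01")
age          = c(21, 34, NA, 45)
radioAnswers = matrix(c(1, 2, 2, 1), ncol = 1)

# Comparing real Date objects avoids relying on the ordering of text
d = as.Date(startDate)
mySelection = d >= as.Date("2024-06-01") & d <= as.Date("2024-06-30")

# Add a second criterion: keep only participants with a known age
mySelection = mySelection & !is.na(age)

sum(mySelection)                  # number of participants kept
mean(age[mySelection])            # mean age of the selected participants

# Select the matching rows; drop = FALSE keeps the result a matrix
selectedRadio = radioAnswers[mySelection, , drop = FALSE]
```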

Example 1: Survey without experiments

We have a simple survey with radio questions, check questions, a scale question, set items, and a textline question (for age).

All you need to do is unzip the zip file; it comes with the PsyToolkit R code.

All data are completely simulated. We only have data from 10 (simulated) participants. There is even an example incomplete data file in it. Note that it will not be analyzed.

To run all the code at once, you only need the following line:

source("read.r")

Now, we look at some analysis options. We have examples of what you can do in "analysis.r" (not yet included). We will soon have a YouTube video explaining the analysis.

Example 2: Survey with one experiment

We are working on this …

Example 3: Survey with multiple experiments

We are working on this …

How to install

Download the zip file from here (from our Google Drive)
Unzip the file in your R project where you do your data analysis
The needed source files are in the folder source/
Copy those source files to your own project
Check if psytkConfig.r is as you need it (you can change it)
source the files (as explained above)
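The last step can be wrapped in a small check so that a missing file produces a clear message. This is a sketch, not part of the PsyToolkit R code; it assumes the three source files named in this document sit in your current working directory.

```r
# File names as used in this document; adjust if you keep them in another folder
psytkFiles = c("psytkConfig.r", "psytkParseSurvey.r", "psytkReadAnswers.r")
notFound   = psytkFiles[!file.exists(psytkFiles)]

if (length(notFound) > 0) {
  # Report what is missing instead of failing halfway through
  message("Not found in ", getwd(), ": ", paste(notFound, collapse = ", "))
} else {
  # The order matters: configuration first, then parsing, then reading
  for (f in psytkFiles) source(f)
}
```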

Also note that the example data for Example 1 is included here.

Further information about the R code

The R code is provided as is. If you have suggestions for improving the code, please let us know via psytoolkit@gmail.com.

We prefer to stick to R-base code. If you make suggestions, please do not suggest tidyverse code.

In terms of style, we do not use the R "<-" assignment operator, but instead the equivalent "=".