C Appendix Chapter 3

C.1 Supplemental figures

Figure C.1: (a) Discrimination accuracy for the within-category trials, for each step size, morph series, and trial type separately (averaged across participants). (b) Standardized similarity judgments for the within-category different trials, for each step size, morph series, and trial type separately (averaged across participants). Note. In the interactive version of this document, hover over the bars to see the exact value of each bar as well as the number of trials related to each bar.

C.2 Methodological information

Participants: 283 first-year psychology students from KU Leuven participated in the study (88.34% female, age between 17 and 23, mean age = 18.22, sd age = 0.86, 96.11% mother tongue Dutch).

Material: Stimuli were identifiable and non-identifiable morph series containing 11 stimuli. The identifiable morph series were based on the ones used in Hartendorp et al. (2010) and Burnett & Jellema (2013). The non-identifiable morph series were based on stimuli from Op de Beeck et al. (2003a).

For the purpose of the study, all identifiable stimuli were converted to luminance mode and squared. All identifiable and non-identifiable stimuli were made black (0) on a grey (211) background. Images for each morph series were resized based on the mean number of non-grey pixels in that series. Then, all images were cropped and squared to a common size. Finally, the physical similarity between neighboring stimuli was calculated.
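The resizing step can be illustrated with a minimal Python sketch. The text does not specify how the mean number of non-grey pixels was turned into a resize factor, so the square-root rule below (figure area grows with the square of linear size) is an assumption, and the function names `count_non_grey` and `scale_factors` are hypothetical. Only the background value 211 comes from the description above.

```python
import math

GREY = 211  # background value stated in the text


def count_non_grey(image):
    """Count the pixels that differ from the grey background."""
    return sum(1 for row in image for value in row if value != GREY)


def scale_factors(images):
    """Linear scale factor per image that moves its figure area toward the series mean.

    Assumption: area scales with the square of the linear size, hence the sqrt.
    """
    counts = [count_non_grey(img) for img in images]
    mean_count = sum(counts) / len(counts)
    return [math.sqrt(mean_count / c) for c in counts]


# Two toy "images": 0 = black figure pixel, 211 = grey background pixel.
small = [[211, 0], [211, 211]]  # 1 figure pixel
large = [[0, 0], [0, 211]]      # 3 figure pixels
factors = scale_factors([small, large])
```

Under this rule, a figure smaller than the series mean gets a factor above 1 and a larger figure a factor below 1.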

For the non-identifiable morph series, the final “step size” used was larger than the step size in the original stimuli, to approximately match the physical similarity between neighboring stimuli in the identifiable morph series. Also, the starting amplitude was sometimes changed to avoid “holes in the stimuli” at the end of the series.

The non-identifiable stimuli were generated in Matlab R2018a. Both the identifiable and non-identifiable stimuli were adapted (as described above) using Python 2.7.

The experiment itself was written in Python 2.7 and run on Windows computers with 21.5-inch TFT screens.

Procedure: Students participated in groups. To start the setup, they typed in a counterbalancing number assigned to their seat (numbers were redistributed after each session). This number determined the assignment of the morph series to the different tasks. After giving informed consent, participants were asked for their participant number, gender, age, and mother tongue. Each participant then completed each main task of the study (categorization, discrimination, and similarity judgment task) for a different identifiable and non-identifiable morph series. The assignment of morph series to tasks was counterbalanced between participants (using the counterbalancing number entered at the start of the setup). The order of the three tasks and the order of identifiable versus non-identifiable series were randomized across participants.

Before the start of each new task, participants received instruction screens explaining what was expected of them during that task, and they completed four example trials with a separate morph series:


Figure C.2: Example stimuli.

In all tasks, the exact position of the stimuli was jittered (from -20 to 20 on both x and y axes) to prevent focus on local feature changes only.
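As an illustration of the jitter, the following Python sketch draws an independent offset between -20 and 20 on each axis. The unit (pixels), the use of integer offsets, and the uniform distribution are assumptions, and `jittered_position` is a hypothetical helper, not the function used in the experiment code.

```python
import random

JITTER_RANGE = 20  # maximum offset stated in the text; the unit (pixels) is an assumption


def jittered_position(center=(0, 0), rng=random):
    """Return the stimulus position shifted by an independent random offset on x and y."""
    dx = rng.randint(-JITTER_RANGE, JITTER_RANGE)  # randint is inclusive on both ends
    dy = rng.randint(-JITTER_RANGE, JITTER_RANGE)
    return (center[0] + dx, center[1] + dy)


# A seeded generator makes the sketch reproducible.
rng = random.Random(1)
positions = [jittered_position(rng=rng) for _ in range(1000)]
```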

At the start of the categorization task, participants were shown four ‘clear’ exemplars per category, to give them an idea of the categories. Each trial in the categorization task consisted of (a) the presentation of a fixation cross (400 ms); (b) the presentation of the stimulus (300 ms); (c) a response screen reminding participants to press the left arrow key for category A and the right arrow key for category B. Which end of the series was labeled as category A was randomized across participants. Each stimulus in the morph series was presented 5 times (55 trials in total), and presentation order was randomized.
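The trial count for the categorization task follows directly from the design: 11 stimuli presented 5 times each gives 55 trials. A minimal Python sketch of such a randomized trial list is given below. The level coding from -5 to 5 follows the coding used in the data exclusion criteria. The dictionary layout and the function name `categorization_trials` are hypothetical.

```python
import random

LEVELS = list(range(-5, 6))  # 11 stimuli per morph series, coded -5 to 5
REPETITIONS = 5              # each stimulus is presented 5 times


def categorization_trials(rng=random):
    """Build the randomized list of categorization trials for one morph series."""
    trials = [
        {"level": level, "fixation_ms": 400, "stimulus_ms": 300}
        for level in LEVELS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)  # randomize the presentation order in place
    return trials


trials = categorization_trials(random.Random(0))
n_trials = len(trials)  # 11 * 5 = 55
```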

Each trial in the discrimination task consisted of (a) the presentation of a fixation cross (400 ms); (b) the presentation of a first stimulus (300 ms); (c) intertrial interval (500 ms); (d) the presentation of a second stimulus (300 ms); and (e) a response screen reminding participants to press the left arrow key for same (different) and the right arrow key for different (same). Which key press was related to same or different was randomized across participants. All possible different trials were presented once in each direction, all possible same trials were presented 5 times (165 trials in total), and presentation order was randomized.

Each trial in the similarity judgment task consisted of (a) the presentation of a fixation cross (400 ms); (b) the presentation of a first stimulus (300 ms); (c) intertrial interval (500 ms); (d) the presentation of a second stimulus (300 ms); and (e) a response screen including a 9-point rating scale on which participants indicated how strongly the two figures resembled each other, going from 1 (very different) to 9 (very similar). All possible different trials were presented once in each direction, all possible same trials were presented twice (132 trials in total), and presentation order was randomized.
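The trial counts of the two pairwise tasks can be checked with a short Python sketch. With 11 stimuli there are 11 × 10 = 110 ordered different pairs, so 5 repetitions of the 11 same pairs give 110 + 55 = 165 trials for the same/different task, and 2 repetitions give 110 + 22 = 132 trials for the similarity judgment task. The function name `pair_trials` is hypothetical.

```python
import random
from itertools import permutations

LEVELS = list(range(-5, 6))  # 11 stimuli per morph series, coded -5 to 5


def pair_trials(n_same_repeats, rng=random):
    """Build a randomized list of (first, second) stimulus pairs for one morph series."""
    # Every different pair once in each direction: ordered pairs without repetition.
    different = list(permutations(LEVELS, 2))
    # Every same pair repeated n_same_repeats times.
    same = [(level, level) for level in LEVELS for _ in range(n_same_repeats)]
    trials = different + same
    rng.shuffle(trials)  # randomize the presentation order in place
    return trials


discrimination = pair_trials(5, random.Random(0))  # 110 + 55 trials
similarity = pair_trials(2, random.Random(0))      # 110 + 22 trials
```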

Data analysis: We used R for all preprocessing and analyses.

Preprocessing included: (a) combining the separate data files per participant into combined data files for all participants; (b) changing the variable type of certain variables if needed; (c) anonymizing the data; (d) recoding key press responses in the categorization task when the series was reversed; (e) recoding key press responses in the discrimination task when the same and different buttons were reversed; and (f) adding variables to the discrimination and similarity judgment datasets to simplify further analyses. In addition, similarity ratings were standardized per participant per morph series to lessen the impact of a differential use of the scale across participants.
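The standardization step can be sketched as a z-score computed within each participant and morph series. The actual preprocessing was done in R. The Python version below is only an illustration, and the column names (`participant`, `series`, `rating`) and the use of the sample standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize(rows):
    """Add a z-scored rating per participant and morph series.

    rows: list of dicts with the (hypothetical) keys participant, series, rating.
    Each group needs at least two ratings that are not all identical.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["participant"], row["series"])].append(row["rating"])
    # Mean and sample standard deviation per (participant, series) group.
    stats = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}
    out = []
    for row in rows:
        m, s = stats[(row["participant"], row["series"])]
        out.append(dict(row, z=(row["rating"] - m) / s))
    return out


# Participant 1 uses the full scale, participant 2 only its middle.
ratings = [
    {"participant": 1, "series": "A", "rating": 1},
    {"participant": 1, "series": "A", "rating": 5},
    {"participant": 1, "series": "A", "rating": 9},
    {"participant": 2, "series": "A", "rating": 4},
    {"participant": 2, "series": "A", "rating": 5},
    {"participant": 2, "series": "A", "rating": 6},
]
z_scores = [row["z"] for row in standardize(ratings)]
```

In this toy example both participants end up with the same standardized ratings, although they used different parts of the 9-point scale.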

Data exclusions: We excluded a participant's categorization data for a particular morph series when their probability of labeling the stimulus as category B was at least as high for level -5 as for level 5. Concerning categorization response times, we excluded response times below 200 ms and above 3 seconds.
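A minimal Python sketch of these two criteria is given below. The analyses themselves were run in R. The data layout (a list of level and response tuples per participant and morph series) and the function names are hypothetical.

```python
def keep_categorization(responses):
    """Return True when a participant's data for one morph series is retained.

    responses: list of (level, chose_b) tuples, with chose_b equal to 1 for a
    category B response and 0 for a category A response.
    """
    def p_b(level):
        picks = [chose_b for lvl, chose_b in responses if lvl == level]
        return sum(picks) / len(picks)

    # Exclude when P(B) at level -5 is higher than or equal to P(B) at level 5.
    return p_b(-5) < p_b(5)


def valid_rt(rt_ms):
    """Keep response times from 200 ms up to and including 3 seconds."""
    return 200 <= rt_ms <= 3000


sensible = [(-5, 0)] * 5 + [(5, 1)] * 5  # B chosen only at level 5
flat = [(-5, 1)] * 5 + [(5, 1)] * 5      # B always chosen
kept = [keep_categorization(sensible), keep_categorization(flat)]
rts_kept = [rt for rt in [150, 200, 850, 3000, 3200] if valid_rt(rt)]
```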

We excluded discrimination data from participants with a mean accuracy more than two standard deviations below the overall mean accuracy. Concerning discrimination response times, we excluded response times below 200 ms and above 3 seconds.
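The accuracy criterion can be sketched in Python as follows. The sketch assumes that the overall mean and standard deviation are computed across the participants' mean accuracies and that the sample standard deviation is used. Neither detail is stated in the text, and the function name `excluded_participants` is hypothetical.

```python
from statistics import mean, stdev


def excluded_participants(accuracy_by_participant):
    """Return the ids of participants whose mean accuracy is more than 2 SD below the mean."""
    values = list(accuracy_by_participant.values())
    cutoff = mean(values) - 2 * stdev(values)
    return sorted(p for p, acc in accuracy_by_participant.items() if acc < cutoff)


# Nine participants at 90% accuracy and one at 50%.
accuracies = {f"p{i}": 0.9 for i in range(1, 10)}
accuracies["p10"] = 0.5
excluded = excluded_participants(accuracies)
```

Here the cutoff is roughly 0.61, so only the participant at 50% accuracy falls below it.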

Analyses: So far, I have analyzed the data using frequentist binary logistic regression models. In the future, I plan to fit Bayesian multilevel logistic regression models to this data set.

Data availability: The data and materials for this project are available on Open Science Framework: https://doi.org/10.17605/OSF.IO/UGCD8.

C.3 Supplemental videos

In the interactive version of this document, you can find screen recordings of some trials of each task in this study.

Videos (interactive version only): screen recordings of the categorization task, the discrimination task, and the similarity judgment task, each for identifiable and non-identifiable figures.