Appendix A — Reference points, categorization, and discrimination

A.1 R packages used

As indicated in the Methods section, for all our analyses we used R (Version 4.0.4; R Core Team, 2021) and the R packages brms (Version 2.16.1; Bürkner, 2017, 2018), cowplot (Version 1.1.1; Wilke, 2020), dplyr (Version 1.0.10; Wickham, François, et al., 2022), forcats (Version 0.5.2; Wickham, 2022a), ggimage (Version 0.3.1; Yu, 2022), ggiraph (Version 0.7.10; Gohel & Skintzos, 2021), ggplot2 (Version 3.3.6; Wickham, 2016), ggthemes (Version 4.2.4; Arnold, 2021), glue (Version 1.6.2; Hester & Bryan, 2022), here (Version 1.0.1; Müller, 2020), lubridate (Version 1.8.0; Grolemund & Wickham, 2011), magick (Version 2.7.3; Ooms, 2021), modelr (Version 0.1.9; Wickham, 2022b), papaja (Version 0.1.1; Aust & Barth, 2022), patchwork (Version 1.1.2; Pedersen, 2022), purrr (Version 0.3.4; Henry & Wickham, 2020), Rcpp (Version 1.0.9; Eddelbuettel & Balamuta, 2018; Eddelbuettel & François, 2011), readr (Version 2.1.2; Wickham, Hester, et al., 2022), readxl (Version 1.4.1; Wickham & Bryan, 2022), rstan (Version 2.21.2; Stan Development Team, 2020a), StanHeaders (Version 2.21.0.7; Stan Development Team, 2020b), stringr (Version 1.4.0; Wickham, 2019), tibble (Version 3.1.8; Müller & Wickham, 2022), tidybayes (Version 3.0.1; Kay, 2021), tidyr (Version 1.2.1; Wickham & Girlich, 2022), tidyverse (Version 1.3.2; Wickham et al., 2019), and tinylabels (Version 0.2.3; Barth, 2022).

A.2 Bayesian hierarchical model implementation details

A.2.1 Categorization responses

We fitted a hierarchical Bayesian binomial logistic regression model to the categorization response data with morph level as fixed effect, for each morph series separately, and participant ID within each morph series as random effect for both intercept and slope: \[ freqB \; | \; trials(n) \sim a + b * morph\_level\] \[ a \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. For the other model parameters, default priors were used. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
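To make the scale of these priors concrete, the following Python sketch converts log-odds to probabilities. It is an illustration only, not the code used for the analyses (which were run in R with brms). The function names are hypothetical, reading freqB as the number of 'B' responses out of n trials is an assumption, and the count of post-warmup draws assumes that the 8000 iterations per chain include the warmup iterations and that no thinning was applied.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_response_b(a: float, b: float, morph_level: float) -> float:
    """Probability of a 'B' response under logit(p) = a + b * morph_level."""
    return inv_logit(a + b * morph_level)


# Central 95% range of the intercept prior N(0, 1.5), on the probability scale.
prior_lo = inv_logit(-1.96 * 1.5)
prior_hi = inv_logit(1.96 * 1.5)
print(f"intercept prior covers roughly {prior_lo:.3f} to {prior_hi:.3f}")

# Post-warmup draws across chains (assumption: warmup is part of the 8000).
n_draws = 4 * (8000 - 4000)
print(n_draws)
```

On the probability scale, the intercept prior therefore leaves nearly the whole range of response proportions plausible at a morph level of zero.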

A.2.2 Categorization response times

We fitted a hierarchical Bayesian lognormal regression model to the categorization response time data with morph level and morph level squared as fixed effects, for each morph series separately, and participant ID within each morph series as random effect for both intercept and slopes: \[ catRT \sim a + b * morph\_level + c * morph\_level^2\] \[ a \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] \[ c \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of -1 and a standard deviation of 0.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.3 for the slopes. The prior for sigma was specified as a normal distribution with mean 0.4 and standard deviation 0.3, and the priors for the standard deviations of intercept and slopes were specified as a normal distribution with mean 0.3 and standard deviation 0.1. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
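Because the lognormal model places its linear predictor on the log scale, the priors above are easier to read after exponentiating. The Python sketch below is illustrative only and is not the authors' code. The function names are hypothetical, the coefficient values in the example are made up, and interpreting the result in seconds is an assumption, since the unit of the response times is not stated here.

```python
import math


def mu_rt(a: float, b: float, c: float, morph_level: float) -> float:
    """Log-scale location: a + b * morph_level + c * morph_level^2."""
    return a + b * morph_level + c * morph_level ** 2


def lognormal_median(mu: float) -> float:
    """Median of a lognormal distribution with log-scale location mu."""
    return math.exp(mu)


def lognormal_mean(mu: float, sigma: float) -> float:
    """Mean of a lognormal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


# Prior centers: intercept mean of -1 and sigma mean of 0.4.
print(round(lognormal_median(-1.0), 3))
print(round(lognormal_mean(-1.0, 0.4), 3))

# Hypothetical coefficients, only to show the quadratic predictor in use.
print(round(lognormal_median(mu_rt(-1.0, 0.1, 0.05, 2.0)), 3))
```

The mean exceeds the median because the lognormal distribution is right-skewed, which is a common reason for choosing it for response times.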

A.2.3 Discrimination responses

To investigate the presence of differences in discrimination sensitivity across stimulus pairs, we fitted a hierarchical Bayesian binomial logistic regression model to the discrimination response data with stepsize as fixed effect, for each morph series separately, and with trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ freqdiff \; | \; trials(n) \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series + (1 \; | \; q \; | \; morph\_series:trial\_stimuli) \\ + (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
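The way the fixed and random effects combine in this model can be sketched as follows. This Python snippet is a simplified illustration under stated assumptions, not the fitted brms model: the argument names are hypothetical, and each random effect is represented by a single fixed offset rather than a draw from an estimated distribution.

```python
import math


def p_different(
    a_series: float,       # series-specific intercept (fixed effect)
    b_series: float,       # series-specific stepsize slope (fixed effect)
    u_pair: float,         # random intercept offset for the stimulus pair
    u_participant: float,  # random intercept offset for the participant
    v_participant: float,  # random slope offset for the participant
    stepsize: float,
) -> float:
    """Probability of a 'different' response for one trial cell."""
    intercept = a_series + u_pair + u_participant
    slope = b_series + v_participant
    eta = intercept + slope * stepsize  # linear predictor in log-odds
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values: the same participant and pair offsets, two stepsizes.
print(p_different(-1.0, 0.5, 0.2, -0.1, 0.05, 1))
print(p_different(-1.0, 0.5, 0.2, -0.1, 0.05, 4))
```

Stimulus pairs only shift the intercept, whereas participants shift both the intercept and the slope, mirroring the random-effects structure described above.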

Figure A.1 shows the empirical proportions and posterior predictive distributions for responding ‘different’ in the successive discrimination task, per stepsize, trial type, and morph series, averaged across participants. Stepsize indicates the absolute difference in morph level between the two morph stimuli presented in a trial, with a minimum of zero (for same trials) and a maximum of eleven (when both extremes of the morph series are presented). In this supplementary figure, all stepsizes are shown.

Figure A.1: Proportion of ‘different’ responses in the successive discrimination task for each stepsize, trial type (i.e., between-category vs. within-category), and morph series separately, averaged across participants. Bars indicate the empirical proportions for responding ‘different’. The black dots indicate the mean posterior predictions from the model and the error bars indicate the 95% highest density continuous intervals (HDCI) of the posterior predictive distributions. In this figure, the difference between the darker and the lighter bars (i.e., the category boundary effect: more ‘different’ responses for between-category compared to within-category pairs, keeping stepsize equal) is on average higher for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the colored bars to see the exact percentage of ‘different’ responses, the mean and 95% HDCI of the posterior predictive distributions, and the number of trials related to each bar.
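For readers unfamiliar with the HDCI mentioned in the figure captions: it can be understood as the shortest single interval that contains the requested share of the draws. The Python sketch below illustrates that idea on a list of draws. It is not the tidybayes implementation used for the figures, and the function name and example values are hypothetical.

```python
import math


def shortest_interval(draws, mass=0.95):
    """Return the narrowest interval covering `mass` of the draws."""
    if not draws:
        raise ValueError("draws must not be empty")
    xs = sorted(draws)
    n = len(xs)
    k = min(n, max(1, math.ceil(mass * n)))  # draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical skewed draws: the densest half lies between 10 and 14.
example = [0, 10, 11, 12, 13, 14, 50, 80, 120, 200]
print(shortest_interval(example, mass=0.5))
```

Unlike an equal-tailed interval, this interval follows the densest region of a skewed distribution.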

Figure A.2 shows the posterior distributions for the intercept (A) and the effect of stepsize (B) on the probability of responding ‘different’ in the successive discrimination task, for each morph series separately.

Figure A.2: Posterior distributions for the intercept (A) and the effect of stepsize (B) on the probability of responding ‘different’ in the successive discrimination task, for each morph series separately, in log-odds units. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each intercept or slope value. The colored dashed vertical lines indicate the estimated mean value per type of morph series (recognizable vs. non-recognizable). In this figure, the estimated effect of stepsize is larger for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.

Figure A.3 shows the estimated pairwise differences between the posterior distributions for the effect of stepsize on responding ‘different’ in the successive discrimination task, for each of the different recognizable and non-recognizable morph series combinations.

Figure A.3: Estimated pairwise differences between the posterior distributions for the effect of stepsize on the probability of responding ‘different’ in the successive discrimination task for each of the different recognizable and non-recognizable morph series combinations, in log-odds units. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each slope or difference value. The black vertical line indicates a difference in slope of zero. In this figure, the estimated effect of stepsize is larger for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.
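A pairwise difference between two posterior distributions is commonly obtained by subtracting the slopes draw by draw, so that each posterior draw yields one difference value. The Python sketch below shows this under the assumption that this standard approach applies here; how the differences were actually computed is not stated above. The function name and the draws are hypothetical.

```python
from statistics import mean


def slope_differences(draws_recognizable, draws_non_recognizable):
    """Draw-wise differences between two slopes from the same posterior draws."""
    if len(draws_recognizable) != len(draws_non_recognizable):
        raise ValueError("both slopes need the same number of draws")
    return [r - n for r, n in zip(draws_recognizable, draws_non_recognizable)]


# Hypothetical slope draws in log-odds units, aligned by draw index.
recognizable = [0.5, 0.75, 1.0]
non_recognizable = [0.25, 0.25, 0.5]

diffs = slope_differences(recognizable, non_recognizable)
print(diffs)
print(mean(diffs))
print(sum(d > 0 for d in diffs) / len(diffs))  # share of positive differences
```

Keeping the draws aligned preserves the posterior correlation between the two slopes, which subtracting two summary means would ignore.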

To investigate the presence of an overall category boundary effect, we fitted a hierarchical Bayesian binomial logistic regression model to the discrimination response data with stepsize as fixed effect, for each morph series and trial type separately, and with trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ freqdiff \; | \; trials(n) \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series * between\_category + (1 \; | \; q \; | \; morph\_series:trial\_stimuli) \\+ (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series * between\_category + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
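For a single morph series, this model implies four quantities: an intercept, a stepsize slope, a trial type effect, and a stepsize by trial type interaction. The Python sketch below shows how they combine into the category boundary effect at a given stepsize. It is an illustration under assumptions, not the authors' code: between_category is assumed to be coded 0 for within-category and 1 for between-category pairs, and all names and values are hypothetical.

```python
def log_odds_different(intercept, slope, trial_type_effect, interaction,
                       stepsize, between_category):
    """Log-odds of responding 'different' for one morph series."""
    bc = 1 if between_category else 0  # assumed 0/1 coding
    return (intercept + trial_type_effect * bc) + (slope + interaction * bc) * stepsize


def boundary_effect(trial_type_effect, interaction, stepsize):
    """Between-category minus within-category log-odds at equal stepsize."""
    return trial_type_effect + interaction * stepsize


# Hypothetical coefficients for one series, evaluated at stepsize 2.
between = log_odds_different(-1.0, 0.5, 0.75, -0.25, 2, True)
within = log_odds_different(-1.0, 0.5, 0.75, -0.25, 2, False)
print(between - within, boundary_effect(0.75, -0.25, 2))
```

With a negative interaction, as in this made-up example, the boundary effect shrinks as stepsize grows.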

Figure A.4 shows the posterior distributions for the intercept (A), effect of stepsize (B), effect of trial type (C), and interaction between stepsize and trial type (D) on the probability of responding ‘different’ in the successive discrimination task, for each morph series separately.

Figure A.4: Posterior distributions for the intercept (A), effect of stepsize (B), effect of trial type (C), and interaction between stepsize and trial type (D) on the probability of responding ‘different’ in the successive discrimination task, for each morph series separately, in log-odds units. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each intercept or slope value. The colored dashed vertical lines indicate the estimated mean value per type of morph series (recognizable vs. non-recognizable). The black vertical line indicates a difference in slope of zero. In this figure, the estimated effect of stepsize is larger for the recognizable than for the non-recognizable morph series (B). The main effect of trial type is larger for the recognizable series car-tortoise and penguin-child than for all non-recognizable morph series (C). The interaction effect between stepsize and trial type is more negative for the recognizable series penguin-child than for all non-recognizable series (D). Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.

To investigate the presence of directional asymmetries, we fitted a hierarchical Bayesian binomial logistic regression model to the discrimination response data with stepsize as fixed effect, for each morph series separately, and with ordered trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ freqdiff \; | \; trials(n) \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series + (1 \; | \; q \; | \; morph\_series:trial\_stimuli\_ordered) \\+ (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
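The only change from the earlier discrimination model is the grouping factor for the stimulus pairs: presentation order now matters. The Python sketch below contrasts an unordered pair label with an ordered one. The label format and function names are hypothetical, and numbering the morph levels from 1 to 12 is an assumption that merely matches the maximum stepsize of eleven.

```python
def stepsize(first: int, second: int) -> int:
    """Absolute difference in morph level between the two stimuli."""
    return abs(first - second)


def pair_key(first: int, second: int, ordered: bool) -> str:
    """Grouping label for a stimulus pair, with or without presentation order."""
    if ordered:
        return f"{first}-{second}"  # keeps the order of presentation
    low, high = sorted((first, second))
    return f"{low}-{high}"  # both presentation orders share one label


# The same two stimuli presented in opposite orders.
print(pair_key(3, 7, ordered=False), pair_key(7, 3, ordered=False))
print(pair_key(3, 7, ordered=True), pair_key(7, 3, ordered=True))
print(stepsize(1, 12))
```

With ordered labels, each presentation order receives its own random intercept, so a systematic difference between the two orders of a pair can surface in the posterior predictions.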

Figure A.5 shows the aggregate behavioral data and posterior predictive distributions for responding ‘different’ in the discrimination task, for each morph series, stepsize, and ordered stimulus pair separately.

Figure A.5: Proportion of ‘different’ responses in the successive discrimination task for each stepsize, ordered stimulus pair, and morph series separately, averaged across participants. Stimulus pairs are ordered per stepsize and from left to right in the morph series as presented in Figure 3.3. Bars indicate the empirical proportions for responding ‘different’. The black dots indicate the mean posterior predictions from the model and the grey error bars indicate the 95% highest density continuous intervals (HDCI) of the posterior predictive distributions. In this figure, no clear directional asymmetries (i.e., differences in discrimination performance based on the presentation order of the stimuli in the pair) are present. Note. In the interactive version of this figure, you can hover over the colored bars to see the stimuli involved in the pair (in the presented order from left to right), the exact percentage of ‘different’ responses, the mean and 95% HDCI of the posterior predictive distributions, and the number of trials related to each bar.

A.2.4 Similarity judgments

To investigate the presence of differences in perceived similarity across stimulus pairs, we fitted a hierarchical Bayesian linear regression model to the by-participant-standardized similarity judgments with stepsize as fixed effect, for each morph series separately, and with trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ z\_response \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series + (1 \; | \; q \; | \; morph\_series:trial\_stimuli) \\+ (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.
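Standardizing by participant means that each rating is expressed relative to that participant's own mean and spread. The Python sketch below illustrates the step. It is not the authors' preprocessing code: the function name and data are hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_by_participant(rows):
    """Convert (pp_id, response) pairs into (pp_id, z_response) pairs."""
    by_pp = defaultdict(list)
    for pp_id, response in rows:
        by_pp[pp_id].append(response)
    # Per-participant mean and sample standard deviation.
    stats = {pp: (mean(vals), stdev(vals)) for pp, vals in by_pp.items()}
    result = []
    for pp_id, response in rows:
        m, sd = stats[pp_id]
        if sd == 0:
            raise ValueError(f"participant {pp_id} gave constant responses")
        result.append((pp_id, (response - m) / sd))
    return result


# Two hypothetical participants who use the rating scale differently.
ratings = [("p1", 1), ("p1", 2), ("p1", 3), ("p2", 10), ("p2", 20), ("p2", 30)]
print(standardize_by_participant(ratings))
```

After this step, both participants in the example have identical standardized scores, even though their raw ratings sit on very different parts of the scale.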

Figure A.6 shows the empirical similarity scores and posterior predictive distributions for the similarity judgments, per stepsize, trial type, and morph series, averaged across participants. Stepsize indicates the absolute difference in morph level between the two morph stimuli presented in a trial, with a minimum of zero (for same trials) and a maximum of eleven (when both extremes of the morph series are presented). In this supplementary figure, all stepsizes are shown.

Figure A.6: Standardized similarity scores for each stepsize, trial type (i.e., between-category vs. within-category), and morph series separately, averaged across participants. Grey dots indicate the raw standardized similarity scores. For the conditions that contain fewer trials, these grey dots are not always clearly visible. The colored dots and error bars indicate the mean posterior predictions from the model and the 95% highest density continuous interval (HDCI) of the posterior predictive distributions. In this figure, the difference between the darker and the lighter intervals (i.e., the category boundary effect: between-category pairs rated as less similar than within-category pairs, keeping stepsize equal) is on average larger for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the intervals to see the exact similarity score, the mean and 95% HDCI of the posterior predictive distributions, and the number of trials related to each interval.

Figure A.7 shows the posterior distributions for the intercept (A) and the effect of stepsize (B) on the standardized similarity judgments, for each morph series separately.

Figure A.7: Posterior distributions for the intercept (A) and the effect of stepsize (B) on the standardized similarity scores, for each morph series separately. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each intercept or slope value. The colored dashed vertical lines indicate the estimated mean value per type of morph series (recognizable vs. non-recognizable). In this figure, the estimated effect of stepsize is larger (i.e., more different from zero) for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.

Figure A.8 shows the estimated pairwise differences between the posterior distributions for the effect of stepsize on the similarity judgments, for each of the different recognizable and non-recognizable morph series combinations.

Figure A.8: Estimated pairwise differences between the posterior distributions for the effect of stepsize on the standardized similarity scores for each of the different recognizable and non-recognizable morph series combinations. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each slope or difference value. The black vertical line indicates a difference in slope of zero. In this figure, the estimated effect of stepsize is larger (i.e., more different from zero) for the recognizable than for the non-recognizable morph series. Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.

To investigate the presence of an overall category boundary effect, we fitted a hierarchical Bayesian linear regression model to the by-participant-standardized similarity judgments with stepsize as fixed effect, for each morph series and trial type separately, and with trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ z\_response \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series * between\_category + (1 \; | \; q \; | \; morph\_series:trial\_stimuli) \\+ (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series * between\_category + (1 \; | \; p \; | \; morph\_series:pp\_id)\] As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.

Figure A.9 shows the posterior distributions for the intercept (A), effect of stepsize (B), effect of trial type (C), and interaction between stepsize and trial type (D) on the standardized similarity judgments, for each morph series separately.

Figure A.9: Posterior distributions for the intercept (A), effect of stepsize (B), effect of trial type (C), and interaction between stepsize and trial type (D) on the standardized similarity scores, for each morph series separately. Black dots and intervals indicate the mean, 66%, and 95% highest density continuous interval (HDCI) for each intercept or slope value. The colored dashed vertical lines indicate the estimated mean value per type of morph series (recognizable vs. non-recognizable). The black vertical line indicates a difference in slope of zero. In this figure, the estimated effect of stepsize is larger (i.e., more different from zero) for the recognizable than for the non-recognizable morph series (B). The main effect of trial type is larger (i.e., more different from zero) for the recognizable series car-tortoise and penguin-child than for all non-recognizable morph series (C). The interaction effect between stepsize and trial type is larger (i.e., more different from zero) for the recognizable series watch-seahorse than for all non-recognizable series (D). Note. In the interactive version of this figure, you can hover over the intervals to see the related mean and 95% HDCI for each distribution.

To investigate the presence of directional asymmetries, we fitted a hierarchical Bayesian linear regression model to the by-participant-standardized similarity judgments with stepsize as fixed effect, for each morph series separately, and with ordered trial stimuli and participant ID within each morph series as random effects for intercept and participant ID within each morph series as random effect for the slope: \[ z\_response \sim a + b * stepsize\] \[ \begin{aligned} a \sim 0 + morph\_series + (1 \; | \; q \; | \; morph\_series:trial\_stimuli\_ordered) \\+ (1 \; | \; p \; | \; morph\_series:pp\_id) \end{aligned} \] \[ b \sim 0 + morph\_series + (1 \; | \; p \; | \; morph\_series:pp\_id)\]

As priors, we specified a normal distribution with a mean of zero and a standard deviation of 1.5 for the intercept and a normal distribution with a mean of zero and a standard deviation of 0.5 for the slope. We used 4 chains consisting of 8000 iterations, with 4000 warmup iterations per chain.

Figure A.10 shows the posterior predictive distributions for the standardized similarity judgments, for each morph series, stepsize, and ordered stimulus pair separately.

Figure A.10: Posterior predictive distributions for the responses to the successive similarity judgment task, for each morph series, stepsize, and ordered stimulus pair separately. Colored dots and error bars indicate the mean posterior predictions from the model and the 95% highest density continuous intervals of the posterior predictive distributions. In this figure, no clear directional asymmetries (i.e., differences in perceived similarity based on the presentation order of the stimuli in the pair) are present. Note. In the interactive version of this figure, you can hover over the mean posterior predictions to see the exact percentage different responses and the number of trials related to each data point.

A.3 Supplemental videos

In the HTML version of this Appendix, you can find screen recordings of some trials for each task that was part of this study (cf. Figure A.11).

[Embedded videos not shown here: screen recordings for the categorization, discrimination, and similarity judgment tasks, with recognizable and non-recognizable figures.]

Figure A.11: Screen recordings of some trials for the categorization, discrimination, and similarity judgment tasks that were part of this study.