Document [original]

TYPE Original Research

PUBLISHED 18 December 2023

DOI 10.3389/frai.2023.1223251

OPEN ACCESS

EDITED BY

Thomas Hartung,

Johns Hopkins University, United States

REVIEWED BY

Manuel Gentile,

Institute for Educational Technology - National

Research Council of Italy, Italy

Riccardo De Benedictis,

National Research Council (CNR), Italy

*CORRESPONDENCE

Linda Heimisch

[email protected]

Giacomo Zamprogno

[email protected]

†PRESENT ADDRESS

Giacomo Zamprogno,

Department of Computer Science, Vrije

Universiteit Amsterdam, Amsterdam,

Netherlands

RECEIVED 29 May 2023

ACCEPTED 28 November 2023

PUBLISHED 18 December 2023

CITATION

Zamprogno G, Dietz E, Heimisch L and

Russwinkel N (2023) A hybrid computational

approach to anticipate individuals in sequential

problem solving. Front. Artif. Intell. 6:1223251.

doi: 10.3389/frai.2023.1223251

Russwinkel. This is an open-access article

distributed under the terms of the Creative

Commons Attribution License (CC BY). The use,

distribution or reproduction in other forums is

permitted, provided the original author(s) and

the copyright owner(s) are credited and that

the original publication in this journal is cited, in

accordance with accepted academic practice.

No use, distribution or reproduction is

permitted which does not comply with these

terms.

A hybrid computational approach

to anticipate individuals in

sequential problem solving

Giacomo Zamprogno1,2*†, Emmanuelle Dietz2, Linda Heimisch3*

and Nele Russwinkel3,4

1Department of Computer Science and Engineering, University of Bologna, Bologna, Italy, 2Airbus

Central Research and Technology, Hamburg, Germany, 3Department of Psychology and Ergonomics,

Technische Universität Berlin, Berlin, Germany, 4Group Human-Aware AI, Institut of information Systems,

Universität zu Lübeck, Lübeck, Germany

Human-awareness is an ever more important requirement for AI systems that are

designed to assist humans with daily physical interactions and problem solving.

This is especially true for patients that need support to stay as independent as

possible. To be human-aware, an AI should be able to anticipate the intentions

of the individual humans it interacts with, in order to understand the difficulties

and limitations they are facing and to adapt accordingly. While data-driven AI

approaches have recently gained a lot of attention, more research is needed

on assistive AI systems that can develop models of their partners’ goals to offer

proactive support without needing a lot of training trials for new problems. We

propose an integrated AI system that can anticipate actions of individual humans

to contribute to the foundations of trustworthy human-robot interaction. We

test this in Tangram, which is an exemplary sequential problem solving task that

requires dynamic decision making. In this task the sequences of steps to the goal

might be variable and not known by the system. These are aspects that are also

recognized as real world challenges for robotic systems. A hybrid approach based

on the cognitive architecture ACT-R is presented that is not purely data-driven but

includes cognitive principles, meaning heuristics that guide human decisions. Core

of this Cognitive Tangram Solver (CTS) framework is an ACT-R cognitive model

that simulates human problem solving behavior in action, recognizes possible

dead ends and identiﬁes ways forward. Based on this model, the CTS anticipates

and adapts its predictions about the next action to take in any given situation.

We executed an empirical study and collected data from 40 participants. The

predictions made by CTS were evaluated with the participants’ behavior, including

comparative statistics as well as prediction accuracy. The model’s anticipations

compared to the human test data provide support for justifying further steps built

upon our conceptual approach.

KEYWORDS

cognitive modeling, hybrid (symbolic sub-symbolic) approach, sequential problem

solving, anticipation, human-robot collaboration (HRC)

1 Introduction

The ability to adapt to a developing situation is a highly desired feature in every-

day interaction. Humans are incredibly good at adapting to their environment and in

interaction with other humans or (artificial) systems. This skill enables humans to enhance

their performance and the performance of others in the environment.

Frontiers in Artiﬁcial Intelligence 01 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

In recent years, various disciplines that aim at implementing

AI systems and robotic systems within an interactive environment

have realized the need for human-aware AI or human-centered

AI design (Kambhampati, 2020). Kambhampati (2020) defined

Human-Aware AI Systems as goal-directed autonomous systems

that are capable of effectively interacting, collaborating, and

teaming with humans. Challenges in designing such human-aware

AI systems include modeling the mental states of humans-in-the-

loop, recognizing their desires and intentions, providing proactive

support, exhibiting explicable behavior, giving cogent explanations

on demand, and engendering trust.

While recent advances have focused on the emotional aspect

of interaction, successful human-robot interaction requires the

machine to understand, and model the intentions and strategies

of the individual humans they interact with in order to adapt

to the human partner (Rossi et al., 2017). It has been shown

that interactions with robotic systems that do not show adequate

feedback will cause frustration for their human cooperation

partners (Weidemann and Russwinkel, 2021).

This is especially important for patients suffering from

Parkinson’s disease or other neural impairments that face

challenges living on their own or being dependent on others even

for simple tasks. 20% of the world’s population lives with some

level of cognitive impairment (World Health Organization and

others, 2011). According to Kosch et al. (2018) an individual with

such impairments may have difficulties in learning, remembering

information, or making decisions. A cognitive impairment can

impact someone’s ability to complete traditional activities of daily

living. Different facilities exist to provide specialized training to

people with cognitive impairments as they learn independent

living skills. The ultimative goal is to teach the patients to live

independently. Contextualized assistance has been shown to be

effective in helping people with cognitive impairments perform

individual work. So far only displays or augmented reality tools

have been used but with cognitive anticipative models future

systems could provide better support. An assistance system that

is adaptive to the individual and the task progress, that would

understand the task state and offer adaptive support could ensure

independence of such patients to a crucial degree. It has been shown

that self efficiency for such tasks is crucial to preserve still intact

skills and self esteem. But assistance systems that adapt to situation,

task and context would enable a new generation of human-aware

AI systems. We believe that the ability to anticipate the intentions

and cognitive processes of a person is a key skill that underlies

their adaptability. Especially the ability to anticipate high level (taks

level) and low level intentions (actions) (Gomez Cubero and Rehm,

2021) is crucial to proactively offer support or anticipate dangerous

situations in time.

According to Klein et al. (2007) anticipatory thinking is a

critical macrocognitive function of individuals and teams. It is

the ability to prepare in time for problems and opportunities.

We distinguish it from prediction because anticipatory thinking

is functional–people are preparing themselves for future events,

not simply predicting what might happen. Therefore anticipation

implies the skill to understand the task structure the partner is in

and difficulties the partner is facing. For assisting the partner in a

task where the final solution might not be known but - as a teacher

- the assistant can still be aware of general pitfalls and short term

solutions for the next step.

First steps toward such a goal would be an approach that

•is able to offer support even if the final goal is not known (or

trained) in detail,

•understands state of the task and difficulties related to it,

•understands common mistakes and the problem that the

individual partner might face,

•anticipates the partner,

•is flexible to the individual trace of actions, and

•offers appropriate support and only when needed, without

taking over from the patient or partner.

Taking these requirements as a starting point, we will present

an integrated AI system that has a computational theory of mind,

which it uses to anticipate actions of individuals for a particular

task and adapt according to previous anticipations. In real world

tasks we are often faced with developing situations - each action or

decision changes the state of the problem which is referred to as

dynamic decision making.

According to Brehmer (1992) dynamic decision making occurs

under conditions which require a series of decisions, where the

decisions are not independent, where the state of the world changes,

both autonomously and as a consequence of the decision maker’s

actions, and where the decisions have to be made in real time.

As a first step we developed an approach that involves dynamic

decision making, and is simple to understand and investigate, but

is sufficiently complex having a broad solution space where each

step changes the problem state. The sequential problem solving

task Tangram fits this requirement. This task will be explained

in more detail in Section 3. Whenever a piece is chosen and

placed the remaining pieces and remaining free spaces change -

so we have a new situation. A solution with machine learning

would need a lot of training trials for each problem and does

not seem appropriate. On the other hand, when observing other

humans solving the task, it seems straightforward for a human to

understand the other’s perspective and recognize their intent for the

next move.

Regarding the requirement above of anticipating the partner a

method is needed for this task that can trace the individual action

decisions of a participant and apply general rules to understand the

situation and possible approaching problems in order to support

but not necessarily taking over the task.

Especially for patient-robot collaboration human-aware

capabilities like non-verbal communication, shared-control and

proactive support are crucial for successful patient assistance and

therefore require new solutions how to take into account the

individual in the task. In addition cognitive principles should be

taken into account which are hypotheses for human strategies, one

such cognitive principle can be affordance. According to Gibson

(2014), the use of an object is intrinsically determined by its

physical shape. Such clues indicate possibilities for action in a task.

They are perceived in a direct, immediate way and might also lead

to wrong solutions as we will show. In order to anticipate a person

in a task it is necessary to include such cognitive principles to

understand intentions and errors.

Frontiers in Artiﬁcial Intelligence 02 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

Different than purely data-driven approaches, we will present

a hybrid approach, which exploits the strengths of various

technologies. The paper’s contribution is three-fold:

(a) The results and analysis of an empirical study to a problem

solving geometrical task (the Tangram) that, while conceptually

simple, requires skills that can probably be generalized to more

complex tasks.

(b) A novel hypothesis for human strategies, so-called cognitive

principles, for the case of the sequential problem solving task

is suggested, taking as a baseline the concept of affordance and

providing different variations thereof.

object recognition and modeling with cognitive architectures to

enable an agent to reproduce the strategies and provides basic

insights toward predicting and/or anticipating human behavior

in the solution process. This system shows how the strengths

of machine learning and the inherently explainable cognitive

modeling approach can be combined. Finally, four different

evaluation methods are presented to evaluate the framework,

which, while extending on the main plan of studying the

underlying principles, we hope will provide a basic benchmark

for future studies.

2 Related work

2.1 Macro-cognition

The concept of macrocognition is a way of describing cognitive

work as it naturally occurs (Klein and Wright, 2016). It is a term

to indicate a level of description of the cognitive functions that

are performed in natural decision-making settings. According to

Klein and Wright (2016) important macrocognitive phenomena

are problem detection, situation assessment, attention management

and uncertainty management. The term comprises the mental

activities that must be successfully accomplished to perform a

task or achieve a goal. And this is needed for a human-aware

AI approach that understands the partners task state on such a

macrocognitive level. It is relevant to have tools that can achieve

“understanding” at this macrocognitive level for supporting a

human in a difficult task. This is sense making at an abstract task or

situation understanding. Understanding that a person got herself

in a dead-end or tries to find an impossible solution can provide

valuable support. Not for every situation there will be a solution

available but a macrocognitive understanding of a problem is high

relevant.

2.2 Human-aware approaches

During the last decades, theories on human reasoning that

aimed to understand, model, and eventually predict their decisions

[e.g. Johnson-Laird (1983), Rips (1994), Polk and Newell (1995),

Chater and Oaksford (1999), Oaksford and Chater (2020),

and Knauff and Gazzo Castañeda (2021)] can be seen as a

theoretical foundation on human-awareness. These approaches

differ fundamentally from the normative requirements of classical

logical reasoning. The current two most dominant paradigms

are either based on the mental model theory (Knauff and

Gazzo Castañeda, 2021) or bayesian (Oaksford and Chater,

2020). Another view is to formalize commonly agreed-to patterns

as cognitive principles and parametrize them to account for

the variety among the human population [e.g., Dietz Saldanha

and Schambach (2020)] and applied to cognitive argumentation

[e.g., Dietz Saldanha and Kakas (2019)]. Dietz et al. (2022)

integrated this approach for quantitative symbolic reasoning

including the notion of Bayesian plausibility. Cognitive modeling

in real world environments toward human-aware systems in

the context of cockpit and pilots’ mental state was done in

(Klaproth et al., 2020;Blum et al., 2022). These approaches

integrated the pilots’ neurophysiological responses from a passive

brain-computer interface within a cognitive model to trace pilots

perception and processing of auditory alerts and messages during

operations. Their work demonstrates how cognitive models can be

complemented with the neurophysiological data for adaptation and

action (sequence) anticipation. Another human-aware approach

was implemented by Scharfe-Scherf et al. (2022) in the domain of

highly automated driving. The question of how much time a driver

needs to safely transition from autonomous driving to manual

driving is still debated. Scharfe-Scherf et al. (2022) developed

a cognitive model that simulates the construction of situation

awareness depending on gazes to the relevant objects next to the

car. This way the prediction of transition times depending on

the complexity of the situation were possible. Dietz and Klaproth

(2021) proposed the cognitive modeling library txt2actr which

provides an interface between the environment specification and

the cognitive architecture ACT-R (Anderson, 2007). It automates

the construction of knowledge about the task environment and

facilitates updating new information in dynamic environments,

such as constantly changing values in milliseconds (Scharfe-Scherf

et al., 2022).

2.3 Theory of mind and cognitive

architectures

The capability to understand someone else’s purposes,

intentions and goals seems to be a task that is simple for humans

but difficult for AI systems (Lieto, 2021). One reason for this

capability is often explained by the theory of mind (TOM)

(Premack and Woodruff, 1978) stating that usually humans (from

the age of 4 years on) can imagine themselves into someone’s else

mental state, that is they can take another person’s perspective

without the actual experience. One assumption is that if we know

how to build the human mind computationally, we should be able

to rebuild and simulate human behavior. This simulation could

help AI systems to understand others’ purposes, intentions and

goals and adapt accordingly. According to Newell, a theory of

the human mind should address all aspects of cognition. For this

purpose he suggests to develop cognitive architectures (Newell,

1990), which unify different information processing structures.

ACT-R (Anderson, 2007) and SOAR (Laird, 2012) are such

architectures, allowing the simulation of cognitive processes. These

architectures have made a significant contribution on building a

baseline for formal methodologies by implementing and evaluating

Frontiers in Artiﬁcial Intelligence 03 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 1

The seven tans (left) and their usual starting position (right).

models based on existing theories. However, these architectures

allow a high degree of freedom. Therefore the “standard model of

the mind” (or “common model of cognition”) has been proposed

by Laird et al. (2017) to “facilitate shared cumulative progress”

and align theories on the architectural level. When considering

again the challenges of human-aware AI systems, then Cognitive

architectures could help overcome challenges of human-aware AI

in understanding the human’s awareness of a given environment.

Ideally, “knowing” the human’s perspective, the AI system should

be capable to adapt accordingly.

2.4 Problem solving tasks in education and

medicine

Problem solving tasks have been repeatedly suggested

and implemented in education in order to enhance learners

mathematical and spatial skills (Lee et al., 2009;Judd and

Klingberg, 2021). This can apply to a wide range of possible

targets, from children (Bohning and Althouse, 1997) to elderly

people (García et al., 2010;Frutos-Pascual et al., 2012). A

cognitive model for the solution mechanism of puzzles

could help support and improve already present techniques:

first, it would offer a reference to compare with the results

of the learner, possibly helping to target weaker points or

understanding reasoning patterns and causes of mistakes.

Secondly, as more and more automated systems start taking the

roles of supporting agents, a system provided with some type

of TOM could offer better understanding and support to the

users.

This section has given a brief overview of the topics that

are related to the approach we will propose in this paper: a

Computational Human-Aware System for Sequential Problem

Solving guided by Cognitive Principles. Social robotics and human-

machine teaming require human aware AI, thus a system that has

the capability to anticipate and adapt to their human users. Taking

this challenge as a starting point we consider a specific sequential

problem solving task (Section 3) and present a computational

system that anticipates and adapts its predictions based on the

simulation within a cognitive architecture (Section 5). The baseline

of this system is provided by theories from macro-cognition

and human-aware task analysis which are grounded in so-called

cognitive principles (Section 4).

3 The Tangram study

3.1 The Tangram puzzle

The Tangram is an ancient Chinese puzzle in which seven

pieces, also called tans, are obtained from an original square and

need to be reorganized into different figures.

The seven tans consist of 5 square triangles (2 small, 1 medium

and 2 large), 1 square and 1 parallelogram, their relative dimensions

and sizes is shown in Figure 1 left.1

Usually players are presented with a homogeneous silhouette,

likely in the shape of some stylized figure, and are asked to

reproduce the pattern by positioning all the tans, without overlaps.

Figure 1 right shows their usual starting position as presented to the

players. Throughout the paper, we will use the symbols ,2,△B,△M,

and △Sto refer to the square, parallelogram, big triangle, middle

triangle and small triangle, respectively. The advantage of Tangram

as use case for modeling cognitive solving behavior comes from

two factors. First, it is relatively easy to define and describe, with a

short set of rules and limited ambiguity. Second, it is an example

of a sequential problem solving task, in the sense that different

sequences of steps can lead to very different solution procedures,

and thus it cannot be simply defined by a deterministic evolution of

states.

These factors contribute in making the Tangram a solid starting

point for guiding research in sequential problem solving. If a

plausible TOM is suggested for the puzzle, it might provide support

to the fact that a similar approach can be further applied to practical

tasks that, while more complex and elaborate, share common

features with the Tangram, examples of which could be educational

and rehabilitation activities.

In addition, the inclusion of the Tangram puzzle in HCI

for education and medicine is a topic that despite a recent

rise in interest (García et al., 2010;Frutos-Pascual et al., 2012;

Kirschner et al., 2016) still offers plenty of space for research

1 In the following, screenshots showing tans or Tangram puzzles are taken

from an adapted version of http://python4kids.net/downloads/py4k_cda4/

turtledemo-python31/tdemo_games/tangram.py.

Frontiers in Artiﬁcial Intelligence 04 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

effort. As the usage of Tangram as an educational tool has

been known for some years, developing HCI solutions based on

the puzzle seem a natural and promising advance that could

provide new tools for instructors and assistants alike. Until now,

computer-based assistants for Tangram like the ones mentioned

above have been based on search algorithm or machine-learning

solutions and a natural extension would thus be agents that

model and understand their partners’ solution processes, in order

to better interface with users and provide more explainable

support.

FIGURE 2

Tangrams to be solved in the empirical study were house boat, house with chimney, sailing boat and monk (left). Screenshot when house with

chimney was presented (right).

FIGURE 3

Frequency of chosen pieces by step for HOUSE (left) and MONK (right).

FIGURE 4

Common pattern for HOUSE (left) and MONK (right) Tangram.

Frontiers in Artiﬁcial Intelligence 05 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 5

Correct placements (left) and wrong placement (right).

3.2 Empirical study scenario

In collaboration with the Technical University of Berlin (TUB)

and Airbus Central R&T, Hamburg, an empirical study was

developed and performed in order to obtain the training and

test sets.

Altogether, 40 participants were acquired in two study phases,

where one was dedicated to gather data for the later training

and one for the later test set. Participants were recruited among

students and PhD students at TU Berlin. In the first study phase,

31 participants were acquired (13 female), with a mean age of 21.53

years (SD = 3.21), ranging from 21 to 34 years. In the second study

phase, 9 participants were acquired (6 female), with a mean age of

26.11 years (SD = 4.91), ranging from 20 to 35 years.

All participants signed an informed consent and were

compensated with miniature aircraft models provided by the

company and course credit, where applicable.

Each participant was presented with a virtual implementation

of the Tangram puzzle running on a desktop computer, consisting

of one of the four puzzles on the left of Figure 2. A screenshot

is shown on the right of Figure 2. After an initial explanation of

the controls, all of which were performed with the mouse, each

participant was required to tackle the problem of the puzzles, which

were always shown in the same order. There was no time limit, and

at any moment a NEXT button was available, so that the player

could give up on the specific Tangram and move to the next one.

While backtracking was possible while working on a single task,

once the button was pressed the solution was submitted and no

option to go back was available.

The screen recording and application logs, including data

recording times, piece action types (rotation, movement) and piece

positioning were stored for the analysis. For simplicity, in the

sequel, we will only discuss the HOUSE and MONK.

4 Cognitive principles

Humans seem to make assumptions and apply a variety of

heuristics that guide their decisions, which are heavily context

dependent. We call the generalization of these observed heuristics

cognitive principles. The relevant cognitive principles that address

the observations made while watching the videos from the training

data of the Tangram task introduced in Section 3 will be introduced

in this section.

4.1 Affordance

In Human-Computer Interaction, immediately perceived

possible actions are often referred to as affordances (Norman,

2002), and originate from Gibson (1979). The solving process

for Tangrams across participants seems to be strongly guided by

such immediately perceived possible actions. In particular it seems

that participants first consider features in the silhouette that are

particularly similar to the available piece. We call this the BEST FIT

principle.

Let us illustrate this observation by considering the training

data from the HOUSE and the MONK Tangram: The heatmap

in Figure 3 left shows that the pattern {,△B,2}appears in the

majority of activities in the first steps. This pattern, intended as

the unordered set of chosen pieces, would compose the upper

section of the house in the puzzle’s solution (see Figure 4, left): the

sequence with which the pieces were chosen varies, but an initial

analysis shows that 47% of participants presented this pattern by

step 4, rising to 77% for pattern {△B,2}.Figure 3 right shows a

heatmap produced by the training data for the MONK Tangram,

where pattern {,△S}is present in 60% of participants by step 4.

This pattern would compose the head of the monk in the puzzle’s

solution (see Figure 4, right).

A common feature of the patterns {,△B,2}and {,△S}, is that

the shape of the area that the chosen pieces are covering clearly

resembles (or coincides with) some parts of the shape of the object

itself. Screen recordings showed that these combinations of actions

were shared among the participants, especially during the early

phases of the solution (see Section 4.4 for the PHASES OF SOLVING

principle). While the specific sequence of steps might vary, when

considering the unordered set of steps, the same patterns of action

combinations can be found in the majority of the participants.

4.2 Initial placement errors

Best fits which require the composition of more than one

tan are not always recognized in the initial phase. Consider the

Frontiers in Artiﬁcial Intelligence 06 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

TABLE 1 Action frequencies of big triangle at initial phase.

Grid value Rotation Choice in %

7 45 10

7 90 5

8 90 2.5

8 270 2.5

9 180 2.5

10 225 2.5

12 45 2.5

12 135 10

12 180 2.5

13 270 47.5

14 90 2.5

14 180 2.5

17 135 7.5

FIGURE 6

Grid layout of MONK.

placement in Figure 5 left, where the two big triangles need to be

composed into a huge triangle in order to perfectly fit the “belly”

of the MONK. Quite consistently participants tend to first place a

big triangle rotated so to match the rotation of the larger silhouette,

aligned along one of the available edges as in Figure 5 right. They do

not seem to notice the possibility of a perfect fit by the arrangement

of the two triangles into the bigger one.

This can be observed in the starting phase of the MONK

Tangram: Table 1 shows the frequencies of action taken by

participants in the first 4 moves and it can be noticed how

47.5% of participants’ actions involve big triangles rotated 270

degrees and put at location 17 in Figure 6,2representing the

incorrect positioning, against a combined 20% from the correct

placements. In the sequel, we will call observations of this kind the

UNRECOGNIZED COMPOSITION principle.

4.3 Backtracking

The concept of backtracking involves any action or strategy

aimed at removing or replacing tans that have already been

moved into the silhouette. In a preliminary assessment of the

data, we can observe two types of backtracking. The unfeasible

region backtracking occurs when a region in the silhouette

is considered unfeasible, i.e. it clearly cannot host any of the

available pieces. Noticing the presence of such region clearly

hints at the unfeasibility of the current partial solution and thus

triggers backtracking. See Figure 7 for two examples. This type of

backtracking should involve one (or all) the pieces creating the

region, i.e. the pieces bordering with it. It is important to specify

that this strategy triggers when the unfeasible region is noticed, not

produced. It can thus be the case that further action are taken after

its creation, but when the participant backtrack, they would first

remove the problematic piece, possibly allowing following actions.

This observation is called the UNFEASIBLE REGION BACKTRACK

principle.

The second type of backtracking occurs even if no problematic

regions are currently present. Participants might still decide to

backtrack anyway. Possibly, the participants might have noticed

that any further placement will trigger an unfeasible region, or that

a previously placed tan did not result in a simple and clear path

to the solution. In this case, there is not a clear candidate piece to

be backtracked, but usually it can be expected that the piece that

is not a BEST FIT might be a possible choice for backtracking. This

observation is called the PIECE BACKTRACK principle.

4.4 Phases of solving, random search and

aha moments

The aforementioned principles appear to be present at different

points in the solution process. Particularly, a common trend for

a player is to start by exploiting the BEST FIT principle (see

Section 4.1) until a mistake is made and an unfeasible region is

found. In the analyzed Tangrams this often happens due to mistakes

that can be traced back to the UNRECOGNIZED COMPOSITION

(See Section 4.2 on initial placement errors). Backtracking follows,

2 Here, the correct placement of the triangle is deﬁned by a discrete value

in the 5 x 5 grid determined according to where the center point of its

hypotenuse is located. For the big triangle this placement is shown in the

Monk Tangram by the small black cross just above the horizontal line in

location 17 in Figure 6. It must be noted that small pixel differences can

inﬂuence grid values, thus introducing a noise component in the system.

Frontiers in Artiﬁcial Intelligence 07 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 7

The uncovered areas in the HOUSE (left) and MONK (right) are unfeasible regions.

possibly combined with repetitions of combination mistakes, until

a particular action is taken, which might trigger an aha moment

[AHA MOMENT principle, see also Schulte (2005)]. The core

strategy of the second phase was described as “random search”

[RANDOM SEARCH principle, see also Wilson (2002)]: lacking

clearer affordances, Aha moments can be often clearly noticed

by video observations, as the search phase is followed by a

very quick solution obtained by BEST FIT. Trials following these

steps can thus be generally split in three phases: the starting

phase, where BEST FIT is dominant, exploration phase, showing

a combination of UNRECOGNIZED COMPOSITION,RANDOM

SEARCH possibly including PIECE BACKTRACK and UNFEASIBLE

REGION BACKTRACK (see Section 4.3), and eventually, the final

phase, mostly including AHA MOMENT, characterized once more

by BEST FIT.

In cases where the solution is accomplished by the minimal

number of necessary steps, the starting phase and the final phase

are present, but the exploration phase is missing.

5 Cognitive Tangram Solver

We believe that the best way to understand how humans

address sequential problem solving, or in particular, how they

solve Tangrams, is by building a computational system which

simulates their behavior. Therefore, we have developed the

Cognitive Tangram Solver (CTS) framwork, whose aim is to

behave as a human Tangram player. Given this objective, it

was evident to implement the decision making process in a

cognitive architecture. Cognitive architectures, especially in the

form of hybrid architectures (combining symbolic and subsymbolic

methods), can represent an interesting middle-ground approach

to modeling the cognitive principles introduced in Section 4. The

goal is to reproduce computationally the mechanisms of human

cognition, using such mechanism as foundation and justification

for the intelligent behavior of the system (Lieto et al., 2018). The

theoretically founded nature of cognitive architectures imposes

constraints and limits to the modeling process, but it provides

also inherent explicability and plausibility properties once the

system is functional. We chose the cognitive architecture ACT-R,

as it provides a wide range of functionalities, is well established

within the scientific community and has a very well documented

manual (Bothell, 2022) including an extensive tutorial.

While solving a Tangram puzzle relies heavily on the visual

perception (such as affordances, see Section 4.1) there is no such

straightforward way to extract elaborate geometric shapes in an

acceptable time through the visual module in ACT-R. Therefore, an

additional visual-system module outside of ACT-R was developed

to cover this functionality.

Figure 8 gives an overview of the different modules and

their functions in the Cognitive Tangram Solver framework.

The coordinator module (Section 5.1) handles the interaction of

multiple sub-components (Section 5.2-5.4). These sub-components

are distinct and provide various functionalities: the Tangram

application module (Section 5.2) creates and updates the empirical

study window, the visual-system module (Section 5.3) extracts

“plausible” ACTION-OPTIONS in the current state and the cognitive

model module (Section 5.4) performs the main reasoning task based

on which it selects the next action.

The implementation of the Cognitive Tangram Solving and

two videos that show the application in action can be found here:

https://github.com/Thezamp/tangram-solver.

5.1 Coordinator

The coordinator is an application written in python, which

is central as it provides the interfacing features between the

other modules. It initializes the various sub-components, keeps an

internal representation of the puzzle state and updates it according

to the ACTION-OPTION chosen by the cognitive model.

5.1.1 ACTION-OPTIONS

An ACTION-OPTION is the interfacing element between the

visual system, the training data and the cognitive model and based

on the following assumptions: (1) cognitive principles have a strong

influence on the decision making in Tangram (see Section 4) and

(2) they are not all perceived and processed by humans in the same

way and influence their decisions differently.

During Tangram solving, multiple cognitive principles might

possibly influence different available actions. In order to abstract

away from the single influence of one cognitive principle to one

action, we introduce the concept of a unitary action choice, the

ACTION-OPTION:

Frontiers in Artiﬁcial Intelligence 08 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 8

Overview of the Cognitive Tangram Solver framework.

Action-Option. An area of the silhouette defines an action-

option when the edges limiting such area can be (partly) overlapped

with the edges of a given piece. Such action-option is then

characterized by 1. its location, 2. the matching piece type and 3. a

strength value representing the influence of the respective cognitive

principle(s).

Besides the first two features that are mostly involved in

the identification of the ACTION-OPTION, the third feature,

the strength value allows comparing among different ACTION-

OPTIONS.

The application of ACTION-OPTIONS to the Tangram solving

strategy is as follows: At each step, the ACTION-OPTION with the

highest strength defines the location and the tan TAN that best

fits to this location, as most probably chosen by the player. This

action is then expressed by the tuple (ACTION-OPTION,TAN). If no

ACTION-OPTION with fitting TAN can be chosen, backtracking is

necessary. In this case, from all previous actions, (ACTION-OPTION,

TAN), the TAN that has the lowest frequency in training data will be

chosen.

5.1.2 State representation

In order to provide interfacing functions, the coordinator

maintains an internal “virtual” representation of the current state

by storing the list of the chosen (ACTION-OPTION,TAN) still

active (in the sense of actions taken and not backtracked) on

the silhouette (i.e. the Tangram puzzle), together with the list of

currently available ACTION-OPTIONS.

When the cognitive model responds with an ACTION-OPTION,

the coordinator converts it into an actual action by creating

the tuple (ACTION-OPTION,TAN) and updating the state as a

consequence. This is required by the presence of tans of multiple

types: while in principle an ACTION-OPTION already includes the

involved piece type, it is necessary to keep track of which individual

piece is still free, in order to avoid unintentional backtracking or

re-using of a placed tan.

5.1.3 Updating actions

While the cognitive model module will provide

the reasoning and eventually choose the ACTION-

OPTION at each step, the processing and implementation

into an actual action, (ACTION-OPTION,TAN), is

then delegated to the coordinator. Methods are thus

present to implement both updating and backtracking

functionalities.

5.2 Tangram application

The empirical study window was derived by an existing

Tangram demo application and included interactive features which

allowed the users to manipulate and move the tans. While such

features were not required in the developed model, as no simulation

of motor functions was planned, most of the data gathered during

the empricial study was framed in the context of the application.

Frontiers in Artiﬁcial Intelligence 09 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

As a result, the same window environment was maintained for

better correspondence, and the model runs on an adaptation of

the code for the original empirical study, providing just two main

functionalities: it updates the window to represent the current state,

and it captures the window screen so that it can be forwarded to the

visual system for processing.

5.3 Visual system

An automated way to extract the ACTION-OPTION from the

current puzzle’s situation (or current context) from the empirical

study window was required as hand-crafting the ACTION-OPTIONS

for each available state was unfeasible due to the exponentially

growing number of alternatives.

As a consequence, an external implementation based on

classical computer vision techniques is proposed, its functionalities

are limited by the following requirements in order to provide

meaningful results: the algorithm should work on the silhouette

edges, the algorithm should extract ACTION-OPTIONS that are

plausibly exploited by humans, and the algorithm must associate

an evaluation of the ACTION-OPTION’s strength upon extraction.

The visual system was thus implemented in order to represent

a structure cognitively inspired by the hierarchical structure of the

human visual system. Given the strictly geometrical nature of the

puzzle, and the fact that neurons in the primary visual cortex fire

when matching certain oriented lines in the visual field (Hubel,

1982;Loffler, 2008), the algorithm simulates an hypothetical higher

level construct able to identify certain line patterns representing the

shapes.3This was done via a pattern matching function applied on

the edges of the current state and the template, using the sum of

squared differences as similarity function.

Considering the binary nature of the images, a sum of

absolute differences or a custom function might have been possible

alternatives. However due to the amount of templates to match (for

each shape and for each of its available rotations, which are discrete

steps of 45◦) the faster opencv (Bradski, 2000) implementation

was preferred. Even though it is limited in the similarity functions

options, it still provides acceptable results.

For each template, up to five candidate placements4are

extracted (parameter empirically chosen). These are then filtered

for plausibility in two successive steps. First, it is checked whether

the placement would intersect with other pieces currently placed,

which can happen as the similarity function on edges might still

tolerate limited intersection of corners. The second screening

chooses only the placements that have some representation in the

training data at the current phase of the solution.

The process is illustrated in Figure 9: the empirical study

window returns the state of the puzzle and the image is binarized

and its edges are extracted. For each possible rotation of each

3 However, the intention was not to reproduce participants’ visual behavior

such as processing times but to provide a model with the ability to recognize

shapes.

4 Please notice that in the currently available version of the framework,

such candidates are named “landmarks” and not “action-options”. This is due

to a previous version of the framework which had a focus on the “technical”

implementation, and will likely be changed in future revisions of the code.

tan (here the small triangle at rotation 0 is shown, Figure 10

additionally shows the results for the other rotation angles), some

potential matches are extracted. Finally the candidates are filtered

for plausibility. The filtering also tackles an additional aspect: once

an ACTION-OPTION is found its strength must be defined, which in

turn will determine the baseline activation5of the ACTION-OPTION

in the model.

The following equation takes into account the strength of the

extracted ACTION-OPTION, based on similarity scores and the

available data:

strengthi,j=kd∗fi,j,phase +kcv ∗si(1)

where

strengthi,j:strength score for extracted ACTION-OPTION i for

puzzle j

kd:modeller-defined weight parameter for data

information

fi,j,phase :relative frequency of choice of ACTION-OPTION i

at current phase in participants data for puzzle j

kcv :modeller-defined weight parameter for similarity

information

si:similarity score from pattern matching algorithm,

normalized between 0- perfect match and

1- maximal dissimilarity

The values kdand kcv have been chosen empirically while

performing the training. Motivated by the common assumption

that the human’s visual working memory is limited (Miller, 1956),

only the six strongest ACTION-OPTIONS define the imaginal6buffer

to provide context and spreading activation.7The number of

ACTION-OPTIONS considered was chosen in order to be close to

the 7 ±2 from Miller’s findings, but could probably be tuned as a

parameter in further improvements of the framework.

It is thus possible to notice how the current implementation

of the visual system has a particular focus on the aforementioned

principle of Affordance (Section 4.1) coded as explicit pattern

matching for the pieces. Nonetheless, frequency data from

participants are also included in order to provide a proxy for the

other principles.

Finally, the visual system attempts to recognize unfeasible

regions (see Section 4.3, UNFEASIBLE REGION BACKTRACK

principle). In principle, such regions are parts of the silhouette in

which no tan can be placed respecting the rules. At the current

state, they cannot be matched directly as they come in various

5 In the ACT-R architecture, this represents the likelihood that a given piece

of information is extracted from the declarative memory when a retrieval

process is triggered (i.e. when asking to retrieve a “geometrical shape”,

retrieving a square might be more likely than retrieving an hexagon).

6 ACT-R representation of information chunks related to the current task

and context.

7 ACT-R mechanism that allows contextual information to inﬂuence

retrieval: chunks in the imaginal buffer related to context and task increase

the activation of information chunks in the declarative memory beyond the

baseline activation (Bothell, 2022, p.290ff).

Frontiers in Artiﬁcial Intelligence 10 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 9

Steps of ACTION-OPTIONS extraction for small triangle at rotation 0.

FIGURE 10

Results of the pattern matching for small triangles rotations, before intersecting with data.

different shapes and sizes, but a property of the task can be exploited

to identify some of them: the silhouette area must eventually be

fully covered and all available pieces used. Considering this, if at

any point in time no available placement is found for a piece, it

means that its area is split between two or more separate regions

that cannot accommodate it. In such cases, the presence of a

problematic region is found and the coordinator will tag the next

action as uncertain.

5.4 The cognitive model

The ACT-R model is tasked with choosing the next

action to take at any given position, and it is mostly

based on the baseline activation and spreading activation

mechanisms. It mainly involves the goal module, the

imaginal module, the declarative module and the procedural

module.

Frontiers in Artiﬁcial Intelligence 11 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

After the processing from the visual system the current

available action-options are extracted (Figure 11, left), added to the

declarative memory (Figure 11, bottom left) and the six strongest

are loaded into the imaginal buffer (Figure 11, bottom middle). In

this context, the imaginal buffer represents the “most noticeable”

action-options, and helps their retrieval by the spreading activation

mechanism through the respective production rule (as an example

consider the production rule in Figure 11, bottom right). The

main task of the module is to retrieve an action-option from the

declarative memory and forward the decision to the coordinator.

Using the spreading activation mechanism, the action-options

currently in context will have the highest chance of being retrieved.

Retrieved action options can trigger either one of two different

production rules. If it involves a piece, then the action-option will

be validated and forwarded for the status update. The alternatives

are that either a unfeasible-region chunk is retrieved, or that a

retrieval error happens as there is no available action-option, or due

to noise and low strength no action-option was strong enough to be

retrieved.

These two events correspond to the two types of observed

backtracking explained in Section 4.3 and respectively trigger

UNFEASIBLE REGION BACKTRACK and PIECE BACKTRACK action

lines.

The interaction between the two types of backtracking should

be noted: in some cases, a given incorrect placement of a piece

does not create any recognizable unfeasible-region, but will cause

all the following actions to be problematic. This will not be

noticed by any of the two individual productions, but it will

eventually be possible to backtrack by their combination. Due to

the constraint on :recently-retrieved nil,8after various iterations

of the region_backtracking with no solution, no new action-

options will be available. At this point the piece_backtracking

will be triggered, which will remove the weakest action, which

has been stored separately when the action was taken. At

later phases of the solution, the problematic ACTION-OPTIONS

will be less frequent, as more participants will actually have

solved the puzzle, and its strength in the model will thus be

lowered.

It must be noted that simplifications are also implemented in

order to deal with UNFEASIBLE REGION BACKTRACK: as the model

is currently not able to directly identify an unfeasible region, it

instead relies on tagging chosen actions that become problematic.

Thus the implementation of the strategy will differ from the

suggested one by removing the first noticed action causing the

problem and then proceeding in a queue-based manner until it

is solved: a particular (production) rule will fire in the cognitive-

model that will set the focus on solving the noticed issue until is not

present anymore.

A slight simplification is also present with PIECE BACKTRACK:

it will generally work as described, backtracking the action related

to the weakest chosen action-option, but its strength will be

determined uniquely by the frequency data and not any geometrical

representation.

8 Parameter that will avoid the retrieval of the same action-option (and in

general, informtion chunk) multiple times in a row.

6 Evaluation

Due to the relative novelty of the topic, there are no clear

metrics or benchmarks on how to evaluate the Cognitive Tangram

Solver performance on given human performance data. Generally

it seems a challenging task to find good evaluation methods

for cognitive models as they consider different aspects that are

relevant (e.g. standard deviation, response times). In the case of

the Cognitive Tangram Solver, the focus is to imitate the solving

process applied by humans in this puzzle setting, including their

backtracking strategies, in order to anticipate their next steps.

However, the absolute order of steps is often not relevant. Taking

as an example the {△B,,2}pattern in the solution of the HOUSE

Tangram, it is possible to identify six possible sequences that can

lead to the same pattern, any of which would be a desired behavior

if shown by the model. A second challenge is that in the current

implementation the model cannot distinguish between imagining

and doing an action: a participant might realize while hovering

over an area that the intended action will cause a problem region,

and thus change the current placement (still considering as a

single step). The Cognitive Tangram Solver does not implement

“immediate thinking” and needs to actually conclude an action

before possibly backtracking it. This is likely to cause longer

solution procedures in case of mistakes.

In this section, we aim at providing the best possible

picture of the behavior of the Cognitive Tangram Solver in

relation to the human data. We first propose three variations

of the CTS in Section 6.1. After that, Section 6.2 introduces

four evaluation methods, including classical statistics and self-

developed methods that focus on the solving process by

humans, and provides the results with respect to the CTS

variations.

6.1 Variations of the Cognitive Tangram

Solver

The CTS framework includes a large number of parameters

and design choices that can be optimized and discussed. Due to

the exploratory nature of this work, and considering the additional

need to identify acceptable metrics for evaluation, we decided to

limit the analysis to few controllable and significant parameters.

As a result, the CTS variations mostly differ on the relative

weight of the parameters defining the strength of the ACTION-

OPTIONS:

Vision-CTS The strength of an ACTION-OPTION is mainly

defined by the template matching error function, applied by the

template matching algorithm described in Section 5: the lower

the error, the stronger the ACTION-OPTION. This implies a high

influence from the BEST FIT principle.

Frequency-CTS The strength of a ACTION-OPTION is mainly

guided by the participants data. ACTION-OPTIONS are extracted

via template matching as described in Section 5, but the training

set is the main contributor to the successive validation. This

approach implies a high influence from the other principles.

Balanced-CTS The strength of a ACTION-OPTION is derived

both by the data and the template matching. The frequency of

Frontiers in Artiﬁcial Intelligence 12 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 11

Example of action-options extraction in the visual system and their role in the model.

choice suffers a penalty depending on the size of the template

matching error. This approach is a combination of vision-CTS

and frequency-CTS.

6.2 Methods and results

The evaluation was done for the HOUSE and the MONK

tangram. The first three evaluation methods give us insights on

how closely the Cognitive Tangram Solver reproduces data similar

to the ones in the empirical study. For each of these versions, CTS

ran 30 times and the runs where aggregated. The activation noise

specified in ACT-R caused the variations. We will briefly describe

each method in more detail and then show their results.

6.2.1 Overall statistics

The mean and standard deviation of the number of steps to

reach the solution is calculated and compared to the participants’

data to evaluate compatibility. This is expected to be most affected

by the difference between “imagining” and “doing” an action.

Additionally, the ratio of “perfect solutions” is compared: to bypass

the issue with imperfect backtracking, it is evaluated how often

the participants and the model manage to solve the puzzle in the

minimal number of steps (7, one per tan).

Table 2 shows the overall statistics. In this and the following

tables, the evaluation value with respect to the test set is in brackets

next to the evaluation value with respect to the training set.

Comparing the mean and standard deviation of the participants’

data and the three models, all three models seem to lie within the

range of the training and the test set values. The last column shows

the perfect strategy ratio, which is the fraction of participants that

solved the puzzle with the minimum amount of steps (seven). It

seems that there is quite a discrepancy between the training and

the test set for both Tangrams. Additionally, the percentages seem

to suggest that none of the puzzles was significantly more difficult

than the other: for the HOUSE Tangram, 43% of the participants

applied a perfect strategy, but in the test set it was only 22%. In the

case of the MONK Tangram, the discrepancy between the training

and the test set is not as high. 27% of the participants in the training

set applied a perfect strategy, whereas in the test set, the percentage

is slightly lower with 22%. When considering both sets, on average

38% and 25% of the participants applied the perfect strategy for

HOUSE and MONK, respectively. Interestingly, the balanced-

CTS’s performance is similar to the participants’ performance for

HOUSE tangram, which is not the case for the MONK tangram. For

Frontiers in Artiﬁcial Intelligence 13 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

TABLE 2 Overall statistics for total steps counts (test set values in brackets).

Tangram Type Mean Standard deviation Perfect strategy ratio

HOUSE Training (test) 10.8 (8.9) 6.9 (2.3) 0.43 (0.22)

Balanced-CTS 10.8 5.4 0.32

Frequency-CTS 10.3 5.1 0.48

Vision-CTS 10.2 5.5 0.58

MONK Training (test) 12.4 (12.7) 6.5 (5.1) 0.27 (0.22)

Balanced-CTS 13.1 6.5 0.42

Frequency-CTS 11.4 5.7 0.35

Vision-CTS 14.2 7.0 0.45

both tangrams, the vision-CTS performs the best regarding the

perfect-strategy ratio but also diverges most from the participants’

performance. A difference between the groups is noticeable despite

them still being comparable. This suggests a consistent variability

between different participants, and might reduce the informative

value of these statistical results. Nonetheless, by considering the

mean and standard deviation, it is still possible to infer some

degrees of accordance with the performances of the models which

still show a degree of alignment to the data sets, especially in the

case of HOUSE Tangram. The ratio of perfect solutions appears

instead to be widely ranging even among the participants, and

might thus not offer useful insights in the evaluation.

6.2.2 Heatmaps comparison and choice error

During the analysis of data, heatmaps were produced which

showed the relative frequency with which each tan type is chosen

at each step. Even from a purely visual inspection, the heatmaps

allow to consider similarities in the solution patterns. The data

from model runs is thus similarly processed, and the results

are compared. Following Schunn and Wallach (2005) we do not

consider the X2or ANOVA, but use the root mean square

error (RMSE) as the evaluation metric. The RMSE between the

histograms at each step is calculated and averaged as follows:

RMSEs=v

p=1

(h(p,s, model) −h(p,s, data))2

5(2)

where sis the step, pis the piece-type identifier, hs(p, model) is the

frequency histogram value of piece-type pin the model, at step s

and hs(p, data) is the frequency histogram value of piece-type pin

the data, at step s.

The third column in Table 3 shows the heatmap RMSE and

choice error. The heatmaps shown in Figure 12 give additional

insights to their RMSE value. The horizontal axis denotes the step

and the vertical axis denotes the tan that was chosen at this step.

The darker the color of the bock, the higher the percentage of

participants who have chosen the respective tan at that step. At

first glance, the heatmaps of the data for both the HOUSE and

MONK Tangram, are very similar to the heatmaps produced by

the models. For the HOUSE tangram (heatmaps on the top left

of Figure 12) a tendency to use the big triangles in the initial 1–

4 steps and to use the small triangles in the last 5–10 steps can

be observed in all heatmaps. The heatmap of the training data

(left) shows lighter colors for these steps which indicates a more

heterogenous group of the individuals. For the MONK tangram

(four heatmaps in the bottom of Figure 12) a less strong tendency

can be observed as the used tans among the different steps is more

distributed.

The evaluation methods presented so far are widely

used and helpful to compare the performance of different

approaches. However, they do not give us any insight on

whether CTS uses similar solving strategies as the humans,

let alone how well CTS anticipates the humans’ next steps.

For this purpose, we developed two additional evaluation

methods, the step-by-step states plausibility and the predictive

accuracy. An elaborate discussion on their role will be given in

Section 7.

6.2.3 Step-by-step states plausibility

First, we consider the plausibility of how the puzzle state evolves

with the model’s actions. In this regard a state consists in the

list of (grid_loc, rotation) values for each tan that is currently

placed inside the solution grid. The steps from 3 to mean +1∗

sd (varying on the puzzle) are considered, trying to capture the

majority of the available data. Each state in the model runs is

compared with the user states at step ±offset, where offset is a

flexibility parameter increasing with the number of steps, in order

to exclude mismatches due to longer backtracking. The overall

accuracy of the model is then computed by considering the states

presenting matches with the data over the overall number of model

states.

The fourth column in Table 3 shows the step-by-step states

plausibility. The plausibility values reflect whether the states

obtained by the model at a given step also appear among

the users’ states around the same step, trying to provide a

performance scoring that captures the plausibility of the model’s

strategy. It is possible to notice how generally all the models

perform worse for the HOUSE Tangram, and significantly worse

for the MONK Tangram. Predictably, the frequency model has

generally high training-set performances, but it is interesting

to notice how the vision model has instead relatively higher

performance in the test set, suggesting once more how an

affordance based on shape similarity might be guiding the

strategy.

Frontiers in Artiﬁcial Intelligence 14 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

TABLE 3 Heatmap RMSE, plausibility and prediction accuracy of the three CTS variations (test set values in brackets).

Heatmap RMSE States plausibility Prediction accuracy

Tangram Type training (test) training (test) training (test)

HOUSE balanced-CTS 0.14 (0.18) 0.46 (0.34) 0.50 (0.50)

frequency-CTS 0.15 (0.19) 0.53 (0.42) 0.38 (0.43)

vision-CTS 0.12 (0.17) 0.47 (0.38) 0.50 (0.35)

MONK balanced-CTS 0.15 (0.18) 0.59 (0.19) 0.41 (0.30)

frequency-CTS 0.16 (0.19) 0.59 (0.16) 0.39 (0.39)

vision-CTS 0.16 (0.19) 0.57 (0.20) 0.46(0.39)

The highest values are highlighted in gray.

6.2.4 Prediction accuracy

This last proposed evaluation metric also considers contextual

information, and compares the model’s actions in the context of

the game sequence. It defines an accuracy evaluation when the

framework is used as a predictor, predicting the next step when

any participant is solving the Tangrams. By minimal changes in

the CTS, it is possible to have it predict the plausible next action

at any given step during a participant’s solution process. Trying

to predict the exact following move would likely be too restrictive:

consider as an example the roof pattern in the HOUSE tangram as

shown in Figure 4 left. Once the big triangle is placed, at the current

state it would be hard to distinguish the reasoning for which the

square would be placed before the parallelogram, or vice versa. As

a consequence, a prediction is accepted as valid if the suggested

action happens in the following 2 steps.

The last column in Table 3 shows the prediction

accuracy. The predictive accuracy evaluates the anticipatory

performance of the model by running the trials of the

participants and having the model predict the following

move (accepting up to two steps forward as a match, due

to the lower importance of strict sequences). Once more,

different models perform differently in the two Tangrams,

with the vision model having overall better training set

accuracy, and with the frequency model improving in test set

accuracy.

It is generally possible to notice a certain degree of overfitting,

which can be expected when including the training data, which

is still mostly limited to 5-10 percentage units. While the models’

performances have a significant variation between the Tangrams,

in this aspect the frequency-CTS seems more accurate.

6.2.5 Results summary and discussion

Summarizing the observed results, no system shows

consistently better results with respect to the others, neither

within nor between the different puzzles. It is still possible to

notice how the vision-CTS has the best performance in most cases,

which could point toward a certain importance of the affordance

principle.

It must be noted how the three models only focused on a limited

number of variable parameters (specifically, the relations between

data and similarity score for the strength in ACTION-OPTIONS), but

a more comprehensive study could probably benefit from extending

the scope to other evaluations too.

In any case, as the framework was meant to be considered

a proof-of-concept for the application of the described cognitive

principles, we suggest for these results to be seen as a basic

benchmark, to be used and expanded upon for future attempts and

studies.

7 Conclusion and future work

The intended contributions of the presented approach, which

were presented in the introduction, can be summarized as: (a) an

empirical study to an adequate problem solving geometrical task,

(b) A novel hypothesis for human strategies for this task, and

predict human behavior in the solution process. The first aspect

is addressed by the design, the analysis and the results of the

Tangram empirical study in Section 3. The second aspect to define

hypotheses for human strategies, so-called cognitive principles, is

addressed in Section 4. Finally, the third aspect is provided by

the Cognitive Tangram Solver itself, which is a hybrid framework

that integrates object recognition and cognitive modeling. This last

aspect also includes an evaluating part, i.e. in how far the CTS is

able to reproduce the strategies and predict human behavior in

the solving process. This is addressed by the last two proposed

evaluation methods in Section 6, the step-by-step stages plausibility

and the prediction accuracy. Overall, even though CTS does not

fully understand common mistakes, it can weakly anticipate the

human’s next steps: when CTS backtracks, we can assume that

humans also has difficulties in finding a solution immediately.

Finally, as discussed in the introduction, the overall objective

we are aiming at is the development of systems that can interpret

the state of a task, in particular when there is no definite solution

path. Even though the final goal is not known, the system needs

to have a sense of anticipating difficulties and pitfalls, in order to

predict individuals’ traces of action and optimally support them

whenever needed. As this objective contains a variety of highly

demanding challenges, we have only addressed some of these

challenges in a very controlled environment. The work presented

offers a proof-of-concept showing the potentiality of the approach

and justifies further studies in the field.

The presented approach can be improved in the future

considering several aspects. It necessarily renounces some of

Frontiers in Artiﬁcial Intelligence 15 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

FIGURE 12

Heatmaps of the participants’ data for HOUSE (left) and MONK (right) compared to the heatmaps produced by the three models.

the principles for practical implementation, a more fine-grained

analysis that involves the strength definition should be considered.

In the specific case of this empirical study the available data

were strictly dependent on the implementation of the empirical

study window: the coordinates for the placements of the tans

were expressed with respect to the empirical study window,

and the tans themselves were defined as shapes within the

framework. Even though most cognitive architectures provide

a model of visual mechanisms, it seems that reproducing the

setting purely within a architecture would likely have caused

loss of precision in the shapes and coordinates definition,

besides an additional overhead due to the lack of established

methods for complex image manipulation. The development

of visual recognition systems within cognitive architectures

Frontiers in Artiﬁcial Intelligence 16 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

would be beneficial for the implementation of these types of

tasks.

Data availability statement

The datasets presented in this study can be found in

online repositories. The names of the repository/repositories and

accession number(s) can be found below: https://github.com/

Thezamp/tangram-solver.

Ethics statement

The requirement of ethical approval was waived by Ethics

Committee of the Institute for Psychology and Ergonomics (IPA)

at Technische Universität Berlin for the studies involving humans

because Ethics Committee of the Institute for Psychology and

Ergonomics (IPA) at Technische Universität Berlin. The studies

were conducted in accordance with the local legislation and

institutional requirements. The participants provided their written

informed consent to participate in this study. Written informed

consent was obtained from the individual(s) for the publication of

any potentially identifiable images or data included in this article.

Author contributions

GZ implemented the original model of the CTS and performed

the extraction and analysis of the model results, defining together

with ED the metrics to be used. ED provided support and

supervision during the model development and results analysis,

worked as main reviewer for the manuscript, and contributed

to most of its structure. LH organized the human experiment

and gathered the data, analyzed the demographics of participants,

contributed in the review of the results, and additionally she curated

the bibliographic aspects and consistency. NR wrote most of the

introductory sections, was heavily involved in the literary review for

the related works, and also worked as supervisor in the cognitive

aspects of the model. All authors collaborated in the related work

section based on their expertise topics. All authors contributed to

the article and approved the submitted version.

Acknowledgments

We acknowledge support by the German Research Foundation

and the Open Access Publication Fund of TU Berlin. GZ thanks the

University of Bologna and his internal supervisor prof. Giuseppe

Di Pellegrino at the University of Bologna. The authors thank Arnd

Schirmann from Airbus for the support of the empirical study.

Conﬂict of interest

The authors declare that the research was conducted in the

absence of any commercial or financial relationships that could be

construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the

authors and do not necessarily represent those of their affiliated

organizations, or those of the publisher, the editors and the

reviewers. Any product that may be evaluated in this article, or

claim that may be made by its manufacturer, is not guaranteed or

endorsed by the publisher.

References

Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe?

Oxford: Oxford University Press.

Blum, S., Klaproth, O., and Russwinkel, N. (2022). Cognitive Modeling of

Anticipation: Unsupervised Learning and Symbolic Modeling of Pilots’ Mental

Representations. Topics in Cognitive Science. Hoboken: John Wiley & Sons, Ltd.

Bohning, G., and Althouse, J. K. (1997). Using tangrams to teach geometry to young

children. Early Child. Educ. J. 24:239–242. doi: 10.1007/BF02354839

Bothell, D. (2022). ACT-R 7.26+Reference Manual. Available online at: http://act-r.

psy.cmu.edu/actr7.x/reference-manual.pdf (accessed May 15, 2023).

Bradski, G. (2000). The OpenCV Library. San Diego: Dr. Dobb’s Journal of Software

Tools.

Brehmer, B. (1992). Dynamic decision making: human control of complex systems.

Acta Psychol. 81, 211–241. doi: 10.1016/0001-6918(92)90019-A

Chater, N., and Oaksford, M. (1999). The probability heuristics model of syllogistic

reasoning. Cogn. Psychol. 38, 191–258. doi: 10.1006/cogp.1998.0696

Dietz Saldanha, E., and Kakas, A. (2019). Cognitive argumentation

for human syllogistic reasoning. KI -Künstliche Intelligenz 33, 229–242.

doi: 10.1007/s13218-019-00608-y

Dietz Saldanha, E., and Schambach, R. (2020). “A computational approach

for predicting individuals’ response patterns in human syllogistic reasoning,” in

Armstrong. Proceedings of the Annual Meeting of the Cognitive Science Society,

eds. S. Denison, M. Mack, Y. Xu, and B. C. Sydney: Cognitive Science Society,

1247–1253.

Dietz, E., Fichte, J., and Hamiti, F. (2022). “A quantitative symbolic approach to

individual human reasoning,” in Proceedings of the Annual Meeting of the Cognitive

Science Society, eds. J. Culbertson, A. Perfors, H. R. and Ramenzoni, V. Sydney:

Cognitive Science Society, 2838–2846.

Dietz, E., and Klaproth, O. W. (2021). “Towards benchmarking cognitive models: a

python library for modular environment specification and partial model generation in

act-r,” in Proceedings of the 19th International Conference on Cognitive Modeling (ICCM

2021). Vienna: AUT, 50–56.

Frutos-Pascual, M., Garca-Zapirain, B., and Mandez-Zorrilla, A. (2012).

Improvement in cognitive therapies aimed at the elderly using a mixed-

reality tool based on tangram game. Commun. Comput. Inf. Sci. 351, 68–75.

doi: 10.1007/978-3-642-35600-1_10

García Zapirain, B., Méndez Zorrilla, A., and Larrañaga, S. (2010). Psycho-

stimulation for Elderly People Using Puzzle Game. Hong Kong: IEEE.

Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton

Mifflin.

Gibson, J. J. (2014). The Ecological Approach to Visual Perception: Classic Edition.

London: Psychology Press.

Gomez Cubero, C., and Rehm, M. (2021). “Intention recognition in human robot

interaction based on eye tracking,” in Human-Computer Interaction-INTERACT 2021:

18th IFIP TC 13 International Conference, Bari, Italy. Cham: Springer, 428–437.

Hubel, D. H. (1982). Exploration of the primary visual cortex, 1955-78. Nature 299,

515–524. doi: 10.1038/299515a0

Frontiers in Artiﬁcial Intelligence 17 frontiersin.org

Zamprogno et al. 10.3389/frai.2023.1223251

Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of

Language, Inference, and Consciousness. Cambridge, MA: Harvard University Press.

Judd, N., and Klingberg, T. (2021). Training spatial cognition enhances

mathematical learning in a randomized study of 17,000 children. Nat. Human Behav.

5, 1548–1554. doi: 10.1038/s41562-021-01118-4

Kambhampati, S. (2020). Challenges of human-aware ai systems. AI Magazine 41,

3–17. doi: 10.1609/aimag.v41i3.5257

Kirschner, D., Velik, R., Yahyanejad, S., Brandster, M., and Hofbaur, M. (2016).

Yumi, come and play with me! a collaborative robot for piecing together a tangram

puzzle. Lect. Notes Computer Sci. 9812, 243–251. doi: 10.1007/978-3-319-43955-6_29

Klaproth, O. W., Halbrge, M., Krol, L. R., Vernaleken, C., Zander, T. O., and

Russwinkel, N. (2020). A neuroadaptive cognitive model for dealing with uncertainty in

tracing pilots’ cognitive state. Top. Cognit. Sci. 12, 1012–1029. doi: 10.1111/tops.12515

Klein, G., Phillips, J. K., Rall, E. L., and Peluso, D. A. (2007). “A data-frame theory

of sensemaking,” in Proc. of the Sixth International Conference on Naturalistic Decision

Making, R. eds. R. Hoffman. Mahwah: Lawrence Erlbaum Associates Publishers,

113–155.

Klein, G., and Wright, C. (2016). Macrocognition: From theory to toolbox. Front.

Psychol. 7, 54. doi: 10.3389/fpsyg.2016.00054

Knauff, M., and Gazzo Casta neda, L. E. (2021). When nomenclature matters: is the

“new paradigm” really a new paradigm for the psychology of reasoning? Thinking &

Reason. 0, 1–30. doi: 10.31234/osf.io/xz62y

Kosch, T., Wo´

zniak, P. W., Brady, E., and Schmidt, A. (2018). “Smart kitchens

for people with cognitive impairments: A qualitative study of design requirements,”

in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems

(New York, NY: Association for Computing Machinery), 1–12. doi: 10.1145/3173574.3

173845

Laird, J. E. (2012). The Soar Cognitive Architecture. Cambridge, MA: The MIT Press.

Laird, J. E., Lebiere, C., and Rosenbloom, P. S. (2017). A standard model of the mind:

toward a common computational framework across artificial intelligence, cognitive

science, neuroscience, and robotics. AI Magazine 38, 13. doi: 10.1609/aimag.v38i4.2744

Lee, J., Lee, J. O., and Collins, D. (2009). Enhancing children’s spatial

sense using tangrams. Childh. Educ. 86, 92–94. doi: 10.1080/00094056.2010.105

23120

Lieto, A. (2021). Cognitive Design for Artificial Minds. London, UK: Routledge,

Taylor & Francis Ltd.

Lieto, A., Bhatt, M., Oltramari, A., and Vernon, D.

(2018). The role of cognitive architectures in general artificial

intelligence. Cognit. Syst. Res. 48, 1–3. doi: 10.1016/j.cogsys.2017.0

8.003

Loffler, G. (2008). Perception of contours and shapes: low and intermediate stage

mechanisms. Vision Res. 48, 2106–2127. doi: 10.1016/j.visres.2008.03.006

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on

our capacity for processing information. Psychol. Rev. 63, 81–97. doi: 10.1037/h0043158

Newell, A. (1990). Unified Theories of Cognition. Cambridge: Harvard University

Press, USA.

Norman, D. A. (2002). The Design of Everyday Things. New York: Basic Books, [New

York].

Oaksford, M., and Chater, N. (2020). New paradigms in the psychology of

reasoning. Annu. Rev. Psychol. 71, 305–330. doi: 10.1146/annurev-psych-010419-05

1132

Polk, T. A., and Newell, A. (1995). Deduction as verbal reasoning. Psychol. Rev. 102,

533–566. doi: 10.1037/0033-295X.102.3.533

Premack, D., and Woodruff, G. (1978). Does the chimpanzee have a theory of mind?

Behav. Brain Sci. 1, 515–526. doi: 10.1017/S0140525X00076512

Rips, L. J. (1994). The Psychology of Proof: Deductive Reasoning in Human Thinking.

Cambridge, MA: The MIT Press.

Rossi, S., Ferland, F., and Tapus, A. (2017). User profiling and behavioral

adaptation for hri: A survey. Pattern Recognit. Lett. 99, 3–12. doi: 10.1016/j.patrec.2017.

06.002

Scharfe-Scherf, M. S. L., Wiese, S., and Russwinkel, N. (2022). A cognitive model

to anticipate variations of situation awareness and attention for the takeover in highly

automated driving. Information 13, 418. doi: 10.3390/info13090418

Schulte, K. M. (2005). Lernen durch Einsicht. Erweiterung des gestaltpsychologischen

Lernbegriffs. 1. Aufl. Forschung Pdagogik. Wiesbaden: VS Verl. fr Sozialwissenschaften.

Schunn, C. D., and Wallach, D. (2005). “Evaluating goodness-of-fit in comparison

of models to data,” in Psychologie der Kognition Reden und Vorträge anlässlich der

Emeritierung Tack, eds. H. von Werner H. Saarbrücken, Germany: University of

Saarland Press, 115–154.

Weidemann, A., and Russwinkel, N. (2021). The role of frustration in human-robot

interaction-what is needed for a successful collaboration? Front. Psychol. 707, 640186.

doi: 10.3389/fpsyg.2021.640186

Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bullet. Rev. 9,

625–636. doi: 10.3758/BF03196322

World Health Organization (2011). World Report on Disability 2011. Geneva:

World Health Organization.

Frontiers in Artiﬁcial Intelligence 18 frontiersin.org