The German Version of the Mobile App Rating Scale (MARS-G): Development and Validation Study [original]

Original Paper

The German Version of the Mobile App Rating Scale (MARS-G):

Development and Validation Study

Eva-Maria Messner1, MA; Yannik Terhorst1, MSc; Antonia Barke2, DPhil; Harald Baumeister1, PhD, Prof Dr; Stoyan

Stoyanov3, MRES; Leanne Hides4, PhD, Prof Dr; David Kavanagh5, PhD, Prof Dr; Rüdiger Pryss6, PhD, Prof Dr;

Lasse Sander7, PhD; Thomas Probst8, PhD, Prof Dr

1Clinical Psychology and Psychotherapy, Ulm University, Ulm, Germany

2Clinical and Biological Psychology, Catholic University Eichstaett-Ingolstadt, Eichstaett-Ingolstadt, Germany

3Creative Industries Faculty, Queensland University of Technology, Brisbane, Australia

4Faculty of Health and Behavioural Sciences, University of Queensland, Brisbane, Australia

5Psychology and Counselling, Queensland University of Technology, Brisbane, Australia

6Institute of Databases and Information Systems, Ulm University, Ulm, Germany

7Department of Rehabilitation Psychology and Psychotherapy, University of Freiburg, Freiburg, Germany

8Department for Psychotherapy and Biopsychosocial Health, Danube University Krems, Krems, Austria

Corresponding Author:

Eva-Maria Messner, MA

Clinical Psychology and Psychotherapy

Ulm University

Albert-Einstein-Allee 47

Ulm

Germany

Phone: 49 07315032802

Email: ev[email protected]

Abstract

Background: The number of mobile health apps (MHAs), which are developed to promote healthy behaviors, prevent disease

onset, manage and cure diseases, or assist with rehabilitation measures, has exploded. App store star ratings and descriptions

usually provide insufficient or even false information about app quality, although they are popular among end users. A rigorous

systematic approach to establish and evaluate the quality of MHAs is urgently needed. The Mobile App Rating Scale (MARS)

is an assessment tool that facilitates the objective and systematic evaluation of the quality of MHAs. However, a German MARS

is currently not available.

Objective: The aim of this study was to translate and validate a German version of the MARS (MARS-G).

Methods: The original 19-item MARS was forward and backward translated twice, and the MARS-G was created. App description

items were extended, and 104 MHAs were rated twice by eight independent bilingual researchers, using the MARS-G and MARS.

The internal consistency, validity, and reliability of both scales were assessed. Mokken scale analysis was used to investigate the

scalability of the overall scores.

Results: The retranslated scale showed excellent alignment with the original MARS. Additionally, the properties of the MARS-G

were comparable to those of the original MARS. The internal consistency was good for all subscales (ie, omega ranged from

0.72 to 0.91). The correlation coefficients (r) between the dimensions of the MARS-G and MARS ranged from 0.93 to 0.98. The

scalability of the MARS (H=0.50) and MARS-G (H=0.48) was good.

Conclusions: The MARS-G is a reliable and valid tool for experts and stakeholders to assess the quality of health apps in

German-speaking populations. The overall score is a reliable quality indicator. However, further studies are needed to assess the

factorial structure of the MARS and MARS-G.

(JMIR Mhealth Uhealth 2020;8(3):e14479) doi: 10.2196/14479

KEYWORDS

mHealth; Mobile App Rating Scale; mobile app; assessment; rating; scale development

JMIR Mhealth Uhealth 2020 | vol. 8 | iss. 3 | e14479 | p. 1http://mhealth.jmir.org/2020/3/e14479/ (page number not for citation purposes)

Messner et alJMIR MHEALTH AND UHEALTH

Introduction

Mobile phones are an integral part of modern life. In Europe,

67% of the population owns smartphones, and the number of

smartphone users is rising worldwide [1]. It has been reported

that 30% of Germans have 11 to 20 apps installed on their

smartphones [2]. The use of mobile apps to improve mental

health and well-being is becoming increasingly common, with

roughly 29% of Germans currently using at least one health app

[3].

Globally, between 95 million and 130 million people speak

German, making it the 11th most spoken language worldwide

[4,5]. Elderly individuals and people with basic education are

commonly monolingual in Germany [6]. Yet, these populations

have a high need for assistance in developing and maintaining

health behaviors and could benefit from the use of mobile health

apps (MHAs). A German MHA rating scale could help

researchers and health care providers assess the quality of health

apps quickly and reliably in their mother tongue. Furthermore,

it would be easier to rate a German app with a German scale.

MHAs offer unique and diverse possibilities for health

promotion. They allow ecological momentary assessments [7,8]

and interventions [8,9]. Additionally, they can be used

irrespectively of geographical, financial, and social conditions;

can simultaneously target nonclinical and clinical populations;

and have the capacity to provide diverse health-management

strategies in an ecological setting [10]. Moreover, they support

individuals, including those from high-need high-cost

populations (eg, those with chronic or lifestyle diseases), in

managing their health [9]; reduce help-seeking barriers; and

offer a wide range of engagement options [10].

Despite the recent proliferation of MHAs [11], there are no

universally accepted criteria for measuring and reporting their

quality [8,12]. Therefore, it is necessary to support researchers,

users, and health care providers (eg, physicians,

psychotherapists, and physiotherapists) in selecting high-quality

MHAs. Safe and reliable use of MHAs requires evidence of

efficacy and quality, information about data protection,

information about routines for emergencies (eg, self-harm and

adverse effects), and overall consideration of associated risks

[10].

Boudreaux and colleagues [13] suggested the following seven

strategies to evaluate MHA quality: (1) Review the scientific

literature; (2) Search app clearinghouse websites; (3) Search

app stores; (4) Review app descriptions, user ratings, and user

reviews; (5) Conduct a social media query within professional

and, if available, patient networks; (6) Pilot the apps; and (7)

Elicit feedback from users. This process might be too demanding

for health care providers and end users when making treatment

choices. A standardized and reliable quality assessment tool

could facilitate this process.

Several MHA evaluation scales exist to date. The American

Psychiatric Association released an app evaluation model

comprising 33 items across the following five scales:

background information, risk/privacy and security, evidence,

ease of use, and interoperability [12]. The main aim of this

model is to assess the likelihood of harm [9]. However, the

validity and reliability assessment of this rating instrument has

not yet been reported, and there is no agreement regarding its

application [14].

Baumel and colleagues [15] developed the Evaluation Tool for

Mobile and Web-Based eHealth Interventions (ENLIGHT),

according to a comprehensive systematic review of relevant

criteria. The tool allows the evaluation of app quality in terms

of seven dimensions (usability, visual design, user engagement,

content, therapeutic persuasiveness, therapeutic alliance, and

general subjective evaluation) with 28 items. ENLIGHT also

provides a checklist to assess credibility, evidence base, privacy

explanation, and basic security.

The Mobile App Rating Scale (MARS) [16] is the most

commonly used app evaluation tool that allows electronic health

experts to rate MHAs. It includes 19 items comprising four

subscales on objective MHA characteristics (engagement,

functionality, esthetics, and information quality) and a further

10 items comprising two subscales on subjective characteristics

(subjective app quality and perceived impact). The subscale and

overall scores indicate the quality of MHAs. The MARS has

been used to scientifically assess app quality in the following

fields: weight management, physical activity, heart failure, diet

in children and adolescents, medication adherence, mindfulness,

back pain, chronic pain, smoking cessation, and depression

[8,17-24]. Thus, it is the most widely used MHA quality rating

tool in the scientific community. Furthermore, numerous

international efforts promoting safe MHA use (eg, Mobile

Health App Database, PsyberGuide or App Script, Reachout,

Kids Helpline, Health Navigator, and Vic Health) are based

on the MARS.

The original version of the MARS is in English, but culture-

and language-specific app ratings are needed globally. Italian

and Spanish versions of the MARS have been developed [25,26].

A German MARS is necessary considering the growing and

unregulated MHA market in Germany. Therefore, this study

aimed to develop and validate a German version of the Mobile

App Rating Scale (MARS-G) and to investigate the scalability

of the overall MARS score with Mokken scale analysis—an

approach that is closely related to item response theory.

Methods

Adaptation and Translation

The MARS was translated from English into German by two

independent bilingual scientists (EMM and TP). After review

and discussion of both forward translations, a pilot version of

the MARS-G was created. This pilot version underwent blind

back translation by two bilingual speakers with different

backgrounds (a postdoctoral psychologist [AB] and a

nonacademic individual [LMZ]). Thereafter, the back translation

was compared with the original English version by the bilingual

scientists (EMM and TP), and the penultimate version of the

MARS-G was created. This version was evaluated for

comprehensibility by three researchers and three nonacademics.

After addressing their comments, the final version of the

MARS-G was created and used in this study. The MARS-G can

be downloaded from the supplementary materials or obtained

from the authors on request.

Search and Procedure

The MARS-G was validated within the framework of a study

on the quality of apps targeting anxiety (E M Messner et al,

unpublished data, 2020). Apps were identified using the

following search terms: anxiety, fears, anxiety attack, anxious,

anxiousness, anxiety disorder, fear, dread, fearful, panic, panic

attacks, worry, and worries. Each search term was entered

separately, as neither truncation nor logical operators (AND,

OR, and NOT) could be used in the Google Play Store and iOS

Store.

The inclusion process was divided into three steps (searching,

screening, and determining eligibility). (1) Using the search

terms mentioned above, the initial app pool was identified. (2)

App details on the store sites were screened, and apps were

downloaded and reviewed if they were developed for anxiety,

were available in German or English, were downloadable

through the official Google Play Store or iOS Store, and met

no relevant exclusion criteria (app bundles [many applications

only available as a group]). (3) All downloaded apps were

assessed and excluded if they did not address anxiety, were not

in German or English, were malfunctioning, or met relevant

exclusion criteria (device incompatibility and development/test

phase). We identified 3562 MHAs from the app stores.

However, we excluded 810 duplicate apps, 2577 apps considered

inappropriate on screening, and 71 apps considered ineligible.

The remaining 104 apps were rated using the MARS and

MARS-G by two independent trained raters. The raters tested

all MHAs for 15 minutes. Quality was assessed immediately

after the testing period in both languages. The MARS-G ratings

are reported in a review evaluating the quality of

MHAs available for anxiety (E M Messner et al, unpublished

data, 2020).
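
The app counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch; the counts come from the text, whereas the variable names are illustrative and not part of the study.

```python
# Counts reported in the text; the dictionary keys are illustrative labels.
identified = 3562
excluded = {
    "duplicates": 810,
    "inappropriate_on_screening": 2577,
    "ineligible": 71,
}

# Apps left for rating after all three exclusion steps
remaining = identified - sum(excluded.values())
print(remaining)
```

The result matches the 104 apps that were rated.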

Rater Training

We followed the rating methodology in the original study by

Stoyanov and colleagues [16]. We created a YouTube video

with an introduction to MARS-G rating and an exercise on how

to rate an app, using TrackYourTinnitus as the example health

app [27]. This video can be requested from

the corresponding author. Each rater was trained using this

video, and five predefined apps were then rated to ensure that

each rater was appropriately trained. If the individual rating

score was different from our standard rating score by at least 2

points, the difference was discussed until agreement was reached. All raters

had at least a bachelor’s degree in psychology to ensure a

necessary minimum psychodiagnostic competence standard.
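
The calibration check described above (flagging ratings that deviate from the standard rating by at least 2 points) could be sketched as follows. This is an illustration only: the function name, the item-level comparison, and the example ratings are assumptions, since the text does not state whether the comparison was made per item or per score.

```python
def items_to_discuss(rater_scores, standard_scores, threshold=2):
    """Return indices of ratings that deviate from the standard by >= threshold."""
    return [
        i
        for i, (r, s) in enumerate(zip(rater_scores, standard_scores))
        if abs(r - s) >= threshold
    ]


# Hypothetical ratings on an assumed 5-point scale
trainee = [4, 2, 5, 3, 1]
standard = [4, 4, 3, 3, 2]
flagged = items_to_discuss(trainee, standard)
print(flagged)
```

Flagged ratings would then be discussed until agreement is reached.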

German Version of the Mobile Application Rating

Scale

We added the following items in the app description section for

the MARS-G: theoretical background (cognitive-behavioral

therapy, systemic therapy, etc), methods (eye movement

desensitization and reprocessing, tracking, feedback, etc),

category in the app store (lifestyle, medicine, etc), embedding

into routine care (communication with therapist, etc), type of

use (prevention, treatment, rehabilitation, etc), guidance

(stand-alone, blended care, etc), certification (medical device

law, etc), and data safety (log in, informed consent, etc). The

four sections of the original MARS were expanded with an

additional section focusing on the therapeutic gain associated

with the app. The derived items were as follows: gain for the

patient; gain for the therapist; risks and adverse effects; and

ease of implementation in routine health care.

Analyses

Intraclass Correlation

The included MHAs were rated independently by two trained

raters. The intraclass correlation coefficient (ICC) was calculated

to assess the extent of agreement between the raters. An ICC

of <0.50 indicated poor correlation, 0.51-0.75 indicated

moderate correlation, 0.76-0.89 indicated good correlation, and

>0.90 indicated excellent correlation [28]. According to the

findings of previous studies, an ICC >0.75 was considered to

indicate sufficient correlation [8,29,30].
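
As an illustration of how an ICC can be obtained from a ratings matrix, the sketch below computes the two-way random effects, absolute agreement, single-rater form, often written ICC(2,1). The choice of this ICC form is an assumption, because the text does not specify which model was used in SPSS, and the function name is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per app and one column per rater.
    """
    n = len(ratings)       # number of rated apps
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for apps (rows), raters (columns), and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: the second rater is always 1 point higher
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

In this form, a constant offset between raters lowers the coefficient because absolute agreement, not only consistency, is required.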

Internal Consistency

Internal consistency of the MARS-G and its subscales was

assessed as a measure of scale reliability, similar to the original

MARS [16]. Omega was used instead of the widely adopted

Cronbach alpha to assess reliability, as it provides a more

unbiased estimation of reliability [31-33]. For estimations, the

procedure introduced by Zhang and Yuan [34] was used to

obtain robust coefficients and bootstrapped bias-corrected

confidence intervals. Reliability of ω <0.50 indicated

unacceptable internal consistency, 0.51-0.59 indicated poor

consistency, 0.60-0.69 indicated questionable consistency,

0.70-0.79 indicated acceptable consistency, 0.80-0.89 indicated

good consistency, and >0.90 indicated excellent consistency

[35].
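
To make the omega coefficient more concrete, the sketch below computes it for a one-factor model from standardized loadings and maps the result onto the interpretation bands listed above. It is not the robust procedure by Zhang and Yuan used in this study; the loadings are hypothetical, and treating each band as inclusive at its lower bound is an assumption.

```python
def omega_from_loadings(loadings):
    """McDonald's omega for a one-factor model with standardized loadings."""
    common = sum(loadings) ** 2                     # variance due to the factor
    unique = sum(1 - l ** 2 for l in loadings)      # summed unique variances
    return common / (common + unique)


def label_consistency(omega):
    """Map omega onto the interpretation bands (lower bounds inclusive, assumed)."""
    if omega >= 0.90:
        return "excellent"
    if omega >= 0.80:
        return "good"
    if omega >= 0.70:
        return "acceptable"
    if omega >= 0.60:
        return "questionable"
    if omega >= 0.50:
        return "poor"
    return "unacceptable"


# Hypothetical standardized loadings for a three-item subscale
omega = omega_from_loadings([0.7, 0.7, 0.7])
print(round(omega, 2), label_consistency(omega))
```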

Validity

We assessed correlations between corresponding subscales of

the MARS and MARS-G, as well as the overall correlation

between the MARS total score and MARS-G total score. An r

value >0.8 was a priori considered by the author group as an

indicator of a strong and sufficient association between the

MARS and MARS-G. Additionally, mean comparisons were

performed between the dimensions of the MARS and MARS-G,

using two-sided t tests. For all comparisons, a P value <.05 was

considered significant.
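
The two validity checks described above can be sketched in plain Python as a Pearson correlation and a two-sample t statistic. The pooled-variance (Student) form of the t test is an assumption, as the text only states that two-sided t tests were used, and the function names and example data are hypothetical. The P value is omitted because it requires the t distribution.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(x, y):
    """Student t statistic and degrees of freedom for two independent samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    df = nx + ny - 2
    pooled_var = (ssx + ssy) / df
    t = (mx - my) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


# Hypothetical subscale scores for the same apps on both language versions
english = [2.4, 3.0, 3.6, 4.2]
german = [2.6, 3.0, 3.4, 4.2]
print(pearson_r(english, german), pooled_t(english, german))
```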

Mokken Scale Analysis

Mokken scale analysis (MSA) is a scaling approach closely

related to nonparametric item response theory [36]. The

preconditions to use MSA are monotonicity and nonintersection.

The key parameter in the MSA is Loevinger H. Hi is the scaling

parameter for item i, and the overall scalability of all items

clustering onto scale k is Hk. Hi indicates the strength of the

relationship between a latent variable (app quality) and item i.

A high scalability score indicates a high probability that an

increase in item i is accompanied by an increase in the latent

variable. A scale is considered weak if H is <0.4, moderate if

H is ≥0.4 but <0.5, and strong if H is ≥0.5 [36]. This approach

has been described in detail previously [36-39]. For both the

MARS and MARS-G, the MSA was conducted to assess the

scalability of the mean scores. As recommended by van der Ark

[36], the reliability of the scales was additionally assessed using

the Molenaar-Sijtsma method (MS) [40,41], lambda-2 [42], and

latent class reliability coefficient (LCRC) [43].
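
For readers unfamiliar with Loevinger H, the sketch below computes the scale-level coefficient as the sum of observed item-pair covariances divided by the sum of the maximum covariances attainable given each item's marginal distribution. It is a simplified illustration and not the implementation in the mokken package; the function names and the example data are hypothetical.

```python
from itertools import combinations


def _cov(x, y):
    """Population covariance of two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def loevinger_h(items):
    """Scale-level H for a list of item score vectors over the same apps.

    The maximum covariance of an item pair, given both marginals, is
    reached when the sorted scores of the two items are paired.
    Items without variance would make the denominator zero.
    """
    observed = 0.0
    maximum = 0.0
    for x, y in combinations(items, 2):
        observed += _cov(x, y)
        maximum += _cov(sorted(x), sorted(y))
    return observed / maximum


# Hypothetical ratings of six apps on three items
items = [
    [1, 2, 3, 3, 4, 5],
    [1, 1, 3, 2, 4, 4],
    [2, 2, 3, 4, 4, 5],
]
print(round(loevinger_h(items), 2))
```

Values close to 1 indicate that items order the apps almost identically, whereas values near 0 indicate unrelated items.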

Analysis Software

R software (R Foundation for Statistical Computing, Vienna,

Austria) [44] was used for all analyses, except intraclass

correlation. The MSA was conducted using the R package

mokken [36,38]. Correlations and internal consistency were

calculated using the psych (version 1.8.12) [45] and

coefficientalpha packages (version 0.5) [34]. The

coefficientalpha package includes the calculation of omega with

missing and nonnormal data. The ICC was calculated using

IBM SPSS 24 (IBM Corp, Armonk, New York) [46].

Results

Descriptive Data and Mean Comparisons

The ICCs for the MARS and MARS-G were high (MARS: ICC=0.84,

95% CI 0.82-0.85; MARS-G: ICC=0.83, 95% CI 0.82-0.85).

The mean and standard deviation scores of the items in the

MARS-G are presented in Table 1. The mean and standard

deviation scores of the items in the MARS are reported

elsewhere (E M Messner et al, unpublished data, 2020). The

mean scores of the dimensions engagement (t(206)=0.12; P=.91),

functionality (t(205)=0.39; P=.70), esthetics (t(206)=−0.012; P=.99),

and information quality (t(204)=0.45; P=.66) and the overall rating

(t(206)=0.27; P=.80) did not differ significantly between the MARS

and MARS-G.

Table 1. Summary of item and scale scores for the German version of the Mobile App Rating Scale.

| Dimension | Score, mean (SD) |
| --- | --- |
| Engagement | 2.52 (0.70) |
| Item 01 | 2.64 (0.93) |
| Item 02 | 2.79 (0.90) |
| Item 03 | 2.19 (1.00) |
| Item 04 | 1.86 (0.79) |
| Item 05 | 3.15 (0.72) |
| Functionality | 4.12 (0.69) |
| Item 06 | 4.13 (0.82) |
| Item 07 | 4.24 (0.77) |
| Item 08 | 4.09 (0.74) |
| Item 09 | 4.03 (0.78) |
| Esthetics | 3.21 (0.94) |
| Item 10 | 3.40 (0.93) |
| Item 11 | 3.20 (1.09) |
| Item 12 | 3.04 (0.99) |
| Information quality | 2.75 (0.60) |
| Item 13 | 3.60 (0.76) |
| Item 14 | 2.63 (0.68) |
| Item 15 | 2.67 (0.76) |
| Item 16 | 2.61 (0.88) |
| Item 17 | 3.66 (0.68) |
| Item 18 | 1.87 (0.89) |
| Item 19 | 3.00 (N/A) (a) |
| Overall mean | 3.11 (0.58) |

(a) This item on information quality could be rated for only 1 app; for the remaining apps, it was rated not applicable.
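
The following sketch illustrates how subscale and overall scores can be derived for a single app when some items are rated not applicable. It assumes that each subscale score is the mean of its applicable items and that the overall score is the mean of the four subscale scores; the function names and the example ratings are hypothetical.

```python
def mean_ignoring_na(values):
    """Mean of all ratings that are not None (None stands for not applicable)."""
    rated = [v for v in values if v is not None]
    return sum(rated) / len(rated) if rated else None


def mars_scores(ratings):
    """Return (subscale means, overall mean) for one app.

    `ratings` maps a subscale name to its list of item ratings.
    """
    subscales = {name: mean_ignoring_na(items) for name, items in ratings.items()}
    overall = mean_ignoring_na(list(subscales.values()))
    return subscales, overall


# Hypothetical ratings for one app; the last information item is not applicable
app = {
    "engagement": [3, 3, 2, 2, 3],
    "functionality": [4, 4, 4, 4],
    "esthetics": [3, 3, 3],
    "information_quality": [4, 3, 3, 3, 4, 2, None],
}
subscales, overall = mars_scores(app)
print(subscales, round(overall, 2))
```

Skipping not applicable items keeps them from lowering a subscale score, but it also means that subscale scores can rest on different numbers of items across apps.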

Internal Consistency

The internal consistency for the MARS dimension engagement

was good (ω=0.84, 95% CI 0.77-0.88). The internal

consistencies for functionality (ω=0.90, 95% CI 0.85-0.94) and

esthetics (ω=0.91, 95% CI 0.92-0.96) were excellent. The

internal consistency for information quality was acceptable

(ω=0.74, 95% CI 0.14-0.99; α=.75, 95% CI 0.67-0.83). The

internal consistency of the overall MARS score was good

(ω=0.81, 95% CI 0.74-0.86).

The internal consistencies of the MARS-G dimensions were

almost identical to those of the original MARS (engagement:

ω=0.85, 95% CI 0.78-0.89; functionality: ω=0.91, 95% CI

0.87-0.94; esthetics: ω=0.93, 95% CI 0.90-0.95; information

quality: ω=0.72, 95% CI 0.33-0.81). The internal consistency

of the overall score was good (ω=0.82, 95% CI 0.76-0.86).

Validity

The correlation coefficients between corresponding dimensions

of the MARS and MARS-G ranged from 0.93 to 0.98, and P

values were adjusted for multiple testing according to the

Holm method [47] (Table 2). Correlations between the

respective items are presented in Multimedia Appendix 1. There

were no associations between user ratings and quality ratings

(Table 2).

Table 2. Validity of the German version of the Mobile App Rating Scale (r and P value).

| Dimension | Engagement GER (a) | Functionality GER | Esthetics GER | Information quality GER | Star rating |
| --- | --- | --- | --- | --- | --- |
| Engagement ENG (b) | 0.97 (<.001) | 0.49 (<.001) | 0.73 (<.001) | 0.52 (.001) | −0.03 (.99) |
| Functionality ENG | 0.45 (<.001) | 0.98 (<.001) | 0.43 (<.001) | 0.36 (.002) | 0.06 (.99) |
| Esthetics ENG | 0.69 (<.001) | 0.41 (<.001) | 0.97 (<.001) | 0.41 (.001) | 0.12 (.99) |
| Information quality ENG | 0.55 (<.001) | 0.34 (.004) | 0.47 (<.001) | 0.93 (.001) | 0.25 (.19) |
| Star rating | −0.03 (>.99) | 0.07 (>.99) | 0.12 (>.99) | 0.26 (.19) | N/A |

(a) GER: German version.

(b) ENG: English version.

N/A: not applicable.
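
The step-down adjustment of P values for multiple testing mentioned above (commonly known as the Holm procedure) can be sketched as follows. The function name and the example P values are hypothetical and are not taken from Table 2.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses not yet processed, cap at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity across the ordered P values
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted P values
print(holm_adjust([0.01, 0.04, 0.03]))
```

Because every adjusted value is capped at 1, many adjusted P values for weak associations end up at or near the upper bound.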

Mokken Scale Analysis

The MSA of the MARS revealed strong scalability (H=0.50;

SE 0.062). There were no violations of monotonicity and

nonintersection. The internal consistency of this scale was

acceptable (MS=0.74; lambda 2=0.73; LCRC=0.72). The MSA

of the MARS-G revealed good scalability (H=0.48; SE 0.060).

The internal consistency of this scale was acceptable (MS=0.74;

lambda 2=0.72; LCRC=0.74). The scalability results of the

MARS and MARS-G are presented in Table 3.

Table 3. Summary of the Hk coefficient (overall scalability of all items in the scale) for the Mobile App Rating Scale (MARS) and the German version

of the Mobile App Rating Scale (MARS-G).

| Dimension | MARS | MARS-G |
| --- | --- | --- |
| Engagement | 0.59 | 0.57 |
| Functionality | 0.43 | 0.41 |
| Esthetics | 0.51 | 0.51 |
| Information quality | 0.45 | 0.41 |
| Total scale | 0.50 | 0.48 |

Discussion

Principal Findings

This study developed and evaluated the MARS-G for MHAs.

The results showed that the MARS-G is a reliable and valid

tool for experts to assess the quality of MHAs. The validity and

reliability of the MARS-G were comparable to those of the

original MARS. With regard to the reliability of the dimension

information quality, the confidence interval of omega was

overestimated owing to planned missingness. The planned

missingness originated from the response option not applicable,

which allows raters to skip an item if the app does not have any

health information (eg, diary apps and brain games). There were

no differences in reliability between the MARS-G and original

MARS.

The MSA revealed that the use of the MARS-G total score is

appropriate. Furthermore, there was good correspondence

between the MARS-G and original MARS, indicating good

validity. Our results are consistent with the findings of a study

that introduced and tested an Italian version of the MARS [25].

The MARS-G has been presented in Multimedia Appendix 2

and can be obtained from the authors on request. It can be used

freely for research and noncommercial MHA-evaluation

projects. To reach satisfactory interrater reliability, completion

of an online training exercise provided by the corresponding

author is highly recommended. Furthermore, a training dataset

of five apps can be obtained from the corresponding author on

request. The MARS-G ratings should be revised until an

appropriate level (ie, ICC >0.75) of interrater reliability is

achieved.

To assist in MHA selection, standardized high-quality ratings

of MHAs are needed in German-speaking countries. Overall, a

publicly available database presenting reliable, valid, and

standardized expert ratings, like MARS-G ratings, could

contribute to informed health care decisions on which app to

use for a specific disease or purpose. The mobile health app

database [48] is one example of such a tool that assists users

and health care providers in selecting appropriate apps for

different health-related purposes.

Limitations

This study has several limitations. First, convergent validity

was only evaluated by comparing the MARS and MARS-G.

Comparisons with other app rating scales, such as ENLIGHT

[15] and the American Psychiatric Association app evaluation

model [12], are necessary in future studies. Second, the focus

on anxiety apps limits generalization. Further studies are needed

to confirm that these findings can be generalized to other mobile

health domains. Such studies would require expert raters who

are familiar with the specific domain. Finally, a confirmatory

factor analysis of the MARS and MARS-G should be conducted

in future studies with larger samples to ensure that the

predefined subscales of the MARS and MARS-G can be

confirmed.

Future Research

This translation study of the MARS led to the discovery of

several research gaps. Future studies should focus on the

improvement of app quality assessment and therefore the

augmentation of safe MHA use on a broad scale. A challenge

in this research is that the sequence in which apps are presented

in the app store is not transparent and differs depending on

which account is used for the search. In future studies, a web

crawler could be used to search European app stores with

keywords in order to build an unbiased database of available

MHAs. Such a database already exists in China, and it contains

all MHAs available in the United States, China, Japan, Brazil,

and Russia [49].

Future studies should also shed light on the correlation between

real-life user behavior and MARS or MARS-G ratings. As the

MARS and MARS-G capture app quality, they could help

predict the ability of users to download and use digital resources.

Such research has already been conducted for ENLIGHT and

real-life user engagement [50]. The efficacy of MHAs is strongly

related to user adherence [50-52]; thus, high-quality apps might

need to include adherence facilitation strategies to reach their

potential.

Moreover, patient involvement should be taken into account.

The user version of the MARS (uMARS) [53] should be

translated and tested for reliability and validity as well, so that

expert ratings of the MARS-G can be complemented with user

ratings of the uMARS-G in German-speaking countries. In

addition, future studies should investigate the MARS-G and

uMARS-G for apps related to specific health problems.

In conclusion, the MARS-G could be used by various

stakeholders, such as public health authorities, patient

organizations, researchers, health care providers (eg, physicians

and psychotherapists), and interested third parties, to assess

MHA quality. Furthermore, app developers could use the

MARS-G as a tool to improve the quality of their apps.

Acknowledgments

The authors thank Linda Maria Zisch for her help in the translation process.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Item correlation matrix of MARS and MARS-G.

[DOCX File , 25 KB-Multimedia Appendix 1]

Multimedia Appendix 2

Mobile Application Rating Scale-German.

[PDF File (Adobe PDF File), 563 KB-Multimedia Appendix 2]

References

1. UM London, eMarketer. Statista - The Statistics Portal. 2014 Aug. Smartphone user penetration as percentage of total

population in Western Europe from 2011 to 2018 URL:

https://www.statista.com/statistics/203722/smartphone-penetration-per-capita-in-western-europe-since-2000/ [accessed 2019-12-05]

2. ForwardAdGroup. Statista - Das Statistik-Portal. 2015. Wie viele Apps haben Sie auf Ihrem Smartphone installiert? URL:

https://de.statista.com/statistik/daten/studie/162374/umfrage/durchschnittliche-anzahl-von-apps-auf-dem-handy-in-deutschland/ [accessed 2019-12-05]

3. Thranberend T, Knöppler K, Neisecke T. Gesundheits-Apps: Bedeutender Hebel für Patient Empowerment - Potenziale

jedoch bislang kaum genutzt. Spotlight Gesundh 2016;2:1-8.

4. Deutschland.de. Deutschland.de. 2018. We speak German URL:

https://www.deutschland.de/en/topic/culture/the-german-language-surprising-facts-and-figures [accessed 2019-04-24]

5. Contributors Wikipedia. Wikipedia, The Free Encyclopedia. 2019. List of languages by number of native speakers URL:

https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers [accessed 2019-04-24]

6. Ellis E, Gogolin I, Clyne M. The Janus Face of Monolingualism: A Comparison of German and Australian Language

Education Policies. Curr Issues Lang Plan 2010;11(4):439-460. [doi: 10.1080/14664208.2010.550544] [Medline: 26281194]

7. Heron KE, Smyth JM. Ecological momentary interventions: Incorporating mobile technology into psychosocial and health

behaviour treatments. Br J Health Psychol 2010 Feb;15(1):1-39 [FREE Full text] [doi: 10.1348/135910709X466063]

[Medline: 19646331]

8. Terhorst Y, Rathner EM, Baumeister H, Sander L. «Hilfe aus dem App-Store?»: Eine systematische Übersichtsarbeit und

Evaluation von Apps zur Anwendung bei Depressionen. Verhaltenstherapie 2018 May 8;28(2):101-112. [doi:

10.1159/000481692]

9. Ebert DD, Van Daele T, Nordgreen T, Karekla M, Compare A, Zarbo C, et al. Internet and mobile-based psychological

interventions: Applications, efficacy and potential for improving mental health. Eur Psychol 2018 Jul;23(2):167-187. [doi:

10.1027/1016-9040/a000346]

10. Boulos MNK, Brewer AC, Karimkhani C, Buller DB, Dellavalle RP. Mobile medical and health apps: state of the art,

concerns, regulatory control and certification. Online J Public Health Inform 2014 Feb;5(3):229 [FREE Full text] [doi:

10.5210/ojphi.v5i3.4814] [Medline: 24683442]

11. Albrecht UV. Kapitel 8. Gesundheits-Apps und Risiken. In: Albrecht UV, editor. Chancen und Risiken von Gesundheits-Apps

(CHARISMHA). Hannover: Medizinische Hochschule Hannover; 2016:176-192.

12. American Psychiatric Association. American Psychiatric Association. 2017. App evaluation model URL:

https://www.psychiatry.org/psychiatrists/practice/mental-health-apps/app-evaluation-model [accessed 2019-12-05]

13. Boudreaux ED, Waring ME, Hayes RB, Sadasivam RS, Mullen S, Pagoto S. Evaluating and selecting mobile health apps:

Strategies for healthcare providers and healthcare organizations. Transl Behav Med 2014 Dec;4:363-371 [FREE Full text]

[doi: 10.1007/s13142-014-0293-9] [Medline: 25584085]

14. Nouri R, Kalhori S, Ghazisaeedi M, Marchand G, Yasini M. Criteria for assessing the quality of mHealth apps: a systematic

review. J Am Med Informatics Assoc 2018:1-10 [FREE Full text] [doi: 10.1093/jamia/ocy050]

15. Baumel A, Faber K, Mathur N, Kane JM, Muench F. Enlight: A Comprehensive Quality and Therapeutic Potential Evaluation

Tool for Mobile and Web-Based eHealth Interventions. J Med Internet Res 2017 Mar 21;19(3):e82 [FREE Full text] [doi:

10.2196/jmir.7270] [Medline: 28325712]

16. Stoyanov SR, Hides L, Kavanagh DJ, Zelenko O, Tjondronegoro D, Mani M. Mobile app rating scale: a new tool for

assessing the quality of health mobile apps. JMIR mHealth uHealth 2015 Mar;3(1):e27 [FREE Full text] [doi:

10.2196/mhealth.3422] [Medline: 25760773]

17. Bardus M, van Beurden SB, Smith JR, Abraham C. A review and content analysis of engagement, functionality, aesthetics,

information quality, and change techniques in the most popular commercial apps for weight management. Int J Behav Nutr

Phys Act 2016 Mar 10;13:35 [FREE Full text] [doi: 10.1186/s12966-016-0359-9] [Medline: 26964880]

18. Grainger R, Townsley H, White B, Langlotz T, Taylor WJ. Apps for People With Rheumatoid Arthritis to Monitor Their

Disease Activity: A Review of Apps for Best Practice and Quality. JMIR Mhealth Uhealth 2017 Feb 21;5(2):e7 [FREE

Full text] [doi: 10.2196/mhealth.6956] [Medline: 28223263]

19. Knitza J, Tascilar K, Messner EM, Meyer M, Vossen D, Pulla A, et al. German Mobile Apps in Rheumatology: Review and Analysis Using the Mobile Application Rating Scale (MARS). JMIR Mhealth Uhealth 2019 Aug 05;7(8):e14991 [FREE Full text] [doi: 10.2196/14991] [Medline: 31381501]

20. Machado GC, Pinheiro MB, Lee H, Ahmed OH, Hendrick P, Williams C, et al. Smartphone apps for the self-management of low back pain: A systematic review. Best Practice & Research Clinical Rheumatology 2016 Dec;30(6):1098-1109. [doi: 10.1016/J.BERH.2017.04.002]

21. Mani M, Kavanagh DJ, Hides L, Stoyanov SR. Review and Evaluation of Mindfulness-Based iPhone Apps. JMIR Mhealth Uhealth 2015 Aug 19;3(3):e82 [FREE Full text] [doi: 10.2196/mhealth.4328] [Medline: 26290327]

22. Masterson Creber RM, Maurer MS, Reading M, Hiraldo G, Hickey KT, Iribarren S. Review and Analysis of Existing Mobile Phone Apps to Support Heart Failure Symptom Monitoring and Self-Care Management Using the Mobile Application Rating Scale (MARS). JMIR Mhealth Uhealth 2016 Jun 14;4(2):e74 [FREE Full text] [doi: 10.2196/mhealth.5882] [Medline: 27302310]

23. Salazar A, de Sola H, Failde I, Moral-Munoz JA. Measuring the Quality of Mobile Apps for the Management of Pain: Systematic Search and Evaluation Using the Mobile App Rating Scale. JMIR Mhealth Uhealth 2018 Oct 25;6(10):e10718 [FREE Full text] [doi: 10.2196/10718] [Medline: 30361196]

24. Thornton L, Quinn C, Birrell L, Guillaumier A, Shaw B, Forbes E, et al. Free smoking cessation mobile apps available in Australia: a quality review and content analysis. Aust N Z J Public Health 2017 Dec;41(6):625-630. [doi: 10.1111/1753-6405.12688] [Medline: 28749591]

25. Domnich A, Arata L, Amicizia D, Signori A, Patrick B, Stoyanov S, et al. Development and validation of the Italian version of the Mobile Application Rating Scale and its generalisability to apps targeting primary prevention. BMC Med Inform Decis Mak 2016 Jul 7;16(83):1-10. [doi: 10.1186/s12911-016-0323-2]

26. Martin Payo R, Fernandez Álvarez MM, Blanco Díaz M, Cuesta Izquierdo M, Stoyanov SR, Llaneza Suárez E. Spanish adaptation and validation of the Mobile Application Rating Scale questionnaire. Int J Med Inform 2019 Sep;129:95-99. [doi: 10.1016/j.ijmedinf.2019.06.005]

27. Pryss R, Probst T, Schlee W, Schobel J, Langguth B, Neff P, et al. Prospective crowdsensing versus retrospective ratings of tinnitus variability and tinnitus–stress associations based on the TrackYourTinnitus mobile platform. Int J Data Sci Anal 2019:327-338. [doi: 10.1007/s41060-018-0111-4]

28. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River, NJ: Pearson/Prentice Hall; 2009.

29. Lin J, Sander L, Paganini S, Schlicker S, Ebert D, Berking M, et al. Effectiveness and cost-effectiveness of a guided internet- and mobile-based depression intervention for individuals with chronic back pain: Protocol of a multi-centre randomised controlled trial. BMJ Open 2017 Dec 28;7:e015226. [doi: 10.1136/bmjopen-2016-015226]

30. Sander L, Paganini S, Lin J, Schlicker S, Ebert DD, Buntrock C, et al. Effectiveness and cost-effectiveness of a guided Internet- and mobile-based intervention for the indicated prevention of major depression in patients with chronic back pain—study protocol of the PROD-BP multicenter pragmatic RCT. BMC Psychiatry 2017 Jan 21;17(36). [doi: 10.1186/s12888-017-1193-6]

31. Dunn TJ, Baguley T, Brunsden V. From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. Br J Psychol 2014 Aug;105(3):399-412. [doi: 10.1111/bjop.12046] [Medline: 24844115]

32. Revelle W, Zinbarg RE. Coefficients Alpha, Beta, Omega, and the GLB: Comments on Sijtsma. Psychometrika 2009;74(1):145-154. [doi: 10.1007/s11336-008-9102-z]

33. McNeish D. Thanks Coefficient Alpha, We’ll Take It From Here. Psychol Methods 2018 Sep;23(3):412-433. [doi: 10.1037/met0000144]

34. Zhang Z, Yuan KH. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data: Methods and Software. Educ Psychol Meas 2016 Jun;76(3):387-411 [FREE Full text] [doi: 10.1177/0013164415594658] [Medline: 29795870]

35. George D, Mallery P. SPSS For Windows Step By Step: A Simple Guide And Reference, 11.0 Update. Boston: Allyn & Bacon; 2003.

36. van der Ark LA. Mokken Scale Analysis in R. J Stat Softw 2007 Nov 8;20(11):1-19. [doi: 10.18637/jss.v020.i11]

37. Mokken RJ. A theory and procedure of scale analysis: With applications in political research. New York: De Gruyter Mouton; 1971.

38. van der Ark LA. New Developments in Mokken Scale Analysis in R. J Stat Softw 2012;48(5):1-27. [doi: 10.18637/jss.v048.i05]

39. Sijtsma K, van der Ark LA. A tutorial on how to do a Mokken scale analysis on your test and questionnaire data. Br J Math Stat Psychol 2017;70(1):137-158. [doi: 10.1111/bmsp.12078]

40. Molenaar I, Sijtsma K. Mokken's Approach to Reliability Estimation Extended to Multicategory Items. Kwant Methoden 1988;9(28):115-126.

41. Sijtsma K, Molenaar IW. Reliability of test scores in nonparametric item response theory. Psychometrika 1987 Mar;52(1):79-97. [doi: 10.1007/bf02293957]

42. Guttman L. A basis for analyzing test-retest reliability. Psychometrika 1945 Dec;10(4):255-282. [doi: 10.1007/bf02288892]

43. van der Ark LA, van der Palm DW, Sijtsma K. A Latent Class Approach to Estimating Test-Score Reliability. Appl Psychol Meas 2011 Mar 09;35(5):380-392. [doi: 10.1177/0146621610392911]

44. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. [FREE Full text]

45. Revelle W. psych: Procedures for Psychological, Psychometric, and Personality Research [Computer Software]. 2017. URL: https://personality-project.org/r/psych [accessed 2019-11-22]

46. IBM Corporation. IBM SPSS Advanced Statistics 24 [Software]. 2016. URL: http://www-01.ibm.com/support/docview.wss?uid=swg27047033#ja%5Cnftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/24.0/ja/client/Manuals/IBM_SPSS_Advanced_ [accessed 2020-01-16]

47. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65-70 [FREE Full text]

48. MHAD Core Team. Mobile Health App Database. 2019. URL: http://www.mhad.science/ [accessed 2019-09-11]

49. Xu W, Liu Y. mHealthApps: A Repository and Database of Mobile Health Apps. JMIR Mhealth Uhealth 2015 Mar 18;3(1):e28 [FREE Full text] [doi: 10.2196/mhealth.4026] [Medline: 25786060]

50. Baumel A, Yom-Tov E. Predicting user adherence to behavioral eHealth interventions in the real world: examining which aspects of intervention design matter most. Transl Behav Med 2018 Sep 08;8(5):793-798. [doi: 10.1093/tbm/ibx037] [Medline: 29471424]

51. Christensen H, Griffiths KM, Farrer L. Adherence in internet interventions for anxiety and depression: Systematic review. J Med Internet Res 2009 Apr;11(2):e13 [FREE Full text] [doi: 10.2196/jmir.1194] [Medline: 19403466]

52. Van Ballegooijen W, Cuijpers P, Van Straten A, Karyotaki E, Andersson G, Smit JH, et al. Adherence to internet-based and face-to-face cognitive behavioural therapy for depression: A meta-analysis. PLoS ONE 2014 Jul;9(7):e100674 [FREE Full text] [doi: 10.1371/journal.pone.0100674] [Medline: 25029507]

53. Stoyanov SR, Hides L, Kavanagh DJ, Wilson H. Development and Validation of the User Version of the Mobile Application Rating Scale (uMARS). JMIR Mhealth Uhealth 2016 Jun 10;4(2):e72 [FREE Full text] [doi: 10.2196/mhealth.5849] [Medline: 27287964]

Abbreviations

ENLIGHT: Evaluation Tool for Mobile and Web-Based eHealth Interventions

ICC: intraclass correlation coefficient

LCRC: latent class reliability coefficient

MARS: Mobile App Rating Scale

MARS-G: German version of the Mobile App Rating Scale

MHA: mobile health app

MS: Molenaar-Sijtsma method

MSA: Mokken scale analysis

uMARS: user version of the Mobile App Rating Scale

Edited by G Eysenbach; submitted 24.04.19; peer-reviewed by C Aljoscha, M Bardus, E de Krijger, R Bipeta; comments to author 05.06.19; revised version received 29.07.19; accepted 24.09.19; published 27.03.20

Please cite as:

Messner EM, Terhorst Y, Barke A, Baumeister H, Stoyanov S, Hides L, Kavanagh D, Pryss R, Sander L, Probst T

The German Version of the Mobile App Rating Scale (MARS-G): Development and Validation Study

JMIR Mhealth Uhealth 2020;8(3):e14479

URL: http://mhealth.jmir.org/2020/3/e14479/

doi: 10.2196/14479

PMID:

©Eva-Maria Messner, Yannik Terhorst, Antonia Barke, Harald Baumeister, Stoyan Stoyanov, Leanne Hides, David Kavanagh, Rüdiger Pryss, Lasse Sander, Thomas Probst. Originally published in JMIR mHealth and uHealth (http://mhealth.jmir.org), 27.03.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
