Synthesis – senior seminar – screening and diagnostic tools


Read the following three articles and synthesize them (combine the ideas of all three sources into one overall point; DO NOT SUMMARIZE) in a one-and-a-half-page Word document. Also, write a well-elaborated question for each reading. Keep in mind the following points when working on this task:

*Questions must be original, thoughtful, and not easily found in the articles.

*Follow APA Rules

*Use proper citations

*Use PAST TENSE when discussing the articles (research already took place)

*DO NOT USE the following words: Me, you, I, we, prove, proof.

*Refer to the articles by their AUTHORS (year of publication) 

*DO NOT USE the article name or words first, second, or third.



Two Factor Model of ASD Symptoms

One of the key factors in determining whether an individual has autism spectrum disorder (ASD) is the individual’s social and communication skills. Individuals diagnosed with ASD showed delays in joint attention, eye gaze, and other social behaviors such as pointing (Swain et al., 2014).

Joint attention is an important social skill to master because it is a building block for developing theory of mind, which supports the understanding of others’ perspectives. Korhonen et al. (2014) found that individuals with autism had impaired joint attention. However, some participants did not show impairment, which led to evidence suggesting that joint attention follows different trajectories. One explanation for the mixed results is that impaired joint attention may not be specific to ASD, since Korhonen et al. (2014) were unable to find a difference in joint attention between individuals with ASD and individuals with developmental delay (DD). Another explanation is that individual interest in the task varied. Although individualized studies were beneficial in detecting personal potential and abilities, their findings were difficult to generalize to ASD as a whole (Korhonen et al., 2014). In addition to joint attention, atypical gaze shifting was a distinguishing factor in individuals with ASD. Swain et al. (2014) found that the main difference between typically developing (TD) individuals and individuals with ASD in the first 12 months of life was in gaze shifts. Individuals diagnosed with ASD earlier had lower scores on positive affect, joint attention, and gaze shifts, whereas those diagnosed later differed from TD individuals only in gaze shifts. It was not until 24 months that individuals with later-onset ASD differed significantly from their TD peers, displaying lower positive affect and fewer gestures (Swain et al., 2014). These findings may point to additional ASD trajectories.

Another defining characteristic of ASD is the presence of restricted patterns of interest and repetitive motor movements. These patterns and movements often prevented the individual from completing daily tasks. Like joint attention and gaze shifts, these repetitive movements and patterns of interest followed different trajectories (Joseph et al., 2013). Joseph et al. (2013) found that individuals with ASD and higher cognitive functioning engaged more in distinct and specific interests and less in repetitive motor movements than individuals with ASD and lower cognitive functioning. Another finding showed that at the age of two, repetitive motor and play patterns were more common than compulsions. By the age of four, all of these behaviors had increased; however, repetitive use of specific objects was less frequent in older children than in younger children. This finding suggested that ritualistic behaviors and motor movements may present differently depending on the age of the individual (Joseph et al., 2013).

Joseph et al. (2013), Korhonen et al. (2014), and Swain et al. (2014) all defined key characteristics of ASD and explained the different trajectories of each characteristic. The difficulty with these trajectories is that they are specific to each individual: some symptoms may worsen while others remain stable. It is also difficult to generalize findings from small sample sizes (Joseph et al., 2013).

Discussion Questions:

1. Korhonen et al. (2014) did not use preference-based stimuli to elicit joint attention and did not separate high-functioning from low-functioning individuals with ASD. Could the level of motivation have differed between these groups? If so, how might that difference have changed the results?

2. Swain et al. (2014) found that individuals with early and late onset of ASD did not differ in their social skills scores at the age of 12 months. Given that social skills did not differ at that age, is there another factor that would allow late-onset ASD to be diagnosed at an earlier point in development?

3. Joseph et al. (2013) explained that it is difficult to assess the trajectories of ASD with a small sample size. How might their findings nonetheless help advance research on ASD?

Original Article

Factors associated with DSM-5 severity level ratings for autism spectrum disorder

Micah O Mazurek1, Frances Lu2, Eric A Macklin2,3 and Benjamin L Handen4

Autism
2019, Vol. 23(2) 468–476
© The Author(s) 2018
DOI: 10.1177/1362361318755318

Abstract

The newest edition of the Diagnostic and Statistical Manual of Mental Disorders (5th ed., DSM-5) introduced substantial changes to the diagnostic criteria for autism spectrum disorder, including new severity level ratings for social communication and restricted and repetitive behavior domains. The purpose of this study was to evaluate the use of these new severity ratings and to examine their relation to other measures of severity and clinical features. Participants included 248 children with autism spectrum disorder who received diagnostic evaluations at one of six Autism Treatment Network sites. Higher severity ratings in both domains were associated with younger age, lower intelligence quotient, and greater Autism Diagnostic Observation Schedule–Second Edition domain-specific symptom severity. Greater restricted and repetitive behavior severity was associated with higher parent-reported stereotyped behaviors. Severity ratings were not associated with emotional or behavioral problems. The new DSM-5 severity ratings in both domains were significantly associated with behavioral observations of autism severity but not with measures of other behavioral or emotional symptoms. However, the strong associations between intelligence quotient and DSM-5 severity ratings in both domains suggest that clinicians may be including cognitive functioning in their overall determination of severity. Further research is needed to examine clinician decision-making and interpretation of these specifiers.

Keywords

autism spectrum disorder, diagnosis, DSM-5, need for support, severity level

1University of Virginia, USA
2Massachusetts General Hospital, USA
3Harvard Medical School, USA
4University of Pittsburgh School of Medicine, USA

Corresponding author:
Micah O Mazurek, Curry School of Education, University of Virginia,
417 Emmet Street South, P.O. Box 400267, Charlottesville, VA 22904,
Email: [email protected]

The most recent edition of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) introduced substantial revisions to the diagnostic criteria for autism (American Psychiatric Association, 2013). Key changes included a shift from triadic to dyadic symptom groupings, and a consolidation of previously separate diagnostic subcategories (i.e. autistic disorder, Asperger’s disorder, and pervasive developmental disorder not otherwise specified) into a single category of autism spectrum disorder (ASD). These primary changes have received a great deal of attention from scientific, clinical, and lay communities, primarily focused on concern about potential effects on prevalence estimates and service eligibility (Buxbaum and Baron-Cohen, 2013; Grzadzinski et al., 2013; Halfon and Kuo, 2013; Volkmar and Reichow, 2013). As a result, a number of studies of sensitivity, specificity, and diagnostic concordance between DSM-IV and DSM-5 have been conducted since the draft criteria were first released (see, for review, Kulage et al., 2014; Smith et al., 2015). By contrast, the addition of severity level ratings, an equally significant change to the diagnostic criteria for ASD, has received little scientific attention.

As noted above, changes to DSM-5 were intended, in part, to address problems with inter-rater agreement on DSM-IV subcategories (Lord and Bishop, 2015; Ozonoff, 2012a, 2012b). Moving from three subcategories to a single ASD category with two domain-specific severity ratings allowed the diagnostic system to retain strong reliability for overall ASD category while allowing for a multi-dimensional assessment of severity. These new ratings provide a system for documenting an individual’s symptom severity in the areas of social communication and restricted and repetitive behavior (RRB). Each domain receives a rating of 1 (requiring support), 2 (requiring substantial support), or 3 (requiring very substantial support) (American Psychiatric Association, 2013). Additional text explanation and some examples are provided for each level, but a lack of clear-cut operational definitions means that rating determinations remain somewhat subjective. As described in a recent review (Mehling and Tassé, 2016), it is not clear how clinicians will make these determinations or whether their ratings will reflect symptom severity alone or be influenced by other indices of impairment (such as cognitive functioning) or co-occurring symptoms (such as internalizing symptoms or challenging behavior).

Symptom-related functional impact is an important ele-
ment of overall severity of psychopathology. The inclusion
of this dimensional coding of symptom-specific severity
has intuitive appeal in that it may help guide treatment
planning and allow for examination of an individual’s pro-
gress over time within a particular symptom domain.
However, to date, the new severity scales have not been
empirically validated against other indicators of severity.
As a result, clinicians have little overt guidance in making
these determinations. To our knowledge, there has been no
published study examining how these severity ratings will
be used in clinical practice, how they relate to other meas-
ures of symptom severity, or how they relate to other child
characteristics. This information is necessary for interpret-
ing the clinical significance, validity, and utility of these
ratings. The relationship between severity ratings and cog-
nitive functioning may be particularly important to exam-
ine, as intellectual impairment may be conflated with
overall severity. In addition, determining whether ASD
severity ratings are distinct from measures of general emo-
tional and behavioral symptomatology will be important
for evaluating discriminant validity.

Current study

The purpose of this study was to evaluate the DSM-5
severity level ratings in a large sample of children and
adolescents with ASD. Our primary aims were to (1)
describe the distribution of DSM-5 defined social com-
munication and RRB severity ratings across our sample,
(2) assess the relationship between DSM-5 severity rat-
ings and a standardized measure of ASD severity, and (3)
assess the relationship between DSM-5 severity ratings
and other clinical features, particularly cognitive and
behavioral functioning.


Methods

Participants and procedures

Participants consisted of 248 children and adolescents
(ages 2–17 years, M = 6.4 years, SD = 4.0 years) with ASD
enrolled in a larger study focused on DSM-5 criteria for
ASD (Mazurek et al., 2017). All children received a com-
prehensive diagnostic evaluation for autism at one of six
Autism Treatment Network (ATN) sites: Children’s
Hospital Los Angeles, Cincinnati Children’s Hospital
Medical Center, Nationwide Children’s Hospital,
University of Missouri, University of Pittsburgh Medical
Center, and Vanderbilt University Medical Center. Each
clinical diagnostic assessment was conducted in accord-
ance with the standard ATN diagnostic process and
included a review of records, a non-standardized diagnos-
tic clinical interview, standardized observation using the
Autism Diagnostic Observation Schedule–Second Edition
(ADOS-2), cognitive assessment, and assessment of
behavioral functioning. Additional measures were included
when necessary on a case-by-case basis to further inform
diagnostic determination. In total, 52% of the participants
were assessed by a psychologist, 5.7% were assessed by a
physician (i.e. developmental behavioral pediatrician, neu-
rologist, pediatrician, or psychiatrist), and 42.3% were
assessed by an interdisciplinary team (all teams included a
psychologist and/or physician).

The study was approved by the Institutional Review
Board at the clinical and data coordinating center at
Massachusetts General Hospital and at each clinical site,
and informed written consent from each family was
obtained prior to participation. Families whose children
were between the ages of 2 and 17 years 11 months and
who were seen for an autism diagnostic evaluation were
recruited for participation. Recruitment and enrollment
continued until the target sample size was met. Only those
meeting DSM-5 criteria for ASD were included in this
study. Most children were male (82%) and Caucasian
(76%), and most primary caregivers had received some
post-secondary education (66%).


Measures

Demographics. Primary caregivers completed a demographic
questionnaire to report child age, sex, ethnicity,
race, caregiver education level, and household income.

Autism symptom severity. The ADOS-2 (Lord et al., 2012)
is a standardized diagnostic observational tool that assesses
communicative behavior, social interaction skills, and
repetitive behaviors and restricted interests. The ADOS-2
was administered at all sites by assessors with extensive
experience and formal training on administration and scoring
of the measure. The ADOS-2 comprises five different
modules, one of which is selected for administration based
on the child’s age and verbal ability. A continuous 10-point
metric, the ADOS-2 calibrated severity score (CSS), has
been developed as a measure of overall autism symptom
severity (Esler et al., 2015; Gotham et al., 2009; Hus and
Lord, 2014). The CSS was standardized to account for
individual differences in age and language level. Higher
CSS scores indicate greater symptom severity. Separate
scores were calculated by domain: the social affect cali-
brated severity score (SA-CSS) and the restricted and
repetitive behavior calibrated severity score (RRB-CSS).
Each domain score represents a continuous 10-point score
that accounts for individual differences in age and lan-
guage (Hus et al., 2014).

Two subscales from the Aberrant Behavior Checklist
(ABC) (Aman and Singh, 1986) were included to assess
parent-reported severity in social and repetitive behav-
ior domains. The ABC is a 58-item caregiver-report
questionnaire that measures current behavioral func-
tioning across five empirically derived subscales. For
the purpose of this study, the Social Withdrawal sub-
scale (comprising 16 items assessing social isolation,
withdrawal, and lack of social reciprocity) and the
Stereotypic Behavior subscale (comprising seven items
assessing repetitive behaviors and stereotyped move-
ments) were examined as parent-report measures of
symptom severity.

Intellectual ability. A range of measures were used across
ATN sites to assess overall intelligence (Full Scale IQ),
verbal intelligence (VIQ), and nonverbal intelligence
(NVIQ). A small portion (10.9%) of the sample was
administered a nonverbal measure of intelligence, the
Leiter International Performance Scale–Third Edition
(Roid et al., 2013); therefore, only NVIQ scores were
available for this subset of the sample. Intellectual testing
could not be completed for 16.1% of the sample due to dif-
ficulties participating or understanding task demands. As a
result, valid Full Scale IQ scores were available for 181
children (73% of the total sample). Measures included the
Stanford Binet Scales of Intelligence–Fifth Edition
(24.6%) (Roid, 2003), the Wechsler Intelligence Scale for
Children–Fourth Edition (3.6%) (Wechsler, 2003), the
Wechsler Intelligence Scale for Children–Fifth Edition
(6.9%) (Wechsler, 2014), the Wechsler Preschool and Pri-
mary Scale of Intelligence–Third Edition (1.2%)
(Wechsler, 2002), the Wechsler Abbreviated Scale of Intel-
ligence–Second Edition (8.5%) (Wechsler, 2011), the
Wechsler Adult Intelligence Scale–Fourth Edition (0.4%)
(Wechsler, 2008), the Differential Ability Scales–Second
Edition (3.6%) (Elliot, 2007), the Bayley Scales of Infant
and Toddler Development–Third Edition (2%) (Bayley,
2006), or the Mullen Scales of Early Learning (MSEL,
22.2%) (Mullen, 1995). For those receiving the MSEL, the
Early Learning Composite Standard Score was used as a
measure of Full Scale IQ.
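
The percentages reported in this paragraph can be cross-checked with simple arithmetic. The following Python sketch is only an illustrative consistency check, not part of the study's analysis; the dictionary keys are common shorthand for the instruments named above, and the variable names are hypothetical.

```python
# Percentages of the total sample (n = 248) as reported in the text.
# Keys are shorthand labels chosen for this illustration.
fsiq_measures = {
    "Stanford-Binet 5": 24.6,
    "WISC-IV": 3.6,
    "WISC-V": 6.9,
    "WPPSI-III": 1.2,
    "WASI-II": 8.5,
    "WAIS-IV": 0.4,
    "DAS-II": 3.6,
    "Bayley-III": 2.0,
    "MSEL": 22.2,
}
nonverbal_only = 10.9  # Leiter only, so NVIQ but no Full Scale IQ
not_testable = 16.1    # intellectual testing could not be completed

# Share of the sample with a Full Scale IQ, summed across instruments.
fsiq_total = round(sum(fsiq_measures.values()), 1)

# All three groups together should account for the whole sample.
overall_total = round(fsiq_total + nonverbal_only + not_testable, 1)

# The same share computed from the reported count (181 of 248 children).
fsiq_share_from_counts = round(100 * 181 / 248)

print(fsiq_total, overall_total, fsiq_share_from_counts)
```

The instrument percentages sum to the 73% reported for valid Full Scale IQ scores, and the three groups together cover the full sample.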

Emotional and behavioral functioning. The Child Behavior
Checklist (CBCL) (Achenbach and Rescorla, 2001) was
administered to assess emotional and behavioral difficul-
ties. The CBCL is a broad-band parent-report question-
naire providing an overall assessment of symptoms (i.e.
Total Problems score) as well as more specific summary
and syndrome scales. Items are rated on a three-point
scale (Not True to Very True). Two separate versions are
available based on the child’s age, including younger
(ages 1.5–5 years) and older (ages 6–18 years) versions.
Although the specific syndrome scales differ across ver-
sions, the Total and Internalizing and Externalizing Scale
T-scores are comparable across versions. For this study,
overall levels of both internalizing and externalizing
problems were examined using Internalizing and Exter-
nalizing composite T-scores. The Internalizing domain
comprises mood and anxiety symptoms, while the Exter-
nalizing domain includes behavioral problems, such as
aggression and noncompliance.

Three additional subscales from the ABC (Aman and
Singh, 1986) were included to assess additional challeng-
ing behaviors, specifically: Irritability, Hyperactivity/
Noncompliance, and Inappropriate Speech.

DSM-5 checklist. After all diagnostic assessment proce-
dures were conducted, clinicians completed a DSM-5
diagnostic checklist for each participant. The checklist
contained seven symptoms grouped in two areas: (1) social
communication deficits (three symptoms), and (2) RRBs
(four symptoms). The clinician noted whether each symp-
tom was “absent,” “present by history,” or “currently pre-
sent,” consistent with DSM-5 descriptions (American
Psychiatric Association, 2013). Additional checklist sec-
tions included whether symptoms were present or absent
in the early developmental period and whether impairment
was present or absent. The checklist also included severity
level ratings for both social communication and RRB on a
three-point scale, consistent with DSM-5 criteria (Ameri-
can Psychiatric Association, 2013).
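
For readers who want to store checklist data of this kind, the structure described above could be represented roughly as follows. This is a hypothetical sketch, not the form used in the study: the class name, field names, and symptom keys (`SC1`, `RRB1`, and so on) are placeholders invented for illustration, and only the counts, response options, and severity labels come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class SymptomStatus(Enum):
    """Response options noted by the clinician for each symptom."""
    ABSENT = "absent"
    PRESENT_BY_HISTORY = "present by history"
    CURRENTLY_PRESENT = "currently present"


# Severity levels and their DSM-5 labels, as described in the text.
SEVERITY_LABELS = {
    1: "requiring support",
    2: "requiring substantial support",
    3: "requiring very substantial support",
}


@dataclass
class Dsm5Checklist:
    """Hypothetical container for one participant's checklist."""
    social_communication: Dict[str, SymptomStatus]  # three symptoms
    rrb: Dict[str, SymptomStatus]                   # four symptoms
    early_developmental_onset: bool
    impairment_present: bool
    sc_severity: int
    rrb_severity: int

    def __post_init__(self) -> None:
        # Enforce the symptom counts and the three-point severity scale.
        if len(self.social_communication) != 3:
            raise ValueError("expected three social communication symptoms")
        if len(self.rrb) != 4:
            raise ValueError("expected four RRB symptoms")
        for level in (self.sc_severity, self.rrb_severity):
            if level not in SEVERITY_LABELS:
                raise ValueError("severity level must be 1, 2, or 3")


example = Dsm5Checklist(
    social_communication={
        "SC1": SymptomStatus.CURRENTLY_PRESENT,
        "SC2": SymptomStatus.CURRENTLY_PRESENT,
        "SC3": SymptomStatus.CURRENTLY_PRESENT,
    },
    rrb={
        "RRB1": SymptomStatus.CURRENTLY_PRESENT,
        "RRB2": SymptomStatus.PRESENT_BY_HISTORY,
        "RRB3": SymptomStatus.ABSENT,
        "RRB4": SymptomStatus.CURRENTLY_PRESENT,
    },
    early_developmental_onset=True,
    impairment_present=True,
    sc_severity=2,
    rrb_severity=1,
)
print(SEVERITY_LABELS[example.sc_severity])
```

Keeping the two severity ratings as separate fields mirrors the domain-specific ratings that the checklist records.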

Data analysis plan

Descriptive statistics (mean, standard deviation, range,
and percentage) were calculated for demographic and pri-
mary variables. To examine the distribution of DSM-5-
defined social communication and RRB severity levels,
cross-tabulation of the percentages at each severity level
were calculated. The second and third research ques-
tions were addressed by first conducting bivariate analyses
to examine whether DSM-5 severity ratings were associated
with individual demographic (i.e. age and sex) or
clinical features (i.e. ADOS-2 CSS domain scores, IQ
score, internalizing symptoms, externalizing behaviors,
and aberrant behaviors). DSM-5 severity scores are formally
ordinal metrics with potentially unequal intervals between
levels. We ran three models for each bivariate analysis and
looked for agreement across the three models. The first
model was a cumulative logistic regression model, which
properly accounts for the variable intervals between levels
but also assumes parallel cumulative odds across each predictor.
We tested the parallel cumulative odds assumption
with the proportional odds test. The second model was a
binary logistic regression model, which dichotomized
DSM-5 severity scores between requiring support versus
requiring substantial or very substantial support. This divi-
sion was selected because of the low prevalence of partici-
pants scored as requiring very substantial support. The
third model was a linear regression model, which assigned
the values 1 through 3 to the three severity levels as a con-
tinuous scale. The binary logistic model is correct but
potentially less powerful than the cumulative logistic
model. The cumulative logistic model is appropriate if the
proportional odds assumption is met, but cumulative odds
are difficult to communicate. The linear model is not for-
mally correct, but the interpretation is easy. We focused on
results for which there was agreement across all three
models and thus an unambiguous conclusion of significant
association. Future studies where power is more limited
might choose to focus on inference from the cumulative
logistic model for analyses of ordinal severity scales where
the proportional odds assumption is met. Finally, we used
cumulative and binary logistic and linear multiple regres-
sion models to determine which clinical and demographic
features were independent predictors of DSM-5 severity
scores. For each model, we included all significant varia-
bles from the bivariate models for each DSM-5 severity


Results

Demographic and clinical characteristics of the sample are
presented in Table 1.

For DSM-5 social communication severity, 30% of
the sample were rated as requiring support, 45% as
requiring substantial support, and 25% as requiring very
substantial support (Table 2). For DSM-5 RRB severity,
44% of the sample were rated as requiring support, 39%
as requiring substantial support, and 17% as requiring
very substantial support. In the cross-tabulation, 26% of
the sample were rated as requiring support in both social
communication and RRB domains and 28% were rated as
requiring substantial support in both social communica-
tion and RRB domains. Overall, social communication
severity was greater than RRB severity (test for symme-
try, p < 0.001), although there was substantial concord-
ance between the two metrics (simple kappa = 0.52; 95%
confidence interval (CI) 0.43, 0.60; p < 0.001). Basic
sample characteristics across severity level are shown in
Table 3.
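
As an illustrative check, the concordance statistics in this paragraph can be recomputed from the cell counts in Table 2. The sketch below is a reconstruction under stated assumptions, not the authors' code: it assumes that the reported kappa is unweighted Cohen's kappa and that the test for symmetry is Bowker's test, which has 3 degrees of freedom for a 3 × 3 table. Variable names are hypothetical.

```python
import math

# Rows: social communication Levels 1-3; columns: RRB Levels 1-3 (Table 2).
table = [
    [64, 10, 1],
    [39, 69, 3],
    [7, 18, 37],
]
k = len(table)
n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

# Unweighted Cohen's kappa: agreement beyond what the margins predict.
observed = sum(table[i][i] for i in range(k)) / n
expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
kappa = (observed - expected) / (1 - expected)

# Bowker's test of symmetry compares each pair of off-diagonal cells.
bowker = sum(
    (table[i][j] - table[j][i]) ** 2 / (table[i][j] + table[j][i])
    for i in range(k)
    for j in range(i + 1, k)
)

# Upper-tail probability of a chi-square variable with 3 degrees of
# freedom, using the closed form that exists for this specific case.
p_value = math.erfc(math.sqrt(bowker / 2)) + math.sqrt(
    2 * bowker / math.pi
) * math.exp(-bowker / 2)

print(round(kappa, 2), round(bowker, 2), p_value < 0.001)
```

Under these assumptions the sketch reproduces a kappa of 0.52 and a symmetry test with p < 0.001. The asymmetry arises because more children fall below the diagonal (higher social communication level than RRB level) than above it.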

Bivariate analyses

Greater social communication severity was associated with younger age, lower IQ, and higher ADOS-2 CSS scores in both social affect and RRB domains (Table 4). Inferences of significant association were consistent (p < 0.05) for these variables across all three model types.

Table 1. Demographic and clinical features.

Characteristic                    % (n)
Sex
  Female                          18.1 (45)
  Male                            81.9 (203)
Race
  Asian                           1.2 (3)
  Black or African American       10.1 (25)
  Caucasian/White                 76.2 (189)
  Other                           7.3 (18)
Ethnicity
  Hispanic/Latino                 8.1 (20)
  Not Hispanic/Latino             85.9 (213)
Parental education
  <High School                    4.0 (10)
  High School                     22.2 (55)
  Some College                    33.9 (84)
  Bachelor’s degree               16.9 (42)
  Postgraduate                    15.3 (38)
Household income
  ⩽US$24,999                      23.0 (57)
  US$25,000–49,999                22.6 (56)
  US$50,000–74,999                16.9 (42)
  US$75,000–99,999                12.1 (30)
  ⩾US$100,000                     11.3 (28)

Characteristic                    Mean (SD)
Age                               6.4 (4.0); range: 2.0–17.6
IQ                                76.1 (22.5); range: 33–127
Child Behavior Checklist (CBCL)
  Externalizing T-score           61.7 (11.6)
  Internalizing T-score           65.2 (9.6)
Aberrant Behavior Checklist (ABC)
  Irritability                    15.0 (10.0)
  Social Withdrawal               12.8 (8.7)
  Stereotypic Behavior            5.9 (4.9)
  Hyperactivity/Noncompliance     20.0 (11.8)
  Inappropriate Speech            3.7 (3.1)
ADOS-2 SA-CSS                     7.4 (1.9)
ADOS-2 RRB-CSS                    7.3 (2.1)

SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score; ADOS-2: Autism Diagnostic Observation Schedule–Second Edition.

Greater RRB severity was associated with younger age, lower IQ, higher ABC Stereotypic Behavior subscale scores, higher ADOS-2 SA-CSS and RRB-CSS scores, and being male (Table 5). Inferences of significant association were consistent (p < 0.05) across all models for most of these variables, with the exception of ABC Stereotypic Behavior, for which two out of three models indicated a statistically significant association (Table 5).
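
Because the odds ratios in the bivariate tables are expressed per one-unit change in the predictor, it can help to rescale them to a more familiar difference. The sketch below is a hedged illustration rather than a reanalysis: it takes the rounded cumulative logit odds ratios for social communication severity from Table 4 and assumes the per-unit odds ratio is constant across the predictor's range, as the logistic model does. The function name and the chosen differences (5 years, 15 IQ points, 2 CSS points) are arbitrary choices for illustration.

```python
def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio implied for a difference of `units` on the predictor,
    assuming the per-unit odds ratio is constant (log-linear in units)."""
    return or_per_unit ** units


# Rounded per-unit odds ratios from the cumulative logit column of Table 4.
age_5_years = scaled_odds_ratio(0.75, 5)     # child who is 5 years older
iq_15_points = scaled_odds_ratio(0.94, 15)   # IQ that is 15 points higher
sa_css_2_points = scaled_odds_ratio(1.27, 2) # SA-CSS that is 2 points higher

print(round(age_5_years, 2), round(iq_15_points, 2), round(sa_css_2_points, 2))
```

Read this way, an odds ratio of 0.94 per IQ point, which looks close to 1, corresponds to roughly 60% lower odds of a higher severity rating for a child whose IQ is 15 points higher. Because the inputs are rounded published values, the rescaled figures are approximate.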

Multivariate analyses

Final multiple regression models indicated that age, IQ, and ADOS-2 SA-CSS were significant independent predictors of social communication severity (p < 0.05 across all models), and that age, IQ, and ADOS-2 RRB-CSS were significant independent predictors of RRB severity (p < 0.05 across all models) (Table 6).

Table 2. Distribution of DSM-5 severity ratings across the total sample, n (%).

Social communication    Restricted and repetitive behavior (RRB) severity
(SC) severity           RRB Level 1    RRB Level 2    RRB Level 3    Total
SC Level 1              64 (25.8)      10 (4.0)       1 (0.4)        75 (30.2)
SC Level 2              39 (15.7)      69 (27.8)      3 (1.2)        111 (44.8)
SC Level 3              7 (2.8)        18 (7.3)       37 (14.9)      62 (25.0)
Total                   110 (44.4)     97 (39.1)      41 (16.5)      248 (100)

Level 1 = requiring support, Level 2 = requiring substantial support, Level 3 = requiring very substantial support.

Table 3. Sample characteristics by severity level rating.

Social communication    Restricted and repetitive behavior (RRB) severity
(SC) severity           RRB Level 1       RRB Level 2       RRB Level 3

SC Level 1
  FSIQ                  91.5 (18.2)       95.8 (7.7)        61.0 (–)
  %FSIQ                 89 (57/64)        80 (8/10)         100 (1/1)
  VIQ                   93.5 (19.3)       101.3 (8.3)       No VIQ
  %VIQ                  63 (40/64)        70 (7/10)         (0/1)
  NVIQ                  93.4 (19.6)       96.6 (8.4)        No NVIQ
  %NVIQ                 72 (46/64)        90 (9/10)         (0/1)
  Age (years)           9.5 (4.3)         7.4 (3.0)         10.8 (–)
  % Module 1            6 (4/64)          0 (0/10)          0 (0/1)

SC Level 2
  FSIQ                  78.1 (22.5)       69.6 (20.9)       73.7 (12.0)
  %FSIQ                 69 (27/39)        71 (49/69)        100 (3/3)
  VIQ                   76.3 (24.4)       62.5 (16.3)       77.0 (18.4)
  %VIQ                  46 (18/39)        52 (36/69)        67 (2/3)
  NVIQ                  83.5 (21.8)       76.9 (21.0)       83.0 (16.8)
  %NVIQ                 64 (25/39)        68 (47/69)        100 (3/3)
  Age (years)           7.4 (4.3)         5.3 (2.9)         4.9 (1.2)
  % Module 1            21 (8/39)         45 (31/69)        0 (0/3)

SC Level 3
  FSIQ                  60.6 (15.1)       52.4 (6.3)        54.8 (9.2)
  %FSIQ                 71 (5/7)          39 (7/18)         65 (24/37)
  VIQ                   54.5 (7.8)        46.7 (7.0)        52.8 (10.2)
  %VIQ                  29 (2/7)          39 (7/18)         49 (18/37)
  NVIQ                  53.3 (6.6)        68.6 (20.8)       55.0 (10.9)
  %NVIQ                 57 (4/7)          67 (12/18)        54 (20/37)
  Age (years)           3.6 (1.2)         4.7 (2.7)         3.5 (1.8)
  % Module 1            57 (4/7)          56 (10/18)        57 (20/37)

IQ = M (SD) of Full Scale IQ (FSIQ), Verbal IQ (VIQ), and Nonverbal IQ (NVIQ) for each cell; % IQ = percentage and frequency of children for whom IQ was available; age = M (SD) of age for each cell; % Module 1 = percentage and frequency of children who were administered Module 1 of the ADOS-2 (intended for children with minimal verbal abilities).


Discussion

In our analysis of DSM-5 severity level ratings for ASD in a large sample of children and adolescents with ASD, we observed that 26% of children were rated as requiring support, the lowest severity, in both social communication and repetitive behavior domains; 28% were rated as requiring substantial support, the intermediate severity, in both domains; and 15% were rated as requiring very substantial support, the most severe symptoms, in both domains. Severity was largely consistent across domains, with only a handful of children receiving the lowest severity ratings in one domain and most severe in the other. In general, social communication symptoms were rated at a higher level of severity than repetitive behaviors across the sample.

Table 4. DSM-5 social communication severity levels and clinical features: bivariate analyses.

                                Cumulative logit             Binary logit                 Linear model
Predictor                       Odds  95% CI        p        Odds  95% CI        p        Slope   95% CI            p
Age                             0.75  (0.70, 0.81)  <0.001   0.77  (0.71, 0.83)  <0.001   -0.093  (-0.113, -0.073)  <0.001
Sex                             0.90  (0.50, 1.63)  0.742    1.08  (0.54, 2.27)  0.827    -0.045  (-0.286, 0.197)   0.717
IQ                              0.94  (0.92, 0.95)  <0.001   0.94  (0.93, 0.96)  <0.001   -0.019  (-0.023, -0.016)  <0.001
CBCL Externalizing T-score      1.00  (0.98, 1.02)  0.965    0.99  (0.97, 1.02)  0.607    0.000   (-0.008, 0.009)   0.923
CBCL Internalizing T-score      0.98  (0.96, 1.01)  0.169    0.98  (0.95, 1.00)  0.100    -0.007  (-0.017, 0.003)   0.193
ABC Irritability                1.01  (0.98, 1.04)  0.644    1.00  (0.97, 1.03)  0.840    0.003   (-0.008, 0.014)   0.573
ABC Social Withdrawal           1.01  (0.97, 1.04)  0.689    1.00  (0.96, 1.03)  0.852    0.003   (-0.009, 0.015)   0.624
ABC Stereotypic Behavior [a]    1.05  (0.99, 1.12)  0.079    1.01  (0.95, 1.08)  0.758    0.020   (-0.001, 0.042)   0.063
ABC Hyperactivity               1.01  (0.99, 1.04)  0.343    1.01  (0.98, 1.03)  0.688    0.004   (-0.004, 0.013)   0.326
ABC Inappropriate Speech        0.94  (0.85, 1.03)  0.169    0.95  (0.86, 1.06)  0.368    -0.024  (-0.058, 0.010)   0.164
ADOS-2 SA-CSS                   1.27  (1.11, 1.46)  <0.001   1.31  (1.12, 1.54)  <0.001   0.092   (0.042, 0.142)    <0.001
ADOS-2 RRB-CSS                  1.15  (1.03, 1.29)  0.018    1.16  (1.02, 1.32)  0.028    0.056   (0.012, 0.100)    0.014

CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.
[a] ABC Stereotypic Behavior failed to meet the proportional odds assumption (p = 0.007).

Table 5. DSM-5 restricted and repetitive behavior severity levels and clinical features: bivariate analyses.

                                Cumulative logit             Binary logit                 Linear model
Predictor                       Odds  95% CI        p        Odds  95% CI        p        Slope   95% CI            p
Age                             0.76  (0.70, 0.82)  <0.001   0.78  (0.72, 0.84)  <0.001   -0.081  (-0.101, -0.060)  <0.001
Sex                             0.46  (0.24, 0.86)  0.017    0.46  (0.24, 0.88)  0.021    -0.285  (-0.519, -0.050)  0.018
IQ                              0.96  (0.94, 0.97)  <0.001   0.96  (0.94, 0.97)  <0.001   -0.015  (-0.019, -0.011)  <0.001
CBCL Externalizing T-score      1.01  (0.99, 1.03)  0.533    1.00  (0.98, 1.02)  0.869    0.003   (-0.005, 0.011)   0.429
CBCL Internalizing T-score [a]  0.99  (0.96, 1.01)  0.388    0.98  (0.95, 1.00)  0.091    -0.002  (-0.012, 0.007)   0.643
ABC Irritability                1.02  (0.99, 1.05)  0.195    1.01  (0.98, 1.05)  0.354    0.008   (-0.002, 0.018)   0.131
ABC Social Withdrawal           0.99  (0.96, 1.02)  0.593    0.99  (0.95, 1.02)  0.408    -0.002  (-0.014, 0.010)   0.707
ABC Stereotypic Behavior        1.07  (1.00, 1.13)  0.036    1.06  (1.00, 1.13)  0.074    0.024   (0.003, 0.045)    0.026
ABC Hyperactivity               1.02  (0.99, 1.04)  0.134    1.02  (0.99, 1.04)  0.186    0.007   (-0.002, 0.016)   0.127
ABC Inappropriate Speech        1.02  (0.93, 1.11)  0.749    1.04  (0.94, 1.14)  0.468    0.002   (-0.032, 0.035)   0.919
ADOS-2 SA-CSS                   1.20  (1.05, 1.38)  0.007    1.19  (1.03, 1.38)  0.019    0.071   (0.021, 0.121)    0.006
ADOS-2 RRB-CSS [b]              1.40  (1.22, 1.62)  <0.001   1.45  (1.25, 1.70)  <0.001   0.100   (0.058, 0.143)    <0.001

CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.
[a] CBCL Internalizing Problems T-score failed to meet the proportional odds assumption (p = 0.005).
[b] ADOS-2 RRB-CSS failed to meet the proportional odds assumption (p = 0.049).


Our findings indicate that clinician ratings of severity
are consistent to some degree with both behavioral obser-
vations and parental ratings of severity. Specifically, the
results revealed significant associations between both
social communication and RRB severity ratings and
respective ADOS-2 domain scores. Significant associa-
tions were also observed between RRB severity ratings
and parent-reported symptoms of stereotyped behavior on
the ABC. By contrast, parental ratings of social with-
drawal on the ABC were not associated with DSM-5 rat-
ings of social communication severity. The ABC subscales
included in this study provide a narrow assessment of
very specific types of RRB and social communication.
Thus, future studies should include more comprehensive
parent-report measures of the full range of both RRB and
social communication functioning. It is also noteworthy
that parent-reported behavioral and emotional problems
were not significantly associated with DSM-5 symptom
severity in either domain, providing some evidence that
clinicians are not basing their severity ratings on general
behavioral or emotional problems.

The results also revealed that intellectual functioning
was strongly associated with both social communication
and RRB severity ratings. Children with lower IQ had sig-
nificantly greater clinician-rated severity in both domains.
It could be the case that children who were more signifi-
cantly affected by autism were also more likely to have
global cognitive or developmental impairment. Alterna-
tively, intellectual impairment may contribute indepen-
dently to social communication deficits and repetitive
behaviors above and beyond the effects of core ASD
symptoms alone. Thus, the DSM-5 symptom severity rat-
ings may reflect the combined manifestation of both symp-
tom-specific and global developmental impairment. Given
the wording of the new DSM-5 severity level descriptors
(e.g. “requiring support”), clinicians may also have difficulty
determining whether to assign ratings based on ASD symptom
severity alone (more consistent with text examples)
or based largely on need for support (more consistent
with the level descriptors). If clinicians adhere to the latter
interpretation, there may be greater potential for confla-
tion of intellectual and symptom-related impairment. This
poses problems for both inter-rater reliability and con-
struct validity. Without more specific guidance, clinicians
are likely to vary in the extent to which they classify
severity based on domain-specific deficits, cognitive
impairments, or need for support in activities of daily liv-
ing. As shown in a recent descriptive study of children
with ASD, there is a potential for significant discrepancy
in severity classification depending on the measure and
construct (Weitlauf et al., 2014). Further research is
needed to better understand clinician decision-making and
interpretation of the intended construct assessed by these
new DSM-5 specifiers.
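
The per-point odds ratios reported for IQ are close to 1, which can make the strength of the association hard to read. A minimal sketch, assuming the logit models are linear in IQ on the log-odds scale, rescales the Table 6 estimate for social communication severity (odds 0.95, 95% CI 0.93 to 0.96) to a larger IQ difference. The 15-point difference and the function name are illustrative assumptions, not part of the original analysis, and the result is approximate because the published estimates are rounded.

```python
import math


def rescale_odds_ratio(odds_ratio: float, units: float) -> float:
    """Return the odds ratio for a change of `units`, given the per-unit odds ratio.

    In a logit model the coefficient is log(OR) per unit of the predictor,
    so a change of `units` multiplies the log-odds change by `units`.
    """
    return math.exp(units * math.log(odds_ratio))


# Per-point IQ estimate and 95% CI for social communication severity (Table 6).
iq_or, iq_ci_low, iq_ci_high = 0.95, 0.93, 0.96

# Hypothetical illustration: a 15-point IQ difference.
points = 15
rescaled = rescale_odds_ratio(iq_or, points)
rescaled_ci = (rescale_odds_ratio(iq_ci_low, points),
               rescale_odds_ratio(iq_ci_high, points))

print(f"Odds ratio per {points} IQ points: {rescaled:.2f}")
print(f"Approximate 95% CI: {rescaled_ci[0]:.2f} to {rescaled_ci[1]:.2f}")
```

Under these assumptions, a 15-point higher IQ corresponds to roughly half the odds of a higher severity level, which is easier to interpret than a per-point ratio of 0.95.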

Age was also found to be inversely associated with
DSM-5 symptom severity in both social communication
and RRB domains. This is difficult to interpret within the
context of this study because of the potential for sampling
bias and may not reflect a true decrease in ASD severity
with age. Children were recruited and enrolled into this
study based on referral for autism diagnostic assessments.
It is likely that individuals who were referred for an initial
diagnostic assessment in adolescence generally had more
subtle symptom presentation than those who were referred
in early childhood. This would be consistent with prior
research finding an inverse relationship between autism
symptom severity and age at first diagnosis (Mazurek et al.,
2014; Wiggins et al., 2006). Because of this, it is likely that
the adolescents in our study had less severe symptoms than
the larger population of adolescents with ASD. To fully
examine the associations between age and DSM-5 symp-
tom severity indicators, it would be most informative to
enroll a broader population of individuals with ASD, not
only those seen at the time of initial diagnosis.

Table 6. DSM-5 severity levels and clinical features: final multiple regression models.

Predictor | Cumulative logit: odds (95% CI) | p | Binary logit: odds (95% CI) | p | Linear model: slope (95% CI) | p

Outcome variable: social communication severity level
Age | 0.82 (0.73, 0.91) | <0.001 | 0.82 (0.72, 0.93) | 0.002 | –0.054 (–0.081, –0.028) | <0.001
IQ | 0.95 (0.93, 0.96) | <0.001 | 0.95 (0.93, 0.97) | <0.001 | –0.015 (–0.019, –0.010) | <0.001
ADOS-2 SA-CSS | 1.30 (1.08, 1.57) | 0.006 | 1.47 (1.16, 1.89) | 0.002 | 0.065 (0.018, 0.113) | 0.007
Outcome variable: restricted and repetitive behavior severity level
Age | 0.83 (0.74, 0.91) | <0.001 | 0.83 (0.74, 0.93) | 0.002 | –0.046 (–0.072, –0.021) | <0.001
IQ | 0.97 (0.96, 0.99) | 0.002 | 0.98 (0.96, 0.99) | 0.014 | –0.008 (–0.013, –0.004) | <0.001
ADOS-2 RRB-CSS | 1.69 (1.37, 2.12) | <0.001 | 1.80 (1.42, 2.38) | <0.001 | 0.109 (0.066, 0.153) | <0.001

CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–
Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.


Limitations and future directions

As the first study of this type, the current findings provide an
important first examination of the clinical application of
DSM-5 ASD severity ratings across a large and well-charac-
terized sample. The sample spanned a wide range of func-
tioning and was typical of the male:female ratio found in
population studies of ASD (Centers for Disease Control and
Prevention (CDC), 2014). However, several factors may
limit generalizability to the larger ASD population. First, the
centers participating in our study were all located at aca-
demic medical centers and specialize in ASD diagnosis,
treatment, and research. Thus, the clinicians in our study may
not be representative of the larger population of clinicians
practicing in community-based or other settings. Future
research should examine how clinicians in different settings
may be using these DSM-5 severity ratings. It would also be
informative to evaluate potential differences in clinical deci-
sion-making across professional disciplines, as well as inter-
rater reliability in assignment of severity level ratings.

Additional measurement limitations should also be con-
sidered. First, we chose to include ADOS-2 CSS scores
rather than raw scores because they were specifically
designed to account for individual differences in age and
language level. However, it should be noted that these CSS
scores still do not fully account for the associations between
autism symptoms and age and language. In addition,
although ADOS-2 assessments were overseen by research-
reliable clinicians at each site, we did not specifically track
whether all assessments were directly administered by
research-reliable clinicians. Another limitation is that we
did not collect data related to adaptive functioning. Although
many clinicians administered adaptive measures as part of
their clinical evaluations, these data were not collected dur-
ing this study. In the future, it would be informative to eval-
uate the extent to which adaptive functioning correlates with
clinician ratings of symptom severity. Overall, the current
findings suggest that further guidance and more specific
operational definitions may be helpful for clinicians assign-
ing these new DSM-5 severity level ratings.


Acknowledgements

The authors are extremely grateful to all the families and clinicians
who participated in this study.

Declaration of conflicting interests

Dr M.O.M. has received research support from National Institute
of Mental Health (NIMH), Autism Speaks, and Health Resources
and Services Administration (HRSA). Ms F.L. has received
research support from Autism Speaks and HRSA. Dr E.A.M.
serves as a DSMB member for Acorda Therapeutics and Shire
Human Genetic Therapies and receives research support from
Adolph Coors Foundation, ALS Association, ALS Finding a
Cure, Autism Speaks, Biotie Therapies, Michael J Fox Foundation,
FDA, HRSA, NIH, and PCORI. Dr B.L.H. has received research
support from Curemark, Neuropharm, Lilly, Forest, Bristol Myers
Squibb, Roche, Pediamed, Pfizer, and Autism Speaks.


Funding

The author(s) disclosed receipt of the following financial support
for the research, authorship, and/or publication of this article: This
network activity was supported by Autism Speaks and coopera-
tive agreement UA3 MC11054 through the US Department of
Health and Human Services, Health Resources and Services
Administration, Maternal and Child Health Research Program to
the Massachusetts General Hospital. This work was conducted
through the Autism Speaks Autism Treatment Network.


Micah O Mazurek


References

Achenbach TM and Rescorla L (2001) Manual for the ASEBA
school-age forms & profiles: an integrated system of
multi-informant assessment. Burlington, VT: University of
Vermont, Research Center for Children, Youth & Families.

Aman M and Singh N (1986) Aberrant Behavior Checklist:
Manual. East Aurora, NY: Slosson Educational Publications.

American Psychiatric Association (2013) Diagnostic and
Statistical Manual of Mental Disorders (DSM-5). 5th ed.
Washington, DC: APA.

Bayley N (2006) Bayley Scales of Infant and Toddler Development.
3rd ed. San Antonio, TX: Harcourt Assessment, Inc.

Buxbaum JD and Baron-Cohen S (2013) DSM-5: the debate con-
tinues. Molecular Autism 4(1): 11.

Centers for Disease Control and Prevention (CDC) (2014)
Prevalence of autism spectrum disorder among children
aged 8 years—autism and developmental disabilities moni-
toring network, 11 sites, United States, 2010. MMWR
Surveill Summ 63(2): 1–21.

Elliot C (2007) Differential Abilities Scale—2nd Edition (DAS-II)
Manual. 2nd ed. San Antonio, TX: Harcourt Assessment, Inc.

Esler AN, Bal VH, Guthrie W, et al. (2015) The autism diagnostic
observation schedule, toddler module: Standardized sever-
ity scores. Journal of Autism and Developmental Disorders
45(9): 2704–2720.

Gotham K, Pickles A and Lord C (2009) Standardizing ADOS
scores for a measure of severity in autism spectrum disor-
ders. Journal of Autism and Developmental Disorders 39(5):

Grzadzinski R, Huerta M and Lord C (2013) DSM-5 and autism
spectrum disorders (ASDs): an opportunity for identifying
ASD subtypes. Molecular Autism 4(1): 12.

Halfon N and Kuo AA (2013) What DSM-5 could mean to chil-
dren with autism and their families. JAMA Pediatrics 167(7):

Hus V, Gotham K and Lord C (2014) Standardizing ADOS
domain scores: separating severity of social affect and
restricted and repetitive behaviors. Journal of Autism and
Developmental Disorders 44: 2400–2412.

Hus V and Lord C (2014) The autism diagnostic observation
schedule, module 4: Revised algorithm and standardized
severity scores. Journal of Autism and Developmental
Disorders 44(8): 1996–2012.

Kulage KM, Smaldone AM and Cohn EG (2014) How will DSM-5
affect autism diagnosis? A systematic literature review
and meta-analysis. Journal of Autism and Developmental
Disorders 44(8): 1918–1932.

Lord C and Bishop SL (2015) Recent advances in autism
research as reflected in DSM-5 criteria for autism spec-
trum disorder. Annual Review of Clinical Psychology 11:

Lord C, Rutter M, DiLavore PC, et al. (2012) Autism Diagnostic
Observation Schedule, Second Edition (ADOS-2) Manual
(Part 1): Modules 1–4. 2nd ed. Torrance, CA: Western
Psychological Services.

Mazurek MO, Handen BL, Wodka EL, et al. (2014) Age
at first autism spectrum disorder diagnosis: The role of
birth cohort, demographic factors, and clinical features.
Journal of Developmental and Behavioral Pediatrics 35(9):

Mazurek MO, Lu, Symecko H, et al. (2017) A prospective study
of the concordance of DSM-IV and DSM-5 diagnostic cri-
teria for autism spectrum disorder. Journal of Autism and
Developmental Disorders 47(9): 2783–2794.

Mehling MH and Tassé MJ (2016) Severity of autism spectrum
disorders: current conceptualization, and transition to DSM-
5. Journal of Autism and Developmental Disorders 46(6):

Ozonoff S (2012a) Editorial: DSM-5 and autism spectrum
disorders—two decades of perspectives from the JCPP.
Journal of Child Psychology and Psychiatry 53(9):

Ozonoff S (2012b) Editorial perspective: autism spectrum disorders
in DSM-5—an historical perspective and the need
for change. Journal of Child Psychology and Psychiatry
53(10): 1092–1094.

Roid GH (2003) Stanford-Binet Intelligence Scales. 5th ed.
Itasca, IL: Riverside Publishing.

Roid GH, Miller LJ, Pomplun M, et al. (2013) Leiter-3: Leiter
International Performance Scale. Torrance, CA: Western
Psychological Services.

Smith IC, Reichow B and Volkmar FR (2015) The effects of
DSM-5 criteria on number of individuals diagnosed with
autism spectrum disorder: a systematic review. Journal of
Autism and Developmental Disorders 45(8): 2541–2552.

Volkmar FR and Reichow B (2013) Autism in DSM-5: progress
and challenges. Molecular Autism 4(1): 13.

Wechsler D (2002) Wechsler Preschool and Primary Scale
of Intelligence. 3rd ed. San Antonio, TX: Psychological

Wechsler D (2003) Wechsler Intelligence Scale for Children. 4th
ed. San Antonio, TX: Psychological Corporation.

Wechsler D (2008) Wechsler Adult Intelligence Scale. 4th ed.
San Antonio, TX: Psychological Corporation.

Wechsler D (2011) Wechsler Abbreviated Scale of Intelli-
gence (WASI-II). 2nd ed. San Antonio, TX: Psychological

Wechsler D (2014) Wechsler Intelligence Scale for Children. 5th
ed. San Antonio, TX: NCS Pearson.

Weitlauf AS, Gotham K, Vehorn AC, et al. (2014) Brief report:
DSM-5 “levels of support”: a comment on discrepant con-
ceptualizations of severity in ASD. Journal of Autism and
Developmental Disorders 44(2): 471–476.

Wiggins LD, Baio J and Rice C (2006) Examination of the time
between first evaluation and first autism spectrum diagno-
sis in a population-based sample. Journal of Developmental
and Behavioral Pediatrics 27(2): S79–S87.


Brief Report: Concurrent Validity of Autism Symptom
Severity Measures

Stephanie S. Reszka • Brian A. Boyd •

Matthew McBee • Kara A. Hume • Samuel L. Odom

Published online: 27 June 2013

© Springer Science+Business Media New York 2013

Abstract The autism spectrum disorder (ASD) diagnos-

tic classifications, according to the DSM-5, include a

severity rating. Several screening and/or diagnostic measures,

such as the Autism Diagnostic Observation

Schedule (ADOS), Childhood Autism Rating Scale (CARS),

and Social Responsiveness Scale (SRS) (teacher and parent

versions), include an assessment of symptom severity. The

purpose of this study was to examine whether symptom

severity and/or diagnostic status of preschool-aged children

with ASD (N = 201) were similarly categorized on these

measures. For half of the sample, children were similarly

classified across the four measures, and scores on most

measures were correlated, with the exception of the ADOS

and SRS-P. While the ADOS, CARS, and SRS are reliable

and valid measures, there is some disagreement between

measures with regard to child classification and the cate-

gorization of autism symptom severity.

Keywords Concurrent validity · Autism · Severity · Diagnostic classification


Introduction

The proposed changes to the forthcoming diagnostic and

statistical manual of mental disorders, DSM-5 (http:// would include severity criteria for the

autism spectrum disorders (ASD) category. These new criteria

would combine autistic disorder, Asperger syndrome,

and pervasive developmental disorder—not otherwise

specified (PDD-NOS) into one larger ASD category. As a

result of this collapse, reliable and valid measurement of

autism severity will be even more important in the deter-

mination of services for children with a diagnosis of ASD

(Matson et al. 2012).

Currently, the Childhood Autism Rating Scale (CARS;

Schopler et al. 1986) and Social Responsiveness Scale

(SRS; Constantino 2002) are two commonly used measures

that include a symptom severity estimate. Previously,

higher raw scores on the Autism Diagnostic Observation

Schedule (ADOS; Lord et al. 1999) indicated the presence

of more deficits that are characteristic of individuals with

ASD, suggesting a greater level of impairment, but the raw

scores were not normalized to indicate severity (Gotham

et al. 2009). A recent calibrated severity metric provides

estimations of ASD symptom severity using ADOS scores

(see Gotham et al. 2009). Generally, severity is measured

in several areas for children with ASD: language delay,

cognitive functioning, and behavioral issues (Gotham et al.

2009); however, these are not necessarily considered the

core features of ASD. Each of these measures, the CARS,

SRS, and ADOS, utilizes slightly different methods of

evaluating the severity of ASD symptoms and has varied

diagnostic cut-offs along the ASD spectrum.

The primary purpose of this study was to examine

whether children’s symptom severity and/or diagnostic

status were similarly categorized across the four measures.

S. S. Reszka (✉) · B. A. Boyd
Department of Allied Health, Division of Occupational Science

and Occupational Therapy, University of North Carolina, 321 S.

Columbia Street, Bondurant Hall CB #7122, Chapel Hill,

NC 27599-7122, USA

e-mail: [email protected]

Present Address:

M. McBee

East Tennessee State University, Johnson City, TN, USA

M. McBee · K. A. Hume · S. L. Odom
Frank Porter Graham Child Development Institute, University

of North Carolina, Chapel Hill, NC, USA


J Autism Dev Disord (2014) 44:466–470

DOI 10.1007/s10803-013-1879-7

The two study goals were to examine: (1) the concurrent

validity of the ADOS, CARS, and SRS (parent and teacher

versions) and (2) the categorization of children’s diagnostic

status and symptom severity.


Methods

Data for this study were collected on 201 children as part of a

larger study comparing the efficacy of school-based, com-

prehensive treatment models for preschoolers with ASD.

Data were collected across four states (CO, NC, FL, and

MN), and at the beginning of the school year. For each child,

all measures were collected within a 6-week time window.



At enrollment, the mean child age was 3.59 years (SD = 0.56,

range 2.24–5.04). Most participating children were male

(83.3 %) and ethnically non-Hispanic (64.6 %). In terms of

racial status, 5.1 % were identified as Asian, 12.1 % were

Black, 78.3 % were White, and 4.0 % were multiracial. To be

eligible for the larger study, each child was required to have a

clinical or school diagnosis of autism, PDD-NOS, or Asper-

ger’s Syndrome, or meet the autism spectrum cut-off score on

the ADOS and Social Communication Questionnaire (SCQ;

Rutter et al. 2003). If the child had an educational label of

developmental delay (DD) instead of ASD, which is consis-

tent with federal and state policy for children in this age range,

then s/he must have met diagnostic criteria on both the ADOS

and SCQ to be eligible for the study. The aim of our study

was not to diagnose children but rather to screen them for potential

eligibility, and a DD educational label reflects the

real-world heterogeneity encountered when recruiting children through local

school systems. The other study measures included the fol-

lowing: (1) Mullen Scales of Early Learning (Mullen 1995),

which is a measure of children’s cognitive and motor devel-

opment. Trained research staff administered the visual

reception, fine motor, expressive language, and receptive

language subscales to the child. The mean standard score on

the Mullen was 64.40 (N = 193, SD = 19.6, range 49–136).

And (2) Preschool Language Scale, fourth edition (PLS-4;

Zimmerman et al. 2003), which is a measure of children’s

auditory comprehension and expressive communication

skills. The mean standard score on the PLS-4 was 68.23

(N = 198, SD = 68.23, range 50–134).


Most participating parents were female (88.2 %) and

non-Hispanic (66.8 %). Additionally, 5.2 % were identified as

Asian, 13.0 % were black, 78.7 % were white, and 3.1 %

were multiracial. Household annual income ranged from

less than $20,000 (12.8 %) to over $100,000 (26.7 %).

Parents completed the parent version of the SRS (SRS-P).


Teachers completed the teacher version of the SRS (SRS-

T). Participating teachers were almost exclusively female

(98.6 %) and non-Hispanic (83.6 %), and identified them-

selves as white (97.3 %), with the remaining 2.7 % identifying

themselves as black. Most held a master’s degree

(56.2 %), while 37 % had a bachelor’s, 2.7 % had an

associate’s, and 4.1 % had a degree above the master’s


Diagnostic and Severity Measures

The measures examined in this study included the ADOS,

CARS, and SRS parent and teacher versions. Both the

ADOS and CARS were administered by trained and reli-

able project staff. The ADOS was administered by a

research-trained and/or research reliable staff member at

each site, and staff across sites met reliability criterion on a

series of CARS training tapes prior to administration.

The ADOS is a semi-structured assessment of children’s

communication, social, and play skills. Module 1 is for

children who are non-verbal or who have a few words.

Module 2 is for children with phrase speech, while Module

3 is intended for children who are verbally fluent. In

accordance with the suggested severity ratings, ADOS

severity scores of 4–5 indicated autism spectrum disorder

and scores from 6 to 10 indicated autism (Gotham et al.

2009). In this sample, 125 children were administered

Module 1 of the ADOS, 57 were administered Module 2,

and 15 were administered Module 3, while 4 children had

missing data for the ADOS.

Using the CARS, the child is rated on 15 subscales

based on observation (during the Mullen administration, in

this case). To ensure consistency in CARS scoring across

study sites and classrooms, the measure was completed

based on observations of children’s behavior during the

structured administration of the Mullen and 15 min of

unstructured time post-Mullen administration. The CARS

includes items on socialization, communication, emotional

response, and sensory issues. Each of the 15 items is rated

on a scale from 1 to 4, with 4 indicating severe impairments.

A CARS cutoff raw score of 25.5 was used to

indicate autism spectrum disorder, with raw scores over 30

indicating autism (Chlebowski et al. 2010). The original

CARS was used, as opposed to the newly released CARS2

(Schopler et al. 2010), because the CARS2 only became

publicly available after the study was already underway.



This study used the original CARS, which is aligned with

the currently available CARS2-ST, for children younger

than 6 years of age.

The SRS is a 65-item rating scale that was completed by

parents and teachers. The SRS provides information about

children’s social functioning including social awareness,

social information processing, social reciprocal communi-

cation, social anxiety/avoidance behaviors, and stereotypic

behavior/restricted interests. Each item is rated on a scale

of 1 (not true) to 4 (almost always true). T-scores (mean of

50, standard deviation of 10) were used in the analyses,

with a T-score of 60–75 indicating mild to moderate

symptoms of ASD, and scores over 75 indicating severe

symptoms. The SRS was normed with T-scores for parent

and teacher versions, with separate norms within each for

child gender. The appropriate scoring norms were used for

each measure, as specified by the SRS manual. The pre-

school version of the SRS was used for children aged

36–47 months, and the standard version was used for

children 48 months and older.


Results

ADOS severity scores ranged

from 2 to 10, with a mean of 7.19 (SD = 1.64) suggesting

that children in the sample tended to score in the milder

end of the ASD category, but represented the full range of

severity across the spectrum. The mean score on the CARS

was 33.37 (SD = 7.31) with a range of 15–55.5. Similarly

to the ADOS mean score, the mean score of 33.37 corre-

sponds to the autism category for the CARS. The SRS-

Teacher (SRS-T) version and SRS-Parent (SRS-P) versions

both showed mean scores in the mild to moderate symptom

category (66.27 and 73.70, respectively). Descriptive

information for each measure is available in Table 1.
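
The cut-offs described for each measure can be sketched as simple threshold rules. The following is a minimal illustration, assuming the boundaries stated in the text (ADOS severity 4–5 and 6–10; CARS 25.5 and over 30; SRS T-scores 60–75 and over 75). The function names, the category labels for scores below the cut-offs, and the treatment of values exactly on a boundary are assumptions made for the example, not taken from the test manuals.

```python
def classify_ados(severity: float) -> str:
    """ADOS calibrated severity: 4-5 autism spectrum, 6-10 autism."""
    if severity >= 6:
        return "autism"
    if severity >= 4:
        return "autism spectrum"
    return "non-spectrum"  # label assumed for scores below the cut-off


def classify_cars(raw_total: float) -> str:
    """CARS raw total: 25.5 indicates autism spectrum, over 30 indicates autism."""
    if raw_total > 30:
        return "autism"
    if raw_total >= 25.5:  # inclusive boundary is an assumption
        return "autism spectrum"
    return "non-spectrum"


def classify_srs(t_score: float) -> str:
    """SRS T-score: 60-75 mild to moderate symptoms, over 75 severe."""
    if t_score > 75:
        return "severe"
    if t_score >= 60:
        return "mild to moderate"
    return "below cut-off"


# Sample means reported in Table 1.
print(classify_ados(7.19))   # autism
print(classify_cars(33.37))  # autism
print(classify_srs(66.27))   # mild to moderate (SRS-Teacher)
print(classify_srs(73.70))   # mild to moderate (SRS-Parent)
```

Applied to the sample means, these rules place the ADOS and CARS means in the autism range and both SRS means in the mild to moderate range.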

Question 1: Concurrent Validity at Pretest

The ADOS severity scores were significantly correlated

with the CARS total score (r = 0.432, p < 0.001) and the

total score on the teacher version of the SRS (r = 0.418,

p < 0.001). The ADOS severity scores were not significantly

correlated with scores on the SRS-P (r = 0.088,

p = 0.236). The CARS was significantly correlated with

both versions of the SRS (r = 0.558, p < 0.001 for the

teacher version; r = 0.292, p < 0.001 for the parent version).

The SRS-Teacher and SRS-Parent scores were significantly

correlated (r = 0.275, p < 0.001). The

correlation matrix for these measures is shown in Table 2.
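
Whether a correlation of a given size reaches significance depends heavily on sample size. A rough sketch using the Fisher z approximation shows how the smallest and weakest reported correlations behave. The pairwise sample size is not reported in the text; n = 185 (the SRS-Parent N in Table 1) is assumed here, and the approximation will not reproduce the published p values exactly.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-sided p value for a Pearson correlation r with n pairs.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated as
    a standard normal deviate under the null hypothesis of zero correlation.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


n_assumed = 185  # assumption: pairwise n taken from the SRS-Parent N in Table 1

for label, r in [("ADOS vs SRS-Parent", 0.088),
                 ("SRS-Teacher vs SRS-Parent", 0.275)]:
    print(f"{label}: r = {r}, approximate p = {fisher_z_p_value(r, n_assumed):.4f}")
```

With the assumed n, r = 0.088 gives a p value well above 0.05, while r = 0.275 falls below 0.001, in line with the pattern reported above.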

Question 2: Categorization of Diagnostic Status/Severity


Nearly 98 % of the children scored on the spectrum

according to the ADOS. The CARS scores classified

64.7 % of children as being on the spectrum. The SRS-

Teacher and SRS-Parent scores classified 76.6 and 82.1 %

of children as being on the spectrum, respectively. Diag-

nostic classification charts for each measure are available

in Fig. 1.

A summary of children’s diagnostic classifications

across all measures is available in Table 3. Ratings were

collapsed so that a score of 0 indicated that the child did

not score on the autism spectrum, while a score of 1

indicated that a child would score in the autism spectrum

range (mild/moderate/severe autism symptoms). As shown,

for 92 cases (50 % of the sample) children were classified

similarly across all measures. For another 25 cases

(13.59 % of the sample), children were classified similarly

on the ADOS and both versions of the SRS, but not the

CARS. The remaining children scored on the spectrum on

one or more of the measures. Almost 14 % scored on the

spectrum according to the ADOS and both SRS versions,

but not the CARS, followed by 10.33 % on the ADOS and

SRS-Parent only. Another 6.52 % of children scored on the

spectrum on the ADOS, CARS, and SRS-Parent. Approx-

imately 6 % scored on the spectrum on both the ADOS and

SRS-Teacher and another nearly 6 % on the ADOS,

CARS, and SRS-Teacher. Almost 4 % scored on the

spectrum only on the ADOS. Just over 2 % scored on the

spectrum only on the ADOS and CARS. Finally, 1 % of

children scored on the spectrum according to the SRS-

Parent and SRS-Teacher forms only, and 0.54 % scored on

the spectrum only on the SRS-Parent. For approximately

76 % of the sample (140 cases), children were similarly

classified on at least three of the four measures.
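
The percentages in this section can be recomputed from the collapsed classification counts in Table 3. The sketch below is illustrative only; the data structure and function name are assumptions, and the counts are transcribed from Table 3 (0 = not on spectrum, 1 = on spectrum).

```python
MEASURES = ("ADOS", "CARS", "SRS-Teacher", "SRS-Parent")

# (ADOS, CARS, SRS-Teacher, SRS-Parent) pattern -> number of children, from Table 3.
TABLE_3 = {
    (0, 0, 0, 1): 1,
    (0, 0, 1, 1): 2,
    (1, 0, 0, 0): 7,
    (1, 0, 0, 1): 19,
    (1, 0, 1, 0): 11,
    (1, 0, 1, 1): 25,
    (1, 1, 0, 0): 4,
    (1, 1, 0, 1): 12,
    (1, 1, 1, 0): 11,
    (1, 1, 1, 1): 92,
}

total = sum(TABLE_3.values())


def percent_on_spectrum(measure: str) -> float:
    """Percentage of children classified as on the spectrum by one measure."""
    index = MEASURES.index(measure)
    on_spectrum = sum(n for pattern, n in TABLE_3.items() if pattern[index] == 1)
    return round(100 * on_spectrum / total, 1)


# Children given the same classification by all four measures.
all_agree = sum(n for pattern, n in TABLE_3.items() if len(set(pattern)) == 1)

print("Total cases:", total)
for measure in MEASURES:
    print(f"{measure}: {percent_on_spectrum(measure)} % on spectrum")
print(f"Same classification on all four measures: {round(100 * all_agree / total, 1)} %")
```

The per-measure figures agree with those reported in the text: 98.4 % for the ADOS, 64.7 % for the CARS, 76.6 % for the SRS-Teacher, and 82.1 % for the SRS-Parent.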

Table 1 Descriptives for measures

Measure N Mean (SD) Range

ADOS severity 198 7.19 (1.64) 2.00–10.00

CARS total score 200 33.37 (7.31) 15.00–55.50

SRS-Teacher total score 200 66.27 (9.66) 42.00–90.00

SRS-Parent total score 185 73.70 (14.27) 42.00–111.00

Table 2 Bivariate correlations of measures

Measure | ADOS severity | CARS total score | SRS-Teacher total score

CARS total score | 0.432 (<.001) | – | –

SRS-Teacher total score | 0.418 (<.001) | 0.558 (<.001) | –

SRS-Parent total score | 0.088 (.236) | 0.292 (<.001) | 0.275 (<.001)

p values in parentheses




Discussion

Generally, children’s severity scores on the measures were

correlated, indicating that the severity of autism symptoms

was rated similarly across all measures, with the exception

of the ADOS and SRS-Parent version. There were mod-

erate to strong correlations between the CARS and all other

measures, and between the SRS-T and all other measures.

The ADOS was moderately correlated with both the CARS

and SRS-T, but not with the SRS-P. Research suggests that

scores on the SRS agree with clinical diagnosis a signifi-

cant portion of the time and the SRS teacher and parent

versions have shown correlations ranging from 0.75 to 0.91

in a clinical sample (Constantino et al. 2003), while this

sample showed a weaker, but still significant, correlation of

0.275. Interestingly, the parent version of the SRS was

correlated, albeit moderately, with all other measures with

the exception of the ADOS. However, the statistical

significance of some of the more modest correlations may

be an artifact of the relatively large sample size used in this study.


The differences in ADOS and SRS-Parent scores seen in

this study may reflect potential variations in child behaviors

across different contexts; all measures except the SRS-P

were completed in the school context, while the SRS-P

reflects parental views of child behaviors at home. It is

important to consider the context under which these mea-

sures of symptom severity were collected. The parent

measures were not always correlated with measures taken

in the school context by teachers or research staff, and

children may display different behaviors at home than they

would in a classroom or research setting. Thus the context

may be a factor in potential disagreements between par-

ents’ and clinicians’ or practitioners’ interpretations of

symptom severity or autism diagnosis.

For half of the sample, children were similarly classified

across all measures. About three quarters (76 %) of the

sample were similarly classified on at least three of the four

measures. Ratings on the CARS appear to be the most

conservative regarding diagnosis, as only 64.7 % (119

children) were rated as having an ASD diagnosis using the

CARS, while nearly all (98.4 %; 181 out of 184) of the

children were rated as having a diagnosis on the spectrum

according to the ADOS. However the ADOS, along with

the SCQ, was used to determine children’s study eligibility,

and was selected because it is considered a gold-standard

measure for ASD diagnosis.

While the children in this study were between the ages

of 3 and 5, previous research comparing the ADOS and

CARS for diagnosing toddlers with ASD suggests that

there is a significant agreement between the two for diag-

nosing ASD in toddlers, matching clinical judgment

(Ventola et al. 2006). Children in this study tended to have

Fig. 1 Diagnostic classification
pie charts by measure

Table 3 Collapsed summary of diagnostic ratings

ADOS CARS SRS-Teacher SRS-Parent N %

0 0 0 1 1 0.54

0 0 1 1 2 1.09

1 0 0 0 7 3.80

1 0 0 1 19 10.33

1 0 1 0 11 5.98

1 0 1 1 25 13.59

1 1 0 0 4 2.17

1 1 0 1 12 6.52

1 1 1 0 11 5.98

1 1 1 1 92 50.00

184 100.00

17 cases were missing and not included in the analysis. 0 = not

autistic/not on spectrum, 1 = on spectrum/mild autism/severe autism



mild to moderate symptoms of autism. The CARS is better

at diagnosing children who tend to be lower functioning

than those who are higher functioning (Mayes et al. 2009),

which may explain some of the discrepancy between

CARS classification and the other measures. A newly

released version of the CARS (CARS2-HF) assesses ver-

bally fluent, more high-functioning children, but currently

is only available for children age 6 and older.

The proposed changes to the DSM-5 include severity
criteria for the ASD category, allowing ratings of symptoms
along “a continuum from mild to severe rather than a
simple yes or no diagnosis to a specific disorder” (APA
2012). Given these changes, measures of symptom severity
may become more critical in autism research and clinical
practice. While the severity measures used in this study
may not match the severity criteria in the proposed DSM-5,
this study is a first step toward examining the agreement, or
lack thereof, of commonly used measures of autism
symptom severity. Future studies should examine
the relationships between the current measures of
severity described in this study and the severity classifications
that will be found in the DSM-5.

While there are instruments available that can produce reliable
and valid assessments of autism severity, this
study demonstrates that there is some disagreement among
several of these measures with regard to child classifications
and the categorization of symptom severity. The type
of measure used could affect child classifications, and by
extension, services provided to these children.

Acknowledgments This research was supported by the Institute of
Education Sciences (R324B070219).


American Psychiatric Association (2012). DSM-5 proposed criteria for

autism spectrum disorder designed to provide more accurate

diagnosis and treatment.

%20Autism%20Spectrum%20Disorders%20-%20DSM5.pdf. Accessed

12 Mar 2013.

Chlebowski, C., Green, J., Barton, M., & Fein, D. (2010). Using the

Childhood Autism Rating Scale to diagnose autism spectrum

disorders. Journal of Autism and Developmental Disorders, 40, 787–799.

Constantino, J. N. (2002). Social Responsiveness Scale (SRS). Los

Angeles: Western Psychological Services.

Constantino, J. N., Davis, S. A., Todd, R. D., Schindler, M. K., Gross,

M. M., Brophy, S. L., et al. (2003). Validation of a brief measure

of autistic traits: Comparison of the social responsiveness scale

with the autism diagnostic interview-revised. Journal of Autism

and Developmental Disorders, 33, 427–433.

Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing the ADOS

scores for a measure of severity in autism spectrum disorders.

Journal of Autism and Developmental Disorders, 39, 695–705.

Lord, C., Rutter, M., DiLavore, P., & Risi, S. (1999). Autism

diagnostic observation schedule (ADOS). Los Angeles, CA:

Western Psychological Services.

Matson, J. L., Beighley, J., & Turygin, N. (2012). Autism diagnosis

and screening: Factors to consider in differential diagnosis.

Research in Autism Spectrum Disorders, 6, 19–24.

Mayes, S. D., Calhoun, S. L., Murray, M. J., Morrow, J. D., Yurich,

K. K. L., Mahr, F., et al. (2009). Comparison of scores on the

checklist for autism spectrum disorder, childhood autism rating

scale, and Gilliam Asperger’s disorder scale for children with

low functioning autism, high functioning autism, Asperger’s

disorder, ADHD, and typical development. Journal of Autism

and Developmental Disorders, 39, 1682–1693.

Mullen, E. (1995). The Mullen Scales of early learning. Circle Pines,

MN: American Guidance Service.

Rutter, M., Bailey, A., & Lord, C. (2003). Social Communication

Questionnaire (SCQ). Los Angeles: Western Psychological Services.

Schopler, E., Reichler, R. J., & Renner, B. R. (1986). The Childhood

Autism Rating Scale (CARS). Los Angeles, CA: Western

Psychological Services.

Schopler, E., Van Bourgondien, M. E., Wellman, G. J., & Love, S. R.

(2010). Childhood autism rating scale, second edition (CARS2).

Los Angeles: Western Psychological Services.

Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green,

J., et al. (2006). Agreement among four diagnostic instruments

for autism spectrum disorder in toddlers. Journal of Autism and

Developmental Disorders, 36, 839–847.

Zimmerman, I., Steiner, V., & Pond, R. (2003). Preschool Language

Scale-IV. San Antonio: Psychological Corporation.






Journal of Autism and Developmental Disorders


Systematic Review and Meta‑Analysis of the Clinical Utility
of the ADOS‑2 and the ADI‑R in Diagnosing Autism Spectrum
Disorders in Children

Jenna B. Lebersfeld1 · Marissa Swanson1 · Christian D. Clesi1 · Sarah E. O’Kelley1

Accepted: 9 December 2020
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021

The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and the Autism Diagnostic Interview, Revised
(ADI-R) have high accuracy as diagnostic instruments in research settings, while evidence of accuracy in clinical settings is
less robust. This meta-analysis focused on efficacy of these measures in research versus clinical settings. Articles (n = 22)
were analyzed using a hierarchical summary receiver operating characteristics (HSROC) model. ADOS-2 performance was
stronger than the ADI-R. ADOS-2 sensitivity and specificity ranged from .89-.92 and .81-.85, respectively. ADOS-2 accuracy
in research compared with clinical settings was mixed. ADI-R sensitivity and specificity were .75 and .82, respectively, with
higher specificity in research samples (Research = .85, Clinical = .72). A small number of clinical studies were identified,
indicating ongoing need for investigation outside research settings.

Keywords Autism spectrum disorder · ADOS-2 · ADI-R · Meta-analysis · Diagnosis · HSROC


Diagnostic evaluations are crucial for children with autism
spectrum disorder (ASD) to access early intervention ser-
vices and therapies. Determining the accuracy of meas-
ures commonly used for ASD assessment is necessary to
aid clinicians in making better and more accurate clinical
diagnoses. A comprehensive evaluation for autism spectrum
disorder (ASD) is most accurately conducted by a multidis-
ciplinary team through the use of information from multi-
ple sources, including a clinical observation of the child, an
ASD-focused clinical interview with caregivers, and child
and family history (Risi et al. 2006; Kim and Lord 2012;
Stewart et al. 2014). The Autism Diagnostic Observation
Schedule, Second Edition (ADOS-2; Lord et al. 2012a) and
the Autism Diagnostic Interview, Revised (ADI-R; Rutter

et al. 2003; Howes et al. 2017; Penner et al. 2017) have high
levels of diagnostic accuracy; however, both instruments
require specialized training and experience to administer and
score. Using both instruments together improves diagnostic
accuracy (sensitivity .70-.98, specificity .80-.96) compared
to each measure alone (Risi et al. 2006; Ventola et al. 2006;
Kim and Lord 2012). The multidisciplinary team, often led
by a clinical psychologist or physician (e.g., developmental/
behavioral pediatrician), takes the results of these measures
as well as other information gathered during the evaluation
and uses clinical judgement to render a final diagnosis. The
ADOS-2 and ADI-R were initially developed as research
tools and have been studied at length in the research litera-
ture, with subsequent publication for use in clinical settings
to aid diagnosis. Much of the literature published on the
accuracy of the ADOS-2 and the ADI-R utilized evaluations
from populations recruited specifically for research, and
these are the studies on which the published psychometrics
were based. However, research samples often utilize strict
exclusion criteria, such as excluding children with behavio-
ral challenges, intellectual disability, and genetic disorders,
to create a more homogenous research sample. Therefore,
results may not generalize to a clinical community sample
(de Bildt et al. 2004; Tomanik et al. 2007; Neuhaus et al.
2017), and it is important to understand the accuracy of

Supplementary Information The online version of this article
(https :// 3-020-04839 -z) contains
supplementary material, which is available to authorized users.

* Jenna B. Lebersfeld
[email protected]

1 University of Alabama at Birmingham, 1720 7th Ave S,
Birmingham, AL 35233, USA


these measures when used in clinical practice compared to
research settings.

Several statistical approaches are available and accepted
in evaluating the accuracy of these measures in diagnosing
ASD. Sensitivity (Se) is the likelihood that a child with a
clinical diagnosis of ASD will score in the ASD range on
the measure, and specificity (Sp) indicates the likelihood
that a child without ASD will score in the non-ASD range
on the measure. Positive predictive value (PPV) is the likeli-
hood that a child who received an ASD classification on a
measure truly has a diagnosis of ASD, and negative predic-
tive value (NPV) is the likelihood that a child who scores in
the non-ASD range on a measure will not receive a clinical
ASD diagnosis. PPV and NPV are influenced by the preva-
lence of the disorder in the sample whereas Se and Sp are
not; therefore, Se and Sp are used to measure diagnostic test
accuracy when comparing across samples. The accuracy of
the ADOS-2 and ADI-R has been shown to be lower in clini-
cal settings compared to the research context (de Bildt et al.
2009; Zander et al. 2016; Langmann et al. 2017; Zander
et al. 2017; Kamp-Becker et al. 2018); however, the majority
of these clinical studies were conducted in countries outside
of the United States, including the Netherlands (de Bildt
et al. 2009; Oosterling et al. 2010), Greece (Papanikolaou
et al. 2009), Australia (Dereu et al. 2012; Gray et al. 2008),
Germany (Kamp-Becker et al. 2018) and Sweden (Zander
et al. 2015; Zander et al. 2017). Differences in sociocultural
norms as well as translations of the originally published
measures may have increased the error associated with these
measures in clinical settings. Given that children referred for
clinical evaluations in the community often have more com-
plex presentations than samples recruited for and included
in research studies, it was hypothesized that these two diag-
nostic tools may be less accurate in clinical settings than the
reported psychometrics from large scale studies conducted
in the research setting. Therefore, the purpose of this system-
atic review and meta-analysis was to determine the accuracy
and clinical utility of the ADOS-2 and the ADI-R.
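The four accuracy statistics defined above can be written out from a 2 x 2 table of test classification against clinical diagnosis. The sketch below is illustrative only: the function names are hypothetical, and the prevalence values in the usage example are assumptions chosen to show why PPV and NPV shift with prevalence while Se and Sp do not. The Se and Sp inputs are the pooled ADI-R values reported in the abstract.

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Se, Sp, PPV and NPV from counts of true/false positives and negatives."""
    return {
        "Se": tp / (tp + fn),   # children with ASD who score in the ASD range
        "Sp": tn / (tn + fp),   # children without ASD who score in the non-ASD range
        "PPV": tp / (tp + fp),  # ASD classifications that are true ASD diagnoses
        "NPV": tn / (tn + fn),  # non-ASD classifications that are truly non-ASD
    }


def predictive_values(se, sp, prevalence):
    """PPV and NPV implied by fixed Se and Sp at a given sample prevalence."""
    ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return ppv, npv


if __name__ == "__main__":
    # Pooled ADI-R Se and Sp from the abstract; prevalences are hypothetical.
    for prevalence in (0.2, 0.6):
        ppv, npv = predictive_values(se=0.75, sp=0.82, prevalence=prevalence)
        print(f"prevalence={prevalence:.1f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With Se and Sp held constant, raising the proportion of children with ASD in the sample raises PPV and lowers NPV, which is why Se and Sp are the statistics compared across samples.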


This systematic review and meta-analysis utilized methods
outlined in the Preferred Reporting Items for Systematic
Review and Meta-Analysis (PRISMA) of Diagnostic Test
Accuracy studies guidelines (McInnes et al. 2018) and the
Handbook for Diagnostic Test Accuracy Reviews (Deeks
2013) and was approved by the university Institutional
Review Board. This protocol was registered with PROSPERO
2018 (https :// ero/displ ay_
recor d.php?ID = CRD42018111589, Registration number:

Measures for Index Tests

Autism Diagnostic Observation Schedule, Second Edition
(ADOS‑2; Lord et al. 2012a)

The ADOS-2 is a semi-structured, 45- to 60-minute observation
and interaction session between an evaluator and the
child, which is used to aid in the diagnosis of ASD. Only the
ADOS-2 and its direct precursors were considered acceptable
index tests (i.e., ADOS-Toddler (Luyster et al. 2009);
ADOS-G with revised algorithms (Gotham et al. 2007)), as
they formed the basis for the WPS ADOS-2 publication. For
ease of reference, these will be referred to collectively as the
“ADOS-2.” Older ADOS versions were not considered eligible
index tests for the purpose of this study (i.e., ADOS-G
without the revised algorithms (Lord et al. 2000); PL-ADOS
(DiLavore et al. 1995)). Published ADOS-2 sensitivity (Se)
ranges from .60 to .95 and specificity (Sp) ranges from .75 to
1.00 (Lord et al. 2012a, b). A recent meta-analysis indicated
pooled Se ranging from .77 to .90 and Sp ranging from .62
to .90 for the ADOS-2 (Dorlack et al. 2018).

Autism Diagnostic Interview, Revised (ADI‑R; Rutter et al. 2003)

The ADI-R is a semi-structured diagnostic interview given
to a parent or caregiver by a trained clinician, who asks detailed
questions about development and underlying behaviors associated
with ASD. Each section of the algorithm has a raw
score cut-off. A child must meet or exceed the cut-off
in all four sections to receive a classification of autism; if any
domain cut-off is not met, the classification is not autism. Se and Sp
were not published in the ADI-R manual; however, original
research literature conducted prior to measure publication
indicated that Se varied widely, ranging from .19
to .88, and Sp was 1.00 (Cox et al. 1999; Gilchrist et al.
2001; Table 1). More recent literature suggests the Se of the
ADI-R ranges from .53 to .92, and Sp ranges from .62 to .95
(Risi et al. 2006; Falkmer et al. 2013). Studies which used
other algorithms such as the ADI-R Toddler diagnostic algorithms
or those developed by the Autism Genetic Research
Exchange (AGRE) were excluded given these algorithms are
not yet published for clinical use.
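The all-sections rule described above can be expressed as a short sketch. The section names and cut-off values below are placeholders for illustration and are assumptions, not values taken from this article; the published manual should be consulted for the actual cut-offs.

```python
# Placeholder section names and cut-offs, for illustration only (assumptions).
ILLUSTRATIVE_CUTOFFS = {
    "section_1": 10,
    "section_2": 8,
    "section_3": 3,
    "section_4": 1,
}


def classify_adi_r(raw_scores, cutoffs=ILLUSTRATIVE_CUTOFFS):
    """Return 'autism' only if every section meets or exceeds its cut-off."""
    meets_all = all(raw_scores[section] >= cutoff for section, cutoff in cutoffs.items())
    return "autism" if meets_all else "not autism"


if __name__ == "__main__":
    print(classify_adi_r({"section_1": 12, "section_2": 9, "section_3": 4, "section_4": 1}))
    print(classify_adi_r({"section_1": 12, "section_2": 9, "section_3": 2, "section_4": 1}))
```

Because one section below its cut-off is enough to yield a classification of not autism, a conjunctive rule of this kind tends to favor specificity over sensitivity.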

Table 1 Sensitivity and specificity of published ADI-R algorithms

Se sensitivity, Sp specificity

Article                                         n    Se    Sp
Cox 1999
  ADI-R at 20 months, diagnosis at 42 months    45   .19   1.00
  ADI-R at 42 months, diagnosis at 42 months    45   .48   1.00
Gilchrist 2001                                  53   .88   1.00


Eligibility Criteria

Studies administering either one or both of the ADI-R and
the ADOS-2 to children under 18 years for the purpose of an
initial diagnostic evaluation in a clinical or research setting
were eligible. A clinical setting was defined as a commu-
nity setting where participants were not recruited specifically
for research. A research setting included any studies which
recruited participants for research. Some studies included
participants from both community and research settings and
were classified as such. Studies in which diagnostic tests
were administered to confirm ASD diagnosis or assess treat-
ment outcome were excluded. Data for all included articles
were collected in the United States, Canada, or the United
Kingdom and were published in English.

Reference Standard for Diagnosis

The reference standard for diagnosis was the final consensus
diagnosis of a comprehensive evaluation for ASD (i.e., ASD
or non-ASD), using the following conservative approach.
The comprehensive evaluation must have included any ver-
sion of the ADOS and any ASD-focused clinical interview.
Papers using the ADOS-2 as the index test were required to
include some type of ASD-focused clinical interview in the
evaluation, but this interview did not necessarily have to be
the ADI-R. For studies in which the ADI-R served as the
index test, the evaluation must have included the administra-
tion of any version of the ADOS (i.e., PL-ADOS, ADOS-G,
ADOS-G with revised algorithms, ADOS-T, or ADOS-2) but
did not need to include the ADOS-2 specifically. Papers were
excluded in which the ADI-R was administered but no ver-
sion of the ADOS was administered.

Studies which used another method for determining ASD
or non-ASD diagnosis (e.g., pre-determined algorithm based
on a combination of ADOS and ADI-R results) were not
included, given that this type of methodology for determin-
ing ASD diagnosis does not reflect clinical practice. Studies
which did not report a final consensus clinical diagnosis and
included only diagnoses reported by a parent, pediatrician,
or educator, and/or other forms of ASD diagnosis were not

Study Design

Article eligibility included peer-reviewed original research
with prospective, retrospective, cross-sectional, or longi-
tudinal study designs. Case studies and case series were
excluded. Review articles, meta-analyses, and grey litera-
ture were not included, but citations within were reviewed.

Search Strategy

Searches were conducted in September 2018 in PsycINFO,
ERIC, PubMed/MEDLINE, Cochrane Database of
Systematic Reviews (including Cochrane Central Register
of Controlled Trials (CENTRAL)), Journal of Autism and
Developmental Disorders, Research in Autism Spectrum
Disorders, Autism Research, and Autism. Google Scholar
was used informally to identify keywords but was not included
in the formal search strategy. All articles published since the
original publication date of each measure were considered
(ADI-R: 2003; ADOS-2 with revised algorithms: 2007).
Detailed search terms are included in Online Appendix A.

Assessment of Methodological Quality

The QUADAS-2 (Quality Assessment of Diagnostic Accu-
racy Studies-2, Whiting et al. 2011) is a tool used in sys-
tematic reviews to evaluate risk of bias and applicability
concerns in diagnostic test accuracy studies related to patient
selection, index tests, reference standard, and flow and tim-
ing. The QUADAS-2 tool for this study was adapted and
operationalized from Vllasaliu et al. (2016).

Study Selection

Figure 1 reviews the process by which articles were selected
for inclusion in the study. Citations from searches (n =
11,672) were exported into EndNote and duplicate articles
(n = 2,591) were eliminated automatically. An additional
949 duplicate articles were identified manually. Therefore,
8,132 unique citations were reviewed. All titles, abstracts,
and possibly relevant full-text articles were reviewed by
two authors (JL and MS). Given differences in initial article
eligibility identification between the two authors resulting
in low initial agreement (i.e., 48 articles, 23% agreement),
the inclusion criteria were clarified, and articles were re-
reviewed by the same two authors yielding 62% agreement.
Remaining discrepancies were rectified through discussion
between these two authors and the last author (SO) as well
as via outside review by two clinical psychologists with
research backgrounds and expertise in ASD. These outside
reviewers had 100% agreement with one another. These
procedures resulted in 22 articles deemed appropriate for
inclusion in the meta-analysis, with 14 articles included in
the ADOS-2 analyses and 13 papers included in the ADI-R
analyses. Despite the complexity of the inclusion criteria,
the additional steps taken to rectify initial low agreement
likely resulted in the inclusion of all appropriate papers in
the meta-analysis.


Fig. 1 PRISMA flow diagram


Data Extraction

True positives (TP), false positives (FP), true negatives
(TN), false negatives (FN), Se, and Sp for the ADOS-2 and/
or the ADI-R classifications were extracted by two authors
(JL and CC). For some articles, these metrics were stated
directly in the text or presented in supplementary materials.
For articles in which these numbers were not directly stated,
these statistics were calculated using the Review Manager
(RevMan) software provided by Cochrane Library (Review
Manager 2014). A total of 116 data points was extracted by
JL and CC, and 112 data points were agreed upon (97%)
across the 22 articles. Discrepancies were identified as errors
due to referring to wrong text in the table (n = 2) or typo-
graphical or calculation errors (n = 2).
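The counts reported for study selection and data extraction can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the function names are hypothetical.

```python
def unique_citations(exported, auto_duplicates, manual_duplicates):
    """Citations left for review after both rounds of duplicate removal."""
    return exported - auto_duplicates - manual_duplicates


def percent_agreement(agreed, total):
    """Raw percent agreement between two extractors."""
    return 100.0 * agreed / total


if __name__ == "__main__":
    # Study selection: 11,672 exported, 2,591 automatic and 949 manual duplicates.
    print(unique_citations(11672, 2591, 949))
    # Data extraction: 112 of 116 data points agreed upon.
    print(round(percent_agreement(112, 116), 1))
```

The remaining four data points match the four discrepancies described above (two from referring to the wrong text in a table, two from typographical or calculation errors).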

Data Analysis

Articles were organized using the RevMan software. The
hierarchical summary receiver operating characteristic
(HSROC) model of Rutter and Gatsonis (Rutter 1995; Rut-
ter and Gatsonis 2001) was conducted using the MetaDAS
SAS macro (Takwoingi and Deeks 2010). This model pro-
duces pooled Se and Sp and accounts for the correlation
between Se and Sp across studies. Separate pooling of Se
and Sp results in underestimation of these statistics, since
it does not take into account the inherent trade-off between
these statistics (Deeks 2001). Positive and negative predictive
values are influenced by prevalence in the sample, which
introduces heterogeneity and uncertainty. The chosen method
for statistical analysis uses a Bayesian model to determine
random effects and was preferred to fixed effects due to
the large amount of heterogeneity commonly seen among
diagnostic test accuracy studies. Additionally, the HSROC
method is recommended when covariates are included in the
model. This model also produces the Diagnostic Odds Ratio
(DOR), a global estimate of overall test accuracy. The DOR
is a summary of the diagnostic accuracy of a test and can be
interpreted as how many times higher the odds are of a per-
son with ASD to score in the ASD range on the diagnostic
test compared to someone without ASD. DOR can be used
to interpret and compare across tests and models.
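The interpretation of the DOR given above corresponds to the standard formula DOR = [Se / (1 − Se)] / [(1 − Sp) / Sp]. The sketch below applies that formula to the rounded pooled ADI-R values from Table 6. It is an approximation under stated assumptions: the reported DORs come from the fitted HSROC model, so values recomputed from rounded Se and Sp will differ somewhat. The function name is hypothetical.

```python
def diagnostic_odds_ratio(se, sp):
    """Odds of an ASD-range score with ASD divided by the odds without ASD.

    Undefined (division by zero) when se or sp equals exactly 1.
    """
    odds_positive_with_asd = se / (1 - se)
    odds_positive_without_asd = (1 - sp) / sp
    return odds_positive_with_asd / odds_positive_without_asd


if __name__ == "__main__":
    # (Se, Sp, reported DOR) from Table 6.
    table_6 = {
        "Overall": (0.75, 0.82, 13.6),
        "Research": (0.73, 0.85, 15.8),
        "Both": (0.82, 0.76, 15.9),
        "Clinical": (0.71, 0.72, 6.2),
    }
    for setting, (se, sp, reported) in table_6.items():
        recomputed = diagnostic_odds_ratio(se, sp)
        print(f"{setting}: recomputed {recomputed:.1f}, reported {reported}")
```

A DOR of 1 would indicate a test that does not discriminate at all, so the gap between the clinical and research values summarizes in one number how much accuracy differs by setting.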

Statistical analyses were conducted separately for the
ADOS-2 and the ADI-R. The HSROC model was computed
with and without the setting covariate to determine whether
setting had an effect on diagnostic test accuracy. The setting
covariate included three groups: clinical, research, and both.

For the ADI-R analysis, the model converged using these
three groups. For the ADOS-2 analyses, having three groups
did not allow the model to converge. Therefore, the “both”
group was combined with the “research” group for the

ADOS-2 analyses of the setting covariate. Combining the
“both” group with the “research group” was viewed as the
more conservative approach compared with combining the
“both” and “clinical” groups. If, as hypothesized, the admin-
istration of the ADOS-2 in research settings was more accu-
rate than clinical settings, including articles with clinical
evaluations in the “research” group would dilute the accu-
racy of the ASD diagnostic measures within the “research”
setting and reduce the difference in accuracy of the ADOS-2
in clinical and research settings in this study.

Outliers and Sensitivity Analysis

Studies were plotted graphically on HSROC plots and
visually inspected for outliers, with one article with low
Sp identified as an outlier in the ADOS-2 analysis. Study
characteristics were reviewed, and low specificity was
likely due to the clinical population, which included many
children with severe developmental and behavioral challenges,
resulting in many false positives on the ADOS-2.
Although these children are often excluded from research
studies, they present for clinical evaluations, and it is
important to investigate the accuracy of diagnostic measures
in these populations. However, these study results
may not generalize to other clinical settings given the
sample characteristics. Therefore, the authors conducted
analyses both with and without the outlier. Sensitivity analyses
were conducted by removing the outlier article and repeating
the analyses. Results were compared with and without
the outlier to determine the effect of this specific study on
the results, as discussed below. No outliers were identified
for the ADI-R analysis.

The Gotham et al. (2007, 2008) papers provided sepa-
rate Se and Sp estimates based on differing criteria from
the Diagnostic and Statistical Manual of Mental Disorders,
Fourth Edition (DSM-IV, American Psychiatric Association
2000) for two instances: Autism (i.e., Autistic Disorder) vs.
Non-spectrum (NS) and ASD vs. NS. In the Autism vs.
NS analysis, PDD-NOS and Asperger Disorder cases were
excluded and ADOS-2 classifications of ASD were classified
as non-spectrum. In the ASD vs. NS condition, children with
Autistic Disorder were excluded and ADOS-2 classifications
of “autism spectrum” and “autism” were both considered
classifications of ASD. For the purposes of the current study,
including both estimates in a single analysis would result
in the inclusion of the non-spectrum cases more than once,
thus separate analyses were conducted for the Autism vs.
NS and ASD vs. NS estimates for the Gotham et al. articles.
Additionally, results were analyzed both with and without
the outlier. For clarity, analytic approaches for the ADOS-2
are defined in Table 2.



Table  3 outlines study characteristics for the 22 articles
included in the meta-analysis.

Quality of the Included Studies

Figure 2 displays metrics used for evaluating quality of the
studies including risk of bias and applicability concerns.

Risk of bias was unclear or high risk for 12 of the 22 papers
(54%), and there were concerns regarding the use of the
reference standard (i.e., unclear or high risk for all articles).
This was primarily due to the clinicians’ knowledge of the
results of the index tests prior to the implementation of
the reference standard, as opposed to using blind raters to
come to a diagnostic conclusion. This is common practice
in clinical settings, as the index tests (i.e., the ASD diagnos-
tic measures) are inextricably linked and used as a primary
source of information in the reference standard (i.e., ASD
diagnostic evaluation and final clinical diagnosis) (Figs 3
and 4). Overall, there was low risk of bias from the index
tests, flow and timing, and applicability of the findings to

Diagnostic Accuracy of Measures


Estimates of overall Se (.89–.92) and Sp (.81–.85) of the
ADOS-2 as well as individual estimates for identified articles

Table 2 ADOS-2 Analytical Approaches

Approach Outlier article Gotham et al. 2007,

1 Included ASD vs. NS
2 Included Autism vs. NS
3 Excluded ASD vs. NS
4 Excluded Autism vs. NS
5 Included Excluded
6 Excluded Excluded

Table 3 Study characteristics

a 57 to 86% male
b Diagnosis deferred n = 14
c 4 years for younger group, 9 years for older group
d One participant diagnosis not reported

Study Test(s) Total N Sex Age Diagnosis

ADOS-2 ADI-R Male n Female n M or Range ASD n Non-ASD n

1. Baird 2006 X 255 223 32 12 years 158 97
2. Bishop et al. 2017 X X 289 203 86 8 years 142 126
3. Camodeca 2018 X 483 355 128 10 years 127 356
4. Dykens 2017 X 146 72 74 11 years 32 114
5. Gillentine 2017 X X 18 12 6 9 years 7 10
6. Gotham et al. 2007 X 1630 a a 41 to 104 months 1,351 279
7. Gotham et al. 2008 X 1282 923 359 37 to 118 months 1,068 214
8. Grzadzinski 2016 X X 212 176 36 9 years 164 48
9. Guthrie 2013 X 82b 64 18 19 months 56 12
10. Harris 2008 X 63 63 0 8 years 38 25
11. Havdahl 2016 X 389 288 101 c 255 163
12. Kim 2012 X 695d 353 160 33 months 491 203
13. Le Couteur 2008 X 101 81 20 36 months 77 24
14. Luyster 2009 X 206 158 48 15 to 26 months 59 147
15. Mazefsky 2006 X 78 56 22 4 years 59 19
16. Molloy 2011 X 584 507 77 3 to 9 years 329 255
17. Risi 2006 X 1039 818 221 27 to 94 months 881 158
18. Ventola 2006 X 45 37 8 26 months 36 9
19. Wiggins 2008 X 142 112 30 26 months 73 69
20. Wiggins 2015 X X 922 581 341 59 months 584 338
21. Ziats 2016 X X 18 14 4 14 years 8 10
22. Zwaigenbaum 2016 X 381 215 166 39 months 103 278


Fig. 2 QUADAS-2 risk of bias
and applicability concerns


(Se =.85–1.00; Sp =.44–1.00) are presented in Table 4 and
Fig. 5. These estimates were generally comparable to pub-
lished algorithms (Table 5). Addition of the setting covariate

was significant (−2LL = 7.87, p < .05) when the Gotham
et  al. (2007, 2008) papers were excluded (Table  4). The
highest DOR was reported within clinical samples when
the outlier was excluded and the Gotham et al. (2007, 2008)
papers utilized the Autism vs. Non-Spectrum algorithms
(Table 4). When all articles were included, the DOR was
higher for research compared with clinical samples; how-
ever, inclusion of the setting covariate was not significant (p
=.071). Exclusion of the outlier had little effect on Se of the
clinical sample but increased the Sp of the clinical sample
from .80 to .90, which is higher than specificities reported
in research samples (.81 and .83; Table 4).

Interpretation of the SROC plot (Fig. 3) for all three set-
ting types (clinical, research, and both) when all articles
were included in the analysis and the Gotham et al. (2007,
2008) ASD vs. NS accuracy estimates were used (Approach
1) suggests research samples have higher levels of accuracy
compared with clinical samples and combined clinical and
research samples. When the outlier (Sp =.44) was removed
from the analysis (Fig. 4), and the ASD vs. NS accuracy
estimates were used (Approach 3), visual inspection of the
SROC curve suggests there was not a difference between
accuracy of the ADOS-2 in research and clinical settings, and
accuracy of the ADOS-2 for studies including both research
and clinical evaluations was lower than either research or
clinical settings individually.


The ADI-R pooled Se was .75, Sp was .82, and individual
articles ranged widely (Se =.33–1.00, Sp =.61–1.00, see
Fig. 6 and Table 6).

Inclusion of the setting covariate in the model compared
to the model without the covariate trended toward signif-
icance (−2LL difference = 11.788, p = .067, see Fig. 7).
Clinical and research samples had comparable Se (clinical
= .71, research = .73) but articles utilizing both research
and clinical samples had higher Se (.82). Sp was higher for
research samples (.85) compared to clinical samples (.72)
and those including both research and clinical evaluations
in the study (.76, see Table 6 and Fig. 6).
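The reported p-values for the setting covariate are consistent with a likelihood ratio test, in which the −2 log likelihood difference between models is compared against a chi-square distribution. The sketch below is a hedged reconstruction: the degrees of freedom are not stated in the text, and the values used here (3 for the two-group ADOS-2 comparison, 6 for the three-group ADI-R comparison) are assumptions that happen to reproduce the reported p-values. The chi-square tail probability is computed from its closed-form series so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + total


if __name__ == "__main__":
    # (label, -2LL difference, assumed df, reported p)
    reported = [
        ("ADOS-2 Approach 1", 7.02, 3, 0.071),
        ("ADOS-2 Approach 5", 7.87, 3, 0.049),
        ("ADI-R", 11.788, 6, 0.067),
    ]
    for label, statistic, df, p_reported in reported:
        p = chi2_sf(statistic, df)
        print(f"{label}: p = {p:.3f} (reported {p_reported})")
```

Under these assumed degrees of freedom, only the ADOS-2 analysis excluding the Gotham et al. (2007, 2008) papers falls below the .05 threshold, matching the pattern described in the text.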


This study utilized a systematic review and meta-analysis
to investigate the accuracy of the ADOS-2 and the ADI-R
in clinical settings compared to research settings, and it was
hypothesized that these measures would perform better in
research settings given the heterogeneity and complexity of
children referred for an ASD evaluation in clinical samples.
ADOS-2 accuracy from the meta-analysis was comparable

Fig. 3 SROC plot of ADOS-2 by setting for Approach 1 (outlier
included). Note: Size of shape indicates sample size

Fig. 4 SROC plot of ADOS-2 by setting for Approach 3 (outlier
excluded). Note: Size of shape indicates sample size


to accuracy reported in the published manual and was more
accurate than the ADI-R in both research and clinical set-
tings. For the ADI-R, the current meta-analysis painted a
more nuanced picture than the literature cited in the pub-
lished manual with overall Se of .75 and Sp of .82, and the
ADI-R was less accurate in clinical studies compared to
research-only studies or those utilizing both research and
clinical samples.

For the ADOS-2, when comparing samples of children
evaluated in clinical settings with those whose evalua-
tions were completed in research settings (or which used a
combination of clinical and research evaluations), analyses
indicated Se was comparable across settings and Sp results
were mixed. Some analyses indicated comparable or slightly
lower Sp in clinical compared to research samples, whereas
when an outlier was excluded, results showed that Sp in clin-
ical samples was higher than research samples. This suggests

Table 4 Sensitivity and specificity of ADOS-2 overall and by evaluation setting

Se sensitivity, Sp specificity, DOR diagnostic odds ratio, −2LL −2 log likelihood difference, “–” data not available
* p < .05

          Overall                 Research or both        Clinical
Approach  n   Se   Sp   DOR       n   Se   Sp   DOR       n   Se   Sp   DOR     −2LL   p
1         14  .89  .81  36.5      11  .89  .80  34.8      3   .89  .80  31.0    7.02   .071
2         14  .92  .83  52.7      11  .92  .83  59.2      3   .89  .80  30.9    5.81   .120
3         13  .89  .83  42.3      11  .89  .81  36.3      2   .88  .90  71.1    3.23   .357
4         13  .92  .85  61.9      11  .92  .83  59.7      2   .88  .90  70.8    2.83   .418
5         12  .91  .81  47.0      9   .93  .81  53.8      3   .89  .80  31.0    7.87   .049*
6         11  .92  .84  55.7      –   –    –    –         –   –    –    –       7.53   .057

Fig. 5 Forest plot of ADOS-2 by setting using the Gotham ASD vs. NS estimates

Table 5 Sensitivity and specificity of published ADOS

NVMA nonverbal mental age, ASD autism spectrum disorder, NS non-spectrum, AUT autism, Se sensitivity,
Sp specificity, “–” data not available

                                    Gotham et al. 2007          Gotham et al. 2008
                                    AUT vs. NS   ASD vs. NS     AUT vs. NS   ASD vs. NS
Module and algorithm                Se    Sp     Se    Sp       Se    Sp     Se    Sp
Module 1, no words, NVMA > 15 mo.   .95   .94    .82   .79      .86   .80    –     –
Module 1, some words                .97   .91    .77   .82      .89   .91    .95   1.00
Module 2, younger                   .98   .93    .84   .77      .94   1.00   .65   .88
Module 2, older                     .98   .90    .83   .83      –     –      –     –
Module 3                            .91   .84    .72   .76      .82   .92    .60   .75


that given the small number of studies identified that were
conducted in solely clinical settings, a single article can
have a large effect on results. Therefore, more research is
needed to further examine ADOS-2 performance in clinical
evaluations. Given current findings, Sp of the ADOS-2 may
be more variable across clinical settings, whereas Se may
remain relatively stable.
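
As a rough check on how the quantities in Table 4 relate to one another, the sketch below recomputes a diagnostic odds ratio from a pooled Se and Sp and converts a −2LL difference into a p value. It is illustrative only and is not the authors' analysis. The published DORs come from the fitted meta-analytic model on unrounded estimates, so values recomputed from the rounded Se and Sp differ somewhat (about 34.5 versus the reported 36.5 for Approach 1, overall). The degrees of freedom are not stated in the text; df = 3 is an assumption adopted here because it reproduces the tabulated p values to within about .001. The function names are hypothetical.

```python
import math


def diagnostic_odds_ratio(se: float, sp: float) -> float:
    """Odds of a positive test in cases divided by odds of a positive test in non-cases."""
    return (se / (1.0 - se)) / ((1.0 - sp) / sp)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form available for df = 3. The choice of df = 3 is an
    assumption; the source does not report the degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


if __name__ == "__main__":
    # Rounded pooled estimates from Table 4, Approach 1, overall.
    print(round(diagnostic_odds_ratio(0.89, 0.81), 1))
    # -2LL differences from Table 4 for Approaches 1 and 5.
    for neg2ll in (7.02, 7.87):
        print(neg2ll, round(chi2_sf_df3(neg2ll), 3))
```

Under the df = 3 assumption, only Approach 5 falls below the .05 threshold, which matches the single asterisk in Table 4.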

Sources of Heterogeneity

One limitation of this meta-analysis is that additional
sources of heterogeneity were not investigated due to the
limited number of eligible articles identified for inclusion.
One consideration is the shifting definition of autism spec-
trum disorder over time. Current diagnoses are based on cri-
teria for Autism Spectrum Disorder outlined in the Diagnos-
tic and Statistical Manual of Mental Disorders, Fifth Edition
(DSM-5, American Psychiatric Association 2013), which
conceptualizes ASD as a single disorder with differing levels
of severity. The DSM-IV defined multiple types of autism
spectrum disorders including Asperger’s Disorder; Autis-
tic Disorder; and Pervasive Developmental Disorder, Not
Otherwise Specified (PDD-NOS). However, these disorders
could not be reliably differentiated, which led to the revi-
sion of the diagnostic criteria in the DSM-5. The ADOS-2
revised algorithms reflect this change in conceptualization of
ASD. However, the ADI-R has not yet been updated, and the
ADI-R manual states the measure only reliably differentiates
between those with Autistic Disorder (DSM-IV) and other
non-spectrum conditions, not those with milder ASD symp-
toms. These factors further complicate the already multifac-
eted ASD diagnostic process. Within the research literature,
ADI-R algorithms have been developed to capture milder
presentations of ASD. However, these types of algorithms
are used predominately for research, have not been published
for clinical use, and are not widely used clinically. Given
that a primary aim of this study was to investigate how

Fig. 6 ADI-R forest plot by setting

Table 6 Sensitivity and specificity of ADI-R overall and by evaluation setting

Setting     n    Sens   Spec   DOR
Overall     13   .75    .82    13.6
Research    9    .73    .85    15.8
Both        2    .82    .76    15.9
Clinical    2    .71    .72    6.2

Fig. 7 ADI-R SROC plot by setting. Note: size of shape indicates
sample size


these measures function in clinical settings, articles utiliz-
ing alternative and less disseminated diagnostic algorithms
were not included in the meta-analysis. Wider clinical use of
these types of algorithms may be beneficial in improving the
diagnostic accuracy of the ADI-R. This distinction further
emphasizes the importance of clinical expertise in accurate
differential diagnosis of ASD from other non-ASD condi-
tions and neurodevelopmental disorders which impact social
communication (Maddox et al. 2017; Reaven et al. 2008).

Risk and Sources of Bias

The QUADAS-2 identified no studies with concerns about the
applicability of the results to practice. This is likely due to
the eligibility criteria of the studies included in the meta-
analysis, which specified the types of measures and evalu-
ations which were considered acceptable based on current
clinical practice. However, nearly half of the articles had
high risk of bias regarding patient selection, most often due
to not enrolling a consecutive or random sample of
participants in the study. Additionally, risk of bias in the
reference standard was rated high or unclear for all studies.
This is inherent in the nature of conducting ASD evaluations,
as the reference standard (i.e., outcome
diagnosis) was almost always interpreted with knowledge of
the index tests (e.g., ADOS-2), as is true in clinical practice.
Clinicians making ASD diagnoses were therefore not blind
to the results of the index tests; in fact, clinicians utilize the
results of the index tests as part of the information used to
make the final clinical diagnosis. Therefore, the reference
standard is inherently influenced by the results of the index
tests and cannot be interpreted separately. Although some
research studies may consider utilizing techniques to miti-
gate these concerns of bias, including having outside video
reviewers or independent re-evaluations, this does not occur
clinically. The accuracy of these measures in clinical prac-
tice is predicated on the clinician administering and scor-
ing the measures accurately, without outside confirmation.
Given that a primary goal of this study was to investigate the
utility of these measures in clinical practice, and the index
test results and reference standard are inextricably linked in
this type of evaluation, this bias is considered inherent in any
comprehensive clinical evaluation for ASD.
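
The dependence between index test and reference standard described above can be made concrete with a toy calculation. The model below is a hypothetical illustration and is not drawn from the included studies: the final diagnosis is assumed to copy the index test result with probability w and otherwise to follow an independent clinical judgment, and the index test and that judgment are assumed to be conditionally independent given true status. All parameter values are invented for the example.

```python
def apparent_sensitivity(prev, se_t, sp_t, se_c, sp_c, w):
    """P(index test positive | reference diagnosis positive) in a toy model.

    prev        : true prevalence (hypothetical)
    se_t, sp_t  : true sensitivity/specificity of the index test
    se_c, sp_c  : sensitivity/specificity of an independent clinical judgment
    w           : probability that the reference diagnosis simply copies the index test
    """
    # Marginal probability that the index test (T) or the judgment (C) is positive.
    p_t = prev * se_t + (1 - prev) * (1 - sp_t)
    p_c = prev * se_c + (1 - prev) * (1 - sp_c)
    # Probability that both are positive, assuming independence given true status.
    p_tc = prev * se_t * se_c + (1 - prev) * (1 - sp_t) * (1 - sp_c)
    # The reference equals T with probability w and C otherwise.
    joint = w * p_t + (1 - w) * p_tc
    ref_pos = w * p_t + (1 - w) * p_c
    return joint / ref_pos


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, round(apparent_sensitivity(0.5, 0.8, 0.8, 0.9, 0.9, w), 2))
```

In this toy setting, the apparent sensitivity of the index test against the final diagnosis rises as w increases, which is one way to see why the two cannot be interpreted separately.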

Additionally, only peer-reviewed, published articles were
considered for inclusion in this meta-analysis since the reli-
ability of the information presented from other types of
sources can be variable and difficult to determine. However,
many other sources of potentially useful information were
excluded. There is a clear publication bias within the inter-
vention literature wherein studies with negative findings are
often not accepted for publication, but this bias is less often
observed in studies focused on diagnostics. There may be a
publication bias regarding the level of training completed
by providers administering the ASD diagnostic measures.
Two levels of training are available for the ADOS-2 and the
ADI-R: clinical training, which is for professionals using the
measure in clinical practice, and research training, which
is designed for those who use the instrument for research.
The clinical training is a prerequisite for the research train-
ing. The majority of professionals utilizing the ADOS-2
and the ADI-R in clinical practice likely have completed the
clinical training but have not attended the research training.
However, peer-reviewed journals may favor publication of
studies utilizing research-reliable clinicians. Therefore, the
identified sensitivity and specificity in clinical settings in
this study may overestimate the accuracy of these meas-
ures when conducted by providers who only have completed
the clinical training, but the effect of excluding non-peer-
reviewed articles is not known.

An additional consideration is the decision to include
only articles conducted in the United States, Canada, and
the United Kingdom. Sociocultural and language factors
are crucial to consider when conducting ASD evaluations.
Although many language translations are available for both
the ADOS-2 and the ADI-R, these measures were initially
designed in English using Western sociocultural norms, and
the vast majority of research and development of these meas-
ures was conducted under similar parameters. Notably, the
language of test administration was reported for only
one of the studies included in this meta-analysis. Therefore,
restricting the inclusion criteria to research conducted in
the United States, Canada, and the United Kingdom was
determined to be the best available proxy for
representing the sample for which these measures were
initially developed. It would be beneficial for future articles to
directly specify the language in which the evaluations were
conducted and the sociocultural background of the partici-
pants and their families.


Conclusion

This systematic review and meta-analysis of the ADOS-2
and the ADI-R determined that the ADOS-2 is more accu-
rate than the ADI-R. The ADOS-2 indicated high levels of
sensitivity and specificity across settings, and it should be
considered for any ASD evaluation. ASD diagnostic meas-
ures may be less accurate in clinical compared to research
settings, but more research utilizing solely clinical popula-
tions is needed.
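
To make the comparison between the two measures concrete, the pooled estimates can be converted into likelihood ratios, which indicate how much a positive or negative result shifts the probability of ASD. The sketch below is illustrative only. It uses the rounded overall values from Table 4 (ADOS-2, Approach 1) and Table 6 (ADI-R); the 30% pre-test probability is a hypothetical figure chosen for the example, and the function names are not from the source.

```python
def likelihood_ratios(se: float, sp: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with sensitivity se and specificity sp."""
    lr_pos = se / (1.0 - sp)
    lr_neg = (1.0 - se) / sp
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    pre_test = 0.30  # hypothetical pre-test probability, not taken from the source
    for name, se, sp in (("ADOS-2", 0.89, 0.81), ("ADI-R", 0.75, 0.82)):
        lr_pos, lr_neg = likelihood_ratios(se, sp)
        print(name,
              round(lr_pos, 2), round(lr_neg, 2),
              round(post_test_probability(pre_test, lr_pos), 2),
              round(post_test_probability(pre_test, lr_neg), 2))
```

With these rounded inputs, the ADOS-2 yields a larger positive and a smaller negative likelihood ratio than the ADI-R, consistent with the conclusion stated above.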

Acknowledgements The authors would like to thank the UAB
Libraries Reference Department for their support in formalizing and
improving the search strategy for this project, Sarah Ryan, Ph.D. and
Cassandra Newsom, Ph.D. for serving as article reviewers, and Dustin
Long, Ph.D. for assistance with biostatistical analyses. This research
was supported in part by the Health Resources and Services Adminis-
tration (HRSA) Maternal and Child Health Bureau (MCH) Leadership
Education in Neurodevelopmental and Related Disabilities (LEND; PI:
Biasini), UAB Civitan International Science Center and Foundation for
Children with Intellectual and Developmental Disabilities McNulty
Scientist Award (O’Kelley), and the UAB Civitan-Sparks Clinics.

Author Contributions JL designed the project and wrote the protocol
supervised by SO. JL and MS conducted the literature searches and
determined article eligibility. JL and CD completed data extraction,
and JL conducted the statistical analysis. JL wrote the first draft of the
manuscript and SO provided substantial edits and guidance. All authors
have approved the final manuscript.


References

Review Manager (RevMan) [Computer program]. Version 5.3. Copenhagen:
The Nordic Cochrane Centre, The Cochrane Collaboration,

American Psychiatric Association. (2000). Diagnostic and statistical
manual of mental disorders (4th ed., Text Revision). Washington,
DC: Author.

American Psychiatric Association. (2013). Diagnostic and statistical
manual of mental disorders (5th ed.). Arlington, VA: Author.

Baird, G., Simonoff, E., Pickles, A., Chandler, S., Loucas, T., Meldrum,
D., & Charman, T. (2006). Prevalence of disorders of the autism
spectrum in a population cohort of children in South Thames: the
Special Needs and Autism Project (SNAP). Lancet, 368, 210–15.

Bishop, S. L., Huerta, M., Gotham, K., Havdahl, K. A., Pickles, A.,
Duncan, A., et al. (2017). The Autism Symptom Interview,
School-Age: A brief telephone interview to identify autism
spectrum disorders in 5-to-12-year-old children. Autism Research,
10(1), 78–88. https ://

Camodeca, A. (2018). Utility of three N-Item scales of the child
behavior checklist 6–18 in autism diagnosis. Research in
Autism Spectrum Disorders, 51, 75–85. https ://

Cox, A., Klein, K., Charman, T., Baird, G., Baron-Cohen, S., Swetten-
ham, J., et al. (1999). Autism spectrum disorders at 20 and 42
months of age: stability of clinical and ADI-R diagnosis. Journal
of Child Psychology and Psychiatry, and Allied Disciplines, 40(5),

De Bildt, A., Sytema, S., Ketelaars, C., Kraijer, D., Mulder, E., Volk-
mar, F., & Minderaa, R. (2004). Interrelationship between autism
diagnostic observation schedule-generic (ADOS-G), autism diag-
nostic interview-revised (ADI-R), and the diagnostic and statis-
tical manual of mental disorders (DSM-IV-TR) classification
in children and adolescents with mental retardation. Journal of
Autism and Developmental Disorders, 34(2), 129–137.

De Bildt, A., Sytema, S., van Lang, N. D. J., Minderaa, R. B., van
Engeland, H., & de Jonge, M. V. (2009). Evaluation of the ADOS
revised algorithm: the applicability in 558 Dutch children and
adolescents. Journal of Autism and Developmental Disorders,
39(9), 1350–8. https :// 3-009-0749-9.

Deeks, J. J. (2001). Systematic reviews of evaluations of diagnostic and
screening tests. British Medical Journal, 323, 157–162.

Deeks, J. J., Wisniewski, S., & Davenport, C. (2013). Cochrane Handbook
for Systematic Reviews of Diagnostic Test Accuracy Version
1.0.0. The Cochrane Collaboration, 2013. http://srdta .cochr ane.

Dereu, M., Roeyers, H., Raymaekers, R., Meirsschaut, M., & Warreyn,
P. (2012). How useful are screening instruments for toddlers to
predict outcome at age 4? General development, language skills,
and symptom severity in children with a false positive screen for
autism spectrum disorder. European Child & Adolescent Psychiatry,
21(10), 541–551.

DiLavore, P. C., Lord, C., & Rutter, M. (1995). The pre-linguistic
autism diagnostic observation schedule. Journal of Autism and
Developmental Disorders, 25(4), 355–379.

Dorlack, T. P., Myers, O. B., & Kodituwakku, P. W. (2018). A com-
parative analysis of the ADOS-G and ADOS-2 algorithms:
preliminary findings. Journal of Autism and Developmental
Disorders, 1–12.

Dykens, E. M., Roof, E., Hunt-Hawkins, J., Dankner, N., Lee, E. B.,
Shivers, C. M., et al. (2017). Diagnoses and characteristics of
autism spectrum disorders in children with Prader-Willi syn-
drome. Journal of Neurodevelopmental Disorders, 9(18), 1–12.
https :// 9-017-9200-2.

Falkmer, T., Anderson, K., Falkmer, M., & Horlin, C. (2013). Diag-
nostic procedures in autism spectrum disorders: A systematic
literature review. European Child & Adolescent Psychiatry,
22(6), 329–40. https :// 7-013-0375-0.

Gilchrist, A., Green, J., Cox, A., Burton, D., Rutter, M., & Le Couteur,
A. (2001). Development and current functioning in adolescents
with Asperger syndrome: a comparative study. Journal of Child
Psychology and Psychiatry, and Allied Disciplines, 42(2), 227–40.

Gillentine, M. A., Berry, L. N., Goin-Kochel, R. P., Ali, M. A., Ge,
J., Guffey, D., et al. (2017). The cognitive and behavioral phe-
notypes of individuals with CHRNA7 duplications. Journal of
Autism and Developmental Disorders, 47(3), 549–562. https :// 3-016-2961-8.

Gotham, K., Risi, S., Dawson, G., Tager-Flusberg, H., Joseph, R.,
Carter, A., et al. (2008). A Replication of the autism diagnos-
tic observation schedule (ADOS) revised algorithms. Journal of
the American Academy of Child & Adolescent Psychiatry, 47(6),
642–651. https :// e3181 6bffb 7.

Gotham, K., Risi, S., Pickles, A., & Lord, C. (2007). The Autism Diag-
nostic Observation Schedule: Revised algorithms for improved
diagnostic validity. Journal of Autism and Developmental Dis-
orders, 37(4), 613.

Gray, K. M., Tonge, B. J., & Sweeney, D. J. (2008). Using the Autism
Diagnostic Interview-Revised and the Autism Diagnostic Obser-
vation Schedule with young children with developmental delay:
evaluating diagnostic validity. Journal of Autism and Develop-
mental Disorders, 38(4), 657–667.

Grzadzinski, R., Dick, C., Lord, C., & Bishop, S. (2016). Parent-
reported and clinician-observed autism spectrum disorder (ASD)
symptoms in children with attention deficit/hyperactivity disor-
der (ADHD): implications for practice under DSM-5. Molecular
Autism, 7(7), 1–12. https :// 9-016-0072-1.

Guthrie, W., Swineford, L. B., Nottke, C., & Wetherby, A. M. (2013).
Early diagnosis of autism spectrum disorder: stability and change
in clinical diagnosis and symptom presentation. Journal of Child
Psychology and Psychiatry, 54(5), 582–590. https :// .

Harris, S. W., Hess, D., Goodlin-Jones, B., Ferranti, J., Bacalman,
S., Barbato, I., et al. (2008). Autism profiles of males
with fragile X syndrome. American Journal on Intellectual
and Developmental Disabilities, 113(6), 427–438. https ://doi.

Havdahl, K. A., von Tetzchner, S., Huerta, M., Lord, C., & Bishop, S.
L. (2016). Utility of the child behavior checklist as a screener for
autism spectrum disorder. Autism Research, 9(1), 33–42. https ://


Howes, O. D., Rogdaki, M., Findon, J. L., Wichers, R. H., Charman, T.,
King, B. H., et al. (2017). Autism spectrum disorder: Consensus
guidelines on assessment, treatment and research from the British
Association for Psychopharmacology. Journal of Psychopharma-
cology, 32(1), 3–29. https :// 81117 74176 6.

Kamp-Becker, I., Albertowski, K., Becker, J., Ghahreman, M., Lang-
mann, A., Mingebach, T., Poustka, L., Weber, L., Schmidt, H.,
Smidt, J., Stehr, T., Roessner, V., Kucharczyk, K., Wolff, N.,
& Stroth, S. (2018). Diagnostic accuracy of the ADOS and
ADOS-2 in clinical practice. European Child & Adolescent
Psychiatry, 1–15.

Kim, S. H., & Lord, C. (2012). Combining information from multi-
ple sources for the diagnosis of autism spectrum disorders for
toddlers and young preschoolers from 12 to 47 months of age.
Journal of Child Psychology and Psychiatry, 53(2), 143–151.

Langmann, A., Becker, J., Poustka, L., Becker, K., & Kamp-Becker,
I. (2017). Diagnostic utility of the autism diagnostic observa-
tion schedule in a clinical sample of adolescents and adults.
Research in Autism Spectrum Disorders, 34, 34–43.

Le Couteur, A., Haden, G., Hammal, D., & McConachie, H. (2008).
Diagnosing autism spectrum disorders in pre-school children
using two standardised assessment instruments: The ADI-R and
the ADOS. Journal of Autism and Developmental Disorders, 38,
362–372. https :// 3-007-0403-3.

Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L.,
DiLavore, P. C., et al. (2000). The autism diagnostic observation
schedule – generic: A standard measure of social and communi-
cation deficits associated with the spectrum of autism. Journal
of Autism and Developmental Disorders, 30(3), 205–223.

Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop,
S. (2012a). Autism diagnostic observation schedule: ADOS-2.
Los Angeles, CA: Western Psychological Services.

Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop,
S. (2012b). ADOS-2. Autism Diagnostic Observation Schedule.
Manual (Part I): Modules 1-4. Los Angeles, CA: Western
Psychological Services.

Luyster, R., Gotham, K., Guthrie, W., Coffing, M., Petrak, R., Pierce,
K., et al. (2009). The Autism diagnostic observation schedule
– toddler module: A new module of a standardized diagnostic
measure for autism spectrum disorders. Journal of Autism and
Developmental Disorders, 39(9), 1305–1320.

Maddox, B. B., Brodkin, E. S., Calkins, M. E., Shea, K., Mullan,
K., Hostager, J., et al. (2017). The accuracy of the ADOS-2
in identifying autism among adults with complex psychiatric
conditions. Journal of Autism and Developmental Disorders,
47(9), 2703–2709. https :// 3-017-3188-z.

Mazefsky, C., & Oswald, D. P. (2006). The discriminative ability
and diagnostic utility of the ADOS–G, ADI–R, and GARS for
children in a clinical setting. Autism, 10(6), 533–549. https :// 61306 06850 5.

McInnes, M. D., Moher, D., Thombs, B. D., McGrath, T. A.,
Bossuyt, P. M., Clifford, T., et al. (2018). Preferred reporting
items for a systematic review and meta-analysis of diagnos-
tic test accuracy studies: the PRISMA-DTA statement. JAMA,
319(4), 388–396.

Molloy, C., Murray, D. S., Akers, R., Mitchell, T., & Manning-
Courtney, P. (2011). Use of the autism diagnostic observation
schedule (ADOS) in a clinical setting. Autism, 15(2), 143–162.
https :// 61310 37924 1.

Neuhaus, E., Beauchaine, T. P., Bernier, R. A., & Webb, S. J. (2017).
Child and family characteristics moderate agreement between
caregiver and clinician report of autism symptoms. Autism
Research, 11(3), 476–487.

Oosterling, I. J., Roos, S., de Bildt, A., Rommelse, N., de Jonge,
M., Visser, J., et al. (2010). Improved diagnostic validity of the
ADOS revised algorithms: a replication study in an independent
sample. Journal of Autism and Developmental Disorders, 40(6),
689–703. https :// 3-009-0915-0.

Papanikolaou, K., Paliokosta, E., Houliaras, G., Vgenopoulou, S.,
Giouroukou, E., Pehlivanidis, A., et al. (2009). Using the autism
diagnostic interview-revised and the autism diagnostic obser-
vation schedule-generic for the diagnosis of autism spectrum
disorders in a Greek sample with a wide range of intellectual
abilities. Journal of Autism and Developmental Disorders,
39(3), 414–420.

Penner, M., Anagnostou, E., Andoni, L. Y., & Ungar, W. J. (2017).
Systematic review of clinical guidance documents for autism
spectrum disorder diagnostic assessment in select regions.
Autism, 22(5), 517–527.

Reaven, J. A., Hepburn, S. L., & Ross, R. G. (2008). Use of the
ADOS and ADI-R in children with psychosis: Importance of
clinical judgment. Clinical Child Psychology and Psychiatry,
13(1), 81–94. https :// 04507 08634 3.

Risi, S., Lord, C., Gotham, K., Corsello, C., Chrysler, C., Szatmari,
P., et al. (2006). Combining information from multiple sources
in the diagnosis of autism spectrum disorders. Journal of the
American Academy of Child and Adolescent Psychiatry, 45(9),

Rutter, C. M. (1995). Regression methods for meta-analysis of diag-
nostic test data. Academic Radiology, 2, S48–S56.

Rutter, C. M., & Gatsonis, C. A. (2001). A hierarchical regression
approach to meta-analysis of diagnostic test accuracy evalua-
tions. Statistics in Medicine, 20(19), 2865–2884.

Rutter, M., Le Couteur, A., & Lord, C. (2003). Autism diagnostic
interview-revised. Los Angeles, CA: Western Psychological
Services.

Stewart, J. R., Vigil, D. C., Ryst, E., & Yang, W. (2014). Refin-
ing best practices for the diagnosis of autism: A comparison
between individual healthcare practitioner diagnosis and trans-
disciplinary assessment. Nevada Journal of Public Health,
11(1), 1.

Takwoingi, Y. & Deeks, J. (2010). MetaDAS: A SAS macro for meta-
analysis of diagnostic accuracy studies. User Guide Version 1.3.
2010 July. http://srdta .cochr

Tomanik, S. S., Pearson, D. A., Loveland, K. A., Lane, D. M., &
Shaw, J. B. (2007). Improving the reliability of autism diag-
noses: Examining the utility of adaptive behavior. Journal of
Autism and Developmental Disorders, 37(5), 921–928.

Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green,
J., et al. (2006). Agreement among four diagnostic instruments
for autism spectrum disorders in toddlers. Journal of Autism and
Developmental Disorders, 36(7), 839–47.

Vllasaliu, L., Jensen, K., Hoss, S., Landenberger, M., Menze, M.,
Schütz, M., et al. (2016). Diagnostic instruments for autism
spectrum disorder (ASD). The Cochrane Library.

Whiting, P. F., Rutjes, A. W. S., Westwood, M. E., Mallett, S.,
Deeks, J. J., Reitsma, J. B., et al. (2011). QUADAS-2: A revised
tool for the quality assessment of diagnostic accuracy stud-
ies. Annals of Internal Medicine, 155(8), 529–536. https ://doi.
org/10.7326/0003-4819-155-8-20111 0180-00009 .

Wiggins, L. D., Reynolds, A., Rice, C. E., Moody, E. J., Bernal,
P., Blaskey, L., et al. (2015). Using standardized diagnostic
instruments to classify children with autism in the Study to
Explore Early Development. Journal of Autism and Develop-
mental Disorders, 45, 1271–1280. https ://

Wiggins, L. D., & Robins, D. L. (2008). Brief Report: excluding the
ADI-R behavioral domain improves diagnostic agreement in
toddlers. Journal of Autism and Developmental Disorders, 38,
972–976. https :// 3-007-0456-3.

Zander, E., Sturm, H., & Bölte, S. (2015). The added value of the
combined use of the autism diagnostic interview-revised and the
autism diagnostic observation schedule: Diagnostic validity in
a clinical Swedish sample of toddlers and young preschoolers.
Autism, 19(2), 187–199.

Zander, E., Willfors, C., Berggren, S., Choque-Olsson, N., Coco, C.,
Elmund, A., et al. (2016). The objectivity of the Autism Diagnostic
Observation Schedule (ADOS) in naturalistic clinical settings.
European Child & Adolescent Psychiatry, 25(7), 769–780.

Zander, E., Willfors, C., Berggren, S., Coco, C., Holm, A., Jifält, I.,
et al. (2017). The interrater reliability of the autism diagnostic
interview-revised (ADI-R) in clinical settings. Psychopathol-
ogy, 50(3), 219–227.

Ziats, M. N., Goin-Kochel, R. P., Berry, L. N., Ali, M., Ge, J.,
Guffey, D., et al. (2016). Genetics in Medicine, 18(11), 1111–
1118. https ://

Zwaigenbaum, L., Bryson, S. E., Brian, J., Smith, I. M., Roberts, W.,
Szatmari, P., et al. (2016). Stability of diagnostic assessment for
autism spectrum disorder between 18 and 36 months in a high-
risk cohort. Autism Research, 9, 790–800. https ://

Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

  • Systematic Review and Meta-Analysis of the Clinical Utility of the ADOS-2 and the ADI-R in Diagnosing Autism Spectrum Disorders in Children
    • Abstract
    • Introduction
    • Methods
      • Measures for Index Tests
        • Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al. 2012a)
        • Autism Diagnostic Interview, Revised (ADI-R; Rutter et al. 2003).
      • Eligibility Criteria
        • Reference Standard for Diagnosis
        • Study Design
      • Search Strategy
      • Assessment of Methodological Quality
      • Study Selection
      • Data Extraction
      • Data Analysis
        • Outliers and Sensitivity Analysis
    • Results
      • Quality of the Included Studies
      • Diagnostic Accuracy of Measures
        • ADOS-2
        • ADI-R
    • Discussion
      • Sources of Heterogeneity
      • Risk and Sources of Bias
    • Conclusion
    • Acknowledgements
    • References
