Chapter 5. Assessment Tools
At this point in the process, your department has a program mission statement
(Step 1) and a set of learning outcomes (Step 2). Step 3 is to review, evaluate,
and select a tool or tools to assess student achievement in each of the learning
outcomes. Selection of a tool involves a tradeoff between the ability to obtain
detailed information and the need to keep the process feasible and manageable.
For this reason, the chapter gives advantages and disadvantages for each of
the various assessment tools. Most assessment experts believe it is important
to use multiple assessment tools to overcome the disadvantages of a single tool,
albeit with added work and expense.
A. Issues to Consider in Choosing an Assessment Tool
Assessment tools can generally be placed in two categories, direct and indirect.
Sometimes a tool from each of these categories is used to get a more holistic
view of student learning.
Direct measures of assessment are those in which the products
of student work are evaluated in light of the learning outcomes for the program.
Evidence from coursework such as projects or specialized tests of knowledge
or skill are examples of direct measures. In all cases, direct measures involve
the evaluation of demonstrations of student learning.
Indirect measures of assessment are those in which students
judge their own ability to achieve the learning outcomes. Indirect measures
are not based directly on student academic work but rather on what students
perceive about their own learning. Alumni may also be asked the extent to which
the program prepared them to achieve learning outcomes. In another example,
people in contact with the students, such as employers, may be asked to judge
the effectiveness of program graduates. In all cases, the assessment is based
on perception rather than direct demonstration.
Direct measures tend to be more time- and labor-intensive than indirect measures,
which can often be handled through surveys. Because indirect measures do not
require the direct evaluation of student work, larger sample sizes may be
possible, which adds to the value of the results.
Validity and reliability are two factors
to consider in choosing an assessment tool. Validity is “the degree to
which an assessment measures (a) what is intended, as opposed to (b) what is
not intended, or (c) what is unsystematic or unstable” (Source: University
of Maryland Center for the Study of Assessment Validity and Evaluation).
Face validity refers to validity taken at face value; does an assessment tool
give an immediate impression that it measures the intended outcome? Content
validity refers to how representative the tool’s content is of the outcome being
measured; do its items or tasks sample the full range of knowledge and skills
that the outcome covers? Reliability refers to the tool’s
consistency in producing results over time and with different samples of students.
B. Direct Methods
1. Capstone course
Capstone courses draw upon and integrate knowledge, concepts, and skills
associated with the entire curriculum of a program. Taken normally in the
senior year, capstone courses ask students to demonstrate facility in the
program’s learning outcomes, in addition to other outcomes associated
with the particular course.
Within a capstone course, evidence of student learning may include comprehensive
papers, portfolios, group projects, demonstrations, journals, or examinations.
But how does one use this evidence to assess the overall program? The final
grade for the course, being a single measure, does not dissociate into an
assessment of student achievement in the various learning outcomes for the
program (although achievement in each of the learning outcomes may combine
into the final grade). One method of assessment in capstone courses is to
evaluate student work with an eye toward the multiple dimensions of the program’s
outcomes. More than one faculty member can be invited to assist in the assessment
of student work, e.g., in a project presentation. The assessment of a major
paper or project, or set of papers or projects, may be broken down into sub-assessments
of each learning outcome.
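The breakdown of a capstone project into sub-assessments can be illustrated with a short sketch. It is a minimal example under stated assumptions, not a prescribed procedure: the outcome names, the 1-4 rubric scale, the two raters, and the target score of 3 are all hypothetical. It averages the raters' scores for each student and then summarizes each learning outcome across students.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rubric scores (1-4 scale) for one capstone project per student.
# Each record is (student, rater, outcome, score); all names are illustrative.
scores = [
    ("s1", "rater_a", "written_communication", 3),
    ("s1", "rater_b", "written_communication", 4),
    ("s1", "rater_a", "critical_analysis", 2),
    ("s1", "rater_b", "critical_analysis", 3),
    ("s2", "rater_a", "written_communication", 4),
    ("s2", "rater_b", "written_communication", 4),
    ("s2", "rater_a", "critical_analysis", 3),
    ("s2", "rater_b", "critical_analysis", 3),
]


def outcome_summary(records, target=3):
    """Average raters per student, then summarize each outcome across students."""
    per_student = defaultdict(list)
    for student, _rater, outcome, score in records:
        per_student[(outcome, student)].append(score)

    by_outcome = defaultdict(list)
    for (outcome, _student), values in per_student.items():
        by_outcome[outcome].append(mean(values))

    return {
        outcome: {
            "mean": mean(values),
            # Share of students whose averaged score meets the assumed target.
            "share_at_target": sum(v >= target for v in values) / len(values),
        }
        for outcome, values in by_outcome.items()
    }


summary = outcome_summary(scores)
for outcome, stats in summary.items():
    print(outcome, stats)
```

Reporting one line per outcome, rather than one course grade, is what lets the results speak to the program's learning outcomes individually.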
The benefits of a capstone course include the increased
ability of students to integrate their learning; close association between
the assessment tool and the program’s particular learning outcomes;
rapid feedback on the program; and association of a faculty member who contributes
to program assessment through the professor’s normal teaching load.
Disadvantages include disciplinary differences in the appropriateness
of a capstone course in the curriculum; the time required to develop and gain
approval for the course; staffing requirements; slotting the course into the
curricular sequence; and normal variations in course content as the course
changes hands from instructor to instructor, which reduce reliability. Also, this
type of assessment does not lend itself easily to comparisons of student work
early and late in their academic career (“pre- and post-test”
assessment; see item 6 below).
Examples: BSC Philosophy Department, capstone course syllabus
2. Course-embedded assessment
In course-embedded assessment, student work in designated courses is collected
and assessed in relation to the program learning outcomes, not just for the
course grade. As in the capstone course, the products of student work need
to be considered in light of the multiple dimensions of the learning outcomes.
Products may include final exams, research reports, projects, papers, and
so on. The assessment may be conducted at specific points (e.g., introductory
course and upper-level course) in a program.
Benefits include the fact that assessment is conducted as
part of the normal workload of students and faculty, although additional work
may be needed to incorporate program assessment into the course.
Disadvantages include the potential for a faculty member
to feel that her or his work in a particular course is being overseen, even
if it is not. Also, rubrics may need to be chosen or developed that are associated
with the particular learning outcomes, increasing the preparation time (see
Chapter 6, Section B, Rubrics).
Example: Physics at BSC
3. Standardized tests
The Educational Testing Service and other companies offer standardized tests
for various types of learning outcomes, such as critical thinking or mathematical
problem solving. Scores on tests such as the GRE or the Massachusetts Tests
for Educator Licensure (MTEL) may be used as evidence of student learning.
Benefits include the reliability and validity of an assessment
instrument that is commercially developed, eliminating the arduous process
of developing an instrument in-house; simplicity in administration and evaluation
of test results; and the potential for cross-institutional comparisons of
results.
Disadvantages include the generic nature of standardized
tests and their potential lack of fit with a particular program; a possible
lack of motivation by students to take the test or do well on it; and the
debatable question of whether a standardized test gives a true measure of
student learning. Also, ETS and other services charge substantial fees for
these tests, which is an added administrative cost or possibly a cost to the
students.
Example: California Critical Thinking Disposition Inventory
4. Locally developed tests
Faculty in a program may decide to develop a test that is reflective of the
program’s mission and learning outcomes. The test is usually graded
by multiple evaluators. Locally developed tests are less costly than standardized
tests, but require work by the program’s faculty in development and scoring.
Benefits include the ability to tailor a test to a specific
program.
Disadvantages include the challenge of developing a test
with proven reliability and validity, the potential need to develop rubrics
and train multiple test evaluators in the use of these rubrics, and the need
to develop a new test periodically.
Examples: English Placement Test at California State University, English
Placement Test at the University of Wisconsin-Madison
5. Portfolio evaluation
A portfolio is a compilation of student work that, in total, demonstrates
a student’s achievement of various learning outcomes. Portfolios can
be created for a variety of purposes aside from program assessment, such as
fostering reflection by students on their education, providing documentation
for a student’s job search, or certifying a student’s competency.
Portfolios created over the span of a student’s academic career, compared
to those consisting of a student’s work only at the end, provide the
basis for a developmental assessment.
Portfolios may combine multiple types of evidence and are not necessarily
limited to classroom work. For example, portfolios may contain research papers,
presentations, videos, audio recordings, work done through employment, or
journal entries discussing co-curricular activities or programs. Once the
material is collected, it falls upon an individual or group to establish a
system by which to evaluate the contents of the portfolio in terms of a program’s
learning outcomes (see Chapter 6, Section B,
Rubrics).
In the School of Education and Allied Studies at BSC, portfolios are used
to document each student’s competence in teacher preparation. This is
a different purpose from that for program assessment. In program assessment,
a cross section of students may be sampled to evaluate student learning outcomes,
but in teacher preparation, the intent is to validate every student’s
competence.
A key question in portfolios arises in the collection of evidence. In teacher
preparation, students themselves collect and save the material, and online
systems are now available to assist in that process. But for program assessment,
the department itself may have to assemble the student portfolios; in this
case, issues must be considered about how the students are to be informed
of the fact that their work is being assessed for programmatic reasons. Some
faculty ask students to sign consent forms to copy work products and to use
student work products in accreditation reports. More information on this can
be obtained from the BSC
Institutional Review Board.
Benefits of portfolios include the ability to document student
development over time, and the potential benefit to students of seeing
their own development and collecting material that may support their career
goals. Thus, program assessment becomes an integral part of the learning process.
Disadvantages include the labor-intensive process of evaluating
the evidence in student portfolios. Also, there is an expense in storing and
organizing the evidence.
Example: Alverno College Diagnostic Digital Portfolio
6. Pre-test/Post-test evaluation
A common question in assessment is not only whether students
can demonstrate the learning outcomes when they graduate, but also how much of
what they can demonstrate was actually gained during their time in the program.
This suggests the need to assess the students' knowledge and skills at the
point of entry into the program and, later, at the point of exiting the program.
In pre-test/post-test assessment, student work is assessed both early and
late in their academic career, from which the growth and development of the
students can be deduced.
Several of the previously described tools lend themselves to pre-test/post-test
evaluation. Portfolios that collect evidence throughout a student’s
academic career can intrinsically be a type of pre- and post-test evaluation.
Course-embedded assessment in which student work is collected from introductory
and upper-level courses also provides a type of pre- and post-test evaluation,
although the level of difficulty in the two courses can be expected to differ
considerably. Standardized or locally developed tests can be administered
at two times in a student’s career to assess learning. However, if the
test is exactly duplicated at the two times, then students may improve simply
by having seen it twice. On the other hand, if different tests are administered
at the two times, it can be difficult to ensure that both tests are of the
same nature and difficulty, so the reliability of this method becomes a question.
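The comparison of early and late results can be sketched as follows. This is a minimal illustration, assuming the same test is scored as percent correct at both times and that students can be matched by an identifier; the student IDs and scores are hypothetical. Alongside the raw gain it reports a normalized gain (the gain as a fraction of the improvement that was possible), which is a choice made for this example rather than a method named in this handbook.

```python
from statistics import mean

# Hypothetical percent-correct scores for the same students, matched by ID.
pre = {"s1": 40.0, "s2": 55.0, "s3": 70.0}
post = {"s1": 70.0, "s2": 55.0, "s3": 85.0}


def gains(pre_scores, post_scores, max_score=100.0):
    """Return {student: (raw_gain, normalized_gain)} for matched students only."""
    result = {}
    for sid in pre_scores.keys() & post_scores.keys():
        before, after = pre_scores[sid], post_scores[sid]
        raw = after - before
        room = max_score - before  # how much improvement was possible
        normalized = raw / room if room > 0 else None
        result[sid] = (raw, normalized)
    return result


g = gains(pre, post)
mean_raw_gain = mean(raw for raw, _ in g.values())
print(g, mean_raw_gain)
```

Only students with both scores are included, since an unmatched pre-test or post-test score says nothing about an individual's growth.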
Benefits include the ability to gain insight into students’
academic development.
Disadvantages include the increased amount of work involved
in assessing student work more than once, and the difficulty of designing
tests or assessment tools that are truly comparable at different times.
Example: Astronomy Diagnostic Test (e-mail
Professor Martina Arndt for a copy)
This is a standardized, multiple-choice test to assess content knowledge
in astronomy. It is administered by the BSC Physics Department before and
after students take the Astronomy course. The test does not count toward student
grades. Performance is compared to that of students at other institutions.
This diagnostic test was developed for undergraduate, non-science majors taking
their first astronomy course by the multi-institutional Collaboration for
Astronomy Education Research (CAER).
The test is free of charge, subject to several conditions: it is not to be
altered in any way; questions are not to be left out, or removed and placed on
course quizzes or exams; and the test items are not to be distributed to students,
in case the items find their way into "student files." In exchange
for using the ADT, the authors respectfully request that faculty submit scores
to the national database.
C. Indirect Methods
1. Student self-efficacy
Students have a sense of their own competence. Student self-efficacy involves
the rating by students of their perception of their own achievement in particular
learning outcomes. Research shows a significant, although imperfect, correlation
between actual and perceived competence. Gender and demographic differences
in the accuracy of self-efficacy ratings can be problematic. For example,
certain groups of students may rate their quantitative skills at a level below
that indicated by standardized tests. Also, unless the answers are anonymous,
students are likely to overrate their abilities. The same is true if students
perceive they can be penalized for their answers.
Self-efficacy as an assessment tool is relatively simple. For example, a
researcher/assessment expert at Clemson University has designed a test that
asks students to rate the perceived importance and self-efficacy of leadership
skills, communication skills, interpersonal skills, analytical skills, decision-making
skills, technological skills, the global economy, ethics, and business practices
(see Example).
Benefits include the inexpensive nature of the tool. A relatively
simple survey can be constructed which simply asks students to rate their
competence in different areas. Also, pre- and post-test assessment can be
conducted to examine changes both in self-efficacy and perceived importance
of a topical area. Another benefit is that all learning outcomes can be assessed
simultaneously, in one test.
Disadvantages include an imperfect relationship between self-efficacy
and actual competence; student self-reporting may not always be congruent
with their actual level of achievement.
Example: Charles Duke research article. See abstract below.
Learning Outcomes: Comparing Student Perceptions of Skill Level and Importance
Charles R. Duke, Clemson University
Journal of Marketing Education, Dec. 2002, 24(3): 203-218
"Presents a study that discusses the process used for developing learning
outcomes and the illustrative analysis of student perceptions as one step
in the process of implementing a learning outcomes approach to curricular
design. Illustration of a potential method of applying the analysis of student
perception within the evaluation process; Methodology that can improve the
faculty's understanding of the students' feeling about the program; Performance
evaluation approaches."
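A survey that pairs perceived importance with self-efficacy, as described in this section, can be summarized skill by skill. The sketch below is illustrative only: the skill names, the 1-5 scale, and the responses are hypothetical, and it does not reproduce the instrument or the analysis used in the Duke study. It reports the mean importance, the mean self-rating, and the gap between them for each skill.

```python
from statistics import mean

# Hypothetical anonymous responses on a 1-5 scale.
# Each response maps skill -> (perceived importance, self-rated ability).
responses = [
    {"leadership": (5, 3), "ethics": (4, 4)},
    {"leadership": (4, 2), "ethics": (5, 4)},
    {"leadership": (5, 4), "ethics": (3, 4)},
]


def gap_by_skill(rows):
    """Mean importance, mean self-efficacy, and their difference for each skill."""
    skills = sorted({skill for row in rows for skill in row})
    summary = {}
    for skill in skills:
        importance = mean(row[skill][0] for row in rows if skill in row)
        efficacy = mean(row[skill][1] for row in rows if skill in row)
        summary[skill] = {
            "importance": importance,
            "self_efficacy": efficacy,
            "gap": importance - efficacy,
        }
    return summary


gaps = gap_by_skill(responses)
for skill, stats in gaps.items():
    print(skill, stats)
```

A large positive gap marks a skill that students consider important but in which they feel less capable; as noted above, such self-ratings remain perceptions rather than demonstrations of competence.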
2. Student satisfaction surveys
Given that student satisfaction with a program or course is not a learning
outcome, satisfaction may or may not relate to outcomes assessment. But satisfaction
may correlate with other variables. For this reason, a common component of
assessment systems is the student satisfaction survey. Such surveys may consider
the extent to which students are satisfied with their interactions with faculty,
with their introductory or advanced courses, or with their preparedness coming
out of the program (see Student Self-Efficacy). Use of
individual course evaluations for program assessment is problematic because
the evaluations reflect on individual instructors, a serious pitfall
to be avoided in program assessment.
Benefits include the relative simplicity of administering
this type of survey. Standardized, commercial surveys are available that provide
comparison data from other institutions.
Disadvantages include the difficulty of designing questions
appropriately and, again, the hazard of assuming a link between student
satisfaction and achievement of learning outcomes.
Example: Noel-Levitz Student Satisfaction Inventory, sample survey
3. Student attitudinal surveys
If learning outcomes include elements of appreciation or understanding of
particular issues of concern, student attitudes can be measured as part of
the assessment program. For example, informed appreciation for the arts may
be assessed using an attitudinal survey. Another example may be students’
empathy toward disadvantaged groups, which can be measured in an attitudinal
survey. A further example would be attitudes toward learning or toward the
profession. Both standardized tests and locally designed surveys can be used
for this purpose, although the responses are very sensitive to the wording
of the questions.
Benefits include the simplicity of administering the survey.
Disadvantages include the challenge of determining student
attitudes in a reliable manner.
Examples: Cooperative Institutional Research Program (CIRP) “Freshman” Survey,
BSC Psychology Department
4. Exit interviews
Rather than assess students’ attitudes, self-efficacy, or satisfaction
through the use of surveys, students may be interviewed directly in individual
or focus-group settings. Such interviews allow a more thorough, free-form
exploration of the issues through the use of follow-up questions that depend
on students’ responses.
Benefits include the depth and richness of information that
can be obtained through interviews.
Disadvantages include the time- and labor-intensive nature
of conducting such interviews and of analyzing the information obtained so that
it can be compared across multiple interviews. Also, student anonymity
needs to be protected in this tool, and stray comments about individual faculty
must not become part of the assessment data.
5. Alumni surveys
The perspective that students have on their education may change significantly
after time away from school. Some learning outcomes lend themselves more naturally
to questions posed some time after graduation. For example, an outcome involving
preparation for professional practice can best be assessed after the student
has graduated and been employed in the job market.
Benefits include the real-world perspective that can be
obtained from alumni.
Disadvantages include the difficulty of finding and reaching
alumni, the possibly self-selective nature of those who choose to respond,
and the relatively narrow scope of learning outcomes that can be assessed
in this manner.
Examples: BSC Program Review alumni survey, Career Services survey
6. Employer surveys
It is possible that some of the students' knowledge and skills are evident
to the employers who rely on these characteristics. Thus, some accrediting
bodies either require or encourage programs to perform an assessment through
the major employers of their students. Such assessments may range from information as
basic as hiring data, to site supervisor evaluations, to detailed surveys
of the characteristics that the employers perceive in program graduates. Advisory
boards, anecdotal information, and placement data may be used in place of
formal surveys.
Benefits of this tool include the real-world perspective
that employers might be able to provide.
Disadvantages include the potentially limited ability of
employers to assess their employees’ characteristics in terms of specific
learning outcomes, or to separate graduates of a particular school from their
other employees. Also, this tool depends on surveying employers with
sufficient numbers of graduates. In large corporations, it may even be difficult
to find the right person to contact for this information. In addition, former
students may object to having their employers surveyed in this way.
Example: Tennessee Tech University engineering employer survey
7. Curriculum analysis
Historically, accrediting bodies have required institutions or programs to
document the information that students are receiving and the content that
the program delivers in its courses (see Course
Mapping in Chapter 4, Learning Outcomes). Documentation can be obtained
from the curriculum and syllabi of individual courses.
With the move toward learning-outcomes assessment, programs are required
to show that students actually exhibit the skills and qualities that the program
wishes to develop. However, a curriculum analysis may still be relevant and
is often included in accreditation documents. For example, some accrediting
bodies may require the documentation of the number of hours devoted to a particular
subject in the curriculum.
Benefits include the relatively straightforward task of
analyzing the content of the curriculum, for which only course syllabi may
be needed.
Disadvantages include the potential inequality between delivery
of material and documentation of learning for specific outcomes.
Example: BSC Anthropology program curriculum analysis
Last Modified: October 21, 2004