At this step, faculty members use one or more methods to gather information about whether students are indeed achieving the intended learning outcomes. This step includes selecting appropriate assessment tools, setting performance targets or benchmarks, administering the assessments, and obtaining the results. It can be a resource-intensive step, which may require some or all of the following, depending on the assessment methods selected: expertise for developing and scoring instruments, faculty and/or staff time for administering and scoring assessments, money for purchasing instruments, and the use of some class time for administering assessments.
Selecting an assessment tool involves a tradeoff between the ability to obtain detailed information and the need to keep the process feasible and manageable. For this reason, this section lists advantages and disadvantages for each of the various assessment tools. Most assessment experts believe it is important to use multiple assessment tools to overcome the disadvantages of any single tool, albeit at the price of added work and expense. Assessment tools generally fall into two categories: direct and indirect. Sometimes a tool from each category is used to obtain a more holistic view of student learning.
Direct measures of assessment are those in which the products of student work are evaluated in light of the learning outcomes for the program. Examples include evidence from coursework, such as projects, and specialized tests of knowledge or skills. In all cases, direct measures involve the evaluation of demonstrated student learning. In assessing a learning outcome, at least one direct measure should be used.
Indirect measures of assessment are those in which students judge their own ability to achieve the learning outcomes. Indirect measures are not based directly on student academic work but rather on what students perceive about their own learning. Alumni may also be asked to what extent the program prepared them to achieve the learning outcomes. In another example, people in contact with the students, such as employers, may be asked to judge the effectiveness of program graduates. In all cases, the assessment is based on perception rather than on a direct demonstration of learning.
Direct measures tend to be more time- and labor-intensive than indirect measures, which can often be handled through surveys. Because indirect measures do not require evaluating individual pieces of student work, larger sample sizes are often feasible, which adds to the value of the results. Even so, each outcome must be assessed by one or more direct measures.
Instrument validity is an important factor to consider in choosing an assessment tool. Validity is the degree to which an instrument measures what it purports to measure. There are several types of validity. For the purposes of program-level assessment, the most important type is probably content validity. This refers to the degree of overlap between the intended learning outcomes and the items on the instrument chosen to measure them. One question to ask is whether all of the intended learning outcomes are covered by the instrument. A related question is whether the proportion of items devoted to each outcome mirrors the importance placed on that outcome within the program. For example, if a national standardized test is used as a measure of learning outcomes, faculty members should consider whether all the intended learning outcomes are covered and whether each one is given enough weight by the instrument. Other types of instrument validity may matter in a particular assessment; these are described in detail in any research methods textbook. Whether faculty members choose a commercially developed instrument or a locally developed assessment tool, the validity of the measure is an important issue to address.
Instrument reliability refers to an instrument's consistency. Test-retest reliability is a measure of the consistency of scores when the instrument is administered more than once. Internal reliability is a measure of the consistency of scores within the instrument (e.g., split-half reliability, which measures whether scores on the first half of the test are consistent with scores on the second half). For programs using rubrics, inter-rater reliability is very important, as it measures the extent to which two or more scorers agree when applying a given rubric. The type of reliability that matters in a given assessment depends on the assessment itself. If an instrument is not reliable, it cannot be a valid measure. Therefore, it is important to learn the reliability of commercially purchased instruments or to establish the reliability of locally developed tools.
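To make these statistics concrete, the sketch below computes a split-half reliability coefficient and Cohen's kappa, a common chance-corrected measure of inter-rater agreement. It is a minimal illustration rather than a prescribed procedure: the item scores and rubric ratings are hypothetical, and it assumes Python 3.10 or later for statistics.correlation.

```python
# Two reliability checks mentioned above. All data are hypothetical.
from statistics import correlation  # Pearson r; Python 3.10+
from collections import Counter

def split_half_reliability(score_matrix):
    """Correlate each student's total on the odd-numbered items with
    the total on the even-numbered items, then apply the
    Spearman-Brown correction for full test length."""
    odd = [sum(row[0::2]) for row in score_matrix]
    even = [sum(row[1::2]) for row in score_matrix]
    r = correlation(odd, even)
    return 2 * r / (1 + r)  # Spearman-Brown prophecy formula

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters on the same set of
    artifacts, corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical data: five students x six test items scored 0/1 ...
scores = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
# ... and two raters applying a 4-point rubric to eight papers.
rater_1 = [3, 2, 4, 1, 3, 2, 4, 3]
rater_2 = [3, 2, 3, 1, 3, 2, 4, 2]

print(f"Split-half reliability: {split_half_reliability(scores):.2f}")
print(f"Cohen's kappa:          {cohens_kappa(rater_1, rater_2):.2f}")
```

As a rough rule of thumb, reliability coefficients above about 0.7 are often considered acceptable, though expectations vary by field, instrument, and type of coefficient.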
Sampling Method and Sample Size
Critical to the question of implementation is how evidence of student work will be collected and how much evidence will be used for assessment. For small programs, these questions may be relatively easy to answer, since it might be feasible, or even necessary, to assess every student in the program. For large programs, faculty members will need to consider how many students will have their work sampled and from which required courses or tests evidence will be drawn. Cost and time are major issues. Be prepared to compromise on the sample size in order to keep implementation feasible: the more work collected for analysis, the greater the cost in time and other resources.
If the assessment is intended to look at student development, then the work of individual students has to be tracked over a period of time. That is different from, say, collecting a random sample at year 1 and another random sample at year 4. If students are selected to have their work tracked, the faculty will need to determine how the identity of those students will be safeguarded and how the students will be notified.
An interactive sample size table can be used to determine an appropriate sample size for any population; the sketch below shows the standard calculation behind such tables.
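The usual calculation is Cochran's formula with a finite-population correction. The sketch below is a minimal illustration under common defaults (95% confidence, a 5% margin of error, and the most conservative proportion of 0.5); the example program size of 400 is a hypothetical placeholder.

```python
import math

def required_sample_size(population, confidence_z=1.96,
                         margin_of_error=0.05, proportion=0.5):
    """Cochran's formula with the finite-population correction.
    proportion=0.5 is the most conservative (largest-sample) choice."""
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink the infinite-population estimate n0 for a finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# e.g., a program with 400 majors, 95% confidence, +/-5% margin of error
print(required_sample_size(400))  # -> 197
```

For a population of 400, this yields roughly 196 to 197 students, consistent with published sample size tables. Note that the required sample shrinks only slowly as the margin of error is relaxed, which is why cost and time so often force a compromise.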
Identification of Resources Needed
The resources required to carry out an assessment plan should be estimated in total and then proposed to the administration. They may include the following categories:
Cost of standardized tests, including the purchase of the tests and the cost of processing the results. Test companies will typically supply both of these services for a fee.
Cost of surveys, including the design of the survey, mailings (if needed), web-based postings, e-mail, and compilation of the results. Staff members from the Office of Assessment are available to assist in developing web-based surveys, analyzing data, and preparing preliminary reports in support of academic program reviews and/or major institutional projects.
Time and cost of collecting and evaluating student work. A rough estimate for scoring student writing is that, once faculty members become adept at applying rubrics to evidence from portfolios, course-embedded assignments, or capstone courses, scoring 20 pages of student writing takes about an hour (see the rough calculation after this list).
Training for faculty evaluators. If the assessment tool involves scoring student work or conducting interviews, faculty members may need to be trained so that their scoring is consistent across all of the evaluators. Training would involve practice scoring the same work and norming the results.
Assessment coordinator. For large programs requiring complex assessment tools, it may be necessary to assign a faculty member the ongoing role of assessment coordinator. The assessment coordinator would work with the Office of Assessment to develop and implement assessment in the program.
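As a quick illustration of how the scoring-time estimate above translates into a resource request, the sketch below multiplies out the numbers. It is a hypothetical back-of-the-envelope calculation: the program size, paper length, and double-scoring assumption are placeholders, and only the 20-pages-per-hour rate comes from the guideline above.

```python
PAGES_PER_HOUR = 20  # rough scoring rate for a trained rater (from above)

def scoring_hours(num_students, pages_per_student, raters_per_paper=2):
    """Estimate total rater-hours. Double scoring (two raters per
    paper) is assumed here because it is common practice when
    inter-rater reliability will be checked."""
    total_pages = num_students * pages_per_student * raters_per_paper
    return total_pages / PAGES_PER_HOUR

# e.g., sampling 100 students' 10-page capstone papers, double scored:
print(scoring_hours(100, 10))  # -> 100.0 rater-hours
```

An estimate of this kind, multiplied across outcomes and assessment cycles, gives the administration a concrete basis for the faculty time and funding being requested.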
This section presents a brief overview of techniques for analyzing qualitative and quantitative assessment results. It is not possible in this brief space to provide all the information that faculty members will need to become proficient in a particular analytical technique. However, many textbooks are available on each of these topics, and faculty colleagues can provide assistance as well.