Fiske, D. W., Hunt, H. F., Luborsky, L., Orne, M. T., Parloff, M. B., Reiser, M. F., & Tuma, A. H. Planning of research on effectiveness of psychotherapy. American Psychologist, 1970, 25, 727-737.

PLANNING OF RESEARCH ON EFFECTIVENESS OF PSYCHOTHERAPY 1

DONALD W. FISKE

University of Chicago

LESTER LUBORSKY

University of Pennsylvania

MORRIS B. PARLOFF

National Institute of Mental Health, Chevy Chase, Maryland

HOWARD F. HUNT

New York State Psychiatric Institute and Columbia University

MARTIN T. ORNE

Institute of the Pennsylvania Hospital and University of Pennsylvania

MORTON F. REISER

Albert Einstein College of Medicine

AND

A. HUSSAIN TUMA 2

National Institute of Mental Health, Chevy Chase, Maryland

THERE have been few convincing research studies on the effectiveness of the various psychotherapies. The studies of large scope have typically had insufficient or inadequate controls. Other studies, often more adequate in methodology, have been so limited in scope that any generalization of their findings must be very tenuous. Until recently, few studies have built cumulatively on earlier ones to provide comparable rather than conglomerate data. As a consequence, we have little systematic experimental knowledge of psychotherapy, of its effectiveness, and of the factors facilitating its effects. Certainly the practice of psychotherapy has been influenced very little by the research literature.

The major problems that must be faced by any attempt to evaluate psychotherapy are that we are dealing with a treatment modality that has not been defined, the effects of which are presumed to require a long period of treatment, and the evaluation of which demands long-term follow-up. Furthermore, the treatment procedures are designed to modify conditions where the spontaneous recovery rate is considerable and the means of evaluating recovery controversial. Even fairly strong treatment effects might therefore easily be swamped by the heterogeneity of subject populations and the inevitable effects of intercurrent changes in life situations.

The authors came together to consider what suggestions might be made toward improving research investigation of this topic, so that research could contribute to advances in the theory and techniques of psychotherapy. Although we doubt the wisdom, value, or desirability of outlining ideal research designs, we believe it will be helpful to indicate the kinds of considerations with which an investigator should be concerned, as he works out his own plan. While one should not prescribe the decision that should be made at each point in the plan, it is possible to state the issues on which the investigator must reach his own decision, thus hopefully sensitizing or alerting him to the many crucial aspects of such a plan. In many instances, the neglect of just one matter, the failure to include it in the plan, implementation, and report, may be sufficient to turn an expensive and extended experiment into a compilation of case studies. For example, in a study comparing two types of treatment, the therapists using one treatment cannot be allowed to select their own patients while the other therapists are assigned the rejected patients. Since our objective here is to call attention to issues, to provide a checklist, 3 many points have been stated concisely, without elaboration.

1 This article reports the discussion at a workshop sponsored and supported by the Clinical Projects Research Review Committee, National Institute of Mental Health. The authors acknowledge with thanks the assistance of Martin Katz, Chief, Clinical Research Branch, National Institute of Mental Health, and his staff. (Reprinted from Archives of General Psychiatry, 1970, 22, 22-32. Copyrighted 1970 American Medical Association.)

2 The first author took responsibility for drafting this article. Requests for reprints should be sent to Donald W. Fiske, Department of Psychology, University of Chicago, Chicago, Illinois 60637.

This report is not intended to be definitive or exhaustive. Although it cites some published discussions of the problem (papers containing useful bibliographies), and although we had the good fortune to have access to several reviews (e.g., Luborsky, Chandler, Auerbach, Cohen, & Bachrach, in press; Paul, 1969; Strupp & Bergin, 1969), we have made no attempt to determine and cite the sources of the ideas presented here. This article includes the content of the workshop's discussions on which there seemed to be general agreement, without identifying the contributor who initiated each suggestion.

SOME GENERAL RECOMMENDATIONS

Underlying the more specific considerations that follow are several general propositions. Like other research, studies of psychotherapy should be linked as closely as possible to theory and concepts. Thus, the theoretical basis for the treatment and the concepts presumed to be indexed by the dependent variables should be indicated. Again, every effort should be made to coordinate each study with prior or future studies. Measurement procedures should be standardized, and each study should use such standardized procedures wherever possible, in addition to whatever particular measures are required by the design of the given study. Moreover, the inclusion of measures developed in the context of other theoretical orientations to therapy is most desirable. The report should specify as much detail as possible about the treatment, setting, patients, and measuring operations, so that, in principle, the study can be replicated by another investigator: with replication made possible by such specification, the study of different therapists and patients in a different city should not affect the findings and conclusions. Finally, while economy must inevitably affect the planning of such research, it is false economy to omit the collection of potentially relevant data (e.g., concerning demographic characteristics of patients and therapists and the fee schedules) that can be recorded with little effort and minimal expenditure of professional time. The research should seek to maximize the probable theoretical yield and empirical contribution for the costs involved.

In the long run, science builds on replicated findings. The most essential requirement of a research study is its reliability -- it must be potentially replicable. Such replicability requires not only reliable measuring procedures but also operationally specified procedures for selecting patients, identifying the nature of the treatment, and other experimental steps, to maximize the fidelity of any such replication.

BASIC DESIGNS

As is generally true for research designs in the behavioral sciences, all plans for studying psychotherapy can be criticized. The objective should not be the impossible one of providing controls for every conceivably relevant factor, but rather to design a study that can reasonably be expected to tell the field something not now known. As completely as possible, the design should be replicable by others, and not depend on the particular personalities of the therapists and of the judges used to provide ratings.

A basic decision concerns the investigator's objective: Is he testing a theory of therapy or is he evaluating a method of treatment? If the latter, does he wish to test the method when applied to a broad class (e.g., neurotics) or to a smaller class (e.g., those judged able to profit from that treatment)?

There are two types of approach (cf. Cronbach, 1957): the experimental and the correlational. The experimental design involves two or more groups considered comparable in all respects except those explicitly varied by the investigator. The paradigmatic question is: Does Treatment X have a different effect from Treatment Y? The correlational approach asks whether a measured effect varies with another variable, such as length of treatment or initial anxiety level of patient. More completely, the question is whether the effect varies discriminably with the second variable, and, if so, what is the magnitude of the relationship (in a sample with specified common characteristics, but with the empirical test being made in a context in which other unspecified variables may also covary with that effect)?

3 A "Check-List of Issues in Designs for Research on the Effectiveness of Psychotherapy," based on this article, has been deposited with the National Auxiliary Publications Service. Order Document No. 01054 from the National Auxiliary Publications Service of the American Society for Information Science, c/o CCM Information Sciences, Inc., 909 3rd Avenue, New York, New York 10022. Remit in advance $5.00 for photocopies or $2.00 for microfiche and make checks payable to: Research and Microfilm Publications, Inc.

The experimental approach may be used for several purposes. When the emphasis is on treatment, we believe that an experimental study of psychotherapeutic effectiveness must compare two or more treatments. It is impossible to conceive of a true (i.e., untreated) control group which can be precisely implemented in such research. A group not explicitly treated (in some way) as part of the study is likely to suffer serious attrition or to seek treatment elsewhere. Also, withholding therapy is itself a treatment. Patients in an own-control group who wait a period of time before treatment are affected by that waiting (e.g., they may become less amenable to the treatment they subsequently receive). Hence, the comparison must be between levels of a single type of treatment (e.g., time-limited versus unlimited) or between qualitatively different treatments.

Since numerous factors can conceivably influence treatment effects, it is essential that the investigator identify his total sample and then assign subjects randomly to the treatments. The assignment must be truly random, the method of randomization being presented in his final report. Moreover, the investigator should define his intended population, the population to which he hopes his findings can be generalized, and indicate exactly how he has drawn his total sample from that population. Many studies imply that their population is much larger than it actually is. Typically, it is impossible, strictly speaking, to generalize beyond patients at a particular clinic in a given year. The investigator must let the generalization of his findings to other times and places remain in abeyance until comparable studies justify that step.
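The requirement that the method of randomization be reportable can be illustrated with a minimal sketch in Python. The function name, the patient identifiers, and the seed value are hypothetical and serve only as an example; any documented chance procedure would satisfy the requirement equally well.

```python
import random


def assign_randomly(patient_ids, treatments, seed):
    """Shuffle the full sample with a recorded seed, then deal patients
    to the treatments in rotation so group sizes differ by at most one."""
    rng = random.Random(seed)        # seeded generator, so the procedure can be reported
    shuffled = list(patient_ids)     # copy: the identified total sample is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, pid in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(pid)
    return groups


# Hypothetical total sample of 20 patients, identified before any assignment.
sample = [f"P{n:02d}" for n in range(1, 21)]
groups = assign_randomly(sample, ["Treatment X", "Treatment Y"], seed=1970)
for treatment, members in groups.items():
    print(treatment, members)
```

Because the generator is seeded, rerunning the procedure with the reported seed on the same Python version reproduces the same allocation, which lets a reader audit the assignment.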

The population sampled can be restricted to a greater or lesser degree. Other things being equal, the more homogeneous the population, the more useful the findings. The population may be limited as to diagnosis or severity of condition, or both.

Alternatively, the experimental study of treatment can appropriately be confined to one type of therapeutic treatment. For instance, the comparison might be between a group treated as usual and a group receiving similar therapeutic treatment but also subjected to a complex of nonspecific effects, such as a maximizing of expectancies (a topic considered below).

Again, the comparison can be of the effects of the one treatment on patients falling in particular categories. The adequacy of such a study depends obviously on the adequacy of the procedures for classification. The danger lies in the possibility that any observed differences in outcome are due to unrecognized factors associated with the diagnostic groupings, rather than with the diagnosed conditions themselves.

Groupings can also be made post hoc, on the basis of some characteristic of the treatment protocol. While such studies may be valuable explorations of the treatment process, confidence in their findings will vary with the adequacy of their classification procedure and with the care with which the investigator searches for confounding differences on other variables.

Individualized goals for treatment. Increasing concern is being expressed in the literature about the appropriateness of assessing outcome in the same way for all patients. The patient typically comes in with one major complaint for which he seeks relief. Similarly, the therapist may develop particular outcome goals for each patient. Should not outcome be defined to be congruent with such individual aims? The methodology for studies with this orientation needs further development.

Within-patient controls. It has been argued (Goldiamond & Dyrud, 1968) that aspects of therapeutic techniques can be studied over the course of treatment of each patient; comparisons of the patient's behavior can be made between periods associated with the application of the technique and other periods. Studies of this kind are possible and may be useful in behavior therapy. The design requires that the effects be reversible if multiple applications are made in the same course of treatment. Some applications, such as interpretation in psychoanalytic treatment, cannot be repeated -- repetitions of the same interpretation would not ordinarily be considered independent events. Furthermore, all such studies would be restricted to within-therapy measures of effect; rarely would it be possible to obtain successive concomitant extratherapy measures.

THE TREATMENT

In any study of psychotherapeutic effectiveness, it is essential that the treatment or treatments be comprehensively described, and that the investigator indicate his reasons for believing that any effects associated with the treatment can be ascribed to it and not to differences in such conditions as the therapists, their prestige, and their ability to discern promising patients.

A complete description would include many things. What is the theory underlying the treatment? According to the theory, what is the etiology? What is the rationale for the treatment and for its presumed effects? How does it differ from other theories? It is doubtful whether this ideal statement can be realized (cf. Kiesler, 1966, who argues that such theorists as Freud and Rogers have not provided full accounts in their writings).

The investigator should attempt to portray the treatment in ideal form, in terms of what it is intended to be, and what aspects are seen as determining the quality of the outcome. He should then indicate the extent to which the ideal was approximated; that is, he should describe the treatment as it actually was. What are the goals of the treatment? Symptom resolution? Personality reconstruction? To what extent were those goals approached in the particular application being studied? What are the methods? What are the components of the intended treatment process? Ideally, the investigator would seek to demonstrate the presence of each component and its effect.

The investigator's general description of the treatment must be supplemented by other specifications of it. Statements of the several therapists provide one source. Independent observers can furnish descriptions based on listening to recordings of the therapeutic sessions. Ordinarily, it is sufficient for observers to listen to selected sections of the recordings. It may, however, be desirable to consider points selected by the therapist as particularly critical, that is, high points where treatment was especially effective.

Useful also would be descriptions provided by the patients themselves. The reports of therapists and the judgments of observers are particularly important when two somewhat similar treatments are being compared. In that instance, the features common to both treatments as well as those differentiating them must be identified. Also desirable is some attention to how the treatment (and its intended effects) differs from other treatments and from no treatment (and effects from other sources).

A number of other aspects of treatment should be recorded and summarized. In addition to total calendar time and to number and duration of sessions, the number of sessions missed, canceled, and changed can readily be noted. Administrative aspects would include the fees charged (in absolute terms and as a proportion of income, or of disposable income) and the source of the money used for payments. (Size of payments may be related to socioeconomic status which itself seems related to outcome.) Again, what are the physical arrangements within the therapy room, including the physical orientation of patient and therapist and the distance between them? Less readily recorded are the appearance of the patient and his gestures. (See Luborsky, Fabian, Hall, Ticho, & Ticho, 1958, for an extended discussion of treatment variables.)

The preceding discussion has used treatment in the singular form. It should be obvious that it applies to each treatment included in the study.

The therapist. Part of the treatment is obviously the therapist. His in-therapy behavior can be described and rated, to supplement his own report on his activity. Standard rating scales should be used for this purpose, along with any special scales the investigator wishes to add. Attributes such as empathy, warmth, genuineness, and effectiveness of response are candidates for standardized scales. With standardized instructions and an appropriate amount of supervised practice, nonprofessional raters may be used for such work.

Also needed are data on the therapist's training and experience. Very desirable is information on the therapist as a person outside the therapy room. (The experimenter should take cognizance of the accumulating evidence that some therapists are actually ineffective, at least with particular patients. An optimal design would permit statistical evaluation of differences among the therapists in their effectiveness.)

While the characteristics of the therapist have been considered separately from the description of the therapy, it is obvious that they interact, and their complete separation may not be possible. Thus, the therapist's belief in the efficacy of treatment is a variable of great significance.

OTHER ASPECTS OF THE CONDITIONS

A major problem in assessing the outcome of therapeutic treatment is determining the effects contributed by sources other than the treatment, the therapist, and the patient himself. The nature of the patient's environment during therapy, the attitudes of significant others toward him and his treatment, and the occurrence of changes in the patient's life situation and of any special events or traumas should be known. It is particularly important to determine what other treatments the patient is receiving or administering to himself. Is he counseling (formally or informally) with another therapist, with a physician, or with a spiritual adviser? What drugs is he taking? This information may be difficult to elicit.

In general, it is likely that such sources of effects will be controlled by truly random assignment of subjects to treatment groups. It seems advisable, however, to gather data on these matters to insure that they do not have systematic effects that bias the findings.

THE DEPENDENT VARIABLES

We are firmly convinced of several propositions about the dependent variables in therapy research. There should be more than one measure of outcome, and preferably several diverse measures, since outcome measures tend to have low intercorrelations. Standard measures, measures used in other studies, should be used as much as possible. Each of several measures should be coordinated with a concept in the theory underlying the treatment, although the measures should also include ones related to alternative theories of therapy, to permit comparisons and to uncover possible side effects.

The dependent measures can be obtained in three contexts: within therapy sessions; outside of treatment, but in the same institutional setting (e.g., tests administered by a diagnostician in the same clinic); and in external settings (e.g., work performance). Related to the latter differentiation is the contrast between unobtrusive measures (of which the patient is unaware) and obtrusive measures (in which the patient typically knows he is being assessed and reacts to the measurement procedure itself). See the thoughtful treatise on Unobtrusive Measures: Nonreactive Research in the Social Sciences (Webb, Campbell, Schwartz, & Sechrest, 1966). Note that researchers are inclined to trust the self-reports of patients before treatment, but not at termination.

Another pertinent classification is that of vantage point or perspective. The patient has one view, hopefully revealed in his self-reports of subjectively experienced change; the therapist has another, usually restricted to observations of the patient in therapy and to the patient's reports to him; observers have different views, based on recorded therapy behavior, diagnostic tests, or outside behavior -- the observers may be professionals or laymen, and may be emotionally involved with the patient or relatively detached.

From the professional's viewpoint, outcome is important as it affects major areas of living, including work or occupational adjustment, nature of the patient's interpersonal relationships (both close and remote), sexual adjustment, symptom status (with special reference to target symptoms and to amount and intensity of negative affect), and insight into his own mental processes. While an investigator may not choose to assess the patient's status in all areas, he must decide which domains his theoretical orientation requires him to tap.

Perhaps the single most used index of outcome is the therapist's rating. It seems highly desirable to have some standardization of this easily obtained criterion. The therapist can rate status (generally, or in several specific variables) before therapy and at termination, the index being derived by procedures discussed in a later section. Alternatively, or additionally, he can rate degree of improvement, as of the end of treatment. This rating has several potentially serious limitations. The therapist should be trained in this rating, so that there is some justification for assuming that his frame of reference and his interpretation of the rating scale are shared by other therapists. Even so, it is a measure of uncertain value except in the unusual circumstance that the same therapists administer both treatments being compared and are equally involved in both.

The times of administration of the dependent variables should be standardized. The typical times are before and after treatment, with a follow-up assessment made in some cases. Testing during therapy is undesirable: it is likely to have obtrusive effects, and it would seem to have limited value unless associated with independently specified points in the course of treatment. It is not known whether the exact point of initial testing makes a difference: it can be done on the day of first appearance in the clinic, on the day of first therapeutic contact, or any time in the varying interval between. Similarly, should termination testing be immediately after the last therapy hour, two days later, or two weeks later? Again, what is the optimal time for follow-up, considering the probable increase in attrition with length of time since treatment? These time points, and especially the last one, should be determined by one's theoretical position. For example, it can be argued that status at termination is sufficient, that the course of status during the follow-up period is a separate topic to be studied in its own right. On the other hand, psychoanalysis predicts gains during that period as a function of successful treatment, and, hence, a follow-up assessment is an appropriate part of a comprehensive study of its outcome.

Related to time of administration is the investigator's classification of his dependent measures as pertaining to ultimate or instrumental criteria: does each specific measure get at an end point, a final goal of treatment, or does it assess a condition prerequisite for attaining such an end point? (Resolution of an acute contemporary problem might be a final goal; development of patient insight into his contribution to interpersonal conflicts might be an instrumental goal.)

The psychometric properties of each dependent measure should be reported by the investigator. In addition to showing the a priori connection between each measure and his theoretical framework, he should cite evidence for its construct validity (see Standards for Educational and Psychological Tests and Manuals, APA, 1966) and especially evidence for its validity when used in the therapy context. The three kinds of reliability or generalizability are all pertinent to psychotherapy research. Each measure should have reasonable stability: it should not reflect such an ephemeral or fluctuating quality as momentary mood. It should be internally consistent -- its several elements, components, or items should be shown to be measuring the same quality. Finally, there is the matter of interobserver agreement. This kind is not relevant to patient reports. It is relevant to therapist ratings, but unavailable. It must be determined for ratings by persons in other roles, such as diagnostician or observer. As is shown later, any major deficiency of a measure in any of these respects can seriously limit the value of findings involving it. The study of change or gain is particularly sensitive to effects associated with unreliability.
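The three kinds of reliability named above, and the sensitivity of change scores to unreliability, can be made concrete with a short sketch. It rests on assumptions: the function names are illustrative, coefficient alpha stands in for internal consistency, a product-moment correlation stands in for both stability and interobserver agreement (it ignores differences in level between raters), and the difference-score formula holds only when the pretest and posttest have equal variances.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Product-moment correlation, usable for test-retest stability
    (same measure, two occasions) or agreement between two observers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per patient."""
    k = len(rows[0])
    item_variances = sum(pvariance(column) for column in zip(*rows))
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def difference_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of a simple gain score (post minus pre),
    assuming equal pretest and posttest variances."""
    return ((r_pre + r_post) / 2 - r_pre_post) / (1 - r_pre_post)


# Hypothetical ratings of five patients by two independent observers.
observer_a = [3, 4, 2, 5, 4]
observer_b = [3, 5, 2, 4, 4]
print(pearson_r(observer_a, observer_b))
print(difference_score_reliability(0.80, 0.80, 0.60))
```

Under the equal-variance assumption, two assessments that each have a reliability of .80 and correlate .60 with each other yield a gain score with a reliability of only .50, which is why modest unreliability in a measure becomes serious once change is the quantity of interest.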

THE PATIENTS

It is obvious that the patients should be described, but in what terms? The descriptions should include as many as possible of the dimensions that previous research has shown to make a difference in outcome, such as age and severity of sickness. If possible, frequencies of favorable combinations should be given: for example, high anxiety plus high ego strength. More generally, any information on the extent of healthy adaptive functioning is most valuable. Diagnoses should be reported where known. Other general variables include sex, socioeconomic status, and education. Information on any prior psychotherapy is essential. Data on knowledge about therapy and on number of friends and family who have had psychotherapy are clearly desirable. These may be part of a more general attribute of favorableness of patient's environment, in terms of opportunity for positive social reinforcements and for growth.

Special attention should be given to the subjective variable of patient attractiveness for psychotherapy. Another critical aspect is the nature of the referral: How did the patients get there? What proportion were self-referrals? Again, did the therapists select their patients, and, if so, on what basis? Such selection seems difficult to incorporate in a systematic design; it is preferable that the researcher control the selection. The patient's beliefs about therapy are considered in a later section on expectancies. A separate consideration, the patient's use of drugs, was mentioned earlier.

The general attribute underlying patient variables is, of course, prognosis: What is the probable degree of improvement with the selected treatments? Ideally, any groups of patients being compared experimentally should be matched on prognosis. If judgments of prognosis can be made reliably (over judges), one-to-one matching is feasible. If several variables are considered, such individual matching is quite impractical, and truly random assignment to treatment groups seems to be the technique of choice.
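Where a single, reliably judged prognosis rating is available, one-to-one matching can be combined with chance assignment within each matched pair. The following is a minimal sketch under that assumption; the function name, the ratings, and the seed are hypothetical, and the pairing rule (adjacent ranks) is only one of several that could be defended.

```python
import random


def matched_pair_assignment(prognosis, seed):
    """Rank patients by prognosis rating, pair adjacent ranks, and let chance
    decide which member of each pair receives Treatment X and which Y.
    With an odd number of patients the highest-ranked one is left unassigned."""
    rng = random.Random(seed)                    # recorded seed, so the step is reportable
    ranked = sorted(prognosis, key=prognosis.get)
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                        # random order within the matched pair
        assignment[pair[0]] = "X"
        assignment[pair[1]] = "Y"
    return assignment


# Hypothetical prognosis ratings (higher = more favorable) for six patients.
prognosis = {"P1": 2, "P2": 5, "P3": 3, "P4": 9, "P5": 8, "P6": 1}
assignment = matched_pair_assignment(prognosis, seed=7)
print(assignment)
```

Each pair contributes one patient to each treatment, so the groups are balanced on the rating by construction, while the chance step keeps the investigator and the therapists from choosing which member of a pair receives which treatment.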

THERAPIST AND PATIENT EXPECTANCIES

Although little systematic research has been done on the topic, there is good reason to believe that the expectancies of the therapist and of his patient regarding the outcomes of therapy contribute substantially to those outcomes. While such expectancies could be subsumed under treatment, they are not, strictly speaking, part of the therapeutic technique itself. But the therapist's expectancies for the patient will affect the techniques he uses and his degree of commitment to the treatment. Patient expectancies may also play a significant role in the period between termination and follow-up. Thus, expectancies are sufficiently important to consider them in a separate section, rather than including them under the topics of the therapist and the patient.

Expectancies are always present in both therapist and patient. Positive expectancies seem to be a necessary condition for therapeutic effectiveness. The objective in treatment is presumably to make these attitudes maximally beneficial, to avoid negative levels, and to develop appropriate positive levels. The investigator should ascertain and report these expectancies. In our present stage of methodology, this is a difficult but important requirement: for example, positive outcome in the face of negative expectation would be of unusual significance. Clearly, research is needed on this entire topic.

The expectancies of the therapist should be ascertained by careful inquiry that avoids direct questions. The therapist has expectancies for patients with a particular diagnosis; within each diagnostic group, he has expectancies according to the prognosis as he sees it. But his total expectancies for a given patient are not determined by these alone: they are also a function of his perception of the individual patient and of the patient's attractiveness, etc., and, hence, of an inference about the outcome when that patient is treated by him. The problem is complicated by the fact that such expectancies are not fixed at a point in time. They appear likely to vary from first contact through at least the first several therapeutic hours. Consider also the effects of positive expectancies not being fulfilled during the course of treatment.

There are also the expectancies of the person referring the patient and of the patient himself. These are relatively unexplored areas. (See, however, the book on expectancies by Goldstein, 1962, and the paper on anticipatory socialization for psychotherapy by Orne and Wender, 1968, and one on a successful application of it by Hoehn-Saric, Frank, Imber, Nash, Stone, and Battle, 1964.)

RESEARCH DESIGNS AND STATISTICAL TREATMENTS

The problem of designing research on the effectiveness of psychotherapy is rendered more difficult by our limited knowledge of the variables that affect such effectiveness. There are many variables that, on the basis of experimental or clinical evidence, appear to influence the presence or absence of such effectiveness and also, presumably, its extent. There are many sources of potential invalidity in designs for research in this area. (See Goldstein, Heller, & Sechrest, 1966, Ch. 2.)

For experimental studies, a classical design is

O X O

O ... O

where each line represents the time course for a group of subjects with O indicating a time point for assessment and X indicating the experimental treatment. To make comparisons between the two groups, it is necessary (as noted earlier) that subjects be assigned on a truly random basis to the two groups. (The investigator should report the details of the randomization procedure.) Of course, there may be more than two groups, with each group receiving a specified treatment.

The investigator may be concerned about the possibly intrusive effects of the pretesting: for example, he may feel that the pretesting will interact undesirably with the treatment. If so, he may choose to omit the pretesting for each group.

An optimal design would include both of the above designs, as follows:

O X O

O ... O

... X O

... ... O

Since this Solomon design requires a marked increase in total sample, it may not be feasible for many investigators. (For a careful critique of these designs and for a discussion of statistical techniques appropriate for each, the reader is referred to the definitive chapter by Campbell and Stanley, 1963; note, of course, that the strengths and weaknesses of each design in educational research may take different values from those in research on psychotherapy.)

Paul (1969) considers the potentialities of factorial designs in behavior-modification research. In such designs, each selected variable (such as experience of the therapist) can be manipulated systematically. While such powerful designs may prove profitable as the field advances, there is little empirical evidence on which to judge their suitability at the present time.

Quasi-experimental designs are considered by Campbell and Stanley (1963) and further discussed by Campbell (1963). For example:

O.. O.. O.. O X O.. O.. O.. O

The rationale is that the repeated measures provide a basis for deciding whether the change during the period containing the treatment is greater than that occurring earlier or later for periods of similar duration. Such an interrupted time-series design has great intuitive appeal. It can be refined by adding a second group assessed at each of the time points, but not given the experimental treatment. It seems likely, however, that such designs may be feasible in psychotherapy research only under special conditions. Ordinarily, consideration of the effects of the repeated assessments and of the extended chronological time necessary for them will lead to their rejection.
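The comparison described here can be sketched descriptively: compute the change over each interval between successive assessments, then set the change over the interval containing the treatment against the changes over the other intervals. The Python fragment below is an illustrative assumption only; the function names and the group means are hypothetical. It offers no significance test, since successive assessments of the same subjects are not independent.

```python
from statistics import mean, stdev

def interval_changes(scores):
    """Change between each pair of successive assessments."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

def treatment_interval_contrast(scores, treatment_interval):
    """Compare the change over the interval containing X with the other intervals.

    treatment_interval is the 0-based index of the interval that contains
    the treatment. Returns (treatment change, mean of other changes, their SD).
    """
    changes = interval_changes(scores)
    treated = changes[treatment_interval]
    others = changes[:treatment_interval] + changes[treatment_interval + 1:]
    return treated, mean(others), stdev(others)

# Hypothetical group means at eight assessments, with X after the fourth
group_means = [20, 21, 20, 22, 30, 31, 30, 32]
treated, other_mean, other_sd = treatment_interval_contrast(group_means, 3)
print(f"change across treatment interval: {treated}")
print(f"other intervals: mean {other_mean:.2f}, SD {other_sd:.2f}")
```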

The above designs are appropriate for trying to answer questions of the form: "Does Treatment X have a greater effect than Treatment Y?" Treatment Y, as indicated earlier, should be viewed as a treatment (rather than as the absence of any treatment), whether the treatment is formal and explicit or not. An alternative type of design can be used when the question is: "Does the effect of Treatment X covary over patients with Variable Y?" Here, Y may be a treatment variable (such as frequency rate for sessions), a therapist variable (experience), or a patient variable (verbal fluency). Such studies may be desirable precursors to the factorial designs considered by Paul (1969). As in all correlational studies, the possibility of obtained correlations being a function of some third variable must be taken into account.

Statistical analysis. Bereiter (1963) comments that only in connection with psychological change has he "ever heard colleagues admit to having abandoned major research objectives solely because the statistical problems seemed insurmountable [p. 3]." And, Lord (1963) warns that in the measurement of change, the usual commonsensible notions can be shown to be inadequate. Consultation of the highest quality is required if the investigator is to avoid the larger pitfalls in this treacherous jungle.

In experimental studies, the analysis of covariance is the method to be preferred, even when the assignment to groups has been truly random (Lord, 1963). In this technique, the posttreatment scores are adjusted for pretreatment differences. It should be noted that this adjustment does not take care of any pretreatment differences on other variables that are independent of those on the variable under analysis. Furthermore, the adequacy of such adjustment is contingent on the reliability of the pretreatment scores.
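The adjustment step can be illustrated as follows. This is a minimal sketch under the usual assumption of a common within-group regression slope. It computes only the adjusted posttreatment means, not the full analysis of covariance with its significance test. The function names and the scores are hypothetical.

```python
from statistics import mean

def pooled_within_slope(groups):
    """Pooled within-group regression slope of post on pre.

    groups maps a group label to a list of (pre, post) pairs.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        pre_mean = mean(p for p, _ in pairs)
        post_mean = mean(q for _, q in pairs)
        sxy += sum((p - pre_mean) * (q - post_mean) for p, q in pairs)
        sxx += sum((p - pre_mean) ** 2 for p, _ in pairs)
    return sxy / sxx

def adjusted_post_means(groups):
    """Posttreatment means adjusted to the grand pretreatment mean."""
    slope = pooled_within_slope(groups)
    grand_pre = mean(p for pairs in groups.values() for p, _ in pairs)
    adjusted = {}
    for label, pairs in groups.items():
        pre_mean = mean(p for p, _ in pairs)
        post_mean = mean(q for _, q in pairs)
        adjusted[label] = post_mean - slope * (pre_mean - grand_pre)
    return adjusted

# Hypothetical (pre, post) scores for two randomly assigned groups
data = {
    "treatment": [(10, 16), (12, 18), (14, 20), (16, 22)],
    "control":   [(12, 13), (14, 15), (16, 17), (18, 19)],
}
print(adjusted_post_means(data))
```

In these made-up data the treatment group happens to start lower than the control group, so the adjustment widens the observed posttreatment difference. Because the slope is estimated from fallible pretreatment scores, an unreliable pretest yields an underadjustment, which is the contingency noted above.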

For correlational studies, it is necessary to have an appropriate index of the treatment effect for each patient. It is well-known that the simple raw gain score (the difference between pretreatment and posttreatment scores) has the severe defect of being determined unduly by the pretreatment score (under some circumstances, the raw gain will correlate .71 with initial level). The residual gain score adjusts the post score for that part contributed by the initial score (see Bereiter, 1963; DuBois & Manning, 1961). But the correlation coefficient used here must be corrected for attenuation, and, once again, the initial level must be measured quite reliably. Tucker, Damarin, and Messick (1966) show that the true difference score can be expressed as the sum of the true independent gain score (independent of initial measure) and the true dependent change score (entirely dependent on initial measure). They provide equations for estimating these terms. They also offer a thoughtful discussion of the various available indices of change and their assets and liabilities. The choice is not one to be made by the uninformed.
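The dependence of raw gain on initial level, and its removal in the residual gain score, can be shown with simulated scores. The sketch below assumes the limiting case in which pretreatment and posttreatment scores are uncorrelated and have equal variance. The raw gain then correlates about .71 in magnitude with initial level, and the sign is negative. All names and values are hypothetical, and the sketch makes no correction for attenuation.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residual_gains(pre, post):
    """Post score minus the post score predicted linearly from the pre score."""
    mx, my = mean(pre), mean(post)
    slope = (sum((a - mx) * (b - my) for a, b in zip(pre, post))
             / sum((a - mx) ** 2 for a in pre))
    return [b - (my + slope * (a - mx)) for a, b in zip(pre, post)]

rng = random.Random(0)
# Assumed limiting case: pre and post uncorrelated, equal variance
pre = [rng.gauss(50, 10) for _ in range(5000)]
post = [rng.gauss(50, 10) for _ in range(5000)]
raw_gain = [b - a for a, b in zip(pre, post)]

# Theory gives -1/sqrt(2) for the raw gain and zero for the residual gain
print("raw gain vs. initial level:     ", round(pearson_r(raw_gain, pre), 2))
print("residual gain vs. initial level:", round(pearson_r(residual_gains(pre, post), pre), 2))
```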

In general correlational work, unreliability can be expected merely to reduce the observed values to a predictable extent. Where reliability can be well estimated, there is little danger of arriving at false conclusions. But in studies of treatment effects, failure to take appropriate account of reliability considerations can lead to a conclusion diametrically opposed to that reached by more careful methods.

Individually defined specification of variables. It has been argued that the changes sought in therapy are highly specific to the individual patient. Thus, we should not expect all patients to improve on every dependent variable, but should determine whether each patient achieves remission of his target symptoms (Battle, Imber, Hoehn-Saric, Stone, Nash, & Frank, 1966), or perhaps whether he gains in those areas for which the therapist or the pretreatment assessment determines that his condition is most unfavorable. In evaluating such patient-specific goals, the possibility of symptom substitution must be considered. Little systematic consideration has been given to the design and analysis of studies oriented toward testing whether therapy produces particular effects designated as desirable for the individual patient. Such methodological work is needed.

It is clear that subjects selected as deviantly low on a measure can be expected to have higher values on retest. Such a change can be viewed as statistical regression. It seems more helpful to consider it as a function of errors of measurement that operate at one time, but not at the later point or as a function of temporal fluctuation in the patient's actual condition. However viewed, retest with no therapeutic intervention is bound to show an average improvement for patients identified by their extremely low initial scores.
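A small simulation illustrates the point. The sketch below assumes that each observed score is a stable true score plus an independent error of measurement at each testing, with no treatment given between the two testings. The sample size, score scale, and selection fraction are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)
N = 2000

# Assumed model: observed score = stable true score + independent error
true_scores = [rng.gauss(50, 10) for _ in range(N)]
test_1 = [t + rng.gauss(0, 10) for t in true_scores]
test_2 = [t + rng.gauss(0, 10) for t in true_scores]   # no treatment in between

# Select the 10% of patients with the lowest initial scores
cutoff = sorted(test_1)[N // 10]
selected = [i for i in range(N) if test_1[i] < cutoff]

initial_mean = mean(test_1[i] for i in selected)
retest_mean = mean(test_2[i] for i in selected)
print(f"selected n = {len(selected)}")
print(f"mean at selection = {initial_mean:.1f}, mean at retest = {retest_mean:.1f}")
```

The selected group moves toward the overall mean on retest although nothing was done for it. An untreated comparison group selected in the same way is therefore needed before any such gain is credited to therapy.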

To study empirically such individualized goals for change, patients can be classified in terms of these specific objectives, as defined by the patient, the therapist, or both. Each resulting class can then be studied separately, its members being assigned randomly to treatments. With their small Ns, these miniature experiments will ordinarily have findings that do not reach the usual levels specified for "statistical significance." It is possible, however, to pool the probability values for the several small studies to evaluate the set as a whole (cf. Jones & Fiske, 1953). Such an investigation would merely determine whether the two treatments differed in their effectiveness for specialized groups, identified in terms of objectives. A common outcome measure is still involved.
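One familiar way of pooling probability values is Fisher's method, in which minus twice the sum of the natural logarithms of k independent one-tailed probabilities is referred to chi-square with 2k degrees of freedom. The sketch below uses it purely as an illustration and does not claim it is the model Jones and Fiske (1953) would recommend for a given case. The function name and the probability values are hypothetical.

```python
from math import exp, log

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null.

    Assumes independent studies and one-tailed p values (each > 0)
    that are all stated in the same direction.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)   # (chi-square statistic) / 2
    # Closed-form upper tail of chi-square with even df = 2k:
    # exp(-h) * sum_{i=0}^{k-1} h**i / i!
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        tail += term
    return exp(-half_stat) * tail

# Four hypothetical small studies, none significant alone at the .05 level
print(round(fisher_combined_p([0.10, 0.10, 0.10, 0.10]), 4))
```

With four studies each at p = .10, the pooled probability falls to roughly .02, which shows how a consistent set of individually inconclusive miniature experiments can be evaluated as a whole.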

A more applicable design would involve applying the same treatment to a specific complaint group and to a comparison group with another complaint (or a group with heterogeneous complaints). The dependent variable would assess relief from the specified complaint (or achievement of the therapist's aim if the classification were based on that). Pooling the results of several such small studies would seem of only secondary interest and somewhat dangerous, since the several studies would be using diverse dependent variables. Furthermore, the design fails to control regression effects. More complex designs need to be worked out for this problem.

Sample size. While small samples can be expected to show large sampling fluctuations, it is obviously impossible in psychotherapy research to obtain large samples and still hold constant many relevant variables associated with chronological time, institutional setting, etc. As in all research, the control of potential biases is more critical than sample size, and confidence in findings increases primarily with replication over a series of studies by different investigators. The necessity of using standard, commonly used measures, in addition to those created by the investigator for the problem at hand, has been emphasized repeatedly in this article.

Statistical significance does not guarantee psychological significance. Once the effect of a condition or treatment has been demonstrated, an estimate of the proportion of variance it accounts for becomes the important concern.
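One common estimate of the proportion of variance accounted for is the ratio of the between-groups sum of squares to the total sum of squares (eta squared). The following sketch uses hypothetical scores and a hypothetical function name, and it is only one of several available indices.

```python
from statistics import mean

def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    all_scores = [s for g in groups for s in g]
    grand = mean(all_scores)
    ss_total = sum((s - grand) ** 2 for s in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical outcome scores for two groups
treated = [14, 15, 16, 17, 18]
control = [12, 13, 14, 15, 16]
print(f"proportion of variance accounted for: {eta_squared([treated, control]):.2f}")
```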

PROGNOSIS FOR RESEARCH ON THERAPEUTIC EFFECTIVENESS

We believe that for some problems concerning the effectiveness of psychotherapy, it is possible today to develop appropriate research designs and to execute worthwhile studies. We doubt that it is wise to tackle such broad questions as "Does psychotherapy help neurotics?" and would urge instead the investigation of narrower questions, restricting such components as the population sampled and the types of treatment. When such studies of limited scope and objective have been carefully planned, executed, and replicated, the field will be ready for larger studies. We see much promise in the growing efforts to analyze and solve the complex substantive and methodological problems in the experimental investigation of the effectiveness of psychotherapy.

We have been urging that the investigator provide full specifications of his variables, independent and dependent, and of his samples of therapists and patients. We also urge him to be candid about the limitations of his data as he sees them. Moreover, confidence in reported research findings is increased when the data are recorded and processed in such a way that they can be made available for reexamination by others.

Finally, we urge the investigator to include in his report any available information on the extent of negative, undesirable effects observed in his study. Any powerful treatment has the potentiality of harming some patients, perhaps more so when applied by particular therapists. Hence, a report should include not only the central tendency or average of the effects, but also their full distribution from largest positive to largest negative.

AREAS OF NEEDED RESEARCH

Our discussions brought out a number of topics that should be studied to further the research work in this area. While several were mentioned earlier, these topics are noted here for the sake of emphasis.

Standard scales are needed for describing the independent variables, therapeutic treatments. Standard procedures for assessing the dependent variables must also be developed. There is a great need for devising measures of outcome that are unobtrusive, but of clear theoretical relevance. Of particular interest is the therapist's judgment of the outcome, both relative to initial status and objectives and in terms of absolute level of wellbeing. While the therapist has a unique vantage point, his judgments must be obtained by methods that minimize biases associated with his involvement and commitment to the patient, the treatment, and the underlying theory, as well as possible biases associated with his own personality and his judging behavior.

Scales are needed for assessing the therapist's belief in the efficacy of the treatment. His expectancies for the particular patient and his commitment to that patient must also be measured dependably. Does the opportunity for the therapist to select his patients contribute to his effectiveness? How does such selection interact with his expectancies? We also need to learn more about how patients view psychotherapy, what they expect, and how these expectations are developed in their contacts with the clinic before treatment starts. Exactly how do expectancies contribute to outcome?

The great specificity of complaints, deficits, strengths, and treatment objectives requires major theoretical and empirical attention. It would appear that the unit in psychotherapy research is the collection of subjects assessed on a particular outcome measure; it does not seem possible at this time to base findings on analysis of different variables for different groups. But for how large a group is a given measure relevant: all neurotics, all anxiety neurotics, or just those troubled with anxiety in a specific context?

In addition to the unsolved theoretical issues and the perplexities of design, there is the methodology of obtaining data on patient complaints and on therapist goals. Further, the changes in these two sets of objectives need further study, both in their own right and (at least for patient reports) as potential measures of outcome.

Finally, and perhaps only after much preliminary work on total treatment change, there is the exceedingly difficult matter of studying changes in the patient during the course of treatment: when do they occur and what is their course? Here, perhaps more than anywhere else, the field needs the development of unobtrusive measures to which the patient does not react.

This list of topics crying for investigation is clearly not exhaustive. Anyone who knows the current literature on psychotherapy and its effectiveness can readily add to it.

REFERENCES

AMERICAN PSYCHOLOGICAL ASSOCIATION. Standards for educational and psychological tests and manuals. Washington, D. C.: APA, 1966.

BATTLE, C., IMBER, S., HOEHN-SARIC, R., STONE, A., NASH, E., & FRANK, J. Target complaints as criteria of improvement. American Journal of Psychotherapy, 1966, 20, 184-192.

BEREITER, C. Some persisting dilemmas in measuring change. In C. Harris (Ed.), Problems in measuring change. Madison: University of Wisconsin Press, 1963.

CAMPBELL, D. T. From description to experimentation: Interpreting trends as quasi-experiments. In C. Harris (Ed.), Problems in measuring change. Madison: University of Wisconsin Press, 1963.

CAMPBELL, D. T., & STANLEY, J. C. Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963.

CRONBACH, L. J. The two disciplines of scientific psychology. American Psychologist, 1957, 12, 671-684.

DuBOIS, P. H., & MANNING, W. H. (Eds.) Methods of research in technical training. [Tech. Rep. No. 3, ONR, No. Nonr 812(02)] St. Louis: Washington University, 1961.

GOLDIAMOND, I., & DYRUD, J. E. Some applications and implications of behavioral analysis for psychotherapy. In J. Shlien (Ed.), Research in psychotherapy. Vol. III. Washington, D. C.: American Psychological Association, 1968.

GOLDSTEIN, A. P. Therapist-patient expectancies in psychotherapy. New York: Pergamon Press, 1962.

GOLDSTEIN, A. P., HELLER, K., & SECHREST, L. B. Psychotherapy and the psychology of behavior change. New York: Wiley, 1966.

HOEHN-SARIC, R., FRANK, J., IMBER, S., NASH, E., STONE, A., & BATTLE, C. Systematic preparation of patients for psychotherapy. 1. Effects on therapy, behavior and outcome. Journal of Psychiatric Research, 1964, 2, 267-281.

JONES, L. V., & FISKE, D. W. Models for testing the significance of combined results. Psychological Bulletin, 1953, 50, 375-382.

KIESLER, D. J. Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 1966, 65, 110-136.

LORD, F. M. Elementary models for measuring change. In C. Harris (Ed.), Problems in measuring change. Madison: University of Wisconsin Press, 1963.

LUBORSKY, L., CHANDLER, M., AUERBACH, A. H., COHEN, J., & BACHRACH, H. Factors influencing the outcome of psychotherapy: A review of the quantitative research. Psychological Bulletin, 1971, in press.

LUBORSKY, L., FABIAN, M., HALL, B., TICHO, E., & TICHO, G. Treatment variables. Bulletin of the Menninger Clinic, 1958, 22, 126-147.

ORNE, M. T., & WENDER, P. H. Anticipatory socialization for psychotherapy: Method and rationale. American Journal of Psychiatry, 1968, 124, 88-89.

PAUL, G. L. Behavior modification research: Design and tactics. In C. M. Franks (Ed.), Assessment and status of the behavior therapies and associated developments. New York: McGraw-Hill, 1969.

STRUPP, H., & BERGIN, A. Some empirical and conceptual bases for coordinated research in psychotherapy: A critical review of issues, trends, and evidence. International Journal of Psychiatry, 1969, 7, 18-90.

TUCKER, L. R., DAMARIN, F., & MESSICK, S. A base-free measure of change. Psychometrika, 1966, 31, 457-473.

WEBB, E. J., CAMPBELL, D. T., SCHWARTZ, R. D., & SECHREST, L. Unobtrusive measures: Nonreactive research in the social sciences. Chicago: Rand McNally, 1966.

The preceding paper is a reproduction of the following article (Fiske, D. W., Hunt, H. F., Luborsky, L., Orne, M. T., Parloff, M. G., Reiser, M. F., & Tuma, A. H. Planning of research on effectiveness of psychotherapy. American Psychologist, 1970, 25, 727-737.). It is reproduced here with the kind permission of the American Psychological Association © 1970. No further reproduction or distribution of this article is permitted without written permission of the publisher.