Orne, M. T., & O'Connell, D. N. Diagnostic ratings of hypnotizability. International Journal of Clinical and Experimental Hypnosis, 1967, 15, 125-133.

The International Journal of Clinical and Experimental Hypnosis 1967, Vol. XV, 125-133



Institute of the Pennsylvania Hospital and University of Pennsylvania

Abstract: A clinically derived system for the diagnostic rating of hypnotizability is described. 5 major categories of hypnotizability are differentiated. Diagnostic ratings are contrasted with standardized verbatim tests of hypnotic susceptibility. Potential usefulness in both laboratory and clinical settings is stressed.

The purpose of the present communication is to describe a system for diagnostically rating hypnotizability that has been in extensive laboratory use under a variety of conditions by persons of various backgrounds, including clinical psychology, experimental psychology, and psychiatry. While it is derived from clinical practice, it has proven to be sufficiently reproducible to be used in the experimental situation as well. The present description is intended to supplement a previous brief report by Orne (1959) and to present the rationale for the use of a diagnostic system of this type.

The measurement of hypnotic susceptibility has been made over the years in a variety of ways. During the 19th century, authorities made global judgments based on their own experience and the clinical tradition in which they had been trained. Later workers developed more standardized scales based on items of increasing difficulty. The development of these scales has been ably reviewed elsewhere (Hilgard, 1965; Hilgard, Weitzenhoffer, Landes, & Moore, 1961).

The older clinically derived scales were composed of lists of types of suggestion which formed the criteria of depth. Each type of suggestion comprised a class, the members of which could be chosen at will by the person using the scale. Thus, any of a wide range of visual hallucinations could be suggested. More recent scales have used specific items, usually given verbatim. There are important operational differences between these two procedures, and they lend themselves to different definitions of hypnotic susceptibility.

Manuscript submitted November 3, 1966.

1 This study was supported in part by grant AF-AFOSR 707-65, Air Force Office of Scientific Research.

2 The authors wish to thank their colleagues Julio M. Dittborn, Frederick J. Evans, Ulric Neisser, Emily C. Orne, Campbell W. Perry, Peter W. Sheehan, and Richard I. Thackray for their helpful comments in the preparation of this manuscript.




Scales composed of a series of specific verbatim items given under highly controlled conditions define susceptibility as the depth achieved under those conditions. They constitute a standard work sample of hypnotic performance (Hilgard, 1965). This is a standard psychometric procedure for defining a trait. It has a number of advantages, among them high reliability and wide applicability, since a minimum of specialized training is required for administration.

Scales of depth based on increasingly difficult classes of suggestion are more suited to the definition of susceptibility as the maximum depth that can be achieved under the most favorable conditions. This type of scale is essentially an achievement test after practice rather than a work sample under standard conditions. The assumption is made that sufficient practice has been provided for performance to become asymptotic. Such a scale has the disadvantage from the psychometric standpoint of being less easy to define, particularly since the depth of hypnosis achieved on any particular occasion is the result of interaction between basic hypnotic ability and the motivational and attitudinal factors present at the time of testing. In order for maximum depth to be achieved, these factors must be aligned toward the goal of achieving maximum depth. This may require a certain amount of clinical acumen.

A major reason for using a scale of this kind is to attempt to quantify hypnotic depth on the basis of clinical data as judged by experienced hypnotists. A rating is thereby obtained which is based upon an evaluation of S using hypnotic techniques best suited to the individual rather than restricting the hypnotist to a standard set of items which would not allow him to maximize the effect for a particular S. This is particularly important in the upper range of hypnotic depth.

Establishing Rapport

The intent of a diagnostic rating is to assess clinically the depth of hypnosis a S is able to achieve during a particular session. Therefore it is crucial that adequate rapport be established with S prior to any overt induction procedure. Depending on the individual and the situation, this may be achieved by a relatively short introductory conversation about neutral topics designed to put S at ease, or it may be necessary to spend a considerable period of time with him in order to discuss questions he might have about hypnosis, his response to it, the procedure, what it means, some half-formulated vague fears about the induction, etc. In some instances E may even spend a session or two establishing rapport prior to any induction procedure.

It should be emphasized that the focus of these sessions when working with a volunteer subject population, as opposed to a patient group, is on the establishment of rapport and dealing with anxieties about the specific situation rather than on the uncovering of personality patterns and basic conflicts or their interpretation, as would be the case in a clinical situation. The hypnotist is merely attempting to set up a situation where the individual is comfortable with him and willing to enter hypnosis. As much as possible he tries to create the conviction that S will be able to do this with him despite any past difficulties he might have had, emphasizing that entering hypnosis will in no way change the individual. Thus in the experimental context it is made very explicit that no therapeutic interventions will be carried out regardless of S's desires in this regard. Even if he requests help in concentrating on his studies, stopping smoking, or similar "superficial" problems, it is made clear that no such suggestions will be given. Elsewhere (Orne, 1965), the importance of clarifying the non-therapeutic context of experimental hypnosis has been emphasized.

In order to convey the conviction that hypnosis will indeed be possible, it is often useful to utilize a procedure such as the Chevreul pendulum, to which almost all Ss respond, and where the response tends to be subjectively impressive. Certainly, a S's failure to respond to the Chevreul pendulum should cause E to continue his attempt to establish a positive relationship prior to proceeding with any further induction. It should be clear, of course, that the adequate establishment of rapport is a necessary precondition to any hypnotic procedure. In the context of carrying out diagnostic ratings in large numbers of Ss, we have often found that a concerted, prolonged effort to establish an appropriate relationship is a more efficient utilization of E's time in inducing deep hypnosis than a similar length of time spent on a drawn-out induction procedure itself.


The Diagnostic Ratings

Diagnostic ratings are made on the basis of five categories, designated by the numbers 1 through 5 with plus and minus signs for distinctions within each category. The major categories and their main defining criteria are presented in Table 1. Essentially, they correspond to the major categories of the Davis-Husband scale (Davis & Husband, 1931).

A rating of 1 indicates that S is totally unresponsive to attempts to hypnotize him.3 He fails even the simplest kinds of ideomotor suggestions without, however, showing signs of overt resistance. A virtue of a diagnostic rating is that it allows an experienced hypnotist to evaluate whether S's poor response is due to lack of ability to respond or to negativism. Thus the rating of 1 is used only if it is felt that despite S's cooperation he is unable to have any hypnotic experiences. A rating of 1+ is made if S responds to motor suggestions with slight, often intermittent movements.

3 The rating of 1- is not used. The scale extends from 1 to 5+.

Table 1
Major categories of the diagnostic rating system a

Diagnostic rating    Defining criterion
1                    no response
2                    ideomotor response
3                    challenge response with subjective involvement
4                    hallucinatory response
5                    amnesia and true posthypnotic response

a The criteria are cumulative, a S receiving a given rating also showing the criteria of all lower ratings.

If there is evidence that S is showing resistance, no rating is given. This can be determined from interview material or from behavioral indications. Total lack of response on the Chevreul pendulum may provide such an indication. If hand levitation is used as an induction procedure, an overt indication of resistance is the appearance of slight, slow, upward movements counteracted by voluntary pushing down of the hand when S is aware that the suggestions are taking effect.

The rating of 2 indicates that S is able to respond to such ideomotor suggestions as arm levitation, eye closure, and relaxation but fails to respond positively to challenge suggestions, nor is he able to experience any hallucinatory suggestions, develop amnesia, or give posthypnotic responses. Whether he is rated 2, 2+, or 2- depends upon the degree of automatism described by S in his ideomotor responses. Thus a S who experiences arm levitation as something happening to his arm without any volitional effort would be rated as 2+ whereas one who experiences his response as largely cooperation and is unable to describe any feeling of dissociation is rated 2-. Similarly, the degree to which the individual feels relaxed, in a peculiar state, strange, etc., is integrated into the final rating.

The diagnostic rating of 3 is given when S is able to respond positively to the items included in the rating of 2 and also to challenge suggestions such as an inability to bend his arm, open his eyes, etc. He is unable to experience hallucinations with any degree of reality, develop amnesia, or show posthypnotic behavior. A practical way of evaluating the effectiveness of the hypnotic suggestion in producing a hallucination is to ask S to repeat the task in the waking state after the termination of trance and compare the waking with the hypnotic performance. Occasional Ss show very strong waking imagery. An extreme example of this is a S who has a degree of eidetic imagery, which can readily be mistaken for a striking visual hallucination during hypnosis if further inquiry is not made.

It is permissible for the rating of 3 to be assigned despite the ability to develop minor degrees of glove anesthesia and dryness of the mouth. However, really good responses to this type of suggestion place S in the diagnostic category of 4. Again, the rating of 3+, 3, or 3- depends upon the degree of reality value S attaches to challenge suggestions. The S who experiences a true inability to resist arm catalepsy, describes really trying to do so (having given evidence of so doing in his behavior), and also describes the feeling of strangeness, loss of time and place at some time during hypnosis, but is not quite able to develop good hallucinatory experiences would be a typical 3+. A S who becomes quite relaxed and during challenge suggestions describes the feeling of not wanting to bend his arm (and actually not bending his arm) but also the conviction that if he were really to try he probably could bend it, perhaps demonstrating a partial response to this kind of suggestion, would be characteristic of the 3 rating.

The diagnostic rating of 4 is given when an individual is able to develop all of the responses characteristic of the 3 rating and, in addition, can have adequate hallucinatory experiences when they are suggested. Typical suggestions successful at this stage include glove anesthesia with little or no response to minor painful stimuli, very difficult challenge items such as an inability to speak his own name during hypnosis (but not posthypnotically), positive visual or auditory hallucinations, hypnotic dreams, hearing only the hypnotist's voice, etc. Suggestions not successful at this stage include good posthypnotic amnesia, convincing age regression experiences, negative visual hallucinations, or similar very difficult items. Of all of these, the most crucial distinction between the 4 and 5 categories is the absence of posthypnotic amnesia. Again, whether S is rated 4+, 4, or 4- depends upon the vividness of his experience and how real it is to him.

The diagnostic rating of 5 is given to Ss who show posthypnotic amnesia, either suggested or non-suggested, plus good posthypnotic suggestion and other classic somnambulistic criteria. It is important to note that posthypnotic amnesia must be recoverable. This is obviously necessary to distinguish the active repressive quality of posthypnotic amnesia from lack of recall due to memory factors such as decay or retroactive inhibition. The recall of amnestic material upon the lifting of amnesia need not be total, but the majority of such material must be available.

Distinctions within the 5 category are extremely important. Most of the dramatic phenomena described in the literature are based on the relatively small number of Ss who fall within the upper limits of this range, perhaps on the order of 1 to 2% of the population.

A typical S rated 5- will show complete or nearly complete amnesia, convincing hallucinations, usually in all sense modalities, good naturalistic posthypnotic suggestion behavior, and some age regression phenomena. He would not, however, be likely to show really bizarre posthypnotic phenomena, posthypnotic positive hallucinations, apparent hypermnesia, clear negative visual hallucinations, time distortions, etc. These are the mark of the 5+ S.

These categories are defined not only by behavioral criteria but also by the subjective report of the S. Subjective report is mostly obtained after termination of hypnosis and lifting of any posthypnotic amnesia present but may also be obtained during hypnosis. This is, for example, the case when hallucination items are to be evaluated, where the reality of hallucination can be assessed while the hallucination is being experienced.


The diagnostic ratings described above conform closely to clinically derived definitions of hypnotic depth. The important demarcation points are the distinction between the 2 and 3 categories, which is defined by the presence or absence of subjective involvement, and the distinction between the 4 and 5 categories, which is defined by the presence or absence of amnesia. Both boundaries have long traditions. As Hilgard (1965) points out, many 19th century authorities divided the range of hypnotic depth on the basis of motor behavior versus perceptual or ideational effects (Gurney, Delboeuf, Dessoir, Hirschlaff) or in terms of amnesia (Bernheim, Liébeault).

It should be emphasized that the clinical evaluation of hypnotic depth is based in large part on the postexperimental interview with S, where the vividness, reality value, and overall subjective experience can be carefully evaluated. It is our view that the most characteristic aspect of hypnosis is the subjective experience and that the behavioral responses are meaningful only insofar as they reflect changes in subjective experience. The diagnostic ratings, then, are based upon behavior considered in the light of an evaluation of the subjective experience of the individual as elicited by postexperimental interview.

It has been our experience that once a plateau has been reached, which may be assessed by repeated testings, there is not likely to be any appreciable change in diagnostic rating. A S who is classified as 2+ will tend to remain in this category even with repeated training.



The speed with which he attains his maximum depth may increase with practice but his maximum depth does not; also, once depth is attained individual suggestions may take effect more rapidly. A S who may take 10 minutes to achieve complete arm levitation to the forehead on his initial diagnostic session may with practice be able to do so in a minute or less.

The diagnostic value of the present scale rests on the assumption that representative classes of suggestion can be used to predict hypnotic behavior on not only related but often qualitatively different classes of suggestion of equal (or lesser) difficulty. Thus a S who shows complete posthypnotic amnesia, and therefore rates in the 5 range, can with a high degree of confidence be relied upon to show, in addition, good hallucinations, etc. Conversely, a S who fails challenge suggestions will be most unlikely to show visual hallucinations. The assumptions underlying this diagnostic property are supported by extensive laboratory and clinical evidence. The fact that they still await detailed psychometric corroboration does not seriously decrease their present usefulness.

The criteria of each category are cumulative, i.e., any S passing a class of items of given difficulty will be able to pass any other class of items of equal or lesser difficulty. This may not be strictly the case in all instances, however. Items of equal difficulty within a class of items may not be passed with equal facility if for that particular S the idiosyncratic meaning of one of them produces resistance to that item. This is, of course, rare and can usually be ascertained by further clinical inquiry. Failure to respond to such an idiosyncratically meaningful item should not interfere with a S's rating if similar suggestions of comparable difficulty are passed.

An occasional rare S consistently fails to pass one type of suggestion even though passing others of greater difficulty, e.g., one S failed to show auditory hallucinations even though hallucinations in other modalities, including vision, were excellent, amnesia was complete, and posthypnotic suggestibility was very good. Such Ss are so rare as to be laboratory curiosities. They can either be given a tentative rating or be left unrated.
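The cumulative structure of the major categories can be sketched in code, purely as an illustration. The function name `major_category` and the criterion labels below are hypothetical shorthand, not terms fixed by the rating system. The sketch covers only the defining criteria of Table 1; it leaves out the plus and minus distinctions, which rest on clinical judgment of S's subjective report, and it assumes that resistance has already been ruled out. A S who passes a harder class of suggestion after failing an easier one is simply left unrated here, which is one of the two options mentioned above.

```python
# Defining criteria of the major categories, in order of increasing difficulty.
# The labels are illustrative shorthand (an assumption of this sketch).
CRITERIA = ["ideomotor", "challenge", "hallucination", "amnesia_posthypnotic"]


def major_category(passed):
    """Return the major category (1-5) implied by the set of passed criteria.

    Returns None when the pattern is not cumulative, i.e. when a harder
    criterion is passed although an easier one was failed.
    """
    rating = 1  # category 1: no response
    for i, criterion in enumerate(CRITERIA):
        if criterion in passed:
            rating += 1
        else:
            # A harder criterion passed after this failure breaks the
            # cumulative pattern; such a S is left unrated in this sketch.
            if any(c in passed for c in CRITERIA[i + 1:]):
                return None
            break
    return rating


print(major_category({"ideomotor", "challenge"}))      # cumulative pattern
print(major_category({"ideomotor", "hallucination"}))  # non-cumulative pattern
```

In practice the final rating would still depend on the postexperimental interview; the sketch only shows how the behavioral criteria nest.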

The idiosyncratic aspects of the hypnotic interrelationship may be exaggerated in the clinical situation. Patients apparently in light trance by laboratory standards may show extremely striking age regressions with heavy affect but exhibit no other phenomena considered typical of deep hypnosis. Depth of hypnosis may be less relevant in the clinical context to the amount of symptomatic improvement or therapeutic advance shown (Kline, 1955). Use of the present rating system, taking into consideration such idiosyncratic aspects of the clinical situation, should be of value in investigating more fully the relevance of hypnotic depth.

The distribution of hypnotizability evaluated by diagnostic ratings is not known for a randomly selected population. The population upon which the scale has been used has been heavily preselected, either on the basis of standardized test scores or on the basis of less formal group screening procedures. Only those Ss potentially at the high and low extremes of the distribution have been selected, in most instances, for further evaluation by means of diagnostic rating.

Inter-rater reliability is very high. A reliability coefficient of .98 has been found in a sample of 13 Ss based on judgments by two raters independently observing the same hypnotic sessions.4 This compares favorably with a coefficient of .96 reported in a previous study using similar ratings on a sample of 25 Ss (Shor, Orne, & O'Connell, 1962).

Inter-rating reliability, based on ratings of separate hypnotic sessions of the same S by two different raters, has been reported as .79 in a sample of 46 paired ratings (O'Connell, Orne, & Shor, 1966) and has been found to be .91 in a more recent sample of 111 Ss, which will be reported elsewhere.
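A reliability coefficient of this kind can be illustrated with a short sketch. Two points are assumptions made only for the example, since the text does not specify them: that the coefficient is a Pearson product-moment correlation, and that plus and minus signs are coded as one third of a category step. The function names `rating_to_score` and `pearson_r` are hypothetical, and the ratings shown are invented for illustration; they are not data from any of the samples reported here.

```python
import math


def rating_to_score(rating):
    """Map a rating such as '3+' to a number.

    The 1/3 step for plus and minus is an assumption of this sketch.
    """
    category = int(rating[0])
    modifier = rating[1:]
    if modifier not in ("", "+", "-") or not 1 <= category <= 5:
        raise ValueError(f"not a valid rating: {rating!r}")
    if rating == "1-":
        raise ValueError("the rating of 1- is not used")
    return category + {"": 0.0, "+": 1 / 3, "-": -1 / 3}[modifier]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented ratings of the same Ss by two raters, for illustration only.
rater_a = ["1+", "2", "2+", "3-", "4", "5-", "5+"]
rater_b = ["1+", "2", "3-", "3-", "4+", "5-", "5"]

r = pearson_r([rating_to_score(x) for x in rater_a],
              [rating_to_score(x) for x in rater_b])
print(round(r, 2))
```

A rank-order coefficient would be an equally defensible choice for ordinal ratings; the Pearson form is used here only because it is the simplest to show.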

In summary, some of the advantages of the diagnostic scale may be listed as follows:

1. The categories of the scale have pragmatic relevance in the experimental situation. When Ss extreme in hypnotizability are required in an experimental design, such as the double-blind design (Orne, 1959), the difference between 2+ and 3- and between 4+ and 5- can mean the difference between an adequately met design and an inadequately conducted experiment.

2. A greater stress is made on subjective involvement in the evaluation of hypnotic depth in addition to observed behavior. This is done both during hypnosis and in the postexperimental inquiry.

3. The use of the diagnostic categories allows a separation in practice of induction of hypnosis and evaluation of depth. This facilitates maximization of depth, which is difficult to achieve with standardized inductions.

4. The scale can be used repeatedly, thus further facilitating maximization. This is a problem with standardized scales, although alternate forms are now available for repeated inductions.

5. The scale allows differentiation in the upper range of hypnotizability, which is not possible with standardized tests that lack sufficient top.

6. The scale can be used in a clinical context as well as in the laboratory, particularly since induction and testing procedures can be tailored to the needs of both the hypnotherapist and the patient.

4 The authors wish to express their thanks to Julio Dittborn who made the independent observer ratings.



DAVIS, L. W., & HUSBAND, R. W. A study of hypnotic susceptibility in relation to personality traits. J. abnorm. soc. Psychol., 1931, 26, 175-182.

HILGARD, E. R. Hypnotic susceptibility. New York: Harcourt, Brace & World, 1965.

HILGARD, E. R., WEITZENHOFFER, A. M., LANDES, J., & MOORE, ROSEMARIE. The distribution of susceptibility to hypnosis in a student population: A study using the Stanford Hypnotic Susceptibility Scale. Psychol. Monogr., 1961, 75, No.8 (Whole No. 512).

KLINE, M. V. Theoretical and conceptual aspects of psychotherapy. In M. V. Kline (Ed.), Hypnodynamic psychology: An integrative approach to the behavior sciences. New York: Julian Press, 1955. Pp. 75-203.

O'CONNELL, D. N., ORNE, M. T., & SHOR, R. E. A comparison of hypnotic susceptibility as assessed by diagnostic ratings and initial standardized test scores. Int. J. clin. exp. Hypnosis, 1966, 14, 324-332.

ORNE, M. T. The nature of hypnosis: Artifact and essence. J. abnorm. soc. Psychol., 1959, 58, 277-299.

ORNE, M. T. Undesirable effects of hypnosis: The determinants and management. Int. J. clin. exp. Hypnosis, 1965, 13, 226-237.

SHOR, R. E., & ORNE, EMILY C. The Harvard Group Scale of Hypnotic Susceptibility, Form A. Palo Alto, Calif.: Consulting Psychologists Press, 1962.

SHOR, R. E., & ORNE, EMILY C. Norms on the Harvard Group Scale of Hypnotic Susceptibility, Form A. Int. J. clin. exp. Hypnosis, 1963, 11, 39-47.

SHOR, R. E., ORNE, M. T., & O'CONNELL, D. N. Validation and cross-validation of a scale of self-reported personal experiences which predicts hypnotizability. J. Psychol., 1962, 53, 55-75.


Evaluaciones Diagnósticas de la Hipnotizabilidad

Martin T. Orne y Donald N. O'Connell

Resumen: Descríbese un sistema derivado de la clínica para evaluar la hipnotizabilidad, el cual divide tal rasgo entre 5 categorías, cada una de ellas de una cierta amplitud. Tales evaluaciones se contrastan con la susceptibilidad a la hipnosis lograda por medio de tests literales bien estandarizados. Insinúanse los posibles usos experimentales y clínicos del procedimiento.

Diagnostische Hypnotisierbarkeitsbewertungen

Martin T. Orne und Donald N. O'Connell

Abstrakt: Ein von klinischen Erfahrungen stammendes System für die diagnostische Bewertung der Hypnotisierbarkeit ist hier erläutert. Es wird zwischen 5 wichtigen Hypnotisierbarkeitskategorien unterschieden. Diagnostische Bewertungen stehen im Kontrast zu festgelegten, wortgetreuen Hypnotisierbarkeitstesten. Es wird mit Nachdruck auf die Anwendungsmöglichkeit im Laboratorium sowie im klinischen Rahmen hingewiesen.

The preceding paper is a reproduction of the following article (Orne, M.T., & O’Connell, D.N. Diagnostic ratings of hypnotizability. The International Journal of Clinical and Experimental Hypnosis, 1967, 15, 125-133.). It is reproduced here with the kind permission of the Editor-in-Chief of the International Journal of Clinical and Experimental Hypnosis.