AN UPDATED META-ANALYSIS OF POST-PRL ESP-GANZFELD EXPERIMENTS: THE EFFECT OF STANDARDNESS

John Palmer

Institute for Parapsychology

The controversy about scientific evidence for psi phenomena has raged unabated for over a century. In recent years, much of the debate has focused on a relatively large number of free-response ESP experiments using a short-term sensory isolation procedure called the ganzfeld. The controversy began over 20 years ago when Charles Honorton (1978) conducted a survey of 42 previously reported ganzfeld experiments of which 23 (55%) reported statistically significant results. Honorton’s conclusions were challenged by critic Ray Hyman (1983, 1985), and Honorton (1983, 1985) responded. The combatants subsequently came together to publish a "joint communiqué" in which they agreed that the results could not be attributed to chance, but disagreed as to whether they could be explained by methodological artifacts (Hyman & Honorton, 1986).

Honorton proceeded to conduct a successful series of automated ganzfeld experiments (autoganzfeld) in which he eliminated the possible artifacts pointed out by Hyman (Honorton et al., 1990). The ganzfeld gained considerable notoriety when these experiments were summarized, along with a meta-analysis of the earlier studies, in a prestigious mainstream psychology journal, Psychological Bulletin (Bem & Honorton, 1994). The authors noted that results of Honorton’s automated ganzfeld studies fell within the confidence limits of, and were in fact quite similar to, the results of the earlier studies. A debate between Hyman (1994) and Bem (1994) on the possibility of methodological and statistical artifacts followed.

Bem and Honorton (1994) ended their paper by noting that conclusions about the broader replicability of the ganzfeld effect await results obtained by other investigators in the future. This observation inspired Julie Milton and Richard Wiseman to conduct a meta-analysis of 30 new ganzfeld studies begun after 1987 and published by February 1997. They found that the new studies yielded overall chance results, with a Stouffer z of 0.70, p = .24, and a mean effect size of .013. These results were also reported in Psychological Bulletin (Milton & Wiseman, 1999).

The Milton/Wiseman meta-analysis has been subjected to criticism, much of which surfaced in a debate originally conducted over the Internet and edited for publication by Schmeidler and Edge (1999). The most important of these criticisms is that Milton and Wiseman failed to note that the results were significantly heterogeneous and, in the opinion of the critics, this heterogeneity invalidated the analysis. Specifically, they claimed that several studies contributing negative z-scores used non-standard ganzfeld procedures and should not have been included with the others. Presumably, the elimination of these studies would raise the Stouffer z and mean effect size to significant levels more in keeping with earlier ganzfeld results. Milton maintained that the methods used in these studies were in fact standard, and added that a successful study published after the deadline for their meta-analysis did not use standard procedures (cf. Symmons & Morris, 1997).

The significant heterogeneity in the database suggests that some sort of procedural elements must have influenced the results, and departures from the standard ganzfeld protocol are certainly one candidate. Unfortunately, most of the judgments of non-standardness presented heretofore are susceptible to bias because they were made with knowledge of the experimental outcomes. Inspired by a suggestion from Richard Broughton, I decided to explore the possibility of obtaining non-biased, or at least less biased, estimates. Dr. Daryl Bem found graduate students from his psychology department at Cornell who agreed to do the necessary evaluations. These evaluations were based on the method sections of the relevant reports, sans results.

This exercise also provided an opportunity to update the Milton/Wiseman meta-analysis by including studies that had been published since February 1997. Thus, the method sections from these reports were provided to the raters as well.

Method

Publication Sources

The following publications were surveyed to find ganzfeld experiments reported after those analyzed by Milton and Wiseman (1999), up to the present survey, which began in January 2000. They were, in alphabetical order, European Journal of Parapsychology: 1997 and 1998; Journal of the American Society for Psychical Research: January 1997 to July 1998; Journal of Parapsychology: December 1996 to June 1999; Journal of Scientific Exploration: Spring 1997 to Winter 1999; Journal of the Society for Psychical Research: April 1997 to October 1999; and Proceedings of the Parapsychological Association: 1997, 1998, and 1999. In keeping with Milton and Wiseman (1999), experimental series segregated as such within a given report were treated separately, although experimental conditions within a given series were not. Using this criterion, 10 additional studies were found from six reports (Alexander & Broughton, 1999; Dalton, 1997; Parker & Westerlund, 1998; Symmons & Morris, 1997; Wezelman & Bierman, 1997; Wezelman, Gerding, & Verhoeven, 1997). All studies described as using some sort of ganzfeld procedure were included; no efforts were made at this stage to identify non-standard procedures.

When reports appeared both in the Proceedings of the Parapsychological Association and a journal, the journal publication was chosen. This led to two changes from the sources used by Milton and Wiseman (1999). The journal paper by Broughton and Alexander (1997) replaced the Proceedings paper by Broughton and Alexander (1996). The journal paper by Parker, Frederiksen, and Johansson (1997) replaced the Proceedings paper by Johansson and Parker (1995). The journal papers were not available when Milton and Wiseman did their meta-analysis. These substitutions did not affect the statistical outcomes reported by Milton and Wiseman (1999) for these studies.

Raters

The raters were three advanced graduate students in psychology at Cornell University, selected by Bem.

Preparation of Rating Materials

Editing. Method sections were photocopied from each of the experimental reports and edited to eliminate the following three types of material:

Paragraphs were eliminated through a scissors-and-paste procedure, whereas sentences were eliminated by covering them with a black magic marker. The resulting text was then photocopied again.

There were only 36 separate method sections, because of four instances in which the methods were identical for two separate series.

On top of each method section was stapled one or more rating sheets (depending on the number of series described in each section). The rating sheet consisted of a 7-point Likert-type scale with "standard" and "non-standard" defining the left and right poles, respectively. Underneath the scale were ten blank spaces in which the raters could specify methodological deviations that influenced their ratings.

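The raters were also provided with material to define a standard ganzfeld. This material consisted of two texts: the section entitled "The Ganzfeld Procedure" from Bem and Honorton (1994), and the method section describing the PRL autoganzfeld series (Honorton et al., 1990).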

Finally, Bem instructed the raters not to consult with one another about their ratings, that is, to make them independently.

Randomization. The plan was to have the experimental series evaluated in random order. This objective could not be completely met because the method sections for some series referred the reader back to the method sections describing previous series by the same experimenters. This required that some series be bundled together, as follows:

Each of the 36 method sections received its own code number, selected by random permutation using a computer program Palmer wrote in Turbo Basic. Code numbers were written on the upper left corner of each rating sheet by an assistant not otherwise involved in the study. Then, three additional random permutations of the numbers 1-20 were generated, and the assistant used them to order the sequence of packets differently for each rater. Palmer remained blind to these orders. The assistant then placed each set of materials in an envelope marked 1, 2, or 3 on the outside and gave the envelopes to Richard Broughton, who confirmed that they were properly assembled. He sealed the envelopes, which were subsequently mailed to Bem for distribution to the raters.
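The original Turbo Basic program is not reproduced in the report. Purely as an illustration of the randomization scheme just described (one permutation for the section code numbers, then one packet order per rater), a minimal Python sketch might look like this; the function name and seed are hypothetical.

```python
import random


def assign_codes_and_orders(n_sections=36, n_packets=20, n_raters=3, seed=None):
    """Hypothetical sketch of the blinding scheme described above.

    Returns a random code number for each method section and, for each
    rater, a separate random order in which to present the packets.
    """
    rng = random.Random(seed)
    # One permutation of 1..n_sections: section i receives code codes[i].
    codes = rng.sample(range(1, n_sections + 1), n_sections)
    # One further permutation of 1..n_packets per rater fixes that rater's packet order.
    orders = [rng.sample(range(1, n_packets + 1), n_packets) for _ in range(n_raters)]
    return codes, orders


codes, orders = assign_codes_and_orders(seed=2000)
print("codes for sections 1-5:", codes[:5])
for rater, order in enumerate(orders, start=1):
    print(f"rater {rater} sees packets in order:", order[:5], "...")
```

Keeping the code assignment and the per-rater orders as separate permutations mirrors the division of labor in the text: the person generating the codes need not know which order any rater received.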

Instructions to Raters

The instructions were emailed to Bem, who downloaded them and gave them to the raters along with the sealed envelopes. They are reproduced verbatim below:

We are engaged in a meta-analysis of ESP experiments using an altered-states induction procedure called the ganzfeld. We wish to set apart studies that used non-standard methodologies, but it is a matter of judgment when the deviance from the standard ganzfeld method is great enough to merit the label of non-standard. We do not want to make this judgment ourselves, as we might be biased by our knowledge of the experimental outcomes.

This is where you can help us. In the enclosed envelope you will find 20 packets containing 36 method sections reproduced from reports describing 40 ESP-ganzfeld experiments or experimental series. They are complete, except for the removal of most sub-sections describing psychological tests and hypotheses, and references to results of other experiments that are included in the set. The method sections are arranged in random order, except for those that are included in the same report or that refer the reader to an earlier report for details not repeated in the later report. In such cases, descriptions given in the first method section will apply to some degree to series described in the subsequent method sections. These sections are stapled together to form single packets. All the packets are numbered 1 - 20 in the upper-right corner and should appear in this order in your envelope.

In front of each method section you will find one or more 7-point rating scales with "standard" and "non-standard" at the poles. Your task is to circle the number that corresponds most closely to your estimate of the degree to which the method for the series in question deviates from the standard. Your ratings need not fall in any particular distribution or cover any particular range. At the bottom of the sheet, you will be asked to list each deviant element you found in the protocol that you considered important enough to influence your rating. (Standardness and importance will be defined later in these instructions.) Read over all the method sections first, to get a sense of the range of procedures utilized. Then go back and read each one again carefully and give it a rating. It is important that for each run-through you read them in the order they arrived in the ENVELOPE. ALSO, DO NOT REMOVE ANY OF THE STAPLES.

If there are two or more experimental conditions, all of which involve the ganzfeld, base your rating on the average standardness of the conditions. If a ganzfeld and non-ganzfeld condition are compared, base your rating on the ganzfeld condition, except that in a within-subjects design you can consider any impact the non-ganzfeld condition might (or might not) have had on scoring in the ganzfeld condition.

Of course, to rate series for standardness you need a description of what a standard ganzfeld is. This information is supplied in the packet labeled "STANDARD" inside your envelope. It consists of two sections. The section entitled "The Ganzfeld Procedure" comes from an article co-authored by Dr. Bem and published in Psychological Bulletin. It specifies the main ingredients of the standard ganzfeld method, and these elements must be included in any ganzfeld procedure if it is to be considered purely standard. You will note that for a few procedural elements the section says that they are used "most often", "typically", or something to that effect. In these instances, the opposite procedure can still be considered standard. For example, the page states that "most often" the procedure includes a sender (telepathy). However, the minority of studies that did not use a sender (clairvoyance) can still be considered standard. Deviant elements can either be substitutes for standard elements or additions to them.

The second section comes from a paper describing a prototypical ganzfeld experiment that served as the model for many of the studies you will be rating. This experiment is often referred to in the to-be-rated method sections as "the PRL experiment" or "the Honorton experiment". Experiments need not conform to all the details of this protocol to be considered standard, but procedures cited in this section should not be considered non-standard if they are incorporated in the studies you will be rating. (Note: One feature of the PRL experiment not mentioned in its methodological description is that the experimenter, while still blind to the target, sometimes helped the subject do the judging.)

You should take note of authors’ declarations that their procedures were standard or non-standard, but you are not bound by such declarations.

You should treat as standard the use of artistic or "creative" subject samples (as one of the most successful components of the PRL experiment used such a sample) or subjects having had previous psi experiences or having practiced a mental discipline such as meditation (as such subjects were shown to be the best scorers in the PRL experiment).

There are a few kinds of deviations you should not count at all. Do not pay attention to psychological tests that might have been given to the subjects, unless they are given while the subject is actually in the ganzfeld or influence the selection of subjects. Even in these cases it is up to you to decide how much, if any, such factors make the method non-standard. Also, do not consider sample size or the method of statistical analysis. Finally, do not count deviations the only effect of which is to influence the likelihood of artifacts, such as sensory leakage of the target information. Such deviations are important in the broader scheme of things, but not for this exercise.

You should base your judgment of standardness not only on the number of deviant elements but also on their importance. Judgments of importance should reflect how likely you think it is that the deviant element might have influenced the results, based on common sense and your understanding of how such judgments are made for other kinds of psychology experiments. In so doing, you should pay attention to the rationale or theory parapsychologists have developed to explain why the ganzfeld should facilitate high ESP scores (although lack of such relevance does not preclude a deviant element from being important). You will find that the Psychological Bulletin article discusses this rationale.

You will note that the method sections come in a variety of typefaces. Some come from journal articles while others come from unedited manuscripts accepted for presentation at the annual convention of the Parapsychological Association, parapsychology’s equivalent to the APA. Many of these latter papers will subsequently appear in professional journals. In some cases, you may be able to infer from the typeface and presence of typos whether the paper is a journal article or part of the convention Proceedings. The point here is that the quality or characteristics of the studies are not related to the publication source, so this variable should have no influence whatsoever on your ratings.

Our hope is that you will be able to complete your ratings without asking us for help. However, if you find it is essential to clarify something in order to give a meaningful rating, your question should be directed to Dr. Bem.

After you have finished rating all the method sections, please fill out the Summary Sheet, which asks you to simply copy the numerical rating you gave to each series, DEFINED BY ITS CODE NUMBER. (Code numbers appear in the upper left corner of each rating sheet.) We will use this information to determine how to analyze the data, and it is important that while making this decision we are blind to which ratings correspond to which particular studies. (We do not now know which code numbers go with which method sections.) Please make sure not to write anything on the Summary Sheet that could compromise this blind. Then replace the packets back in the envelope, and return the envelope and Summary Sheet separately to Dr. Bem.

 

Results

The Basic Update

Table 1 presents the z-scores and effect sizes for all 40 studies in the sample, and the meta-analyses are summarized in Table 2. The figures from Milton and Wiseman (1999) were accepted for the 30 studies in their analysis, and the procedures they used were duplicated to the extent possible for the ten new studies. In cases where the number of direct hits was reported, an exact binomial probability was computed and converted to a one-tailed z-score. There were three studies (Symmons & Morris, 1997; Wezelman & Bierman, 1997, Series V and VI) in which hits were reported for both subject judges and outside judges. In these cases, z-scores were computed for both counts and averaged. This was the procedure Milton and Wiseman (1999) apparently used in the most comparable case from their survey (McDonough et al., 1994). In the Serial Series of Parker and Westerlund (1998), the total number of hits for the 30 subjects, averaged over the four trials per session, was calculated to be 6.75, and the binomial probability of this value was obtained by interpolating .75 of the way between the probabilities for 6 and 7 hits. Effect sizes were calculated using the formula employed by Milton and Wiseman (1999), z/√N (hereafter labeled ES).
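The per-study computations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code actually used: it assumes the "exact binomial probability" is the upper tail P(X ≥ hits) with a chance hit rate of 1/4, that the fractional 6.75 case uses straight linear interpolation between the tail probabilities for 6 and 7 hits, and that N in ES = z/√N is the number of trials. All function names are hypothetical; passing p_chance=0.5 would cover the binary-hit studies discussed later.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_upper_p(hits, trials, p_chance=0.25):
    """Exact one-tailed probability of `hits` or more direct hits by chance."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def p_to_z(p):
    """Convert a one-tailed p (0 < p < 1) to a standard normal z."""
    return -NormalDist().inv_cdf(p)


def interpolated_upper_p(hits, trials, p_chance=0.25):
    """Binomial p for a fractional hit count, e.g. 6.75, by linear interpolation."""
    lo = int(hits)
    frac = hits - lo
    p_lo = binomial_upper_p(lo, trials, p_chance)
    p_hi = binomial_upper_p(lo + 1, trials, p_chance)
    return p_lo + frac * (p_hi - p_lo)


def effect_size(z, n):
    """ES = z / sqrt(N), with N assumed to be the number of trials."""
    return z / sqrt(n)


# Hypothetical study: 12 direct hits in 30 sessions at a chance rate of 1/4.
z = p_to_z(binomial_upper_p(12, 30))
print(round(z, 2), round(effect_size(z, 30), 3))

# Fractional case like the Parker and Westerlund (1998) Serial Series.
z_serial = p_to_z(interpolated_upper_p(6.75, 30))
```

For studies with both subject and outside judging, the two z-scores produced this way would simply be averaged, as described in the text.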

The 10 new ganzfeld studies yielded more positive results than the 30 reported by Milton and Wiseman (1999). In contrast to their Stouffer z of 0.70, the Stouffer z for the new studies is 3.98, p = 3.5 × 10⁻⁵. This value is inflated by a z of 5.20 assigned to the experiment of Dalton (1997), which also had the largest sample size of the entire 40. The mean effect size for the 10 new studies, ES = .165, does not quite differ significantly from that in the earlier sample, ES = .013; U = 101.0, p = .125.

The most important results are those of the combined 40 studies, which represent the current status of ganzfeld replications. The Stouffer z is 2.59, which is now significant, p = .0048. The mean ES is .051, which still falls outside the confidence interval of .117 to .408 for the 28 direct-hit manual ganzfeld studies, as well as the .059 to .269 confidence interval for the 10 autoganzfeld series (cf. Bem & Honorton, 1994).
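As a rough arithmetic check, the combined figures can be recovered from the two subsamples: multiplying each subsample's Stouffer z by √k gives back its sum of z-scores, and mean ES values pool in proportion to the number of studies. The Python sketch below is illustrative only; the function names are hypothetical, and it uses nothing beyond the summary figures reported above.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(zs):
    """Unweighted Stouffer z: sum of study z-scores divided by sqrt(k)."""
    return sum(zs) / sqrt(len(zs))


def pool_stouffer(groups):
    """Combine subsample Stouffer z's; each group is (stouffer_z, k)."""
    total_z = sum(z * sqrt(k) for z, k in groups)  # recover each sum of z's
    total_k = sum(k for _, k in groups)
    return total_z / sqrt(total_k)


def pool_mean_es(groups):
    """Combine subsample mean ES values; each group is (mean_es, k)."""
    return sum(es * k for es, k in groups) / sum(k for _, k in groups)


def one_tailed_p(z):
    """Upper-tail normal probability for a Stouffer z."""
    return 1 - NormalDist().cdf(z)


combined_z = pool_stouffer([(0.70, 30), (3.98, 10)])
combined_es = pool_mean_es([(0.013, 30), (0.165, 10)])
print(round(combined_z, 2), round(one_tailed_p(2.59), 4), round(combined_es, 3))
```

With the reported subsample values this gives a combined z of about 2.60 and a mean ES of .051, in line with the figures above; the small difference from 2.59 reflects rounding of the inputs.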

The heterogeneity of the z-scores for the 30 studies examined by Milton and Wiseman (1999) is represented by χ²(30) = 46.17, p = .030. Adding the 10 new studies markedly increases it, χ²(40) = 93.48, p = 3.6 × 10⁻⁶.
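The heterogeneity formula is not spelled out here, so the following Python sketch is only one plausible version: it computes the common Rosenthal-style statistic, the sum of squared deviations of the study z-scores from their mean, and evaluates its upper-tail probability using closed-form chi-square expressions for integer degrees of freedom. Because the degrees of freedom reported above equal the number of studies, the sketch takes df as an explicit argument rather than assuming k − 1. Function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def heterogeneity_chi2(zs):
    """Sum of squared deviations of the study z-scores from their mean."""
    zbar = sum(zs) / len(zs)
    return sum((z - zbar) ** 2 for z in zs)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return exp(-x / 2) * total
    # Odd df: 2Q(chi) + 2 phi(chi) * sum_{k=1}^{(df-1)/2} chi^(2k-1) / (1*3*...*(2k-1))
    chi = sqrt(x)
    total = erfc(chi / sqrt(2))  # equals twice the upper normal tail at chi
    phi = exp(-x / 2) / sqrt(2 * pi)
    term = chi  # the k = 1 term of the series
    for k in range(1, (df - 1) // 2 + 1):
        total += 2 * phi * term
        term *= x / (2 * k + 1)
    return total


print(round(chi2_sf(46.17, 30), 3))
```

Evaluated at the reported statistic with 30 degrees of freedom, the tail probability comes out close to the p = .030 given above.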

Mismatch on Direct Hits

The comparison between the post-PRL ganzfeld studies and the earlier manual series summarized by Bem and Honorton (1994) is not entirely valid because the earlier meta-analysis was restricted to those studies that reported the proportion of direct hits (DH), which was 28 out of 42. Honorton (1985) isolated these DH studies to counter a criticism by Hyman (1985) that Honorton’s original meta-analysis was biased because it included several different kinds of ESP scores, some of which may have been selected post-hoc.

Although Honorton’s move was justified as a response to Hyman’s critique, it left a truncated sample of studies that may not be representative of the whole. Thus, it was decided to calculate z-scores for the non-DH studies. For the three studies that used binary hits (Braud & Wood, 1977; Habel, 1976; Parker, 1975), the same procedure was used as for the DH studies, except that .25 was changed to .50 for calculating the exact probabilities. For the four studies that used binary coding of ten target content categories to arrive at an ESP score (Rogo, 1977; Smith, Tremmel, & Honorton, 1976; Terry, 1976; Terry et al., 1976), the p-value of the t-test based on the mean number of matches (MCE = 5) was converted to a z-score. In the Rogo (1977) study, the t was computed from raw data available in the report. For Terry (1976), an estimate of the standard deviation was necessary to compute the t; this estimate was taken as the average of the standard deviations from the other three studies in this subsample, which were all close to one another. For Dunne, Warnock, and Bisaha (1977), the mean rank scores of Parts A and B were averaged, and the exact probability of this average was found using the tables of Solfvin, Kelly, and Burdick (1978). For three other studies (Keane & Wells, 1979; Roney-Dougal, 1982; Stanford & Neylon, 1975), single-mean t-tests of continuous ESP scores provided the p-values that were converted to z-scores. Finally, three studies provided too little information on which to base an appropriate estimate (Palmer, Whitson, & Bogart, 1980; Parker, Millar, & Beloff, 1977; Stanford, 1979). In all three cases, the mean ESP score was close to chance.

The 28 DH studies had a mean ES of .263, compared to .055 for the 11 other studies for which effect sizes were calculated. The difference, although substantial, was not quite significant, U = 204.0, p = .119. The mean ES for all 39 studies is .204, with a confidence interval from .080 to .328. The mean ES for the 40 post-PRL studies, .051, still falls outside this interval.
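The interval comparisons used throughout this section can be sketched as follows. How the confidence intervals were originally computed is not stated here, so this Python sketch assumes a normal-approximation 95% interval around the unweighted mean ES (a t-based interval would be somewhat wider for small k); the function names are hypothetical. The last lines check only the reported summary figures.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_es_ci(effect_sizes, level=0.95):
    """Normal-approximation CI for the mean ES across studies (an assumed method)."""
    k = len(effect_sizes)
    m = mean(effect_sizes)
    se = stdev(effect_sizes) / sqrt(k)  # between-study standard error of the mean
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m - crit * se, m + crit * se


def falls_within(value, interval):
    """True if value lies inside the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi


# Pooled pre-PRL mean from the subgroup means reported above: 28 DH, 11 non-DH.
pooled = (28 * 0.263 + 11 * 0.055) / 39
print(round(pooled, 3))                     # reported as .204
print(falls_within(0.051, (0.080, 0.328)))  # post-PRL mean vs. pre-PRL interval
```

Pooling the two subgroup means by their study counts reproduces the reported .204, and the post-PRL mean of .051 lies below the lower limit of .080.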

An alternative approach to comparing the pre- and post-PRL ganzfeld studies is to exclude from the post-PRL sample those studies that did not include a direct hit measure in the primary report, and compare the remainder to the pre-PRL DH studies. There were six post-PRL series from five reports that did not report direct hits (Kanthamani et al., 1988; Kanthamani & Broughton, 1990; Kanthamani & Khilji, 1992; Parker & Westerlund, 1998 [Serial Series]; Stanford & Frank, 1991). These series had a mean ES of -.135, compared to .084 for the 34 DH studies. The difference once again approached significance, U = 151.5, p = .060. Thus, in both the pre- and post-PRL samples, studies that did not report direct hits had markedly inferior results. The mean ES of the 34 post-PRL DH studies was .084, which still falls outside the confidence interval of .117 to .408 for the manual ganzfeld DH studies but inside the confidence interval of .059 to .269 for the 10 autoganzfeld series. The mean hit rate for the 34 post-PRL DH studies is 30.1%, compared to 38.4% for the manual ganzfeld database and 34.4% for the autoganzfeld.
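The Mann-Whitney comparisons reported in this section can be sketched in Python as below. This is an illustrative reconstruction, not the original analysis: it computes U from average ranks and converts it to a p-value with a two-tailed normal approximation and no tie correction. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample a: rank sum of a minus its minimum possible value."""
    ranks = average_ranks(list(a) + list(b))
    n1 = len(a)
    return sum(ranks[:n1]) - n1 * (n1 + 1) / 2


def u_two_tailed_p(u, n1, n2):
    """Two-tailed normal-approximation p for U (no tie correction)."""
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Reported comparisons: (U, n1, n2)
for u, n1, n2 in [(101.0, 30, 10), (204.0, 28, 11), (151.5, 6, 34)]:
    print(u, round(u_two_tailed_p(u, n1, n2), 3))
```

These come out at roughly .126, .119, and .061, close to the reported .125, .119, and .060. This suggests, but does not establish, that two-tailed p-values were reported; a tie correction would shrink them slightly.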

Standardness of Ganzfeld Test Procedures

Cronbach’s α for the ratings of the three raters was .78, which is large enough to justify pooling. The mean of the three sets of ratings on the 7-point scale (with 7 meaning maximally non-standard) was 2.69. Examination of the histogram revealed a generally continuous distribution with no clear breaks. It had been decided in advance that in this case the mean ratings would serve as weights to be applied to the ESP scores to produce a revised meta-analysis taking account of standardness. (For this purpose, each mean rating was subtracted from 8, so that the most standard studies received the highest weights.) As a secondary analysis, the studies were divided into discrete standard and non-standard groups using the scale midpoint of 4 as the cutoff. (Three studies in which the mean fell exactly at 4 were eliminated from these analyses.)
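The reliability check and the two ways of using the pooled ratings can be sketched as follows. This Python sketch assumes the usual formula for Cronbach's α with raters treated as items and studies as cases; the function names and the small rating matrix are hypothetical, not data from this study.

```python
from statistics import pvariance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters as items and studies as cases.

    ratings_by_rater: one equal-length list of ratings per rater.
    """
    k = len(ratings_by_rater)
    totals = [sum(col) for col in zip(*ratings_by_rater)]  # per-study sum over raters
    item_var = sum(pvariance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def standardness_weights(ratings_by_rater):
    """Mean rating per study, and weights of 8 minus that mean."""
    means = [sum(col) / len(col) for col in zip(*ratings_by_rater)]
    return means, [8 - m for m in means]


def classify(mean_rating, cutoff=4.0):
    """Split at the scale midpoint; studies exactly at the cutoff are excluded (None)."""
    if mean_rating < cutoff:
        return "standard"
    if mean_rating > cutoff:
        return "non-standard"
    return None


# Hypothetical ratings from three raters for five studies (1 = standard, 7 = non-standard).
ratings = [
    [1, 2, 5, 4, 6],
    [2, 2, 4, 4, 7],
    [1, 3, 5, 4, 6],
]
print(round(cronbach_alpha(ratings), 2))
means, weights = standardness_weights(ratings)
print([classify(m) for m in means])
```

Because each mean is a sum of three integer ratings divided by 3, a study rated 4, 4, 4 (or any combination summing to 12) lands exactly on the cutoff and is dropped, matching the exclusion rule in the text.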

The degree of standardness is positively and significantly correlated with ES, Spearman rs = .361, p = .022. Thus, the expectation that the studies using the most standard methodology would produce the most positive ESP results is confirmed. The same finding holds when the studies are divided into standard and non-standard subgroups: the mean ES of the 28 standard studies is .104, compared to -.104 for the non-standard studies, U = 212.0, p = .017.
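A Spearman coefficient is simply a Pearson correlation computed on ranks. The Python sketch below assumes standardness is expressed as the reversed score (8 minus the mean rating), so that a positive rs means that more standard studies scored higher; the function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: mean ratings (7 = non-standard) and effect sizes.
mean_ratings = [1.33, 2.00, 3.67, 5.00, 6.67]
es = [0.21, 0.15, 0.02, -0.05, -0.12]
standardness = [8 - r for r in mean_ratings]  # reverse so higher = more standard
print(round(spearman_rs(standardness, es), 3))
```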

Weighting by standardness scores increases the Stouffer z of the 40 post-PRL studies from 2.59 to 3.43, and the mean ES from .051 to .078. This new ES falls just outside the confidence interval of .080 to .328 for the 39 pre-PRL manual ganzfeld studies and within the interval of .059 to .269 for the 10 PRL autoganzfeld series. Using the categorical approach, the Stouffer z for the 28 post-PRL studies classified as standard is 3.70. The mean ES of .104 falls well within the confidence intervals for both of the preceding databases.
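The weighting formula is not given in the text. One standard choice, assumed here, is the weighted Stouffer z, Σwz/√(Σw²), together with a weighted mean ES, Σw·ES/Σw, where w is the reversed standardness score. With equal weights both reduce to the unweighted statistics. This Python sketch uses hypothetical function names and made-up study values.

```python
from math import sqrt


def weighted_stouffer(zs, weights):
    """Weighted Stouffer z: sum(w * z) / sqrt(sum(w ** 2))."""
    num = sum(w * z for w, z in zip(weights, zs))
    return num / sqrt(sum(w * w for w in weights))


def weighted_mean_es(effect_sizes, weights):
    """Weighted mean effect size: sum(w * ES) / sum(w)."""
    return sum(w * e for w, e in zip(weights, effect_sizes)) / sum(weights)


# Hypothetical studies: z-scores, effect sizes, and mean standardness ratings.
zs = [1.8, 0.9, -0.7]
es = [0.25, 0.12, -0.10]
mean_ratings = [1.67, 3.00, 6.67]
weights = [8 - r for r in mean_ratings]  # most standard studies weigh most
print(round(weighted_stouffer(zs, weights), 2), round(weighted_mean_es(es, weights), 3))
```

Passing sample sizes as the weights gives one form of sample-size weighting, although the exact weights behind the right-hand portion of Table 2 are not described.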

Considering only the DH studies, weighting by standardness increases the Stouffer z of the 34 post-PRL studies from 3.42 to 3.94, the mean ES from .084 to .101, and the mean direct hit rate from 30.1% to 31.4%. The new ES falls outside the confidence interval of .117 to .408 for the 28 manual ganzfeld DH studies but within the interval of .059 to .269 for the 10 PRL autoganzfeld series. Using the categorical approach, the Stouffer z for the 27 post-PRL studies classified as standard is 4.03. The mean ES is .114, which falls just barely outside the confidence limits for the manual ganzfeld database but well within the limits for the autoganzfeld database.

Weighting by Sample Size

The right-hand portion of Table 2 provides meta-analytic results based on weighting by sample size. Details of these analyses will not be provided in the text, except to note that such weighting elevates the outcomes for the post-PRL studies and depresses them for the studies conducted earlier. Thus, using such weights has the effect of bringing the two sets of outcomes into closer proximity with each other.

 

Discussion

The Basic Update

The 10 ganzfeld studies published after the M/W meta-analysis revealed a marked improvement in outcomes as compared to their 30 studies, although the difference is not quite significant. If it is real, I have no plausible explanation for it, although it should be noted that all the new studies were presumably underway before the Milton and Wiseman (1999) meta-analysis was available and, thus, could not have been influenced by it. In any event, the new studies elevated the total post-PRL Stouffer z to statistical significance, although the effect size remained well outside the confidence intervals for both the manual ganzfeld direct-hit studies and the autoganzfeld series.

We confirmed that the studies in the M/W meta-analysis were indeed heterogeneous, albeit marginally. Adding the 10 new studies increased this heterogeneity. Some parapsychologists have argued that meta-analyses should not be undertaken at all if heterogeneity reaches statistical significance, in which case one solution is to remove outliers (cf. Schmeidler & Edge, 1999). Although heterogeneity indeed implies that results of some of the component experiments are influenced by extraneous factors, in my opinion this does not disqualify or render invalid a summary judgment of the whole, provided the above qualification is noted. (The controversy here is similar to whether a main effect in analysis of variance is interpretable if an interaction involving that variable is significant.) Finally, it should be pointed out that the manual ganzfeld database is strongly heterogeneous, and its meta-analysis too must be discounted if heterogeneity is considered fatal.

Mismatch on Direct Hits

Comparison of the post-PRL database with previous databases is complicated by the fact that the preferred meta-analysis of the old manual ganzfeld experiments was limited to studies in which the proportion of direct hits (DH) was reported, whereas the post-PRL database was not so constrained. I discovered that the non-DH studies in the manual ganzfeld database produced markedly less positive results than those reporting direct hits, and this trend was replicated in the post-PRL database. (A plausible explanation for this finding will be discussed below.) Comparisons between all studies in each database and between the DH studies in each database both failed to change the fundamental outcome, except that the post-PRL DH studies now fell within the confidence interval for the autoganzfeld series.

The Effect of Standardness

The one "extraneous variable" that has been explicitly suggested as causing the heterogeneity in the post-PRL database is departures from the standard ganzfeld methodology (Schmeidler & Edge, 1999), and this is what prompted the current project. Ratings by three psychology graduate students on a 7-point rating scale of methodological standardness were found to be reliable and to correlate significantly with the effect sizes of the post-PRL experiments. Departures from the standard ganzfeld protocol were associated with a drop in ESP scores, and it is reasonable to suppose that the association is causal. Using these ratings as weights in a modified meta-analysis produced elevations in both the Stouffer z and effect size measures.

The primary meta-analysis for which standardness ratings were used as weights yielded an effect size that fell just a hair outside the confidence limits of the full pre-PRL manual ganzfeld database and safely within the limits of the autoganzfeld database. A secondary meta-analysis, conducted on a subgroup of "standard" post-PRL experiments defined as having a mean standardness rating above the midpoint on the 7-point scale, yielded an effect size that fell well within the confidence limits for both databases. Thus, one can make a stronger claim for the standard studies replicating previous work than for all the studies. The one caveat is that the manual ganzfeld database might have contained some non-standard studies, the removal of which might raise the confidence interval to the point that it no longer encompassed the post-PRL effect size. As I have no standardness ratings for the manual ganzfeld studies, I cannot address this possibility further. This problem does not apply to the autoganzfeld database, all analyzed components of which can be considered standard.

Restricting the preceding analyses to the DH studies gave weaker results than using all studies: the effect sizes fell within the confidence limits of the autoganzfeld series only.

The poorer results of the non-DH studies might be attributable to their use of non-standard methods in other respects, and the data support this speculation. The six non-DH studies in the post-PRL database had a mean standardness score of only 3.56, compared to 5.54 for the DH studies (on the reversed scale used for weighting, where higher values indicate greater standardness); U = 171.0, p = .008. As the raters were told not to consider the method of statistical analysis in making their judgments of standardness, this association cannot be attributed to raters penalizing the non-DH scoring methods themselves.

The mean standardness ratings ranged from 1.00 (maximum standardness) to 6.67. The highest (most non-standard) rating was given to the two experiments by Willin (1996a, 1996b) that used musical targets. These were the studies cited most prominently by critics of the M/W meta-analysis as being non-standard (Schmeidler & Edge, 1999). Milton had singled out the Symmons and Morris (1997) study as non-standard because drum beats replaced pink noise as the auditory stimulation. This study received a 4.00 from the raters, which is the midpoint of the scale. Another study with negative results, which I had heard informally was considered non-standard by critics of the M/W meta-analysis, was Kanthamani and Palmer (1993), because the sender viewed the target subliminally and performed an REG PK task between exposures. This study received a quite benign rating of 1.67.

Although the raters were blind to the results of the experiments, the persons who wrote the instructions (namely, the authors) were not, which in theory created a source of potential bias. This situation was unavoidable; even if I had given the raters no guidelines at all, that still would have been a decision, and potentially capable of tilting the results in one direction or the other. One motivating principle in deriving the guidelines was to prevent the raters from citing as deviations aspects of the procedure that both sides in the controversy would most likely consider within the bounds of standardness. If raters had been allowed more license, I feared that so many studies would be rated as non-standard that the exercise would not successfully discriminate between them.

In practice, our most important decision was to define a standard ganzfeld with reference to what had been done before the post-PRL studies. This, in turn, was defined operationally as Bem and Honorton’s (1994) description of the general ganzfeld procedure together with the method section describing the PRL autoganzfeld series (Honorton et al., 1990). Both descriptions were written before the post-PRL studies were reported, so the latter could not have influenced them. All procedural variations mentioned in the Bem/Honorton section were defined as standard, even if they had not been used universally in the earlier work: if the authors included them without qualification in a section devoted to defining a standard ganzfeld, that seemed sufficient grounds for treating them as standard. Raters were told not to penalize subject-selection criteria that had been used in the autoganzfeld studies (e.g., creative/artistic; Schlitz & Honorton, 1992) or that were based on knowledge of the best performers in those studies (e.g., experience with a mental discipline; Honorton, 1997). Because the controversy has centered on the procedures used in the ganzfeld session per se, the guidelines focused the raters’ attention on that part of the proceedings, with one notable exception: raters were allowed to consider the impact of other conditions in a within-subjects design. Finally, I stressed that the raters should weigh the importance of any deviations, not merely note their presence.

Conclusion

Our main results can be summarized briefly. The updated meta-analysis of the post-PRL ganzfeld studies yielded a significant Stouffer z (2.59 for all 40 studies; Table 2). In addition, restricting the sample to studies rated as methodologically standard raised the mean effect size so that it fell generally within the confidence intervals of the earlier databases: clearly so for the autoganzfeld database, more equivocally for the manual ganzfeld database. These conclusions differ from those of Milton and Wiseman (1999), which were based on a smaller set of studies, and they support the replicability of the ganzfeld procedure.

References

Alexander, C. H., & Broughton, R. S. (1999). CL1-Ganzfeld study: A look at brain hemisphere differences and scoring in the autoganzfeld. In The Parapsychological Association 42nd Annual Convention: Proceedings of Presented Papers (pp. 3-18). Durham, NC: Parapsychological Association.

Bem, D. J. (1994). Response to Hyman. Psychological Bulletin, 115, 25-27.

Bem, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115, 4-18.

Bierman, D. J. (1995). The Amsterdam ganzfeld series III & IV: Target clip emotionality, effect sizes and openness. In The Parapsychological Association 38th Annual Convention: Proceedings of Presented Papers (pp. 27-37). Durham, NC: Parapsychological Association.

Bierman, D. J., Bosga, D. J., Gerding, H., & Wezelman, R. (1993). Anomalous information access in the ganzfeld: Utrecht - Novice Series I and II. In The Parapsychological Association 36th Annual Convention: Proceedings of Presented Papers (pp. 192-203). Durham, NC: Parapsychological Association.

Braud, W. G., & Wood, R. (1977). The influence of immediate feedback on free-response GESP performance during ganzfeld stimulation. Journal of the American Society for Psychical Research, 71, 409-427.

Broughton, R. S., & Alexander, C. H. (1996). Autoganzfeld II: An attempted replication of the PRL ganzfeld research. In The Parapsychological Association 39th Annual Convention: Proceedings of Presented Papers (pp. 45-56). Durham, NC: Parapsychological Association.

Broughton, R. S., & Alexander, C. H. (1997). Autoganzfeld II: An attempted replication of the PRL ganzfeld research. Journal of Parapsychology, 61, 209-226.

Dalton, K. (1994). A report on informal ganzfeld trials and comparison of receiver/sender sex pairing: Avoiding the file drawer. In D. J. Bierman (Ed.), The Parapsychological Association 37th Annual Convention: Proceedings of Presented Papers (pp. 104-113). Durham, NC: Parapsychological Association.

Dalton, K. (1997). Exploring the links: Creativity and psi in the ganzfeld. In The Parapsychological Association 40th Annual Convention: Proceedings of Presented Papers (pp. 119-134). Durham, NC: Parapsychological Association.

Dunne, B. J., Warnock, E., & Bisaha, J. P. (1977). Ganzfeld techniques with independent rating for measuring GESP and precognition. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1976 (pp. 41-43). Metuchen, NJ: Scarecrow Press.

Edge, H., & Schmeidler, G. R. (1999). Should ganzfeld research continue to be crucial in the search for a replicable psi effect? Part II. Edited ganzfeld debate. Journal of Parapsychology, 63,

Habel, M. M. (1976). Varying auditory stimuli in the ganzfeld: The influence of sex and overcrowding on psi performance. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1975 (pp. 181-184). Metuchen, NJ: Scarecrow Press.

Honorton, C. (1978). Psi and internal attention states: Information retrieval in the ganzfeld. In B. Shapin & L. Coly (Eds.), Psi and states of awareness (pp. 79-90). New York: Parapsychology Foundation.

Honorton, C. (1983). Response to Hyman’s critique of psi ganzfeld studies. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in parapsychology 1982 (pp. 23-26). Metuchen, NJ: Scarecrow Press.

Honorton, C. (1985). Meta-analysis of ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-59.

Honorton, C. (1997). The ganzfeld novice: Four predictors of initial ESP performance. Journal of Parapsychology, 61, 143-158.

Honorton, C., Berger, R. E., Varvoglis, M. P., Quant, M., Derr, P., Schechter, E. I., & Ferrari, D. C. (1990). Psi communication in the ganzfeld: Experiments with an automated testing system and a comparison with a meta-analysis of earlier studies. Journal of Parapsychology, 54, 99-139.

Hyman, R. (1983). Does the ganzfeld answer the critics’ objections? In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in parapsychology 1982 (pp. 21-23). Metuchen, NJ: Scarecrow Press.

Hyman, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology, 49, 3-49.

Hyman, R. (1994). Anomaly or artifact? Comments on Bem and Honorton. Psychological Bulletin, 115, 19-24.

Hyman, R., & Honorton, C. (1986). A joint communiqué: The psi ganzfeld controversy. Journal of Parapsychology, 50, 350-364.

Johansson, H., & Parker, A. (1995). Replication of the ganzfeld findings: Using simplified ganzfeld procedure. In The Parapsychological Association 38th Annual Convention: Proceedings of Presented Papers (pp. 156-160). Durham, NC: Parapsychological Association.

Kanthamani, H., & Broughton, R. S. (1992). An experiment in ganzfeld and dreams: A further confirmation. In The Parapsychological Association 35th Annual Convention: Proceedings of Presented Papers (pp. 59-73). Durham, NC: Parapsychological Association.

Kanthamani, H., & Broughton, R. S. (1994). Institute for Parapsychology ganzfeld-ESP experiments: The manual series. In D. J. Bierman (Ed.), The Parapsychological Association 37th Annual Convention: Proceedings of Presented Papers (pp. 182-189). Durham, NC: Parapsychological Association.

Kanthamani, H., & Khilji, A. (1990). An experiment in ganzfeld and dreams: A confirmatory study. In The Parapsychological Association 33rd Annual Convention: Proceedings of Presented Papers (pp. 126-137). Durham, NC: Parapsychological Association.

Kanthamani, H., Khilji, A., & Rustomji-Kerns, R. (1988). An experiment in ganzfeld and dreams with a clairvoyance technique. In The Parapsychological Association 31st Annual Convention: Proceedings of Presented Papers (pp. 412-423). Durham, NC: Parapsychological Association.

Kanthamani, H., & Palmer, J. (1993). A ganzfeld experiment with "subliminal sending". Journal of Parapsychology, 57, 241-257.

Keane, P., & Wells, R. (1979). An examination of the menstrual cycle as a hormone related physiological concomitant of psi performance. In W. G. Roll (Ed.), Research in parapsychology 1978 (pp. 72-74). Metuchen, NJ: Scarecrow Press.

McDonough, B. E., Don, N. S., & Warren, C. A. (1994). EEG in a ganzfeld psi task. In D. J. Bierman (Ed.), The Parapsychological Association 37th Annual Convention: Proceedings of Presented Papers (pp. 273-283). Durham, NC: Parapsychological Association.

Milton, J., & Wiseman, R. (1999). Does psi exist? Lack of replication of an anomalous process of information transfer. Psychological Bulletin, 125, 387-391.

Morris, R. L., Cunningham, S., McAlpine, S., & Taylor, R. (1993). Toward replication and extension of ganzfeld results. In The Parapsychological Association 36th Annual Convention: Proceedings of Presented Papers (pp. 177-191). Durham, NC: Parapsychological Association.

Morris, R. L., Dalton, K., Delanoy, D. L., & Watt, C. (1995). Comparison of the sender/no sender condition in the ganzfeld. In The Parapsychological Association 38th Annual Convention: Proceedings of Presented Papers (pp. 244-259). Durham, NC: Parapsychological Association.

Palmer, J., Whitson, T., & Bogart, D. N. (1980). Ganzfeld and remote viewing: A systematic comparison. In W. G. Roll (Ed.), Research in parapsychology 1979 (pp. 169-171). Metuchen, NJ: Scarecrow Press.

Parker, A. (1975). Some findings relevant to the change of state hypothesis. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1974 (pp. 40-42). Metuchen, NJ: Scarecrow Press.

Parker, A., Frederiksen, A., & Johansson, H. (1997). Towards specifying the recipe for success in the ganzfeld. European Journal of Parapsychology, 13, 15-27.

Parker, A., Millar, B., & Beloff, J. (1977). A three-experimenter ganzfeld: An attempt to use the ganzfeld technique to study the experimenter effect. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1976 (pp. 52-54). Metuchen, NJ: Scarecrow Press.

Parker, A., & Westerlund, J. (1998). Current research in giving the ganzfeld an old and a new twist. In The Parapsychological Association 41st Annual Convention: Proceedings of Presented Papers (pp. 135-142). Durham, NC: Parapsychological Association.

Rogo, D. S. (1977). A preliminary study of precognition in the ganzfeld. European Journal of Parapsychology, 2(1), 60-67.

Roney-Dougal, S. M. (1982). A comparison of psi and subliminal perception: A confirmatory study. In W. G. Roll, R. L. Morris, & R. A. White (Eds.), Research in parapsychology 1981 (pp. 96-99). Metuchen, NJ: Scarecrow Press.

Schlitz, M. J., & Honorton, C. (1992). Ganzfeld psi performance within an artistically gifted population. Journal of the American Society for Psychical Research, 86, 83-98.

Smith, M., Tremmel, L., & Honorton, C. (1976). A comparison of psi and weak sensory influences on ganzfeld mentation. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1975 (pp. 191-194). Metuchen, NJ: Scarecrow Press.

Solfvin, G. F., Kelly, E. F., & Burdick, D. S. (1978). Some new methods of analysis for preferential-ranking data. Journal of the American Society for Psychical Research, 72, 93-109.

Stanford, R. G. (1979). The influence of auditory ganzfeld characteristics upon free-response ESP performance. Journal of the American Society for Psychical Research, 73, 253-272.

Stanford, R. G., & Frank, S. (1991). Prediction of ganzfeld-ESP task performance from session-based verbal indicators of psychological function: A second study. Journal of Parapsychology, 55, 349-376.

Stanford, R. G., & Neylon, A. (1975). Experiential factors related to free-response clairvoyance performance in a sensory uniformity setting (ganzfeld). In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1974 (pp. 89-93). Metuchen, NJ: Scarecrow Press.

Symmons, C., & Morris, R. L. (1997). Drumming at seven Hz and automated ganzfeld performance. In The Parapsychological Association 40th Annual Convention: Proceedings of Presented Papers (pp. 441-454). Durham, NC: Parapsychological Association.

Terry, J. (1976). Comparison of stimulus duration in sensory and psi conditions. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1975 (pp. 179-181). Metuchen, NJ: Scarecrow Press.

Terry, J., Tremmel, L., Kelly, M., Harper, S., & Barker, P. (1976). Psi information rate in guessing and receiver optimization. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology 1975 (pp. 194-198). Metuchen, NJ: Scarecrow Press.

Wezelman, R., & Bierman, D. J. (1997). Process oriented ganzfeld research in Amsterdam. In The Parapsychological Association 40th Annual Convention: Proceedings of Presented Papers (pp. 477-491). Durham, NC: Parapsychological Association.

Wezelman, R., Gerding, J. L. F., & Verhoeven, I. (1997). Eigensender ganzfeld psi: An experiment in practical philosophy. European Journal of Parapsychology, 13, 28-39.

Williams, C., Roe, C. A., Upchurch, I., & Lawrence, T. R. (1994). Senders and geomagnetism in the ganzfeld. In D. J. Bierman (Ed.), The Parapsychological Association 37th Annual Convention: Proceedings of Presented Papers (pp. 429-438). Durham, NC: Parapsychological Association.

Willin, M. J. (1996a). A ganzfeld experiment using musical targets. Journal of the Society for Psychical Research, 61, 1-17.

Willin, M. J. (1996b). A ganzfeld experiment using musical targets with previous high scorers from the general population. Journal of the Society for Psychical Research, 61, 103-106.

Table 1

Number of Trials, z-Score, Effect Size (ES), and Standardness Rating for Each Study in the Updated Ganzfeld Database

----------------------------------------------------------------------------------
Study                                         Trials  z-score     ES    DH%   Std.
----------------------------------------------------------------------------------
* Alexander & Broughton (1999)                    50     1.60   0.23   36.0   1.33
  Bierman et al. (1993) Series I                  50     0.03   0.00   26.0   1.00
  Bierman et al. (1993) Series II                 50    -0.30  -0.04   24.0   1.00
  Bierman (1995) Series III                       40     1.94   0.31   40.0   3.67
  Bierman (1995) Series IV                        36     1.33   0.22   36.1   3.67
* Wezelman & Bierman (1997) Series IVb            32    -1.48  -0.26   15.6   4.00
* Wezelman & Bierman (1997) Series V              40    -0.91  -0.14   20.0   5.00
* Wezelman & Bierman (1997) Series VI             40    -0.15  -0.02   25.0   5.00
  Broughton & Alexander (1997) FT1 [a]            50    -0.30  -0.04   24.0   1.00
  Broughton & Alexander (1997) FT2 [a]            50    -1.33  -0.19   18.0   1.00
  Broughton & Alexander (1997) EC1 [a]            51     1.81   0.25   37.3   1.00
  Broughton & Alexander (1997) CLAIR1 [a]         50    -0.64  -0.09   22.0   1.33
  Broughton & Alexander (1997) GEN1 [a]            8     0.46   0.16   37.5   1.33
  Dalton (1994)                                   29     1.76   0.33   41.4   1.00
* Dalton (1997)                                  128     5.20   0.46   46.9   1.00
  Kanthamani & Broughton (1994) Series 3          40    -0.91  -0.14   20.0   1.33
  Kanthamani & Broughton (1994) Series 4          65     2.01   0.25   36.9   1.33
  Kanthamani et al. (1988) Series 5a [b]           4     0.22   0.11      -   5.33
  Kanthamani et al. (1988) Series 5b [b]          10    -2.06  -0.65      -   5.33
  Kanthamani & Khilji (1990) Series 6a [b]        20    -0.46  -0.10      -   4.67
  Kanthamani & Broughton (1992) Series 6b [b]     40     0.52   0.08      -   4.33
  Kanthamani & Broughton (1994) Series 7          46     0.03   0.00   26.1   2.67
  Kanthamani & Broughton (1994) Series 8          50     0.03   0.00   26.0   2.00
  Kanthamani & Palmer (1993)                      22    -2.17  -0.46    9.1   1.67
  McDonough et al. (1994)                         20     1.02   0.23   30.0   2.67
  Morris et al. (1993) Cunningham Study           32     1.78   0.31   40.6   1.00
  Morris et al. (1993) McAlpine Study             32    -0.17  -0.03   25.0   2.00
  Morris et al. (1995)                            97     1.67   0.17   33.0   1.67
* Parker et al. (1997) Study 1 [c]                30    -0.83  -0.15   20.0   2.67
* Parker et al. (1997) Study 2 [c]                30     1.25   0.23   36.7   1.33
* Parker et al. (1997) Study 3 [c]                30     1.25   0.23   36.7   1.33
  Parker & Westerlund (1998) Study 4              30     2.40   0.44   46.7   1.33
  Parker & Westerlund (1998) Study 5              30     1.25   0.23   36.7   1.33
  Parker & Westerlund (1998) Serial Study         30    -0.49  -0.09      -   4.67
  Stanford & Frank (1991)                         58    -1.24  -0.16      -   2.33
* Symmons & Morris (1997)                         51     2.97   0.42   45.1   4.00
* Wezelman et al. (1997)                          32     2.15   0.38   43.8   3.33
  Williams et al. (1994)                          42    -2.30  -0.35   11.9   2.67
  Willin (1996a)                                 100    -0.33  -0.03   24.0   6.67
  Willin (1996b)                                  16    -0.24  -0.06   25.0   6.67
----------------------------------------------------------------------------------

Note. DH = direct hit; DH% = percentage of trials scored as direct hits (- = direct hits not reported). Std. = mean standardness rating (1.00 = maximally standard). * = study added to the Milton and Wiseman (1999) database. FT1 (FT2) = First Timers First (Second) Experimental Series; EC1 = Emotionally Close First Timers Series; CLAIR1 = Clairvoyance Series; GEN1 = General Series.

[a] Cited as Broughton and Alexander (1996) in Milton and Wiseman (1999).

[b] Series summarized and numbered in Kanthamani and Broughton (1994).

[c] Cited as Johansson and Parker (1995) in Milton and Wiseman (1999).
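The ES column of Table 1 agrees with the per-study convention ES = z/√N, where N is the number of trials (e.g., 1.60/√50 ≈ 0.23 for Alexander and Broughton, 1999). This convention is inferred from the tabled values rather than stated in this section; the minimal Python sketch below recomputes a few rows under that assumption.

```python
import math


def effect_size(z, n_trials):
    """Per-study effect size as z / sqrt(N), N = number of trials (inferred convention)."""
    return z / math.sqrt(n_trials)


# (study, trials, z-score) copied from Table 1
ROWS = [
    ("Alexander & Broughton (1999)", 50, 1.60),
    ("Dalton (1997)", 128, 5.20),
    ("Kanthamani et al. (1988) Series 5b", 10, -2.06),
    ("Willin (1996a)", 100, -0.33),
]

for study, n, z in ROWS:
    # Table 1 reports these ES values to two decimals as 0.23, 0.46, -0.65, -0.03
    print(f"{study:<36} ES = {effect_size(z, n):+.2f}")
```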

 

Table 2

Stouffer z, Mean Effect Size (ES), and p-value of Heterogeneity (H) of z for Various Meta-Analytic Samples, Unweighted and Weighted by Sample Size (N)

---------------------------------------------------------------------
                              Unweighted                Weighted by N
                                 z     ES   p(Hz)            z     ES
---------------------------------------------------------------------
Manual-All (39)               5.60   .204   3.4 × 10⁻⁸    4.54   .143
  DH Only (28)                6.60   .263   6.2 × 10⁻⁵    6.46   .230
Automated (10)                2.55   .164   .787          2.16   .132
---------------------------------------------------------------------
M/W (30)                      0.70   .013   .030          1.07   .028
New (10)                      3.98   .165   4.2 × 10⁻⁵    5.66   .230
M/W+New (40)                  2.59   .051   3.6 × 10⁻⁶    4.10   .084
  Weighted by Std. (40)       3.43   .078
  Standard Only (28)          3.70   .104   2.9 × 10⁻⁵    4.82   .121
  DH Only (34)                3.42   .084   7.6 × 10⁻⁶    4.57   .104
    Weighted by Std. (34)     3.94   .101
    Standard Only (27)        4.03   .114   6.2 × 10⁻⁵    5.22   .135
---------------------------------------------------------------------

Note. DH = studies that reported direct hits. Std. = standardness. The number of studies is given in parentheses after each label; indented rows are subsets or reweightings of the row above them, and rows labeled Weighted by Std. give z and ES weighted by standardness rating.
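The Stouffer z and mean ES in Table 2 can be illustrated with a short Python sketch over the 40 studies listed in Table 1. The unweighted formulas (Σz/√k, and the arithmetic mean of the per-study ES) reproduce the M/W+New (40) row. For the Weighted by N columns, the sketch assumes that the weights are the trial counts, with z = Σ(N·z)/√(ΣN²) and ES = Σ(N·ES)/ΣN; this section does not state the formulas actually used, but these come within about 0.01 of the tabled values. Heterogeneity tests and the standardness weighting are not covered.

```python
import math

# (trials, z, ES) for all 40 studies, in Table 1 order.
STUDIES = [
    (50, 1.60, 0.23), (50, 0.03, 0.00), (50, -0.30, -0.04), (40, 1.94, 0.31),
    (36, 1.33, 0.22), (32, -1.48, -0.26), (40, -0.91, -0.14), (40, -0.15, -0.02),
    (50, -0.30, -0.04), (50, -1.33, -0.19), (51, 1.81, 0.25), (50, -0.64, -0.09),
    (8, 0.46, 0.16), (29, 1.76, 0.33), (128, 5.20, 0.46), (40, -0.91, -0.14),
    (65, 2.01, 0.25), (4, 0.22, 0.11), (10, -2.06, -0.65), (20, -0.46, -0.10),
    (40, 0.52, 0.08), (46, 0.03, 0.00), (50, 0.03, 0.00), (22, -2.17, -0.46),
    (20, 1.02, 0.23), (32, 1.78, 0.31), (32, -0.17, -0.03), (97, 1.67, 0.17),
    (30, -0.83, -0.15), (30, 1.25, 0.23), (30, 1.25, 0.23), (30, 2.40, 0.44),
    (30, 1.25, 0.23), (30, -0.49, -0.09), (58, -1.24, -0.16), (51, 2.97, 0.42),
    (32, 2.15, 0.38), (42, -2.30, -0.35), (100, -0.33, -0.03), (16, -0.24, -0.06),
]


def stouffer_z(zs):
    """Unweighted Stouffer z: sum of the z-scores divided by sqrt(k)."""
    return sum(zs) / math.sqrt(len(zs))


def weighted_stouffer_z(zs, weights):
    """Weighted Stouffer z: sum(w * z) / sqrt(sum(w ** 2)); here w = trials (an assumption)."""
    numerator = sum(w * z for w, z in zip(weights, zs))
    return numerator / math.sqrt(sum(w * w for w in weights))


trials = [n for n, _, _ in STUDIES]
zs = [z for _, z, _ in STUDIES]
effect_sizes = [es for _, _, es in STUDIES]

z_unweighted = stouffer_z(zs)
mean_es = sum(effect_sizes) / len(effect_sizes)
z_weighted = weighted_stouffer_z(zs, trials)
mean_es_weighted = sum(n * es for n, es in zip(trials, effect_sizes)) / sum(trials)

print(f"Unweighted:    z = {z_unweighted:.2f}, mean ES = {mean_es:.3f}")
print(f"Weighted by N: z = {z_weighted:.2f}, mean ES = {mean_es_weighted:.3f}")
```

Because the per-study z-scores are themselves rounded to two decimals, small discrepancies in the last digit (such as 4.09 here against 4.10 in the table) are to be expected.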
