There is a difference between being right and being convincing.
To conduct the test properly, you want the claimants to demonstrate their special power in the experimental setup to their own satisfaction. In the dowsing example, there were six rows of six buckets. In each row, one bucket held water and the others sand. Assistants placed the water in a random bucket and left the tent; the experimenter then led the claimants through each row and they, in their own time, attempted to find the water. Thus it is double-blind, and we expect roughly 1 in 6 of the attempts to find the water (across all dowsers - some won't get any and a few will get two). To demonstrate their powers, a dowser must perform significantly over the odds ... considering that not a single dowser claims less than a 100% success rate, this should be easy.
Claimants consistently tested at chance, but this did not faze them. Each usually cited the presence of sceptics, nervousness, or something about the setup throwing the detection off. One claimant protested that the water being above the ground and having to walk around the buckets was a problem.
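The "1 in 6" expectation can be made concrete with a little binomial arithmetic. The sketch below assumes each dowser makes six independent attempts (one per row) with a 1/6 chance of success on each; the function name prob_at_least is my own, purely for illustration.

```python
from math import comb

def prob_at_least(hits, trials=6, p=1/6):
    """Probability of scoring `hits` or more out of `trials` by pure guessing.

    Assumes each attempt is independent with success probability p
    (one water bucket among six in each row).
    """
    return sum(
        comb(trials, k) * p**k * (1 - p)**(trials - k)
        for k in range(hits, trials + 1)
    )

# Expected number of hits for a pure guesser: 6 rows * 1/6 = 1.
expected_hits = 6 * (1 / 6)
print("expected hits by chance:", expected_hits)

# How likely is each score (or better) if the dowser is only guessing?
for hits in range(7):
    print(f"P(at least {hits} of 6) = {prob_at_least(hits):.4f}")
```

Under these assumptions about a third of guessers find no water at all and about a fifth find it exactly twice, which matches the spread described above. A claimed 100% success rate corresponds to six out of six, which guessing produces about once in 46,656 tries.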
The way to counter this is to do two tests. In the first, you have to establish that nothing about the setup is having an adverse effect. In this case, show the claimants where the water is and get them to verify that they can dowse for it successfully. (Our null hypothesis is usually that the dowsers somehow read clues as to where the water is, so we expect a 100% detection rate this way.) This should be quite a quick test. Get the claimant on the record accepting the test setup for the double-blind version. Then conduct the double-blind version.
This can be presented in a friendly way, "just to make sure that your ability works in here". Between tests they need to be encouraged to express
We want to be careful not to present the double-blind version as the "real" test, so as to avoid the suggestion that the situation is psychically different or that there is added pressure. You can even raise the possibility of a third test upon "success" in the second, which is the "real" test. Once the double-blind results are in, repeat the first test to demonstrate that the conditions have not changed.
What we expect is that the dowser will have a 100% hit rate on the first test, chance on the second one (a fail), and another 100% on the third. Though a clever charlatan may make sure they score only at chance on the third test, we don't normally see this. This is also why we do not actually endorse people who pass but instead move to a more detailed study. This is more difficult and usually means we have to catch the faker in the act.
Strictly speaking, performing at chance in the double-blind test does not mean that the claimant is not doing anything supernatural, just that whatever they are doing is not distinguishable from guessing. Even with the above rigour, we still expect protests and anecdotal evidence offered to show that the experiment had to be wrong ... after all, the claimants are usually honest people who have become convinced of their powers over time. However, the protests will be all the weaker as a result of the rigour.
When observing tests conducted by others we need to keep our brains engaged. For example:
Movie: Beyond a Reasonable Doubt
Bar Doubt LLC 2008
Jesse Metcalfe, Amber Tamblyn, and Michael Douglas
Features a journalist (Metcalfe) conducting a test involving coffee tasting.
Members of the public are invited to blind-taste coffee from three urns: one instant, one canned, and one expensive brew. We are told that 57% of those tested reported that one of the two cheaper coffees was the better tasting. The journalist then announces that this means that the expensive brew is not all it's cracked up to be.
There are two things wrong with this test. The method is suspect because the test subjects are self-selecting from a limited population (people passing on the street volunteer) and it is single-blind (the journalist stands over the subject). But more seriously, the results do not support the conclusion - one of the most damning critiques available to science.
If there were no difference between the coffees then we'd expect the selection to be random. Two out of three choices are the cheap ones, so we'd expect a cheap brand to be chosen as best about 67% of the time. 57% is much less than that, suggesting that there may be something to this expensive coffee thing.
However, this does not reflect poorly on the logic of the film. It turns out that everything Metcalfe's character is involved with is faked in some important way. This test then becomes a foreshadowing of later revelations. It actually strengthens the film. Unfortunately, few of the audience pick up on this and some come away with the impression that the conclusions were correct even though they went in knowing they were about to watch a bit of fiction.
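How much weight the gap between 57% and 67% can bear depends on how many people were tested, and the passage above gives no sample size. The sketch below therefore tries a few made-up sample sizes (30, 100 and 300 are my assumptions, not figures from the film) and asks how often random choosers would pick a cheap coffee 57% of the time or less. The function name lower_tail is likewise only illustrative.

```python
from math import comb

def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance that k or fewer of n
    tasters pick a cheap coffee when each picks a cheap one with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

P_CHEAP_BY_CHANCE = 2 / 3   # two of the three urns hold cheap coffee
OBSERVED_SHARE = 0.57       # share reported as preferring a cheap coffee

# Hypothetical sample sizes: the real number of tasters is not stated.
for n in (30, 100, 300):
    k = round(OBSERVED_SHARE * n)   # tasters who preferred a cheap coffee
    tail = lower_tail(k, n, P_CHEAP_BY_CHANCE)
    print(f"n={n}: {k} chose cheap, P(this few or fewer by chance) = {tail:.4f}")
```

With a small sample the shortfall could easily be luck; with a few hundred tasters it would be hard to explain by random choice. Either way the direction of the result favours the expensive brew, which is the opposite of what the journalist announces.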
For movies it is reasonable to assume that everything you see is false, and I do mean everything. "Based on a true story" means "this is fiction" and the portrayal should not be given any more weight than something which is self-professed as entirely fictional. Documentaries are similar. By convention what you see should have actually happened in front of the camera but the film-maker can play fast and loose with what is filmed. Treat it as if the film-maker has cherry-picked what to show you in order to make a point or tell a story. After all, that is what has actually happened.
Dawkins's shows are like this. In The Root of All Evil? he cherry-picks footage to show religious people as raving lunatics. Of course that is the point of his documentaries: that behind the smiles and the good works of religious people lie personal convictions which are flawed and dangerous. Thus he has selected his footage to highlight that.
Such things can be useful for promoting discussion, and the odd bar fight, but cannot be expected to be convincing.