AudioREVIEW's Forum Archives - Cables & Interconnects


Archive Home >> Cables & Interconnects(1 2 3 4 ) >> What is the minimum number of trials....(22 posts)


What is the minimum number of trials....pctower
Nov 1, 2003 4:54 AM
that would have to occur in a DBT for you to accept the results as reliable and on what authority do you base your answer?
re: What is the minimum number of trials....Norm Strong
Nov 1, 2003 12:36 PM
It's all a matter of probabilities. 8 out of 10 right is enough to spur interest. The chance of getting that score or better by sheer luck is about 1 in 20.

However, you should note that if 13 people take the test, there is an even chance that at least one of them will get that score, even if there's nothing there.
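
The figures in this post can be checked against the binomial distribution. The sketch below assumes every trial is an independent 50/50 guess when no audible difference exists; the function names are illustrative and do not come from the thread.

```python
from math import comb

def tail_probability(correct, trials, p_guess=0.5):
    """Probability of scoring `correct` or more out of `trials` by guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
        for k in range(correct, trials + 1)
    )

def chance_someone_passes(listeners, correct, trials):
    """Probability that at least one of several guessing listeners reaches the score."""
    p_single = tail_probability(correct, trials)
    return 1 - (1 - p_single)**listeners

print(f"P(8 or more of 10 by luck) = {tail_probability(8, 10):.4f}")
print(f"P(at least one of 13 listeners does so) = {chance_someone_passes(13, 8, 10):.3f}")
```

Under these assumptions the single-listener figure is 56/1024, a little over 1 in 20, and the 13-listener figure comes out slightly above one half.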
re: What is the minimum number of trials....slbenz
Nov 1, 2003 5:01 PM
I think you mean the number of people that are participating in a trial. The N value should be as large as possible. The N value should be a minimum of 50 and the P value should be less than or equal to .05 for a double-blind trial to be of any significance.
re: What is the minimum number of trials....RGA
Nov 1, 2003 10:10 PM
Go to your local university...pick up a first-year psychology textbook and see what it has to say. More trials and more people are ALWAYS better. A small number of trials is statistically unsound, especially when making a large generalization. Remember to use a brain-based field, not an engineer's manual...one field studies people properly; the other does not but thinks it does. And this, I fear, is why there is so much debate.

For example, studies of those claiming to be psychics run up to 100 trials per person. Scores are not averaged.

If you score 6/10 ten times (with one error = 59/100), this is as statistically significant as a score of 9/10.

If you scored 6/10 on a test with only ten trials...you would be considered a failure...but if you could do that 10 times, then in fact the probability that A and B are distinguishable is high and you pass. And of course more trials reduce errors, so the result has a lot more credence.
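
How a pooled score of 59/100 compares with a single 6/10 or 9/10 can be checked directly. This is a minimal sketch, assuming independent 50/50 guesses under the null hypothesis and a one-sided test; the helper name is made up for illustration.

```python
from math import comb

def p_value(correct, trials):
    """One-sided probability of `correct` or more hits out of `trials` fair guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

for correct, trials in [(6, 10), (9, 10), (59, 100)]:
    print(f"{correct}/{trials}: p = {p_value(correct, trials):.4f}")
```

Under these assumptions a single 6/10 is far from significant, while 59/100 falls below the 0.05 threshold. Its p-value is nevertheless larger than that of 9/10, so the two scores are comparable rather than strictly equal in significance.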

Why do audio tests use so few? Laziness, incompetence, a niche being served, or merely a lack of knowledge of the brain-based sciences?

You only need to ask one thing. Has the specific product in question ever been tested by you? No. Then no one can say they don't sound different...of course you can't prove that they do either. Well, no kidding - the only way that's going to happen is if the person you're telling it to listens as well - they will either agree or disagree.

Many companies will deliberately make things sound different...if the test does not bear that out...then you need a new test.
re: What is the minimum number of trials....mtrycrafts
Nov 1, 2003 11:00 PM
A 95% confidence level or better is the accepted norm for statistical significance.
Unfortunately, in the University of Miami paper:

http://www.music.miami.edu/programs/mue/Research/dkoya/table_of_contents.htm

the author, in following the Leventhal paper, has elected to accept only a 70% confidence level.
The same applies to the Leventhal paper: by balancing the probabilities of type 1 and type 2 errors, he is willing to accept much lower confidence levels, so the results are not very meaningful and certainly not statistically significant.
One only needs to look at published research, not just medical trials, to see that the data are all reported at a 95% confidence level or better.
re: What is the minimum number of trials....pctower
Nov 2, 2003 3:39 PM
I don't think that he and Leventhal are the only ones who consider that it is important to balance type I and type II errors:

http://www.pinkmonkey.com/studyguides/subjects/stats/chap8/s0808a01.asp

It seems that the Miami researcher chose to increase the power of the test by reducing the confidence level. From what I understand, he could have increased N instead.
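
The trade-off described here can be made concrete. The sketch below assumes a listener who truly identifies X correctly 70% of the time (a hypothetical figure, not one taken from either paper) and shows how the type II error changes with the number of trials while the type I error is held at or below 5%.

```python
from math import comb

def tail(correct, trials, p):
    """Probability of `correct` or more hits out of `trials` with hit rate `p`."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def critical_score(trials, alpha=0.05):
    """Smallest score whose chance probability (p = 0.5) does not exceed alpha."""
    for correct in range(trials + 1):
        if tail(correct, trials, 0.5) <= alpha:
            return correct
    return None  # no score reaches significance with this few trials

TRUE_HIT_RATE = 0.7  # assumed ability of the listener, for illustration only

for trials in (10, 16, 50, 100):
    needed = critical_score(trials)
    type_1 = tail(needed, trials, 0.5)                 # false positive rate
    type_2 = 1 - tail(needed, trials, TRUE_HIT_RATE)   # missed real difference
    print(f"N={trials:3d}: need {needed:3d}, type I = {type_1:.3f}, type II = {type_2:.3f}")
```

With these assumptions the type II error is large at 10 trials and shrinks steadily as N grows, which is the alternative to relaxing the confidence level.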

And then isn't there also the chi-square issue to consider:

http://www.pinkmonkey.com/studyguides/subjects/stats/chap8/s0808l01.asp
For someone who didn't know shit from shinola about statistics..skeptic
Nov 2, 2003 4:11 PM
...which was obvious from your original question yesterday, you are suddenly an expert.

The question has to do with the degree of confidence in the outcome. When you are 98 percent sure that a given result is not the consequence of random chance, you have a lot of reason to believe that the control element is the actual cause of the result. When it's only 60 percent, you aren't so sure. It takes hundreds of trials to be sure. Then it has to be repeatable independently by others, requiring hundreds more as a minimum. Then the engineers and the audiologists go to work to find out why. And, as the Japanese experiment with ultrasonic sounds demonstrated, they have to re-examine the method of the experiment to be certain that there aren't other variables that they overlooked. Considering that you might have to accept or challenge this type of evidence in court one day, I'm surprised you didn't already know that. Or did you think that if you flipped a coin ten times and it came up heads nine times you had somehow overcome the laws of probability?

Since I assume you knew all this at your most recent posting but not when you first started this thread, you must be a quick study.

I suggest you take a course in statistics. Then you will learn something new of value and you won't be so gullible as to believe half the postings you read on message boards about who said they heard what about which cable.
Give pc credit for being a knowledge seekerRichard Greene
Nov 3, 2003 10:21 AM
A perfectly designed experiment requires only one trial.

Since there is no perfection in life, the number of trials is critical.

Not only for predicting the reliability of the experimental results ... but also for considering the "wear and tear" on the test participants.

Too few trials over a short period of time and maybe some subtle component differences will be overlooked.

Too many trials and the participants may get worn out and start guessing just to get the test over with.

Fill a DBT room with people who claim wires all sound the same and in 20 minutes they'll all be itching to get out of there and do something fun, like taunt a WireNut at some web site.
I already gave him credit for being a quick study..this too?(nt)skeptic
Nov 3, 2003 11:25 AM
The real knowledge seeker is...pctower
Nov 3, 2003 2:33 PM
okiemax.

He was asking some questions over at AA, and someone mentioned chi-square. So I did a search and came up with that pink monkey site.

That reminded me of the Leventhal work and I thought I'd raise the issue again in case we've been joined by someone well versed in statistics.
re: What is the minimum number of trials....mtrycrafts
Nov 2, 2003 11:40 PM
Yes, increasing the number of trials will help in equalizing the two error types.
Or choosing a higher p value. In the Miami paper, 10 correct responses would have given a Fairness Coefficient of .9834, almost unity. Much closer than the .7 they chose.

Neither of the links discussed confidence levels. Less than 95% is not accepted.
re: Are 11 trials by one listener enough?okiemax
Nov 2, 2003 11:06 PM
I assume you mean a DBT experiment where the hypothesis and null hypothesis are as follows. Hypothesis: the difference between Component A and Component B is audible. Null hypothesis: Components A and B sound the same. These two things are stated or implied in all ABX DBT studies on the subject that I have seen.

If a DBT of Components A and B has one listener and he correctly identifies the given component 11 times in 11 trials, we can conclude there is a negligible possibility of this happening by chance, and therefore the two components are audibly different. My authority is the binomial probability table at the ABX web site.
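
For a perfect score the arithmetic is simple: with N independent 50/50 guesses, the chance of getting all N right is 0.5 raised to the power N. A brief sketch under that assumption follows; the 5% threshold is the conventional one mentioned elsewhere in this thread, and the function names are illustrative.

```python
def perfect_score_probability(trials):
    """Chance of guessing every one of `trials` two-way choices correctly."""
    return 0.5 ** trials

def fewest_trials_for_significance(alpha=0.05):
    """Smallest number of trials at which a perfect score reaches `alpha` or below."""
    trials = 1
    while perfect_score_probability(trials) > alpha:
        trials += 1
    return trials

print(f"11 of 11 by chance: {perfect_score_probability(11):.6f}")  # 1 in 2048
print("Fewest trials where a perfect score reaches p <= 0.05:",
      fewest_trials_for_significance())
```

Read this way, fewer than five trials cannot reach the 5% level even with a perfect score, which gives one lower bound on the question that opened the thread.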

How many trials is enough to conclude that Components A and B sound the same? There is no answer, because you can never conclude that two components sound the same through hypothesis testing. Again, my authority is the ABX web site.

Of course, if the sample was fairly large (say 100 listeners, each doing 10 trials), and the results were no better than random, there might be reason to suspect two components sound about the same. At least the two pieces sounded about the same to those listeners in that test. However, what would such results mean to an audiophile who hears a difference in the same two components at home with sighted listening?
re: Are 11 trials by one listener enough?mtrycrafts
Nov 2, 2003 11:45 PM
"However, what would such results mean to an audiophile who hears a difference in the same two components at home with sighted listening?"

It would be meaningless to him, as he trusts his ears not to make him look gullible. Nothing would convince him. A closed issue.
re: Are 11 trials by one listener enough?FLZapped
Nov 3, 2003 7:26 AM
"However, what would such results mean to an audiophile who hears a difference in the same two components at home with sighted listening?"

They will come to the conclusion that they spent their hard-earned cash wisely... however.

Most probably don't read these types of forums to know what it means, and most are probably unaware that they are affected by their own internal processing of the source anyway.

-Bruce
Question is irrelevant ... so maybe answer is too?Richard Greene
Nov 3, 2003 10:14 AM
Ladies and germs of the jury:

The results of a DBT only apply to the participants involved, and stereo system/room/recordings used.

The number of trials correlates with the probability of guessing the "right answers". I would prefer the probability of lucky guessing to be under 5% (12 right out of 16 trials would be sufficient for me).
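
The 12-of-16 criterion can be verified, and the same calculation gives the passing score for other test lengths. A minimal sketch, assuming independent 50/50 guesses; the function names and the list of trial counts are chosen for illustration only.

```python
from math import comb

def chance_of_at_least(correct, trials):
    """Probability of `correct` or more right answers from pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

def minimum_passing_score(trials, limit=0.05):
    """Lowest score whose lucky-guess probability stays under `limit`.

    Raises StopIteration when no score qualifies (fewer than 5 trials at 5%).
    """
    return next(c for c in range(trials + 1)
                if chance_of_at_least(c, trials) < limit)

print(f"12 of 16 by luck: {chance_of_at_least(12, 16):.4f}")
for trials in (10, 12, 16, 20):
    print(f"{trials} trials: need {minimum_passing_score(trials)} correct")
```

Under these assumptions 12 of 16 is the lowest score that keeps the probability of lucky guessing under 5% for a 16-trial test.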

But no matter how many trials were used, I'd want participants to spend at least one hour listening to music. And another hour the next day (or at least another hour after a long break) is even better. This analytical listening style, with repeated songs, is tiring compared with normal listening.

An SBT done at home using a borrowed component versus your own component much better simulates the typical home listening experience (DBT's tend to involve groups of listeners). Only one assistant is needed to swap wires ... and his services may be paid for with adult beverages. This methodology means you get to listen to your own music 100% of the time and don't have to share the sweet spot with anyone else.

I prefer having DBT participants who thought they could hear differences between the two components under sighted conditions, or at least were not sure. A DBT is just a controlled listening test of what people think they hear under sighted conditions.

DBTs actually have many more "trials" (A/B/X comparisons) than one would imagine. There will almost always be more than one comparison of A, B and X before a decision is made on whether X is really A or B. I'd call each comparison a "trial". While there may be only 12 or 16 decisions made in total for a DBT, the actual number of "trials" (comparisons) is likely to be many dozens over one or two hours.

After several decades of DBT's, one can only state that differences among wires have not yet been heard under controlled listening positions ... and then taunt Golden Ears by calling them WireNuts. Not that I would EVER do that.

The most common DBT "null" result is not scientific proof of anything, so one can rarely use DBT "results" to prove anything ... but DBT "experiences" can be useful for:
(1) Understanding differences between sighted and blind audition methodologies, and
(2) Comparing actual personal hearing abilities with beliefs about one's hearing abilities

Most interesting is how often blind audition participants claim to hear component differences during sighted warm-up auditions ... and then fail to hear those differences minutes later under double-blind conditions.

This happens so consistently in blind auditions that it implies to me sighted auditions are likely to be unreliable for making audio purchase decisions.

These experiences during DBT's and SBT's are at least evidence that recommendations based on sighted auditions are likely to be different from recommendations based on blind auditions, as if we didn't already know that.

Note that blind audition "warm-ups" are sighted auditions with several important differences from the usual sighted audition:
(1) A / B volume-matching,
(2) A / B channel balancing,
(3) Inclusion of an A/B/X switchbox in the circuit, for both A and B, and
(4) Switching from A to B is very quick

Both (1) and (2), done to close tolerances, should eliminate real but meaningless audible differences among components such as CD players and amplifiers that could explain some "sound quality differences" heard in ordinary sighted auditions w/o volume matching / channel balancing.
It is possible that (3) may obscure subtle differences among components ... but (4) should help in A/B comparisons.

Ladies and germs of the jury, before you listen to the response from pc, be aware that the rumor that his ears were damaged during "the war" has never been proven to be true.

I rest my case.
Question is irrelevant ... so maybe answer is too?pctower
Nov 3, 2003 2:29 PM
You rested at the perfect time. As much as I might like to find something to pick apart in what you say (which I really don't), I have to, instead, thank you for the very thoughtful reply.

Well, actually I think I might come up with one or two things, but probably not in the direction you might think.

"The most common DBT "null" result is not scientific proof of anything, so one can rarely use DBT "results" to prove anything ... but DBT "experiences" can be useful for:

(1) Understanding differences between sighted and blind audition methodologies, and

(2) Comparing actual personal hearing abilities with beliefs about one's hearing abilities"

I'm not sure I entirely get your point here. While the test relates only to specific cables and components and specific listeners, each reliable test result adds to our body of knowledge, and the fact that each test produces a null result can't be ignored. A certain amount of extrapolation from that pattern to me seems reasonable.

But you really said as much:

"After several decades of DBT's, one can only state that differences among wires have not yet been heard under controlled listening positions ... and then taunt Golden Ears by calling them WireNuts. Not that I would EVER do that."

You also said:

"This happens so consistently in blind auditions that it implies to me sighted auditions are likely to be unreliable for making audio purchase decisions."

Mtrycrafts has convinced me that sighted auditions are by definition per se unreliable. That is no longer an issue with me, and I simply find attempts people make to "logically" defend the reliability of sighted tests to be plain silly. My interest now is in determining what are reasonable parameters, including numbers of trials and statistical analysis to be applied, in conducting blind tests.

Who knows, one of these days, I might finally get around to conducting one or two myself.

Hey BILLIAM - are you still around?
Question is irrelevant ... so maybe answer is too?Richard Greene
Nov 3, 2003 5:07 PM
I wrote:
"The most common DBT "null" result is not scientific proof of anything, so one can rarely use DBT "results" to prove anything ... but DBT "experiences" can be useful for:

(1) Understanding differences between sighted and blind audition methodologies, and

(2) Comparing actual personal hearing abilities with beliefs about one's hearing abilities"

You wrote:
"I'm not sure I entirely get your point here. While the test relates only to specific cables and components and specific listeners, each reliable test result adds to our body of knowledge, and the fact that each test produces a null result can't be ignored. A certain amount of extrapolation from that pattern to me seems reasonable."

My Reply:
In real life people almost always make decisions from insufficient evidence, partial data and extrapolated data.

Unfortunately null results are not scientific proof of anything, if one requires scientific proof for making decisions.

Also the number of people who have reported results from double-blind tests is too small to accurately represent the general population.

The evidence so far is that sighted auditions lead to claims of audible differences far more often than blind auditions do.

It's really possible the statistics are not relevant.

After ten minutes of a blind test, one often realizes that differences between the two components are not likely to be audible during casual listening ... or will be too subtle to make much of a difference to the sound quality of your system.

Listening for another hour or two and reviewing your scorecard results usually doesn't contradict the initial impressions typically formed during the first ten minutes: "Playing at exactly the same volume, these two components sound a lot more alike than I expected".
May be, and may be not.Tony_Montana
Nov 3, 2003 8:33 PM
Hey PC

You wrote: "Mtrycrafts has convinced me that sighted auditions are by definition per se unreliable."

That is not entirely true. If the switching between tested products is instantaneous, then it doesn't matter if it is sighted or blind. The differences (or lack thereof) will be obvious :)

Honestly, I don't trust sighted or blind tests where memory is a factor. That [human] factor has to be taken out if we want consistent results, especially when differences are subtle :)
May be, and may be not.mtrycrafts
Nov 3, 2003 9:39 PM
"If the switching between tested products is instantaneous, then it doesn't matter if it is sighted or blind."

Yes, it does matter. While you may get confused about which one you are listening to, in a sighted test you are allowed to peek, to know which component you are listening to regardless of how fast you switch. You have not accounted for bias by switching fast.

"Honestly, I don't trust sighted or blind tests where memory is a factor."

Unfortunately you cannot take away the memory factor. It is all about the memory. The shorter the interval, the better the chance of doing well. Now, if you compare one note in a loop, then maybe you are right, maybe.

"That [human] factor has to be taken out if we want consistent results, especially when differences are subtle :)"

Impossible to take out memory or bias. You control for bias, not memory.
Yes, memory factor is a problem.Tony_Montana
Nov 4, 2003 9:26 PM
And as you said, the shorter the better.

But I disagree with your assessment of the [instantaneous] sighted test being biased because we know which component we are listening to. Even if we are biased toward a product, instantaneous switching will reveal its weak or strong points. That is why we choose that testing method in the first place :)
Yes, memory factor is a problem.mtrycrafts
Nov 5, 2003 11:13 PM
"Even if we are biased toward a product, instantaneous switching will reveal its weak or strong points. That is why we choose that testing method in the first place :)"

No. Quick switching is used because it is more sensitive for telling components apart, since it shortens the memory interval; nothing more. It will not minimize bias. Impossible. Sighted testing is unreliable. Period. Incontestable. No worthy research outfit will use it, nor will any speaker company. JBL and many Canadian speaker makers all used DBT only for reliable testing.
Question is irrelevant ... so maybe answer is too?mtrycrafts
Nov 3, 2003 9:33 PM
"Mtrycrafts has convinced me"

And people still claim I don't post anything useful :) You are a hard cookie to convince, as it should be :)

"My interest now is in determining what are reasonable parameters, including numbers of trials and statistical analysis to be applied, in conducting blind tests.
Who knows, one of these days, I might finally get around to conducting one or two myself."

Then why don't you try for 20 trials, or "decisions" as RG would say :) See how well you do :)

"Hey BILLIAM - are you still around?"

Post it in a new message :)
 


Copyright ©1996-2008 All Rights Reserved. ConsumerREVIEW.com, a division of E-centives, Inc.