Confidence - Terminology and Selection


Jim Vallem
October 12th, 2007, 11:40 AM
This post is to help clarify the terminology of "confidence" in the context of reliability requirements and results. It is a result of seeing a lot of questions/statements about confidence that are not quite right, such as "50% is no confidence?" and "Someone told me that 50% confidence level means nothing. Is that right?" and "it's just a coin toss" and (from one of the moderators) "There isn't really a 50% confidence band."

None of these is correct. After a few words on the terminology, I will add a brief discussion about using 50% confidence (and why it's okay).
________

My experience with this subject is that a lot of confusion exists around the terminology. First of all, there is subjective confidence ("I'm very confident the new design will meet the reliability requirements"), which is NOT what we are dealing with. We're talking about objective, statistical confidence, as it relates to estimates of some value (say, reliability at some point in time), based on testing a group of parts (the entire group is a "sample") from the entire population of parts. If we tested the entire population, we would know the true reliability, and would not need to use statistical confidence at all. So confidence is really a sampling issue (parts don't have confidence!). If we were to test a different sample, we would get a different estimate of the true (but unknown) population value, as well as different confidence lines.

There are upper and lower confidence bounds, each of which defines a one-sided interval. You would expect the true (but unknown) population value to be above a 75% lower bound about 75% of the time if you kept repeating your testing plan with a new sample each time. This is an example of a ONE-sided interval.

A TWO-sided interval (or band) falls between a lower and an upper bound. You can make a 50% two-sided interval as the values between the 75% lower (one-sided) bound and the 75% upper (one-sided) bound. There is a 75% chance of being above the lower bound, a 75% chance of being below the upper bound, and a 50% chance of being between them -- thus, a 50% two-sided interval. This is NOT the same as a 50% lower bound, which is also the 50% upper bound, as well as your "best" estimate of the true (but still unknown) population value.
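The relationship between the two 75% one-sided bounds and the 50% two-sided interval can be checked by repeated sampling. The sketch below is only an illustration: it assumes a normally distributed measurement with a known standard deviation rather than a reliability test, and the function name and all parameter values are made up for this example.

```python
import random
from statistics import NormalDist, fmean

def simulate_coverage(trials=20000, n=10, true_mean=100.0, sigma=15.0, seed=0):
    """Estimate how often the 75% one-sided bounds, and the band between them,
    capture the true mean when the test plan is repeated with new samples."""
    rng = random.Random(seed)
    z75 = NormalDist().inv_cdf(0.75)        # z-value for a 75% one-sided bound
    half_width = z75 * sigma / n ** 0.5     # known-sigma bound on the mean
    above_lower = below_upper = between = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimate = fmean(sample)
        lower = estimate - half_width       # 75% lower one-sided bound
        upper = estimate + half_width       # 75% upper one-sided bound
        above_lower += true_mean > lower
        below_upper += true_mean < upper
        between += lower < true_mean < upper
    return above_lower / trials, below_upper / trials, between / trials

frac_above_lower, frac_below_upper, frac_between = simulate_coverage()
print(f"true value above 75% lower bound: {frac_above_lower:.3f}")
print(f"true value below 75% upper bound: {frac_below_upper:.3f}")
print(f"true value between the two:       {frac_between:.3f}")
```

With enough repetitions the three fractions should settle near 0.75, 0.75, and 0.50, which is the sense in which two 75% one-sided bounds enclose a 50% two-sided interval.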

In the reliability arena, we don't much care if the population reliability is greater than our estimate, but we do worry about the possibility of the population reliability being less than the estimate. That's why with reliability, we generally talk about lower bounds only, that is, one-sided lower intervals. We don't often state that specifically, which is perhaps part of the reason for confusion in this area.

Incidentally, if we are using the success-run formula to determine a sample size for success testing, the confidence value used is for a lower bound on the reliability being demonstrated.
________________

Now, why is it okay to use 50% confidence?

Using a higher confidence level than 50% means that you are purposely under-stating your estimate of the true (unknown) reliability. You are in effect causing people to demonstrate a higher reliability than was stated. If the reliability requirement is 95% at "one test life" demonstrated with 50% (lower) confidence, the designers know they have to shoot for 95%. If the requirement is 95% with C=90%, the designers think they have to achieve R=95% but actually have to hit about 99.5% (the equivalent R with C=50% -- both R95C90 and R99.5C50 give about the same sample size in the success-run formula -- we use "n" instead of "n-1" when we use this formula, a preference of our reliability engineers). Wouldn't it be nice if the designers knew that they had to design to R=99.5% rather than the stated 95%?
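The trade between R and C can be worked through directly with the success-run formula n = ln(1-C) / ln(R). The sketch below assumes zero allowed failures and the "n" form of the formula mentioned above; the function names are hypothetical and chosen only for this illustration.

```python
import math

def success_run_sample_size(reliability, confidence):
    """Zero-failure sample size from n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

def equivalent_reliability(n, confidence):
    """Reliability demonstrated by n failure-free parts at the given confidence."""
    return (1.0 - confidence) ** (1.0 / n)

n_r95c90 = success_run_sample_size(0.95, 0.90)      # parts needed for R95C90
r_at_c50 = equivalent_reliability(n_r95c90, 0.50)   # same n, restated at C=50%
print(f"R95C90 sample size: {n_r95c90}")
print(f"equivalent R at C=50%: {r_at_c50:.4f}")
```

Under this plain form of the formula, R95C90 needs 45 parts, and the C=50% equivalent for 45 parts comes out near 98.5% rather than the 99.5% quoted above. The argument itself is unchanged: the designers face a noticeably higher effective target than the stated 95%.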

Using a 50% value means you are reporting your best estimate of the results you have demonstrated. If the true reliability is right at the requirement, it is true that you have about a 50-50 chance of either demonstrating that the requirement was met, or not. However, if the true (unknown) value is better than the requirement, you have a greater than 50% chance of "passing" the test. But if the true value is worse, you have less than a 50% chance of passing. Either way, and regardless of the confidence you use, there is the chance of being wrong. Using a 50% confidence balances the risk of being wrong on the high side and wrong on the low side. You don't want field failures, but you don't want to overdesign either.
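The pass/fail odds described above can be illustrated for the simplest case. This is a sketch under stated assumptions: a zero-failure success test sized with the success-run formula at C=50%, independent parts, and a 95% requirement picked only as an example. The function names are hypothetical.

```python
import math

def zero_failure_sample_size(required_reliability, confidence):
    """Success-run sample size, n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1.0 - confidence) / math.log(required_reliability))

def pass_probability(true_reliability, n):
    """Chance that all n parts survive when each survives with true_reliability."""
    return true_reliability ** n

n = zero_failure_sample_size(0.95, 0.50)
print(f"sample size for R95C50: {n}")
for true_r in (0.90, 0.95, 0.99):
    print(f"true R = {true_r:.2f} -> chance of passing = {pass_probability(true_r, n):.3f}")
```

The chance of passing is close to 50% when the true reliability sits at the requirement (slightly under, because n is rounded up), well above 50% when the true value is better, and well below 50% when it is worse.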

Over the last several years, my company (a domestic auto company) has converted all of our reliability requirements to use a 50% lower confidence bound. If the previous requirement had a higher confidence, we increased the reliability number as we dropped the confidence, to get an equivalent requirement. We also ensure that our tests are based on very high severity customer usage and environmental factors (most people are surprised by how high), and have a high reliability required under those conditions. Both of these greatly reduce the risk of missing a problem and having it show up in the field. We would rather use high numbers for severity and reliability, which relate to the parts and test conditions, than for confidence. Remember, parts don't have confidence -- that is only an aspect of the sampling plan. We also typically specify a minimum number of parts to failure, if doing a test-to-failure validation plan. This is to minimize the greater uncertainty in the results caused by very small sample sizes, which is evidenced by any confidence bounds (such as 70% 2-sided) you might show on the plot.

Arai.M
October 12th, 2007, 05:37 PM
The terminology is right on target and a good reminder to keep terms straight as well as report things clearly to avoid confusion. A few words on the 50% confidence interval issue.

I will limit the following discussion to 50% one-sided lower confidence bounds. In this case, you are stating that you have a 50% chance of being above the lower bound and a 50% chance of being below the lower bound. So probabilistically it is a toss up on whether your true value is lower or higher than the estimated value.

In terms of reporting a product reliability, is there any added benefit of confidence bounds at 50%, since the estimated value is equal to the lower confidence bound? As an analyst I get no information as to what the quality of the data is (e.g., you could have tested 2 units or 100 units and obtained the same results). If I am comparing 2 products and they both report their 50% 1-sided lower CB reliability at the time of interest, how can I possibly make a choice unless they both tested the same number of units (and all to failure, for example)? So you bring up a very good point: confidence is associated with the data, not with the part. Confidence bounds give you information as to the quality of your data (number of units, censoring scheme, etc.).

So let me make an additional distinction and limit the discussion exclusively to demonstrating a reliability, as opposed to life data analysis or reporting a product reliability.

Here again you bring up a good point. Proving a reliability at 50% CL is equivalent to proving a lower reliability with a higher confidence level (with the same number of units and failures). As an analyst I want the confidence level to be associated with the risk I am willing to take of being wrong (my unknown true reliability not being in the range stated). However, another good point is that a designer does not necessarily fully understand the role of confidence levels (probabilities around probabilities? who came up with that?). One way of dealing with this is what your company and other automotive companies have done in demonstration tests: take the confidence out of the statement (ok, not quite, but in practical terms) so that what you see is what you want. Keep in mind, however, that if the units tested and number of failures are not fixed, there is not a 1-to-1 equivalency; there is an infinite number of solutions. The equivalency you show, R95C90 = R99.5C50, is only valid with a specific sample size and allowed failures (http://www.weibull.com/hotwire/issue24/relbasics24.htm), and that is assuming you used a non-parametric approach, because otherwise it is also dependent on the distribution and parameters you chose to assume.

So to follow up in your discussion, nothing wrong with 1-sided lower 50% CL in demonstration tests as long as somebody in the background fully understands what that means and how to use it.

Thanks for bringing up such good discussion points!

Jim Vallem
December 3rd, 2007, 04:06 PM
I should have been more clear on when we use 50% Confidence. It is only in the context of validating a product's reliability against its requirement. If we are comparing two designs, or two manufacturing processes, for example, we would use 2-sided intervals to assist in the comparison.

I had to read your 2nd paragraph several times to decide if it was misleading. Sentence 2 maybe is, but the last sentence brings it back to reality. My take: It is true that the true (unknown) reliability has a 50/50 chance of being above the 50%C estimate line (assuming test-to-failure and fitting a Weibull line). However, if instead we are using the success-run formula to develop a success test plan, and the parts "pass" the requirement, we find that this is actually conservative, since we don't know how much further than the bogey the parts could have gone. (We do however, greatly prefer a test-to-failure plan.)

I think people could be confused by your statement "In this case, you are stating that you have a 50% chance of being above the lower bound and a 50% chance of being below the lower bound." (You meant upper here at the end, but that is not what I think is potentially confusing.)

Allow me to reword that a bit: "In this case, you are stating that the true reliability has a 50% chance of being above the lower bound and a 50% chance of being below the upper bound." That is OK. However, it is NOT true that you have a 50/50 chance of the product being worse (or better) than the REQUIREMENT. If the product's true (unknown) reliability is better than the requirement, there is a greater than 50% chance of the estimate being above the requirement, i.e., a better than 50% chance of being validated. Similarly, if the true reliability is worse than the requirement, the chance of passing is less than 50%.

You are quite correct that it is important to know how many parts were tested, and how many of those failed. We do ask for the test data, so we can assess the adequacy of the testing. Actually, we try to specify a minimum sample size up front, and we review/approve the testing plan ahead of time.

Is it necessary to state a confidence if you are doing testing to failure, and basing the decision on the best fit Weibull line? Not if you state that the best-fit line is the one to be used. But the alternative is to state the 50% confidence level, and that can be used for success-test planning as well [n = ln(1-C) / ln(R)].

I'm attaching a file illustrating 2 designs being compared. One was purposely made twice as strong as the other. The data was made up, but analyzed with Weibull++ and exported. See what happens when testing for both is suspended at the same point, with 50% vs. 90% lower bounds. The lesson is that you really do need to see the test data, not just rely on someone telling you the reliability estimate.

vnigam
December 18th, 2007, 08:07 AM
"Using a higher confidence level than 50% means that you are purposely under-stating your estimate of the true (unknown) reliability. You are in effect causing people to demonstrate a higher reliability than was stated. If the reliability requirement is 95% at "one test life" demonstrated with 50% (lower) confidence, the designers know they have to shoot for 95%. If the requirement is 95% with C=90%, the designers think they have to achieve R=95% but actually have to hit about 99.5% (the equivalent R with C=50% -- both R95C90 and R99.5C50 give about the same sample size in the success-run formula -- we use "n" instead of "n-1" when we use this formula, a preference of our reliability engineers). Wouldn't it be nice if the designers knew that they had to design to R=99.5% rather than the stated 95%?"

I don't understand why using a higher CI level than 50% will understate the estimate of the true unknown reliability. Could you please explain? Also, based on the explanation that follows, how does the target reliability change with the CI?

Thanks for your time.

VN

vnigam
December 18th, 2007, 10:45 AM
Also, can someone help me understand what it means when we say R95C90. Does it say 95% reliability at 90% lower confidence bound or does it say min reliability of 95% with a two sided 90% CB?

Thanks!

Arai.M
December 18th, 2007, 10:51 AM
Just to make sure we are all on the same page, I'll pick the context of life data analysis (as opposed to designing a test that will prove a certain reliability goal).
Choosing a higher confidence level does not underestimate the true reliability. A confidence bound is not an estimate of the true reliability; it is the bound for a range of possible values of the true reliability given the quality of the data. A 90% confidence interval on the reliability says that, with 90% probability, the true reliability will lie within that range. The confidence level is a measure of risk, the risk of being wrong. So if by "purposely under-stating your estimate of the true (unknown) reliability" you mean giving a worst-case scenario associated with a certain risk, then we are discussing semantics.

Jim, in your file you show an example of 2 products, one with 15 failures (product A) and one with 5 failures and 10 suspensions (product B). As a side note, it is missing important information (e.g. estimation method, confidence intervals method). You also have a note that says "Every point in “B” lasts 2 times longer than in “A,” making “B” better." From the plot that is not what I am seeing. I actually see the five failures of B occurring within the range of all of the failures of A. In other words, the latest failure of B is smaller than the latest failure of A. Now assuming that all the suspensions for B are at 3 lives, you do intuitively assume that B is better since 2/3 of the failures will occur after all A parts have failed.

But back to the point that you are trying to make. It shows that the confidence bounds are wider in the case of product B, and therefore the "wrong conclusion" would be reached, since B is "known" to be better. In hypothesis testing (which is conceivably what you are doing when you use bounds to compare 2 designs), the conclusion is that there is not enough evidence to prove that the 2 designs are statistically different. It doesn't say that B is worse. It just says that you cannot prove B is better...statistically. If you choose to go with intuition, by all means pick B, but keep in mind that with a 50% 1-sided CB (in other words, using only expected values) you are not doing a statistical comparison.

VN, I hope that answers the first part of your question. For the second part, I will go back to designing a test to prove a reliability goal. For more info refer to http://www.weibull.com/hotwire/issue24/relbasics24.htm. Using the cumulative binomial distribution you have an equation that relates the reliability to the CL, sample size and number of allowed failures. If you set the number of failures to 0, you obtain the equation: 1-CL = R^n (or n = ln(1-CL) / ln(R) as shown above by Jim). If you keep the sample size constant (again, the number of allowed failures is 0), then you have an equation with 2 unknowns: the CL and the reliability. In other words, if you test a fixed sample size (and allow zero failures) you are proving an infinite pair of CI and reliabilities. For example if you have 5 samples available and test them for 1000 hours without failure, I am proving 87% Reliability at a 1000 hrs with 50% CL ([1-0.5]^[1/5]). However, you are also proving 63% reliability at a 90% CL.
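The cumulative binomial relation described above can be sketched in a few lines. This is an illustration only, assuming a pass/fail (non-parametric) test with independent parts; the function names and the bisection solver are choices made for this example, not something taken from the referenced article.

```python
import math

def confidence_demonstrated(reliability, n, allowed_failures=0):
    """CL = 1 - P(at most allowed_failures failures among n parts)."""
    p_fail = 1.0 - reliability
    p_pass = sum(
        math.comb(n, i) * p_fail ** i * reliability ** (n - i)
        for i in range(allowed_failures + 1)
    )
    return 1.0 - p_pass

def reliability_demonstrated(confidence, n, allowed_failures=0, tol=1e-10):
    """Solve the cumulative binomial relation for R by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The demonstrated CL falls as R rises, so raise R while CL is too high.
        if confidence_demonstrated(mid, n, allowed_failures) > confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Five parts, zero failures: one test result, many (CL, R) pairs.
for cl in (0.50, 0.70, 0.90):
    print(f"CL = {cl:.0%} -> R = {reliability_demonstrated(cl, 5):.3f}")
```

With zero allowed failures this reduces to R = (1-CL)^(1/n), which reproduces the 87% (at 50% CL) and 63% (at 90% CL) figures for 5 parts. Passing a nonzero `allowed_failures` covers test plans that tolerate failures.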

Hope this helps.

Arai.M
December 18th, 2007, 10:53 AM
In the context we were discussing, it meant Reliability of 95% at a lower 1 sided 90% confidence bound.

vnigam
December 18th, 2007, 11:20 AM
Thanks a lot Arai!

I will just repeat what I just understood.

Say I have a requirement of R95C90. I test 30 parts and fit the times to failure to a distribution. I look at the probability plot vs. time with a lower confidence band at 90% CI. If I read the reliability (1 - unreliability) at the time in question, using the lower confidence band, and it is equal to or greater than 95%, then I have met my requirement. And the probability of meeting the requirement of R95C90, for the population from which I picked the 30-part sample, is 0.9. Did I understand right?

VN

Arai.M
December 18th, 2007, 11:38 AM
Yes, almost...the probability of meeting the requirement is 0.9 if the lower one sided confidence bound equals the target. If the confidence bound is greater than the goal, you have a higher than 90% chance. You could increase the CL until you reach that critical CL at which you still meet requirements for example.

vnigam
December 18th, 2007, 11:49 AM
Thank you.:)

Jim Vallem
February 13th, 2008, 02:55 PM
Quoting from the last paragraph of Arai's Dec 18, 10:51 AM post:
"In other words, if you test a fixed sample size (and allow zero failures) you are proving an infinite pair of CI and reliabilities. For example if you have 5 samples available and test them for 1000 hours without failure, I am proving 87% Reliability at a 1000 hrs with 50% CL ([1-0.5]^[1/5]). However, you are also proving 63% reliability at a 90% CL."

This is what I was referring to when I indicated that if you use a confidence higher than 50%, you need to back off on what you are claiming for the demonstrated reliability. You can claim a higher confidence, but only of meeting a lower reliability number. The same holds true with testing to failure and looking at the estimate line and the 90%C line (or any other confidence >50%). In another example, I am 50% confident that the average height of the adults in this state is at least 5'5", but I might be 70% confident that it exceeds 5'2". I need to under-state my "real" height estimate (5'5") if I want to claim higher confidence.

So using the numbers at the top, should we tell our designers they need to achieve 87% reliability, or 63%? Obviously, 87%, because the 63% number is an under-statement.

vnigam
March 24th, 2008, 10:57 AM
Since the equation we are referring to has two unknowns, R and C, and an infinite number of solutions (with the number of samples held constant), the sample size should be increased, in order to reduce uncertainty in the parameters or the reliability estimates, before making costly changes in the design based on a batch of tests where there is a doubt that the reliability is understated. Any thoughts?

Arai.M
March 26th, 2008, 10:00 AM
Yes, what you are saying makes sense.

However, I think that out of all of the infinite number of solutions, the one with 50% CL is the one Jim feels most comfortable with, which is fine as long as the reliability is reported along with its confidence level (50%, and any other assumptions for that matter).

Let me give an example where I feel a 50% CL might be misused. Let's say that at the warranty time, my company does not want to pay returns for more than 2% of the units. What you just got there is a reliability target:
R(T=warranty) = 98%
The problem is that there is no confidence level associated with this. If I select 50% CL, then I would be testing 35 units (assuming no failures are allowed in the test and that we are testing all the way to the warranty time). However, I have a 50% probability of exceeding the 2% target. If I wanted to be very sure that I am testing enough units, let's say I select 99% CL (I have a 1% chance of being wrong). Now I need 228 units to test.
So in this case, 35 units can be put on test and meet the target, but you are taking a lot of risk.
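The 35-unit and 228-unit figures follow from the zero-failure formula n = ln(1-CL) / ln(R) used earlier in the thread. A minimal sketch, assuming no failures are allowed and every unit is tested to the warranty time; the function name and the intermediate confidence levels are only illustrative:

```python
import math

def units_required(reliability, confidence):
    """Zero-failure test size from n = ln(1 - CL) / ln(R), rounded up."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

# Warranty target from the example: R(T=warranty) = 98%
for cl in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"CL = {cl:.0%} -> units to test = {units_required(0.98, cl)}")
```

The first and last rows give the 35 and 228 units quoted above; the rows in between show how quickly the test size grows as the accepted risk shrinks.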