Jim Vallem
October 12th, 2007, 11:40 AM
This post is to help clarify the terminology of "confidence" in the context of reliability requirements and results. It is a result of seeing a lot of questions/statements about confidence that are not quite right, such as "50% is no confidence?" and "Someone told me that 50% confidence level means nothing. Is that right?" and "it's just a coin toss" and (from one of the moderators) "There isn't really a 50% confidence band."
None of these is correct. After a few words on the terminology, I will add a brief discussion about using 50% confidence (and why it's okay).
________
My experience with this subject is that a lot of confusion exists around the terminology. First of all, there is subjective confidence ("I'm very confident the new design will meet the reliability requirements"), which is NOT what we are dealing with. We're talking about objective, statistical confidence, as it relates to estimates of some value (say, reliability at some point in time), based on testing a group of parts (the entire group is a "sample") from the entire population of parts. If we tested the entire population, we would know the true reliability, and would not need to use statistical confidence at all. So confidence is really a sampling issue (parts don't have confidence!). If we were to test a different sample, we would get a different estimate of the true (but unknown) population value, as well as different confidence lines.
There are upper and lower confidence bounds, each of which defines a one-sided interval. You would expect the true (but unknown) population value to be above a 75% lower bound about 75% of the time if you kept repeating your testing plan with a new sample each time. This is an example of a ONE-sided interval.
A TWO-sided interval (or band) falls between a lower and an upper bound. You can build a 50% two-sided interval from the values between the 75% lower (one-sided) bound and the 75% upper (one-sided) bound. There is a 75% chance of being above the lower bound, a 75% chance of being below the upper bound, and a 50% chance of being between them -- thus, a 50% two-sided interval. This is NOT the same as a 50% lower bound, which is also the 50% upper bound, as well as your "best" estimate of the true (but still unknown) population value.
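To see the one-sided and two-sided percentages fall out of repeated sampling, here is a minimal simulation sketch. It is not a reliability calculation: as a stand-in, it assumes we are estimating the mean of a normal population with known sigma, and the function name, sample size, and population values are all made up for illustration.

```python
import random
from statistics import NormalDist

def simulate_coverage(trials=20000, n=10, mu=100.0, sigma=15.0,
                      level=0.75, seed=1):
    """Repeat the same sampling plan many times and report how often the
    true mean is above the lower bound, below the upper bound, and between.

    Assumption for illustration: normal population with known sigma, so the
    one-sided bounds are xbar -/+ z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(level)      # one-sided quantile for `level`
    half_width = z * sigma / n ** 0.5
    above_lower = below_upper = between = 0
    for _ in range(trials):
        # Draw a fresh sample and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lower, upper = xbar - half_width, xbar + half_width
        above_lower += mu > lower
        below_upper += mu < upper
        between += lower < mu < upper
    return above_lower / trials, below_upper / trials, between / trials

p_lower, p_upper, p_between = simulate_coverage()
print(f"true value above 75% lower bound: {p_lower:.3f}")
print(f"true value below 75% upper bound: {p_upper:.3f}")
print(f"true value between the two:       {p_between:.3f}")
```

The first two fractions should land near 0.75 and the third near 0.50, which is the 75% + 75% - 100% = 50% arithmetic described above.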
In the reliability arena, we don't much care if the population reliability is greater than our estimate, but we do worry about the possibility of the population reliability being less than the estimate. That's why with reliability, we generally talk about lower bounds only, that is, one-sided lower intervals. We don't often state that specifically, which is perhaps part of the reason for confusion in this area.
Incidentally, if we are using the success-run formula to determine a sample size for success testing, the confidence value used is for a lower bound on the reliability being demonstrated.
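For anyone who wants to try the numbers, here is a small sketch of the success-run calculation. It assumes the plain zero-failure form C = 1 - R^n (so n = ln(1 - C) / ln(R)); some organizations use a slightly different exponent convention, and the function names here are just illustrative.

```python
import math

def success_run_sample_size(reliability, confidence):
    """Number of parts that must all pass (zero failures) to demonstrate
    `reliability` as a lower bound at `confidence`, assuming C = 1 - R**n."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

def demonstrated_reliability(n, confidence):
    """Lower-bound reliability demonstrated by n successes and no failures."""
    if n < 1 or not (0.0 < confidence < 1.0):
        raise ValueError("need n >= 1 and confidence between 0 and 1")
    return (1.0 - confidence) ** (1.0 / n)

print(success_run_sample_size(0.90, 0.90))   # parts needed for R90C90
print(success_run_sample_size(0.95, 0.50))   # parts needed for R95C50
print(round(demonstrated_reliability(22, 0.90), 4))
```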
________________
Now, why is it okay to use 50% confidence?
Using a higher confidence level than 50% means that you are purposely under-stating your estimate of the true (unknown) reliability. You are in effect causing people to demonstrate a higher reliability than was stated. If the reliability requirement is 95% at "one test life" demonstrated with 50% (lower) confidence, the designers know they have to shoot for 95%. If the requirement is 95% with C=90%, the designers think they have to achieve R=95% but actually have to hit about 98.5%. That is the equivalent R with C=50%: both R95C90 and R98.5C50 give about the same sample size (roughly 45 parts) in the success-run formula. (We use "n" instead of "n-1" when we use this formula, a preference of our reliability engineers.) Wouldn't it be nice if the designers knew that they had to design to R=98.5% rather than the stated 95%?
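The conversion between a requirement at one confidence level and its C=50% equivalent can be checked directly. This is a sketch under the same assumption as the plain success-run formula, C = 1 - R^n, holding the (unrounded) sample size fixed; `equivalent_reliability` is a name chosen for illustration.

```python
import math

def equivalent_reliability(reliability, confidence, new_confidence=0.5):
    """Reliability at `new_confidence` that calls for the same zero-failure
    sample size as `reliability` at `confidence`, assuming C = 1 - R**n."""
    for value in (reliability, confidence, new_confidence):
        if not 0.0 < value < 1.0:
            raise ValueError("all inputs must be between 0 and 1")
    # Unrounded sample size implied by the original requirement
    n = math.log(1.0 - confidence) / math.log(reliability)
    # Reliability that the same n demonstrates at the new confidence level
    return (1.0 - new_confidence) ** (1.0 / n)

r50 = equivalent_reliability(0.95, 0.90)
print(f"R95C90 is equivalent to R = {r50:.4f} at C = 50%")
```

Running it the other way (`equivalent_reliability(r50, 0.5, 0.9)`) should return the original 0.95, which is a handy sanity check when converting old requirements.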
Using a 50% value means you are reporting your best estimate of the results you have demonstrated. If the true reliability is right at the requirement, it is true that you have about a 50-50 chance of either demonstrating that the requirement was met, or not. However, if the true (unknown) value is better than the requirement, you have a greater than 50% chance of "passing" the test. But if the true value is worse, you have less than a 50% chance of passing. Either way, and regardless of the confidence you use, there is the chance of being wrong. Using a 50% confidence balances the risk of being wrong on the high side and wrong on the low side. You don't want field failures, but you don't want to overdesign either.
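The pass/fail odds described above are easy to tabulate for a zero-failure (success-run) test, since the chance that all n parts survive is simply the true reliability raised to the n. A minimal sketch, assuming independent parts and a requirement of R=95% at C=50% picked only as an example:

```python
import math

def pass_probability(true_reliability, n):
    """Chance that all n independent parts survive a zero-failure test."""
    return true_reliability ** n

# Sample size for the example requirement R = 95% at C = 50% (C = 1 - R**n)
required_r, confidence = 0.95, 0.50
n = math.ceil(math.log(1.0 - confidence) / math.log(required_r))

for true_r in (0.90, 0.95, 0.99):
    print(f"true R = {true_r:.2f}: P(pass with n = {n}) = "
          f"{pass_probability(true_r, n):.3f}")
```

A true reliability right at the requirement passes a little under half the time (n is rounded up), a better design passes well over half the time, and a worse one passes well under half the time.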
Over the last several years, my company (a domestic auto company) has converted all of our reliability requirements to use a 50% lower confidence bound. If the previous requirement had a higher confidence, we increased the reliability number as we dropped the confidence, to get an equivalent requirement. We also ensure that our tests are based on very high severity customer usage and environmental factors (most people are surprised by how high), and have a high reliability required under those conditions. Both of these greatly reduce the risk of missing a problem and having it show up in the field. We would rather use high numbers for severity and reliability, which relate to the parts and test conditions, than for confidence. Remember, parts don't have confidence -- that is only an aspect of the sampling plan. We also typically specify a minimum number of parts tested to failure, if doing a test-to-failure validation plan. This limits the extra uncertainty that very small sample sizes bring to the results, which shows up as wide confidence bounds (such as 70% 2-sided) on the plot.