Inter-arrival times


sdf123
April 18th, 2007, 12:18 PM
I have sample data on the inter-arrival times of people arriving at a store. When I fit it, I found the distribution to be Weibull with shape parameter < 1.

Can anyone tell me how to explain why this distribution is Weibull?

How do I explain this phenomenon?

What is meant by a long-tailed distribution and a heavy-tailed distribution? Which distribution models fit these descriptions?

Arai.M
April 18th, 2007, 04:37 PM
Traditionally, inter-arrival times of this kind are modeled with an exponential distribution, which corresponds to a homogeneous Poisson process for the arrivals. Mathematically, a 2-parameter model (e.g., Weibull) will fit at least as well as a 1-parameter model (e.g., exponential), since the exponential is the special case of the Weibull with shape parameter 1, and that might be the phenomenon you are experiencing. As a first step, I would check whether the confidence bounds around your shape parameter include 1, which would tell you whether an exponential distribution is a plausible fit (other goodness-of-fit tests might be preferable: http://www.weibull.com/hotwire/issue71/relbasics71.htm).
Hope this helps,
Arai
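
One way to carry out the check suggested above is a likelihood-ratio comparison of the Weibull fit against the exponential fit. The sketch below is only an illustration under stated assumptions: the function names (fit_weibull, lr_statistic, and so on) are made up for this example, the data are simulated rather than taken from the thread, and the fit uses the raw (unbinned) inter-arrival times, which must be strictly positive and not all identical. The value 1 lies inside the 95% likelihood-based confidence interval for the shape exactly when the statistic is below 3.841, the 95% point of a chi-square distribution with 1 degree of freedom.

```python
import math
import random

def fit_weibull(data):
    """Maximum-likelihood fit of a 2-parameter Weibull; returns (shape, scale)."""
    logs = [math.log(x) for x in data]  # requires strictly positive data
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def g(k):
        # Score equation for the shape k (increasing in k).
        # Weights are rescaled by the largest observation to avoid overflow.
        w = [math.exp(k * (l - max_log)) for l in logs]
        return sum(wi * l for wi, l in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:          # expand the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(100):      # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / len(data)) ** (1.0 / shape)
    return shape, scale

def weibull_loglik(data, shape, scale):
    n = len(data)
    return (n * math.log(shape) - n * shape * math.log(scale)
            + (shape - 1.0) * sum(math.log(x) for x in data)
            - sum((x / scale) ** shape for x in data))

def exponential_loglik(data):
    # Log-likelihood at the exponential MLE (mean = sample mean).
    n = len(data)
    return -n * math.log(sum(data) / n) - n

def lr_statistic(data):
    """2 * (Weibull log-likelihood - exponential log-likelihood) at the MLEs."""
    shape, scale = fit_weibull(data)
    return 2.0 * (weibull_loglik(data, shape, scale) - exponential_loglik(data))

# Simulated inter-arrival times (hypothetical: scale 10, shape 0.7).
rng = random.Random(0)
sample = [rng.weibullvariate(10.0, 0.7) for _ in range(2000)]
shape, scale = fit_weibull(sample)
lr = lr_statistic(sample)
print(f"shape={shape:.3f} scale={scale:.3f} LR={lr:.1f}")
print("exponential plausible" if lr < 3.841 else "shape differs from 1")
```

With a very large sample, even a small departure of the shape from 1 will be flagged, so the size of the estimated shape matters as much as the test result.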

sdf123
April 18th, 2007, 06:13 PM
Thanks, Arai. Your response was very useful, and I agree that inter-arrival times usually follow an exponential distribution. Certainly, the more parameters a model has, the easier it is to fit.

Here's my problem: I have a huge data sample and have therefore binned it. When I try to fit the full (binned) data, it doesn't fit any distribution (i.e., it does not pass a goodness-of-fit test). When I truncate some of the data, it fits both the Weibull and exponential distributions. I am using the K-S test, and visually the data fits the Weibull better (judging by the P-P plot and the CDF). Also, the K-S statistic for the Weibull distribution is lower than that for the exponential distribution, which also indicates that the Weibull is the better fit.

So, supposing the Weibull is a good fit, is there an explanation for why?

Thanks for your help.
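
The K-S comparison described above can be sketched on unbinned data as follows. This is a minimal illustration, not the poster's actual procedure: the helper names are invented for the example, the sample is simulated, and the Weibull parameters passed in are simply the values used to simulate the data, standing in for whatever a fitting routine would return. Note that the usual K-S p-values assume the parameters were not estimated from the same data, so the statistic is used here only to compare the two candidates.

```python
import math
import random

def ks_statistic(data, cdf):
    """Largest vertical gap between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(mean):
    return lambda x: 1.0 - math.exp(-x / mean)

def weibull_cdf(shape, scale):
    return lambda x: 1.0 - math.exp(-((x / scale) ** shape))

# Simulated inter-arrival times (hypothetical: scale 5, shape 0.6).
rng = random.Random(1)
sample = [rng.weibullvariate(5.0, 0.6) for _ in range(2000)]

# Exponential fitted by its MLE (the sample mean); Weibull uses the
# simulation parameters as a stand-in for fitted values.
d_exp = ks_statistic(sample, exponential_cdf(sum(sample) / len(sample)))
d_wei = ks_statistic(sample, weibull_cdf(0.6, 5.0))
print(f"K-S exponential: {d_exp:.3f}  K-S Weibull: {d_wei:.3f}")
```

A smaller statistic means the candidate CDF tracks the empirical CDF more closely, which matches the visual comparison on a P-P plot.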

Arai.M
April 19th, 2007, 09:37 AM
Binning is a tricky thing. If you have the complete data, I would much rather use that. How many data points are we talking about? Also, what do you mean by truncating?

Even though the Weibull distribution might fit better statistically, I would still go with an exponential distribution if it is a good fit.

If you believe you have a nonhomogeneous process, then the way of treating your data is different: not by fitting a Weibull distribution to your data, but by allowing the arrival rate at time t to be a function of t rather than a constant, as it is in the exponential case (i.e., a nonhomogeneous Poisson process). You might still want to bin your data, but in this case by arrival times (e.g., people arriving from 8 to 9, 9 to 10, etc.), find lambda for each of those intervals, and fit a function to those lambdas. A special case of this is when you assume the first failure is governed by a Weibull distribution and the rest by a power law model: http://www.weibull.com/RelGrowthWeb/Crow-AMSAA_(N.H.P.P.).htm. Keep in mind that this model results in a monotonically increasing or decreasing failure intensity (i.e., people arriving at the store faster and faster or slower and slower with time, not a combination). So if you are looking for a general model, the first approach (fitting a function to the binned lambdas) might be a better choice.
-A
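
The binning-by-arrival-time step described above can be sketched like this. It is only an illustration: the function name piecewise_rates, the n_days parameter, and the sample arrival times are all made up for the example. Arrival times are clock times in hours, and each lambda is the number of arrivals in an interval divided by the total time observed in that interval.

```python
from bisect import bisect_right

def piecewise_rates(arrival_times, edges, n_days=1):
    """Arrivals per hour in each interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for t in arrival_times:
        if edges[0] <= t < edges[-1]:      # ignore arrivals outside the range
            counts[bisect_right(edges, t) - 1] += 1
    return [c / ((hi - lo) * n_days)
            for c, lo, hi in zip(counts, edges, edges[1:])]

# Hypothetical arrival times (hours of the day) observed on a single day.
arrivals = [8.1, 8.5, 9.2, 9.3, 9.7, 10.5]
edges = [8, 9, 10, 11]
rates = piecewise_rates(arrivals, edges)
print(rates)  # [2.0, 3.0, 1.0]
```

A function of t can then be fitted to these per-interval lambdas; which functional form is appropriate depends on the data.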

sdf123
April 19th, 2007, 04:49 PM
Thanks for the suggestion.

By truncating, I meant that I remove some of the bins before fitting, basically those bins that prevent the data from fitting a distribution.

For example, suppose there are 4 bins: I remove the first bin and then try to fit the remaining data.