As we continue our journey in Six Sigma, it seems pertinent to discuss the different types of distributions you will see in your analysis. Let’s take them one at a time. The most common distribution is the Normal Distribution, and here’s what you should know about it.
First, what is a distribution?
Simply put, a distribution tells you how often each value of a variable occurs in your process. This is important because how common (or rare) your values are will inevitably form the foundation of your improvement project.
Types of Distribution
The Normal Distribution
A normal distribution (the Gaussian Curve, which the average person knows as the Bell Curve) is symmetrical. The mean (the average) divides the data in half, with 50% of the data on each side of the mean. The Normal Distribution has the following hallmarks:
It is considered the most important distribution in statistics.
The total area under the curve equals 1.
The curve is shaped like a hill and is symmetrical about the mean.
Both tails of the curve extend indefinitely in either direction from the mean and never touch the horizontal axis.
White noise (purely random variation) in your process should produce a normal curve shape.
The standard normal distribution, also called the Z distribution, has a mean of 0 and a standard deviation of 1.
The mean (average), median (mid-point) and the mode (most common value) should be the same data value.
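You can check several of these hallmarks with Python’s standard `statistics.NormalDist` class. The sketch below assumes a hypothetical fill-weight process with a mean of 500 g and a standard deviation of 5 g. Those numbers are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical process: fill weights with mean 500 g and standard deviation 5 g
process = NormalDist(mu=500, sigma=5)

# Mean, median and mode coincide for a normal distribution
print(process.mean, process.median, process.mode)

# Half of the area lies on each side of the mean
print(process.cdf(process.mean))

# Total area under the curve: +/- 10 standard deviations captures essentially all of it
print(process.cdf(550) - process.cdf(450))

# Symmetry: the curve has the same height one standard deviation either side of the mean
print(process.pdf(495) == process.pdf(505))

# Standard normal (Z) distribution: mean 0, standard deviation 1
z = NormalDist()
print(z.mean, z.stdev)
```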
Next week, it’s on to non-normal classifications. Get to analyzing, and if you need any help, reach out and let us know!
As we keep exploring this wonderful world of 6Sigma, it’s important that we talk about how capability is measured. We’ve been talking about process capability for a few weeks now, so let’s talk about the methods for measuring it. This week we are going to focus on the capability indices and the performance indices.
What does it mean?
The first thing we need to understand is the terminology, so here are a few basic definitions.
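Cp and Cpk are capability indices, and Pp and Ppk are performance indices. All four compare your process with its specification limits: the upper specification limit (USL) and the lower specification limit (LSL). In the formulas, μ is the process mean. The capability indices use the short-term (within-subgroup) standard deviation, σ_within. The performance indices use the long-term (overall) standard deviation, σ_overall.
Cp- When you see this, you’re talking about the potential capability of your process. It compares the width of the specification limits with the spread of the process, assuming the process is perfectly centered. To find it you use this formula:
$$C_p = \frac{USL - LSL}{6\sigma_{within}}$$
Pp- When this comes up, the conversation is about the overall performance of your process. It makes the same comparison as Cp, but it uses the long-term standard deviation. The formula to find it is:
$$P_p = \frac{USL - LSL}{6\sigma_{overall}}$$
Cpk- This is your process capability index adjusted for centering. It tells you how close your process is running to the nearest specification limit. The formula for finding Cpk is:
$$C_{pk} = \min\left(\frac{USL - \mu}{3\sigma_{within}},\ \frac{\mu - LSL}{3\sigma_{within}}\right)$$
Ppk- This is the performance counterpart of Cpk. When you hear this term, it accounts for a distribution that isn’t centered between the limits, using the overall standard deviation. The formula for Ppk is:
$$P_{pk} = \min\left(\frac{USL - \mu}{3\sigma_{overall}},\ \frac{\mu - LSL}{3\sigma_{overall}}\right)$$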
What’s the Difference?
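The main difference is the way the information is calculated. Cp and Pp consider only the spread of the process against the width of the specification limits, and they ignore where the process is centered. Cpk and Ppk rate the process on both centering and variation by measuring from the mean to the nearest specification limit. As a result, Cpk can never be larger than Cp, and Ppk can never be larger than Pp. The other difference is the variation behind the standard deviation. Cp and Cpk reflect short-term variation within subgroups, while Pp and Ppk reflect long-term variation across all of the data.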
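Here is a minimal Python sketch of the four indices. It assumes equally sized subgroups of at least two measurements each. The within-subgroup σ is estimated from the pooled subgroup variance with no bias-correction constant. Statistical packages may apply such a constant, or estimate it from the average range instead, so treat the results as illustrative. The function name `capability` and the shaft-diameter data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def capability(subgroups, lsl, usl):
    """Cp, Cpk, Pp and Ppk for equally sized subgroups (simplified sketch)."""
    all_values = [x for group in subgroups for x in group]
    mu = mean(all_values)
    # Short-term sigma: pooled within-subgroup variance, no bias-correction constant
    sigma_within = sqrt(mean(variance(group) for group in subgroups))
    # Long-term sigma: sample standard deviation of every measurement
    sigma_overall = stdev(all_values)

    def spread_index(sigma):      # ignores centering
        return (usl - lsl) / (6 * sigma)

    def centered_index(sigma):    # distance from the mean to the nearest limit
        return min(usl - mu, mu - lsl) / (3 * sigma)

    return {
        "Cp": spread_index(sigma_within),
        "Cpk": centered_index(sigma_within),
        "Pp": spread_index(sigma_overall),
        "Ppk": centered_index(sigma_overall),
    }

# Hypothetical shaft diameters (mm): three subgroups of three, spec 9.0 to 11.0
subgroups = [[9.9, 10.0, 10.1], [10.0, 10.1, 10.2], [9.8, 9.9, 10.0]]
print(capability(subgroups, lsl=9.0, usl=11.0))

# Same variation, but the whole process has drifted 0.5 mm toward the upper limit
shifted = [[x + 0.5 for x in group] for group in subgroups]
print(capability(shifted, lsl=9.0, usl=11.0))
```

In the first run the process is centered, so Cpk equals Cp. After the drift, Cp is unchanged because the spread is the same, but Cpk drops because the mean is now closer to the upper limit.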
Data is so much more than numbers, but once we understand the why and the how, 6Sigma begins to teach us what is significant in our data.
As we cover Six Sigma Statistics, I want to make sure that I go over the visual side of statistics. We know Six Sigma is technical, but the key to making it stick is to make it simple and understandable for the non-technical people using it. So let’s talk about the Box Plot, also called the Box-and-Whisker Plot. A key thing to remember in Six Sigma is that everyone uses different terminology, so ask questions and make sure you are speaking the same language.
What is a Box Plot?
Simply put, a box plot puts a picture to the data. It shows you where most of the data falls, how the data is distributed and where the outliers are. So it basically shows you what you’ve got, how it looks and what is unusual about it.
What does it measure?
Say you have a process that has multiple variables affecting it and you want to know which is which. If you have a delivery truck with 4 alternative routes, a box plot of the delivery times for each route can show you which ones, according to the data, are the most problematic. Additionally, a box plot will tell you how symmetrical your data is. Knowing whether your data is skewed can affect how you interpret it. In a box plot, if the data is mostly symmetrical, the median will appear in the middle of the box and the whiskers will be roughly the same length. If the data is skewed in one direction, the median will sit off-center in the box and the whiskers will be different lengths.
How does it work?
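Box plots are built on quartiles, which split your sorted data into four equal parts. Think back to your SATs or ACTs. Remember how they told you that you scored in the 25th percentile? That’s the idea behind a box plot. The box runs from the 25th percentile (the first quartile) to the 75th percentile (the third quartile), and the line inside the box is the median (the 50th percentile). The whiskers extend from the box toward the smallest and largest values, but only up to a limit set by the data itself. The most common limit is 1.5 times the interquartile range (the length of the box) beyond each end of the box. The outliers are the extreme values beyond the whiskers. These values are so far from the rest of the data that they are unlikely to be reproduced without some unusual cause.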
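The numbers behind a box plot can be computed directly. This sketch uses Python’s `statistics.quantiles` and the common 1.5 × IQR rule for the whiskers. Quartile methods differ slightly between software packages, and `method="inclusive"` here is one choice among several, so your tool’s quartiles may not match exactly. The route data and the `box_plot_summary` name are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, whisker ends and outliers using the common 1.5 x IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value within the fences
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical delivery times (minutes) for one route; 95 was a day with road work
route_a = [35, 41, 32, 38, 95, 36, 40, 42, 37]
print(box_plot_summary(route_a))
```

To compare routes, you would run the same summary for each of the 4 routes and look at where the medians sit and how long the boxes and whiskers are.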
Interpreting your data is just as important as gathering it, so choose your tools carefully and with purpose. Talk to your belt and use that advice to help you find the best method for your organization.
As we go over Six Sigma statistics, we have to talk about the normal distribution. Before we get to that, though, we have to talk about something that shapes how you interpret any distribution: confidence intervals. Confidence intervals are more complicated than this blog can cover, but here is what you need to know. A confidence interval is a range of values that is likely to contain the true value you are estimating, such as the true process average. The confidence level tells you how often intervals built this way would capture that true value if you repeated the sampling. The higher the confidence level, the more sure you can be that the interval captures the true value. The trade-off is that the interval gets wider for the same amount of data. There are 3 common confidence levels in data analysis: 99%, 95% and 90%. The standard is 95%. Higher levels give you more certainty at the cost of precision, but as a baseline 95% is a solid analytic benchmark.
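To see the trade-off between confidence and precision, here is a minimal sketch using Python’s standard `statistics` module. It uses the normal (z) approximation, which is reasonable for larger samples. For small samples a t-based interval is the usual choice, but the standard library does not provide the t distribution. The delivery-time sample and the `mean_confidence_interval` name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Two-sided z (normal-approximation) confidence interval for the mean."""
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    # Critical value: about 1.645 for 90%, 1.96 for 95%, 2.576 for 99%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error

# Hypothetical delivery times in minutes
delivery_minutes = [42, 47, 39, 51, 45, 44, 48, 40, 46, 43,
                    49, 41, 45, 50, 44, 46, 38, 47, 45, 43]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(delivery_minutes, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} minutes (width {high - low:.2f})")
```

The interval widens as the confidence level rises. Without more data, extra confidence costs you precision.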
Okay, so back to the normal distribution. Here’s what you need to know.
What is it?
You see a normal distribution when you plot all of your data, for example as a histogram, and it forms the symmetrical bell shape. The plot illustrates how often each amount of variation shows up in your process. A distribution that isn’t normal can actually be more helpful, because it points to a special cause. Then you can say ‘Aha, it was the 3-hour traffic jam that affected the process.’ When you hear people talk about the curve, this is what they are referring to.
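A rough way to “find” the curve is to bin your data and plot a histogram. This sketch simulates hypothetical shipment weights that contain only random noise, using Python’s `random.gauss` with a fixed seed so the run is repeatable, and prints a text histogram. With real data, you would replace the simulated list with your own measurements. The `text_histogram` helper is hypothetical.

```python
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Group values into fixed-width bins and print one '#' per 10 values in each bin."""
    counts = Counter(int(v // bin_width) for v in values)
    for b in range(min(counts), max(counts) + 1):
        # Row label is the lower edge of the bin
        print(f"{b * bin_width:8.0f} | {'#' * (counts[b] // 10)}")
    return counts

random.seed(1)  # fixed seed so the run is repeatable
# Hypothetical shipment weights (kg): mean 1000, standard deviation 20, random noise only
weights = [random.gauss(1000, 20) for _ in range(1000)]
counts = text_histogram(weights, bin_width=20)
```

With only random noise, the tallest rows sit next to the mean and the rows shrink toward both tails. A lopsided or double-humped histogram would be the cue to look for a special cause.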
When do you use it?
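This is a tool that is best used as a probability model for continuous measurements that your process produces on its own, such as the weight of a cargo shipment. Counts, such as the number of a specific product you receive, are discrete. When those counts are large, though, the normal curve is often a reasonable approximation.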
Raw scores and Z scores
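A normal distribution is defined by two parameters: the mean and the standard deviation. Each individual measurement you record is a raw score. The Z score converts a raw score into the number of standard deviations it sits above or below the mean:
$$z = \frac{x - \mu}{\sigma}$$
where x is the raw score, μ is the mean and σ is the standard deviation. In real terms, suppose you want to see how unusual the number of errors on the 5th was. The Z score shows you how far that day sits from a typical day, measured in standard deviations.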
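As a worked example, here is a minimal Python sketch of the Z score for the errors on the 5th. The daily error counts are made up, and the sketch uses the sample standard deviation as the estimate of σ.

```python
from statistics import mean, stdev

def z_score(raw_score, mu, sigma):
    """How many standard deviations a raw score sits above (+) or below (-) the mean."""
    return (raw_score - mu) / sigma

# Hypothetical error counts for the 1st through the 10th of the month
daily_errors = [4, 6, 5, 5, 12, 6, 4, 5, 6, 7]

mu = mean(daily_errors)          # 6
sigma = stdev(daily_errors)      # sample standard deviation, about 2.31
errors_on_5th = daily_errors[4]  # list index 4 is the 5th day

print(f"Z score for the 5th: {z_score(errors_on_5th, mu, sigma):.2f}")
```

A Z score of about 2.6 means the 5th sits well above a typical day, so it is worth asking what was different that day.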
Why is it important?
The area under the curve between any two values shows the proportion of your results expected to fall in that range. That tells you how much of your output meets your business’s needs, for example the share of deliveries arriving within the promised window. If the curve is narrow, then you know your results fall within a relatively tight range, which is easier to control within the process. A wider distribution shows you that your process can be disrupted by a variety of factors, and you may need to keep a close eye on it.
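The area under the curve can be computed from the cumulative distribution function. This sketch uses Python’s `statistics.NormalDist` to compare two hypothetical delivery processes. Both average 60 minutes and have the same 50 to 70 minute promised window, but one is narrow and one is wide. The numbers and the `share_within` helper are illustrative assumptions.

```python
from statistics import NormalDist

def share_within(mu, sigma, low, high):
    """Area under a normal curve between low and high (the expected proportion of results)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical delivery times (minutes), both averaging 60, promised window 50 to 70
narrow = share_within(60, 5, 50, 70)   # tight process
wide = share_within(60, 15, 50, 70)    # process disrupted by many factors

print(f"Narrow process: {narrow:.1%} on time")
print(f"Wide process:   {wide:.1%} on time")
```

The narrow process lands roughly 95% of deliveries in the window, and the wide one only about half, even though both have the same average.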
This is a micro blog this week, because next week we get into measures of variation, which is a dry subject and will challenge my creative ability. As we continue our trek into statistics and how to interpret them, there is a very specific area that I want you to pay attention to: variation. Variation matters because it tells you how much your values differ from one another and how that affects the data set as a whole. It also shows you what the data won’t be able to tell you because of the noise that variation introduces. This is important because when you are interpreting data, understanding its limitations is almost more important than understanding what it is telling you.
The first thing to consider is the range. The range is the difference between the largest observation and the smallest one. This is important because the extremes are where you spot your outliers. Outliers are observations outside the norm; think of road work on a delivery path or a maternity event. A maternity event produces a large range; it’s so big there is no way to avoid noticing it. A traffic event produces a smaller range. It may have an impact, but that impact will not be evenly distributed, and it may or may not affect the final result.
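Another measure of spread is the interquartile range (IQR), which is the range of the middle 50% of the data, from the first quartile to the third. Because it ignores the extremes, it isn’t thrown off by outliers the way the range is. For larger samples, stick to the standard deviation. It tells you the typical distance of your values from the mean: the square root of the average squared distance from the mean.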
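To compare the three measures side by side, this sketch uses Python’s `statistics` module on two hypothetical weeks of delivery times. The weeks are identical except that one includes a single road-work day. The data and the `spread_measures` name are made up, and the quartiles use `method="inclusive"`, one of several quartile conventions.

```python
from statistics import quantiles, stdev

def spread_measures(values):
    """Range, interquartile range and sample standard deviation of a data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": stdev(values),
    }

# Hypothetical delivery times (minutes); the second set has one road-work outlier
normal_week = [38, 40, 41, 42, 43, 44, 45, 46, 47]
road_work_week = [38, 40, 41, 42, 43, 44, 45, 46, 120]

print(spread_measures(normal_week))
print(spread_measures(road_work_week))
```

The single outlier stretches the range from 9 to 82 minutes and inflates the standard deviation. The IQR stays at 4 minutes because it only looks at the middle half of the data.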
As usual, this is not a step-by-step guide to understanding variation, but it is enough of a foundation to have a conversation with your belt about the metrics and what they mean to your organization and its strategic goals. If you need help starting this conversation, give us a call and we will be happy to get you started.