Last week we talked about normal distributions in your data. This week, let's kick off the conversation with non-normal distributions. There are a few different types of non-normal distribution, so let's take a look.
Skewed data is, quite simply, a data distribution that is not symmetrical. The longest tail points in the direction of the skew.
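As a rough illustration (a minimal sketch using only the Python standard library, with simulated data), you can quantify skew with the Fisher-Pearson skewness coefficient: positive means a long right tail, negative a long left tail, and roughly zero means symmetric.

```python
import random
import statistics

def sample_skewness(data):
    """Fisher-Pearson coefficient of skewness: mean of cubed z-scores."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / n

random.seed(42)
# Exponential samples have a long right tail, so they are right-skewed.
skewed = [random.expovariate(1.0) for _ in range(10_000)]
# Normal samples are roughly symmetric.
symmetric = [random.gauss(0, 1) for _ in range(10_000)]

print(f"exponential skewness: {sample_skewness(skewed):.2f}")    # positive (right skew)
print(f"normal skewness:      {sample_skewness(symmetric):.2f}")  # near zero
```

A quick skewness check like this is often the first hint that your data is not normally distributed.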
Natural limits: these are boundaries built into the process or measurement itself, such as a count that cannot fall below zero. The problem with natural limits is that they can bias the estimation of results and, in some cases, push the data near the boundary away from anything resembling a normal distribution.
Artificial limits: unlike natural limits, these are imposed by the person analyzing the data. Basically, artificial limits set an arbitrary cutoff between acceptable and not acceptable. Say you make 40 chairs an hour, and your designer decides that any chair that doesn't earn a rating of 80 is unacceptable. That acceptability threshold is completely arbitrary, based on the designer's standards.
Mixtures occur when data from different sources is expected to be the same but turns out to be different. Say you're looking at error data from two cashiers: Shift A's credit card receipts and Shift B's cash receipts. You were expecting the error rate for each method to follow a normal distribution, but the combined data shows two distinct clusters instead.
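A small simulation makes the mixture effect concrete (the shift names and error rates here are hypothetical): each source alone is well-behaved, but pooling two sources with different means inflates the spread far beyond either process by itself.

```python
import random
import statistics

random.seed(7)
# Hypothetical error counts per 100 receipts for each shift/payment method.
shift_a = [random.gauss(2.0, 0.5) for _ in range(500)]  # Shift A, credit card
shift_b = [random.gauss(6.0, 0.5) for _ in range(500)]  # Shift B, cash
mixture = shift_a + shift_b

# Each shift alone is unimodal; the pooled data's spread is dominated
# by the gap between the two shift means, not by either process itself.
print("Shift A std dev:", round(statistics.stdev(shift_a), 2))
print("Pooled std dev: ", round(statistics.stdev(mixture), 2))  # much larger
```

If a histogram of your pooled data shows two humps, split it back out by source before drawing any conclusions.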
Next week we will pick up with a continuation of non-normal distributions. Until then, happy analyzing!
In metrics, the most honest finding is that your measurements will show degrees of variation. Understanding where and how that variation occurs is the key to using your data in a forward-thinking strategy. Let's start with something simple, like toy production, and track some standard variation sources.
Within-Unit Variation
This variation source occurs when you are measuring output from a single production cycle. Variation is likely to show up in the width of parts, color shading, the length of a toy, and so on. You can choose to analyze different production cycles on the same day or on alternating days, but you will always be comparing samples from the same cycle. A new production sample means a new data point.
Between-Unit Variation
These names are dead giveaways, but I digress! Between-unit variation means you are looking at samples from two different production cycles. The difference is that you identify samples from different cycles and compare them against each other. The variations you find will give you some clue as to whether they are operation influenced or process influenced.
Temporal Variation
This is the trickiest variation source. It specifically calls for you to compare your variation averages across all of your data points in a single day. So you can theoretically have both within-unit and between-unit variation data, depending on how specific you need to get.
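The distinction between within-unit and between-unit variation can be made concrete with a small sketch (the part widths and cycle names below are made up for illustration):

```python
import statistics

# Hypothetical widths (mm) of toy parts sampled from three production cycles.
cycles = {
    "cycle_1": [10.1, 10.2, 10.0, 10.3],
    "cycle_2": [10.6, 10.7, 10.5, 10.8],
    "cycle_3": [9.9, 10.0, 9.8, 10.1],
}

# Within-unit variation: spread of the measurements inside each cycle.
within = {name: statistics.stdev(vals) for name, vals in cycles.items()}

# Between-unit variation: spread of the cycle averages against each other.
cycle_means = [statistics.fmean(vals) for vals in cycles.values()]
between = statistics.stdev(cycle_means)

print("within-cycle std devs: ", {k: round(v, 3) for k, v in within.items()})
print("between-cycle std dev of means:", round(between, 3))
```

In this made-up example the cycle averages drift more than the parts within any one cycle, which points toward a cycle-to-cycle (process) cause rather than noise within a single run.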
The key to getting the most out of your data is to understand what it’s telling you. Understanding where the variations are coming from is the first step to getting the most out of your data.
As we keep walking down this wonderful road of 6Sigma, it's important that we talk about how capability is measured. We've been talking about process capability for a few weeks now, so let's talk about the capability measurement methods. This week we are going to focus on the capability and performance indices.
What does it mean?
The first thing we need to understand is the measurement terminology, so here are a few basic definitions.
Cp and Cpk are capability rates, and Pp and Ppk are performance rates.
Cp: When you see this, you're talking about the capability rate of your process. Using the within-subgroup (short-term) standard deviation σ, the standard formula is Cp = (USL - LSL) / 6σ, where USL and LSL are the upper and lower specification limits.
Pp: When this comes up, the conversation is speaking to the pure performance of your process. It uses the same ratio as Cp, but with the overall (long-term) standard deviation s: Pp = (USL - LSL) / 6s.
Cpk: This refers to your process capability index, basically telling you how close your process is running to the nearer specification limit. The standard formula is Cpk = min[(USL - μ) / 3σ, (μ - LSL) / 3σ], where μ is the process mean.
Ppk: This refers to the performance index for a non-centered distribution; it adjusts Pp for the effect of a process mean that is not centered between the limits. The formula is Ppk = min[(USL - μ) / 3s, (μ - LSL) / 3s].
What’s the Difference?
The main difference is the variation each pair uses in its calculation. Cp and Cpk are short-term measures: they use the within-subgroup variation, so they describe what the process is capable of when only common-cause variation is present. Pp and Ppk are long-term measures: they use the overall variation across all the data. Within each pair, the plain index (Cp, Pp) considers only the spread relative to the specification limits, while the "k" index (Cpk, Ppk) also accounts for how well the process is centered between them.
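As a minimal sketch of the long-term pair, here is a computation of Pp and Ppk from the formulas above. The rating data and the spec limits (80 to 120) are hypothetical; Cp and Cpk would use the same formulas with a within-subgroup sigma, typically estimated from a control chart.

```python
import statistics

def performance_indices(data, lsl, usl):
    """Compute Pp and Ppk from the overall (long-term) standard deviation."""
    mean = statistics.fmean(data)
    sigma = statistics.stdev(data)  # overall standard deviation
    pp = (usl - lsl) / (6 * sigma)
    ppk = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return pp, ppk

# Hypothetical chair-rating data with spec limits 80 to 120.
ratings = [101, 99, 103, 98, 100, 102, 97, 104, 100, 96]
pp, ppk = performance_indices(ratings, lsl=80, usl=120)
print(f"Pp  = {pp:.2f}")
print(f"Ppk = {ppk:.2f}")
```

Because this sample mean sits exactly in the middle of the limits, Pp and Ppk come out equal; the further the mean drifts toward either limit, the further Ppk falls below Pp.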
Data is so much more than numbers, but by understanding the why and the how 6Sigma begins to teach us what is significant in our data.
We opened last week with process capability, and before we go full-fledged into that area, I want to pause and put some focus on capability studies.
What is a Capability Study?
To review from last week, a capability study is a way to verify that your process is consistent over an extended period of time. For example, if step 3 in your process produces 3 errors per cycle for 3 years, your process is consistent.
How Do You Find Stability?
There are a ton of tools you can use to test the stability of your process, but some of the most common are time series plots and control charts. In addition to these tools, there is a step-by-step process (of course!) for testing the capability of your process.
What should you know about capability studies?
As with all 6Sigma tools, the effectiveness of this tool lies more in how well you understand it and how you apply it. The most important things to remember are:
- Capability studies measure the same parts of the process, at the same stage in the process, at the same time, every time a measurement is taken.
- You can use the capability study on discrete and continuous data.
- You get the best (i.e., most meaningful) information when you run the study on already stable and predictable data. New processes are not the best place for this tool.
- When you hear Sigma Level, they are talking about capability.
- Capability studies require you to understand:
- The limits of your customer or organization.
- The difference between short-term and long-term data, and what those differences mean to your organization or customer.
- Mean and standard deviation.
- How to assess normality of your data.
- How your organization or customer determines Sigma level.
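On that last point, a common convention (and it is only a convention; check how your organization defines it) is to convert a defect rate to a sigma level by taking the normal quantile of the yield and adding a 1.5-sigma shift. A sketch using the standard library:

```python
from statistics import NormalDist

def sigma_level(defects, opportunities, shift=1.5):
    """Convert a defect rate to a sigma level.

    Uses the conventional 1.5-sigma long-term shift; organizations
    vary, so treat the shift as a parameter, not a constant of nature.
    """
    yield_fraction = 1 - defects / opportunities
    return NormalDist().inv_cdf(yield_fraction) + shift

# 3.4 defects per million opportunities is the textbook "six sigma" rate.
print(round(sigma_level(3.4, 1_000_000), 2))  # ≈ 6.0
```

Running the same function on 66,807 defects per million gives roughly 3.0, the textbook three-sigma process.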
Capability studies can give you a great deal of insight into how your organization is running and what is holding it back. This is one way to get a sense of the information flow and the quality of the information you can get your hands on. So let's start off the new year with a look at what your data is telling you. Happy hunting!