Last week we talked about the normal distribution in your data. This week let's kick the conversation off with non-normal distributions. There are a few different types of non-normal distribution, so let's take a look.
Skewed data is, quite simply, a data distribution that is not symmetrical. The longer tail points in the direction of the skew. Here's what a skew looks like:
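For a rough check of the skew direction, here is a minimal Python sketch (the cycle times are invented for illustration) that computes the Fisher-Pearson skewness coefficient; a positive value means the longer tail points to the right:

```python
import statistics

def sample_skewness(data):
    """Fisher-Pearson coefficient of skewness (biased form):
    the mean cubed deviation divided by the cube of the
    population standard deviation."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)

# Made-up cycle times with a long right tail (one slow outlier).
times = [4, 5, 5, 6, 6, 6, 7, 7, 8, 20]
print(sample_skewness(times) > 0)  # True: right (positive) skew
```

A perfectly symmetrical sample, such as [1, 2, 3], gives a skewness of zero.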
Natural limits-these are boundaries built into the process or the measurement itself, such as a cycle time that cannot drop below zero or a purity that cannot exceed 100%. The problem with natural limits is that data piles up against the boundary, which skews the distribution and can bias your estimates if you analyze the data as though it were normal.
Artificial limits-unlike natural limits, these are imposed by the person analyzing the data. Basically, artificial limits set an arbitrary cut-off between acceptable and not acceptable. Say you make 40 chairs an hour, and your designer decides that any chair that doesn't reach a rating of 80 is unacceptable. That threshold is completely arbitrary, based on the designer's standards.
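A quick sketch of an artificial limit in code; the ratings and the cut-off of 80 are invented, and the point is that moving CUTOFF redraws the acceptable/unacceptable line at will:

```python
# Hypothetical quality ratings for an hour's worth of chairs.
ratings = [92, 78, 85, 80, 74, 88, 81, 79]

CUTOFF = 80  # arbitrary: the designer could just as easily pick 75 or 85

acceptable = [r for r in ratings if r >= CUTOFF]
rejected = [r for r in ratings if r < CUTOFF]
print(len(acceptable), len(rejected))  # 5 3
```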
Mixtures occur when data from different sources is expected to look the same but turns out different. Say you're comparing error data from two cashiers: Shift A's credit card receipts and Shift B's cash receipts, and the distributions don't match. You were expecting the error rate for each method to follow a normal distribution, but what you got instead was a distribution with two peaks, one for each source.
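Here is a small sketch of how a mixture shows up in the numbers (all counts are invented): each stream is tight on its own, but pooling them inflates the spread, which is a hint that two different processes are hiding in one data set:

```python
import statistics

# Hypothetical daily error counts for the two receipt streams.
shift_a_credit = [2, 3, 2, 4, 3, 2]   # Shift A, credit card receipts
shift_b_cash   = [7, 9, 8, 10, 9, 8]  # Shift B, cash receipts

combined = shift_a_credit + shift_b_cash

# Each stream on its own has a small spread...
print(statistics.pstdev(shift_a_credit), statistics.pstdev(shift_b_cash))
# ...but the pooled data is far more spread out: a mixture signal.
print(statistics.pstdev(combined))
```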
Next week we will pick up with a continuation of non-normal distributions. Until then, happy analyzing!
As we continue our journey in Six Sigma, it seems pertinent to discuss the different types of distributions you will see in your analysis. Let's take them one at a time. The most common is the Normal Distribution, and here's what you should know about it.
First, what is a distribution?
Simply put, a distribution tells you how often each value of a variable occurs in your process. This is important because how common your values are will inevitably form the foundation for your improvement project.
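As a minimal illustration (the defect categories are made up), counting how often each value occurs is all a frequency distribution is:

```python
from collections import Counter

# Hypothetical defect categories logged over a week.
observations = ["scratch", "dent", "scratch", "paint", "scratch", "dent"]

freq = Counter(observations)
print(freq.most_common(1))  # [('scratch', 3)]
```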
Types of Distribution
The Normal Distribution
A normal distribution (the Gaussian curve; the average person knows it as the bell curve) shows a symmetrical distribution. The mean (the average) divides the data in half, with 50% of the data on each side of the mean. The Normal Distribution will have the following hallmarks:
This distribution is considered to be the most important distribution.
The area under the curve should equal 1.
Physical aspects of the curve should resemble a hill and should be symmetrical.
The tails on either side of the mean extend indefinitely and never touch the horizontal axis.
White noise in your process should produce a normal curve shape.
The Z distribution has a mean of 0 and a standard deviation of 1.
The mean (average), median (mid-point) and the mode (most common value) should be the same data value.
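Some of the hallmarks above can be checked numerically. This sketch integrates the standard normal (Z) density, which has mean 0 and standard deviation 1, over a wide interval to confirm the area is close to 1, and checks the curve's symmetry:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Numerically integrate the Z density from -8 to 8; the tails beyond
# that are vanishingly small, so the result should be close to 1.
step = 0.001
area = sum(normal_pdf(-8 + i * step) * step for i in range(int(16 / step)))
print(round(area, 4))  # close to 1.0

# Symmetry: the curve has the same height at +x and -x.
print(normal_pdf(1.5) == normal_pdf(-1.5))  # True
```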
Next week, it’s on to non-normal classifications. Get to analyzing and if you need any help, reach out and let us know!
The most honest finding in any set of metrics is that they will show some degree of variation. Understanding where and how that variation occurs is the key to using your data in a forward-thinking strategy. Let's start with something simple, like toy production. We are going to track some standard variation sources.
Within-Unit Variation
This variation source occurs when you are measuring output from a single production cycle. Some places variation is likely to occur are the width of parts, color shading, the length of the toy, and so on. You can choose to analyze different production cycles on the same day or on alternating days, but you will always be comparing samples from the same cycle. A new production sample means a new data point.
Between-Unit Variation
These names are dead giveaways, but I digress! Between-unit variation means you are comparing samples drawn from two different production cycles. The variations you find will give you some clue as to whether they are operator influenced or process influenced.
Time-to-Time Variation
This is the trickiest variation source. It calls for you to compare your variation averages across all of your data points in a single day, so you can theoretically have both within-unit and between-unit variation data, depending on how specific you need to get.
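Here is a sketch, with invented toy-length measurements, of separating within-unit variation from between-unit variation:

```python
import statistics

# Hypothetical toy lengths (cm): four parts measured per production cycle.
cycles = {
    "cycle_1": [10.1, 10.0, 9.9, 10.0],
    "cycle_2": [10.4, 10.5, 10.3, 10.4],
}

# Within-unit variation: the spread inside a single cycle's sample.
within = {name: statistics.pstdev(parts) for name, parts in cycles.items()}

# Between-unit variation: how far apart the cycle averages sit.
means = [statistics.fmean(parts) for parts in cycles.values()]
between = max(means) - min(means)

print(within)   # small spread inside each cycle
print(between)  # the cycle averages differ by about 0.4 cm
```

Here each cycle is internally consistent, but the two cycles differ from one another, which points toward a process- or operator-level cause rather than part-to-part noise.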
The key to getting the most out of your data is to understand what it’s telling you. Understanding where the variations are coming from is the first step to getting the most out of your data.
As we keep walking through this wonderful world of 6Sigma, it's important that we talk about how capability is measured. We've been talking about process capability for a few weeks now, so let's talk about the capability measurement methods. This week we are going to focus on the capability index and process capability.
What does it mean?
The first thing we need to understand are the terms for measurement, so here are a few basic definitions.
Cpk and Cp are capability rates, and Pp and Ppk are performance rates.
Cp- When you see this, you're talking about the rate of your process capability. To find it you use this formula, where USL and LSL are the upper and lower specification limits and σ is the short-term (within-subgroup) standard deviation:

Cp = (USL - LSL) / 6σ
Pp-When this comes up, the conversation is about the pure performance of your process. The formula uses the same specification limits, but s here is the overall (long-term) standard deviation:

Pp = (USL - LSL) / 6s
Cpk- This refers to your process capability index, basically telling you how close your process is running to the nearest specification limit. The formula for finding Cpk is:

Cpk = min[ (USL - mean) / 3σ, (mean - LSL) / 3σ ]
Ppk-This refers to the performance index for a non-centered distribution; when you hear this term, it's referring to adjusting for the effects of that off-center distribution. The formula for Ppk is:

Ppk = min[ (USL - mean) / 3s, (mean - LSL) / 3s ]
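The four formulas can be sketched together in one function. The data, the specification limits, and the within-subgroup sigma below are all assumed values for illustration; in practice sigma-within is estimated from subgroup statistics (for example R-bar/d2), not passed in by hand:

```python
import statistics

def capability_indices(data, lsl, usl, sigma_within):
    """Cp/Cpk use the short-term (within-subgroup) sigma;
    Pp/Ppk use the overall (long-term) standard deviation."""
    mean = statistics.fmean(data)
    sigma_overall = statistics.stdev(data)
    cp  = (usl - lsl) / (6 * sigma_within)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    pp  = (usl - lsl) / (6 * sigma_overall)
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
    return cp, cpk, pp, ppk

# Hypothetical, well-centered measurements with assumed limits 9-11.
cp, cpk, pp, ppk = capability_indices(
    [9.8, 10.0, 10.2, 10.0], lsl=9.0, usl=11.0, sigma_within=0.1
)
print(cp, cpk, pp, ppk)
```

Because this example process is centered exactly between the limits, Cp and Cpk come out equal; an off-center process would pull Cpk (and Ppk) below Cp (and Pp).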
What’s the Difference?
The main difference is the way the variation is estimated. Cp and Cpk are calculated from short-term (within-subgroup) variation, while Pp and Ppk use overall, long-term variation. Within each pair, Cp and Pp consider only the spread of the process against the specification limits, while Cpk and Ppk also account for how well the process is centered between those limits.
Data is so much more than numbers, but by understanding the why and the how 6Sigma begins to teach us what is significant in our data.
In our conversations about process capability, I want to focus your attention on baseline performance. Baseline performance is an alternative way to view long-term and short-term data. When you hear the term, it will most likely be used to describe long-term data.
What it means
Baseline, in a nutshell, gives you the average long-term performance of a specific process without controlling any variables. The easiest way to think of this is as a visualization of FTY (First Time Yield). Remember, FTY shows you the challenges in your process when it runs normally, without any interference from you.
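FTY itself is just a ratio: the units that pass a step on the first try, with no rework or scrap, divided by the units that entered the step. A minimal sketch with invented counts:

```python
# Hypothetical counts for one process step.
units_in = 200
passed_first_time = 184  # no scrap, no rework

fty = passed_first_time / units_in
print(fty)  # 0.92
```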
What to use it on
When measuring baseline, you are identifying the typical challenges within a process. For example, if you are observing the returns process, your long-term data will include the morning, afternoon, and evening shifts; multiple employees; and multiple submission points (email, in person, and via telephone).
Your short-term data will appear on the visualization as well, so you will be able to see a visual representation of both short-term and long-term average behavior for your processes. If there is always a dip in quality around lunchtime, you will see that represented in your data.
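Here is a sketch of spotting such a dip, with invented hourly quality scores:

```python
import statistics

# Hypothetical quality scores keyed by hour of day.
scores_by_hour = {
    9:  [98, 97, 99],
    12: [88, 85, 87],  # the lunchtime dip
    15: [96, 97, 95],
}

hourly_avg = {hour: statistics.fmean(v) for hour, v in scores_by_hour.items()}
worst_hour = min(hourly_avg, key=hourly_avg.get)
print(worst_hour)  # 12
```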
Why use it?
Baseline performance is going to quickly tell you where your burning-platform issues are. If you are heading into a meeting with management, this is the report to take with you. It shows long-term versus short-term performance and gives you solid business evidence to support improvement projects.
Next week, we will tackle measures of capability and what they tell you. Remember that this can be the starting point for discussing improvement with your belt. If you need help getting started, give us a call.