As we cover Six Sigma statistics, I want to make sure that I go over the illustrative part of the statistics. We know Six Sigma is technical, but the key to making it stick is to make it simple and understood by the non-technical people using it. So let’s talk about the box plot, also called the box-and-whisker plot. A key thing to remember in Six Sigma is that everyone uses different terminology, so ask questions and make sure you are speaking the same language.
What is a Box Plot?
Simply put, a box plot puts a picture to the data, showing you where most of the data falls, how the data is distributed, and where the outliers are. It basically shows you what you’ve got, how it looks, and what is unusual about it.
What does it measure?
Say you have a process that has multiple variables affecting it and you want to know what is what. If you have a delivery truck with 4 alternative routes, a box plot for each route can show you which ones, according to the data, are the most problematic. Additionally, a box plot will tell you how symmetrical your data is. Knowing whether your data is skewed can affect how you interpret it. In a box plot, if the data is mostly symmetrical, the median will appear in the middle of the box and the whiskers will be roughly the same length. If the data is skewed in one direction, the median will sit off-center and the whiskers will be different lengths.
How does it work?
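Box plot measurements are based on quartiles, and the distribution is shown within the graphic. Think back to your SATs or ACTs. Remember how they told you which percentile you scored in? A box plot is built from the same idea: the bottom of the box is the 25th percentile (the first quartile), the line inside is the 50th (the median), and the top of the box is the 75th (the third quartile). The whiskers give you an upper limit and a lower limit. Those limits come from the data itself, most commonly the last points that sit within 1.5 times the height of the box (the interquartile range) from its edges. Any limits set by your organization’s goals are a separate comparison you lay on top. The outliers are the extreme values beyond the whiskers, values so far from the rest of the data that it is unlikely they will be reproduced.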
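To make the quartile idea concrete, here is a minimal sketch of the numbers a box plot draws. The delivery times are made up for illustration, the function name `box_plot_summary` is hypothetical, and the 1.5 times IQR whisker rule is one common convention rather than the only one your software might use.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws, using the common 1.5 * IQR rule."""
    data = sorted(values)
    # Quartiles: 25th, 50th (median) and 75th percentiles
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the last data points that are still inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical delivery times in minutes for one route
route_a_minutes = [31, 33, 34, 35, 36, 37, 38, 40, 41, 75]
summary = box_plot_summary(route_a_minutes)
print(summary)
```

With this sample, the 75-minute delivery lands outside the upper fence and is reported as an outlier, while the whiskers stop at 31 and 41 minutes.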
Interpreting your data is just as important as gathering it, so choose carefully and with purpose. Talk to your belt and use that advice to help you find the best method for your organization.
In Six Sigma we are always collecting data. Generally we collect it to address a current problem in our operations or services. The wonderful thing about Six Sigma is that we are also able to collect passive data. Passive data is useful because it lets us identify patterns; the catch to visualizing those patterns is selecting the right graph to view the data.
Why use a graph?
The first benefit that comes to mind is the ability to see the error trends from a visual perspective. The other reasons graphs are a great tool are:
- Alongside identifying trends, they help you see potential relationships between variables. When you have a situation that could have multiple culprits, a graph can help you see which ones are real possibilities.
- They can help you identify the risks that your customers will consider critical. This lets you be proactive for your customer instead of reactive, a much more desirable trait.
- They allow you to systematically dismiss variables and determine which ones control others.
- They show you the results of the passive data you’ve collected.
Where do I get the information for a graph?
Data is everywhere, right? Yes and no. Your graph is only as good as your data, so we don’t want questionable data. The integrity of your data will be defined by your individual organization, but if you stick to these three questions you should be fine:
- What do you need the data to tell you?
- How often do you need to collect it?
- How do you need to collect it?
Next week we will get into the types of graphs and what types of data are appropriate for them. Until then, happy hunting!
As we go over Six Sigma statistics, we have to talk about normal distribution. Before we get to that, though, we have to talk about why distribution is important to the way you interpret your data. There is something you should know before you tackle how the information is distributed: confidence intervals. Confidence intervals are more complicated than this blog can cover, but basically a confidence interval is a range around your estimate that is likely to contain the true value, and the confidence level tells you how much you can trust that range. The greater the confidence level, the more assurance you have in your analysis, at the cost of a wider range. There are 3 common confidence levels used in data analysis: 99%, 95% and 90%. The standard is 95%. Higher gives you more assurance, but as a baseline 95% is a solid analytic benchmark.
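If you want to see how the three common confidence levels compare, here is a rough sketch. It assumes you are estimating an average from a sample and uses the normal (z) approximation; for small samples your belt would likely reach for a t-based interval instead. The cycle times and the function name `mean_confidence_interval` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    center = mean(sample)
    std_error = stdev(sample) / sqrt(len(sample))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return center - margin, center + margin

# Hypothetical cycle times in minutes
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1,
               12.0, 11.9, 12.2, 12.3, 11.8, 12.1, 12.4, 12.0, 11.9, 12.2]

intervals = {
    level: mean_confidence_interval(cycle_times, level)
    for level in (0.90, 0.95, 0.99)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} interval: {low:.3f} to {high:.3f}")
```

The 99% interval comes out wider than the 95% one, which is wider than the 90% one. That is the trade-off: more confidence, less precision.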
Okay so back to normal distribution. Here’s what you need to know.
What is it?
You find a normal distribution when you take all of your data, create a visual representation of it, and see most of the values cluster around the average, with fewer and fewer values the farther out you go on either side. The plot illustrates the recurring variation in your process. It is actually more helpful when you have a distribution that isn’t normal, because then you can say ‘Aha, it was the 3 hour traffic jam that affected the process’. When you hear people talk about the curve, this is what they are referring to.
When do you use it?
This tool works best as a continuous probability model for measurements that occur naturally in your process, ones that you don’t have to create. Think about the weight of a cargo shipment or the number of a specific product you receive.
Raw scores and Z scores
Each normal distribution is defined by two parameters: the mean and the standard deviation. A raw score is simply an observed value in its original units. The Z score measures how far that raw score sits from the mean, counted in standard deviations. In real terms, if you want to see whether the number of errors that occurred on the 5th was unusual, the Z score shows you how far that day was from your average.
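Here is a small sketch of turning a raw score into a Z score, using the standard formula (raw score minus mean, divided by standard deviation). The daily error counts are invented for illustration, and the name `z_score` is just a label I picked.

```python
from statistics import mean, stdev

def z_score(raw, mu, sigma):
    """How many standard deviations a raw score sits from the mean."""
    return (raw - mu) / sigma

# Hypothetical errors per day for the 1st through the 10th
daily_errors = [4, 6, 5, 7, 12, 5, 6, 4, 5, 6]

mu = mean(daily_errors)      # average errors per day
sigma = stdev(daily_errors)  # sample standard deviation

# The 5th of the month is at index 4
z_fifth = z_score(daily_errors[4], mu, sigma)
print(f"mean={mu}, std dev={sigma:.2f}, z for the 5th={z_fifth:.2f}")
```

A Z score near 0 means a typical day. The farther from 0, in either direction, the more unusual the day was compared with the rest of your data.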
Why is it important?
The area under the curve shows what proportion of your data falls within a given range, which tells you how important that range is to your business. If the curve is narrow, then you know that the variation occurs within a relatively small set of circumstances, which is easier to control within the process. A wider distribution shows you that your process can be interrupted by a variety of factors, and you may need to keep a close eye on it.
The Pareto principle, most commonly known as the 80-20 rule, is known by business owners as the simple fact that 80% of your problems are caused by 20% of the people. The theory was really about wealth and power distribution, but the general premise applies: most of your issues can be attributed to a fairly small number of root causes.
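What does it look like?
A Pareto chart is a bar chart with the bars sorted from tallest to shortest, usually with a line showing the running cumulative percentage.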
What does it do?
Pareto charts work in levels to help you identify the root cause of the tallest bar (the biggest issue). Once you know which bar is tallest, you chart the causes behind that bar and repeat.
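The numbers behind a Pareto chart are easy to sketch: sort the causes from biggest to smallest and keep a running cumulative percentage. The defect categories and counts below are hypothetical, as is the function name `pareto_table`.

```python
def pareto_table(counts):
    """Rank causes by count and add a cumulative percentage column."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows

# Hypothetical rework counts by cause
defect_counts = {
    "late delivery": 22,
    "wrong material": 48,
    "damaged": 10,
    "mislabeled": 14,
    "other": 6,
}

rows = pareto_table(defect_counts)
for cause, count, cumulative in rows:
    print(f"{cause:15} {count:4} {cumulative:6.1f}%")
```

The first row is your tallest bar. To go down a level, you would collect counts for the causes behind that one category and run the same table again.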
How do I use it?
The trick with Pareto is to start high and whittle away. What does that mean? It means that when you find out department A is supplying department B with all of the material that ends up in their rework, don’t go to department B and shut everything down. I know that it seems counterintuitive, but jumping the gun before you find out why that material ends up in the rework pile leads to rework on department A’s part, causing more defects.
What doesn’t it do?
Pareto doesn’t provide an instant aha moment. It’s a method that requires patience and adherence to the process to be effective. If you need the answer now, it may not be the method for you. You may be better suited to process mapping or the 5 Whys, which will point you in a direction immediately. I have to say, however, that if you want the right answer validated by numbers, then Pareto is right for you.
In Six Sigma the devil is in the details, and a successful improvement initiative depends heavily on selecting the right tool for the engagement. A successful selection depends heavily on the knowledge and skill of your belt, so use that library of knowledge, and if that belt isn’t asking you a thousand questions about your end goal, move on!