Measurement uncertainty estimation with Linear Mixed Models in R

statistics
programming
Author

João Ramalho

Published

September 27, 2022

Objective

Assess the feasibility and benefits of using linear mixed models in measurement uncertainty studies (e.g. enlarge the number in input factors. This document provides a comparison of measurement uncertainty studies with lm and anova, with the R SixSigma package and with the R lme4 package.

Introduction

The estimation of a measurement method uncertainty is a key output of a method validation. It provides users of the method a way to communicate the measurement results taking in consideration the variability introduced by the method itself as described in The beginers guide to uncertainty of measurement pag 17 by S.Bell Bell (2001)

Many different ways to estimate uncertainty have been listed by J E Muelaner (2015). In my professional activity I’ve been working to improve the way to estimate this uncertainty by adopting some of these different statistical approaches and the related software tools. In particular I’ve started by using the Minitab gage rnR following the AIAG2010 guidelines and later adopted the R package SixSigma by E.Cano Emilio L. Cano (2012). This last approach provided an effective way to treat large datasets but still has limitations in the modelling of random effects, overestimating errors and sometimes leading to negative variations, as explained by D.Montgomery in Design and analysis of experiments page 573 Montgomery (2012)

In this article we explore linear mixed models with the R package lme4, comparing it with the previous approaches and suggest routes to integrate as a new operational tool in our toolset for method validation.

Methods

Data loading

The dataset corresponds to a gage rnR study performed on pod volume measured with a vision system. It consists of 5 parts and 3 operators that measured each part 15 times. This has resulted in 225 individuals measurements.

data <- read_excel(here("posts/20220927/Gage_RnR_template_V4.xlsx"), range = "A3:E78")
data_long <- data %>% 
  pivot_longer(cols = c(Op1, Op2, Op3), 
               names_to = "Operator", values_to = "measure")
measure_long <- data_long %>% 
  mutate(across(.cols = c(Part, Operator), as_factor))

We’re showing below the characteristics of this dataset:

measure_long %>% summary()
 Part     Replicate         Operator    measure    
 P1:45   Length:225         Op1:75   Min.   :1799  
 P2:45   Class :character   Op2:75   1st Qu.:1803  
 P3:45   Mode  :character   Op3:75   Median :1805  
 P4:45                               Mean   :1805  
 P5:45                               3rd Qu.:1808  
                                     Max.   :1813  

Results

Linear model

The first and most direct way to assess the effect of the input factors is to establish a simple linear model and perform an anova.

measure_lm <- lm(measure ~ Part + Operator + Part:Operator, measure_long)
summary(measure_lm)

Call:
lm(formula = measure ~ Part + Operator + Part:Operator, data = measure_long)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2925 -0.7783 -0.0317  0.7431  3.6451 

Coefficients:
                    Estimate Std. Error  t value Pr(>|t|)    
(Intercept)        1803.3953     0.3237 5570.485  < 2e-16 ***
PartP2               -1.9339     0.4578   -4.224 3.58e-05 ***
PartP3                5.8067     0.4578   12.683  < 2e-16 ***
PartP4                3.3965     0.4578    7.419 2.89e-12 ***
PartP5                1.0892     0.4578    2.379   0.0183 *  
OperatorOp2           0.5223     0.4578    1.141   0.2553    
OperatorOp3          -0.2854     0.4578   -0.623   0.5337    
PartP2:OperatorOp2    0.1732     0.6475    0.267   0.7893    
PartP3:OperatorOp2   -0.4307     0.6475   -0.665   0.5066    
PartP4:OperatorOp2    0.4805     0.6475    0.742   0.4589    
PartP5:OperatorOp2    0.1041     0.6475    0.161   0.8725    
PartP2:OperatorOp3    0.3327     0.6475    0.514   0.6079    
PartP3:OperatorOp3    0.9127     0.6475    1.410   0.1601    
PartP4:OperatorOp3    0.9753     0.6475    1.506   0.1335    
PartP5:OperatorOp3    0.3723     0.6475    0.575   0.5659    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.254 on 210 degrees of freedom
Multiple R-squared:  0.8399,    Adjusted R-squared:  0.8292 
F-statistic: 78.67 on 14 and 210 DF,  p-value: < 2.2e-16
measure_lm_aov <- aov(measure_lm)
summary(measure_lm_aov)
               Df Sum Sq Mean Sq F value Pr(>F)    
Part            4 1707.1   426.8 271.457 <2e-16 ***
Operator        2   13.1     6.6   4.177 0.0166 *  
Part:Operator   8   11.2     1.4   0.892 0.5237    
Residuals     210  330.1     1.6                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this case the anova indicates that the part has a significative effect (as expected) and the operator too.

Six Sigma

measure_rr <- ss.rr(
  data = measure_long, 
  var = measure, 
  part = Part, 
  appr = Operator, 
  alphaLim = 1,
  errorTerm = "repeatability",
  main = "Micrometer FTR600\nr&R for tablet thickness",
  sub = "Tablet L",
  lsl = 17750,
  usl = 18250, 
  print_plot = FALSE
)
Complete model (with interaction):

               Df Sum Sq Mean Sq F value Pr(>F)
Part            4 1707.1   426.8 271.457 <2e-16
Operator        2   13.1     6.6   4.177 0.0166
Part:Operator   8   11.2     1.4   0.892 0.5237
Repeatability 210  330.1     1.6               
Total         224 2061.6                       

alpha for removing interaction: 1 

Gage R&R

                      VarComp %Contrib
Total Gage R&R     1.64098201    14.79
  Repeatability    1.57212495    14.17
  Reproducibility  0.06885705     0.62
    Operator       0.06885705     0.62
Part:Operator      0.00000000     0.00
Part-To-Part       9.45247729    85.21
Total Variation   11.09345930   100.00

                      VarComp    StdDev  StudyVar %StudyVar %Tolerance
Total Gage R&R     1.64098201 1.2810082  7.686049     38.46       1.54
  Repeatability    1.57212495 1.2538441  7.523064     37.65       1.50
  Reproducibility  0.06885705 0.2624063  1.574438      7.88       0.31
    Operator       0.06885705 0.2624063  1.574438      7.88       0.31
Part:Operator      0.00000000 0.0000000  0.000000      0.00       0.00
Part-To-Part       9.45247729 3.0744881 18.446929     92.31       3.69
Total Variation   11.09345930 3.3306845 19.984107    100.00       4.00

Number of Distinct Categories = 3 

There are many outputs that are provided by the SixSigma package which are discussed in full details in the Precision chapter of our book industRial statistics (in particular the results produced by R have been confirmed by us with a manual calculation in a spreadsheet using the same dataset).

Mixed Model (2 factors)

https://therbootcamp.github.io/SwR_2019Apr/_sessions/MixedModels/MixedModels.html#23

https://therbootcamp.github.io/SwR_2019Apr/_sessions/MixedModels/MixedModels_practical.html

measure_lme <- lmer(measure ~ 1 + 
                      (1|Operator) + (1|Part) + (1|Part:Operator), 
                    measure_long)

summary(measure_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: measure ~ 1 + (1 | Operator) + (1 | Part) + (1 | Part:Operator)
   Data: measure_long

REML criterion at convergence: 766.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2650 -0.6427  0.0242  0.5793  2.8999 

Random effects:
 Groups        Name        Variance Std.Dev.
 Part:Operator (Intercept) 0.00000  0.0000  
 Part          (Intercept) 9.44894  3.0739  
 Operator      (Intercept) 0.06668  0.2582  
 Residual                  1.56592  1.2514  
Number of obs: 225, groups:  Part:Operator, 15; Part, 5; Operator, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept) 1805.341      1.385    1303
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

This approach with linear mixed models using the lme4 R package produces very similar results (the cause for the small differences is still to be investigated and should in principle be related with parameters of the model and calculation functions).

Mixed Model (3 factors)

data2 <- data %>% 
  mutate(Op1d2 = Op1*1.001, Op2d2 = Op2*1.002, Op3d2 = Op3*1.003)

data2_long <- data2 %>% 
  pivot_longer(cols = c(Op1, Op2, Op3, Op1d2, Op2d2, Op3d2), 
               names_to = "Operator", values_to = "measure") %>% 
  mutate(day = str_sub(Operator, 4,5)) %>% 
  mutate(day = if_else(day == "", "d1", day)) %>% 
  mutate(Operator = str_sub(Operator, 1, 3))

measure2_long <- data2_long %>% 
  mutate(across(.cols = c(Part, Operator, day), as_factor))
measure2_lme <- lmer(measure ~ 1 + 
                       (1|Operator) + (1|Part) + (1|day) + 
                       (1|Part:Operator) + (1|day:Operator), 
                     measure2_long)
summary(measure2_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: measure ~ 1 + (1 | Operator) + (1 | Part) + (1 | day) + (1 |  
    Part:Operator) + (1 | day:Operator)
   Data: measure2_long

REML criterion at convergence: 1526.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.4125 -0.6233  0.0105  0.5731  2.9625 

Random effects:
 Groups        Name        Variance Std.Dev.
 Part:Operator (Intercept) 0.04326  0.2080  
 day:Operator  (Intercept) 1.61265  1.2699  
 Part          (Intercept) 9.46258  3.0761  
 Operator      (Intercept) 0.28599  0.5348  
 day           (Intercept) 5.95525  2.4403  
 Residual                  1.53155  1.2376  
Number of obs: 450, groups:  
Part:Operator, 15; day:Operator, 6; Part, 5; Operator, 3; day, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept) 1807.146      2.289   789.4
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00307362 (tol = 0.002, component 1)

This approach with the lme4 package extends to 3 factors, in our view demonstrating its interest for the cases where its needed to go beyond the SixSigma package and as here add for instance the day as a random input factor.

Conclusions

This document shows that results obtained with the R packages SixSigma and lme4 are comparable. As lme4 allows to extend the number of input factors thus introducing for example the day, we recommend its application in a new concrete case to further assess its application in practice and enlarge the Method validation toolbox.

References

References

Bell, Stéphanie. 2001. The Beginers Guide to Uncertainty of Measurement. 2nd ed. Teddington, Middlesex, United Kingdom: National Phisical Laboratory. www.npl.co.uk/special-pages/guides/gpg11_uncertainty.
Emilio L. Cano, Andrés Redchuk, Javier M. Moguerza. 2012. Six Sigma with r. 1th ed. Vol. 36. Use r! Springer New York Heidelberg Dordrecht London: Springer.
J E Muelaner, M Chappell, A Francis. 2015. “A Hybrid Measurement Systems Analysis and Uncertainty of Measurement Approach for Industrial Measurement in the Light Controlled Factory.” Laboratory for Integrated Metrology Applications, Dep. Of Mechanical Engineering, University of Bath, Bath, UK. www.muelaner.com/wp-content/uploads/2015/03/Hybrid-MSA-and-Uncertainty-at-SDM.pdf.
Montgomery, Douglas C. 2012. Design and Analysis of Experiments. 8th ed. Wiley.