Comparing base R and tidyverse in industrial statistics applications


September 24, 2021

We in the manufacturing industry have been working with large datasets for many years. Most industrial applications of statistics require software that goes beyond Excel. A real-life statistical process control application, for instance, needs a way to select time windows from a database in order to display a time series properly. It then immediately becomes necessary to perform some calculations in order to show limits and trend lines.
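As a minimal sketch of such calculations, the base R snippet below selects a time window from a simulated measurement series, estimates three-sigma limits for an individuals chart from the average moving range, and fits a linear trend. The data, the column names and the window dates are hypothetical; they stand in for a query against a real database.

```r
# Simulated daily measurements standing in for a database table (hypothetical data)
set.seed(42)
measurements <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "day", length.out = 120),
  value = rnorm(120, mean = 10, sd = 0.5)
)

# Select the time window to display
win <- subset(measurements,
              date >= as.Date("2021-02-01") & date <= as.Date("2021-03-31"))

# Individuals chart: centre line +/- 3 sigma, with sigma estimated as the
# mean moving range divided by d2 = 1.128 (constant for ranges of size 2)
centre    <- mean(win$value)
sigma_hat <- mean(abs(diff(win$value))) / 1.128
limits    <- c(lcl = centre - 3 * sigma_hat, ucl = centre + 3 * sigma_hat)

# Linear trend over the selected window
trend <- lm(value ~ as.numeric(date), data = win)

# Time series with centre line, control limits and trend line
plot(win$date, win$value, type = "b", xlab = "Date", ylab = "Value",
     ylim = range(win$value, limits))
abline(h = centre)
abline(h = limits, lty = 2)
lines(win$date, fitted(trend), lty = 3)
```

Even in this small case the plot only becomes useful once the limits and the trend have been computed, which is where a spreadsheet starts to fall short.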

As a consequence, the statistical programming language R has long been adopted for such applications. A clear example is the package {qcc} by Scrucca (2004), which was already announced in R News in 2004. Although this package is very well maintained, this is unfortunately not the case for many others, either because they were the output of a research project that has since been completed or because there is no clear demand. As a result, much of this software predates the {tidyverse}, which dates only from 2016. These packages therefore follow a rather different programming approach, favoring side effects, lists and plots made with {lattice}, for instance.

The RStudio team is addressing this situation for modeling in general with the {tidymodels} family of packages, which adds a layer on top of classical functions such as lm and breaks their outputs down into tibbles, as {broom} does. This aims at unifying the interfaces and the data objects. Adopting a similar approach for the classical industrial and quality packages such as {qcc}, {SixSigma}, {qicharts} and {DoE.base} would make them more coherent with each other and with the {tidyverse}.
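To illustrate the difference, the sketch below fits a classical lm model and then gathers its coefficients into a single rectangular table. The helper tidy_coefficients is hypothetical and written in base R only to show the idea; for lm objects, broom::tidy() does this job and returns a tibble. The built-in cars dataset is used purely as a stand-in for process data.

```r
# Classical approach: lm() returns a list-like object of class "lm"
fit <- lm(dist ~ speed, data = cars)
class(fit)
names(fit)

# Hypothetical helper (not part of any package): collect the coefficient
# matrix from summary() into a plain data frame, one row per model term
tidy_coefficients <- function(model) {
  coefs <- summary(model)$coefficients
  data.frame(
    term      = rownames(coefs),
    estimate  = coefs[, "Estimate"],
    std.error = coefs[, "Std. Error"],
    statistic = coefs[, "t value"],
    p.value   = coefs[, "Pr(>|t|)"],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

coef_table <- tidy_coefficients(fit)
coef_table

# If {broom} is installed, its tidy() method gives the equivalent tibble
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(fit))
}
```

Once the result is a data frame, it can be filtered, joined or plotted with the same tools as any other table, which is the kind of coherence a tidy interface to the quality packages would bring.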



Scrucca, Luca. 2004. “qcc: An R Package for Quality Control Charting and Statistical Process Control.” R News 4/1: 11–17.