ML with R-tidymodels: comparison with mlr3

machine learning

tools

Published

October 22, 2021

Several frameworks and standalone packages are available in R for Machine Learning (ML). While the selection of a framework is also a matter of opportunity, taste and awareness, a general overview of the landscape is helpful, particularly because not all frameworks and packages are complementary or compatible.

Despite their different implementations, all frameworks aim to provide a unified interface: a specific syntax to call the ML algorithms and engines, and coherent data objects in return. This principle is not new. The historical R statistical functions themselves aim to provide a uniform syntax for running various statistical techniques (e.g. using glm with many different distributions, as in glm(family = "poisson", ...)).
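As a small illustration of that uniform syntax, the sketch below fits two different model families with the same glm call. It uses only base R and the built-in datasets warpbreaks and mtcars. The choice of formulas is an assumption made for the example, not something taken from a particular analysis.

```r
# Count outcome: Poisson regression on the built-in `warpbreaks` data
fit_poisson <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Binary outcome: logistic regression on the built-in `mtcars` data
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)

# Both fits have the same class and answer to the same accessor functions
class(fit_poisson)
coef(fit_logit)
family(fit_poisson)$family
```

Only the family argument changes between the two calls. The returned objects share one structure, which is the same idea the ML frameworks extend to many engines.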

In this article we shortlist two key frameworks and provide some background to support our choice, which is based on the following criteria:

shorter learning curve
R native
long-term perspectives in R (how long it has existed, the community size)
suited to industrial data science and to working with its historical packages (stats, MASS, qcc, DoE.base, etc.)

At this moment there are two main (competing) frameworks meeting these criteria, which have the main characteristics below:

	mlr3	tidymodels
creation	2012	2018
community	mlr-org	RStudio
syntax	base R	tidyverse
nr. packages	~15	~15
packages wrapped	>100
lifecycle phase	full redesign	early stage
main imports	data.table, R6	various tidyverse

mlr3 is potentially faster and more robust in handling data and errors. This is because it builds on base R, data.table and R6, which in general perform better than the tidyverse. For R users who predate the tidyverse, especially those still relying on older packages or skeptical of the tidyverse, this is certainly the best choice.
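To give a feel for the R6-based syntax, here is a minimal sketch of a train and predict cycle. It assumes the mlr3 and rpart packages are installed. The dataset (mtcars), the target (mpg) and the learner (regr.rpart) are choices made for the example. It does not benchmark anything, so it says nothing about speed.

```r
library(mlr3)

# Wrap a built-in data frame as a regression task with `mpg` as the target
task <- as_task_regr(mtcars, target = "mpg", id = "mtcars")

# Learners are R6 objects retrieved from a dictionary by key;
# "regr.rpart" wraps the rpart package
learner <- lrn("regr.rpart")

# Methods are called with `$` and update the learner in place
learner$train(task)

# Predict on the training data and score with root mean squared error
prediction <- learner$predict(task)
rmse <- prediction$score(msr("regr.rmse"))
print(rmse)
```

Predicting on the training data is done here only to keep the sketch short. A real evaluation would use a resampling strategy.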

My personal choice nevertheless goes to tidymodels. I started R directly with the tidyverse and I expect strong synergies between the tidyverse and tidymodels. One may hesitate about the added value of tidymodels compared with using the model engines directly. One may also be skeptical about how much tidymodels capitalizes on and preserves the historical knowledge captured in important R packages such as stats and MASS. These questions are addressed and explained in detail in the tidymodels principles.
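On the question of what happens to the underlying engine, the sketch below fits a linear model through parsnip (the modelling package of tidymodels) and compares it with a direct call to stats::lm. It assumes parsnip is installed and R is version 4.1 or later for the native pipe. The formula and dataset are arbitrary choices for the example.

```r
library(parsnip)

# Declare the model type and the engine separately; "lm" is stats::lm()
spec <- linear_reg() |> set_engine("lm")

# Fit through the tidymodels interface
model <- fit(spec, mpg ~ wt + hp, data = mtcars)

# The object produced by the engine is still available as a plain `lm` fit
engine_fit <- extract_fit_engine(model)
class(engine_fit)

# Same coefficients as calling the engine directly
direct <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(coef(engine_fit), coef(direct))

# Predictions come back as a tibble with a `.pred` column
predict(model, new_data = mtcars[1:3, ])
```

In this sketch the framework adds a consistent interface around the engine without replacing it, so the usual tools for lm objects (summary, anova, plot) still apply to the extracted fit.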

Other frameworks not covered here have their own strengths. Some build heavily on C++; for others, R is just another API. One of them, caret, is close to tidymodels and will certainly end up being absorbed by it, as it was developed by Max Kuhn, who is now part of the tidymodels team.

Conveniently, an exhaustive listing of packages and frameworks for ML is available in the CRAN ML task view, maintained by Torsten Hothorn from R-project.org. Note that frameworks are called metapackages there.

To go further, see the sites and books below on the selected frameworks:
tidymodels.org
tidy modeling with R, the book
mlr-org
mlr3, the book

For additional books specifically on machine learning, see my bookshelf page.