New blog post: Spatial Machine Learning with tidymodels
This post shows how to apply the tidymodels framework to spatial data workflows in R. Part 3 in a series about #sml.
Hello #rstats #tidymodels #yolo #machinelearning community.
I'm about to start some image-recognition work in R, aiming to detect traffic-calming devices in high-resolution satellite imagery. @cquest has been doing some of that with Python.
Has anybody been down this road and have any advice or tips?
Thanks!
This summer, join the tidymodels team as an intern and help expand the possibilities of feature selection!
Over the years, our eight summer interns have made incredible contributions, including packages like agua, applicable, bundle, butcher, shinymodels, spatialsample, and stacks. Now, it's your turn to shape the future of #tidymodels #RStats tools!
Learn more and apply: https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/
Introducing support for postprocessing in tidymodels!
Postprocessors refine predictions output by machine learning models to improve predictive performance or better satisfy distributional limitations.
The tidymodels team has been working on a set of changes across many #tidymodels packages to introduce support for postprocessing. They would love to hear your thoughts on their progress so far!
Learn more in the blog post: https://www.tidyverse.org/blog/2024/10/postprocessing-preview/
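A hedged sketch of the in-progress API (the tailor package, installable from GitHub at the time of the post; names may have changed since):

library(tailor)

post <- tailor() |>
  adjust_numeric_range(lower_limit = 0) |>        # clip impossible negative predictions
  adjust_numeric_calibration(method = "linear")   # recalibrate against reserved data

# In a workflow, the postprocessor is attached with add_tailor(post) and its
# calibration is estimated during fit() on a reserved slice of the data.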
recipes 1.1.0 is on CRAN! recipes lets you create a pipeable sequence of feature engineering steps.
This release improves column type checking, allows more data types to be passed to recipes, supports long formulas, and gives better errors for misspelled argument names.
Check out the blog post for more details (and a delicious treat at the end): https://www.tidyverse.org/blog/2024/07/recipes-1-1-0/
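For anyone new to the package, a minimal sketch of that pipeable idea (general recipes usage, not specific to 1.1.0):

library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_log(disp) |>                          # log-transform one predictor
  step_normalize(all_numeric_predictors())   # center and scale the numeric predictors

prepped <- prep(rec)                        # estimate means/sds from the data
baked   <- bake(prepped, new_data = NULL)   # apply the trained steps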
I'll be running an "Introduction to machine learning with {tidymodels}" workshop at RSS Conference in September!
Session details: Wednesday 4 September, 2024
11:30am - 12:50pm
Brighton, UK
More info: https://virtual.oxfordabstracts.com/#/event/6693/program?session=92723&s=2600
Register: https://rss.org.uk/training-events/conference-2024/
We have five posit::conf(2024) workshops for #RStats and #Python modeling and ML enthusiasts!
• Causal Inference in R, led by @malcolmbarrett and @travisgerke
• Introduction to machine learning in Python with Scikit-learn, led by @TiffanyTimbers and Trevor Campbell
• Intro to MLOps with vetiver, led by @isabelizimm
• Introduction to tidymodels, led by @hfrick and @simonpcouch
• Advanced Tidymodels, led by @topepo
Preprint from Simon Wood on the new cross-validation smoothness estimation in #mgcv: https://arxiv.org/abs/2404.16490. It's a neat performant + data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).
See ?NCV in latest {mgcv} for examples (https://cran.r-universe.dev/mgcv/doc/manual.html#NCV)
I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.
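Roughly what I have in mind, as a hypothetical sketch (rset_to_nei is made up; it assumes the k/m/i/mi index-vector layout described in ?NCV, which is the source of truth here):

library(rsample)

rset_to_nei <- function(rset) {
  dropped <- lapply(rset$splits, complement)  # held-out row indices per resample
  k <- unlist(dropped)                        # concatenated neighbourhood indices
  m <- cumsum(lengths(dropped))               # end position of each fold within k
  list(k = k, m = m, i = k, mi = m)           # score predictions at the dropped points
}

# e.g. mgcv::gam(y ~ s(x), data = df, method = "NCV", nei = rset_to_nei(folds))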
#rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl
tidymodels has long supported parallelizing model fits across CPU cores. A couple of the modeling engines that #rstats #tidymodels supports for gradient boosting—#XGBoost and #LightGBM—have their own tools to parallelize model fits. A new blog post explores whether tidymodels users should use tidymodels' implementation, the engines', or both.
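For concreteness, a hedged sketch of combining the two levers (keeping the engine single-threaded here is an illustrative choice, not the post's conclusion):

library(tidymodels)
library(doParallel)

cl <- makePSOCKcluster(4)
registerDoParallel(cl)   # tidymodels fans the resample fits out to the workers

spec <- boost_tree(trees = 500) |>
  set_engine("xgboost", nthread = 1) |>   # engine-level threading turned off
  set_mode("regression")

wf  <- workflow() |> add_formula(mpg ~ .) |> add_model(spec)
res <- fit_resamples(wf, resamples = vfold_cv(mtcars, v = 5))

stopCluster(cl)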
It's a good weekend to learn survival analysis with tidymodels!
The tidymodels team wrote up a few case studies for you:
• Using survival analysis to see how long it takes the Department of Buildings in NYC to disposition complaints: https://www.tidymodels.org/learn/statistics/survival-case-study/
• Computing time-dependent measures of performance: https://www.tidymodels.org/learn/statistics/survival-metrics/
Read the announcement on survival analysis in tidymodels: https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/
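If you want a starting point, a minimal sketch of fitting and predicting a censored-regression model (via the censored extension package, using the classic lung data from survival):

library(censored)   # parsnip support for censored regression
library(survival)

fit <- proportional_hazards() |>
  set_engine("survival") |>
  set_mode("censored regression") |>
  fit(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# survival probabilities at chosen evaluation times
predict(fit, new_data = head(lung), type = "survival", eval_time = c(100, 500))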
Happy learning! #RStats #tidymodels
Do any statistical/ML software tools explicitly incorporate reusable holdout, where one uses thresholding, noise, or bootstrapping in holdout validation to prevent garden-of-forking-paths or overfitting issues?
I feel this paper describing the method made a splash when it came out in 2015, but I haven't seen much implementation, at least in the R ecosystem: https://doi.org/10.48550/arXiv.1411.2664
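The core reusable-holdout mechanism (Thresholdout) is simple to sketch, though; a rough, hedged approximation (constants made up):

thresholdout <- function(train_stat, holdout_stat,
                         threshold = 0.04, sigma = 0.01) {
  if (abs(train_stat - holdout_stat) < threshold + rnorm(1, 0, sigma)) {
    train_stat                         # consistent: reveal nothing about the holdout
  } else {
    holdout_stat + rnorm(1, 0, sigma)  # otherwise release a noised holdout value
  }
}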
Seems like something the #tidymodels team might think about? @topepo @juliasilge #rstats
We are working on postprocessing tasks for tidymodels. This is new methodological ground; we'd like feedback on our ideas. We also have a placeholder name for a data set and could use suggestions/feedback there.
#rstats #MachineLearning #tidymodels
https://blog.aml4td.org/posts/data-usage-for-postprocessors/
Let's take a moment to relive our recent in-person gathering through these snapshots!
We had the pleasure of hosting María Paula Caldas, Data Scientist at OECD, and Julie Aubert, INRAE Research Engineer, who respectively delivered #inspiring talks on the development of #packages and statistical models using {#Tidymodels} in #R.
You can find the replay here: https://youtu.be/wEVKoPhB25g
It might be neat if #rstats #tidymodels variable selectors had an "except" argument, as in
all_predictors(except = foo)
or
all_predictors(except = starts_with("bar"))
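In the meantime, the closest existing workaround I know of (a minimal sketch; tidyselect-style negation composes with the selectors inside a step):

library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_predictors(), -starts_with("c"))  # everything except cyl and carb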
In #Tidymodels #R is there a best practice of when to use fit/predict versus fit/augment? It seems like most of the examples that I've seen in the past, including on the tidymodels web page, use fit/predict, but I'm suddenly seeing a lot more fit/augment.
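For concreteness, here's the difference as I understand it (a minimal sketch on a parsnip fit):

library(tidymodels)

fit <- linear_reg() |> fit(mpg ~ wt + hp, data = mtcars)

predict(fit, new_data = mtcars)  # a tibble with just the .pred column
augment(fit, new_data = mtcars)  # mtcars with .pred (and .resid) bound on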
Many hospital systems use machine learning models to help allot limited care resources. A new article on the #rstats #tidymodels website explores claims that these models may be discriminatory:
Opportunity Scholarships at posit::conf(2024): the application deadline is approaching fast (March 22nd). If you're a strong candidate or know someone who is, please act quickly.
Opportunity Scholars receive free tickets, a workshop, support for travel and accommodation, plus lots of swag.
Learn more and apply now:
https://posit.co/blog/posit-conf-2024-announcement/
Q: What's a good prediction performance metric for a binary classification model where I expect all predictions to be well below 0.5 (<< 0.001)?
We have calibration analysis (expected vs. observed number of positives, and CI coverage, across probability bins on holdout data), but I'm not sure how to summarize that, or what else to use, to get a single metric for tracking model improvement.
Maybe some scoring rule designed for sparse count data?
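For instance, a proper scoring rule as the single number; a hedged sketch with yardstick on simulated stand-in data (brier_class() needs yardstick >= 1.2.0):

library(yardstick)

set.seed(1)
d <- data.frame(
  truth     = factor(sample(c("yes", "no"), 1e4, TRUE, prob = c(5e-4, 1 - 5e-4)),
                     levels = c("yes", "no")),
  .pred_yes = rbeta(1e4, 1, 2000)   # predictions well below 0.5, as above
)

brier_class(d, truth, .pred_yes)   # mean squared error of the probabilities
mn_log_loss(d, truth, .pred_yes)   # log loss; punishes confident misses hard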