#tidymodels


Introducing support for postprocessing in tidymodels!

Postprocessors refine the predictions output by machine learning models to improve predictive performance or to better satisfy distributional limitations.

The tidymodels team has been working on a set of changes across many #tidymodels packages to introduce support for postprocessing. They would love to hear your thoughts on their progress so far!
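For a flavor of what this looks like in code, here is a minimal sketch based on my reading of the announcement, assuming the experimental {tailor} package and the add_tailor() verb in {workflows}; the formula and data are made up, and the interface may still change, so check the blog post for the current API.

```r
library(tidymodels)
library(tailor)   # experimental postprocessing package

# A postprocessor spec: lower the classification threshold for a rare event
post <- tailor() |>
  adjust_probability_threshold(threshold = 0.1)

# Postprocessors slot into a workflow next to the preprocessor and the model
wf <- workflow() |>
  add_formula(class ~ .) |>        # placeholder formula
  add_model(logistic_reg()) |>
  add_tailor(post)
```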

Learn more in the blog post: tidyverse.org/blog/2024/10/pos

We have five posit::conf(2024) workshops for #RStats and #Python modeling and ML enthusiasts!

• Causal Inference in R, led by @malcolmbarrett and @travisgerke
• Introduction to machine learning in Python with Scikit-learn, led by @TiffanyTimbers and Trevor Campbell
• Intro to MLOps with vetiver, led by @isabelizimm
• Introduction to tidymodels, led by @hfrick and @simonpcouch
• Advanced Tidymodels, led by @topepo

reg.conf.posit.co/flow/posit/p


Preprint from Simon Wood on the new neighbourhood cross-validation (NCV) smoothness estimation in #mgcv: arxiv.org/abs/2404.16490. It's a neat, performant, and data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).

See ?NCV in the latest {mgcv} for examples (cran.r-universe.dev/mgcv/doc/m)

I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.
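A rough sketch of what that helper could look like, based on my reading of the nei list format in ?NCV (elements k, m, i, mi); the element semantics here are an assumption to verify against the help page before relying on it.

```r
library(rsample)

# Hypothetical helper: convert an {rsample}/{spatialsample} rset into the
# `nei` list that mgcv::gam(..., method = "NCV", nei = ...) expects.
rset_to_nei <- function(rset) {
  # assessment (held-out) row indices for each resample
  held_out <- lapply(rset$splits, rsample::complement)
  k <- unlist(held_out)           # rows omitted for each neighbourhood
  m <- cumsum(lengths(held_out))  # last position in k belonging to each fold
  # predict the same rows that are dropped, i.e. plain leave-fold-out CV
  list(k = k, m = m, i = k, mi = m)
}

# e.g. nei <- rset_to_nei(vfold_cv(my_data, v = 10))
```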

#rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl

tidymodels has long supported parallelizing model fits across CPU cores. A couple of the modeling engines that #rstats #tidymodels supports for gradient boosting—#XGBoost and #LightGBM—have their own tools to parallelize model fits. A new blog post explores whether tidymodels users should use tidymodels' implementation, the engines', or both.
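Roughly, the two knobs being compared look like this (a sketch under assumed settings with placeholder data, not the post's benchmark code):

```r
library(tidymodels)

# Engine-level threading: XGBoost parallelizes each individual model fit
spec <- boost_tree(trees = 500) |>
  set_engine("xgboost", nthread = 4) |>
  set_mode("regression")

# tidymodels-level parallelism: resamples / grid points go to separate workers
library(doParallel)
registerDoParallel(cores = 4)
# fits <- tune_grid(spec, outcome ~ ., resamples = vfold_cv(dat), grid = 20)
```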

simonpcouch.com/blog/2024-05-1

www.simonpcouch.com · How to best parallelize boosted tree model fits with tidymodels | Simon P. Couch

It's a good weekend to learn survival analysis with tidymodels! ⏳

The tidymodels team wrote up a few case studies for you:

• Using survival analysis to see how long it takes the Department of Buildings in NYC to disposition complaints: tidymodels.org/learn/statistic
• Computing time-dependent measures of performance: tidymodels.org/learn/statistic

Read the announcement on survival analysis in tidymodels: tidyverse.org/blog/2024/04/tid
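If you want a quick taste of the API before diving in, here is a minimal sketch, assuming the {censored} extension package and the lung data from {survival}; the eval_time argument name reflects recent parsnip versions.

```r
library(tidymodels)
library(censored)   # survival model engines for parsnip
library(survival)   # Surv() and the lung data

mod <- proportional_hazards() |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + ph.ecog, data = lung)

# Predicted survival probabilities at specific evaluation times
predict(mod, new_data = lung[1:5, ], type = "survival", eval_time = c(100, 365))
```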

Happy learning! #RStats #tidymodels

www.tidymodels.org · tidymodels - How long until building complaints are dispositioned? A survival analysis case study · Learn how to use tidymodels for survival analysis.

Do any statistical/ML software tools explicitly incorporate reusable holdout, where one uses thresholding, noise, or bootstrapping in holdout validation to prevent garden-of-forking-paths or overfitting issues?

I feel like this paper describing the method made a splash when it came out in 2015, but I haven't seen many implementations, at least in the R ecosystem: doi.org/10.48550/arXiv.1411.26

Seems like something the #tidymodels team might think about? @topepo @juliasilge #rstats

arXiv.org · Preserving Statistical Validity in Adaptive Data Analysis · A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of $m$ adaptively chosen functions on an unknown distribution given $n$ random samples. We show that, surprisingly, there is a way to estimate an exponential in $n$ number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question.
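For context, the core mechanism in that paper (often called Thresholdout) is tiny; here is my rough, unvetted paraphrase in R, with made-up threshold and noise-scale values.

```r
# Only reveal the holdout estimate of a statistic when it disagrees with the
# training estimate by more than a noisy threshold; otherwise echo the
# training estimate, so the holdout isn't "spent" by the query.
thresholdout <- function(train_est, holdout_est, threshold = 0.04, sigma = 0.01) {
  if (abs(train_est - holdout_est) > threshold + rnorm(1, sd = sigma)) {
    holdout_est + rnorm(1, sd = sigma)  # noisy answer from the holdout data
  } else {
    train_est                           # training answer; holdout untouched
  }
}
```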

📷 Let's take a moment to relive our recent in-person gathering through these snapshots!

We had the pleasure of hosting María Paula Caldas, Data Scientist at the OECD, and Julie Aubert, Research Engineer at INRAE, who delivered #inspiring talks on the development of #packages and on statistical modelling with {#Tidymodels} in #R, respectively.

You can find the replay here:
👉 youtu.be/wEVKoPhB25g

👥@chaimaboughanmi @mouna_belaid @RLadiesGlobal @Posit

Q: What's a good prediction performance metric for a binary classification model where I expect all predictions to be well below 0.5? (<< .001)

We have calibration analysis (expected vs. observed number of positives, and CI coverage, for various probability bins on holdout data), but I'm not sure how to summarize that, or what else to use, to get a single metric for tracking model improvement.

Maybe some scoring rule designed for sparse count data?
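One conventional single-number option is a proper scoring rule computed on the holdout predictions; a sketch with {yardstick}, with column names and values made up:

```r
library(yardstick)
library(tibble)

# Toy holdout predictions where every probability is far below 0.5
preds <- tibble(
  truth       = factor(c("event", "none", "none", "none"), levels = c("event", "none")),
  .pred_event = c(0.0009, 0.0002, 0.0004, 0.0001)
)

mn_log_loss(preds, truth, .pred_event)  # mean log loss
brier_class(preds, truth, .pred_event)  # Brier score
```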

@juliasilge @topepo?