#duckdb

Drop #669 (2025-06-23): Monday Morning (Barely) Grab Bag

The Monday Drop discusses 3 main topics: a Rube Goldberg-inspired data pipeline to archive X posts into #DuckDB, the #RStats package {fplot} for automating distribution plot creation in R, and an article from CSS-Tricks on advanced CSS color techniques, detailing color spaces and models for modern web development.

dailydrop.hrbrmstr.dev/2025/06

Rube Goldberg X-traction Pipeline; fplot; Color Everything in CSS

Something for (hopefully) everyone as we start off this brutally hot (in many parts of the northern hemisphere) terminal week of June.

Stay safe out there.

TL;DR

(This is an LLM/GPT-generated summary of today’s Drop using Ollama + Qwen 3 and a custom prompt.)

  • A Rube Goldberg-inspired data pipeline is created to archive X posts into a DuckDB database, using XCancel, Inoreader, and a DuckDB script for automation (https://en.wikipedia.org/wiki/Rube_Goldberg)
  • The {fplot} R package automates the creation of distribution plots by detecting data types and selecting appropriate visualizations, with options for global relabeling of variables (https://lrberge.github.io/fplot/)
  • The CSS-Tricks article “Color Everything in CSS” provides an in-depth look at color spaces, models, and gamuts in modern web development, offering a comprehensive guide to advanced CSS color techniques (https://css-tricks.com/color-everything-in-css/)

Rube Goldberg X-traction Pipeline

I don’t see many mentions of Rube Goldberg in pop-culture settings anymore, which is a shame, since I used to enjoy poring over them in my younger days. Perhaps the reason for the lack of mentions is that many data pipelines have much in common with those complex, over-“engineered” contraptions.

Case in point for a recent “need” of mine: I wanted a way to store posts from users on X into a DuckDB database, for archival and research purposes. I already use XCancel’s ability to generate an RSS feed for an account/search, which I yank into Inoreader for the archival part (the section header shows the XCancel-generated RSS feed for the White House’s other, even more MAGA, propaganda account).

Inoreader’s API is…not great. It can most certainly be machinated (I have an R package with the function I need in it), but I really wanted a solution that let me just use DuckDB for all the work.

Then, I remembered that if you put feeds in Inoreader folders, you can turn that folder into a JSON feed that gets updated roughly every 30 minutes. This one:

is for a series of feeds related to what’s going on in the Middle East right now.

With that JSON URL in hand, it’s as basic as:

#!/usr/bin/env bash

# for cache busting
epoch=$(date +%s)

duckdb articles.ddb <<EOQ
LOAD json;
INSTALL shellfs FROM community;
LOAD shellfs;

CREATE TABLE IF NOT EXISTS broadcast_feed_items (
  url VARCHAR PRIMARY KEY,
  title VARCHAR,
  content_html VARCHAR,
  date_published VARCHAR,
  tags VARCHAR[],
  authors JSON
);

-- this is where the update magic happens
INSERT OR IGNORE INTO broadcast_feed_items
FROM read_json('curl -s https://www.inoreader.com/stream/user/##########/tag/broadcast/view/json?since=${epoch} | jq .items[] |')
SELECT url, title, content_html, date_published, tags, authors;

-- Thinned out JSON content for viewing app
COPY (
  FROM
    broadcast_feed_items
  SELECT
    content_html, -- "title" is useless for the most part since this is an X post
    date_published AS "timestamp",
    regexp_replace(authors.name, '"', '', 'g') AS handle
) TO 'posts.json' (FORMAT JSON, ARRAY );
EOQ

There are other ways to unnest the data than using jq and the shellfs DuckDB extension, but the more RG the better (for this post)!
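
For readers who want to see what the `jq .items[]` step and the `INSERT OR IGNORE` line are doing, here is a minimal Python sketch of the same two ideas. It is an illustration under assumptions, not the pipeline above: it uses Python's built-in `sqlite3` as a stand-in for DuckDB (SQLite also accepts `INSERT OR IGNORE`), and the sample payload, the `create_table`, `count_rows`, and `ingest` helpers, and the example.com URLs are all hypothetical.

```python
import json
import sqlite3

# Hypothetical payload shaped like the folder's JSON feed: posts sit under "items".
SAMPLE_FEED = json.dumps({
    "items": [
        {"url": "https://example.com/post/1", "title": "one",
         "content_html": "<p>first</p>", "date_published": "2025-06-23T10:00:00Z"},
        {"url": "https://example.com/post/2", "title": "two",
         "content_html": "<p>second</p>", "date_published": "2025-06-23T10:30:00Z"},
    ]
})


def create_table(conn):
    # Same idea as the DuckDB table: the post URL is the primary key.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS broadcast_feed_items (
            url TEXT PRIMARY KEY,
            title TEXT,
            content_html TEXT,
            date_published TEXT
        )
        """
    )


def count_rows(conn):
    # Number of posts currently stored.
    return conn.execute("SELECT COUNT(*) FROM broadcast_feed_items").fetchone()[0]


def ingest(conn, feed_text):
    """Store feed items and return how many rows were actually new."""
    items = json.loads(feed_text)["items"]  # the job `jq .items[]` does in the script
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO broadcast_feed_items "
        "(url, title, content_html, date_published) "
        "VALUES (:url, :title, :content_html, :date_published)",
        items,
    )
    conn.commit()
    return count_rows(conn) - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    print(ingest(conn, SAMPLE_FEED))  # first run stores both sample posts
    print(ingest(conn, SAMPLE_FEED))  # re-running the same feed adds nothing
```

Because the URL is the primary key, re-fetching the same feed on every run only adds posts that have not been stored yet.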

So the final path is:

X -> XCancel -> XCancel RSS -> Inoreader -> Inoreader JSON -> jq -> DuckDB

with virtually no code (save for the snippet, above).

I’ve got this set up as a systemd timer/service that runs every 30 minutes.

Later this week (when I’m done hand-coding it; yes, sans-Claude), I’ll have a Lit-based vanilla HTML/CSS/JS viewer app in one of the Drops.

fplot

(This is an #RStats section, so def move along if that is not your cuppa.)

My daily git-stalking led me to this gem of an R package.

{fplot} (GH) is designed to automate and simplify the visualization of data distributions (something I have to do every. single. day.). Its core mission is to let folks quickly generate meaningful and aesthetically pleasing distribution plots, regardless of the underlying data type (it supports continuous, categorical, or skewed), by making spiffy choices about the appropriate graphical representation for each variable.

Functions in the package detect the nature of your data (e.g., categorical vs. continuous, skewed or not) and automatically select the most suitable plot type. For example, it will not use the same visualization for a categorical variable as it would for a continuous one, and it adapts further if the data is heavily skewed.

Ergonomics are pretty dope, since you only need a single line of code to generate a plot, with the package handling the details of layout and type selection. This is particularly useful for exploratory data analysis or for folks who want quick, visually appealing graphics without extensive customization.

Tools are provided to globally relabel variable names for all plots. This is managed via the setFplot_dict() function, which lets us map cryptic/gosh awful or technical variable names to more readable labels that will appear in all subsequent plots.

Example usage:

setFplot_dict(c(
  Origin = "Exporting Country",
  Destination = "Importing Country",
  Euros = "Exports Value in €",
  jnl_top_25p = "Pub. in Top 25% journal",
  jnl_top_5p = "Publications in Top 5% journal",
  journal = "Journal",
  institution = "U.S. Institution",
  Petal.Length = "Petal Length"
))

The typical workflow with fplot is straightforward:

  1. Load your data.
  2. Optionally set global variable labels using setFplot_dict().
  3. Call the fplot function on your variable(s) of interest.
  4. The package automatically determines the best plot type and layout for your data.

The same function call can yield different types of plots depending on the data provided, streamlining the process of distributional analysis and visualization.
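
To illustrate the general idea of one call producing different plots, here is a toy Python sketch of type-based dispatch. It is emphatically not {fplot}'s implementation: the `choose_plot` and `skewness` helpers, the skewness cutoff of 1.0, and the plot labels are all assumptions made up for this example.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness (third standardized moment) of a numeric sequence."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # constant data has no skew
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)


def choose_plot(values, skew_threshold=1.0):
    """Return a label for the kind of distribution plot to draw (toy rules)."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        # Anything non-numeric is treated as categorical.
        return "bar chart of category frequencies"
    if abs(skewness(values)) > skew_threshold:
        # Hypothetical cutoff: heavily skewed data gets a different binning.
        return "histogram with log-spaced bins"
    return "histogram"


if __name__ == "__main__":
    print(choose_plot(["FR", "DE", "FR"]))
    print(choose_plot([1, 2, 3, 4, 5]))
    print(choose_plot([1, 1, 1, 1, 1, 1, 1, 1, 1, 100]))
```

The point is only the shape of the logic: inspect the variable first, then pick the visualization, so the caller never has to name a plot type.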

A gallery of examples and a more detailed walk-through are available on the package’s website.

Color Everything in CSS

The CSS-Tricks article “Color Everything in CSS” offers a comprehensive, up-to-date exploration of how color works in CSS, moving beyond just the basics of color and background-color to cover the deeper technical landscape of color on the web. The article introduces essential concepts like color spaces, color models, and color gamuts, which are foundational for understanding how colors are represented, manipulated, and rendered in browsers today.

We’ve covered many of these individual topics before, but this is a well-crafted all-in-one that does such a good job, I do not wish to steal any thunder from it. Head on over to level up your CSS skills.

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

  • 🐘 Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
  • 🦋 Bluesky via https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy

☮️

Bonus Drop #86 (2025-06-15): I Think You May Be Projecting

The Weekend Bonus Drop covers two data engineering projects utilizing #DuckDB. The first project improves rock-climbing trip planning by integrating climbing routes with precise weather forecasts. The second project organizes Garmin activity data into a clean database. Both exemplify real-world engineering challenges for personal projects, emphasizing practical problem-solving and hands-on learning in data…

dailydrop.hrbrmstr.dev/2025/06

oops #til to use #duckdb to query a CSV, generate date ranges, use windowing functions to backfill data and pivot functions to make data that you can easily graph in a spreadsheet.

based upon;
- average solar radiation distribution over the year for my area
- My actual kwh production and usage for the last month (which #homeassistant gives as data change events, not hourly or daily reporting)
- The KWHs I've spent on AC that I expect to increase over the summer

I'm operating at 85% capacity 🙌

#macOS26

- I detest the new Preview.app icon & won't lower myself to show the new horrible Finder icon.
- #RStats 4.5.0 and RStudio 2025.08.0-daily+176 both work normally (as expected), as does #DuckDB 1.3.0
- the "clear" icon styles aren't horrible

> Anyone who has worked for more than 5 minutes in an enterprise more than 30 miles outside San Francisco knows that the vast majority of information in the enterprise is cataloged and transacted via Excel spreadsheets. And if you're lucky, these spreadsheets are accessible to more than one person at a time via platforms like SharePoint.

This is large.

Or shall I say x-large?

It basically excels and will essentially solve all German Enterprise IT issues:

github.com/gregwdata/ducklakexl

GitHub - gregwdata/ducklakexl: Use Excel as a metadata catalog for DuckLake 🤪

PSA for a potential audience of ... maybe a handful of people: something I posted earlier today in the #duckdb discord under 'show-and-tell': if you wished you could get new `duckdb` binaries automagically via `apt`, a simple repo shows one way -- and got me versions 1.3.0 and one or two of the 1.2.* ones. Text of post in alt-text.