#index

#WritersCoffeeClub 07/01—What's your greatest weakness as a writer?

The index. Adding #index entries to a text is _so_ much work, especially since I need different approaches for the PDF and EPUB versions of the manuscripts. And I constantly wonder: "Is this significant enough to be included in the index?" and tend to err on the side of inclusion.

Seriously, if money were no object and I could outsource any aspect of my work, this would be it.

I wonder how the "professionals" do this. Is there a formal profession of "index person"? What are the training and workflow like?

Inter-Basin Groundwater Flow In West-Central Florida
--
doi.org/10.1016/j.jhydrol.2025 <-- shared paper
--
fl.water.usgs.gov/floridan/int <-- shared USGS overview page, Floridan Aquifer System Groundwater Availability
--
“HIGHLIGHTS
• The regional pattern of IGF in west-central Florida is dominated by the characteristics of the Upper Floridan Aquifer.
• IGF plays a major role in the available water for partitioning and watershed aridity index.
• Groundwater pumping affects IGF, and the change in IGF counteracts the human impact on available water..."
#GIS #spatial #mapping #groundwater #spatialanalysis #spatiotemporal #Florida #USA #waterresources #waterquality #watersecurity #regional #model #modeling #HSPF #MODFLOW #geology #sedimentology #hydrogeology #aquifer #runoff #discharge #watershed #precipitation #climate #aridity #index #pumping #humanimpacts #anthropogenic #watersupply

Good day, folks! I'm working on creating a moderated #Peertube #Public #Index. Does anyone have recommendations for good servers that could be added to the Public Index?

What I'm looking for: no porn, no hate speech, no copyrighted material.

If you're looking to get added to our public index, please message me here or email me btfree@btfree.org with title "PTIndex".

I'm going to work on creating a video showing how YOU can add BT Free's public index to YOUR Peertube, how to run your own #Public #Index, and how to install and configure your own Peertube.
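For reference, the "add a public index to YOUR Peertube" part is a config change: PeerTube reads external-index settings from the search.search_index section of production.yaml. A minimal sketch; the URL is a placeholder, since BT Free's index endpoint hasn't been published yet:

```yaml
search:
  search_index:
    enabled: true
    # Placeholder: substitute the real BT Free index URL once announced
    url: 'https://index.btfree.org'
    # Keep local search working alongside the external index
    disable_local_search: false
    # Send searches to the external index by default
    is_default_search: true
```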

I need YOUR help, though, to find QUALITY instances that we'd want to be part of!

Okay, back-of-the-napkin math:
- There are probably 100 million sites and 1.5 billion pages worth indexing in a #search engine.
- It takes about 1 TB to #index 30 million pages.
- We only care about text on a page.

I define a page as worth indexing if:
- It is not a FAANG site
- It has at least one referrer (no DD Web)
- It's active

So at ~1 TB per 30 million pages, 1.5 billion pages means we need roughly 40-50 TB of fast storage to make a good index of the internet. That's not "runs locally" sized, but it is nonprofit sized.

My size assumptions are basically as follows:
- #URL
- #TFIDF information
- Text #Embeddings
- Snippet

We can store one page's index entry in about 30 KB. So in roughly 45 TB (call it 40) we can store a full internet index; the arithmetic is spelled out below. That's about $500 in storage.
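Spelling that arithmetic out in Python (the $12/TB disk price is my assumption, back-solved from the ~$500 claim; the 512-byte vector size anticipates the quantized embeddings discussed next):

```python
pages = 1.5e9          # pages worth indexing
bytes_per_page = 30e3  # URL + TF-IDF info + quantized embedding + snippet
usd_per_tb = 12        # assumed bulk-HDD pricing (back-solved from ~$500)

total_tb = pages * bytes_per_page / 1e12
print(f"{total_tb:.0f} TB total")                 # 45 TB -- "call it 40"
print(f"${total_tb * usd_per_tb:,.0f} in disks")  # ~$540 -- "about $500"

# RAM side: a 512-dimensional embedding at 8 bits per dimension is 512 bytes
bytes_per_vector = 512
print(f"{1e9 / bytes_per_vector / 1e6:.0f}M vectors per GB of RAM")    # ~2M
print(f"{pages * bytes_per_vector / 1e9:.0f} GB to hold all of them")  # 768 GB
```

That 768 GB figure is also exactly what six 128 GB machines provide, which is where the hardware sketch further down comes from.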

Access time becomes a problem. TF-IDF data for the whole internet can easily fit in RAM, but even with #quantized embeddings you can only fit about 2 million per GB of RAM.

Assuming you had enough RAM, it could be fast: TF-IDF to get 100 million candidates, #FAISS to rank those, load snippets dynamically, and potentially adjust rank by referrers, etc.
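A toy sketch of that two-stage pipeline in Python, assuming scikit-learn for the TF-IDF pass and FAISS's 8-bit scalar quantizer for the embedding stage; the corpus, the random "embeddings", and all sizes are stand-ins:

```python
import numpy as np
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus standing in for 1.5 billion pages.
docs = [
    "groundwater flow in florida aquifers",
    "how to build a search engine index",
    "framework desktop hardware review",
    "tf-idf and embeddings for web retrieval",
]

# Stage 1: TF-IDF over everything to get a broad candidate set
# (top ~100 million at full scale; top 2 here).
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)
query = "build a search index"
scores = linear_kernel(vectorizer.transform([query]), tfidf).ravel()
candidates = np.argsort(scores)[::-1][:2]

# Stage 2: rank the candidates by embedding distance, with vectors stored
# 8-bit quantized (512 bytes per 512-dim vector, ~2M per GB of RAM).
d = 512
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(docs), d)).astype("float32")  # stand-in encoder
q_emb = rng.standard_normal((1, d)).astype("float32")

index = faiss.IndexScalarQuantizer(d, faiss.ScalarQuantizer.QT_8bit)
index.train(embeddings)            # scalar quantizer learns per-dimension ranges
index.add(embeddings[candidates])  # at full scale: one persistent index + ID filtering

dist, idx = index.search(q_emb, k=2)
for rank, j in enumerate(idx[0], 1):
    print(rank, docs[candidates[j]])
```

Snippets would stay on disk and load only for the final page of results; referrer-based boosts would slot in between the two stages.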

Six 128 GB #Framework #desktops, each with a 5 TB drive (plus one Raspberry Pi to merge the final candidates from the six machines), is enough to replace #Google. That's about $15k.
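The Raspberry Pi's job in that layout is just a scatter-gather merge of the shards' local results. A sketch, with the six machines' RPC responses mocked as already-sorted (score, doc_id) lists:

```python
import heapq

# Each shard (one per desktop) returns its local top-k, sorted by score, descending.
shard_results = [
    [(0.92, "shard0/doc17"), (0.85, "shard0/doc3")],
    [(0.90, "shard1/doc44"), (0.61, "shard1/doc9")],
    # ...four more shards in the real six-machine setup
]

def merge_top_k(shards, k):
    # heapq.merge does a streaming k-way merge of sorted inputs (ascending),
    # so negate scores to merge descending without re-sorting anything.
    streams = [((-score, doc) for score, doc in shard) for shard in shards]
    top = []
    for neg_score, doc in heapq.merge(*streams):
        top.append((-neg_score, doc))
        if len(top) == k:
            break
    return top

print(merge_top_k(shard_results, 3))
# [(0.92, 'shard0/doc17'), (0.90, 'shard1/doc44'), (0.85, 'shard0/doc3')]
```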

In two to three years this will be doable on a single machine for around $3k.

By the end of the decade it should run as an app on a powerful desktop.

Three years after that it can run on a #laptop.

Three years after that it can run on a #cellphone.

By #2040 it's a background process on your cellphone.

#index The weekly db dump failed on Saturday because a runaway script ate up all the disk space on the utility server that does the dump. I've fixed that, but I need to wait for the full db export (in progress now) to finish before re-running the weekly dump. Stay tuned.