Okay, but we can also #federate this now with the #fediverse. Like, #ActivityPub can handle search queries just fine.
So, just running on microcomputers, everyone can put whatever they want in their own index.
A person can _easily_ index 50,000 pages on a Raspberry Pi.
A #FediSearch can broadcast any query to known peers. Each peer returns top-k results. The originating node can then aggregate and rank.
So @alice queries her FediSearch: it searches its own index and queries its subscribed peers, and those peers do the same thing. Nodes can choose who they trust, what they cache, etc.
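To make that flow concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `Node` class, the toy term-count scoring, the `ttl` hop limit and the `seen` set for breaking cycles are assumptions for illustration, and peers are called in-process rather than over #ActivityPub.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple
import heapq


@dataclass
class Node:
    """A hypothetical FediSearch node: a local index plus subscribed peers."""
    name: str
    index: Dict[str, str]                      # url -> page text (toy stand-in for a real index)
    peers: List["Node"] = field(default_factory=list)

    def local_search(self, query: str) -> List[Tuple[float, str]]:
        # Toy scoring: how many times the query term appears in the page text.
        term = query.lower()
        hits = []
        for url, text in self.index.items():
            score = text.lower().split().count(term)
            if score > 0:
                hits.append((float(score), url))
        return hits

    def search(self, query: str, k: int = 3, ttl: int = 2,
               seen: Optional[Set[str]] = None) -> List[Tuple[float, str]]:
        # `seen` stops the query from looping around the peer graph;
        # `ttl` bounds how many hops it may travel (both are assumptions).
        seen = set() if seen is None else seen
        seen.add(self.name)
        best: Dict[str, float] = {url: s for s, url in self.local_search(query)}
        if ttl > 0:
            for peer in self.peers:
                if peer.name in seen:
                    continue
                # Each peer returns its own top-k; keep the best score per URL.
                for score, url in peer.search(query, k, ttl - 1, seen):
                    best[url] = max(best.get(url, 0.0), score)
        # Aggregate and rank: highest score first, ties broken by URL.
        return heapq.nsmallest(k, ((s, u) for u, s in best.items()),
                               key=lambda p: (-p[0], p[1]))


carol = Node("carol", {"https://carol.example/fedi": "fediverse search search"})
bob = Node("bob", {"https://bob.example/pi": "raspberry pi cluster"}, peers=[carol])
alice = Node("alice", {"https://alice.example/notes": "notes on federated search"}, peers=[bob])
carol.peers.append(alice)  # a cycle, to show that `seen` prevents infinite recursion

results = alice.search("search")
```

Here @alice only subscribes to bob, but carol's page still reaches her through bob. Dropping `ttl` to 1 would cut that second hop off.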
The number of indexed pages will be something along the lines of `pages_per_node * log(number_nodes)`. So a thousand nodes may only cover a million pages, but if the trust network is good, those are probably the most important million pages.
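Plugging numbers into that rule of thumb, as a back-of-the-envelope sketch: the 50,000 pages per node is borrowed from the Raspberry Pi figure earlier in the thread, and since the base of the logarithm isn't specified, the function name and the `base` parameter below are my own assumptions.

```python
import math


def estimated_unique_pages(pages_per_node: int, number_nodes: int,
                           base: float = math.e) -> float:
    """Rule-of-thumb coverage: pages_per_node * log(number_nodes)."""
    return pages_per_node * math.log(number_nodes, base)


# Try the three usual log bases, since the estimate does not say which one.
for base_name, base in [("log10", 10), ("ln", math.e), ("log2", 2)]:
    pages = estimated_unique_pages(50_000, 1_000, base)
    print(f"{base_name}: ~{pages:,.0f} pages")
```

Depending on the base, this comes out somewhere between roughly 150,000 and 500,000 pages for a thousand nodes, so within an order of magnitude of the million mentioned above. Nodes with larger indexes would push it higher.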
Also, I would venture that you'd have some nodes specializing in having a lot of pages (tens of millions), others indexing just the stuff they like, and others specifically serving non-commercial interests. Selecting who you federate your search with really affects the ranking.
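One way to picture how peer selection changes the ranking is to weight each peer's scores by how much you trust it. This is only a sketch of one possible scheme: `rank_with_trust`, the trust weights, the peer names and the scores are all made up for illustration.

```python
def rank_with_trust(results_by_peer, trust, k=5):
    """Merge per-peer (url, score) lists, weighting each peer by a trust value."""
    combined = {}
    for peer, results in results_by_peer.items():
        weight = trust.get(peer, 0.0)
        if weight <= 0:
            continue  # peers we don't trust are ignored entirely
        for url, score in results:
            combined[url] = combined.get(url, 0.0) + weight * score
    # Highest combined score first, ties broken by URL.
    return sorted(combined, key=lambda u: (-combined[u], u))[:k]


# Hypothetical results for the same query from two kinds of peers.
results_by_peer = {
    "bigindex.example": [("https://shop.example/pi", 0.9),
                         ("https://wiki.example/pi", 0.6)],
    "noncommercial.example": [("https://wiki.example/pi", 0.8),
                              ("https://blog.example/pi", 0.5)],
}

favors_big = rank_with_trust(
    results_by_peer, {"bigindex.example": 1.0, "noncommercial.example": 0.2})
favors_noncommercial = rank_with_trust(
    results_by_peer, {"bigindex.example": 0.2, "noncommercial.example": 1.0})
```

Same peers, same results, but the shop page is on top for the first user and at the bottom for the second.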