We had a short review of the increasingly 'cosmopolitan' nature of C@W readership a while back: I set a little quiz inviting guesses as to the 2024 breakdown of hits, to which the answers were, in descending order -
- Hong Kong
- China
- USA
- Singapore
- UK
Well, guess what: since then, the readership stats have shot up, going stratospheric in the last month. Here's the plot for the last 3 months:
And the countries?
- Brazil
- USA
- India
- Japan
- Bangladesh
- UK
I have an acquaintance who also runs a blog: he's seen something similar, though the numbers are not so extreme and Vietnam features at the top of his list. The best explanation he can come up with is that the blogs are being used to train LLMs!
Any other suggestions?
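Not proof either way, but one crude test of the LLM-scraper theory is to grep the raw server logs for the user-agent strings the big AI crawlers are known to announce (GPTBot, ClaudeBot, CCBot and friends). A minimal sketch in Python, assuming you can actually get at an access log (hosted platforms don't always expose one); the bot list and the "access.log" path are illustrative, not definitive:

```python
# Crude tally of known AI-crawler traffic in a web server access log.
# The substrings below are user agents several AI crawlers are known
# to send; the list is illustrative and will date quickly.
from collections import Counter

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot",
               "Bytespider", "Amazonbot", "PerplexityBot"]

def count_ai_hits(log_path: str) -> Counter:
    """Count log lines mentioning each known AI-crawler user agent."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for bot in AI_CRAWLERS:
                if bot in line:
                    hits[bot] += 1
    return hits

if __name__ == "__main__":
    # "access.log" is a placeholder path, not anything specific to C@W.
    for bot, n in count_ai_hits("access.log").most_common():
        print(f"{bot}: {n:,} hits")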
Heaven help the "AI" that results from nearly 20 years of C@W. I suppose we should be flattered ...
ND
PS: in the circumstances, I thought about re-engaging with Google 'Adsense' to make a bob or two out of advertising to the increased readership. But (a) the reader experience isn't exactly improved by ads; and (b) the small print is so extensive and restrictive that I'll bet Google would rule we'd somehow been artificially boosting readership with bots, and that we wouldn't qualify.
Aren't you grateful?
Comments:
Should we all start using foul language to ensure that the bots have had a suitably liberal education?
Does LLM training make sense from those locations? I had assumed that most of the LLM training supercomputers are US-based. If so, why would they scrape from the Far East only to pipe it all over to the US for LLM training?
Al
What on earth can an LLM learn from scraping the internet, including this site and its esteemed posters? How can it decide what information it gathers is true and what's not? If I write 'The sky is green', will that mean an LLM reading it assigns only a small probability to the sky indeed being green? Given that vast swathes of what is written on the internet are bollox on stilts, how can LLMs ever get a true picture of reality if so much of their input is nonsense?
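For what it's worth, the commenter's guess is roughly what a simple statistical learner would do. A toy sketch (not how real LLMs are trained, just a maximum-likelihood word-count estimate over a hypothetical corpus) shows a lone 'the sky is green' barely dents the probability mass when everyone else says blue:

```python
# Toy maximum-likelihood estimate of P(next word | "the sky is").
# NOT how LLMs are actually trained; it just illustrates that a rare
# falsehood in a large corpus ends up with low probability.
from collections import Counter

# Hypothetical corpus counts: 999 pages say "blue", one joker says "green".
continuations = Counter({"blue": 999, "green": 1})

total = sum(continuations.values())
for word, count in continuations.items():
    print(f"P({word!r} | 'the sky is') = {count / total:.4f}")
# P('blue' | 'the sky is') = 0.9990
# P('green' | 'the sky is') = 0.0010
```

The catch, of course, is when the nonsense isn't rare: if a falsehood is repeated more often than the truth, a purely statistical learner will happily absorb it.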