Wednesday, 2 July 2025

Extraordinary C@W blog stats: AI 'training' at work?

We had a short review of the increasingly 'cosmopolitan' nature of C@W readership a while back: I set a little quiz inviting guesses as to the 2024 breakdown of hits, to which the answers were, in descending order:

  1. Hong Kong
  2. China
  3. USA
  4. Singapore
  5. UK

Well, guess what: since then, the readership stats have shot up, going stratospheric in the last month.  Here's the plot for the last 3 months:


And the countries?

  1. Brazil
  2. USA
  3. India
  4. Japan
  5. Bangladesh
  6. UK

I have an acquaintance who also runs a blog: he's seen something similar, though the numbers are not so extreme and Vietnam features at the top of his list.  The best explanation he can come up with is that the blogs are being used to train LLMs!

Any other suggestions?

Heaven help the "AI" that results from nearly 20 years of C@W.  I suppose we should be flattered ...

ND 

PS: in the circumstances, I thought about re-engaging with Google 'Adsense' to make a bob or two out of advertising to the increased readership.  But (a) the reader-experience isn't much improved by ads; and (b) the small print is so extensive and restrictive, I'll bet Google would rule that we've somehow been artificially boosting readership with bots, and that we wouldn't qualify.

Aren't you grateful?

6 comments:

dearieme said...

Should we all start using foul language to ensure that the bots have had a suitably liberal education?

Anonymous said...

Does LLM training make sense from those locations? I had assumed that most of the LLM training supercomputers are US-based. If so, then why would they scrape from the Far East and then pipe it over to the US to schedule LLM training?
Al

Sobers said...

What on earth can an LLM learn from scraping the internet, including this site and its esteemed posters? How can it decide what information it gathers is true and what's not? If I write 'The sky is green', will that mean that an LLM reading it will assign a very small probability to the sky indeed being green? Given that vast swathes of what is written on the internet is bollox on stilts, how can LLMs ever get a true picture of reality if so much of their input is nonsense?

Anonymous said...

Obviously China and HK are building a giant database of everyone in Western Europe and the USA; our comments and ND's writing will add to the information from our doorbells, mobile phones and Huawei routers. If their social credit database can encompass a billion Chinese, that's a similar-sized task.

Nick Drew said...

Anon@11:23 - that was my general assumption, too. But sometimes the www / cloud works in mysterious ways. For example, my understanding is that movies that are being streamed are "located" wherever in cyberspace it is optimal for the streaming that's taking place at that point in time. So, if (say) 9pm is peak streaming time for a particular movie, it'll be "located" optimally for 9pm streaming in Japan when it IS 9pm in Japan, and it'll have been "moved westwards" for 9pm streaming in Europe when it IS 9pm in Europe.

(I may not have explained that very well)

I'll make one other empirical comment: the hour-by-hour stats have been very flat - I'll publish another post with a graph. Yet the "locations" of the "readership" have been all over the place. This feels like a sophisticated operation.

formertory said...

Isn't that the problem? Dwell fleetingly on the thought of an LLM somewhere digesting the possibility that Mad Miliband is in fact the saviour of the human race, and be very afraid as it regurgitates it as (a version of) Truth in a future world. Orwellian to the max.