At one point in my career I actually designed mission-critical, high-performance distributed server systems for a living, so I'm well aware of that.
You can still pack thousands of users per server and keep very low latency as long as you use the right architecture for it (mainly in-memory caching and load balancing), and that holds even when you're accessing gigantic datasets which far exceed the data space of a game. In a game, the actual shared data space is minuscule, since all clients already hold a local copy of most of it - i.e. the game level they're playing in. Even with the most insane anti-cheat logic that checks every piece of data coming in from the client side against a server-side copy of the "game level data space", that's still but a fraction of the shared data space in equivalent situations in the corporate world. On top of that, game data tends to be easily partitionable: even in an MMORPG with a single, fully open, massive playing space, players only affect limited areas of the entire game space, so you don't really need to check the actions of a player against the data of all other players - something like the toy sketch below.
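To illustrate that last point, here's a minimal sketch of grid-partitioned server-side validation - the cell size, the World class, the coordinates and the "hit" check are all made-up example values of mine, not anything from a real engine; the point is just that a player's action only needs to be checked against entities in the cells around them:

    # Toy sketch: partitioned server-side validation.
    # Cell size, max_range and the whole World class are assumptions for illustration.
    from collections import defaultdict

    CELL = 64  # assumed cell size in world units

    def cell_of(x, y):
        return (int(x // CELL), int(y // CELL))

    class World:
        def __init__(self):
            self.grid = defaultdict(set)   # cell -> set of entity ids
            self.positions = {}            # entity id -> (x, y)

        def place(self, eid, x, y):
            self.positions[eid] = (x, y)
            self.grid[cell_of(x, y)].add(eid)

        def nearby(self, x, y):
            # Only the 3x3 block of cells around (x, y), not the whole game space.
            cx, cy = cell_of(x, y)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    yield from self.grid[(cx + dx, cy + dy)]

        def validate_hit(self, attacker, target, max_range=48):
            # Server-side anti-cheat check touches only nearby state.
            ax, ay = self.positions[attacker]
            if target not in set(self.nearby(ax, ay)):
                return False  # target isn't even in the neighbourhood: reject
            tx, ty = self.positions[target]
            return (ax - tx) ** 2 + (ay - ty) ** 2 <= max_range ** 2

    w = World()
    w.place("p1", 10, 10)
    w.place("p2", 40, 40)
    print(w.validate_hit("p1", "p2"))  # True: same neighbourhood, within range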
Also keep in mind that all the static stuff (never-changing or slow-changing data like achievements or immutable level configuration) can still be served with "normal" latencies.
Further, the kind of Tier 1 ISP that provides network access for a company like Sony servicing millions of users already has more than good enough latency in its normal service, so Sony needn't pay extra for "low latency".
Anyways, you do make a good and valid point; it's just that IMHO that's the kind of thing that pushes the running cost per player-month from a few cents or less to, at most (and this is likely quite a large overestimation), a dollar per player-month - unless they only have tens of players per server, which would be insane, and they should fire their systems designers if that's the case.
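To put rough numbers on that - and both figures below are my own assumptions, not anybody's real costs - the arithmetic looks something like this:

    # Back-of-the-envelope only; both numbers are assumed, not real figures.
    server_cost_per_month = 500.0   # assumed monthly cost of one well-specced server
    players_per_server = 5_000      # assumed concurrent players that server can host

    print(server_cost_per_month / players_per_server)  # 0.1  -> ten cents per player-month
    print(server_cost_per_month / 50)                  # 10.0 -> cost only explodes at "tens of players per server"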
One of the first things they teach you in Experimental Physics is that you can’t derive a curve from just 2 data points.
You can just as easily fit an exponential growth curve to two points like that (one 20% above the other) as you can a sinusoidal curve, a linear one, an inverse-square-like curve (one that actually grows to a peak and then eventually goes down again), or any of the many curves where growth has ever-diminishing returns and can't go beyond a certain point (literally "with a limit").
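To make that concrete, here's a quick sketch - the two points and the ceiling L are made-up values of mine, one 20% above the other - showing that a linear, an exponential and a limit-bound curve all pass exactly through the same two points and then disagree badly once you extrapolate:

    import math

    # Two made-up "observations", the second 20% above the first.
    x0, y0 = 1.0, 1.0
    x1, y1 = 2.0, 1.2

    # Linear: y = a*x + b
    a = (y1 - y0) / (x1 - x0)
    b = y0 - a * x0

    # Exponential: y = A * exp(k*x)
    k = math.log(y1 / y0) / (x1 - x0)
    A = y0 / math.exp(k * x0)

    # Limit-bound ("saturating"): y = L - c*exp(-m*x), with an assumed ceiling L
    L = 1.25
    m = math.log((L - y0) / (L - y1)) / (x1 - x0)
    c = (L - y0) * math.exp(m * x0)

    curves = {
        "linear":      lambda x: a * x + b,
        "exponential": lambda x: A * math.exp(k * x),
        "saturating":  lambda x: L - c * math.exp(-m * x),
    }

    for name, f in curves.items():
        # Each curve reproduces both observed points, yet predicts a very different value at x = 5.
        print(name, round(f(x0), 3), round(f(x1), 3), round(f(5.0), 3))

All three fit the "before" and "after" perfectly; they only disagree about what happens next, which is exactly the information two data points cannot give you.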
I think the point that many are making is that LLM growth in precision is the latter kind of curve: growing, but ever more slowly, and tending to a limit which is much less than 100%. It might even be more like the inverse-square-like one (in that it might actually go down) if the output of LLM models ends up polluting the training sets of future models, which is a real risk.
You showing that there was some growth between two versions of GPT (so, two data points, a before and an after) doesn't disprove this hypothesis. It doesn't prove it either: as I said, two data points aren't enough to derive a curve.
If you do look at the past growth of precision for LLMs, whilst improvement is still happening, the rate of improvement has been going down, which does support the idea that there is a limit to how good they can get.