Maybe AGI will be just one big NN architecture in the end…
https://blog.normalcomputing.ai/posts/2023-09-12-supersizing-transformers/supersizing-transformers.html
I’m still curious what the limitations of the memory cache are here, and whether you can keep adding to it without having to retrain the model when the cache changes.
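My intuition for why appending might not require retraining: in these kNN-style memory approaches the cache is just stored key/value vectors with no learned parameters of its own, so growing it doesn't change any model weights. A minimal sketch of that idea (hypothetical names, not the blog post's actual implementation):

```python
import numpy as np

class KVMemory:
    """External key/value cache. Appending entries adds no trainable
    parameters, so the model itself would not need retraining
    (illustrative sketch only)."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def add(self, k, v):
        # Append new (key, value) pairs; the cache grows freely.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

    def topk(self, query, k=2):
        # Retrieve the k nearest memories by dot-product similarity,
        # as in kNN-augmented attention.
        scores = self.keys @ query
        idx = np.argsort(scores)[::-1][:k]
        return self.values[idx], scores[idx]

mem = KVMemory(dim=4)
mem.add(np.eye(4), np.arange(16.0).reshape(4, 4))  # four memories
vals, scores = mem.topk(np.array([1.0, 0.0, 0.0, 0.0]), k=2)
```

The open question would then be whether retrieval quality degrades as the cache grows very large, not whether the weights still fit the cache.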
