Jürgen on Nostr:
nprofile1qyt8wumn8ghj7un9d3shjtnyd968gmewwp6kytcqyqyueeg0vp6msfkqwlwe46q5z2lyfxmqhx8f72ev8qsrlngfnk6h5r4nkla (nprofile…nkla) See abstract: “This gap between benchmark performance and practical utility raises critical questions about LLMs' readiness for production code assistance, particularly regarding their ability to generalize across familiar and novel codebases. We introduce a benchmark derived from real-world open-source repositories, comprising classes divided into seen and unseen partitions to evaluate generalization under practical condition”