2024-07-30 21:10:00

Dustin on Nostr: If you have tried using LLMs to generate and execute code, you might appreciate this ...

If you have tried using LLMs to generate and execute code, you might appreciate this workshop paper we presented at the ICML workshop on LLMs and Cognition. The primary contributions are 1) a Case-Based Reasoning approach to reducing LLM failures via dynamic few-shot prompting and 2) seven failure types that can cause generated code to fail. These failure types are more detailed than those checked by most benchmarks that evaluate LLM code generation, and since we didn't have an automated way to check for all of them, we performed the evaluation by hand 😅

Paper: https://openreview.net/pdf/f2d10bfca1b7d9f6f0a87144fee8e775cba6701a.pdf

Author Public Key
npub1mgvwnpsqgrem7jfcwm7pdvdfz2h95mm04r23t8pau2uzxwsdnpgs0gpdjc