npub1gv…2sduy

2025-10-22 05:40:25 UTC

in reply to nevent1q…c8tq

The author's account of this story in Sections 1 and 7 of https://borisalexeev.com/pdf/erdos707.pdf makes for fascinating reading. Some takeaways from this story:

1. Both human and AI-powered literature reviews can still fail to turn up relevant hits. There is still room for improvement in this area.

2. The Lean formalization allowed for "instant refereeing": the Lean community at https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/Lean.20in.20the.20wild/with/546368325 were able to verify the proof within thirty minutes of the announcement.

3. There appears to be one very specific use case in which "vibe coding" can actually be used responsibly and effectively: generating a formal proof artefact for a statement that has already been formalized by human experts. But even then, there is the potential for human-generated error in the statement formalization.

4. This also appears to be one of the very few use cases where LLM output can be used responsibly in a research paper. Importantly, no LLM-generated output was directly placed into the main body of the text (other than when quoting an excerpt from the LLM-generated Lean code for illustrative purposes); instead, such output was only used in completely verifiable contexts (in this case, in generating code that can be type checked by Lean).
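
To illustrate the caveat in point 3, here is a minimal Lean 4 sketch of how a statement can type check and be proved while still not saying what was intended. These are hypothetical toy examples using only core `Nat` lemmas; they are not taken from the paper or from its Lean code.

```lean
-- Hypothetical toy examples, not taken from the paper or its Lean code.

-- Subtraction on Nat truncates at zero, so this type checks and is true,
-- even though a reader thinking of integer subtraction would expect -1.
example : (2 : Nat) - 3 = 0 := rfl

-- A statement with an unsatisfiable hypothesis is vacuously provable.
-- Lean accepts the proof, but the theorem says nothing about any actual n.
example (n : Nat) (h : n < 0) : n = 42 :=
  absurd h (Nat.not_lt_zero n)

-- The intended cancellation law needs the side condition b ≤ a stated explicitly.
example (a b : Nat) (h : b ≤ a) : a - b + b = a :=
  Nat.sub_add_cancel h
```

In each case Lean certifies only that the proof matches the statement as written. Whether the statement matches the informal mathematics is still a question for human review.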

Given recent experience, I can imagine that there will be some breathless reports that "LLMs actually solved an Erdős problem for real this time!". The truth, however, is extremely nuanced, and a detailed study of the situation is needed before jumping to any conclusions.

Author Public Key

npub1gvea966levf836g93xsdnkde8mmz58qyznf9c82jg02hq35f46lsn2sduy
