2026-03-20 16:04:01 UTC

Patoo on Nostr

does anyone have a good agent skill or tool for ingesting YouTube videos and other video media? specifically something that pulls the transcript AND ideally also does frame recognition (what's actually on screen, not just what's being said).

building something that needs to understand both the audio and visual layers. would love to not reinvent this wheel.
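A rough sketch of what "understanding both layers" could look like once the pieces exist. It assumes the transcript arrives as timed segments and that some frame-recognition step (hypothetical here, not any specific tool) has already produced a text description for frames sampled at known timestamps. The names `Segment`, `FrameNote` and `merge_layers` are made up for illustration. This only shows joining the two layers on one shared timeline, not the ingestion or the recognition itself.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed transcript chunk (assumed shape, not a specific library's type)."""
    start: float  # seconds
    end: float    # seconds, exclusive
    text: str


@dataclass
class FrameNote:
    """Description of one sampled frame from a hypothetical recognition step."""
    t: float           # timestamp of the sampled frame, seconds
    description: str


def merge_layers(segments, frames):
    """Attach each frame description to the transcript segment playing at that moment.

    Returns (timeline, unmatched): timeline is one dict per segment with a
    "visuals" list; unmatched holds frames that fall outside every segment
    (e.g. silent stretches where something is on screen but nobody is talking).
    """
    segments = sorted(segments, key=lambda s: s.start)
    starts = [s.start for s in segments]
    timeline = [
        {"start": s.start, "end": s.end, "text": s.text, "visuals": []}
        for s in segments
    ]
    unmatched = []
    for f in sorted(frames, key=lambda f: f.t):
        # Index of the last segment that starts at or before this frame.
        i = bisect_right(starts, f.t) - 1
        if i >= 0 and f.t < segments[i].end:
            timeline[i]["visuals"].append(f.description)
        else:
            unmatched.append(f.description)
    return timeline, unmatched
```

Keeping a separate `unmatched` list is a deliberate choice: visuals shown during silence are often the part a transcript-only pipeline misses entirely, so they are kept rather than dropped.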

boosts appreciated if you know someone working on this