2026-04-29 17:41:02 UTC
Red Rozenglass on Nostr:

Thank you so much for this article. It was a really fun read, and I learned about csplit, which is much nicer than the awk state machines I usually write to handle multi-line, pattern-separated records. I wish csplit patterns could be multi-line, though; it doesn't seem possible. When a multi-line record has a fixed number of lines, I also use the paste command (e.g. paste - - -) to work around this. I found ptx fascinating, and I'm having lots of fun using it over all my writings and journals. Anyway, what follows are some random thoughts, comments, and ideas I had while reading the article :)
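As a small illustration of the two record-handling tricks mentioned above, here is a sketch on made-up data. The function names, the %% separator, and the rec. file prefix are assumptions chosen for the example, and the csplit options used (-s, -z, '{*}') are the GNU ones.

```shell
#!/bin/sh
# Fixed-size records: "paste - - -" joins every three input lines into one
# tab-separated row.
join_triples() {
    paste - - -
}

# Pattern-separated records: csplit starts a new output file at every line
# that matches the separator. "%%" and the "rec." prefix are example choices.
# $1 = input file, $2 = existing output directory
split_records() {
    csplit -s -z -f "$2/rec." "$1" '/^%%$/' '{*}'
}
```

For instance, printf 'a\nb\nc\nd\ne\nf\n' | join_triples yields two rows of three fields each.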

One aspect I would suggest looking into is the atomicity of mv, and how useful it can be for writing reliable scripts or for deploying new versions of running services without downtime. In particular, mv -T --exchange deployment-standby deployment-current (the --exchange option needs a recent GNU coreutils) swaps the two directories atomically. I think I will also be using it soon as part of a script to "garbage collect" millions of files, roughly something like:

1. mkdir store2

2. for each file we want to keep, we hard-link it with ln store1/filename store2/

3. when all the files we want are hard-linked, mv -T --exchange store2 store1

4. rm -r store2

This way, we "garbage collect" all the files that we don't have references to in our list, atomically, deleting them without the store itself being "offline" for reads at any point, and without using any extra disk space for copies of the files. If the garbage collection process is interrupted at any point, store1 remains as is without any change.
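The four steps above could be sketched roughly as follows. This is only a sketch under some assumptions: gc_store, the keep-list file (one file name per line, no newlines in names), and the staging directory name are hypothetical; the store is flat; and both mv and the filesystem must support --exchange.

```shell
#!/bin/sh
# gc_store STORE KEEP_LIST
# Keep only the files named in KEEP_LIST (one name per line) inside STORE.
gc_store() {
    store=$1
    keep_list=$2
    staging="${store}.gc.$$"   # hypothetical staging directory next to the store

    # Step 1: create the new, empty store.
    mkdir "$staging" || return 1

    # Step 2: hard-link every file we want to keep (no data is copied).
    while IFS= read -r name; do
        ln "$store/$name" "$staging/" || { rm -rf "$staging"; return 1; }
    done < "$keep_list"

    # Step 3: swap the two directories atomically. If this fails (for
    # example because mv lacks --exchange), the store is left untouched.
    mv -T --exchange "$staging" "$store" || { rm -rf "$staging"; return 1; }

    # Step 4: the staging name now holds the old contents; delete them.
    rm -r "$staging"
}
```

Usage would look like gc_store store1 keep.txt. Any failure before the swap only removes the staging directory, so store1 stays as it was.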

Additionally, nproc --ignore=N is useful for Make build scripts, passing its output to make -j, especially when the same makefiles are used on many computers with different numbers of cores. For example, make -j $(nproc --ignore=2) lets a build use as many cores as the machine has while leaving 2 of them free, so as not to overload the machine and degrade its service. Of course, nproc still returns at least 1 if the machine has only 1 or 2 cores :)
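A quick way to see both behaviours, as a sketch; the variable names are made up, and the make command is only echoed here rather than run.

```shell
#!/bin/sh
# All cores except two, but never fewer than one.
njobs=$(nproc --ignore=2)
echo "would run: make -j $njobs"

# Ignoring more cores than the machine has still bottoms out at 1.
floor=$(nproc --ignore=1000)
echo "floor: $floor"
```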

printf is a lot more flexible than people think. I use it for making separators for example, combined with tr, like this:

printf '+%78s+\n' | tr ' ' -

Or, to pad and surround text for example, with the help of xargs, like this:

cat my.txt | xargs -L1 -d '\n' printf " | %-52s|\n"

Here my.txt is pre-formatted text, hard-wrapped at 50 columns. (I usually use par for this instead of fmt, as it lets me justify text and is smarter about indentation and line prefixes of various kinds, but none of those CLI formatting tools seem to handle multi-byte UTF-8 text.)
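The separator and padding tricks can be combined into a small framing helper. This is a sketch, not the exact commands above: box is a hypothetical name, it uses a while read loop in place of xargs, and it assumes ASCII input no wider than 52 columns.

```shell
#!/bin/sh
# box: read lines on stdin and draw a 56-column frame around them.
box() {
    # Top border: 54 spaces from printf, turned into dashes by tr.
    printf '+%54s+\n' '' | tr ' ' '-'
    # Body: left-justify each line in a 52-column field.
    while IFS= read -r line; do
        printf '| %-52s |\n' "$line"
    done
    # Bottom border.
    printf '+%54s+\n' '' | tr ' ' '-'
}
```

Usage: box < my.txt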

seq is nice for doing loops with a specific number of iterations in a POSIX-y way: for i in $(seq 10); do ...; done loops 10 times. (seq itself is not specified by POSIX, but it is widely available.) The brace form {1..10} is not POSIX sh compliant.
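For comparison, here is the seq loop next to a counter loop that needs nothing beyond POSIX sh arithmetic. Both function names are made up for the example.

```shell
#!/bin/sh
# Counted loop using seq.
count_seq() {
    for i in $(seq "$1"); do
        echo "$i"
    done
}

# The same loop with only POSIX sh built-ins, for systems without seq.
count_posix() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}
```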

The shuf command is indeed fun. Try the following for some random words; it sometimes helps with brainstorming names and cool nonsense phrases:

shuf /usr/share/dict/words | head

I came up with those just now:

tight ambiguity
unsound lavender
despairing ravages
median ocean
delegate troll
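To get two-word phrases like the ones above directly, shuf -n can be paired with paste. This is a sketch: random_phrase is a hypothetical helper, and the default word list path is an assumption, since not every system ships /usr/share/dict/words.

```shell
#!/bin/sh
# random_phrase [WORDLIST]: print two random words on one line.
random_phrase() {
    shuf -n 2 "${1:-/usr/share/dict/words}" | paste -s -d ' ' -
}
```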

The sleep command is nice for reminders, if you have a notification daemon. On Slackware I put /usr/lib64/xfce4/notifyd/xfce4-notifyd& in my .xinitrc to use it with my tiling window manager:

sleep 5m && notify-send -u critical 'The Tea' "It's boiling!"

sort is nice with du -sh. When cleaning up some disk space, I use this shell function:

cdd () { cd "$1" && du -sh ./* | sort -h; }

Example:

cdd ~/Downloads/

120M ./palemoon
147M ./renpy-8.5.2-sdk.tar.bz2
173M ./slack-wallpapers-1.0.tar.gz
181M ./slack-wallpapers-deviantart-1.0.tar.gz

On a network with good bandwidth but very high packet loss, I used split combined with lftp parallel downloads over sftp to greatly speed up downloads of big files. Each new connection restarts TCP slow start, and so begins with a full initial window of about 15K of data. So, instead of the download speed grinding to a halt due to packet loss, the link was almost saturated. All it took was splitting the big file into 14K chunks and opening a new TCP connection for each chunk :)
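The chunking and reassembly halves of that trick can be sketched like this. The function names and the chunk. prefix are assumptions, and the transfer of each chunk over its own connection (lftp in the setup described) is left out.

```shell
#!/bin/sh
# chunk_file SRC DIR: cut SRC into 14K pieces named DIR/chunk.aaaa,
# DIR/chunk.aaab, and so on.
chunk_file() {
    split -b 14K -a 4 "$1" "$2/chunk."
}

# join_chunks DIR OUT: glue the pieces back together. The shell expands the
# glob in sorted order, which matches the order split wrote the pieces in.
join_chunks() {
    cat "$1"/chunk.* > "$2"
}
```

Comparing the rebuilt file with the original (for example with cmp) is a cheap check that no chunk was lost or reordered.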

I think your stdbuf example doesn't work because you used echo. Its output ends with a newline, and newlines flush the output (at least when the stream is line-buffered); a process exiting and closing its pipe or redirection also flushes, if I recall correctly. So you need a process that doesn't exit and doesn't write newlines, but keeps writing to the file. Apache logs may do that, but I've noticed it with other things too, like netstat in continuous mode. In such cases, you may find the last line in the log partially missing, because some chunks of it haven't been flushed yet. I've seen cases where newlines aren't used in the stream at all, so buffering was really annoying when trying to tail -f for debugging.
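One way to actually observe the buffering is to time how long the first line takes to get through a filter that stays alive. This is a sketch: first_line_delay is a hypothetical helper, cut stands in for any stdio-based filter, and date +%s is assumed to be available.

```shell
#!/bin/sh
# first_line_delay FILTER...: feed a slow two-line stream through FILTER and
# print how many whole seconds pass before the first line comes out.
first_line_delay() {
    start=$(date +%s)
    { printf 'a\n'; sleep 2; printf 'b\n'; } | "$@" | {
        IFS= read -r _line
        echo $(( $(date +%s) - start ))
    }
}
```

On a GNU system I would expect first_line_delay cut -c1 to print 2 or more, since cut holds the first line in its buffer until its input ends, and first_line_delay stdbuf -oL cut -c1 to print 0 or 1, since line buffering pushes each line out as soon as it is complete.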

Anyway, thanks again for this interesting article; a lot of effort went into this. Your site is now on my RSS feed :)