Most Useless use of grep?
The Useless Use of Cat Award has been around for decades and is given out for shell constructs that invoke cat unnecessarily, as in:
cat foo.txt | wc -l
Where something like this:
< foo.txt wc -l
Gets the same job done with one less fork and exec, and slightly less typing. There are plenty of arguments people give as to why the useless use of cat here isn’t actually useless. The most compelling of these, to me, is that in the latter construct, accidentally typing a greater-than instead of the intended less-than will overwrite and clobber the file you are trying to read from. Oops.
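The difference between reading the file and destroying it is a single character:
< foo.txt wc -l    # reads foo.txt
> foo.txt wc -l    # truncates foo.txt to zero bytes before wc even runs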
However, you often see the useless cat even when the alternative does not require a file redirection, as in this one I’ve caught myself using before:
cat foo.txt | grep "foo"
Which could be better written as simply:
grep "foo" foo.txt
One reason people (including myself) sometimes use this construct is that it mirrors the thought process that goes into constructing the sequence of pipes: “I want the data to flow from here, then to here, then to here”. You could first write just the cat command to see all the data, then add more to the pipe sequence to further modify the data, and repeat until you get what you want. If you have a command sequence and you want to send a file through it, just throw a cat in front, just like if you have a command sequence but only want to output a subset of the lines currently being output, you just throw a grep on the end.
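A pipeline of mine might grow something like this (the log file and fields here are made up just for illustration):
cat access.log
cat access.log | grep " 404 "
cat access.log | grep " 404 " | awk '{print $1}'
cat access.log | grep " 404 " | awk '{print $1}' | sort | uniq -c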
By this same logic, when a command line I’ve been working on returns the desired results, one per line, but what I really want is just a count of how many results there are, I’ll just append a quick | wc -l to the end. Of course, if the last command in the pipeline was a grep, it would be better to just add a -c to the grep invocation for the same effect, but the agglomerative nature of writing a command line sometimes pulls toward the less efficient solution, which just involves taking a working pipeline and adding one more step, rather than thinking about how to change an existing step to get the same effect.
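Both forms produce the same count; for example, with a hypothetical some_command:
some_command | grep "foo" | wc -l
some_command | grep -c "foo"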
Writing a command that returns more results than desired and then later paring them down with grep is a pretty common event. Sometimes this is necessary, but plenty of times one of the previous commands in the pipeline could have easily performed the filtering itself.
I’ve many times ended up quickly writing a command like:
find . | grep foo
Instead of the probably much more efficient:
find . -name "*foo*"
But the most useless use of grep is in pipelines where one command is intentionally configured to include certain information, only to have that same information removed further down the pipeline. By default, ps only shows processes for the current user. To view all processes for all users, and tag each with the username, you can use ps aux. I’ve caught myself several times writing this construct:
ps aux | grep <my user name>
Which I think must be the most useless use of grep
ever conceived.
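Since ps already limits itself to your own processes by default, something like this gets roughly the same list with no grep at all (BSD-style flags, which both FreeBSD and procps ps accept), and without matching the grep process itself:
ps ux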
Two Strange Bugs
Someone recently posted a link on an online forum to an article on the GCC Wiki entitled Reasons you should NOT Use Inline Assembly. The gist of the article is given in the below quote:
Even ‘minor’ mistakes like failing to clobber memory or earlyclobber a register can result in problems that may not show up until well after the asm has been executed. And examining the asm output during a compile (-S) doesn’t guarantee that the inline asm was written correctly either, since small and (seemingly) unrelated changes in surrounding code might someday cause the compiler to resolve the constraints differently.
This reminded me of a time when I was bitten by this exact issue, and was thoroughly confused until I realized what happened. This was back in my college days in the mid 1990’s and I was working on some sort of game for DOS in Borland C++. I had a bit of inline assembly in there, probably for something like setting the graphics mode to Mode X, but that code had already been working for a while, so my mind didn’t immediately jump to there. As I recall, I was having some unexplained crashes in some seemingly benign C code, located well after any calls to inline assembly.
My normal debugging method at the time, of commenting out parts of the code until it stopped crashing, wasn’t converging on a solution, and after a while I discovered that putting a comment, any comment, at a certain location in the code would stop the crash and everything would work fine. My first instinct was to blame the compiler, so I checked what differences there were in the generated assembly with and without the comment. I quickly found that the register being used to store a certain pointer value was different between the two cases, which eventually led me to the true source of the issue. Earlier on in the code, my inline assembly function was called, and I was clobbering a register I shouldn’t have been. Borland C++ had documented rules about which registers a function could clobber and which it needed to save and restore, and I was failing to save one that I should have. The compiler assumed that it knew what value was in a particular register at the time of the crash, only to be surprised that my assembly code had clobbered it.
It’s an obvious bug in retrospect, but it threw me for a loop at the time, until I realized the problem, because my mental model of how C++ worked didn’t allow for “Adding a comment can affect the program’s behavior.”
–
Another bug I encountered around the same time was a “random crash” when writing my operating system. In 1996, I decided I wanted to write my own 32-bit Operating System in i386 assembly. It was pretty minimal, but it could boot up, enable protected mode, and run simple programs and multitask between them. To make development easier, I made it possible to run as a DOS extender as well as a standalone OS. In this mode, you could run the OS Loader as a DOS Executable, and it would allocate a block of XMS Memory and the OS would run entirely within that block of memory, leaving conventional memory untouched. Then when shut down, it would drop back to real mode and return to DOS. This saved me a lot of rebooting time during development, since the OS was never self-hosting, and development was done with an MS-DOS editor and TASM.
One day while working on my kernel I started experiencing random freezes in my OS, requiring a reboot. This wasn’t anything odd, I’m sure I rebooted that poor 486 I was working on a million times while working on that project, but this crash didn’t seem to be triggered by any new code I’d added or any specific thing going on, it was just… random. I’d have my kernel booted up, sometimes for a few seconds, sometimes for several minutes, and it would just hard lock up. Worse yet, and possibly a clue, this would even happen sometimes after I’d returned to DOS– I’d boot up DOS, enter my OS, quit my OS, and then at some point later the computer would hang. I spent quite a while one evening trying to track this down. Eventually, by luck, I had rebooted my PC after a crash and was reviewing my code when the PC crashed yet again. But wait a minute, I’d never even run my OS since the last reboot. Turns out all this time the ‘bug’ I’d been unsuccessfully trying to track down was a CPU Fan that had stopped spinning. My computer was periodically overheating, but since I was working on an OS Kernel at the time, I was totally in the mindset of assuming it was something I was doing in code that was causing the problem.
These are two of my go-to examples when talking about “difficult” or “random” or “amusing” bugs with other programmers. In both cases, the bug was something I hadn’t originally considered and was happening in a way that made normal debugging difficult. Triangulating to a certain line in the code just wasn’t working, because the bug wasn’t in the particular line of code that was crashing; it was somewhere else in the program entirely, or in the second case, not in the code at all.
Gemini, Chromium, and XDG, Oh My
I found Vermaden’s series of blog posts on setting up a FreeBSD Desktop very useful when I was first working on getting FreeBSD installed and usable on my Thinkpad.
The most recent installment, posted a few days ago about configuring and using XDG, Part 24 - Configuration - Universal File Opener, reminded me of my own struggles with XDG and xdg-open a few months ago.
I don’t make a lot of use of this sort of feature on my desktop. My natural usage patterns tend to have me think in terms of “I’ll open wireshark (either from a command line or dmenu), and then load a PCAP”, as opposed to “I’ll click on this PCAP and open it”, so I’ve never worried much about how the linkages between file types and the programs that open them are made, or how to configure them.
However, a few months ago I ran into an annoying issue that I wanted to fix. I’ve been using Lagrange, a very nice looking graphical browser for Gemini. I found that when browsing a Gemini site in Lagrange, if I ran across a hyperlink to an http site, I could click it in Lagrange and the link would open in Chromium. However, if I ran across a gemini:// link in Chromium, clicking on it would only give me an xdg-open error dialog, and I would have to copy the link, open Lagrange, and paste it in manually, and who has time for that?
Since Chromium was calling xdg-open to try to figure out what to do with the link, I figured a brief foray into the xdg-open documentation would tell me how to configure it to understand what to do with gemini:// links. However, I came up empty. There may have been a simple way to do this that I just missed, but everything I found with xdg-open was for deciding what program to run based on MIME types and extensions. I couldn’t find a way to say: “match ^gemini://” and open this program. So I decided to hack it.
My “solution”, kluge, workaround, whatever, was to make a shell script called ‘xdg-open’ and put it in my ~/bin folder, which appears early in my PATH. This way, whenever Chromium wants to open a link, this script gets called and I can do whatever I want as far as pattern matching and spawning applications. Then if my script doesn’t want to do anything for a particular link, it can just call the real xdg-open for the normal behavior. It’s currently a very short script that looks like this:
#!/bin/sh
case "$1" in
    gemini:*)
        exec lagrange "$@"
        ;;
    *)
        exec /usr/local/bin/xdg-open "$@"
        ;;
esac
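Putting it into effect is just a matter of making the script executable and confirming that the copy in ~/bin really is the one found first in PATH (the second command should print the wrapper’s path, not /usr/local/bin/xdg-open):
chmod +x ~/bin/xdg-open
command -v xdg-open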
Now I have a simple way to control what application Chromium will spawn for a given link without needing to muck around with xdg-open configuration. I much prefer having a simple script where I can fully customize the behavior for cases like this to having to fight against other software to configure it to do what I want.
Git for Local Backups
At ${WORK} we use Perforce for version control where the central repository lives on a server that is backed up to multiple places. Our home directories on the various servers we use to develop and build on are also backed up. There is no backup system in place, however, for my desktop, especially now that I’m working full time from home. The assumption is that all important work is done on the servers and nothing important would be lost if a desktop needed to be replaced.
This is mostly true, but I do have some files, such as my dot files, scripts I write to help implement my desktop environment, and notes taken in emacs-org. These are stored on my desktop workstation, and I would not care to lose them in the event of a drive failure. I also have some non-work-related files, such as posts for this blog, and other media and files I’ve created.
To back these up, I set up a QNAP NAS and configured a backup share to be mounted on my desktop. For some things, such as my music files, I just have a static backup directory and copy them over manually, as they rarely change. However, for other directories where I create and modify files a lot, I didn’t want to have to worry about forgetting to back up, and I also wanted the ability to go back to or see an older version of a document I’d been working on.
So, I set up a few git repositories on my desktop, one in my ~/scripts directory, one in my ~/emacs-org directory, etc, and have begun keeping these directories under version control. I also created bare repositories on the NAS and push to them as a backup.
This works well, but rather than having to push manually after committing, I wanted a post-commit hook that would automatically push to the backup on each commit. While this could be configured by hand easily enough, I had multiple repositories and wanted to be able to easily create new ones, so I went looking for a script.
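For a single repository, the manual setup is only a few commands; here is a rough sketch for a repo with no existing post-commit hook, with the NAS path as a placeholder:
git init --bare /path/to/nas/git/my_repo_backup
cd ~/my_repo
git remote add --mirror=push backup /path/to/nas/git/my_repo_backup
printf '#!/bin/sh\ngit push backup\n' > .git/hooks/post-commit
chmod +x .git/hooks/post-commit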
I found a Helpful Blog Post which contained a quick and dirty script to create the bare repository and set up the remote and commit hooks in the repository to be backed up.
It worked, but it was a little too ‘quick and dirty’: there was no error handling, so if something went wrong it could continue running commands in unexpected places; it didn’t check whether the required parameters were passed in, or whether directories expected to exist actually did; it failed in a confusing way if passed a relative path for the repo to be backed up; and it would silently overwrite any existing post-commit hook already defined in your repo when it added the backup hook. (All things I ran into when trying it out.)
I made some changes to the script to address these issues:
#!/usr/local/bin/bash
# Add a Post Commit Hook to a git repository to backup to another bare repository
# on every commit. Will create the bare repository if it does not exist.
#
# Usage: ./git_backup.sh -r <repo> -b <backup>
# eg. ./git_backup.sh -r ~/my_repo -b /media/nas/git/my_repo_backup
#
# Originally from https://medium.com/@fitzgeraldpk/git-dont-push-to-backup-698459ae02f2
#get the command line parameters
die() { echo "!! $*" 1>&2 ; exit 1; }
while getopts r:b: option
do
    case "${option}" in
        r) repo=${OPTARG};;
        b) backup=${OPTARG};;
    esac
done

if [ -z "$repo" ] || [ -z "$backup" ]; then
    echo "Usage: git_backup.sh -r <repo> -b <backup location>"
    exit 1
fi

if [ ! -d "$repo" ]; then
    echo "Repo [$repo] does not exist"
    exit 1
fi

mkdir -p "$backup" || die "Failed to make directory $backup"
echo "* Creating $backup"
cd "$backup" || die "Failed to cd to $backup"
git init --bare || die "git init failed in $backup"
cd - > /dev/null
cd "$repo" || die "Failed to cd to $repo"
echo "* Adding remote"
git remote add --mirror=push backup "$backup" || die "Failed to add remote to $repo"
git remote -v
echo "* Adding New post-commit hook"
if [ ! -f .git/hooks/post-commit ]; then
    echo '#!/bin/sh' > .git/hooks/post-commit || die "Failed to create post-commit hook file"
fi
echo " " >> .git/hooks/post-commit
echo " git push backup" >> .git/hooks/post-commit || die "Failed to add post-commit hook"
chmod a+x .git/hooks/post-commit || die "Failed to make post-commit hook file executable"
Putting this up in case someone else finds it useful.
My Favorite Support Call
I was sitting at my desk one afternoon, about fifteen years ago, when the phone rang. This by itself was an odd occurrence. I was a programmer after all, who wanted to talk to me? Even more oddly, the caller was a receptionist from the main building down the street wanting to transfer a customer support call to me.
Not that I never handled customer support issues– I was lead programmer on a module which our system used to connect to a myriad of third party systems, and there were often integration issues that needed to be sorted out when a new site went live. We would test against a manufacturer’s 2.04 software and 2.05 software, only to find out on site that there was an undocumented 2.04a release that only ever went out to a single customer, with undocumented protocol differences, and guess who just bought our system? However, usually by the time an issue got to me it was distilled down into a ticket: “Error getting data from Foobar Baz 2.3 system”. It was pretty rare for me to talk directly to the boots on the ground.
Our system was a big box that connected to other even bigger boxes via thick cables with a lot of pins and spoke to them using various protocols, some standard, some proprietary, some undocumented and reverse engineered, and this was why this particular customer was calling.
Upon taking the call, I was greeted by a friendly and polite, but clearly exasperated, caller. His path in getting to me clearly explained his exasperation. He was a technician working for our customer and was assigned to install the device my company made into the data center at a secure government location. He had run into trouble getting our device talking to their systems and reached out to the reseller’s technical support team.
After running through some troubleshooting steps, escalating within the reseller, and then running through them all again with a second-level technician, he’d been shunted over to the manufacturer’s (my company’s) technical support team to troubleshoot a third time. After reaching the depths of our technical support, he ended up with our QA team, who tried to figure out whether what he was experiencing was a bug in our system or not, and finally, when that came to naught, he was redirected to me as the lead developer for the module in question.
It quickly became clear why we were having so much trouble diagnosing the problem, and why no one could give me an actual case with any sort of details to work on. Our normal troubleshooting steps would involve connecting a modem to a phone line and having our device on site dial home, where we could log in to the underlying OS to investigate issues. We also had a myriad of other methods for remotely accessing our devices for troubleshooting, but at this site, being a secure government location, they were all off the table. There wasn’t much of a tool chain in place for troubleshooting without remote access, just a spartan front panel with some readings, a tired technician and me. Everyone else had already failed, so now it was my turn.
The thick cable that connects our device to the device it talks to plugs into a large multi-pin connector on the back of our device, and the diagnostic readout can show the voltage level on each pin once per second. Not a ton to go on when the values can fluctuate thousands of times per second, but that’s what I had him read to me.
“All Zeros”, he assured me.
“No readings at all?”, I replied, “Are you sure the cable is plugged i…”
He cut me off before I could finish. “Listen, I’ve been on the phone for five hours. I’ve talked to six different people and all of them wanted me to make sure the cable is plugged in. I assure you it is. I’m not an idiot, I’m looking at the back of your unit now, and I can see that the cable is definitely plugged in. I can see the cable and the connector. Both are in my field of view right now, and they are definitely plugged together. I’ve also already tried unplugging and replugging the cable, and cleaning the connector. It’s not the cable. The cable is great, it’s brand new. Can we just assume I’m not an idiot who forgot to plug this thing in, and troubleshoot the actual issue? What’s the next step?”
It seemed reasonable; the problem was, without remote access I didn’t really have a next step, and all those pins reading zero volts almost certainly pointed to a cable issue. I wasn’t sure if I should go there, since he already seemed kind of touchy, but I didn’t have any other ideas, so I opened my mouth:
“Is the other end of the cable plugged in?”
This was met with a moment of silence as he put down the phone to check, and quickly came back on the line, obviously embarrassed.
“Ok, thank you for your help, I think I can handle this from here. click” He was in quite a rush to get off the phone.
Goes to show that support technicians can check if a cable is plugged in, but you need the lead developer to remember that cables have two ends. I guess that’s why we earn the big money.
All kidding and self-aggrandizing aside, this was my favorite support call I’ve ever been on: all the build-up to the punch line, all the techs who went down the same train of thought but didn’t quite take it to its logical conclusion, the sudden denouement. It felt like being in a comedy skit.