Archive for March, 2013

Some whitespace pitfalls in Bash programming

March 11, 2013 1 comment

Bash is a Unix shell, a command interpreter. It is a layer between the system function calls and the user that provides its own syntax for executing commands. It can work in interactive mode or in batch mode (i.e., via scripts). In interactive mode the Bash prompt, that usually ends with the $ symbol, tells you that Bash is ready and waiting for your commands.

Because the Bash doesn’t have as many features as other programming languages like Java or Python many people suffer from the misconception that it is a sort of simple tool and that takes only a few hours to get proficient with it. Wrong approach. As it happens with a `decent` programming language, if one doesn’t pay attention and learns it properly she will see lots of unexpected errors when running her scripts.

An endless source of errors in Bash programming is the management of whitespaces. In this blog entry we will collect some whitespace pitfalls that usually hit Bash beginners and also developers that only write Bash scripts from time to time (like me).

Commands and arguments

Bash reads commands from its input (a file, a terminal emulator…). In general, each read line is treated as a command, i.e., a instruction to be carried out. Bash divides each line into words at each whitespace (spaces or tabs). The firs word it finds is the name of the command to be executed and the remaining words become arguments to that command.

The above paragraph is IMHO the golden rule that one should never forget if she wants to use Bash in a painless way. It seems trivial (and in fact, it is) but most programming languages are not so strict when using whitespaces so, if someone regularly develops in Perl, Java, C…, it is likely that she tends to use the whitespace in a wrong way when writing Bash scripts. For instance, in Python one can setup a variable as follows:

++++my_var = 4

In fact, it is recommended to delimit the operator with whitespaces for improving readability. However, doing the same thing in Bash will raise an error:

++++$ my_var = 4
++++my_var: command not found

because Bash interprets the above line as: execute the my_var command with arguments = and 4. But the my_var command doesn’t exist so we get an error. The proper way to do the assignment is:

++++$ my_var=4

Parameter expansion

In order to access the data stored in a variable, Bash uses parameter expansion i.e., the replacement of a variable by its value (on expansion time the parameter or its value can be modified in a number of ways but we will only use simple replacements here). The syntax for telling Bash that we want to do a parameter expansion is to prepend the variable name with a $ symbol. Although not always necessary, it is recommended to put curly braces around the variable name:

++++$ echo "my_var = ${my_var}"
++++my_var = 4

Non double-quoted results of expansions are subject to word splitting. Double-quoted results of expansions are not. An because after the replacement Bash can still execute actions on the result (due to the golden rule mentioned above) we have just met a pervasive mistake involving whitespaces: the wrong quotation of results of parameter expansions. For instance, if we want to create an empty file named My favourite quotations.txt then we shouldn’t issue the following command:

++++$ filename="My favourite quotations.txt"
++++$ touch ${filename}

the result of the parameter expansion is not double-quoted so word splitting happens and we execute the touch command with three arguments, My, favourite and quotation.txt. Instead we should double-quote the result of the expansion for avoiding the word splitting after the parameter expansion:

++++$ touch "${filename}"

The above example is trivial and harmless but the same considerations apply to potentially dangerous commands like rm. If we forget the double quotes we may end up removing the wrong files (which can be annoying if we don’t have a backup of those files). When a variable contains whitespaces we have to decide if its expansion needs double quotation or not. The general advice is to put double quotes around every parameter expansion; bugs caused by unquoted parameter expansions are harder to debug than those due to quoted expansions.

Another source of problems related with parameter expansions and whitespaces is the use of metacharacters because unquoted expansions are also subject to globbing. Globbing happens after word splitting. For instance, if in our current directory we have a TODO.txt and a README.txt files, issuing the following similar commands will produce very different results:

++++$ msg="There are *.txt files in this directory"
++++$ echo $msg
++++There are README.txt TODO.txt files in this directory
++++$echo "$msg"
++++There are *.txt files in this directory

Testing conditions

Another common misuse of whitespaces hits many people when they call the [ builtin or the [[ keyword for testing some expression (you can found a nice explanation about the types of Bash commands here). It seems to be very easy to forget that both of them are commands whose arguments are an expression and a closing ] (or ]]). So one can write the following for checking if the file TODO.txt in current directory is indeed a file:

++++$ [-f TODO.txt]
++++[-f: command not found
++++$ [ -f TODO.txt]
++++-bash: [: missing `]'
++++$ [ -f TODO.txt ] && echo OK

In the first case we are trying to execute a non-existing command because the [ is not followed with a whitespace. In the second case the last argument passed is not a ] because the ] is not preceded with a whitespace. The third case is fine.

In addition, [ and [[ manage whitespaces in a different way. Parameter expansions inside a [ are subject to word splitting and globbing so quoting those expansions can be critical. However [[ is a keyword and it parses its arguments and does the expansion itself, taking the result as a single argument even if the result contains whitespaces. So if we want to check the content of:

++++$ my_var="I like quotes"

we can do:

++++$ [ "$my_var" = "I like quotes" ] && echo "Me too!"
++++Me too!
++++$ [[ $my_var = "I like quotes" ]] && echo "Me too!"
++++Me too!

[[ has more nice features (pattern matching, support of control operators for combining expressions) that make it more powerful than [ but those features are not directly related with whitespaces so they will not be considered here.

Also notice that many beginners tend to think that the if keyword must be followed by a [ or a [[ just as if must be followed by ( in C and other programming languages. It is not the case in Bash. if must be followed by one or more commands. Those commands can be [ or [[ but they don’t have to.

Grouping commands

The last example of a common mistake related to whitespaces is the use of the one line version of grouping commands. The proper syntax is::

++++{ <LIST>; }

So we have to pay attention to the whitespaces between the curly braces and the list of commands and also to the semicolon after the list of commands or we can get errors like these:

++++$ {rm TODO.txt || echo "Unable to delete file" >&2; }
++++-bash: syntax error near unexpected token `}'
++++$ { rm TODO.txt || echo "Unable to delete file" >&2 }
++++> ^C

In the first case the closing brace is unexpected because there is no opening brace (instead we have issued a non existing {rm command). In the second case there is no semicolon after the last command. As a consequence Bash thinks that the closing } is an argument of the last command and waits for the user to close the group of commands. We have aborted the unfinished command and come back to the default interactive prompt by pressing Ctrl-C.

Further reading

A couple of great sources of information about Bash are: