For the most part, it’s easy to write Bash scripts. What’s really tricky is writing good Bash scripts.
What do we mean by this? In short, we mean writing “clean code”. Clean code is:
- Easy for someone to pick up and understand
- Expects the unexpected
There will always be situations where you write a script for a one-time only task. In these situations, people tend to cut corners and become more flexible with making a script scalable or reusable. It’s tempting, it is, we’ve all been there. But…inevitably, you’ll almost always need to do the same or similar task unexpectedly in the future. It takes time to write good code in all situations, but, I promise, it’s always worth it in the long run!
Here we’ve put together a simple list of 12 good practices which you can try to follow. This is by no means an exhaustive list and I encourage you to read more widely and continually assess and develop your scripting practices.
- Plan ahead
Most of your scripting headaches will be solved by planning ahead. Think about what you want your script to do and expect the unexpected. This means not only thinking about the “happy path” where everything proceeds as you expect it to, but also the exceptions. For example, what should it do if the file its processing is empty or doesn’t exist?
2. Build your script in small steps
Depending on the complexity of the task you’re trying to accomplish, your script may be very small or somewhat larger. Now, even for the most experienced of script writers, there can be a typo or other error in their first attempt. To avoid this, no matter the scale of your script, build it up in small stages and test as you go.
3. Scale up slowly
We don’t just need to consider building the process up step by step, but also the size of the data the script is handling. A simple rule of thumb is to get your script working for a single task first. Then, build up slowly until you reach the full scale of your dataset i.e. handle one file, then 10, then 100… This allows you to predict the resources you need to process the full dataset and get a rough idea of how long this process will take.
4. Comment, comment, comment
OK, I could start by saying that you can never have too many comments in a script. But, that’s not true, at some point you won’t be able to see the wood for the trees and your script will become unnecessarily bloated. There is no hard and fast rule on how many comments you should have in your scripts, it just comes down to common sense. Simply, make sure that you have enough information so that if someone, usually you, picks your script up in 6 months that they know what it does and have a rough idea of how it does it. This isn’t just true for shell scripting, but for almost every other type of programmatic scripting you may encounter.
5. Don’t prolong the life of the script unnecessarily
This is called script longevity and links back to expecting the unexpected. Earlier in the week we looked at using the set command to handle failures and errors. A script should never fall over quietly. Trigger an exit signal when the unexpected happens. This means that a script should exit on the first error and not blithely continue running unnecessarily or fall over without you realising it.
6. Keep on top of variable management
There are a lot of things to consider for variable management. First up is syntax. I like to use upper case for environment variables and lower case for local variables.
Depending on your preference you can use underscores or camel case. I prefer underscores as it keeps things consistent between variable types. But that’s up to you.
Variable names should be meaningful – i.e. you should know straight away what they are referring to. And, as mentioned earlier in the week, use double quotes and curly braces to avoid issues with whitespace and wildcards in the variable value.
7. Prevent code bloat by using functions
Quite often you will want to repeat the same process multiple times within a script. We could just copy and paste the code, amending it to our needs. This is inefficient and will make for a longer script. Instead, we can wrap the code in a function so that the code we want to run is only located once in our script and is referenced using a function call anywhere it is needed. Like variables, functions should be meaningfully named. They should be small, only performing a limited, clear task.
8. Don’t duplicate scripts
It can be very tempting to put paths to data directly into your scripts. Then, when you come to use the script on the new dataset, copy the script, save it under a new name and update the dataset location to point at the new data. Please don’t do this. It’s worth investing the time in making your scripts reusable and scalable by using arguments and avoiding hard coded paths.
9. Keep debugging simple
Scripts will fail, it’s inevitable. We’ve already mentioned expecting the unexpected, but what should you do when the unexpected actually happens? How do you know at what point your script failed? This is where logging comes in – print everything your script is doing back to the system. That way, when it fails, you know at which point it stopped working and will have a much easier time debugging the code.
10. Clean up after yourself
This is simple. If you generate intermediate or temporary files, make sure that you remove them when you’ve finished with them. This should be built into the script itself and not done an afterthought once it’s run.
11. Make your code easy to read
Digestible code is always easier to work with and maintain. To quote Martin Fowler: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”. To help, you can use linters to look over your scripts and give feedback on their readability/formatting. One of the widely used online tools for Bash script linting is https://www.shellcheck.net/.
12. Don’t walk away from new scripts
It’s oh so tempting to hit “go”, set your nice, shiny new script running…then go off and make a cup of tea. Don’t. Sit back down and make sure it runs OK for the first couple of times. Why the first couple and not just the first time I hear you ask? Because, it will inevitably be the third or fourth time you run the script that it will fail in a spectacularly dramatic fashion. I like a cup of tea as much as the next person but, please, make it before you start running your script!
Most of important of all, don’t be afraid to ask for help!
© Wellcome Genome Campus Advanced Courses and Scientific Conferences