Horrible, Repetitive Unix Jobs
One of the challenges facing the campus as we switch from UA1VM to Bama is the smooth migration of information between machines. Web pages, especially, can prove burdensome because they refer to the location of data by machine name. In fact, a machine name may be embedded in a set of files hundreds of times. Editing many files by hand, looking for every instance of a change to be made, is prone to mistakes. Thankfully, Unix has tools that make these types of repetitive tasks very easy to do. This is just what shell scripting was meant to do: automate the tedious jobs.
Suppose you have moved a directory full of ".html" files that refer to UA1VM over to Bama, and you now wish to replace every instance of the word "UA1VM" in every file with "bama" to update the URLs. You need a script that will get a list of all the relevant files and go through them to make the replacement.
The editor we will use is called "sed", for stream editor, meaning it runs through a file line by line, following the editing instructions it is given at the start. We will give it a filename, but it writes the edited text to STDOUT, so we'll have to catch that and redirect it to another file (see Tipsheet Vol. 1, No. 4 for more information on redirects). So, if we have a file called "original.html" and we want the edited file to be called "newfile.html" we could run sed with
sed -e "s+UA1VM+bama+g" original.html > newfile.html
The string "s+UA1VM+bama+g" gives the editing instructions. It says to substitute ("s") "bama" for "UA1VM" and to do it globally ("g"), meaning every instance on a given line, not just the first. The "+" separates the parts of the substitution and is arbitrarily chosen; almost any character that does not appear in the strings themselves would do.
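As a quick check of the "g" flag and the delimiter, the following sketch pipes one line through sed three ways. The sample line is made up for this illustration; only the sed instructions come from the text above.

```shell
# Sample line (made up for this illustration) with two occurrences of UA1VM.
line='See UA1VM for details; UA1VM is retiring.'

# Without "g": only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's+UA1VM+bama+')

# With "g": every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's+UA1VM+bama+g')

# The delimiter is arbitrary: "/" gives the same result as "+".
slash=$(printf '%s\n' "$line" | sed -e 's/UA1VM/bama/g')

printf '%s\n' "$first_only" "$all" "$slash"
```

The first line printed still contains one "UA1VM"; the other two lines are identical to each other.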
Now this has to be put into a loop which picks up all the ".html" files in a directory. It would also be a good idea if, in the loop, all the original ".html" files got saved (in case of problems) under a name such as ".html_old". The entire script to do the loop, run sed, and rename the files would look like this:
#!/bin/ksh
files=$(ls *.html)
for filename in $files; do
    echo processing $filename
    sed -e "s+UA1VM+bama+g" $filename > tmp
    mv $filename $filename"_old"
    mv tmp $filename
done
The lines in this script have the following actions:
- tell the computer to use the ksh to run this script
- get a list of all the ".html" files in the current directory
- start the loop to go through the list of files
- print out a line to tell you where it is in the list
- run the "sed" command and save the results in a file called "tmp"
- rename the original file (".html" to ".html_old")
- rename "tmp" to the original file name
- end the loop
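The same steps can also be sketched more defensively. The variant below is an illustration under a few assumptions, not the script from this tipsheet: it loops over the "*.html" pattern directly instead of reading "ls" output, quotes the file names, and skips any file whose "_old" copy already exists so that a second run cannot overwrite the saved originals. The scratch directory (made with "mktemp -d", which most current systems provide) and the sample file are made up so the sketch can be tried safely.

```shell
#!/bin/sh
# Work in a scratch directory with one made-up sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '<a href="http://UA1VM/home.html">UA1VM home</a>' > index.html

update_html() {
    for filename in *.html; do
        # If no file matches, the pattern is left as-is; skip it.
        [ -f "$filename" ] || continue
        # Never overwrite an existing backup of the original.
        if [ -e "${filename}_old" ]; then
            echo "skipping $filename: ${filename}_old already exists"
            continue
        fi
        echo "processing $filename"
        # Edit into a temporary file; swap the files only if sed succeeded.
        if sed -e 's+UA1VM+bama+g' "$filename" > "$filename.tmp"; then
            mv "$filename" "${filename}_old"
            mv "$filename.tmp" "$filename"
        else
            rm -f "$filename.tmp"
        fi
    done
}

update_html    # first run: edits index.html, saves index.html_old
update_html    # second run: skips index.html, backup is untouched
```

Because the temporary file is named after the file being edited, two files are never written through the same "tmp" name.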
The final step before running the script is to mark it as executable. To do this you would type
chmod u+x myscript
where "myscript" is the name you gave the script when you created it with your favorite editor. You would then run the script by typing its name (with "./" in front if the current directory is not in your search path)
myscript
You should run this script just once. If you run it a second time, your updated ".html" files will overwrite the original files you saved as ".html_old". Also, beware that sed doesn't check the context of the word it is changing. If there are places where "UA1VM" needs to remain unchanged, you will need to modify the "sed" command slightly. The suggested change is to match the entire machine name, as in
sed -e "s+UA1VM.UA.EDU+bama.ua.edu+g" original.html > newfile.html
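One caveat with this pattern: in a sed pattern an unescaped "." matches any single character, so "UA1VM.UA.EDU" would also match a string such as "UA1VMxUAxEDU". A minimal sketch of a stricter form escapes the dots with a backslash. The two sample lines are made up for illustration.

```shell
# Two made-up lines: a full host name, and a bare mention that should stay.
sample='http://UA1VM.UA.EDU/page.html
The old UA1VM system'

# "\." matches a literal dot only; the replacement side needs no escaping.
result=$(printf '%s\n' "$sample" | sed -e 's+UA1VM\.UA\.EDU+bama.ua.edu+g')

printf '%s\n' "$result"
```

Only the first line changes. Note also that sed matches case exactly, so a lower-case "ua1vm.ua.edu" would be left alone by this command.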
This script will work through large numbers of files in just minutes. You need to run it in every directory where files need to be updated.
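If the files are spread over many subdirectories, one possible alternative to running the script in each directory is to let "find" locate the files. This is only a sketch under stated assumptions: the directory tree and file contents below are made up, the per-file steps are written inline rather than calling the script, and file names containing newlines are not handled.

```shell
# Build a made-up directory tree in a scratch area.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/docs"
printf 'UA1VM top\n' > "$workdir/site/index.html"
printf 'UA1VM docs\n' > "$workdir/site/docs/page.html"

# Visit every ".html" file at or below the starting directory.
find "$workdir/site" -type f -name '*.html' | while IFS= read -r filename; do
    # Skip files that already have a saved original.
    [ -e "${filename}_old" ] && continue
    echo "processing $filename"
    # Same three steps as in the loop: edit, save the original, swap in the result.
    sed -e 's+UA1VM+bama+g' "$filename" > "$filename.tmp" &&
        mv "$filename" "${filename}_old" &&
        mv "$filename.tmp" "$filename"
done
```

To use the idea on real files you would replace "$workdir/site" with the top directory of your own pages, ideally after trying it on a copy first.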
© 1998, The University of Alabama. The information included here is for the University of Alabama central computing facility as it was configured on the document date. It may or may not apply to other Unix systems.

