Back in 2008, I did a fairly major upgrade of my website, and after trying sed, I found that perl provided a better way of modifying html code across multiple files.
I wrote a web page about it, and it has been most useful since then. This page is a sequel to that one, which should be read first before continuing here.
Here is a link to it.
In a more recent upgrade of various sections of my website, I came across some difficulties, and had to dig a bit deeper into perl. This web page documents how I used perl to solve them.
By the end of the previous web page about perl, I had a working script that did text substitution across multiple lines.
#!/bin/bash
old='< some-text >.*? < more-text >'
new='< new-text >'
perl -0777 -pi -e "s/$old/$new/gis;" *.htm
Here is a line by line description of what it does.
Metacharacters: \ ^ . $ | () []
Quantifiers: * + ? {}
Non-greedy quantifiers: *? +? ?? {}?
? * + | ^ $ ()
Modifiers: /i /m /s /x /o /g /e
Now the above script worked fine for some of the html coding I was trying to do, but I met some problems.
What follows are some of the solutions I eventually dug up from the internet.
One of the problems I had was in replacing several lines of text, with different files having different amounts of white space in them.
The solution to this involved the use of a perl feature called Character Classes, where a short escape sequence stands for any one character of a particular type.
The escape sequences are
\d \s \w \D \S \W
The \s sequence represents any white space character, and can be used as a shorthand for tabs, spaces, and new lines. Likewise \d stands for any digit and \w for any word character (letters, digits and underscore), while the capital versions \D, \S and \W match the opposite.
So the regex could be for example
old='text-1\s+text-2\s+text-3'
and this would match any block of text that contained "text-1" then "text-2" then "text-3", with at least one white space character of any kind (and any number of them) between each pair.
Another problem was adding new text after the existing text in a file. The particular difficulty here was that each file had different text in it, so it was not possible to specify a particular string after which the new text should be added.
Now simple logic would suggest that if I just used wildcards such as
old='.*'
then tried
new='.* newtext'
then the substitution would just add "newtext" after all the existing text.
However it doesn't work - the "newtext" is added just as instructed, however all the original text is deleted.
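After digging around, it appears that the way that perl does substitution is in three steps: first it finds the text that matches $old, second it removes that text, and third it puts the $new text in its place.
The trouble is - after step 2 - it doesn't know what it has removed, it just has an empty space. On top of that, the replacement side of s/// is an ordinary string rather than a pattern, so ".*" there is just a dot and an asterisk. So using the wild card in the $new variable doesn't achieve anything.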
The solution is to change the fourth line of the script at the top of the page, so that the s/// operator now becomes -
perl -0777 -pi -e "s/($old)/$new/gis;" *.htm
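That is - the bit that calls the $old variable is placed inside parentheses.
Now perl adds another bit to the procedure - after finding the text that matches $old, it saves a copy of that text in a variable called $1 before removing it.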
So the $new variable can now be written as
new='$1 newtext'
and in line 4, perl puts back all the original text (from the variable $1) plus the new text.
So the final script now becomes something like
#!/bin/bash
old='^.*$'
new='$1 newtext'
perl -0777 -pi -e "s/($old)/$new/gis;" *.htm
As far as I can see, perl creates the variable $1 after it has done a successful match.
Also, perl can create more than one of these numbered variables, one for each pair of parentheses, counted from the left. So $1 up to $9 are certainly possible, and the perl documentation says there is no limit on the number of captured substrings, so $10, $11 and so on also work if the pattern has that many pairs of parentheses.
One of the things I needed to do was to change the file extensions on multiple files from .htm to .php.
I started to dig around perl to see how to do it, but was quickly diverted to a much simpler method - using the command "rename", which is a separate command line program rather than part of bash itself.
All it needed was
rename .htm .php *.htm
Job done!
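One thing to watch is that "rename" comes in more than one version. The syntax above is the one used by the util-linux version, while some distributions install a perl based rename that expects a perl expression instead. Where the command is missing or behaves differently, a plain bash loop is one alternative. This is only a sketch, run here on empty throwaway files whose names are placeholders of mine.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
touch "$tmpdir/index.htm" "$tmpdir/about.htm"

for f in "$tmpdir"/*.htm; do
    # ${f%.htm} is the file name with the trailing .htm cut off
    mv -- "$f" "${f%.htm}.php"
done

listing=$(ls "$tmpdir")
echo "$listing"
rm -r "$tmpdir"
```

To use it for real, replace "$tmpdir"/*.htm with *.htm and run it in the directory that holds the files.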