Bash Tricks: Split / Cut a String with Multi Character Delimiters Using AWK

Some time back I wrote a post showing how to split a string into substrings separated by multi-character delimiters. I didn’t realize then that there’s a much easier solution using awk. Here’s that solution, using the same example as the previous post.

echo "abcd<>efgh<>ijkl<>mn op<>qr st<>uv wx<>yz" | awk 'BEGIN {FS="<>"} {for(i=1;i<=NF;i++)print $i}'

The delimiter here is "<>", set through awk's field separator variable FS.

This prints every substring, one per line. If you want an individual substring, print the corresponding field instead:

echo "abcd<>efgh<>ijkl<>mn op<>qr st<>uv wx<>yz" | awk 'BEGIN {FS="<>"} {print $1}'
echo "abcd<>efgh<>ijkl<>mn op<>qr st<>uv wx<>yz" | awk 'BEGIN {FS="<>"} {print $2}'

That's how easy it is.
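
One thing to watch for: awk treats a multi-character FS as a regular expression, so a delimiter made of regex-special characters has to be written so it matches literally. Here is a minimal sketch of that, which also captures the result in a shell variable. The "||" delimiter, the input string, and the variable names are my own assumptions for illustration, not part of the example above.

```shell
#!/bin/bash
# Hypothetical input using "||" as the delimiter (not from the original example).
input="abcd||efgh||mn op"

# "|" means alternation in a regular expression, so each one is wrapped
# in a bracket expression to make it match a literal "|".
second=$(echo "$input" | awk 'BEGIN {FS="[|][|]"} {print $2}')
count=$(echo "$input" | awk 'BEGIN {FS="[|][|]"} {print NF}')

echo "Field 2: $second"
echo "Fields:  $count"
```

NF is awk's built-in count of fields on the current line, so it is a quick way to check how many substrings the split produced.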

Bash Tricks: Split / Cut a String with Multi Character Delimiters

It’s simple enough to split a string on a single-character delimiter using the cut command. However, cut doesn’t support multi-character delimiters. Here’s a sample script that demonstrates how to split strings with multi-character delimiters.

#!/bin/bash
#Inputs to the script, the delimiter, and the string itself
D="<>"   #Multi Character Delimiter
string="abcd<>efgh<>ijkl<>mn op<>qr st<>uv wx<>yz" #String with delimiters

#Split the String into Substrings
sList=($(echo "$string" | sed -e 's/'"$D"'/\n/g' | while read -r line; do echo "$line" | sed 's/[\t ]/'"$D"'/g'; done))
for (( i = 0; i < ${#sList[@]}; i++ )); do
  #Restore the spaces; the g flag replaces every occurrence, not just the first
  sList[i]=$(echo "${sList[i]}" | sed 's/'"$D"'/ /g')
done

#Output the Split String
echo No of SubStrings - ${#sList[@]}
for (( i = 0; i < ${#sList[@]}; i++ )); do
  echo "${sList[i]}"
done


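The script splits the string and stores the pieces in the sList array. Spaces inside a substring are temporarily replaced with the delimiter so that the shell's word splitting doesn't break that substring apart when the array is filled; the loop then puts the spaces back. You can access individual substrings using ${sList[0]}, ${sList[1]}, ${sList[2]}, etc. The output of the script is: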

$ bash cut.sh
No of SubStrings - 7
abcd
efgh
ijkl
mn op
qr st
uv wx
yz
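
If you would rather not call sed at all, the same split can be sketched with bash parameter expansion alone. This is an alternative I'm suggesting, not the method used in the script above; the parts and rest variable names are assumptions made up for the sketch.

```shell
#!/bin/bash
# Sketch: split on a multi-character delimiter using only bash parameter expansion.
D="<>"
string="abcd<>efgh<>ijkl<>mn op<>qr st<>uv wx<>yz"

parts=()
rest="$string$D"              # trailing delimiter so the last piece is handled like the others
while [[ -n "$rest" ]]; do
  parts+=("${rest%%"$D"*}")   # keep everything before the first delimiter
  rest="${rest#*"$D"}"        # drop that piece together with its delimiter
done

echo "No of SubStrings - ${#parts[@]}"
printf '%s\n' "${parts[@]}"
```

Quoting "$D" inside the expansions makes bash treat the delimiter as literal text rather than as a glob pattern, and because nothing goes through word splitting, spaces inside the substrings need no special handling.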

For those who are wondering how to split a string on a single-character delimiter with the cut command, here’s an example:

echo "a|b|c|d|e" |  cut -d '|' -f 1
echo "a|b|c|d|e" |  cut -d '|' -f 2
echo "a|b|c|d|e" |  cut -d '|' -f 3
and so on.
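
cut can also return several fields in one call. A small sketch, using the same sample string; the variable names are just for illustration:

```shell
#!/bin/bash
line="a|b|c|d|e"

# -f accepts comma-separated lists and ranges; the selected fields
# are printed joined by the same delimiter.
first_and_third=$(echo "$line" | cut -d '|' -f 1,3)
two_to_four=$(echo "$line" | cut -d '|' -f 2-4)
from_fourth=$(echo "$line" | cut -d '|' -f 4-)

echo "$first_and_third"   # a|c
echo "$two_to_four"       # b|c|d
echo "$from_fourth"       # d|e
```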