79475780

Date: 2025-02-28 14:07:38
Score: 2

For building large data sets in Bash, I'd strongly advise using an array.

Arrays scale much better than concatenated strings, and they are faster to handle and filter, especially for very large sets of more than about 5,000 elements (the exact threshold depends on your machine's resources).

An exception to this rule of thumb is invoking a very efficient filtering tool such as grep once (not inside a large loop) over a large data set.
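As a minimal sketch of that exception: the array is printed once and piped through a single grep process, instead of calling grep once per element. The sample URLs and the names urls and matches are made up for illustration, and mapfile needs Bash 4 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
urls=(
    "http://www.example.com/a"
    "http://www.test.org/b"
    "http://www.example.com/c"
)

# printf reuses its format for every element, so the whole array is
# streamed through ONE grep process (-F: fixed string, no regex).
mapfile -t matches < <(printf '%s\n' "${urls[@]}" | grep -F 'example.com')

printf 'matched %d of %d\n' "${#matches[@]}" "${#urls[@]}"
```

The filtered result lands in an array again, so later steps can keep using array handling.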

I've also found while loops to be a bit more efficient than for loops, and in many cases more flexible, especially when indices or counters are involved.
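To check that observation on your own machine, the two loop forms can be timed with Bash's time keyword. This is only a sketch: the iteration count n is an arbitrary assumption, the loop bodies are empty, and results will vary with Bash version and hardware.

```bash
#!/usr/bin/env bash
n=100000            # arbitrary iteration count (assumption)
TIMEFORMAT='%R s'   # make the time keyword print elapsed seconds only

echo "while loop with counter:"
i=0
time while (( i++ < n )); do :; done

echo "C-style for loop:"
time for (( j = 0; j < n; j++ )); do :; done
```

The timings are written to stderr. Run it a few times before drawing conclusions, since the difference may be within measurement noise.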

The best-scaling way I've found to build a large concatenated string is to build an array first and then convert it natively.

A comparative example for the URL concatenation problem above:

#! /bin/bash

# Random ~4 KB URL: 15 fixed characters plus 4081 random alphanumerics
url="http://www.$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c $((4096-15))).com"

items=2000

# Build array of items and measure time
tmpArr=()
i=0
startTime="$EPOCHREALTIME"
while [[ $((i++)) -lt $items ]]; do
    tmpArr+=("$url")
done
stopTime="$EPOCHREALTIME"
awk 'BEGIN {printf "%s %f\n", "SECONDS ARRAY building from loop:", '"$stopTime"' - '"$startTime"'};'

# Native Bash expansion of the array into a string, and measure time
# IFS is temporarily set to a newline so that "${tmpArr[*]}" joins the elements with newlines
startTime="$EPOCHREALTIME"
_IFS="$IFS"; IFS=$'\n'
tmpVar="${tmpArr[*]}"
IFS="$_IFS"
stopTime="$EPOCHREALTIME"
awk 'BEGIN { printf "%s %f\n", "SECONDS VAR from array expansion:", '"$stopTime"' - '"$startTime"'; }'

# Directly concatenate strings
tmpVar=
i=0   # reset the counter, otherwise this loop never runs
startTime="$EPOCHREALTIME"
while [[ $((i++)) -lt $items ]]; do
    tmpVar+="$url"$'\n'
done
stopTime="$EPOCHREALTIME"
awk 'BEGIN { printf "%s %f\n", "SECONDS VAR direct building:", '"$stopTime"' - '"$startTime"'; }'

# Looped printf output concatenation
startTime="$EPOCHREALTIME"
tmpVar="$(
    for (( i=0; i < $items; i++ )); do
        printf '%s\n' "$url"
    done
)"
stopTime="$EPOCHREALTIME"
awk 'BEGIN { printf "%s %f\n", "SECONDS VAR from printf loop:", '"$stopTime"' - '"$startTime"'; }'

# Print tmpVar concatenated string
#echo -e "\nVAR ELEMENTS:"
#printf '%s\n' "$tmpVar"
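The manual save and restore of IFS in the script can also be confined to a function, because a local IFS is reset automatically when the function returns. The helper below is a sketch and not part of the original script: join_lines and its parameter order are hypothetical, and printf -v is used to avoid a command substitution and its subshell.

```bash
#!/usr/bin/env bash
# join_lines DEST ELEMENT...   (hypothetical helper)
# Joins all ELEMENTs with newlines and stores the result in variable DEST.
# DEST must not be named "dest" or "IFS", which are local to the function.
join_lines() {
    local IFS=$'\n'      # only affects this function call
    local dest=$1
    shift
    # "$*" joins the positional parameters with the first character of IFS.
    printf -v "$dest" '%s' "$*"
}

parts=("first" "second item" "third")
join_lines joined "${parts[@]}"
printf '%s\n' "$joined"
```

In the benchmark script this would be called as join_lines tmpVar "${tmpArr[@]}", leaving the global IFS untouched.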

For 2,000 items, directly concatenating a string may still be fast enough.

This changes quickly, though, as the number of items grows (again depending on your system's resources).

Whereas the cost of building an array and converting it natively is minimal and grows only linearly, the time for direct string concatenation grows roughly quadratically, to the point of stalling your script. The chosen solution above performs even worse:

For 2,000 items:

SECONDS ARRAY building from loop: 0.073598

SECONDS VAR from array expansion: 0.129333

SECONDS VAR direct building: 0.641954

SECONDS VAR from printf loop: 1.196127

For 5,000 items:

SECONDS ARRAY building from loop: 0.201801

SECONDS VAR from array expansion: 0.326866

SECONDS VAR direct building: 4.797142

SECONDS VAR from printf loop: 6.986455

For 10,000 items:

SECONDS ARRAY building from loop: 0.378829

SECONDS VAR from array expansion: 0.674607

SECONDS VAR direct building: 18.280629

SECONDS VAR from printf loop: 27.756085

For 100,000 items:

SECONDS ARRAY building from loop: 3.748199

SECONDS VAR from array expansion: 6.427268

SECONDS VAR direct building: ??? >20 minutes

SECONDS VAR from printf loop: ???
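
One way to read the growth rate off these timings is to estimate the exponent k in t ~ n^k from two measurements, using k = log(t2/t1) / log(n2/n1). The sketch below applies that to the first and third result sets above; the function name growth_exponent is made up for illustration. A value near 1 indicates linear growth, a value near 2 quadratic growth.

```bash
#!/usr/bin/env bash
# growth_exponent N1 T1 N2 T2   (hypothetical helper)
# Estimates k in t ~ n^k from two (item count, seconds) measurements.
growth_exponent() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}

# Timings copied from the results above.
array_k=$(growth_exponent 2000 0.073598 10000 0.378829)
concat_k=$(growth_exponent 2000 0.641954 10000 18.280629)

echo "array building:  k = $array_k"
echo "direct building: k = $concat_k"
```

With the figures posted here, this gives about 1.02 for array building and about 2.08 for direct concatenation.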

Reasons:
  • Blacklisted phrase (1): ???
  • Long answer (-1):
  • Has code block (-0.5):
  • Ends in question mark (2):
  • Low reputation (0.5):
Posted by: fozzybear