A few days ago I had to transfer a huge directory.
Huge means ~50 TB, more than 10 million files, and a depth of only 6 folders below the parent one.
As I have to do this kind of transfer more than 10 times with the same number of folders, I decided to implement a function which launches parallel rsyncs at a depth of my choosing.
The result is this little “pure bash” script (the only dependency is “screen”). You’ll notice that the main function “sync_this()” will run on its own in your script by changing only 2 or 3 variables ;-)
#!/bin/bash
[[ -z "$1" ]] && echo "Usage: $0 /path/to/run" && exit 1
TARGET="$1"
[[ ! -d "${TARGET}" ]] && echo -e "${TARGET}\n not a directory" && exit 1
# Logs go to a folder named after TARGET, next to this script
LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"
[[ -d "${LOGDIR}" ]] && echo "Cleanup" && rm -fr "${LOGDIR}"
mkdir -p "${LOGDIR}/transferlogs"
# Wait, printing a dot per second, while MAXPARALEL or more rsync processes are running
check_max_processes()
{
    local -i MAXPARALEL=$1
    while [ "$(pgrep -x rsync | wc -l)" -ge "${MAXPARALEL}" ] ; do
        printf "%s" .
        sleep 1
    done
}
sync_this()
{
    local -i MAXDEPTH=3
    local -i MAXPARALEL=20
    local LAUCHRSYNC="/root/autosync/launch_rsync.sh"
    local -a DIRLIST LEVEL
    local ITEM
    local -i i x
    # Every folder exactly at MAXDEPTH gets its own recursive rsync
    mapfile -t DIRLIST < <(find "${TARGET}" -mindepth ${MAXDEPTH} -maxdepth ${MAXDEPTH} -type d)
    echo "Copying files and directories NOT recursively"
    # Levels 0..MAXDEPTH-1: copy each folder's files and create its subfolders, without recursing
    for ((i=0; i<MAXDEPTH; i++)); do
        x=0
        mapfile -t LEVEL < <(find "${TARGET}" -mindepth $i -maxdepth $i -type d)
        for ITEM in "${LEVEL[@]}" ; do
            check_max_processes ${MAXPARALEL}
            # The depth is part of the name so logs of different levels don't overwrite each other
            screen -S "nr_${i}_${x}" -d -m "${LAUCHRSYNC}" -nr "${ITEM}" "nr_${i}_${x}" "${LOGDIR}"
            x=$((x + 1))
            (( x % 100 == 0 )) && printf "\n%s\n" "$x directories copied non-recursively"
        done
        echo "Depth $i launched, going deeper"
    done
    echo "Launching recursive rsyncs at depth ${MAXDEPTH}"
    for ((i=0; i<${#DIRLIST[@]}; i++)); do
        printf "\n%s" "Launching rsync $((i + 1)) of ${#DIRLIST[@]}"
        check_max_processes ${MAXPARALEL}
        screen -S "r_${i}" -d -m "${LAUCHRSYNC}" -r "${DIRLIST[$i]}" "r_${i}" "${LOGDIR}"
    done
    echo
}
sync_this
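Picking MAXDEPTH is the main tuning decision: every folder above it gets a quick non-recursive rsync, and every folder exactly at it gets its own recursive rsync, so you want enough of the latter to keep the MAXPARALEL slots busy. A small helper along these lines can count both before anything is launched. It is only a sketch: preview_split is a hypothetical name, not part of the scripts above, and it just runs find (limited by -maxdepth), so it doesn't copy anything.

```bash
#!/bin/bash
# preview_split (hypothetical helper): count the rsyncs sync_this() would launch
# for a given TARGET and MAXDEPTH, without copying anything.
preview_split()
{
    local target="$1"
    local -i maxdepth="${2:-3}"
    local -i nonrec rec
    if (( maxdepth < 1 )); then
        echo "MAXDEPTH must be >= 1" >&2
        return 1
    fi
    # Levels 0..MAXDEPTH-1: one non-recursive rsync per folder
    nonrec=$(find "${target}" -mindepth 0 -maxdepth $((maxdepth - 1)) -type d | wc -l)
    # Level MAXDEPTH: one recursive rsync per folder
    rec=$(find "${target}" -mindepth "${maxdepth}" -maxdepth "${maxdepth}" -type d | wc -l)
    echo "non-recursive: ${nonrec}"
    echo "recursive: ${rec}"
}
```

After sourcing it, `preview_split /path/to/run 3` prints both counts; trying a few depths quickly shows where the tree fans out enough.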
I’m using an additional script to launch each rsync (variable ${LAUCHRSYNC}). Why? Simply to keep track of what each rsync is doing and how it ended. Here is the code of that script:
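#!/bin/bash
# /root/autosync/launch_rsync.sh
# Usage: launch_rsync.sh -nr|-r|--non-recursive|--recursive FOLDER LOGNAME LOGDIR
RECURSIVE=$(echo "$1" | tr '[:upper:]' '[:lower:]')
TARGET="$2"
SCREENNAME="$3"
LOGDIR="$4"
DSTSERVER="1.1.1.1"
# "${TARGET}/" (trailing slash) copies the folder's contents, dotfiles included,
# without expanding a possibly huge ${TARGET}/* glob on the command line.
# "> log 2>&1" sends both stdout and stderr to the log (the order matters).
if [[ "${RECURSIVE}" =~ ^-{1,2}(nr|non-recursive)$ ]] ; then
    # -d: the folder's files plus its subfolders (created empty), no recursion; -lptgoD: the rest of -a
    rsync -cdlptgoDv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}_NOTRECURSIVE.log" 2>&1
    RES=$?
elif [[ "${RECURSIVE}" =~ ^-{1,2}(r|recursive)$ ]] ; then
    rsync -cazv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}.log" 2>&1
    RES=$?
else
    echo "$0 -nr|-r|--non-recursive|--recursive"
    exit 1
fi
# Record the result: nr_TRANSFERS.OK/.FAIL or r_TRANSFERS.OK/.FAIL when called with -nr or -r
if [ "${RES}" -eq 0 ] ; then
    echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.OK"
else
    echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.FAIL"
fi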
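launch_rsync.sh stores only rsync’s numeric exit status in the .OK/.FAIL files. The meaning of each code is listed under EXIT VALUES in the rsync(1) man page; for instance, 24 only means that some source files vanished while the transfer was running, which on a live filesystem may be acceptable. A lookup helper like this one makes a FAIL file easier to read. It is a sketch: describe_rsync_exit is a hypothetical name and only a subset of the documented codes is covered.

```bash
#!/bin/bash
# describe_rsync_exit (hypothetical helper): translate an rsync exit status into
# the text given under EXIT VALUES in rsync(1). Only a subset is covered here.
describe_rsync_exit()
{
    case "$1" in
        0)  echo "Success" ;;
        1)  echo "Syntax or usage error" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Exit code $1 (see EXIT VALUES in rsync(1))" ;;
    esac
}

# Example: annotate every line ("RES : folder") of a FAIL file
#   while read -r res _ folder; do
#       echo "${folder}: $(describe_rsync_exit "${res}")"
#   done < "${LOGDIR}/r_TRANSFERS.FAIL"
```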
If you don’t care about the result of the rsyncs, you can simply move the rsync lines from launch_rsync.sh into the main script and launch them in the background.
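If you go that way, the polling in check_max_processes() can be replaced by bash’s own job control. The sketch below assumes bash 4.3 or later (for `wait -n`), and run_capped is a hypothetical helper name. Unlike check_max_processes(), which counts every rsync running on the machine, it only counts background jobs started by the same shell.

```bash
#!/bin/bash
# run_capped (hypothetical helper, assumes bash >= 4.3 for "wait -n"):
# start "$@" in the background once fewer than MAX of this shell's
# background jobs are still running.
run_capped()
{
    local -i max="$1"
    shift
    local -a pids
    # jobs -rp: PIDs of this shell's background jobs that are still running
    while pids=( $(jobs -rp) ); (( ${#pids[@]} >= max )); do
        wait -n   # block until any one of them exits
    done
    "$@" &
}

# Example, with DIRLIST from the main script and DSTSERVER from launch_rsync.sh:
#   for ((i=0; i<${#DIRLIST[@]}; i++)); do
#       run_capped 20 rsync -cazv --partial "${DIRLIST[$i]}/" "${DSTSERVER}:${DIRLIST[$i]}/"
#   done
#   wait   # let the last rsyncs finish before the script exits
```

Because the script keeps running while the rsyncs work, the final `wait` matters: without it the loop would return as soon as the last rsync was started.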
The main script creates a new folder, $(dirname $0)/$(basename ${TARGET}), in which you’ll find some important files:
| Name | Contents |
|---|---|
| nr_TRANSFERS.FAIL, r_TRANSFERS.FAIL | Folders whose rsync HASN’T finished OK (nr = non-recursive, r = recursive) |
| nr_TRANSFERS.OK, r_TRANSFERS.OK | Folders whose rsync HAS finished OK (nr = non-recursive, r = recursive) |
| transferlogs | Folder with one logfile for each rsync launched ;-) |
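These files are enough to check a whole run once the screens are gone. The helpers below are a sketch, not part of the original scripts (transfer_summary and failed_folders are hypothetical names), and they assume the “<exit code> : <folder>” line format that launch_rsync.sh writes. They count the OK/FAIL lines per mode and list the failed folders so they can be relaunched.

```bash
#!/bin/bash
# Hypothetical helpers for the result files written by launch_rsync.sh.
# Each line of a *_TRANSFERS.OK / *_TRANSFERS.FAIL file is "<exit code> : <folder>".

# Print "<mode> OK=<n> FAIL=<n>" for the nr and r transfers found in LOGDIR
transfer_summary()
{
    local logdir="$1" mode f
    local -i ok fail
    for mode in nr r; do
        ok=0; fail=0
        f="${logdir}/${mode}_TRANSFERS.OK";   [[ -f "$f" ]] && ok=$(wc -l < "$f")
        f="${logdir}/${mode}_TRANSFERS.FAIL"; [[ -f "$f" ]] && fail=$(wc -l < "$f")
        echo "${mode} OK=${ok} FAIL=${fail}"
    done
}

# Print only the folder part of each line of a *_TRANSFERS.FAIL file
failed_folders()
{
    sed -e 's/^[0-9]* : //' "$1"
}

# Example: retry the failed recursive rsyncs. The old list is moved away first,
# because launch_rsync.sh appends the new results to the same file.
#   mv "${LOGDIR}/r_TRANSFERS.FAIL" "${LOGDIR}/r_TRANSFERS.FAIL.old"
#   failed_folders "${LOGDIR}/r_TRANSFERS.FAIL.old" | while IFS= read -r dir; do
#       /root/autosync/launch_rsync.sh -r "$dir" "retry_$(basename "$dir")" "${LOGDIR}"
#   done
```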
EDIT:
You’ll find more info here:
https://wiki.ciberterminal.net/doku.php?id=linux:parallel_rsync