A few days ago I had to transfer a huge directory.
Huge means ~50 TB with more than 10 million files, at a depth of only 6 folders below the parent one.
Since I have to run that kind of transfer more than 10 times with the same folder layout, I decided to write a function that launches parallel rsyncs at a depth of my choice.
The result is this little “pure bash” script (the only dependency is “screen”). You’ll notice that the main function, “sync_this()”, works standalone in your own script after changing only 2 or 3 variables ;-)

    #!/bin/bash
    # Launch parallel rsyncs (inside detached screen sessions) over a directory tree.
    # Needs bash 4+ (mapfile) and screen.

    [ -z "$1" ] && echo "Usage: $0 /path/to/run" && exit 1
    TARGET="$1"
    [ ! -d "${TARGET}" ] && echo -e "${TARGET}\n not a directory" && exit 1

    # Logs live next to this script, in a folder named after the target
    LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"
    [ -d "${LOGDIR}" ] && echo "Cleanup" && rm -fr "${LOGDIR}"
    mkdir -p "${LOGDIR}/transferlogs"

    # Block while MAXPARALEL or more rsync processes are running
    check_max_processes()
    {
        local MAXPARALEL=$1
        while [ "$(ps waux | grep -cE ":[0-9]{2} rsync")" -ge "${MAXPARALEL}" ] ; do
            printf "%s" .
            sleep 1
        done
    }

    sync_this()
    {
        local MAXDEPTH=3
        local MAXPARALEL=20
        LAUCHRSYNC="/root/autosync/launch_rsync.sh"
        local i ITEM ITEMS
        local x=0   # not reset per depth, so the nr_${x} log names stay unique

        # Directories exactly at MAXDEPTH: each one gets its own recursive rsync
        mapfile -t DIRLIST < <(find "${TARGET}" -mindepth "${MAXDEPTH}" -maxdepth "${MAXDEPTH}" -type d)

        echo "Copying files and directories NOT recursively"
        for ((i=0; i<MAXDEPTH; i++)); do
            # mapfile keeps folder names containing spaces intact
            mapfile -t ITEMS < <(find "${TARGET}" -mindepth "$i" -maxdepth "$i" -type d)
            for ITEM in "${ITEMS[@]}" ; do
                check_max_processes "${MAXPARALEL}"
                screen -S "${x}" -d -m "${LAUCHRSYNC}" -nr "${ITEM}" "nr_${x}" "${LOGDIR}"
                x=$((x + 1))
                (( x % 100 == 0 )) && printf "\n%s\n" "$x Directories Copied Not recursively"
            done
            echo "Depth $i DONE, going deeper"
        done

        echo "Launching recursive rsyncs at depth ${MAXDEPTH}"
        for ((i=0; i<${#DIRLIST[@]}; i++)); do
            printf "\n%s" "Launching rsync $i of ${#DIRLIST[@]}"
            check_max_processes "${MAXPARALEL}"
            screen -S "${i}" -d -m "${LAUCHRSYNC}" -r "${DIRLIST[$i]}" "r_${i}" "${LOGDIR}"
        done
    }

    sync_this "${TARGET}"

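To see how the two phases divide the tree, here is a minimal sketch that only counts what each phase would launch, without calling rsync or screen. The scratch tree and the names demo_root, shallow_count and deep_count are made up for the illustration, and MAXDEPTH is set to 2 to keep the tree small.

```shell
#!/bin/bash
# Build a throwaway tree: depth 1 = a, d; depth 2 = a/b1, a/b2, d/e; depth 3 = the rest
demo_root=$(mktemp -d)    # hypothetical scratch directory
mkdir -p "${demo_root}/a/b1/c" "${demo_root}/a/b2/c" "${demo_root}/d/e/f"

MAXDEPTH=2

# Phase 1: every directory at depth 0 .. MAXDEPTH-1 gets one non-recursive transfer
shallow_count=0
i=0
while [ "${i}" -lt "${MAXDEPTH}" ]; do
    level_count=$(( $(find "${demo_root}" -mindepth "${i}" -maxdepth "${i}" -type d | wc -l) ))
    echo "depth ${i}: ${level_count} non-recursive transfer(s)"
    shallow_count=$((shallow_count + level_count))
    i=$((i + 1))
done

# Phase 2: every directory exactly at MAXDEPTH gets one recursive transfer
deep_count=$(( $(find "${demo_root}" -mindepth "${MAXDEPTH}" -maxdepth "${MAXDEPTH}" -type d | wc -l) ))
echo "depth ${MAXDEPTH}: ${deep_count} recursive transfer(s)"

rm -rf "${demo_root}"
```

For this tree it reports one transfer at depth 0, two at depth 1, and three recursive transfers at depth 2. A larger MAXDEPTH means more, smaller recursive jobs, which usually spreads the load better across the parallel slots.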
I use an additional script to launch each rsync (the ${LAUCHRSYNC} variable in the main script, set to /root/autosync/launch_rsync.sh). Why? Simply to keep track of what each rsync is doing and how it finished. Here is the code of that script:
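
    #!/bin/bash
    # launch_rsync.sh <-nr|-r|--non-recursive|--recursive> <folder> <screen name> <log dir>
    RECURSIVE=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    TARGET=$2
    SCREENNAME=$3
    LOGDIR=$4
    DSTSERVER="1.1.1.1"

    # "dir/" instead of "dir/*": also picks up dotfiles and avoids the
    # argument-length limit on folders with a huge number of entries.
    # "> file 2>&1" (in that order) sends both stdout and stderr to the log.
    if [[ "${RECURSIVE}" =~ ^-{1,2}(nr|non-recursive)$ ]] ; then
        # -d copies the entries of this level without descending into subfolders
        rsync -cdlptgoDv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}_NOTRECURSIVE.log" 2>&1
        RES=$?
    elif [[ "${RECURSIVE}" =~ ^-{1,2}(r|recursive)$ ]] ; then
        rsync -cazv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}.log" 2>&1
        RES=$?
    else
        echo "$0 -nr|-r|--non-recursive|--recursive"
        exit 1
    fi

    # Record the exit code and the folder in the OK or FAIL list for this mode
    if [ "${RES}" -eq 0 ] ; then
        echo "$RES : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.OK"
    else
        echo "$RES : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.FAIL"
    fi
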
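Since the log files are the whole point of this launcher, it is worth knowing that the order of the shell redirections decides whether error messages reach the log at all. A small sketch of the difference, where noisy() is a hypothetical stand-in for rsync that writes one line to each stream:

```shell
#!/bin/bash
logdir=$(mktemp -d)    # scratch directory, an assumption for the demo

# Hypothetical stand-in for rsync: one line on stdout, one on stderr
noisy() {
    echo "file list on stdout"
    echo "error on stderr" >&2
}

# stdout is sent to the file first, then stderr is pointed at the same place:
# both lines end up in the log
noisy > "${logdir}/both.log" 2>&1

# stderr is pointed at the current stdout (the terminal) first, then stdout
# alone is moved to the file: the error line never reaches the log
noisy 2>&1 > "${logdir}/stdout_only.log"

both_lines=$(( $(wc -l < "${logdir}/both.log") ))
stdout_only_lines=$(( $(wc -l < "${logdir}/stdout_only.log") ))
echo "both.log: ${both_lines} line(s), stdout_only.log: ${stdout_only_lines} line(s)"

rm -rf "${logdir}"
```

Inside a detached screen session nobody is watching the terminal, so anything that misses the log file is effectively lost.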
If you don’t care about the result of the rsyncs, you can simply move the rsync lines from launch_rsync.sh into the main script and launch them in the background.
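A minimal sketch of that background variant, assuming simple batches instead of screen sessions. The function transfer_one() is a hypothetical stand-in that sleeps instead of calling rsync, and the folder list and the done.list file are made up so the example runs anywhere:

```shell
#!/bin/bash
MAXPARALEL=3
workdir=$(mktemp -d)    # scratch directory, an assumption for the demo

# Hypothetical stand-in for the rsync line; the real command would go here
transfer_one() {
    sleep 1
    echo "$1" >> "${workdir}/done.list"
}

running=0
for dir in /data/a /data/b /data/c /data/d /data/e; do    # made-up folder list
    if [ "${running}" -ge "${MAXPARALEL}" ]; then
        wait            # the batch is full: wait for every job in it
        running=0
    fi
    transfer_one "${dir}" &
    running=$((running + 1))
done
wait                    # wait for the last, possibly partial, batch

finished=$(( $(wc -l < "${workdir}/done.list") ))
echo "${finished} transfers finished"
rm -rf "${workdir}"
```

Batches are simpler than the sliding window that check_max_processes() provides, but one slow folder holds up its whole batch. On bash 4.3 or newer, wait -n returns as soon as any single job ends, which is one way to get closer to a sliding window.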
The main script creates a new folder named $(dirname $0)/$(basename ${TARGET}), in which you’ll find some important files:
Name | Content |
---|---|
nr_TRANSFERS.FAIL, r_TRANSFERS.FAIL | Folders whose rsync did NOT finish OK (nr = non-recursive, r = recursive) |
nr_TRANSFERS.OK, r_TRANSFERS.OK | Folders whose rsync finished OK (nr = non-recursive, r = recursive) |
transferlogs | Folder containing one log file for each rsync launched ;-) |
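Once a run has finished, these files can be turned into a quick summary and a retry list. The sketch below works on made-up sample data in a scratch folder; the paths, the exit codes shown and the helper count_lines() are assumptions for the illustration, and only the "exit code : folder" line format comes from launch_rsync.sh.

```shell
#!/bin/bash
LOGDIR=$(mktemp -d)    # stand-in for the real log folder (assumption)

# Made-up sample data in the "<exit code> : <folder>" format
printf '%s\n' '0 : /data/a/b1' '0 : /data/a/b2' > "${LOGDIR}/r_TRANSFERS.OK"
printf '%s\n' '23 : /data/d/e' '12 : /data/d/my dir' > "${LOGDIR}/r_TRANSFERS.FAIL"

# Count lines, treating a missing file as zero (no .FAIL file means no failures)
count_lines() {
    if [ -f "$1" ]; then echo $(( $(wc -l < "$1") )); else echo 0; fi
}

ok_count=$(count_lines "${LOGDIR}/r_TRANSFERS.OK")
fail_count=$(count_lines "${LOGDIR}/r_TRANSFERS.FAIL")
nr_fail_count=$(count_lines "${LOGDIR}/nr_TRANSFERS.FAIL")
echo "recursive: ${ok_count} OK, ${fail_count} failed; non-recursive failures: ${nr_fail_count}"

# Strip the "<exit code> : " prefix to get back the folders that need another run
retry_list=$(sed 's/^[0-9]* : //' "${LOGDIR}/r_TRANSFERS.FAIL")
printf '%s\n' "${retry_list}"

rm -rf "${LOGDIR}"
```

Each folder in the retry list can then be passed to launch_rsync.sh again with the -r flag.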
EDIT:
You’ll find more info here:
https://wiki.ciberterminal.net/doku.php?id=linux:parallel_rsync