A few days ago I had to transfer a huge directory.
Huge means ~50 TB, more than 10 million files, and a depth of only 6 folder levels below the parent.
As I have to do this kind of transfer more than 10 times with the same number of folders, I decided to implement a parallel function that launches parallel rsyncs at a depth of my choosing.
The result was this little “pure bash” script (its only dependency is “screen”). You’ll notice that the main function, “sync_this()”, can run on its own inside your script by changing only 2 or 3 variables ;-)
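The idea of splitting the work at a chosen depth can be sketched like this: every folder above that depth gets a non-recursive copy (so the directory skeleton and the loose files exist on the destination), and every folder exactly at that depth gets its own recursive rsync. The helper plan_depths is hypothetical, not part of the author’s script; it only prints the plan and copies nothing.

```bash
#!/bin/bash
# Hypothetical helper: print which folders would be copied non-recursively
# ("nr", depths 0 .. maxdepth-1) and which would get a recursive rsync
# ("r", exactly maxdepth levels below the target). Nothing is copied.
plan_depths() {
    local target="$1" maxdepth="$2" i
    for ((i = 0; i < maxdepth; i++)); do
        find "$target" -mindepth "$i" -maxdepth "$i" -type d | sort | sed 's/^/nr /'
    done
    find "$target" -mindepth "$maxdepth" -maxdepth "$maxdepth" -type d | sort | sed 's/^/r  /'
}
```

For example, `plan_depths /data 3` prints one `nr` line per folder at depths 0 to 2 and one `r` line per folder at depth 3; anything deeper is covered by those recursive rsyncs.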
#!/bin/bash
# Usage: $0 /path/to/run
[ -z "$1" ] && echo "Usage: $0 /path/to/run" && exit 1
TARGET="$1"
[ ! -d "${TARGET}" ] && echo -e "${TARGET}\n not a directory" && exit 1

# Log folder named after the target, created next to this script
LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"
[ -d "${LOGDIR}" ] && echo "Cleanup" && rm -fr "${LOGDIR}"
mkdir -p "${LOGDIR}/transferlogs"

# Wait, printing a dot per second, while MAXPARALEL or more rsync processes run.
# The regex matches the TIME column of "ps waux" followed by the rsync command.
check_max_processes() {
    local MAXPARALEL=$1
    while [ "$(ps waux | grep -cE ":[0-9]{2} rsync")" -ge "${MAXPARALEL}" ]; do
        printf "%s" .
        sleep 1
    done
}

sync_this() {
    local MAXDEPTH=3                    # depth at which recursive rsyncs start
    local MAXPARALEL=20                 # max rsync processes at the same time
    local LAUCHRSYNC="/root/autosync/"  # path to the launcher script
    local i x=0 ITEM
    local -a DIRLIST LEVEL

    # Folders exactly MAXDEPTH levels below TARGET: one recursive rsync each
    mapfile -t DIRLIST < <(find "${TARGET}" -mindepth "${MAXDEPTH}" -maxdepth "${MAXDEPTH}" -type d)

    echo "Copying files and directories NOT recursively"
    for ((i = 0; i < MAXDEPTH; i++)); do
        mapfile -t LEVEL < <(find "${TARGET}" -mindepth "$i" -maxdepth "$i" -type d)
        for ITEM in "${LEVEL[@]}"; do
            check_max_processes "${MAXPARALEL}"
            # x is never reset, so every screen session and log name is unique
            screen -S "nr_${x}" -d -m "${LAUCHRSYNC}" -nr "${ITEM}" "nr_${x}" "${LOGDIR}"
            x=$((x + 1))
            [[ $x =~ [0-9]{1,2}00$ ]] && printf "\n%s\n" "$x directories copied non-recursively"
        done
        echo "Depth $i DONE, moving to the next level"
    done

    echo "Launching recursive rsyncs at depth ${MAXDEPTH}"
    for ((i = 0; i < ${#DIRLIST[@]}; i++)); do
        printf "\n%s" "Launching rsync $i of ${#DIRLIST[@]}"
        check_max_processes "${MAXPARALEL}"
        screen -S "r_${i}" -d -m "${LAUCHRSYNC}" -r "${DIRLIST[$i]}" "r_${i}" "${LOGDIR}"
    done
}

sync_this
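check_max_processes polls ps once per second and counts rsync processes. A possible alternative, sketched here assuming bash 4.3 or newer, is to let the shell track its own background jobs with `wait -n`. This is a different technique from the one the script uses, and it only works when the transfers are children of the same shell (i.e. started with `&` instead of through screen). run_throttled is a hypothetical name, and max_jobs must be at least 1.

```bash
#!/bin/bash
# Hypothetical helper: run each argument (a command string) via "bash -c"
# in the background, keeping at most max_jobs of them running at once.
# Requires bash >= 4.3 for "wait -n".
run_throttled() {
    local max_jobs="$1"; shift
    local running=0 cmd
    for cmd in "$@"; do
        if (( running >= max_jobs )); then
            wait -n                    # returns when any one background job ends
            running=$((running - 1))
        fi
        bash -c "$cmd" &
        running=$((running + 1))
    done
    wait                               # wait for the jobs still running
}
```

Usage: `run_throttled 20 "rsync -a /src/a/ host:/src/a/" "rsync -a /src/b/ host:/src/b/"`. The counter can only overestimate the number of running jobs, so the limit is never exceeded; at worst a slot is freed a little later than necessary.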
I’m using an additional script, referenced by the ${LAUCHRSYNC} variable, to launch each rsync. Why? Simply to keep track of what each rsync is doing and of its result. Here is the code of that script:
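#!/bin/bash
# Usage: $0 -nr|-r|--non-recursive|--recursive TARGET SCREENNAME LOGDIR
RECURSIVE=$(echo "$1" | tr '[:upper:]' '[:lower:]')
TARGET=$2
SCREENNAME=$3
LOGDIR=$4
DSTSERVER=""   # destination host: set it before running

# "${TARGET}/" (trailing slash) copies the folder's contents, dotfiles included,
# without hitting "Argument list too long" on huge folders.
# "> log 2>&1" sends both stdout and stderr to the log file.
if [[ "${RECURSIVE}" =~ ^\-{1,2}(nr|non-recursive)$ ]] ; then
    # -d: copy directory entries without recursing into them
    rsync -cdlptgoDv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}_NOTRECURSIVE.log" 2>&1
    RES=$?
elif [[ "${RECURSIVE}" =~ ^\-{1,2}(r|recursive)$ ]] ; then
    rsync -cazv --partial "${TARGET}/" "${DSTSERVER}:${TARGET}/" > "${LOGDIR}/transferlogs/${SCREENNAME}.log" 2>&1
    RES=$?
else
    echo "$0 -nr|-r|--non-recursive|--recursive TARGET SCREENNAME LOGDIR"
    exit 1
fi

# Record the result in nr_TRANSFERS.OK/.FAIL or r_TRANSFERS.OK/.FAIL
if [ "$RES" -eq 0 ] ; then
    echo "$RES : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.OK"
else
    echo "$RES : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.FAIL"
fi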
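The bookkeeping part of the launcher can be tried locally without rsync. In this sketch the hypothetical run_and_record runs any command, writes its stdout and stderr to one log file, and appends “<exit code> : <folder>” to <mode>_TRANSFERS.OK or <mode>_TRANSFERS.FAIL, mirroring the launcher. It also shows why the redirection order matters: `> file 2>&1` captures both streams, while `2>&1 > file` would leave stderr on the terminal.

```bash
#!/bin/bash
# Hypothetical helper mirroring the launcher's bookkeeping.
# Args: mode (-nr/-r) target logfile logdir command [args...]
run_and_record() {
    local mode="$1" target="$2" logfile="$3" logdir="$4"; shift 4
    local res
    # stdout to the file first, THEN stderr onto stdout: both end up in the log
    "$@" > "$logfile" 2>&1
    res=$?
    if [ "$res" -eq 0 ]; then
        echo "$res : $target" >> "$logdir/${mode//-/}_TRANSFERS.OK"
    else
        echo "$res : $target" >> "$logdir/${mode//-/}_TRANSFERS.FAIL"
    fi
    return "$res"
}
```

For example, `run_and_record -r /data/a /tmp/a.log /tmp sh -c 'echo out; echo err >&2'` leaves both lines in /tmp/a.log and appends `0 : /data/a` to /tmp/r_TRANSFERS.OK.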
If you don’t care about the result of the rsyncs, you can simply move the rsync lines from the launcher script into the main script and run them in the background.
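Running rsync in the background from the main script does not necessarily mean losing its exit status: bash keeps the status of each background child, and `wait PID` returns it. A minimal sketch, assuming DSTSERVER is set and with do_sync and sync_in_background as hypothetical names; unlike the main script, it starts every transfer at once with no concurrency limit.

```bash
#!/bin/bash
# Hypothetical wrapper around the recursive rsync line; DSTSERVER is assumed set.
do_sync() {
    rsync -cazv --partial "$1/" "${DSTSERVER}:$1/"
}

# Start one background do_sync per folder, then report each exit status.
# Requires bash >= 4 (associative array). No limit on parallel jobs.
sync_in_background() {
    local dir pid
    local -A pid_to_dir=()
    for dir in "$@"; do
        do_sync "$dir" > /dev/null 2>&1 &
        pid_to_dir[$!]="$dir"          # remember which PID copies which folder
    done
    for pid in "${!pid_to_dir[@]}"; do
        if wait "$pid"; then           # wait returns that job's exit status
            echo "OK   ${pid_to_dir[$pid]}"
        else
            echo "FAIL ${pid_to_dir[$pid]}"
        fi
    done
}
```

The report lines come out in no particular order, since bash does not guarantee the iteration order of associative array keys.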
The main script creates a new folder named $(dirname $0)/$(basename ${TARGET}), in which you’ll find some important files:
| Name | Contents |
|---|---|
| nr_TRANSFERS.FAIL / r_TRANSFERS.FAIL | Folders whose rsync has NOT finished OK (nr = non-recursive, r = recursive) |
| nr_TRANSFERS.OK / r_TRANSFERS.OK | Folders whose rsync HAS finished OK (nr = non-recursive, r = recursive) |
| transferlogs | Folder containing one log file for each rsync launched ;-) |
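To check a finished run, a short sketch can count the entries in the OK/FAIL files and print the folders whose rsync failed, ready to be re-run. summarize_run is a hypothetical helper; it assumes the “<exit code> : <folder>” line format written by the launcher.

```bash
#!/bin/bash
# Hypothetical helper: summarize the OK/FAIL files in a log folder and
# list the folders whose rsync failed.
summarize_run() {
    local logdir="$1" f
    for f in "$logdir"/*_TRANSFERS.OK "$logdir"/*_TRANSFERS.FAIL; do
        [ -f "$f" ] || continue                      # skip unmatched globs
        printf '%s: %s\n' "$(basename "$f")" "$(grep -c '' "$f")"
    done
    # Lines look like "<exit code> : <folder>"; keep only the folder
    cat "$logdir"/*_TRANSFERS.FAIL 2>/dev/null | sed 's/^[0-9]* : //'
}
```

For example, `summarize_run ./data` prints one count line per OK/FAIL file found in ./data, followed by the failed folders, one per line.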
You’ll find more info here: