Saturday, April 18, 2009

Fast Parallel Downloading (for apt-get)

I'm rebuilding an Ubuntu server. Normally apt-get downloads one file at a time, which can get dull when you're installing 598 files. I found the tool "apt-fast", which downloads one or two files quickly by using multiple streams per file. This is somewhat sketchy: it requires installing additional software, assumes the pieces get spliced back together correctly, and doesn't gracefully handle network problems.

I have a solution: xargs

Xargs walks on water. It is incredibly useful. In a nutshell, it runs a single command over a list of arguments. I'll post a lot more about it later, but here's how to speed up apt-get:

cd /var/cache/apt/archives/
apt-get -y --print-uris install ubuntu-desktop^ > debs.list
egrep -o -e "http://[^\']+" debs.list | xargs -l3 -P5 wget -nv
apt-get -y install ubuntu-desktop^

Replace "ubuntu-desktop^" with whichever task or package you want. Since ubuntu-desktop is a task (a huge collection of packages), the trailing "^" is required (and magic).

The options tell xargs to pass three URLs to each wget invocation (-l3) and to run five invocations in parallel (-P5). These settings are arbitrary, but they provide a nice speedup while not hammering the Ubuntu repository servers too hard.
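For context, each line that `apt-get --print-uris` emits contains a quoted URI followed by the filename, size, and checksum, which is why the egrep above grabs everything from "http://" up to the closing quote. A minimal sketch using a made-up package line (the URL and checksum here are hypothetical):

```shell
# A hypothetical line in the format "apt-get --print-uris" produces:
# 'URI' filename size checksum
line="'http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.2-2_i386.deb' hello_2.2-2_i386.deb 26408 MD5Sum:0123456789abcdef"

# Same extraction as in the recipe above: match from "http://" up to
# (but not including) the closing single quote.
echo "$line" | egrep -o -e "http://[^']+"
# → http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.2-2_i386.deb
```

The extracted URLs, one per line, are exactly what xargs feeds to wget in batches.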

6 comments:

  1. Trying to use your idea for KDE on PPC Lenny:

    apt-get -y --print-uris install kde^ => no results

    Without the '^' things ran better (I didn't understand your magic at first anyway :).

    It made me notice that my sources.list was bloated and included an FTP server that wget didn't like conversing with (there were a lot of "error in server greetings").

    Thanks to netselect-apt for choosing servers for sources.list.

    [code]

    cd /var/cache/apt/archives/
    apt-get -y --print-uris install $x | egrep -o -e "http://[^\']+" | xargs -l3 -P5 wget -nv
    apt-get -y install $x

  2. Nice idea; I'm not sure it really speeds up an upgrade, which is what I wanted to use it for, but anyway. I made some changes, and this is what I ended up with:

    (apt-get -y --print-uris $@ | egrep -o -e "http://[^\']+" | xargs -r -l3 -P5 wget -nv -P "/var/cache/apt/archives/") && apt-get $@

    All the output is piped; there does not seem to be any need for a temporary file. The working directory isn't changed. I added the "-r" option to xargs so that things work when there's nothing to download. I put it in a file and use apt-get update && ./fapt.sh upgrade to quickly install all upgrades. Now if only the update were faster (parallel?).

  3. Hi John, thanks a lot for the ideas and the code.

    I was told that downloading 3 GB of packages would take 4 days.

    With your solution above, it will be done in just a few hours.

    I've shared your solution on Launchpad for the benefit of others as well :
    https://bugs.launchpad.net/ubuntu/+source/apt/+bug/313680?comments=all

    Thanks again.

  4. Hello!

    I had the following problem:

    For example, aptitude install kde makes aptitude start two simultaneous downloads for the first two packages. However, both always get stuck at 0%.

    aptitude download [somesinglepackage] works fine.

    So I tried your script using xargs -l1 -P1, and now I could download all the packages at once.

    Any idea why I have to do this?

    Thanks for any input / Patrik

    Replies
    1. No idea Patrik, sorry. I'd assume the xargs speed would be the same as "aptitude download."

      thanks for the post!
