Having spent some time recently on multi-threading a process to download a few thousand files from the Internet, I started to notice some interesting differences between cURL and the indomitable wget.
Some observations:
1. The out-of-the-box configuration for cURL leads to far more refused connections, empty content or HTTP 302 (redirect) responses than wget. Whether this is because wget follows redirects by default and cURL doesn't, or because of the way each tool identifies itself (its User-Agent string), I don't know.
2. cURL seemed a little faster, but not in a significant way.
3. When asking both tools to download the file only if it is newer, cURL truncates the old file and puts header information into the file instead. Wget behaves properly. I assume there is an override for this behaviour, but the hackish workaround is to keep two copies of your files - an archive copy to reference for modification information, and the new copy if cURL sees there is a newer version available.
4. I find that cURL is probably more powerful (because of complexity), but wget is easier to get working Right Now.
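The redirect handling and download-only-if-newer behaviour from points 1 and 3 can also be sketched without either tool. What follows is a minimal Python sketch using only the standard library, not the approach used in the tests above; the function name `fetch_if_newer` and its parameters are made up for illustration. It sends the local file's modification time as an `If-Modified-Since` header and writes to a temporary file first, so the old copy is never truncated.

```python
import os
import tempfile
import urllib.error
import urllib.request
from email.utils import formatdate


def fetch_if_newer(url, dest, timeout=30):
    """Download url to dest only if the server reports a newer copy.

    Returns True if dest was written, False if the server answered
    304 (Not Modified). The name and signature are hypothetical.
    """
    request = urllib.request.Request(url)
    if os.path.exists(dest):
        # Send the local file's mtime as an HTTP date (always GMT).
        mtime = os.path.getmtime(dest)
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    try:
        # urlopen follows redirects such as HTTP 302 by default.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: leave the existing file alone
            return False
        raise
    # Write to a temporary file in the same directory, then rename it
    # over dest, so dest is never left truncated or half-written.
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)
        os.replace(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

This reads the whole body into memory, which is fine for small files but would need streaming for large ones. It also assumes the server honours `If-Modified-Since`; one that ignores it will simply send the file again.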
And finally, if you are attempting anything similar, threading is a MUST - a single thread doing this kind of operation will block for timeouts, redirects and the like.
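Introducing N threads will get the job done in almost 1/N of the time, because most of that time is spent waiting on remote servers rather than moving data. Your connection is probably big enough these days, but dealing with thousands of remote servers is too unpredictable to sit waiting on a single thread.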
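The threading idea can be sketched as follows. This is a minimal Python example built on the standard library's thread pool, not the script used for the downloads above; the names `fetch` and `fetch_all`, the 30-second timeout and the pool size of 16 are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch(url, timeout=30):
    """Return the response body for url; raises on errors or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, worker=fetch, max_workers=16):
    """Run worker(url) for every URL on a pool of threads.

    Returns (results, errors), two dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported.
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One slow or dead server must not stop the rest.
                errors[url] = exc
    return results, errors
```

A thread stuck on a timeout only holds up its own URL while the other workers keep going, which is where the near-1/N speed-up comes from. The `worker` parameter lets you swap in a different download function, for example one that shells out to wget or cURL.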