Sunday, December 15, 2013

How to download files automatically with Python + Cygwin

Coursera offers some excellent lectures, and Startup Engineering is my favorite.
From this course you can learn how to make use of IaaS, especially AWS.
AWS makes it easy to get access to computing on good hardware without buying a PC.

However, I want to study the lessons offline,
because I have no network access outside my room.
But downloading all of the content by clicking through a browser takes a lot of time...



So I wrote a Python script that fetches all the files from the site.
The script below is one example of how to do this.

I have confirmed that it works under Cygwin + Python 2.5.



import os, re, subprocess, sys

# ---------------------
def wgetFunc(destFile, url):
    # Pass the arguments as a list instead of a shell string, so that
    # characters such as "?" and "&" in the URL are not touched by the shell
    cmd = ["wget", "--no-check-certificate", "-O", destFile, url]
    print " ".join(cmd)
    return subprocess.call(cmd)
# ---------------------

# Parameters ----------
destDir       = "/cygdrive/d/tmp/"
searchStr     = "https://class.coursera.org/startup-001/lecture/download.mp4?lecture_id="
listUrl       = "https://class.coursera.org/startup-001/lecture/index"
localListFile = "index.html"
# ---------------------


print "[1] Create destination directory"
if not os.path.isdir(destDir):
    os.makedirs(destDir)

print "[2] Get index file"
wgetFunc(destDir + localListFile, listUrl)

print "[3] Parse html to get file list"
# Capture only the numeric lecture_id after searchStr, so that closing
# quotes, trailing HTML and the newline do not end up in the URL
idPattern = re.compile(re.escape(searchStr) + r"(\d+)")
F = open(destDir + localListFile, "r")
urlList = []
for line in F:
    for lectureId in idPattern.findall(line):
        urlList.append(searchStr + lectureId)
F.close()

for url in urlList:
    print url

print "[4] Confirm whether to download or not"
answer = raw_input("Download? Y/N  :  ").strip().upper()
if answer == "N":
    print "Download cancelled."
    sys.exit()
elif answer == "Y":
    print "Downloading..."
    # Save the files as 0.mp4, 1.mp4, ... in the order they appear on the page
    for i, url in enumerate(urlList):
        wgetFunc(destDir + str(i) + ".mp4", url)
else:
    print "Please input Y or N."
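The script above targets Python 2.5 and shells out to wget. On Python 3, the parse and download steps can be sketched with the standard library alone (`re` and `urllib.request`), so wget is not needed. This is a minimal sketch under assumptions, not a tested replacement: the function names `extract_lecture_urls`, `build_download_plan`, `download_all` and `main` are hypothetical, the lecture links are assumed to have the same `download.mp4?lecture_id=` form as `searchStr`, and the index page is assumed to be reachable without logging in (this sketch does no login or cookie handling). Note also that recent Python 3 versions verify HTTPS certificates by default, unlike the `--no-check-certificate` option used with wget above.

```python
import os
import re
import urllib.request

# Same prefix as searchStr in the Python 2.5 script
SEARCH_STR = "https://class.coursera.org/startup-001/lecture/download.mp4?lecture_id="


def extract_lecture_urls(html):
    """Return the download URLs found in the HTML, in page order, without duplicates."""
    pattern = re.compile(re.escape(SEARCH_STR) + r"(\d+)")
    urls = []
    for lecture_id in pattern.findall(html):
        url = SEARCH_STR + lecture_id
        if url not in urls:
            urls.append(url)
    return urls


def build_download_plan(urls, dest_dir):
    """Pair each URL with a numbered file name (0.mp4, 1.mp4, ...) like the original script."""
    return [(url, os.path.join(dest_dir, "%d.mp4" % i)) for i, url in enumerate(urls)]


def download_all(plan):
    """Download each URL to its destination file (no login or retry handling)."""
    for url, dest in plan:
        print(url, "->", dest)
        urllib.request.urlretrieve(url, dest)


def main(list_url, dest_dir):
    """Fetch the index page, list the lecture URLs, and download them after confirmation."""
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(list_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    plan = build_download_plan(extract_lecture_urls(html), dest_dir)
    for url, _ in plan:
        print(url)
    if input("Download? Y/N  :  ").strip().upper() == "Y":
        download_all(plan)
```

You would run it with something like `main("https://class.coursera.org/startup-001/lecture/index", "/cygdrive/d/tmp/")`, using a directory path your Python understands (a `/cygdrive/...` path under Cygwin's Python, a `D:\tmp` style path under native Windows Python). Unlike the original loop, this version skips duplicate links in case the page links the same lecture more than once.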


To do the same thing on Mac OS X, you first need to install the wget command, because it is not included with the OS by default.
This is easy to do by building it from source with the following commands.

* Download the wget source archive.
 $ curl -O http://ftp.gnu.org/pub/gnu/wget/wget-1.13.4.tar.gz

* Extract the archive.
 $ tar zxvf wget-1.13.4.tar.gz

* Build and install it. Only the install step needs sudo.
 $ cd wget-1.13.4
 $ ./configure --with-ssl=openssl
 $ make
 $ sudo make install

* You can check whether the installation succeeded by typing:
 $ wget
 If it is installed, the following message is returned:
 wget: missing URL
 Usage: wget [OPTION]... [URL]...

 Try `wget --help' for more options.
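Because the Python 2.5 script calls wget through subprocess, it fails if wget is not on the PATH. The check above can also be done from Python before downloading anything. This is a minimal sketch in Python 3.7+ (for `shutil.which` and `subprocess.run(..., capture_output=True)`), not the 2.5 used above; the helper names `find_tool` and `require_wget` are hypothetical.

```python
import shutil
import subprocess
import sys


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


def require_wget():
    """Exit with a message if wget is missing; otherwise print the first line of its version info."""
    path = find_tool("wget")
    if path is None:
        sys.exit("wget was not found on PATH; install it first.")
    # The first line of `wget --version` names the installed version
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    print(out.splitlines()[0] if out else path)
    return path
```

Calling `require_wget()` at the top of a download script stops early with a clear message instead of failing on every wget call.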
