Railscasts crawler (Download all screencasts easily)
I wrote this script because I found it really annoying to download each screencast by hand every time I needed one. So I ended up writing a script, i.e. a crawler, that downloads all the screencasts automatically without the hassle.
The script needs the hpricot gem. If you don't have it, just type this command in your terminal:
$ gem install hpricot
–Then put this script in your project's /lib folder
–Change $path in the script to the folder where you want all the screencasts saved
–Go to your project's development environment (script/console) and run the script with these commands:
video = Railscasts.new # new Railscasts object
video.start # will start downloading all screencasts from Railscasts
Note:
- If you stop the script manually partway through, the next run will not download again the screencasts that are already on your hard disk.
- All logs are written to Railsproject/log/railscasts.log.
- Exceptions raised while fetching pages or downloading files are rescued and logged.
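The "skip what is already on disk" behaviour from the first note can also be checked in plain Ruby rather than through `ls | grep`. This is only a minimal sketch: `pending_downloads` is a hypothetical helper name, not part of the script, and the example.com links are placeholders.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns the links whose file is not yet present in download_dir.
# (pending_downloads is a hypothetical helper, not part of the original script.)
def pending_downloads(links, download_dir)
  links.reject do |link|
    file = File.basename(link)                  # last path segment, e.g. "a.mov"
    File.exist?(File.join(download_dir, file))  # already saved, so skip it
  end
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "a.mov"))      # pretend a.mov was downloaded earlier
  links = ["http://example.com/a.mov", "http://example.com/b.mov"]
  p pending_downloads(links, dir)               # prints ["http://example.com/b.mov"]
end
```

A check like this only looks at whether the file exists, so a file that was cut off mid-download would still count as present.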
Improvements/Suggestions are appreciated.
Thanks
And yes, the script:
# Author : Akshay Gupta
# file: railscasts.rb
# First check you have all gems installed. Place the script in /lib folder and run the script.
# I don't have expertise in ruby, please tell how it can be improved.
# Change the path accordingly, where you want to save the screencasts
# My working env is on MacOS, one needs to make some changes if running on Windows
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'

$log = Logger.new('log/railscasts.log')
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false

class Railscasts
  attr_accessor :url

  def initialize
    @@page = 1
    @@url = "http://railscasts.com/episodes?page="
    # Downloading begins when the caller invokes start
  end

  # Builds the URL of the current listing page
  def url
    @url = @@url+@@page.to_s
  end

  # Processes the current page, then moves on until next_page sets $stop
  def start
    url
    build_doc
    screencasts_links
    download_screencasts
    next_page
    if !$stop
      start
    else
      puts "Successfully done. Enjoy all the screencasts"
    end
  end

  def build_doc
    begin
      $log.info("*********Fetching #{@url}***********")
      @doc = Hpricot(open(@url))
    rescue Exception => e
      $log.debug("Problem fetching #{e}")
    end
  end

  def screencasts_links
    begin
      @download_links = (@doc/".download/a[1]").collect {|a| (a.search("[@href]").first[:href])}
      $log.info(" All Download links on this page :\n #{@download_links}")
    rescue
      $log.info("Problem in download links")
    end
  end

  def download_screencasts
    @download_links.each do |mov|
      begin
        file = mov.split('/').last
        res = `cd #{$path}; ls | grep "#{file}"`
        if res.empty? # was "if !res", which is never true: backticks always return a String
          $log.info("Now downloading file #{file}")
          `cd #{$path}; wget "#{mov}"`
          if $?.success? # check wget's exit status; its output String is always truthy
            $log.info("Successfully Downloaded #{file}")
          end
        else
          $log.info("Already downloaded #{file}")
        end
      rescue Exception => e
        $log.info("problem downloading file #{e}")
      end
    end
  end

  def next_page
    if @@page < 17
      @@page += 1
    else
      $log.info("All screencasts downloaded, Mission accomplished!!")
      $stop = true
    end
  end
end
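The script walks the listing pages by having start call itself until next_page sets $stop. The same walk can be written as a plain loop, which is one possible way to avoid the global flag and the class variables. This is only a sketch: `page_urls` and `each_page` are hypothetical names, and the page count of 17 is simply the limit hard-coded in the script.

```ruby
BASE_URL  = "http://railscasts.com/episodes?page="  # same URL prefix as the script
LAST_PAGE = 17                                       # limit hard-coded in next_page

# Builds the listing URLs for pages 1..last_page.
def page_urls(base = BASE_URL, last_page = LAST_PAGE)
  (1..last_page).map { |page| "#{base}#{page}" }
end

# Yields each page URL to the block. An error on one page is reported
# and the loop moves on to the next page. Returns the URLs that failed.
def each_page(urls)
  failed = []
  urls.each do |url|
    begin
      yield url
    rescue StandardError => e
      warn "Problem with #{url}: #{e.message}"
      failed << url
    end
  end
  failed
end
```

The block passed to each_page would do what build_doc, screencasts_links and download_screencasts do for a single page.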
Great script. Why don’t you create a gist of it? http://gist.github.com
Thanks buddy for the appreciation and suggestion. OK, will do it soon, as I'm working on some other scripts altogether.
Simple and effective. I just changed the "if !res" line to "if res.empty?" because the script was telling me that I already have all the casts! Thank you for the script, and keep up the good work!
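The reason the change in this comment matters: in Ruby, backticks return the command's output as a String, and every String, even an empty one, is truthy. So "!res" is false whether or not grep found the file. A minimal sketch of the difference, where the two method names are hypothetical and grep_output stands in for the backtick result:

```ruby
# grep_output stands in for the String that backticks return from `ls | grep file`.

# Check from the "if !res" line: `!` on any String, even "", is false.
def needs_download_original?(grep_output)
  !grep_output
end

# Check from the "if res.empty?" suggestion: download only when grep printed nothing.
def needs_download_fixed?(grep_output)
  grep_output.strip.empty?
end

puts needs_download_original?("")        # false: the file is never downloaded
puts needs_download_fixed?("")           # true: no match, so download it
puts needs_download_fixed?("001.mov\n")  # false: grep found the file
```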