Railscasts crawler (Download all screencasts easily)

I wrote this script because I found it really annoying to download each screencast by hand every time I needed one. So I ended up writing a crawler that downloads all the screencasts automatically, without the hassle.

It's in Ruby of course :-) and requires only the Hpricot gem.
If you don't have it, just type this command in your terminal:

$ gem install hpricot

–Then put this script in your /lib folder
–Change the path in the script to the folder where you want to download the screencasts
–Go to your project's development environment (script/console) and run the script with these commands:
video = Railscasts.new #new Railscasts object
video.start #will start downloading all screencasts from Railscasts
Note:
  1. If you stop the script partway through and run it again, it will skip the screencasts that are already on your hard disk.
  2. All logs are written to Railsproject/log/railscasts.log.
  3. Exceptions raised while fetching pages or downloading files are rescued and logged.
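
Note 1 amounts to checking the download folder before fetching each file. Here is a minimal sketch of that check. The helper name already_downloaded? and the example.com URL are made up for illustration and are not part of the script, and a temporary directory stands in for the real download path.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: true when the file named at the end of `url`
# already exists inside `dir`.
def already_downloaded?(dir, url)
  file = url.split('/').last
  File.exist?(File.join(dir, file))
end

Dir.mktmpdir do |dir|
  url = "http://example.com/videos/001_demo.mov" # placeholder URL
  puts already_downloaded?(dir, url)   # false: nothing saved yet
  FileUtils.touch(File.join(dir, "001_demo.mov"))
  puts already_downloaded?(dir, url)   # true: the file is now present
end
```

Note that a file left half-written by an interrupted download would also count as present with this kind of check.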

Improvements and suggestions are appreciated.

Thanks
And yes, here is the script:
# Author: Akshay Gupta
# File: railscasts.rb
# First check that you have all the gems installed. Place the script in the /lib folder and run it.
# I don't have much expertise in Ruby, so please tell me how it can be improved.
# Change $path to the folder where you want to save the screencasts.
# My working environment is Mac OS; some changes are needed to run it on Windows.

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'
$log = Logger.new('log/railscasts.log')
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false

class Railscasts
attr_accessor :url

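def initialize
  @@page = 1
  @@url  = "http://railscasts.com/episodes?page="
  # The crawl is started explicitly with #start, not from the constructor,
  # so that Railscasts.new followed by video.start runs it only once.
end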

def url
  @url = @@url+@@page.to_s
end

def start
  url
  build_doc
  screencasts_links
  download_screencasts
  next_page
  if !$stop
    start
  else
    puts "Successfully done :) Enjoy all the screencasts"
  end
end

def build_doc
  begin
    $log.info("*********Fetching #{@url}***********")
    @doc = Hpricot(open(@url))
  rescue StandardError => e
    # Clear the stale document so the previous page is not parsed again.
    @doc = nil
    $log.error("Problem fetching #{@url}: #{e}")
  end
end

def screencasts_links
  begin
    @download_links =
      (@doc/".download/a[1]").collect {|a| (a.search("[@href]").first[:href])}
    $log.info(" All Download links on this page :\n #{@download_links.join("\n ")}")
  rescue
    # Fall back to an empty list so the download step has nothing stale to process.
    @download_links = []
    $log.info("Problem in download links")
  end
end

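def download_screencasts
  (@download_links || []).each do |mov|
    begin
      file = mov.split('/').last
      # Check the disk directly. A backtick result is always a String and
      # therefore always truthy, so the old `!res` test never fired.
      if File.exist?(File.join($path, file))
        $log.info("Already downloaded #{file}")
      else
        $log.info("Now downloading file #{file}")
        # system returns true only when wget exits successfully;
        # -P tells wget which directory to save into.
        if system("wget", "-P", $path, mov)
          $log.info("Successfully downloaded #{file}")
        else
          $log.info("wget failed for #{file}")
        end
      end
    rescue StandardError => e
      $log.info("Problem downloading file #{e}")
    end
  end
end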

def next_page
  if @@page < 17
    @@page += 1
  else
    $log.info("All screencasts downloaded :-) , Mission accomplished!!")
    $stop = true
  end
end
end
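
The start and next_page methods walk pages 1 to 17 of the episode index by calling start recursively. The same walk could probably be written as a plain loop. This is only a sketch: page_urls, BASE_URL and LAST_PAGE are names invented here, and 17 is simply the page count hard-coded in the script.

```ruby
BASE_URL  = "http://railscasts.com/episodes?page="
LAST_PAGE = 17 # same limit as next_page in the script

# Hypothetical helper: builds the list of index page URLs to visit.
def page_urls(base = BASE_URL, last_page = LAST_PAGE)
  (1..last_page).map { |page| "#{base}#{page}" }
end

page_urls.each do |url|
  puts url # the crawler would fetch and parse each page here
end
```

A loop like this avoids the $stop global, since the range itself decides when the crawl is finished.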
  1. Great script. Why don’t you create a gist of it? http://gist.github.com

  2. Thanks, buddy, for the appreciation and the suggestion. OK, I will do it soon, as I am working on some other scripts altogether.

  3. Simple and effective. I just changed the "if !res" line to "if res.empty?" because the script was telling me that I already had all the casts! Thank you for the script, and keep up the good work!
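
The fix in comment 3 works because backticks in Ruby return the command's output as a String, and every String, even an empty one, is truthy, so "!res" is never true. A small sketch of the difference, using printf '' as a stand-in for an "ls | grep" that finds nothing (it assumes a Unix-like shell):

```ruby
res = `printf ''`   # the command prints nothing, so res is ""

puts res.class      # String
puts !res           # false: an empty String is still truthy
puts res.empty?     # true: this is the check that detects "no match"
```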
