Pastable version of Screen Scraping Google with Hpricot and Watir
<pre class='prettyprint' language='ruby'>require 'rubygems'
require 'hpricot'
require 'watir'

# Navigate to Google in a new IE instance
# Search for the following: pirates vs ninjas
# Iterate through the first 10 pages of Google results
# On each page collect the following
#   The blue page title
#   The green url
# Output all the data into a file named test_output.txt
#   Format "Title - URL"

# Prepare Watir
url = "http://www.google.com/"
browser = Watir::Browser.new
browser.goto url

# Input our search term: "pirates vs ninjas" and click the search button
browser.text_field(:name, "q").set "pirates vs ninjas"
browser.button(:name, "btnG").click

results = Array.new

# We loop through 10 pages of results
for page in 1..10
  # Bring Watir's HTML into Hpricot so we can easily process it
  parsed = Hpricot(browser.html)

  # Look in the LIs (class g)
  page_results = (parsed/"li.g").map do |ele|
    {
      # Grab the title from the inner_text of A (Blue Text)
      :title => ele.at("a").inner_text,
      # Grab the URL from the inner_text of Google's citation (Green Text)
      :url => (ele/"//cite").first.inner_text
    }
  end
  results << page_results

  # Use Watir to head over to the next page unless we are done
  browser.link(:text, (page+1).to_s).click unless page >= 10
end

# Time to write to file; the block form closes the file when it finishes
File.open("test_output.txt", "w") do |outfile|
  # Results are organized by page.
  results.each do |page_results|
    # Write each individual SERP's title and url
    page_results.each do |serp|
      # Clean the nasty - ## - stuff from the URL text
      outfile.puts serp[:title] + " - " + serp[:url].gsub(/ -.*-/, '')
    end
  end
end</pre> <a href="http://www.refactormycode.com/codes/673-screen-scraping-google-with-hpricot-and-watir" style="color:#fff" title="As seen on RefactorMyCode.com"><img alt="Small_logo" src="http://www.refactormycode.com/images/small_logo.gif" style="border:0" /></a>
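The last step of the script strips the " - ## - " residue from the green citation text before writing "Title - URL". Below is a minimal sketch of just that formatting step, run on made-up data with no browser involved. The helper name `format_serp` and the sample values are assumptions for illustration and are not part of the original script.

```ruby
# Hypothetical helper (not in the original script): formats one result hash
# as "Title - URL", removing everything from the first " -" through the
# last "-" in the URL text.
def format_serp(serp)
  "#{serp[:title]} - #{serp[:url].gsub(/ -.*-/, '')}"
end

# Made-up sample data shaped like the hashes the scraper collects.
sample = [
  { :title => "Pirates vs. Ninjas", :url => "www.example.com/pvn - 12k -" },
  { :title => "Ninja FAQ",          :url => "www.example.org/faq" }
]

sample.each { |serp| puts format_serp(serp) }
```

Because `.*` is greedy, the pattern removes everything up to the last hyphen that follows a " -", so citation text containing further spaced hyphens would lose more than the size marker. URLs with hyphens but no preceding space are left alone.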