require 'net/http'
require 'rexml/document'
require 'download.rb'
max_connections = 5 # max simultaneous connections/threads to run
nzb = []            # array to hold the parsed nzb
import_file = ""    # holds the raw contents of the imported nzb
# open the nzb and read each line
File.open(ARGV[0], "r") do |aFile|
  aFile.each_line do |line|
    import_file += line
  end
end
# process the nzb file using xml processing
# this needs to be redone as it's slow
doc = REXML::Document.new(import_file)
count = 0
# confusing to look at, but it searches for <groups><group></group></groups>
while defined? doc.root.elements.to_a[count][1][1][0]
  group = doc.root.elements.to_a[count][1][1][0]
  # searches for <segments>
  if doc.root.elements.to_a[count][3][3]
    count2 = 1
    # searches for multiple <segment> entries
    while defined? doc.root.elements.to_a[count][3][count2][0]
      segment = doc.root.elements.to_a[count][3][count2][0]
      nzb << [group, segment]
      # puts segment
      count2 += 2
    end
  else
    segment = doc.root.elements.to_a[count][3][1][0]
    # inserts them into an array, e.g. ["alt.binaries.warez", "1262522465.84379.1@news.astraweb.com"]
    nzb << [group, segment]
  end
  count += 1
end
Refactorings
Elij
January 10, 2010 01:30
When dealing with XML there are two approaches: DOM or SAX. The first loads the entire XML document into a runtime object (REXML::Document.new). The second gives serial access to the document, and should be faster and keep a flat memory footprint.
Also, REXML is notoriously slow. I don't know enough about the Ruby scene to recommend an alternative, but from here it looks like you'll need to move to SAX and to something other than REXML.
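A minimal sketch of the SAX style described above, for illustration only. It uses REXML's own stream listener so that it runs with nothing beyond the standard library; a faster parser such as Nokogiri offers a similar SAX interface. The class name NzbListener and the sample document are hypothetical, and the sketch assumes <groups> appears before <segments> inside each <file>, as in the posted code.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects [group, segment] pairs while the parser streams through the NZB.
# No document tree is built; only the current element's text is buffered.
class NzbListener
  include REXML::StreamListener

  attr_reader :pairs

  def initialize
    @pairs = []
    @group = nil     # first <group> seen in the current <file>
    @current = nil   # name of the element whose text is being collected
    @buffer = String.new
  end

  def tag_start(name, _attrs)
    @group = nil if name == 'file'
    if name == 'group' || name == 'segment'
      @current = name
      @buffer = String.new
    end
  end

  def text(data)
    @buffer << data if @current
  end

  def tag_end(name)
    return unless name == @current
    value = @buffer.strip
    if name == 'group'
      @group ||= value            # keep only the first group per file
    else
      @pairs << [@group, value]   # one pair per segment
    end
    @current = nil
  end
end

# Hypothetical sample input, shaped like an NZB file.
SAMPLE_NZB = <<~XML
  <nzb>
    <file subject="example">
      <groups><group>alt.binaries.test</group></groups>
      <segments>
        <segment bytes="100" number="1">part1@example.invalid</segment>
        <segment bytes="100" number="2">part2@example.invalid</segment>
      </segments>
    </file>
  </nzb>
XML

listener = NzbListener.new
REXML::Document.parse_stream(SAMPLE_NZB, listener)
```

After parsing, listener.pairs holds one [group, segment] pair per segment. An open File object can be passed to parse_stream in place of the string, so the file never has to be read into memory first.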
Jose
January 10, 2010 05:36
Yeah, I would look into Nokogiri for XML parsing; the difference in speed between REXML and Nokogiri is quite large. If it absolutely has to be as fast as possible, I would use the libxml library directly.
Jose
January 10, 2010 18:46
Here is something that gives the exact same output using Nokogiri. The entire import-file section is not required at all; it is replaced by one line.
require 'rubygems'
require 'nokogiri'
nzb = []
doc = Nokogiri::XML(File.read('filename.xml'))
doc.css('file').each do |file|
  nzb << file.css('group').first.content
  nzb << file.css('segment').first.content
end
bain19.myopenid.com
January 11, 2010 03:41
Awesome work Jose. I had to make one change, but it works a million times quicker. The issue was that there could be multiple segments per file, so I just chucked in another loop:
doc.css('file').each do |file|
  file.css('segment').each do |seg|
    nzb << [file.css('group').first.content, seg.content.strip]
  end
end
I was bored during Ruby class and started working on an NZB client written in Ruby. I got it working fairly well, but it's slow when it comes to parsing the XML files.
This is what I have done so far.
I have pasted only the XML parsing part.
Here is the entire program: http://ruby.pastebin.ca/1744726
Example NZB file: http://ruby.pastebin.ca/1744710