
Hello,
I'm a web-scraping enthusiast and I write short one-liners in bash using sed, awk, perl, grep, tail, head, tr, and similar programs. Here's a really cool Perl one-liner that extracts the text inside XML (HTML) tags. You should try it. Can you make it shorter, or more powerful?
Cheers,
Guillaume

curl http://www.cnn.com | perl -ne 'm/>([^<].*?[^>])<\// && print$1."\n"'

Refactorings

V

November 13, 2007 22:57

I didn't test it, so this could be wrong.

curl http://www.cnn.com | perl -ne 'm/>([^<>]*?)<\// && print$1."\n"'

griflet

November 20, 2007 18:00

Tested. Works. I also added a sed command to remove blank lines. Does anyone care to fold that into the Perl one-liner, for sport?

curl http://www.cnn.com | perl -ne 'm/>([^<>]*?)<\// && print$1."\n"' | sed -e '/^$/d'

pascal.charest

February 5, 2008 21:19

Here is another version.

The curl -s flag enables silent mode, so no progress bar appears in your terminal output.

Using +? instead of *? removes a lot of the empty lines that were matched by a succession of tags.

curl -s http://www.cnn.com | perl -ne 'm/>([^<>]+?)<\// && print$1."\n"'

Antonio Pires de Castro Júnior

December 29, 2009 17:58

A single line of Ruby that reads an XML file, counts the occurrences of each value, and shows how many times each one appears, sorted. Nice work on the Perl one-liners.

Since the result shows each value and how many times it appears, I used the ";" character as a separator, so you can save the output to a file with a .csv extension and open it in a spreadsheet. Then you can build your charts. Enjoy it.

Just copy the line below into an empty file, change the file path (or use ARGV[0] in its place and pass the path when you run the script), and run it: ruby programa.rb

dados = File.read('/usr/local/teste/xml-cvs/ssp.xml').scan(/>(.*)</).flatten; dados.uniq.sort.each { |d| puts "#{d};#{dados.count(d)}" }
