12d6a2e5971e661ee2bf2994526e848d

I needed to read in a CSV file with an enormous number of columns (6194 to be precise) and delete the columns that were entirely 99s to signify no data. I transposed the array and then went through it in reverse using slice to remove the columns that averaged 99. I also needed to keep columns that contained "item" in the column header.

It's pretty kludgy and I'd like to know how to improve it.

# sum an an array's elements
def sum_row(a)
  sum = 0
  a.each do |item |
    sum = sum + item.to_i
  end

  return sum
end  





  transposed_array = array_of_condition.transpose()
  j = transposed_array.size
  transposed_array.reverse_each do |row|
    sum = 0
    temp_row = row.dup
    temp_row.slice!(0) # remove the column header
    row_sum = sum_row(temp_row)
    row_mean = (row_sum / temp_row.size)
    puts "Row mean = " + row_mean.to_s + " Row sum = " + row_sum.to_s
     if  ((row_mean == 99) and (row[0].to_s.include?("item")== false ))
       transposed_array.slice!(j-1)
       puts "Column deleted: " + j.to_s
     end
     j = j - 1
  end

  
  final = transposed_array.transpose()

Refactorings

No refactoring yet !

D7a31f22c11776898826f7c1ed0ff80a

mischamolhoek.myopenid.com

December 18, 2008, December 18, 2008 08:35, permalink

1 rating. Login to rate!

for starters you can replace the sum_row function for a inject call.
As you can see i also don't slice the first element off, and use [1..-1] instead.

row_sum = row[1..-1].inject(0){|s, e| s+e}
D41d8cd98f00b204e9800998ecf8427e

reima

December 23, 2008, December 23, 2008 02:50, permalink

1 rating. Login to rate!

Assuming every row has value 99 just because the mean is 99 is wrong (in general). Computing the mean also hides what you really intend to do (finding rows with only 99s). Using Enumerable#all? makes the code cleaner, shorter, and correct all at the same time. Be advised that this code still copies the array twice (when calling transpose) and again every column (when slicing). If you are concerned about efficiency then you should try to solve the problem in a way which avoids those copies.

final = array_of_condition.transpose.delete_if do |column|
  !column[0].include?("item") && column[1..-1].all? {|entry| entry.to_i == 99}
end.transpose
Be072eb0e9f6f81c20541150cabce3ab

Ollie

December 23, 2008, December 23, 2008 05:57, permalink

1 rating. Login to rate!

The ideal solution (from a code maintenance standpoint) would express the domain using classes and objects. Whether the effort involved to do such a thing is justified depends on many things, most importantly, the probability that your script will need to be modified with additional capabilities in future.

OP, what is the purpose of your question -- to what extent would you like your code refactored?

12d6a2e5971e661ee2bf2994526e848d

srboisvert.myopenid.com

December 25, 2008, December 25, 2008 15:11, permalink

No rating. Login to rate!

It's mostly that I just want to see how people would refactor it for learning purposes. I'm not a ruby wizard so my coding style if mostly c++ with the use of ruby libraries. The code already accomplished the specific purpose I set out to create it for (fixing a fubar data file) but I will probably need it, or a variant of it, again at some point.

04ced9c763644293d1bd77ee75185da8

spiral

January 14, 2009, January 14, 2009 10:29, permalink

1 rating. Login to rate!

im new to this and im trying to get a csv file and convert it to an excel sheet but i need to keep the header the header is always the 1st line how do i define this iv tryed using a counter ne help would be great

12d6a2e5971e661ee2bf2994526e848d

srboisvert.myopenid.com

January 14, 2009, January 14, 2009 13:26, permalink

No rating. Login to rate!

Hi Spiral. You're asking the wrong the type of question for this place. It is about refactoring existing code. Your best bet would be to try a place like stackoverflow.com

Your refactoring





Format Copy from initial code

or Cancel