55502f40dc8b7c769880b10874abc9d0

Here's a quick and dirty bash script that takes a file and looks for a particular line, then loops through those lines and splits the contents to get the 8th column and create an sql insert statement with it. I'm not looking to make it more generic, but the for loop is unbearably slow. How to improve it?

SCRIPT="gen_${1}.sql"
echo "--gen_${1}.sql" > $SCRIPT
typeset -i counter
counter=0

echo "Get all lines that start with D|4| (this may take a while)"
LINES=$(grep "D|4|" $1)

echo "Generate insert statements to $SCRIPT"
for LINE in $LINES; do
  REFNO=$(echo $LINE | awk '{split($0,a,"|");print a[8]}')
  echo "insert into CHEQUE (datecomptable, noreference) values (to_date('2007/11/30','YYYY/MM/DD'),'${REFNO}');" >> $SCRIPT
        counter=counter+1
done

echo "commit;" >> $SCRIPT
echo "${counter} records created in $SCRIPT"

Refactorings

No refactoring yet !

Ff0bd1a8c9502aac62868cabf40b2b7d

pascal.charest

February 8, 2008, February 08, 2008 05:23, permalink

No rating. Login to rate!

Not sure you'll get that great of a speed up, but you should modify line 7 to be like following. The partern then require a start of line.

LINES=$(grep "^D|4|" $1)
5a00a3a98dcf6f9cd717440fd2b606e5

Eineki

February 8, 2008, February 08, 2008 10:39, permalink

No rating. Login to rate!

Just a quick thought
Maybe cut is more efficent than awk for a simple split.
If I'm right, line 11 REFNO=$(echo $LINE | awk '{split($0,a,"|");print a[8]}') should change in

NOTE: Assuming than awk start counting fields from 1

REFNO=$(echo $LINE | cut -d\| -f8)
55502f40dc8b7c769880b10874abc9d0

furtive

February 8, 2008, February 08, 2008 19:49, permalink

No rating. Login to rate!

It didn't make a difference performance wise but I really like the clarity of the cut command, I didn't know it existed. Thanks!

5a00a3a98dcf6f9cd717440fd2b606e5

Eineki

February 10, 2008, February 10, 2008 00:57, permalink

No rating. Login to rate!

Hi, the following code don't fire a lot of subshell and this should do a big difference.
With cut you don't have to process line per line the result of the grep and sed "decorates" the single lines
creating the sql statements you need.

Let me know if this solution works faster than the original one

Just A test I've done:
harvesting from a file of near 500000 lines tooks about 5 seconds
on a athlon64 3800 with 1Gb Ram as you can see in the time output

$ time ./test database
Generate insert statements to gen_database.sql (this may take a while)
469573 records created in gen_database.sql

real 0m5.633s
user 0m4.996s
sys 0m0.380s

SCRIPT="gen_${1}.sql"
echo "--gen_${1}.sql" > $SCRIPT
typeset -i counter
counter=0
INSERT_STATEMENT="insert into CHEQUE (datecomptable, noreference) values (to_date('2007\/11\/30','YYYY\/MM\/DD'),'&');"

echo "Generate insert statements to $SCRIPT (this may take a while)"
$(grep "D|4|" $1 | cut -d\| -f8 | sed -e "s/.*/${INSERT_STATEMENT}/" >> $SCRIPT)
counter=$(expr `wc -l ${SCRIPT} | cut -f1 -d\ ` - 1)

echo "commit;" >> $SCRIPT
echo "${counter} records created in $SCRIPT"
E72ac5fa4658d70354cc7024cb515179

Andrew Gilmartin

February 11, 2008, February 11, 2008 16:34, permalink

No rating. Login to rate!

The only bash change I would make is to replace the use of the back-tick with a nested expression.

counter=$(expr $(wc -l $SCRIPT | cut -f1 -d\ ) - 1)
D41d8cd98f00b204e9800998ecf8427e

grrrreg

February 19, 2008, February 19, 2008 14:44, permalink

No rating. Login to rate!

I think you should replace the for loop with an awk command. Awk knows how to grep.
note: I use a "while read" loop with "echo" because printing ' inside awk is such a mess.

SCRIPT="gen_${1}.sql"
echo "--gen_${1}.sql" > $SCRIPT

typeset -i counter
counter=0

echo "Generate insert statements to $SCRIPT (this may take a while)"

awk -F \| ' /D\|4\|/ { print $8} ' $1 | while read DATE;do
   echo "insert into CHEQUE (datecomptable, noreference) values (to_date('2007/11/30','YYYY/MM/DD'),'${DATE}');" >> $SCRIPT

done

counter=$(grep -v -- "--${SCRIPT}" ${SCRIPT} | wc -l)

echo "commit;" >> $SCRIPT
echo "${counter} records created in $SCRIPT"

Your refactoring





Format Copy from initial code

or Cancel