8d8978eba1922a74b91c4b361c7706cc

I'm trying to implement a solution with a radixSort algorithm. The input will be T lines each containing a number N, where 0 <= T, number_of_digits(N) <= 10^6.

I made a naive implementation, I think this can be optimized a lot, but I can't find a way to avoid list replication (my haskell foo is not that evolved yet).

I managed to optimize the input processing time using ByteString (from 58 secs to 7 secs on a compiled source with a -O3 flag), but still the algorithm is slow, (It must execute at least in 5 secs).

import Control.Monad (mapM_, liftM)
import Data.List (sort)
import qualified Data.ByteString.Lazy.Char8 as L

main = L.interact processInput

processInput :: L.ByteString -> L.ByteString
processInput = L.unlines . map printInt . radixSort . map (maybe 0 fst . L.readInt) . L.lines
  where
    printInt = L.pack . show

radixSort :: [Int] -> [Int]
radixSort xs = sortOnDigit 1 xs
  where
    sortOnDigit n ls = 
      let 
        buckets = createBucketsFor $ (listWithDigit n ls)
        result  = concat buckets
      in
        if length (buckets !! 0) == length xs
        then result
        else sortOnDigit (n + 1) result
    

listWithDigit :: Int -> [Int] -> [(Int, Int)]
listWithDigit n xs 
  | n <= 0 = error ("Index " ++ show n ++ " not valid")
  | otherwise = zip xs $ map getDigit xs
  where
    getDigit x = (x `div` (10^(n-1))) `mod` 10

createBucketsFor :: [(Int, Int)] -> [[Int]]
createBucketsFor xs = zipWith (elementsEndingOn) [0..9] (repeat xs)
  where
    elementsEndingOn :: Int -> [(Int, Int)] -> [Int]
    elementsEndingOn n xs = [val | (val, digit) <- xs, digit == n]

Refactorings

No refactoring yet !

55ca8e1248605bf5e1819ce2c47c160f

sargon

November 10, 2009, November 10, 2009 09:37, permalink

No rating. Login to rate!

why radixsort ?
Doesn't fit mergesort better ?

34db50b7ce2e115afadf5a765b950739

Thomas Salvador

November 15, 2009, November 15, 2009 00:06, permalink

No rating. Login to rate!

hi sargon:

radixsort is faster than mergesort, as it uses knowledge of the structure of data to sort.

speed of radixsort is O(mn) for n keys of length m => O(n) here for that constant m.
mergesort would perform as always with O(n log n), hence significant slower.

regards,
thomas.

55ca8e1248605bf5e1819ce2c47c160f

sargon

November 16, 2009, November 16, 2009 23:44, permalink

No rating. Login to rate!

joa.
But this is only a landuau notation trick,
as long as m > log n the dropped constant is bigger.

And haskell is a functional language thus it is dump, sry,
to not implement a algorithm which scale on parallel architectures.

http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell

And for the performance of this algorithm, I wouldn't transform the
string into numbers and back. Strings and thus lists are very effective in haskell ;)

Dc61c1b2045efa54375188bdb6a33945

Jedai

November 21, 2009, November 21, 2009 18:29, permalink

No rating. Login to rate!

Ok... radix sort by itself is doubtlessly faster than mergesort (at least in your case), but your implementation is very slow.
For instance your createBucketsFor() traverse your list 10 times each call whereas a good radixsort would only do it once, also your test of termination (length (buckets !! 0) == length xs) have to traverse the whole bucket every time (is in O(n), even if GHC is smart enough to optimize away the length xs) whereas you could replace it by (all null . map (buckets !!) $ [1..9]) which would be O(1). Also you try to do buckets by the decimal representation of your Int instead of using the underlying binary representation.

In other words, a small test show your implementation as at least 3 times slower than the simple mergesort implemented as sort in the GHC standard library (Data.List). Of course a good radix sort can do even better and it so happen that there already is a radix sort in the standard library, though it doesn't present itself like that you can use Data.IntSet ! It is far faster than sort() and incomparably better than your implementation.

import qualified Data.IntSet as IS

radixSort = IS.toList . IS.fromList
Dc61c1b2045efa54375188bdb6a33945

Jedai

November 21, 2009, November 21, 2009 20:37, permalink

No rating. Login to rate!

To have an idea of the performance, I used Criterion to compare sorting times of [999999, 999998..1] :
my radixSort took 0.5s, mergesort took 1s and your implementation took 6.5s (so 13 times slower) and it only get worse (or better depending on which sort you use) with more Ints.

928b47489462b9301d8e7026d9b074cf

SemenovEvgeni

July 22, 2010, July 22, 2010 11:26, permalink

No rating. Login to rate!

it was very interesting to read.
I want to quote your post in my blog. It can?
And you et an account on Twitter?

Your refactoring





Format Copy from initial code

or Cancel