8571728785e5e350ddb58cd458533b12

I'm trying to create Bit.ly style URLs identifiers given an input number. These can be completely sequential, and should be able to be generated without duplicates in O(1) time. Basically I don't want to have to check if each code is unique, and they need to be small. I have a really shoddy implementation now that SEEMS to work, but it looks like there should be a better way of accomplishing this.

Here's some sample input I'd think

#   code_for(0) => 0
  #   code_for(62) => 10
  #   code_for(124) => 20
  #   code_for(1000000) => 4c92

  CHARS = ('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a

  def self.code_for(identifier)
    new_code = ""
    begin
      current = Integer(identifier % CHARS.length)
      new_code = CHARS[current] + new_code
      identifier = (identifier - current) / CHARS.length
    end while identifier != 0
    return new_code
  end

Refactorings

No refactoring yet !

D41d8cd98f00b204e9800998ecf8427e

steved

February 11, 2010, February 11, 2010 18:43, permalink

No rating. Login to rate!

Fixnum#to_s takes an optional base argument between 2-36

>> 1000000.to_s(16)
=> "f4240"
D41d8cd98f00b204e9800998ecf8427e

bob

February 11, 2010, February 11, 2010 20:11, permalink

No rating. Login to rate!

@steved: Yea but his base is 62, and includes lowercase and uppercase letters separately

D41d8cd98f00b204e9800998ecf8427e

steved

February 12, 2010, February 12, 2010 16:43, permalink

No rating. Login to rate!

@bob: Yea, but his _real_ requirement is bit.ly type urls. You could debate whether they should be case-sensitive or not, but base 36 will let you handle a lot urls and still have small identifiers.

>> 1_000_000_000.to_s(36)
=> "gjdgxs"
>> "gjdgxs".to_i(36)
=> 1000000000
D7908f05c89e965f6bc5308ad6f41256

steenslag

February 19, 2010, February 19, 2010 21:16, permalink

No rating. Login to rate!

For ruby 1.8 there is the gem base62. (Version 0.1.1 on github is compatible with ruby 1.9). It encodes and decodes.

require 'base62'
puts 123456789.base62_encode # => "8M0kX"
puts "8M0kX".base62_decode   # => 123456789
25ff3dfe48d3847ecf9971aab99589fb

mxcl

February 22, 2010, February 22, 2010 11:17, permalink

No rating. Login to rate!

I experimented with this too, with a very similar base 62 implementation. I found that using String.to_i(base) was 6 times faster than the best I could do. And as pointed out above you still get pretty tiny URLs for a lot of shortens.

Also there is the additional accessibility bonus that people can more easily read the url out loud (without worrying about caps), though I agree this is not really the point, but it's something.

The base62 gem may be worth a look at though.

D41d8cd98f00b204e9800998ecf8427e

steenslag

March 6, 2010, March 06, 2010 00:05, permalink

No rating. Login to rate!

I just found the radix gem ( http://rubyworks.github.com/radix/ ) by Thomas Sawyer, which provides the means of converting to and from any base. It defaults to base: 62! Representation strings are user-definable, upto base 62.

Your refactoring





Format Copy from initial code

or Cancel