How a Perl script helped me learn Ruby
Back when Rails was all the rage, I wanted to learn Ruby. But it wasn’t Rails that taught me Ruby.
It was a Perl-based password generator.
I don’t remember what the circumstances were–it could be that pwgen wasn’t available on OS X–but for whatever reason, I chose to use a script named SopPasswd.pl, by someone named Mark A. Pors. I tried to pull up his old website, but it redirects to average-photography.com now.
Here’s the script, reproduced from this link without permission:
#!/usr/bin/perl -w
# SopPasswd: A generator for Sort-of-pronounceable passwords.
# Version: 0.1
# Author: Mark A. Pors, mark@dreamzpace.com, www.dreamzpace.com
# License: GPL
use strict;
my $dict = '/usr/share/dict/words'; # path to dict file
my $wordlen = 8; # desired length of the password
my $numwords = 10; # number of passwords to print
my $sublen = 3; # length of the word chunks that create the password
my $sep = "\n"; # how to separate the words
my @dict;
$wordlen >= $sublen || die "Error: The word length should be equal or larger than the length of the 'chunks'\n";
open (DICT, "<$dict") || die ("Cannot open dict: $!");
while (<DICT>) {
    chomp;
    push (@dict, $_);
}
while (1) {
    my @sub = ();
    my $word;
    my $parts = int ($wordlen/$sublen);
    for (1 .. $parts) {
        my $try = $dict[rand @dict];
        redo if length($try) < $sublen;
        $word .= lc substr($try, 0, $sublen);
    }
    my @chars = split(m{}xms, $word);
    my $upper = rand @chars;
    $chars[$upper] = uc $chars[$upper];
    $word = join(q{}, @chars);
    my $left = $wordlen % $sublen;
    $word .= substr (int rand (10**($wordlen - 1)), 0, $left);
    print $word . $sep;
    chomp (my $exit = <STDIN>);
}
That’s fairly readable for Perl, but at the time I tended to think of Perl as line noise, and this looks a little different from what I remember.
I also had an internal project at work that I wanted to build, and Rails seemed to be a good fit. I’d implemented an earlier version in Zope, but just didn’t have a machine I could dedicate to it.
One minor problem: I knew nothing about Ruby. Sure, the “blog in 10 minutes” videos were sexy, but it was clear that they were by people who knew Rails intimately.
I had checked out various tutorials, and at the time, _why’s tutorials were among the best. The thing is, _why’s style was, well, odd. It just didn’t click with me. I had put the project on the back burner for a while.
I looked at the script, realized I wanted to keep using it, but wanted something I was comfortable with working with. And thus, SopPasswd.rb became my pet project.
What it does
All it does is read in /usr/share/dict/words, chop the words into $sublen-length chunks, and string them together into a word that’s approximately $wordlen long, with digits from 0-9 as padding.
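As a rough illustration of that description, here is a minimal Ruby sketch. The method name sop_password, its keyword arguments, and the tiny in-memory word list are all made up for the example; it also skips the step where the Perl script uppercases one random character.

```ruby
# Hypothetical helper, not part of the original script: build one password
# from fixed-length word chunks, padded with random digits.
def sop_password(words, wordlen: 8, sublen: 3, rng: Random.new)
  raise ArgumentError, "wordlen must be >= sublen" if wordlen < sublen

  # Only words long enough to yield a full chunk are usable.
  candidates = words.select { |w| w.length >= sublen }
  raise ArgumentError, "no word has #{sublen} or more characters" if candidates.empty?

  parts = wordlen / sublen   # how many whole chunks fit
  left  = wordlen % sublen   # leftover characters, filled with digits

  chunks = Array.new(parts) { candidates.sample(random: rng)[0, sublen].downcase }
  digits = Array.new(left) { rng.rand(10).to_s }
  chunks.join + digits.join
end

# Stand-in word list so the sketch runs without /usr/share/dict/words.
words = %w[apple banana cherry dragonfruit elderberry fig]
puts sop_password(words)
```

With the defaults of 8 and 3, that gives two three-letter chunks followed by two digits.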
Now, unfortunately, I didn’t keep all my code, but I do have an old version I just resurrected from my git repo. With that in mind, I’ll revisit the train wreck of the original, as I remember it, and what I did instead.
What I did was as straight a port of the original script as I could, at work during a slow day. I was working on a Mac G4 Gigabit Ethernet machine at the time, which iirc ran at a blazing 400MHz. After a few minutes of thinking and poring over the Pickaxe book, I came up with a script, and ran it…
…and 30 seconds later, I had a list of passwords.
What had gone wrong?
Perl is fast.
The thing is, I did a straight port. One thing to keep in mind about Perl is that it’s fast. Blazingly fast. It’s such an astoundingly amazing language that asking whether it’s interpreted or compiled isn’t even that simple of a question: it’s more like an interpreted language that gets compiled on the fly. For all its warts, it’s an amazing language.
Ruby, like Python, is an interpreted language. Both languages tend to have a “fast enough” philosophy, meaning that they haven’t put the time in to make their languages as fast as Perl, preferring instead to provide ways of speeding up your code.
So, where this takes no time at all:
open (DICT, "<$dict") || die ("Cannot open dict: $!");
while (<DICT>) {
    chomp;
    push (@dict, $_);
}
This is a major bottleneck on older hardware:
dict = []
begin
  d = File.open(DICT,"r")
  d.each {|f| dict << f.chomp }
rescue
  raise RuntimeError, "#{DICT} not found"
end
Ruby is slow, but we can work around that
Before I started to learn about Ruby, I had been learning about Python, and if there was one thing I retained about Python, it was that part of the “fast enough” philosophy was knowing when to offload computation onto a built in function. With that in mind, I pored over the Pickaxe book again, and came up with this:
begin
  dict = File.readlines(DICT)
rescue
  raise RuntimeError, "#{DICT} not found"
end
Now, readlines retains newlines, which the Perl script fixed by invoking chomp on each line. For the heck of it, I decided to try adding this:
dict.each {|f| f.chomp!}
That brought back the bottleneck.
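If you want to see how the approaches compare on your own machine, a small benchmark along these lines should do it. This is only a sketch: the helper names are invented, a temp file of made-up words stands in for /usr/share/dict/words, and the chomp: true keyword needs Ruby 2.4 or later (it did not exist in the 1.8/1.9 days). Timings on current hardware will look nothing like a 400MHz G4.

```ruby
require "benchmark"
require "tempfile"

# Stand-in for /usr/share/dict/words so the sketch runs anywhere.
def make_word_file(count)
  file = Tempfile.new("words")
  count.times { |i| file.puts("word#{i}") }
  file.flush
  file
end

# Line-by-line read, chomping and appending each line in Ruby code.
def load_each(path)
  dict = []
  File.open(path, "r") { |f| f.each_line { |line| dict << line.chomp } }
  dict
end

# One built-in call, newlines kept.
def load_readlines(path)
  File.readlines(path)
end

# One built-in call that also strips newlines (Ruby 2.4 and later).
def load_readlines_chomp(path)
  File.readlines(path, chomp: true)
end

file = make_word_file(100_000)
Benchmark.bm(22) do |bm|
  bm.report("each + chomp")           { load_each(file.path) }
  bm.report("readlines")              { load_readlines(file.path) }
  bm.report("readlines(chomp: true)") { load_readlines_chomp(file.path) }
end
```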
So, what do we do here? Well, the ultimate question is, do we need to chomp() all the lines? Not really; when I got right down to it, when I did the sanity check for line length, I just took the word.length() and subtracted one.
So we started with this:
for (1 .. $parts) {
    my $try = $dict[rand @dict];
    redo if length($try) < $sublen;
    $word .= lc substr($try, 0, $sublen);
}
Here was the ultimate result:
parts.times do
  word = Iconv.conv("UTF-8", "ISO_8859-1", dict[rand(dictlen).to_i])
  wordlen = word.length - 1
  redo if wordlen < sublen
  wrange = rand(wordlen-sublen).to_i
  myword = word[wrange..wrange+sl]
  redo if (!(myword =~ vowel) || (myword =~ badmatch))
  print myword.downcase
end
I know that’s a lot longer, but bear with me.
This line:
word = Iconv.conv("UTF-8", "ISO_8859-1", dict[rand(dictlen).to_i])
was necessary because I was getting an error when I moved this from Ruby 1.8 to Ruby 1.9. I haven’t tried it lately (more on that below), but hopefully that kludge isn’t necessary anymore; it converts ISO_8859-1 text to UTF-8 (Iconv.conv takes the target encoding first, then the source encoding).
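Iconv was dropped from Ruby's standard library in 2.0, so on a current Ruby the same conversion would most likely go through String#force_encoding and String#encode instead. A minimal sketch, with a hypothetical helper name and a hand-built byte string standing in for a line from the dictionary file:

```ruby
# Hypothetical helper: treat the raw bytes as ISO-8859-1, then transcode to UTF-8.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw  = "caf\xE9\n".b          # "café" plus newline, as ISO-8859-1 bytes
word = latin1_to_utf8(raw)

puts word.encoding            # UTF-8
puts word.valid_encoding?     # true
```

Whether the conversion is needed at all depends on how the dictionary file on your system is encoded.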
wordlen = word.length - 1
Instead of running chomp(), I subtract one from the line length, which has the same effect.
redo if wordlen < sublen
This goes back to the top of the loop if the chosen word is shorter than sublen.
wrange = rand(wordlen-sublen).to_i
This chooses a random number between 0 and the length of the word minus sublen, which becomes the starting index of the chunk.
myword = word[wrange..wrange+sl]
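If you’re not familiar with Ruby, this takes the characters from index wrange through wrange + sl. Ruby’s .. ranges include both endpoints, and sl is sublen - 1, so the slice is sublen characters long.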
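For anyone new to Ruby's string slicing, here is a small sketch of the same slice written three ways. The sample word and start index are arbitrary; only the sl = sublen - 1 convention comes from the script.

```ruby
word   = "elderberry"
sublen = 3
sl     = sublen - 1   # inclusive ranges need the last index, not the length
start  = 2

by_range     = word[start..start + sl]       # inclusive range: indices 2, 3, 4
by_length    = word[start, sublen]           # start index plus a length
by_exclusive = word[start...start + sublen]  # exclusive range stops before index 5

puts by_range        # der
puts by_length       # der
puts by_exclusive    # der
```

The start-plus-length form avoids the off-by-one bookkeeping that sl exists to handle.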
redo if myword =~ badmatch or not myword =~ vowel
There are two regular expressions defined before that:
badmatch = /\W|\'/
vowel = /^[aeiou]/i
badmatch checks for non-word characters, and an apostrophe. vowel is a case-insensitive check that the chunk starts with a vowel (the ^ anchors the match to the start). I decided that checking for vowels in each sublen chunk led to more usable passwords.
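To make the two checks concrete, here is a small sketch that runs them against a few sample chunks. The patterns are the ones shown above; the usable_chunk? helper and the sample chunks are invented for the example. Because of the ^ anchor, the vowel pattern only matches a vowel at the very start of the chunk.

```ruby
BADMATCH = /\W|\'/      # same patterns as in the script
VOWEL    = /^[aeiou]/i

# Hypothetical helper: true when the redo line would let the chunk through.
def usable_chunk?(chunk)
  !!(chunk =~ VOWEL) && !(chunk =~ BADMATCH)
end

%w[app Ell ban n't e-m].each do |chunk|
  puts "#{chunk.inspect}: #{usable_chunk?(chunk)}"
end
```

Here "app" and "Ell" pass, "ban" fails because it does not start with a vowel, and "n't" and "e-m" fail on the non-word characters.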
And, the random numbers
So then this
my $left = $wordlen % $sublen;
$word .= substr (int rand (10**($wordlen - 1)), 0, $left);
Became this:
num.times {print rand(10).to_s}
and num was calculated like this:
num = wordlen % sublen
The Perl users probably noticed the original script built up a string, while mine just sends stuff to stdout. Building strings tended to be another bottleneck on the old machine, and print just worked fine for me.
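Here is the digit padding as a standalone sketch, returning a string instead of printing so it is easy to poke at. The digit_padding name and the rng keyword are made up for the example. One small difference from the Perl: each digit here is drawn independently, whereas the Perl takes the leading digits of a single large random number.

```ruby
# Hypothetical helper: the leftover length, filled with independent random digits.
def digit_padding(wordlen, sublen, rng: Random.new)
  num = wordlen % sublen
  Array.new(num) { rng.rand(10) }.join
end

puts digit_padding(8, 3).length   # 2, since 8 % 3 == 2
puts digit_padding(9, 3).length   # 0, since 9 divides evenly into chunks of 3
```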
The whole thing:
So, to keep this a little shorter: I had started on this toy problem with the goal of getting the Ruby version as close to Perl performance as possible, while keeping the line count as close as possible. Since I was using it all the time but wanted different options, I eventually changed the word length, sublen length, and number of words from being hardcoded to being read from command-line options, with sane defaults set.
So, here’s the whole thing:
#!/usr/bin/env ruby1.9
# Port of the Perl script SopPassword, now ported to Ruby 1.9
def usage
  puts <<END
SopPasswd: A generator for Sort-of-pronounceable passwords.
Version: 0.9
Author of Perl version: Mark A. Pors <mark@dreamzpace.com>
Ruby port: Shane Simmons <regeya@earthlink.net>
License: GPL
Usage: SopPasswd [OPTION]
-h, --help: this message
-l, --wordlen: Change the length of the word (default: 8)
-s, --sublen: Change the length of word segments (default: 3)
-n, --numwords: Change the number of words (default: 20)
-d, --dictionary: Full path to dict/words (default: /usr/share/dict/words)
END
  Process.exit
end
require 'getoptlong'
require 'iconv'
fn = '/usr/share/dict/words'
wordlen = 8
numwords = 20
sublen = 3
word = ""
list = []
opts = GetoptLong.new(
  ["--wordlen", "-l", GetoptLong::OPTIONAL_ARGUMENT],
  ["--numwords", "-n", GetoptLong::OPTIONAL_ARGUMENT],
  ["--sublen", "-s", GetoptLong::OPTIONAL_ARGUMENT],
  ["--help","-h", GetoptLong::OPTIONAL_ARGUMENT],
  ["--dictionary","-d", GetoptLong::OPTIONAL_ARGUMENT])
opts.each do |opt, arg|
  case opt
  when "--help" then usage
  when "--wordlen" then wordlen = arg.to_i
  when "--numwords" then numwords = arg.to_i
  when "--sublen" then sublen = arg.to_i
  when "--dictionary" then fn = arg
  end
end
parts = (wordlen/sublen).to_i
sl = sublen - 1
badmatch = /\W|\'/
vowel = /^[aeiou]/i
dict = File.readlines(fn)
dictlen = dict.length
num = wordlen % sublen
numwords.times do
  parts.times do
    word = Iconv.conv("UTF-8", "ISO_8859-1", dict[rand(dictlen).to_i])
    wordlen = word.length - 1
    redo if wordlen < sublen
    wrange = rand(wordlen-sublen).to_i
    myword = word[wrange..wrange+sl]
    redo if myword =~ badmatch or not myword =~ vowel
    print myword.downcase
  end
  num.times {print rand(10).to_s}
  puts
end
So, that’s still working for you?
Do I still use it? Did I switch to pwgen? Nah, I decided to switch toy problems and write a new password generator as a Markov chain generator. It’s almost instantaneous, doesn’t have to know anything about the language of the dictionary file, and doesn’t have to read in the dictionary every time. The output is identical.
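For the curious, here is roughly what a character-level Markov chain generator can look like. To be clear, this is not the generator mentioned above, just a minimal sketch of the general technique: the method names, the order-2 chain, and the ^ and $ markers are all assumptions made for the example, and it assumes the word list contains neither marker character.

```ruby
# Hypothetical sketch: record which character follows each pair of characters.
def build_chain(words, order: 2)
  chain = Hash.new { |hash, key| hash[key] = [] }
  words.each do |word|
    padded = ("^" * order) + word.downcase + "$"   # ^ marks the start, $ the end
    (0...padded.length - order).each do |i|
      chain[padded[i, order]] << padded[i + order]
    end
  end
  chain
end

# Walk the chain, restarting from the start state whenever a word ends.
def generate(chain, length:, order: 2, rng: Random.new)
  unless chain.values.flatten.any? { |c| c != "$" }
    raise ArgumentError, "chain has no letters to generate from"
  end

  out   = "".dup
  state = "^" * order
  while out.length < length
    nxt = chain[state].sample(random: rng)
    if nxt.nil? || nxt == "$"
      state = "^" * order
      next
    end
    out << nxt
    state = state[1..-1] + nxt
  end
  out
end

chain = build_chain(%w[apple banana cherry dragonfruit elderberry])
puts generate(chain, length: 8)
```

The chain is just a hash of character pairs to possible next characters, so it never needs to know what language the word list is in.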