09 - Regular Expressions

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski

  • Don't use regular expressions if you just need a plain text search in a string: string['text']
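
A minimal sketch of the plain-text alternatives, assuming a made-up sample string: String#[] with a string argument returns the substring (or nil), and include? answers a yes/no question without building a regexp.

```ruby
sentence = 'the quick brown fox' # hypothetical sample data

found   = sentence['quick']         # the matched substring, 'quick'
missing = sentence['slow']          # nil when the text is absent
has_fox = sentence.include?('fox')  # true
at_head = sentence.start_with?('the') # true
```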

  • For simple constructions you can use a regexp directly as a string index.

    match = string[/regexp/]             # get content of matched regexp
    first_group = string[/text(grp)/, 1] # get content of captured group
    string[/text (grp)/, 1] = 'replace'  # string => 'text replace'
    
  • Use non-capturing groups when you don't use the captured result.

    # bad
    /(first|second)/
    
    # good
    /(?:first|second)/
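
The difference is more than cosmetic. As a rough illustration (the sample text is made up), String#scan changes its return shape as soon as the pattern contains a capturing group:

```ruby
text = 'first and second' # hypothetical sample data

# With a capturing group, scan returns one array of captures per match.
captured = text.scan(/(first|second)/)  # => [['first'], ['second']]

# With a non-capturing group, scan returns the matched strings themselves.
plain = text.scan(/(?:first|second)/)   # => ['first', 'second']
```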
    
  • Don't use the cryptic Perl-legacy variables denoting the last regexp group matches ($1, $2, etc.). Use Regexp.last_match(n) instead.

    /(regexp)/ =~ string
    # some code
    
    # bad
    process $1
    
    # good
    process Regexp.last_match(1)
    
  • Avoid numbered groups, as it can be hard to track what they contain. Use named groups instead.

    # bad
    /(regexp)/ =~ string
    # some code
    process Regexp.last_match(1)
    
    # good
    /(?<meaningful_var>regexp)/ =~ string
    # some code
    process meaningful_var
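
Named groups are also available through the MatchData object, which helps when the regexp is stored in a variable: Ruby only assigns local variables from named groups when a regexp literal sits on the left-hand side of =~. A small sketch, with a made-up date string and group names:

```ruby
date = '2024-05-17' # hypothetical sample data

match = /(?<year>\d{4})-(?<month>\d{2})/.match(date)
year  = match[:year]          # => '2024'
month = match[:month]         # => '05'
names = match.named_captures  # => { 'year' => '2024', 'month' => '05' }
```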
    
  • Character classes have only a few special characters you should care about: ^, -, \, ], so don't escape . or brackets such as ( ) and { } inside [].
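
A short sketch of how these characters behave inside a class (the sample strings are invented for illustration):

```ruby
# Inside [] the dot and parentheses are literal, so no escaping is needed.
punctuation = 'a.b(c)'.scan(/[.()]/) # => ['.', '(', ')']

# A hyphen is literal when placed last; between two characters it forms a range.
hyphen = 'a-z'.scan(/[az-]/)         # => ['a', '-', 'z']

# ^ negates the class only when it is the first character.
caret = 'a^b'.scan(/[a^]/)           # => ['a', '^']
```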

  • Be careful with ^ and $ as they match the start and end of a line, not of the whole string. If you want to match the whole string use: \A and \z (not to be confused with \Z which is the equivalent of /\n?\z/).

    string = "some injection\nusername"
    string[/^username$/]   # matches
    string[/\Ausername\z/] # doesn't match
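
The difference between \z and \Z shows up when the input ends with a newline. A minimal sketch, assuming a hypothetical input value:

```ruby
input = "username\n" # hypothetical user input with a trailing newline

strict    = input.match?(/\Ausername\z/) # false: \z allows nothing after the text
lenient   = input.match?(/\Ausername\Z/) # true: \Z tolerates one trailing newline
multiline = input.match?(/^username$/)   # true: $ matches before the newline
```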
    
  • Use the x modifier for complex regexps. This makes them more readable and you can add some useful comments. Just be careful, as whitespace in the pattern is ignored.

    regexp = /
      start         # some text
      \s            # white space char
      (group)       # first group
      (?:alt1|alt2) # some alternation
      end
    /x
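
The whitespace caveat can be sketched as follows (the patterns are made up): an unescaped space disappears from the pattern in extended mode, so a literal space has to be escaped or written as \s.

```ruby
# In extended mode the unescaped space is ignored, so this is really /foobar/.
loose = /foo bar/x
loose_joined = loose.match?('foobar')   # true
loose_spaced = loose.match?('foo bar')  # false

# An escaped space (or \s) is kept as part of the pattern.
escaped = /foo\ bar/x
escaped_spaced = escaped.match?('foo bar') # true
class_spaced = /foo\sbar/x.match?('foo bar') # true
```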
    
  • For complex replacements sub/gsub can be used with a block or a hash.

    words = 'foo bar'
    words.sub(/f/, 'f' => 'F') # => 'Foo bar'
    words.gsub(/\w+/) { |word| word.capitalize } # => 'Foo Bar'
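
The hash form pays off with gsub when several different matches each need their own replacement. A minimal sketch, where the escapes table and the sample markup are illustrative assumptions:

```ruby
# Hypothetical lookup table mapping each matched character to its replacement.
escapes = { '<' => '&lt;', '>' => '&gt;', '&' => '&amp;' }.freeze

html = '<b>Tom & Jerry</b>'
escaped = html.gsub(/[<>&]/, escapes)
# escaped is '&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;'

# The block form can read named groups through Regexp.last_match.
swapped = 'John Smith'.sub(/(?<first>\w+) (?<last>\w+)/) do
  m = Regexp.last_match
  "#{m[:last]}, #{m[:first]}"
end
# swapped is 'Smith, John'
```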