Writing SQL Functions in Ruby

I just released Amalgalite 0.6.0 with a couple of major enhancements. The biggest of which is the ability to write your own scalar and aggregate SQL functions in Ruby. You may not know them by that name, but that's the proper terminology according to SQLite.

I'm going to do a brief overview of what these function types are and then show how to write a scalar and an aggregate SQL function for SQLite using Amalgalite.

Scalar SQL Functions

Scalar SQL Functions are the functions used to do some sort of data transformation, or some test on a piece of data. They return a one value, based upon the input parameter(s). In the following statement the length() function is a scalar function.

SELECT name, length( name ) FROM country;

These pages give the complete list of built-in scalar functions in SQLite.

There are lots of functions in there that you probably recognize from general SQL usage: hex(), coalesce(), lower(), trim(), etc.

Aggregate SQL Functions

Aggregate SQL Functions are the functions used to do, well, aggregation of data across rows. They return a single value, based upon multiple rows of data. You use aggregate functions when you use GROUP BY SQL clauses. In this code example, the sum() function is an aggregate function.

SELECT country, sum( population ) FROM subcountry GROUP BY country;

Aggregate Functions lists the built-in aggregate functions for SQLite.

Writing a Scalar SQL Function

Sometimes you get to a point doing database work that you want your own scalar function, or you know of a scalar function in some other database that you want to use with Amalgalite. For instance the sha1() function is available in PostgreSQL and MySQL, so lets add it for use in our SQLite databases.

Functions are defined with respect to database instances in Amalgalite. They must be defined each time the database is opened.

require 'rubygems'
require 'amalgalite'
#
# Create a database and a contrived test table
#
db = Amalgalite::Database.new( ":memory:" )
db.execute( "CREATE TABLE test( filename, line_number, data)" )

#
# add some data
#
require 'open-uri'
db.prepare("INSERT INTO test( filename, line_number, data ) VALUES( ?, ?, ? )") do |stmt|
  lines = open("http://redmine.ruby-lang.org/repositories/entry/ruby-18/README.EXT?format=raw").readlines
  lines.each_with_index do |line, i|
    stmt.execute( "README.EXT", i + 1, line )
  end
end

#
# Define the sha1 SQL function
#
require 'digest/sha1'
db.define_function( 'sha1' ) do |x|
  Digest::SHA1.hexdigest( x.to_s )
end

And that is it. We just defined the sha1(t) SQL function for use in this database. We told SQLite that there is now a function available sha1 that takes a single argument x. Let's use it.

rows = db.execute("SELECT sha1('I love ruby')")
puts rows.first[0] # => d729b72eceebbc2c690ca9b586edd895d8bd4405

And use it on some actual table data too.

db.execute("SELECT data, sha1( data ) as sha1 FROM test") do |row|
  print "#{row['sha1']}: #{row['data']}"
end

Also look at the rdoc for Amalgalite::Database#define_function and Amalgalite::Function for other ways to implement a scalar function.

And a more complete example ships as examples/define_function.rb with Amalgalite.

Writing an Aggregate SQL Function

Aggregate functions are possibly more rare to write, but no less effective. An implementation of an aggregate function is different in that it has 2 methods and needs to save state. Therefore it is implemented as a class. Lets implement a wordcount(s) aggregate that counts the number of words.

# Implementation of the wordcount( s ) SQL aggregate
class WordCount < Amalgalite::Aggregate
    def initialize
      @name = 'wordcount'
      @arity = 1
      @word_count = 0
    end

    def step( str )
      @word_count += str.strip.split(/\s+/).size
    end

    def finalize
      return @word_count
    end
end

# define the aggregate in the database
db.define_aggregate( 'wordcount', WordCount )

We always define an aggregate as a class. The rules regarding the class are listed in the rdoc for Amalgalite::Aggregate.

The way it works is that for each row that is grouped into a single output row has the step() method called once, and then the finalize() method is called one time at the end. A new instance of the class implementing the aggregate is instantiated every time the aggregate is used in a statement.

row = db.execute("SELECT filename, wordcount( data ) as wc  FROM test GROUP BY filename").first
puts "#{row['filename']}: #{row['wc']}"  # =>  README.EXT: 4736
db.close

Also read the rdoc for Amalgalite::Aggregate and look at another example examples/define_aggregate.rb that ships with Amalgalite.

Now go define some crazy useful functions.

Comments (View)

blog comments powered by Disqus