The Dimwit's Guide to Renderers and Formatters

...Or I will rend thee in the gobberwarts with my blurglecruncheon, See if I don't!

Vogon Jeltz / Douglas Adams, The Hitchhiker's Guide to the Galaxy

This document is written by a newcomer, hoping to offer some hints on "good" and "bad" ways to use Renderers and Formatters, and some architecture notes for those looking at how they work.

API Quick Reference

Here are some examples of ways you can request that some object 'obj' is rendered as a PDF document.

You may wish to skip over this section and refer back to it later. All the calls are just wrappers around a single primary interface, MyRenderer.render (where MyRenderer is some subclass of Ruport::Renderer).

Method:  MyRenderer.render(:pdf, :data=>obj)
Calls:   (internal)
Notes:   Primary API

Method:  MyRenderer.render_pdf(obj)
Calls:   MyRenderer.render(:pdf, :data=>obj)

Method:  render_helper(MyRenderer, obj)
Calls:   MyRenderer.render(format, :data=>obj, :io=>output, :layout=>false)
Notes:   Only applicable within a Formatter method. Copies format from the running Formatter and sets :io to the current output object.

Method:  obj.as(:pdf)
Calls:   MyRenderer.render(:pdf) { |r| r.data=xxx }
Notes:   Only applicable if your object class has "include Ruport::Renderer::Hooks; renders_with MyRenderer". Renders obj.renderable_data if it exists, otherwise obj.

Method:  obj.save_as("foo.pdf")
Calls:   obj.as(:pdf, :file=>"foo.pdf")
Notes:   Filename extension mapped directly to format. The Report class in ruport-util extends this method so that .pdf opens the file in "wb" mode, and .txt is mapped to :text.

There are some extra methods available for the built-in data classes like Data::Table and Data::Row

Method:  table.to_pdf
Calls:   table.as(:pdf)

Method:  render_table(table)
Calls:   render_helper(Renderer::Table, table)
Notes:   Only applicable within a Formatter method. Copies format from the running Formatter and sets :io to the current output object.

All these methods can take an options hash as an extra argument.

If you wish to send output directly to a stream (or any object which supports the '<<' method), pass it as an :io option, e.g.

MyRenderer.render(:pdf, :data => obj, :io => $stdout)

Otherwise the result is returned as a String.

Renderers and Formatters

When you are using the API to render something, you only interact with one thing: the Renderer class. However, behind the scenes there are two parts: a Renderer and a Formatter.

The Formatter is the part which actually generates output. There are several base classes for different output types: Ruport::Formatter::HTML, Ruport::Formatter::PDF, Ruport::Formatter::Text, Ruport::Formatter::CSV. Each of these has methods relevant to generating output in that particular format.

The Renderer's job is to pick the correct Formatter to use when output is requested. For example, if you ask for :pdf output then it picks your PDF formatter class, which will most likely be a subclass of Ruport::Formatter::PDF. On the other hand, if you ask for :html output then it picks your HTML formatter class, and so on.

The Renderer also sets instance variables in the Formatter holding the data object (the thing you want to format) and an options object, which may contain auxiliary information to be used while creating the output or controlling its format (things like titles and page margins).

Finally, the Renderer knows which method, or methods, to call in the Formatter to start the ball rolling. These are known as 'stages' in the Renderer, but they are really just named entry points into the Formatter.
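
To make this division of labour concrete, here is a toy model in plain Ruby. It is only a sketch: SketchFormatter, SketchCSV, SketchText and SketchRowRenderer are names invented for this illustration, and Ruport's real classes do a good deal more. It shows the three jobs described above: picking a Formatter class from the requested format, handing it the data and options, and calling each named stage.

```ruby
# Toy model only: invented classes, not Ruport's real implementation.
class SketchFormatter
  attr_accessor :data, :options

  # The object that accumulates the rendered result.
  def output
    @output ||= String.new
  end
end

class SketchCSV < SketchFormatter
  def build_row
    output << data.to_a.join(",") << "\n"
  end
end

class SketchText < SketchFormatter
  def build_row
    output << "| " << data.to_a.join(" | ") << " |\n"
  end
end

class SketchRowRenderer
  FORMATS = { :csv => SketchCSV, :text => SketchText }  # format => Formatter class
  STAGES  = [:row]                                      # named entry points

  def self.render(format, options = {})
    formatter_class = FORMATS.fetch(format) {
      raise ArgumentError, "unknown format: #{format.inspect}"
    }
    formatter = formatter_class.new        # 1. pick the Formatter
    formatter.data    = options[:data]     # 2. hand over the data ...
    formatter.options = options            #    ... and the options
    STAGES.each { |stage| formatter.send("build_#{stage}") }  # 3. run the stages
    formatter.output
  end
end

puts SketchRowRenderer.render(:csv,  :data => [1, 2, 3])
puts SketchRowRenderer.render(:text, :data => 1..3)
```

Note that the caller only ever touches SketchRowRenderer; the formatter instance is created, used once and thrown away.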

Example 1: Ruport::Renderer::Row

This is one of the built-in renderers - it is intended to render a Ruport::Data::Record, which is one row of a table.

r = Ruport::Data::Record.new([1,2,3])
puts Ruport::Renderer::Row.render(:csv, :data=>r)

What's happening here? Ruport::Renderer::Row contains a formats hash mapping :csv => Ruport::Formatter::CSV (amongst others). So it creates an instance of that class, and sets its 'data' attribute to your provided record. Finally, it knows it must call the method called build_row in the Formatter object it just built, and that's what does the work.

You can find the glue which links this all together within the Ruport source. The CSV Formatter associates itself with several Renderers, and the Renderer declares a stage which is the method to call in the Formatter.

  # in lib/ruport/formatter/csv.rb
  class Formatter::CSV < Formatter
    renders :csv, :for => [ Renderer::Row,   Renderer::Table,
                            Renderer::Group, Renderer::Grouping ]
  end

  # in lib/ruport/renderer/table.rb
  class Renderer::Row < Renderer
    stage :row
  end

As it happens, build_row just calls 'each' on the data object, so using duck typing you can pass in an Array or anything enumerable.

puts Ruport::Renderer::Row.render(:csv, :data=>1..5)

You can render the output differently just by passing in a different symbol:

puts Ruport::Renderer::Row.render(:text, :data=>1..5)

In this case the formatter is of a different class, but the Renderer still calls its build_row method. If you don't need the flexibility of choosing the format at runtime, you can use a shortcut API where the format is extracted from the method name.

puts Ruport::Renderer::Row.render_csv(1..5)

(Note: If you're playing with this, you may discover that Ruport::Formatter::PDF doesn't have a build_row method. It can only build whole tables, not individual rows.)
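
If you are wondering how a format can be "extracted from the method name", the usual Ruby trick is method_missing. The sketch below shows one way a render_csv style shortcut could be built on top of a primary render method. ShortcutRenderer is a made-up class for illustration, and I am not claiming this is exactly how Ruport does it.

```ruby
# Toy sketch with an invented class; not Ruport's source.
class ShortcutRenderer
  # Stand-in for the primary API: just reports what it was asked to do.
  def self.render(format, options = {})
    "#{format}: #{options[:data].to_a.inspect}"
  end

  # Turn render_<format>(obj, opts) into render(:<format>, opts + :data => obj).
  def self.method_missing(name, *args, &block)
    if name.to_s =~ /\Arender_(\w+)\z/
      format = $1.to_sym
      data, extra = args
      render(format, (extra || {}).merge(:data => data), &block)
    else
      super
    end
  end

  # Keep respond_to? honest about the dynamic methods.
  def self.respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("render_") || super
  end
end

puts ShortcutRenderer.render_csv(1..5)
```

Any other unknown method still raises NoMethodError as usual, because the else branch hands over to super.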

Example 2: Ruport::Renderer::Table

t = Table(%w[a b c])
t << [1,2,3] << [4,5,6]
puts Ruport::Renderer::Table.render(:text, :data=>t)

This is slightly more complex internally because it has multiple stages. Here's how the renderer is defined:

  # lib/ruport/renderer/table.rb
  class Renderer::Table < Renderer
    options { |o| o.show_table_headers = true }

    prepare :table

    stage :table_header, :table_body, :table_footer

    finalize :table
  end

This renderer will call several methods in turn on the Formatter:

  • prepare_table
  • build_table_header
  • build_table_body
  • build_table_footer
  • finalize_table

(in that order). Any of these methods which the Formatter doesn't implement is silently skipped.
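
The call sequence above can be sketched in a few lines of plain Ruby. StageLogFormatter, TABLE_HOOKS and run_table_hooks are invented for this illustration (they are not Ruport's code); the formatter deliberately leaves out prepare_table and build_table_footer to show the skipping.

```ruby
# Toy sketch of the call sequence; invented names, not Ruport's real code.
class StageLogFormatter
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Only three of the five hooks exist on this formatter.
  def build_table_header; @calls << :build_table_header; end
  def build_table_body;   @calls << :build_table_body;   end
  def finalize_table;     @calls << :finalize_table;     end
end

# prepare, then each stage in declaration order, then finalize
TABLE_HOOKS = [:prepare_table,
               :build_table_header, :build_table_body, :build_table_footer,
               :finalize_table]

def run_table_hooks(formatter)
  TABLE_HOOKS.each do |hook|
    # Skip quietly when the formatter has no such method.
    formatter.send(hook) if formatter.respond_to?(hook)
  end
  formatter
end

p run_table_hooks(StageLogFormatter.new).calls
# => [:build_table_header, :build_table_body, :finalize_table]
```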

Custom formatters and renderers

Rendering a single primitive object like a Ruport::Data::Table is fine, but how do you output a composite object? The answer is to make a custom Formatter (or Formatters), and to link it to a custom Renderer.

Let's start with a simple data object which contains two tables.

class Accounts
  attr_accessor :balance_sheet, :profit_and_loss
end

A first cut at making a custom Renderer and HTML Formatter might look like this:

class AccountsRenderer < Ruport::Renderer
  stage :report

  class HTML < Ruport::Formatter::HTML
    renders :html, :for => AccountsRenderer

    def build_report
      output << "<h1>Accounts summary</h1>\n"
      Ruport::Renderer::Table.render(:html,
                                     :data=>data.balance_sheet, :io=>output)
      Ruport::Renderer::Table.render(:html,
                                     :data=>data.profit_and_loss, :io=>output)
    end
  end
end

a = Accounts.new
a.balance_sheet = Table(%w[item amount])
a.balance_sheet << ["Pencils", 123.40]
a.balance_sheet << ["Paperclips", 56.30]
a.balance_sheet << ["Capital", -179.70]

a.profit_and_loss = Table(%w[item amount])
a.profit_and_loss << ["Sales", 483.00]
a.profit_and_loss << ["Bad debt", -200.00]

AccountsRenderer.render(:html, :data=>a, :io=>$stdout)

The useful work is done in the build_report method, which creates the output we need. As part of that, it renders the two tables (a process which actually creates two new Renderer and Formatter objects behind the scenes, but you needn't let that concern you).

When rendering each table, we tell it to write to the same 'output' object that we are building for the whole report. In fact this is a common pattern and there's a helper method to make it less verbose:

    def build_report
      output << "<h1>Accounts summary</h1>"
      render_table(data.balance_sheet)
      render_table(data.profit_and_loss)
    end

The formatter knows that it is formatting :html at the moment, and so render_table passes this to the table renderer, saving you having to duplicate that information.
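
Here is a rough idea of what such a helper does, written without Ruport so that it runs on its own. NestedTableRenderer and ReportFormatterSketch are hypothetical stand-ins; the only point is that the helper copies the formatter's current format and passes its own output object as :io.

```ruby
# Toy sketch with invented names; not Ruport's code.
class NestedTableRenderer
  def self.render(format, options = {})
    io = options[:io] || String.new
    options[:data].each { |row| io << line_for(format, row) }
    io
  end

  def self.line_for(format, row)
    case format
    when :html then "<tr><td>" + row.join("</td><td>") + "</td></tr>\n"
    else            row.join(",") + "\n"
    end
  end
end

class ReportFormatterSketch
  attr_reader :format, :data, :output

  def initialize(format, data)
    @format = format
    @data   = data
    @output = String.new
  end

  # Stand-in for render_table: reuse our format, write into our output.
  def render_table(table, options = {})
    NestedTableRenderer.render(format, options.merge(:data => table, :io => output))
  end

  def build_report
    output << "Accounts summary\n"
    data.each { |table| render_table(table) }
    output
  end
end

tables = [[["Pencils", 123.4]], [["Sales", 483.0]]]
puts ReportFormatterSketch.new(:csv, tables).build_report
```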

Adding PDF output

In principle you just add another formatter class. Your first attempt might look like this:

  class PDF < Ruport::Formatter::PDF
    renders :pdf, :for => AccountsRenderer

    def build_report
      add_text "Accounts summary", :font_size => 18,
               :justification => :center
      pad(20) { render_table(data.balance_sheet) }
      pad(20) { render_table(data.profit_and_loss) }
    end
  end

However this isn't quite right as it stands. The problem is that you need to write multiple items onto the same PDF document, but a new throwaway Renderer and Formatter is created when rendering each table. So you need to explicitly pass the existing PDF::Writer object as the shared 'canvas' onto which everything is to be written.

  class PDF < Ruport::Formatter::PDF
    renders :pdf, :for => AccountsRenderer

    def build_report
      add_text "Accounts summary", :font_size => 18,
               :justification => :center
      pad(20) { render_table(data.balance_sheet, :formatter => pdf_writer) }
      pad(20) { render_table(data.profit_and_loss, :formatter => pdf_writer) }
    end
  end

...

AccountsRenderer.render(:pdf, :data=>a, :file=>"foo.pdf")

FIXME: The documentation says you should also have to call method 'render_pdf' to finalize the output of the report. However the above example runs without it. Need to explain why this is, and under what circumstances render_pdf is actually required.

So under some circumstances you may need to write the following:

class AccountsRenderer < Ruport::Renderer
  stage :report
  finalize :report

  class PDF < Ruport::Formatter::PDF
    renders :pdf, :for => AccountsRenderer

    def build_report
      add_text "Accounts summary", :font_size => 18,
               :justification => :center
      render_table(data.balance_sheet, :formatter => pdf_writer)
      render_table(data.profit_and_loss, :formatter => pdf_writer)
    end

    def finalize_report
      render_pdf   # note: different from the Renderer.render_pdf() call
    end
  end
end

Binding the data object to the Renderer

So far we have been using fairly cumbersome calls to initiate the rendering:

AccountsRenderer.render(:html, :data=>a, :io=>$stdout)
AccountsRenderer.render(:pdf, :data=>a, :file=>"foo.pdf")

If a data model only renders with one particular renderer, which is often the case, then you can add some simple glue to bind them together.

class Accounts
  include Ruport::Renderer::Hooks
  renders_with AccountsRenderer
end

After this, you have access to very simple methods for generating output from this object:

a.as(:html, :io=>$stdout)
a.save_as("foo.pdf")
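
There is no magic in this glue. The following sketch shows the general shape of such a mixin in plain Ruby; HooksSketch, BoundRenderer and BoundAccounts are names I made up, and the real Ruport::Renderer::Hooks module has more features (save_as, for one) than this.

```ruby
# Toy sketch of the binding glue; invented names, not Ruport's source.
module HooksSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :renderer

    # Remember which renderer class draws instances of this class.
    def renders_with(renderer)
      @renderer = renderer
    end
  end

  # Render self with the bound renderer.
  def as(format, options = {})
    self.class.renderer.render(format, options.merge(:data => self))
  end
end

class BoundRenderer
  def self.render(format, options = {})
    "#{format} report for #{options[:data].name}"
  end
end

class BoundAccounts
  include HooksSketch
  renders_with BoundRenderer

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

puts BoundAccounts.new("ACME").as(:html)
```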

Adding annotations

Maybe you want to display some extra information on the report, such as the date it was generated. You could add extra attributes to the data model itself, which may be appropriate in some cases. Otherwise, you can use the 'options' in the formatter to pass extra data. You can tag particular options as being mandatory, so the report will fail if they are not set.

class AccountsRenderer < Ruport::Renderer
  stage :report
  required_option :date

  class HTML < Ruport::Formatter::HTML
    renders :html, :for => AccountsRenderer

    def build_report
      output << "<h1>Accounts summary as at #{options.date}</h1>"
      render_table(data.balance_sheet)
      render_table(data.profit_and_loss)
    end
  end
end
...
puts a.as(:html)                   # exception, :date not set
puts a.as(:html, :date=>Time.now)  # correct

You should consider carefully whether options are the right way to pass data, because you have to set them explicitly at rendering time, and they do not persist afterwards. In some cases it may be cleaner to keep everything as attributes of your data model object, rather than splitting information between 'data' and 'options'. However if you have an existing data model, and you want to annotate the report without modifying that model, then adding options may be the right way.

(In this balance sheet example, it probably makes more sense for the date at which it was extracted from the accounts to be an attribute of the balance sheet itself, rather than something which has to be carried around as a separate piece of data.)
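
For the curious, a mandatory-option check of this kind needs very little code. This is a sketch under my own assumptions: OptionCheckingRenderer and DatedAccountsRenderer are invented, and raising ArgumentError is my choice for the example, not necessarily what Ruport raises.

```ruby
# Toy sketch with invented names; not Ruport's implementation.
class OptionCheckingRenderer
  # Class-level declaration, stored per class.
  def self.required_option(*names)
    required_options.concat(names)
  end

  def self.required_options
    @required_options ||= []
  end

  def self.render(format, options = {})
    missing = required_options.reject { |name| options.key?(name) }
    unless missing.empty?
      raise ArgumentError, "missing required option(s): #{missing.join(', ')}"
    end
    build(format, options)
  end
end

class DatedAccountsRenderer < OptionCheckingRenderer
  required_option :date

  def self.build(format, options)
    "Accounts summary as at #{options[:date]}"
  end
end

puts DatedAccountsRenderer.render(:html, :date => "2008-01-01")

begin
  DatedAccountsRenderer.render(:html)
rescue ArgumentError => e
  puts "failed: #{e.message}"
end
```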

Helpers

If you are generating output in multiple formats, the method which generates each format has to live in a separate subclass. However you can still share code between them, using a Helpers module.

The following example combines all the code we have seen so far, and includes a method format_time() which is shared between the HTML and PDF reports.

require 'rubygems'
require 'ruport'

class AccountsRenderer < Ruport::Renderer
  stage :report
  finalize :report
  required_option :date

  module Helpers
    def format_time(t)
      t.strftime "%Y-%m-%d %H:%M"
    end
  end

  class HTML < Ruport::Formatter::HTML
    renders :html, :for => AccountsRenderer

    def build_report
      output << "<h1>Accounts summary as at #{format_time(options.date)}</h1>"
      render_table(data.balance_sheet)
      render_table(data.profit_and_loss)
    end
  end

  class PDF < Ruport::Formatter::PDF
    renders :pdf, :for => AccountsRenderer

    def build_report
      add_text "Accounts summary as at #{format_time(options.date)}",
               :font_size => 18,
               :justification => :center
      pad(20) { render_table(data.balance_sheet, :formatter => pdf_writer) }
      pad(20) { render_table(data.profit_and_loss, :formatter => pdf_writer) }
    end

    def finalize_report
      render_pdf
    end
  end
end

class Accounts
  include Ruport::Renderer::Hooks
  renders_with AccountsRenderer

  attr_accessor :balance_sheet, :profit_and_loss
end

a = Accounts.new
a.balance_sheet = Table(%w[item amount])
a.balance_sheet << ["Pencils", 123.40]
a.balance_sheet << ["Paperclips", 56.30]
a.balance_sheet << ["Capital", -179.70]

a.profit_and_loss = Table(%w[item amount])
a.profit_and_loss << ["Sales", 483.00]
a.profit_and_loss << ["Bad debt", -200.00]

a.as(:html, :io=>$stdout, :date=>Time.now)
a.save_as("foo.pdf", :date=>Time.now)

Assorted hints and tips

Hint 1: Interact with Renderer class methods; forget Renderer instances

Renderer and Formatter instances are created quietly behind the scenes, are run once and discarded. Use the Renderer class methods to start this process off, and don't try to work with instances of these classes.

Hopefully the reasons for this will become clear.

Hint 2: Don't misuse 'setup'

The following code is in the Ruport book:

  class CallInRenderer < Ruport::Renderer

    stage :call_in_sheet

    def setup
      self.data =
        CallInAggregator.new(:start => options[:start_date]).to_grouping
    end
  end

This might tempt you to try to use a Renderer instance as a data store for the content of a report. However if you pursue this too far you may end up in a dead end.

In my case, I put some long-running data-gathering code in setup, and then thought I would like to render this same data several times (say once as HTML, and then again as CSV). My thought process went:

"Obviously, the setup method is storing some data in this Renderer instance. So I can just tell the same Renderer instance to render the data a second time in a different format, or I can Marshal.dump it and use it again later. Hmm, there doesn't seem to be an API for re-running an existing Renderer instance. OK I'll add that, should take about 5 minutes..."

Unfortunately I was wrong, and it turns out I fell at the first hurdle.

Despite appearances, "self.data = ..." does NOT store any data in the Renderer instance. What it actually does is "self.formatter.data = ...". The Renderer instance delegates storage of both data and options to a Formatter instance.

When you call a Renderer class method, it creates a Renderer and a Formatter as a symbiotic pair, then calls your setup method and then generates the report. A Renderer instance simply cannot do work prior to a Formatter being chosen and created, nor can it be usefully reattached to a new Formatter.

A Formatter instance could be created by itself, but it's not especially useful. For example, it can't render a table without guidance from a Renderer::Table.
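
The delegation that tripped me up can be imitated with Ruby's standard Forwardable module. DelegatingRenderer and DataHolderFormatter below are toy classes of my own, not Ruport's; they only demonstrate that "self.data = ..." in setup leaves nothing behind in the renderer itself, so attaching a fresh formatter loses the data.

```ruby
require 'forwardable'

# Toy sketch with invented names; not Ruport's code.
class DataHolderFormatter
  attr_accessor :data, :options
end

class DelegatingRenderer
  extend Forwardable

  attr_accessor :formatter
  # data and options live in the formatter, not in the renderer.
  def_delegators :formatter, :data, :data=, :options, :options=

  def setup
    self.data = [1, 2, 3]   # really formatter.data = [1, 2, 3]
  end
end

renderer = DelegatingRenderer.new
renderer.formatter = DataHolderFormatter.new
renderer.setup

p renderer.instance_variables   # the renderer holds only its formatter
p renderer.formatter.data       # the data ended up here

renderer.formatter = DataHolderFormatter.new   # attach a fresh formatter
p renderer.data                 # the gathered data is gone
```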

So how should you handle data which needs to be re-used in this way? Well, you have to keep it in a separate model class. The example in the book could be changed along these lines:

ca = CallInAggregator.new(:start => Time.parse(params[:period]))
res1 = CallInRenderer.render_html(:data => ca.to_grouping)
res2 = CallInRenderer.render_pdf(:data => ca.to_grouping)

In other words, you create a CallInAggregator which gathers some data. Then you create a temporary CallInRenderer to draw it. The original code was the other way round: you told the CallInRenderer class to render whatever data it felt like at that instant, and it created a temporary CallInAggregator to fetch it.

To make this useful, you need to modify the CallInAggregator so it does its work in the initialize method and saves it in an instance variable, and to_grouping returns the saved object, otherwise it will end up doing the work twice.
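
As a rough illustration of that change, here is a stand-in class that gathers its data once in initialize and hands back the saved object afterwards. CachingAggregator, its runs counter and the fake gather method are all invented for the example; the real CallInAggregator from the book looks different.

```ruby
# Toy sketch; CachingAggregator is a hypothetical stand-in for CallInAggregator.
class CachingAggregator
  attr_reader :start, :runs

  def initialize(options = {})
    @start    = options[:start]
    @runs     = 0
    @grouping = gather        # do the slow work once, up front
  end

  # Return the saved result; never re-run the query.
  def to_grouping
    @grouping
  end

  private

  # Placeholder for the long-running data gathering.
  def gather
    @runs += 1
    { @start => ["call 1", "call 2"] }
  end
end

ca = CachingAggregator.new(:start => "2008-01")
first  = ca.to_grouping
second = ca.to_grouping
puts "gathered #{ca.runs} time(s), same object: #{first.equal?(second)}"
```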

Unfortunately this code is still incomplete, because the formatter also makes use of options.start_date for the report heading, so you need to pass this explicitly:

period = Time.parse(params[:period])
ca = CallInAggregator.new(:start => period)
res1 = CallInRenderer.render_html(:data => ca.to_grouping, :start_date => period)
res2 = CallInRenderer.render_pdf(:data => ca.to_grouping, :start_date => period)

Arguably, the book example has split the report data model across two places: the start_date is in 'options', and the timesheet grouping is in 'data'.

Since the CallInAggregator already has an @start instance variable, you can expose this via an accessor. Then the PDF and HTML formatters can be changed to use data.to_grouping and data.start, instead of data and options.start_date. In my opinion, that would make the design less brittle.

Now you just pass the whole CallInAggregator object to be rendered:

ca = CallInAggregator.new(:start => Time.parse(params[:period]))
res1 = CallInRenderer.render_html(:data => ca)
res2 = CallInRenderer.render_pdf(:data => ca)

Once you've decided to do that, you can further simplify the API by hooking the CallInAggregator directly to the CallInRenderer.

class CallInAggregator
  include Ruport::Renderer::Hooks
  renders_with CallInRenderer
end

ca = CallInAggregator.new(:start => Time.parse(params[:period]))
res1 = ca.as(:html)
res2 = ca.as(:pdf)

Perhaps if the book had used this approach in the first place, it would have been a better foundation for more complex reports, where a clear separation is required between the data gathering phase and the outputting phase.

Hint 3: Don't misuse 'required_option'

The CallInRenderer example also uses another Renderer feature:

    required_option :start_date

This further misled me to believe that Renderers were a good place for doing data gathering, because there was a nice convenient feature for validating that all the necessary parameters were present. Again, unless you are doing small disposable amounts of work, it would be better done in a separate object, and keep the Renderer for just, well, rendering it.

This means you'll have to do your own checking for required arguments, and you'll also lose the method-style access which the Renderer gives you for options, i.e. you write options[:foo] rather than options.foo.

This gives the following:

class CallInAggregator

  def initialize(options={})
    @start = options[:start]
    raise "Missing :start option" unless @start
    # ... now do the work
  end
end

There is now a clear distinction between options used to build the report data model (passed to CallInAggregator.new), and options used only for controlling the output rendering.

Hint 4: Formatters call Renderers

It's pretty obvious that Renderers call Formatters to do work; it's less obvious that Formatters also call Renderers to do work. If you're in the middle of a custom Formatter, and then decide to output a table, this will ultimately call Renderer::Table to do the work, which in turn will create a new Formatter::XXX. In ruport-1.4, for every row of the table it then called a Renderer::Row which in turn created another Formatter::XXX.

MyRenderer -> MyFormatter -> Renderer::Table -> Formatter::XXX -> Renderer::Row -> Formatter::XXX

This has been optimised out in trunk, and table rows are now output directly by the formatter:

MyRenderer -> MyFormatter -> Renderer::Table -> Formatter::XXX

But how does it know which class Formatter::XXX to create? Well, MyFormatter remembers what it is currently formatting, say :html, and passes this to Renderer::Table. So Renderer::Table picks the formatter class it knows about for :html, which is Formatter::HTML, and creates a new instance of it.

However, this does mean that if MyFormatter is a subclass of Formatter::HTML, and it overrides the build_row method, this won't be used when rendering a table because Renderer::Table will create a standard Formatter::HTML instead.
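
This pitfall is easy to reproduce in miniature. In the sketch below (all three classes are invented, and much simpler than Ruport's), the custom formatter overrides build_row, but the nested table renderer instantiates the class registered for :html, so the override never runs.

```ruby
# Toy sketch with invented names; not Ruport's code.
class StockHTML
  attr_accessor :data

  def output
    @output ||= String.new
  end

  def build_row(row)
    output << "<tr><td>" + row.join("</td><td>") + "</td></tr>\n"
  end

  def build_table_body
    data.each { |row| build_row(row) }
  end
end

class StockTableRenderer
  FORMATS = { :html => StockHTML }

  def self.render(format, options = {})
    formatter = FORMATS.fetch(format).new   # always the registered class
    formatter.data = options[:data]
    formatter.build_table_body
    formatter.output
  end
end

class CustomHTML < StockHTML
  # This override is never reached by the nested table rendering.
  def build_row(row)
    output << "<tr class='fancy'><td>" + row.join("</td><td>") + "</td></tr>\n"
  end

  def build_report
    # Passes only the format symbol, so a plain StockHTML does the work.
    output << StockTableRenderer.render(:html, :data => data)
  end
end

report = CustomHTML.new
report.data = [[1, 2], [3, 4]]
puts report.build_report
```

The printed rows carry no class='fancy' attribute, which is the behaviour described above.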

Hint 5: Be aware of block initializers

"Block initializers" are where you pass a block at object creation time to perform additional initialization on the new object. Example:

   table = Table(%w[a b c]) do |t|
      t << [1,2,3]
      t << [4,5,6]
   end

which in this case is effectively the same as the more imperative style

   table = Table(%w[a b c])
   table << [1,2,3]
   table << [4,5,6]

You should take note of the block initialization style, because it plays an important role in the operation of Renderers. When you write

   myobject.as(:text) do |r|
     ...
   end

the renderer/formatter pair is created, then your initialisation block is called, then the renderer/formatter pair is run (including the 'setup' method).

This style pervades deeply, and blocks are created down through multiple levels, e.g.

    Renderer::Hooks::Classmethods.as()  or  Formatter#render_helper()
--> Renderer.render()
--> Renderer.build()
--> Renderer#setup()

Think about the blocks as simply initialization code which will be inserted at the "right" point in the object's construction.

Unlike iterators, generally they're only called once. (The now-obsolete "renderer_data_by_row" method from ruport-1.4 invoked Renderer::Row multiple times, and therefore this initialization block was invoked once for each row)
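
The ordering is easier to see with a toy class that logs each step. BlockInitRenderer is invented for this illustration and is far simpler than Ruport's Renderer; it just shows a block being passed down one level and called exactly once, after creation and before setup.

```ruby
# Toy sketch with an invented class; not Ruport's code.
class BlockInitRenderer
  attr_accessor :data
  attr_reader :log

  def initialize
    @log = [:created]
  end

  def setup
    @log << :setup
  end

  def run
    @log << :run
    self
  end

  # Innermost level: create, yield for extra initialization, then run.
  def self.build
    renderer = new
    yield renderer if block_given?
    renderer.setup
    renderer.run
  end

  # Outer level: hands the caller's block straight down.
  def self.render(format, &block)
    build(&block)
  end
end

r = BlockInitRenderer.render(:text) do |renderer|
  renderer.data = [1, 2, 3]
  renderer.log << :block
end
p r.log
# => [:created, :block, :setup, :run]
```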

Hint 6: Strings versus Streams

If you write your formatters like this:

    output << data.table1.to_csv
    # or, equivalently:
    output << data.table1.as(:csv)

then table1 will be rendered into a string, then this string appended to the output. For a table with tens of thousands of rows this may lead to a pause before you see the rows being output, and higher memory usage.

You can stream your output like this:

    data.table1.as(:csv, :io=>output)

or within a formatter only, using

    render_table(data.table1)

which merges :io=>output into the table rendering options for you.
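
To see the difference between the two styles without a ten-thousand-row table, you can count how many pieces arrive at the output object. ChunkRecorder and render_rows are helpers I invented for this sketch; they stand in for the output object and for a table renderer that accepts an :io option.

```ruby
# Toy sketch with invented helpers; not Ruport's code.
class ChunkRecorder
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  # Anything that supports << can act as an output object.
  def <<(text)
    @chunks << text.dup
    self
  end
end

# Writes rows to options[:io] if given, otherwise builds and returns a String.
def render_rows(rows, options = {})
  io = options[:io] || String.new
  rows.each { |row| io << row.join(",") << "\n" }
  io
end

rows = [[1, 2], [3, 4], [5, 6]]

buffered = ChunkRecorder.new
buffered << render_rows(rows)        # whole table built first, appended once

streamed = ChunkRecorder.new
render_rows(rows, :io => streamed)   # each piece written as it is produced

p buffered.chunks.size   # => 1
p streamed.chunks.size   # => 6
```

Both recorders end up with the same text; the streamed one simply never holds the whole table in a single intermediate string.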