Ticket #377 (closed enhancement: fixed)

Opened 10 months ago

Last modified 10 months ago

Speed up CSV writer

Reported by: ruport Assigned to:
Milestone: Keywords:
Cc: b.candler@pobox.com

Description

This patch speeds up CSV writing by keeping a persistent FCSV instance for the lifetime of the formatter.

Benchmarks for a ~13,000 line CSV output:

This patch?  render_data_by_row patch?   Time taken
---------------------------------------------------
    No                  No                  34.6s
    Yes                 No                  24.2s
    No                  Yes                 10.4s
    Yes                 Yes                  6.1s

Note: no specs changed or broken. However it *does* rely on FCSV using its instance cache, and I'm not sure exactly why. Try changing FCSV(output, ...) to FCSV.new(output, ...) and the specs break, as table headers go missing. Possibly not a good idea to wrap the same String with multiple StringIO wrappers?

Attachments

ruport-csv-writer.diff (3.4 kB) - added by ruport on 01/31/08 03:54:16.

Change History

01/31/08 03:54:16 changed by ruport

  • attachment ruport-csv-writer.diff added.

01/31/08 04:51:23 changed by ruport

Further improvements:

- simply moving "require 'fastercsv'" outside of csv_writer reduces the time to 2.9s. (I guess that 'require' is very expensive when rubygems is around)

- if you change build_table_body as follows:

    def build_table_body
      fcsv = csv_writer
      data.each { |row| fcsv << row }
    end

then it still takes about 2.9s. However this improvement is now independent of the possibly controversial render_data_by_row patch.

- there is another issue. In my renderer, if I replace

      data.detail.as(:csv, :io=>output)

with

      FCSV(output) { |fcsv|
        fcsv << data.detail.column_names
        data.detail.each { |row| fcsv << row }
      }

then the time drops to 1.1 seconds. In fact, even if I make build_table_body an empty method, data.detail.as(...) still takes 1.6s, without generating any output other than the column headings!

Profiling shows that the issue is Renderer taking a fullblown copy (or two) of the table:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 13.33    13.90     13.90    26312     0.53     4.93  Array#each
 11.66    26.06     12.16    26302     0.46     4.97  Ruport::Data::Table#recordize
 11.05    37.59     11.53    92064     0.13     1.44  Kernel.dup
 10.63    48.68     11.09    92082     0.12     0.21  Kernel.send
  7.81    56.83      8.15    13151     0.62     1.12  Ruport::Data::Record#initialize
  6.00    63.09      6.26    26303     0.24     1.40  Ruport::Data::Table#record_class
  5.31    68.63      5.54    13151     0.42     3.13  Ruport::Data::Table#normalize_hash
  4.28    73.09      4.46    92061     0.05     0.05  Module#===
...
  0.00   104.31      0.00        2     0.00 52100.00  Ruport::Renderer#data=
..
  0.00   104.31      0.00        1     0.00 104200.00  Ruport::Data::Table#initialize_copy
..
  0.00   104.31      0.00        2     0.00 52165.00  Ruport::Renderer#render
..
  0.00   104.31      0.00        1     0.00 104300.00  Ruport::Renderer::Hooks.as

This can be eliminated simply by changing

  def data=(val)
    formatter.data = val.dup
  end

to

  def data=(val)
    formatter.data = val
  end

I'm not sure if a deep copy is warranted here, when simply telling the formatter about the source object. If it is there may be a cheaper way to do it, e.g. Marshal.load(Marshal.dump(..))

With these changes my report now takes 1.0-1.1s to write the CSV table to disk, more than 30x faster than when I started.

(follow-up: ↓ 3 ) 02/02/08 13:22:00 changed by sandal

  • status changed from new to closed.
  • resolution set to fixed.

patch applied: r1255

Remember, the bug tracker is not a mailing list! :)

Please see further discussion at:

(in reply to: ↑ 2 ) 02/02/08 13:22:29 changed by sandal

Replying to sandal:

patch applied: r1255 Remember, the bug tracker is not a mailing list! :) Please see further discussion at:

http://groups.google.com/group/ruport-dev/browse_thread/thread/88d48fc664e597f8