Hey there! Last time I spoke about how to add a bit of microservice-style architecture flavor to your monolithic app and the benefits of doing so. This time I’m going to walk through a concrete, working example of a modular, plug-and-play, intuitive and reusable CSV importer. We’ll have it import Post objects for our blog.
To start things off, let’s look at what we’ll need and what we’ll be doing. We’ll be doing it the TDD way, so we’ll have a spec file, an importer file, an ImportPostsController with its corresponding route, and a rake task for mass imports.
Go to your file tree, right-click app and create a new directory called ‘data_importers’. Inside it, place the importer.rb file. This is where your importer code will live. Do the same under spec and create your importer_spec.rb file, which will host your tests.
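If you prefer the command line, the same files can be created like this (a standard Rails layout is assumed here; adjust the paths if your project differs):

```shell
# Create the importer directory under app/ and the matching spec directory,
# then the empty files we'll fill in below
mkdir -p app/data_importers spec/data_importers
touch app/data_importers/importer.rb
touch spec/data_importers/importer_spec.rb
```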
Normally you’d go down the red, green, refactor path, but to keep this as short as possible I’ll just put in the whole importer_spec file in its entirety, followed by the importer file.
```ruby
require 'rails_helper'
require 'ostruct'

class ModelMock < OpenStruct
  def save
    true
  end
end

RSpec.describe DataImporters::Importer do
  it 'can be instantiated with a file path' do
    importer = DataImporters::Importer.new file: 'data/export.csv'
    expect(importer.file).to eq 'data/export.csv'
  end

  it 'can be instantiated with a hash of source and target fields' do
    importer = DataImporters::Importer.new field_mappings: { target: :source }
    expect(importer.field_mappings).to eq({ target: :source })
  end

  it 'can be instantiated with the ActiveRecord model that will hold the data' do
    importer = DataImporters::Importer.new import_to: OpenStruct
    expect(importer.model).to eq OpenStruct
  end

  it 'the import method returns false if the file doesn\'t exist' do
    importer = DataImporters::Importer.new import_to: OpenStruct,
                                           field_mappings: { target: :source }
    expect(importer.import).to be_falsey
  end

  it 'the import method returns false if the model doesn\'t exist' do
    importer = DataImporters::Importer.new file: 'data/export.csv',
                                           field_mappings: { target: :source }
    expect(importer.import).to be_falsey
  end

  it 'the import method returns false if the field_mappings are wrong' do
    importer = DataImporters::Importer.new file: 'data/export.csv',
                                           import_to: OpenStruct
    expect(importer.import).to be_falsey
  end

  it 'the import function returns the number of successful imports' do
    allow(CSV).to receive(:foreach)
      .and_yield({ csv_name: 'John', csv_email: 'john@example.com' })
      .and_yield({ csv_name: 'Dan', csv_email: 'Dan@example.com' })

    importer = DataImporters::Importer.new file: 'data/export.csv',
                                           import_to: ModelMock,
                                           field_mappings: { name: :csv_name, email: :csv_email }
    allow(importer).to receive(:file_exists?) { true }

    result = importer.import
    expect(result[:success]).to eq 2
  end

  it 'the import function returns the number of failed imports' do
    # Re-open ModelMock so save reports a failure for this example
    class ModelMock
      def save
        false
      end
    end

    allow(CSV).to receive(:foreach)
      .and_yield({ csv_name: 'John', csv_email: 'john@example.com' })
      .and_yield({ csv_name: 'Dan', csv_email: 'Dan@example.com' })

    importer = DataImporters::Importer.new file: 'data/export.csv',
                                           import_to: ModelMock,
                                           field_mappings: { name: :csv_name, email: :csv_email }
    allow(importer).to receive(:file_exists?) { true }

    result = importer.import
    expect(result[:failure]).to eq 2
  end
end
```
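The ModelMock above leans on Ruby’s OpenStruct, which accepts an arbitrary attribute hash at construction time, so it can stand in for any ActiveRecord model without touching the database. Here’s a quick standalone sketch of the idea (the PostMock name is made up for illustration):

```ruby
require 'ostruct'

# A stand-in for an ActiveRecord model: OpenStruct accepts any
# attribute hash, and we fake a #save that always succeeds
class PostMock < OpenStruct
  def save
    true
  end
end

post = PostMock.new(title: 'Hello', user: 'jane')
puts post.title  # prints "Hello"
puts post.save   # prints "true"
```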
And here is the importer.rb file that makes the tests pass:
```ruby
require 'csv'

module DataImporters
  class Importer
    attr_reader :file, :model, :field_mappings, :print_progress

    def initialize(file: '', import_to: nil, field_mappings: {}, print_progress: false)
      @file = file
      @model = import_to
      @field_mappings = field_mappings
      @print_progress = print_progress
    end

    def import
      return false unless valid_input?

      result = { success: 0, failure: 0 }
      CSV.foreach(file, headers: true) do |record|
        save_to_model(record) ? result[:success] += 1 : result[:failure] += 1
        puts "Imported #{result[:success]}" if print_progress && (result[:success] % 100 == 0)
      end
      result
    end

    private

    def valid_input?
      file_exists? && model_exists? && field_mappings_exist?
    end

    def file_exists?
      File.exist? file
    end

    def model_exists?
      !model.nil?
    end

    def field_mappings_exist?
      field_mappings != {}
    end

    def save_to_model(record)
      m = model.new parse(record)
      m.save
    end

    def parse(record)
      result = {}
      field_mappings.each do |target, source|
        result[target] = record[source]
      end
      result
    end
  end
end
```
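The heart of the importer is CSV.foreach with headers: true, which yields each row as a CSV::Row indexed by header name; that’s what makes the field_mappings lookup in parse work. Here’s a self-contained sketch of that mechanism (the headers and data are made up for illustration):

```ruby
require 'csv'
require 'tempfile'

# Write a small CSV file so the example is self-contained
csv = Tempfile.new(['export', '.csv'])
csv.write("Title,User\nFirst post,jane\nSecond post,john\n")
csv.close

# With headers: true, each yielded record is a CSV::Row,
# so record['Title'] looks a cell up by its header name
field_mappings = { title: 'Title', user: 'User' }
rows = []
CSV.foreach(csv.path, headers: true) do |record|
  rows << field_mappings.transform_values { |source| record[source] }
end

p rows.first  # the first row, mapped onto model attribute names
```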
Now, to use this, we create a rake task that invokes the importer. We can run it on the server with ‘rake post_import’ and import a large number of records in a single run.
```ruby
require File.expand_path('../config/application', __FILE__)
Rails.application.load_tasks

desc 'Import My Posts'
task :post_import => :environment do
  file = Rails.root.join 'posts.csv'
  field_mappings = { title: 'Title', user: 'User', text: 'Text' }

  importer = DataImporters::Importer.new file: file,
                                         import_to: Post,
                                         field_mappings: field_mappings,
                                         print_progress: true
  result = importer.import
  puts "Successfully imported #{result[:success]} records. #{result[:failure]} failures"
end
```
For individual imports, we just create an ImportPostsController that invokes the importer logic, create a route for it, and add a link in the UI so that an individual user can do it. But this post is getting kind of long, so we’ll cover that in a future article.