NOTE: This guide covers Grafter 0.6.0
Running Pipeline Transformations
This is the second part of the Grafter Getting Started Guide.
In this section you’ll see how Grafter can be used to:
- Clean tabular data
- Convert a CSV file into an Excel file or vice versa
- Convert tabular data into Linked Data (RDF)
You’ll also learn how to list and run Grafter transformations with our command-line tools, how the transformations themselves are specified, and how to run them at the REPL.
Running Transformations from the Command Line
To understand how Grafter works, let’s first take a look at the example
CSV file (
./data/example-data.csv) which was installed by the
This dataset follows the common CSV pattern of a header on the first row followed by the rows of source data. Here there are two records, one for Alice and one for Bob, each listing their sex and age.
Next, let’s take a look at which transformations are defined in this project by running the command:
lein grafter list is one of the commands provided by the
plugin to list all of the pipelines defined within a project. It
scans the project’s classpath, finds all of the pipelines (defined with
defgraft) and displays their name, their type, the
arguments they expect and their documentation string.
The template project defines two pipelines. The first,
convert-persons-data, is a
pipe, which means it converts tabular
data into another tabular format. The second pipeline is a
graft, which means it converts tabular data into graph data.
We’ll talk more about this distinction later, but let’s try running both of the pipelines to get an idea of the differences between pipes and grafts.
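The distinction can be sketched outside Grafter. The following Python is only an illustration of the two shapes of transformation, not Grafter's API: the row data, column names and example.org IRIs are assumptions made up for the sketch.

```python
# Hypothetical rows standing in for a small table (names and values are assumed).
rows = [
    {"name": "Alice", "sex": "f", "age": 34},
    {"name": "Bob", "sex": "m", "age": 63},
]


def pipe(table):
    """Table in, table out: return new rows with the 'sex' column relabelled."""
    labels = {"f": "female", "m": "male"}
    return [{**row, "sex": labels[row["sex"]]} for row in table]


def graft(table, base="http://example.org/people/"):
    """Table in, graph out: return (subject, predicate, object) triples."""
    triples = []
    for row in pipe(table):
        subject = base + row["name"]  # hypothetical IRI scheme
        triples.append((subject, "http://example.org/def/sex", row["sex"]))
        triples.append((subject, "http://example.org/def/age", row["age"]))
    return triples


print(pipe(rows))   # still a list of rows
print(graft(rows))  # a list of triples
```

The result of the pipe can be fed into another table transformation, whereas the result of the graft is no longer a table at all.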
Executing a Pipeline Transformation
You can run both pipe and graft transformations with the
command. The format of the run command is:
lein run <pipeline-name> input-args... output-file
Remember that pipes convert data from one tabular format to another
tabular format, so we can use the
test-project.pipeline/convert-persons-data pipe to convert our
example-data.csv file into another CSV file. We know from running
lein grafter list that the pipeline expects one input argument
as its source data file. Note that pipelines declare their input
arguments but not their outputs, so it’s up to us and
lein run to
supply a final output argument. Let’s ask it to put the output into
a new CSV file (
The output from the command above indicates that the pipeline function operated on its sole input file and produced an output file as we asked. Looking at the output we can see what happened:
We can see the transformation has converted the representation of
gender from the strings ‘m’ and ‘f’ to ‘male’ and ‘female’, whilst
deriving a new column called
person-uri which has been built out of
a prefix and each person’s name. Now let’s ask for the data in an Excel file:
This time the same table is output, but as an Excel file. This shows that, for registered exporters, Grafter detects the desired file format from the file extension. Grafter currently ships with tabular data exporters for CSV and Excel (both xls and xlsx); we plan to add more, such as OpenOffice, in the future.
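Choosing an exporter from a file extension can be pictured with a short Python sketch. This is not how Grafter is implemented; the registry and the function name `detect_format` are hypothetical and exist only to show the idea.

```python
from pathlib import Path

# Hypothetical registry: file extension -> name of a tabular exporter.
EXPORTERS = {".csv": "csv", ".xls": "excel", ".xlsx": "excel"}


def detect_format(filename):
    """Pick an exporter name from the extension of the requested output file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXPORTERS:
        raise ValueError(f"no exporter registered for {suffix!r}")
    return EXPORTERS[suffix]


print(detect_format("output.csv"))
print(detect_format("output.xlsx"))
```

An unregistered extension raises an error here rather than silently falling back to a default, which is one reasonable design choice for a sketch like this.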
Exporting to a format like Excel has some benefits over CSV: CSV loses type information because it converts all values back into strings.
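The loss of type information is a property of CSV itself and can be seen with Python's standard csv module. The row below is made-up sample data, not taken from the example file.

```python
import csv
import io

# Made-up sample row: 'age' starts out as an integer.
rows = [{"name": "Alice", "age": 34}]

# Write the row to an in-memory CSV document.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every value now comes back as a string.
buffer.seek(0)
round_tripped = list(csv.DictReader(buffer))

print(type(rows[0]["age"]).__name__)           # int
print(type(round_tripped[0]["age"]).__name__)  # str
```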
Now let’s try running the
convert-persons-data-to-graph graft that the project defines. This time,
with graft runs, we’re expected to give it a linked data serialisation
format. Grafter supports all main RDF serialisations including turtle (
.ttl), n-triples (
.nt), trig (
.trig), trix (.trix), n-quads (
.nq) and RDF XML (
.rdf). Again the desired format is inferred by
Grafter from the file extension, so let’s ask for some linked data in
Again we can see that, like
pipes, grafts are functions; however,
this time they output graph data, as linked data in the chosen format.
Let’s take a look at the linked data we generated:
Now this is more interesting: we’ve converted our tabular data into an RDF graph of triples! Let’s see what it looks like in n-triples:
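As a rough illustration of the N-Triples syntax (a Python sketch with made-up example.org IRIs, not output from the template project), each triple is written on its own line as a subject, a predicate and an object followed by a full stop. The helper `to_ntriples` is hypothetical and only handles plain string literals as objects.

```python
def to_ntriples(triples):
    """Serialise (subject IRI, predicate IRI, plain string literal) triples."""
    lines = []
    for subject, predicate, value in triples:
        # Escape the characters that may not appear raw inside a literal.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical triple; the IRIs are assumptions for illustration only.
example = [
    ("http://example.org/people/Alice", "http://example.org/def/sex", "female"),
]
print(to_ntriples(example))
```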
It’s also worth mentioning that the Grafter plugin can load inputs from URLs, e.g.
Now that you know how to list and run pipelines defined in a Grafter project, let’s take an in-depth look at how these transformations are expressed with Grafter.