CSVY: yaml frontmatter for csv file format

This project is maintained by the csvy team

Welcome to CSVY.

This page describes the specification of YAML frontmatter for the CSV file format. The main goals of the format are extreme simplicity and readability.

For human data curators, the technological gap between each step, from no data, to CSV, to metadata plus CSV, to semi-structured data, is too large. A simple file format for adding metadata to existing datasets is needed. JSON is cryptic for humans, but YAML can do the job.

Based on Tabular Data Packages

There are important initiatives, such as Tabular Data Packages (JSON + CSV), which CSVY plans to use, but most are meant to be published and read by machines.

CSVY is a simple container for a Tabular Data Package: the metadata and schema are translated from JSON to YAML and placed in the YAML frontmatter of the file. The data part follows the frontmatter and is stored as CSV, as described by the CSV Dialect Description Format. Multiple data resources can be placed in one file, separated by the YAML header delimiter.

YAML Header delimiter

A YAML metadata block is a valid YAML object, delimited by a line of three hyphens --- at the top and a line of three hyphens --- or three dots ... at the bottom.
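The delimiter rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `split_csvy` is hypothetical, and it assumes the header starts on the first line of the file and that a file without a leading `---` is plain CSV.

```python
def split_csvy(text):
    """Split CSVY text into (yaml_header, csv_body) strings.

    Assumption: the header opens with '---' on the first line and closes
    with a line of '---' or '...'.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        # No front matter found: treat the whole input as plain CSV.
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() in ("---", "..."):
            # Everything between the delimiters is YAML, the rest is CSV.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("unterminated YAML header")


# Made-up sample input, for illustration only.
sample = "---\nname: my-dataset\n---\nvar1,var2\nA,1\n"
header, body = split_csvy(sample)
```

The returned header string can then be handed to any YAML parser, and the body to any CSV reader.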

Defining the Table Schema

Use the Table Schema. The only difference from the Tabular Data Package specification is that the path field is replaced by order (starting at one) to support multiple data resources.

name: my-dataset
resources:
- order: 1
  schema:
    fields:
    - name: var1
      type: string
    - name: var2
      type: integer
    - name: var3
      type: number
  dialect:
    csvddfVersion: 1.0
    delimiter: ","
    doubleQuote: false
    lineTerminator: "\r\n"
    quoteChar: "\""
    skipInitialSpace: true
    header: true
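As a rough illustration of how a reader might use the declared field types, the sketch below casts CSV values according to the schema. It assumes the YAML header has already been parsed into a list of field dictionaries. The `CASTERS` mapping, the `read_typed_rows` name, and the sample data rows are hypothetical and cover only the three types shown in the example.

```python
import csv
import io

# Assumed mapping from Table Schema type names to Python callables;
# only the three types used in the example schema are covered.
CASTERS = {"string": str, "integer": int, "number": float}


def read_typed_rows(csv_text, fields, delimiter=","):
    """Yield one dict per CSV row, casting each value by its declared type."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    for row in reader:
        yield {f["name"]: CASTERS[f["type"]](row[f["name"]]) for f in fields}


# Field list as it would look after parsing the YAML header.
fields = [
    {"name": "var1", "type": "string"},
    {"name": "var2", "type": "integer"},
    {"name": "var3", "type": "number"},
]
# Made-up data rows, for illustration only.
rows = list(read_typed_rows("var1,var2,var3\nA,1,2.0\nB,3,4.3\n", fields))
```

A value that does not match its declared type (for example `x` in an integer column) raises `ValueError`, which is one simple way to surface schema violations.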

Libraries supporting CSVY

Backwards Compatibility

For backward compatibility, you can always keep the metadata in a separate data.yml file alongside your data.csv. Once proper implementations exist, moving to a single-file container, data.csvy, should not be a problem.
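Moving from the two-file layout to a single container could look like the sketch below. It is a minimal example under stated assumptions: the `merge_sidecar` function is hypothetical, the metadata file is assumed to hold the bare YAML without `---` delimiters, and the file contents are made up.

```python
from pathlib import Path
import tempfile


def merge_sidecar(csv_path, yml_path, out_path):
    """Write out_path as '---', the YAML metadata, '---', then the CSV data."""
    metadata = Path(yml_path).read_text(encoding="utf-8")
    data = Path(csv_path).read_text(encoding="utf-8")
    if not metadata.endswith("\n"):
        metadata += "\n"  # keep the closing delimiter on its own line
    Path(out_path).write_text("---\n" + metadata + "---\n" + data, encoding="utf-8")


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "data.yml").write_text("name: my-dataset\n", encoding="utf-8")
    (tmp / "data.csv").write_text("var1,var2\nA,1\n", encoding="utf-8")
    merge_sidecar(tmp / "data.csv", tmp / "data.yml", tmp / "data.csvy")
    merged = (tmp / "data.csvy").read_text(encoding="utf-8")
```

The original data.csv and data.yml are left untouched, so older tools can keep using them.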

The table below lists parser support for skipping multiple lines in the header (which would contain the YAML) and for comment lines (lines starting with #). It is based on CSV Parser Notes by @hubgit.

Language | Parser          | Skip lines | Comment lines | Comments
Excel    | Mac             | yes        | no            |
Python   | pandas.read_csv | yes        | yes           |
R        | read.table      | yes        | yes           |
Ruby     | csv.read        | no         | yes           | skip lines via regex
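For parsers that can skip lines, the number of lines to skip is the length of the YAML header, including both delimiters. The sketch below computes that count and also drops comment lines, using only the Python standard library. The names `header_line_count` and `read_rows` are hypothetical, and the sample text is made up.

```python
import csv


def header_line_count(lines):
    """Return how many leading lines make up the YAML header (0 if none)."""
    if not lines or lines[0].rstrip() != "---":
        return 0
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() in ("---", "..."):
            return i + 1  # include the closing delimiter line
    raise ValueError("unterminated YAML header")


def read_rows(text):
    """Parse the CSV part, skipping the header and '#' comment lines."""
    lines = text.splitlines()
    data = [ln for ln in lines[header_line_count(lines):] if not ln.startswith("#")]
    return list(csv.reader(data))


# Made-up sample input, for illustration only.
sample = "---\nname: my-dataset\n---\nvar1,var2\n# a comment\nA,1\n"
rows = read_rows(sample)
```

With pandas, the same count could be passed as the `skiprows` argument of `pandas.read_csv`, together with `comment="#"` for comment lines.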

Authors and Contributors

Support or Contact

Use GitHub Issues.