CSVParse 1.0.1

There is a newer version of this package available.
See the version list below for details.

CSVParse

The Ultra Fast .NET CSV Parser

CSVParse is a minimal, pure C# CSV file parser. It implements the vast majority of RFC 4180. It's designed with the key goals of being fast and having a straightforward API.

Features

  • Supports any separator character (allowing it to support CSV, TSV, and similar formats).
  • Parses data with zero GC allocations.
  • Supports parsing (or ignoring) CSV headers.
  • Supports fields in quotation marks, which can be automatically unescaped, as per the RFC 4180 spec.
  • The CSV parser instance can be reused if multiple CSV files need to be parsed. (Doing so saves the cost of constructing a new parser instance and repeating the reflection work.)
  • Custom deserializers can be written to parse fields into arbitrary data types.
  • Supports Windows/Unix/Mac line endings.
  • Automatic file encoding detection for UTF-8, UTF-16-LE, and UTF-16-BE.
  • Runtime IL generation to optimise deserializing into arbitrary classes/structs with minimal use of reflection.
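
As an illustration of the separator feature, the sketch below parses tab-separated data. It uses only the CSVSerializerOptions properties and the static CSVParser.Parse<>() call that appear in the Usage section; the in-memory sample data is made up for the example, and it assumes that any Stream (here a MemoryStream) is accepted in place of a file.

```csharp
using System;
using System.IO;
using System.Text;
using CSVParse;

// Tab-separated sample data held in memory (illustrative only).
byte[] tsv = Encoding.UTF8.GetBytes("name\tid\ttime\nalice\t1\t2.5\nbob\t2\t3.75\n");
using var stream = new MemoryStream(tsv);

var options = new CSVSerializerOptions()
{
    Separator = '\t',                  // TSV instead of CSV
    HeaderMode = CSVHeaderMode.Parse   // first line holds the column names
};

foreach (var row in CSVParser.Parse<Row>(stream, options))
    Console.WriteLine($"{row.id}: {row.name} [time: {row.time}]");

// Row type matching the columns of the sample data.
public struct Row
{
    public string name;
    public int id;
    public float time;
}
```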

Important Limitations

  • The RFC 4180 specification allows line breaks inside quoted fields. Due to the internal design of CSVParse, this is not supported.
  • Currently on platforms that do not support runtime IL generation (this includes NativeAOT), a much slower reflection-based fallback path is used. This may be circumvented in the future with the use of compile-time source generation.
  • To reduce the number of memory allocations, CSVParse currently uses a fixed size line buffer. This means that lines in the CSV file cannot be longer than the configured maximum for the parser. This limit is configurable and may be lifted in the future.
  • Currently a CSVParser instance can only be used from a single thread.
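
Since line breaks inside quoted fields are not supported, it may be worth checking an input for them before parsing. The following is a minimal sketch and not part of CSVParse: CsvPreflight and FindMultilineField are hypothetical names. It relies on the RFC 4180 rule that a quote inside a quoted field is written as two quotes, so in well-formed input a physical line with an odd number of quote characters opens a quoted field that continues on the next line.

```csharp
using System;
using System.IO;

public static class CsvPreflight
{
    // Returns the 1-based number of the first line that opens a quoted
    // field without closing it, or -1 if no such line is found.
    public static int FindMultilineField(TextReader reader)
    {
        int lineNumber = 0;
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;

            // Count the quote characters on this physical line.
            int quotes = 0;
            foreach (char c in line)
                if (c == '"') quotes++;

            // An odd count means a quoted field is still open at the line end.
            if (quotes % 2 != 0)
                return lineNumber;
        }
        return -1;
    }
}
```

For example, CsvPreflight.FindMultilineField(new StringReader("a,b\n\"x\ny\",2\n")) returns 2, because the quoted field that starts on the second line is only closed on the third.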

Benchmarks

See CSVParse.Benchmarks/CSVParseBenchmarks.cs for details.

// * Summary *

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.303
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  Job-SZBXTU : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2

Runtime=.NET 8.0  IterationCount=20  WarmupCount=4

| Method                            | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2     | Allocated   |
|---------------------------------- |---------:|---------:|---------:|----------:|----------:|---------:|------------:|
| TestCSVParseClass                 | 45.59 ms | 0.792 ms | 0.813 ms | 1818.1818 | 1181.8182 | 363.6364 |  9703.92 KB |
| TestCSVParseStruct                | 42.48 ms | 1.435 ms | 1.652 ms | 1250.0000 |  750.0000 | 250.0000 |  10432.3 KB |
| TestCSVParseStructNoIt            | 38.94 ms | 0.759 ms | 0.874 ms | 1230.7692 |  769.2308 | 230.7692 |  7898.36 KB |
| TestCSVParseStructNoItNoCustomSer | 31.07 ms | 1.034 ms | 1.149 ms |  750.0000 |  437.5000 | 125.0000 |  5554.23 KB |
| TestCSVParseStructNoItNoAlloc     | 23.59 ms | 0.604 ms | 0.695 ms |         - |         - |        - |    15.31 KB |
| TestCSVParseOldClass              | 47.36 ms | 1.169 ms | 1.346 ms | 1800.0000 |  900.0000 | 300.0000 | 10057.61 KB |
| TestCSVHelper                     | 71.52 ms | 2.225 ms | 2.563 ms | 3000.0000 | 1333.3333 | 333.3333 | 17964.23 KB |

Installation

Simply add the NuGet package to your project.

> dotnet add package CSVParse

Usage

The simplest way to use CSVParse is to call the static Parse<>() method.

using CSVParse;

// Define a data structure to represent a row of data
public struct Row
{
	public string name;
	public int id;
	public float time;
}

// Open a file to parse, this can be replaced with any C# stream
string path = @"C:\path\to\your\csv.txt";
using FileStream fs = File.OpenRead(path);

// Call the static parse method
var csv = CSVParser.Parse<Row>(fs);

// Now you can process the rows as needed
foreach (var row in csv)
	Console.WriteLine($"{row.id}: {row.name} [time: {row.time}]");

A number of parsing options can be passed to CSVParse to control how CSV files are parsed:

string path = @"C:\path\to\your\csv.txt";
using FileStream fs = File.OpenRead(path);

// Specify some options, any of these can be left blank to use the default value.
var options = new CSVSerializerOptions()
{
    IncludeFields = true,
    IncludeProperties = true,
    IncludePrivate = false,
    HandleSpeechMarks = true,
    Separator = ',',
    MaximumLineSize = 2048,
    HeaderMode = CSVHeaderMode.Parse,
    DefaultEncoding = null
};

// Call the static parse method
var csv = CSVParser.Parse<Row>(fs, options);

CSVParse provides a few attributes which can be applied to the fields of the row data structure to control how they are parsed:

// The [CSVName(...)] attribute can be applied to fields where the field name might not
// match the CSV column name. In this example, in the CSV header, the column named "trip_id"
// will be matched to the field "tripID".
public readonly struct GTFSStopTimeStruct
{
    [CSVName("trip_id")]
    public readonly string tripID;
    [CSVName("arrival_time")]
    public readonly TransitTime arrivalTime;
    [CSVName("departure_time")]
    public readonly TransitTime departureTime;
    [CSVName("stop_id")]
    public readonly string stopID;
    [CSVName("shape_dist_traveled")]
    public readonly float? shapeDistTraveled;
}

// If your CSV file doesn't have a header (or you want to skip parsing it), then you can use
// the [CSVIndex(...)] attribute to control which column is associated with which field by 
// column index. This also makes it possible to skip parsing certain fields in the CSV file 
// if they aren't needed.
public readonly struct GTFSStopTimeStruct
{
    [CSVIndex(0)]
    public readonly string tripID;
    [CSVIndex(1)]
    public readonly TransitTime arrivalTime;
    [CSVIndex(2)]
    public readonly TransitTime departureTime;
    [CSVIndex(3)]
    public readonly string stopID;
    [CSVIndex(8)]
    public readonly float? shapeDistTraveled;
}

// To be able to deserialize custom data types you can apply the [CSVCustomSerializer<>]
// attribute to a field to specify a custom parser for that field. This attribute takes a 
// generic type argument which implements ICustomCSVSerializer.
public readonly struct GTFSStopTimeStruct
{
    [CSVName("trip_id")]
    public readonly string tripID;
    [CSVCustomSerializer<TransitTimeCSVSerializer>]
    [CSVName("arrival_time")]
    public readonly TransitTime arrivalTime;
    [CSVCustomSerializer<TransitTimeCSVSerializer>]
    [CSVName("departure_time")]
    public readonly TransitTime departureTime;
    [CSVName("stop_id")]
    public readonly string stopID;
    [CSVName("shape_dist_traveled")]
    public readonly float? shapeDistTraveled;
}

Custom field parsers can be implemented as shown in the following example. Note that there are two ways of implementing custom parsers: either apply the [CSVCustomSerializer] attribute to a field and implement a custom serializer (see TransitTimeCSVSerializer), or implement on the type itself a constructor which takes a single ReadOnlySpan<char> parameter, together with the int Serialize(Span<char> dst) method.

//////// From CSVParser.cs
public interface ICustomCSVSerializer
{
    public object? Deserialize(ReadOnlySpan<char> data, int lineNumber);
    public ReadOnlySpan<char> Serialize(object? data, int lineNumber) => data?.ToString();
}
////////

public readonly struct GTFSStopTimeStruct
{
    [CSVName("trip_id")]
    public readonly string tripID;

    [CSVCustomSerializer<TransitTimeCSVSerializer>]
    [CSVName("arrival_time")]
    public readonly TransitTime arrivalTime;
}

public class TransitTimeCSVSerializer : ICustomCSVSerializer
{
    public object? Deserialize(ReadOnlySpan<char> data, int lineNumber)
    {
        return new TransitTime(data);
    }
}

public readonly struct TransitTime : ICSVSerializable
{
    public readonly int time;

    public TransitTime(ReadOnlySpan<char> s)
    {
        int hour = int.Parse(s[..2]);
        int min = int.Parse(s[3..5]);
        int second = int.Parse(s[6..8]);
        time = hour * 3600 + min * 60 + second;
    }

    public TransitTime(int seconds)
    {
        this.time = seconds;
    }

    public override string ToString()
    {
        var h = (time / 3600); // hours, e.g. 25 for "25:30:00"
        var m = (time / 60 - (h * 60)); // minutes, e.g. 30 for "25:30:00"
        var s = time % 60;
        return $"{h:D2}:{m:D2}:{s:D2}";
    }

    public int Serialize(Span<char> dst)
    {
        // Note that this implementation makes an unnecessary string allocation...
        var str = ToString();
        str.CopyTo(dst);
        return str.Length;
    }
}
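
The comment in Serialize above points out an unnecessary string allocation. One way to avoid it is to format each component straight into the destination span with int.TryFormat. The sketch below is independent of CSVParse: TransitTimeFormatter and Write are hypothetical names, the input is assumed to be non-negative, and returning -1 when the buffer is too small is a choice made for this example rather than something the library prescribes.

```csharp
using System;

public static class TransitTimeFormatter
{
    // Writes totalSeconds as HH:MM:SS into dst without allocating a string.
    // Returns the number of chars written, or -1 if dst is too small.
    public static int Write(int totalSeconds, Span<char> dst)
    {
        int h = totalSeconds / 3600;
        int m = totalSeconds / 60 % 60;
        int s = totalSeconds % 60;

        int pos = 0;

        // Hours, zero-padded to at least two digits.
        if (!h.TryFormat(dst, out int written, "D2")) return -1;
        pos += written;
        if (pos >= dst.Length) return -1;
        dst[pos++] = ':';

        // Minutes.
        if (!m.TryFormat(dst[pos..], out written, "D2")) return -1;
        pos += written;
        if (pos >= dst.Length) return -1;
        dst[pos++] = ':';

        // Seconds.
        if (!s.TryFormat(dst[pos..], out written, "D2")) return -1;
        return pos + written;
    }
}
```

With a 16-character buffer, TransitTimeFormatter.Write(91800, buffer) writes "25:30:00" and returns 8. The body of TransitTime.Serialize could then reduce to a single call that passes time and dst.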

Performance Considerations

CSVParse uses reflection to work out which columns in the CSV map to which fields in the row data structure. To avoid repeating this work and the allocations that come with it, an instance of CSVParser can be stored and reused with other CSV files of the same type:

string path = @"C:\path\to\your\csv.txt";
using FileStream fs = File.OpenRead(path);

// Create an instance of the parser, this can be reused later if needed.
var parser = new CSVParser<Row>(options);

// Call the parse method
var csv = parser.Parse(fs);

If you intend to process CSV data one line at a time, more allocations can be avoided by reusing a single row object for each row in the CSV:

string path = @"C:\path\to\your\csv.txt";
using FileStream fs = File.OpenRead(path);

// Create an instance of the parser, this can be reused later if needed.
var parser = new CSVParser<Row>(options);

// Initialise the parser and read the header
var header = parser.Initialize(fs);
var row = new Row();
while (parser.ParseRow(ref header, fs, ref row))
{
    // Do something with the parsed row
}

Additionally, when processing one line at a time, string allocations can be avoided by making your data types use pre-allocated mutable strings. The PreAllocatedString helper struct is effectively a Memory<char>, and can be used to represent strings.

public readonly struct Row
{
    public Row() : this(256) { }

    public Row(int preallocateStringSize)
    {
        name = new PreAllocatedString(preallocateStringSize);
    }

    public readonly PreAllocatedString name;
    public readonly int id;
    public readonly float time;
}

string path = @"C:\path\to\your\csv.txt";
using FileStream fs = File.OpenRead(path);

// Create an instance of the parser, this can be reused later if needed.
var parser = new CSVParser<Row>(options);

// Initialise the parser and read the header
var header = parser.Initialize(fs);
var row = new Row(1024);
while (parser.ParseRow(ref header, fs, ref row))
{
    // Do something with the parsed row
}

Bugs? Feature Requests?

Feel free to open a GitHub issue 😉

License

This library is available under the very permissive MIT license.

Acknowledgements

Adam and Derek Mathieson for help finding bugs.

Compatible and additional computed target framework versions

.NET: net8.0 is compatible. The following were computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.

Dependencies

  • net8.0: No dependencies.

| Version | Downloads | Last updated |
|-------- |----------:|-------------:|
| 1.2.2   |       101 |    11/6/2024 |
| 1.1.5   |       154 |    8/14/2024 |
| 1.1.3   |       116 |    8/14/2024 |
| 1.1.2   |       119 |    8/12/2024 |
| 1.0.1   |       137 |     8/8/2024 |