CsvReaderAdvanced 2.3.8


CsvReaderAdvanced

A fast, modern CSV reader built around dependency injection (DI) principles.

Combine the power of JSON configuration files with customized CSV reading.

How to install

Via the Package Manager:

Install-Package CsvReaderAdvanced

Via the .NET CLI

dotnet add package CsvReaderAdvanced

How to use

First, add the service to the ServiceCollection:

builder.ConfigureServices((context, services) =>
{
    services.AddCsvReader(context.Configuration);
    ...
});

Csv schemas via appsettings.json

The AddCsvReader method assumes that the current configuration contains a csvSchemas section, typically in the appsettings.json file:

public static IServiceCollection AddCsvReader(this IServiceCollection services, IConfiguration configuration)
{
    services.AddScoped<CsvReader>();
    services.AddScoped<CsvFileFactory>();

    //Microsoft.Extensions.Hosting must be referenced
    services.Configure<CsvSchemaOptions>(configuration.GetSection(CsvSchemaOptions.CsvSchemasSection));
    return services;
}

The schema in the appsettings.json file typically contains a property named csvSchemas:

"csvSchemas": {
    "schemas": [
      {
        "name": "products",
        "fields": [
          {
            "name": "ProductID",
            "alternatives": [ "Product ID" ],
            "required": true
          },
          {
            "name": "Weight",
            "unit": "t",
            "alternativeFields": [ "Volume", "TEU" ],
            "required": true
          },
          {
            "name": "Volume",
            "unit": "m^3",
            "alternativeUnits": [ "m3", "m^3" ]
...

We assume that the options are injected via DI, as in the following example:

public Importer(
    IServiceProvider provider,
    ILogger logger,
    IOptions<CsvSchemaOptions> options)
{
    _provider = provider;
    _logger = logger;
    _options = options.Value;
}

protected readonly IServiceProvider _provider;
protected readonly ILogger _logger;
protected readonly CsvSchemaOptions _options;

public CsvSchema? GetSchema(string name) =>
    _options?.Schemas?.FirstOrDefault(s => s.Name == name);

public ValidationResult CheckForSchema(string name)
{
    if (_options?.Schemas is null || !_options.Schemas.Any())
    {
        _logger.LogError("Could not retrieve csv schemas from settings");
        return new ValidationResult(
            new ValidationFailure[] { new ValidationFailure("CsvSchemas", "Cannot retrieve csv schemas from settings") });
    }

    var schema = GetSchema(name);

    if (schema is null)
    {
        _logger.LogError("Could not retrieve '{schemaName}' schema from settings", name);
        return new ValidationResult(
            new ValidationFailure[] { new ValidationFailure(name, $"Cannot retrieve '{name}' schema from settings") });
    }
    return new ValidationResult();
}
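
The CheckForSchema helper above can then be used to fail fast before touching the file. A minimal sketch (the importer instance and the "products" schema name are assumptions for illustration; ValidationResult comes from FluentValidation, as the code above implies):

```csharp
//Hypothetical caller: validate that the "products" schema exists before importing
var result = importer.CheckForSchema("products");
if (!result.IsValid)
{
    //FluentValidation's ValidationResult exposes the failures collected above
    foreach (var failure in result.Errors)
        Console.WriteLine($"{failure.PropertyName}: {failure.ErrorMessage}");
    return; //abort the import
}
```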

Read the file

We instantiate a CsvFile via the CsvFileFactory (NOTE: this has changed in version 2.0). Note that the aforementioned CsvSchema is not needed if we do not have a header and/or do not want to validate the existence of fields. For the example below, we assume that a CsvSchema is checked.

//We assume that _provider is an IServiceProvider which is injected via DI
var fileFactory = _provider.GetCsvFileFactory();
var file = fileFactory.ReadWholeFile(path, Encoding.UTF8, withHeader:true);

//To minimally instantiate the file, call GetFile instead, which reads only the header
var file = fileFactory.GetFile(path, Encoding.UTF8, withHeader:true);

If the withHeader argument is true, the ReadHeader() method is called, which populates the Header property. The PopulateColumns() method then updates the internal ExistingColumns dictionary, which is case-insensitive and stores the zero-based index location of each column. To check the existence of fields against a schema, call the CheckAgainstSchema() method as shown below:

CsvSchema schema = _options.Schemas.FirstOrDefault(s => s.Name == "products");
file.CheckAgainstSchema(schema);

The CheckAgainstSchema() method also calls the PopulateColumns() method if the ExistingColumns property is not yet populated. It then updates the ExistingFieldColumns dictionary, which maps each field name to its column index. Two additional properties (HashSets) are also populated: MissingFields and MissingRequiredFields.
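
After the schema check, a caller can guard on these sets before processing. A sketch, assuming the file instance and property names described above:

```csharp
//Sketch: after file.CheckAgainstSchema(schema), abort if required fields are absent
if (file.MissingRequiredFields.Count > 0)
{
    Console.WriteLine("Missing required fields: "
        + string.Join(", ", file.MissingRequiredFields));
    return; //abort the import
}
//ExistingFieldColumns maps field names to zero-based column indices
int weightColumn = file.ExistingFieldColumns["Weight"];
```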

Lines and ParsedValue

The most important property updated after the ReadWholeFile() call is the Lines property, which is a List of TokenizedLine? objects. The TokenizedLine struct contains the Tokens property, a List of string objects. The power of this library is that each TokenizedLine may span more than one physical line. This can occur with quoted strings that continue on the next line; in general, all cases involving quoted strings are cases where a simple string.Split() cannot work. That is why the FromLine and ToLine properties exist, which are also important for debugging purposes. The GetDouble/GetFloat/GetString/GetInt/GetByte/GetLong/GetDateTime/GetDateTimeOffset methods return a ParsedValue<T> struct. ParsedValue is a useful wrapper that contains a Value, an IsParsed and an IsNull property.
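
To illustrate why a TokenizedLine can span multiple physical lines, consider a file with a quoted field containing a line break (a constructed sample; the field names are hypothetical):

```
ProductName;Comment
p1;"a comment that
continues on the next line"
```

The second record yields a single TokenizedLine whose FromLine and ToLine differ, because the quoted Comment token spans two physical lines; a naive string.Split() applied per line would break it apart.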

var c = file.ExistingFieldColumns;

//we can use the following instead if we want to use the original field names from the CSV file's header
//var c = file.ExistingColumns;

foreach (var line in file.Lines)
{
    TokenizedLine l = line.Value;
    
    //for strings we can immediately retrieve the token based on the field name
    string name = l.Tokens[c["ProductName"]];

    var weightValue = l.GetDouble("Weight", c);
    if (!weightValue.Parsed)
        _logger.LogError("Cannot parse Weight {value} at line {line}.", weightValue.Value, l.FromLine);
    else
    {
        //implicit conversion to double if value exists
        double weight = weightValue;
    ...
    }

    //or implicit conversion to double? - can be both null or non null
    double? weight2 = weightValue;
...

Example 1 - Simple case without schema

Let's assume that we have a simple CSV file with known headers. The simplest approach is to use the ExistingColumns property, which is populated after the call to ReadWholeFile when the withHeader argument is set to true.

Suppose that there are 3 labels in the header, namely FullName, DoubleValue and IntValue, representing a string, a double and an int field for each record. The sample content of the file is the following:

FullName;DoubleValue;IntValue
name1;20.0;4
name2;30.0;5

The full code to read them is then:

//build the app
var host = Host.CreateDefaultBuilder(args).ConfigureServices((c, s) => s.AddCsvReader(c.Configuration));
var app = host.Build();


string path = @".\samples\hard.csv";

//read the whole file
var file = app.Services.GetCsvFileFactory()
    .ReadWholeFile(path, Encoding.UTF8, withHeader: true);

//get the values
var c = file.ExistingColumns; //Dictionary<string, int>
foreach (var l in file.Lines!)
{
    if (!l.HasValue) continue; //skip lines that could not be tokenized
    var t = l.Value.Tokens; //List<string>
    string? v1 = l.Value.GetString("FullName", c);
    double? v2 = l.Value.GetDouble("DoubleValue", c);
    int? v3 = l.Value.GetInt("IntValue", c);
    ...
}
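
For the sample file above, the first data line would yield values like the following (a sketch of expected results, assuming invariant-culture number parsing):

```csharp
//First data line "name1;20.0;4" (expected values, sketch):
//v1 == "name1"
//v2 == 20.0
//v3 == 4
```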

Example 2 - Avoid preloading the whole data from the file

We can use the Read method to load the file lazily (i.e. the lines are not preloaded). In this case, we instantiate the CsvFile using the GetFile method instead. See the modified example below, which in practice saves memory in many cases:

//read the header only from the file
CsvFile file = app.Services.GetCsvFileFactory()
    .GetFile(path, Encoding.UTF8, withHeader: true);

//get the values
var c = file.ExistingColumns; //Dictionary<string, int>

//lazy enumerate using the Read function
foreach (TokenizedLine? l in file.Read(skipHeader: true))
{
    if (!l.HasValue) continue; //skip lines that could not be tokenized
    var t = l.Value.Tokens; //List<string>
    ...
}

STAY TUNED


