Overview

IdComLog.Data is a CSV and fixed-width parsing library aiming to be feature-rich and easy to use while maintaining high performance and flexibility:

  • Built on .NET 4.5.
  • Reads and writes both raw records and serialized objects.
  • Synchronous and asynchronous operation through Tasks, IEnumerable, and IAsyncEnumerable.
  • Integrates with DataAnnotations to control object serialization.
  • Uses optimized dynamic compilation to maintain performance with serialization.

The public API is documented thoroughly with XML comments. This page currently only provides a basic overview of the library.

Object Serialization

All public fields and properties of objects will be serialized, unless they are annotated with the NotMapped attribute. Serialization can be further controlled by other annotations:

Column: The Name property controls the name of the CSV header, and the Order property controls the order in which columns are written. This attribute is part of System.ComponentModel.DataAnnotations.Schema.
DataFormat: For member types which implement IFormattable, DataFormat specifies the format string to use when serializing the member.
DataWidth: For fixed-width formats, DataWidth must be used to specify the width of the data. If not specified, the width is assumed to be 0.
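
To make the roles of Column and NotMapped concrete, here is a minimal sketch that derives a header row from an annotated class using plain reflection. It is not the library's implementation: the Person class and the HeaderSketch helper are hypothetical names, only properties are inspected (fields are left out for brevity), and the library's own DataFormat and DataWidth attributes are not shown because their exact signatures are not given on this page.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Linq;
using System.Reflection;

// Hypothetical record type, used only for illustration.
public class Person
{
    [Column("first_name", Order = 0)]
    public string FirstName { get; set; }

    [Column("born", Order = 1)]
    public DateTime BirthDate { get; set; }

    [NotMapped] // excluded from serialization
    public string Notes { get; set; }
}

public static class HeaderSketch
{
    // Returns header names for the mapped public properties of T,
    // sorted by Column.Order. Properties without a Column attribute
    // keep their own name and sort last (an assumption of this sketch).
    public static string[] GetHeaders<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetCustomAttribute<NotMappedAttribute>() == null)
            .Select(p => new { Prop = p, Col = p.GetCustomAttribute<ColumnAttribute>() })
            .OrderBy(x => x.Col != null ? x.Col.Order : int.MaxValue)
            .Select(x => x.Col != null && x.Col.Name != null ? x.Col.Name : x.Prop.Name)
            .ToArray();
    }
}
```

With these definitions, HeaderSketch.GetHeaders<Person>() yields first_name followed by born, and Notes is skipped.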

When writing, each member value's ToString() method is called. When reading, a static Parse() method on the member's type is called. Variants taking an IFormatProvider are preferred if found.
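
The lookup order described above can be sketched with reflection. This is only an illustration of the idea, not the library's code (which, per the overview, uses dynamic compilation); ParseSketch and ParseMember are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class ParseSketch
{
    // Converts text to the given type by calling the type's static Parse
    // method, preferring the overload that accepts an IFormatProvider.
    public static object ParseMember(Type type, string text, IFormatProvider provider)
    {
        if (type == typeof(string)) return text; // strings need no parsing

        // First choice: Parse(string, IFormatProvider).
        MethodInfo withProvider = type.GetMethod("Parse",
            BindingFlags.Public | BindingFlags.Static, null,
            new[] { typeof(string), typeof(IFormatProvider) }, null);
        if (withProvider != null)
            return withProvider.Invoke(null, new object[] { text, provider });

        // Fallback: Parse(string).
        MethodInfo plain = type.GetMethod("Parse",
            BindingFlags.Public | BindingFlags.Static, null,
            new[] { typeof(string) }, null);
        if (plain != null)
            return plain.Invoke(null, new object[] { text });

        throw new NotSupportedException("No static Parse method on " + type.Name);
    }
}
```

For example, ParseSketch.ParseMember(typeof(double), "1.5", CultureInfo.InvariantCulture) returns a boxed double with the value 1.5.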

Common interfaces

IDataReader

IDataReader reads raw records. It declares the ReadRow() methods, which CsvReader and FixedReader implement.

IDataReader<T>, which inherits from IDataReader, reads serialized objects. It declares the ReadObject() methods, which CsvReader<T> and FixedReader<T> implement.

Several extension methods cover common needs: overloads of ReadRowAsync() and ReadObjectAsync() that take no cancellation token, plus EnumerateRows() and EnumerateObjects().
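
As a rough idea of what an enumeration helper of this kind does, the sketch below wraps a read-until-null method in an iterator. The IRowSource interface, the ArrayRowSource class, and the string[] row type are all assumptions made for the example; they stand in for the library's real types, whose exact signatures are documented in the XML comments rather than here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a raw record reader.
public interface IRowSource
{
    // Returns the next row, or null once the end of the data is reached.
    string[] ReadRow();
}

// Trivial in-memory source so the sketch can be exercised.
public sealed class ArrayRowSource : IRowSource
{
    private readonly string[][] _rows;
    private int _next;

    public ArrayRowSource(string[][] rows) { _rows = rows; }

    public string[] ReadRow()
    {
        return _next < _rows.Length ? _rows[_next++] : null;
    }
}

public static class RowSourceExtensions
{
    // Lazily yields rows until the source reports the end of the data.
    public static IEnumerable<string[]> EnumerateRows(this IRowSource source)
    {
        string[] row;
        while ((row = source.ReadRow()) != null)
            yield return row;
    }
}
```

A caller can then write foreach (var row in source.EnumerateRows()) instead of looping on ReadRow() by hand.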

IDataWriter

IDataWriter writes raw records. It declares the WriteRow(), Close(), and Flush() methods.

IDataWriter<T>, which inherits from IDataWriter, writes serialized objects. It declares the WriteObject() methods, which CsvWriter<T> and FixedWriter<T> implement.

To improve efficiency, writers keep a small internal buffer. When you are finished with a writer, you must call the Close() method to ensure this internal buffer is flushed to the underlying TextWriter.

The Flush() method will not only flush the internal buffer, but the underlying TextWriter as well.
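
The difference between the two calls can be illustrated with a small stand-in class. BufferedLineWriter is a hypothetical name and the 1024-character threshold is arbitrary; whether the library's Close() also closes the underlying TextWriter is not stated on this page, so this sketch leaves it open.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical writer with an internal buffer in front of a TextWriter.
public sealed class BufferedLineWriter
{
    private const int Threshold = 1024; // arbitrary size for the sketch
    private readonly TextWriter _inner;
    private readonly StringBuilder _buffer = new StringBuilder();

    public BufferedLineWriter(TextWriter inner) { _inner = inner; }

    public void WriteLine(string line)
    {
        _buffer.Append(line).Append("\r\n");
        if (_buffer.Length >= Threshold) Drain();
    }

    // Moves buffered text to the underlying writer.
    private void Drain()
    {
        _inner.Write(_buffer.ToString());
        _buffer.Clear();
    }

    // Drains the internal buffer AND flushes the underlying writer.
    public void Flush()
    {
        Drain();
        _inner.Flush();
    }

    // Drains the internal buffer so no data is lost.
    public void Close()
    {
        Drain();
    }
}
```

Until Close() or Flush() is called, short writes sit in the internal buffer and the underlying TextWriter sees nothing.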

Several extension methods cover common needs: overloads of WriteRowAsync(), WriteObjectAsync(), and FlushAsync() that take no cancellation token, plus WriteRows() and WriteObjects().

CSV

Although CSV is formally defined in RFC 4180, the format predates the RFC by many years, so fragile implementations that produce ambiguously quoted data are common. To be as robust as possible, the reader interprets a quote inside a quoted value as data unless it is immediately followed by a separator or newline. Newlines can be any of CR, LF, CRLF, or LFCR.
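
The lenient quote rule can be sketched as follows. This is not the library's parser: LenientCsvSketch and SplitRecord are hypothetical names, the function handles a single record with no embedded newlines, and checking for a doubled quote before applying the lenient rule is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LenientCsvSketch
{
    // Splits one record (without its trailing newline) into values.
    public static List<string> SplitRecord(string line, char separator = ',')
    {
        var values = new List<string>();
        var current = new StringBuilder();
        bool quoted = false;
        int i = 0;

        while (i < line.Length)
        {
            char c = line[i];
            if (quoted)
            {
                if (c == '"')
                {
                    bool atEnd = i + 1 == line.Length;
                    if (!atEnd && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: one quote of data
                        i += 2;
                        continue;
                    }
                    if (atEnd || line[i + 1] == separator)
                    {
                        quoted = false;      // genuine closing quote
                        i++;
                        continue;
                    }
                }
                current.Append(c);           // ordinary data, or a stray quote
            }
            else if (c == '"' && current.Length == 0)
            {
                quoted = true;               // opening quote at start of value
            }
            else if (c == separator)
            {
                values.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
            i++;
        }

        values.Add(current.ToString());
        return values;
    }
}
```

Given the ambiguously quoted input "6" pipe",x (including the quote characters), the sketch produces two values: 6" pipe and x. The quote after 6 is kept as data because it is followed by a space rather than a separator.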

When writing, RFC 4180 is fully obeyed.
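
For reference, RFC 4180 quoting on output amounts to the rules sketched below: a value is wrapped in double-quotes when it contains the separator, a double-quote, CR, or LF; embedded double-quotes are doubled; and records end with CRLF. Rfc4180Sketch is a hypothetical helper written for this page, not the code CsvWriter uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Rfc4180Sketch
{
    // Quotes a single value only when RFC 4180 requires it.
    public static string EscapeValue(string value, char separator = ',')
    {
        bool needsQuotes = value.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return value;

        // Double every embedded quote, then wrap the value in quotes.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Joins escaped values with the separator and ends the record with CRLF.
    public static string FormatRecord(IEnumerable<string> values, char separator = ',')
    {
        return string.Join(separator.ToString(),
            values.Select(v => EscapeValue(v, separator))) + "\r\n";
    }
}
```

For the values a, b,c, and say "hi", FormatRecord produces a,"b,c","say ""hi""" followed by CRLF.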

CsvReader

CsvReader's constructor has two parameters which configure the file format, and one which trades memory for performance:

char separator: The separator to use between values in a record. Default ','.
bool decodeMidQuotes: If true, two double-quotes will be decoded as a single double-quote of data, as per RFC 4180. If false, they will be treated as two double-quotes. Default true.
int bufferLength: An internal buffer length. Larger values will provide higher performance and perform less I/O, at the cost of consuming more memory. Default 512.

The ReadRow method should be called until it returns null, indicating the end of stream has been reached.

CsvReader<T> has an additional parameter which controls formatting:

IFormatProvider formatProvider: The format provider to use when a member type implements IFormattable. Default null.

The ReadObject method returns an Optional<T>, a type similar to Nullable<T> but supporting reference types. Note that, unlike with Nullable<T>, null is a valid value for Optional<T>. ReadObject should be called until an empty Optional<T> is returned, which can be tested either through the HasValue property or by comparison to Optional<T>.Empty.
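
The distinction between "empty" and "holds null" is easier to see in code. The struct below is a hypothetical stand-in named OptionalSketch<T>, not the library's Optional<T>, and ReadAll takes a delegate in place of a real reader's ReadObject method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in: tracks presence separately from the value itself,
// so a present value may legitimately be null.
public struct OptionalSketch<T>
{
    private readonly T _value;
    private readonly bool _hasValue;

    public OptionalSketch(T value)
    {
        _value = value;
        _hasValue = true;
    }

    public static OptionalSketch<T> Empty
    {
        get { return default(OptionalSketch<T>); }
    }

    public bool HasValue
    {
        get { return _hasValue; }
    }

    public T Value
    {
        get
        {
            if (!_hasValue) throw new InvalidOperationException("No value present.");
            return _value;
        }
    }
}

public static class OptionalDemo
{
    // Calls readObject until it returns an empty result, collecting the values.
    public static List<T> ReadAll<T>(Func<OptionalSketch<T>> readObject)
    {
        var results = new List<T>();
        for (var next = readObject(); next.HasValue; next = readObject())
            results.Add(next.Value);
        return results;
    }
}
```

Because presence is tracked by HasValue, a null object in the middle of the data does not end the loop; only an empty result does.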

CsvWriter

CsvWriter's constructor has one parameter which configures the file format:

char separator: The separator to use between values in a record. Default ','.

The WriteRow method should be called to write all rows, followed by calling the Close method to ensure any internally buffered data is flushed to the underlying TextWriter.

CsvWriter<T> has an additional parameter which controls formatting:

IFormatProvider formatProvider: The format provider to use when a member type implements IFormattable. Default null.

The WriteObject method should be called to write all objects, followed by calling the Close method to ensure any internally buffered data is flushed to the underlying TextWriter.

Last edited Sep 3, 2012 at 10:22 PM by CoryIdComLog, version 4
