In this post, we will read a Comma Separated Value (CSV) file using LINQ and perform some calculations on the data.
Create a Console Application. Right click the project > Add > New File > Select a text template and rename it to Sample.csv. Add the following data in the CSV file
The data shown above shows Quarter wise sales by each SalesPerson. There are 5 columns. The first column is the SalesPersonID and the rest represents total items sold by the salesperson in each quarter.
We will now use LINQ to query this data. Write the following code:
using System;
using System.Linq;
using System.IO;
using System.Collections.Generic;
namespace CSVLINQ
{
class Program
{
static void Main(string[] args)
{
IEnumerable<string> strCSV =
File.ReadLines(@"../../Sample.csv");
var results = from str in strCSV
let tmp = str.Split(',')
.Skip(1)
.Select(x => Convert.ToInt32(x))
select new {
Max = tmp.Max(),
Min = tmp.Min(),
Total = tmp.Sum(),
Avg = tmp.Average()
};
// caching for performance
var query = results.ToList();
foreach (var x in query)
{
Console.WriteLine(
string.Format("Maximum: {0}, " +
"Minimum: {1}, " +
"Total: {2}, " +
"Average: {3}",
x.Max, x.Min, x.Total, x.Avg));
}
Console.ReadLine();
}
}
}
Shown above is an example that calculates the Max, Min, Sum and Average on the rows of a .CSV file. We start by skipping the SalesPersonID column and take the rest. Then each string is converted to an int and the entire sequence is selected as one row in ‘results’. Since ‘results’ is an IEnumerable, the query is not executed till we read from it. For large csv files, you can cache the results for performance gain. Finally loop and print the values.
Note: Make sure to handle empty spaces, errors and other characters in the CSV file. Check the different overloads of the Split() method.
OUTPUT
Tweet
No comments:
Post a Comment