Java CSV parser library comparisons 

June 22, 2011 12:14:15    Last update: June 22, 2011 12:15:12
Parsing a CSV file seems to be a sore spot in software development. It's not simple enough that you can roll your own code in a couple of hours, and yet it's not deemed big enough for a full-fledged project. As a result, there are multiple libraries, each with its own quirks. This is a summary of some simple tests I ran against five CSV parsers.

  1. Apache Commons CSV parser
    • Does not support escaping the backslash itself. A backslash is treated as a literal unless it precedes a double quote, so there is no way to have a backslash as the last character before the closing quote (admittedly a rare scenario).
    • Throws IOException for unmatched quotes


  2. SuperCSV parser
    • Can't handle escapes inside quotes
    • Throws Exception for unmatched quotes


  3. OstermillerUtils CSV parser
    • Can't handle escapes inside quotes
    • Space before quotation mark messes up parsing
    • Does not handle new line inside quotes


  4. OpenCSV CSV parser
    • Mishandles spaces before and between items
    • Silently ignores unmatched quotes


  5. Skife CSV parser
    • Does not handle new line inside quotes
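
To make the quirks above concrete, here is a minimal sketch of a quote-aware parser together with the kind of inputs that trip things up. It is not the code of any of the five libraries, and the class name `CsvEdgeCases` and the sample strings are made up for illustration. It assumes the RFC 4180 convention that a doubled quote ("") is the only escape inside a quoted field, so a backslash is always literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference parser, for illustration only.
class CsvEdgeCases {

    /** Parses CSV text, treating "" as the only escape inside a quoted field. */
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current row has consumed any input

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!inQuotes && c == '\r') {
                continue; // drop carriage returns outside quotes (CRLF line endings)
            }
            pending = true;
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // newlines and backslashes are kept as-is
                }
            } else if (c == '"') {
                inQuotes = true;           // text before the quote (e.g. a space) stays in the field
            } else if (c == ',') {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unmatched quote");
        }
        if (pending) {                     // last row had no trailing newline
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("a,\"say \"\"hi\"\"\",c\n")); // escaped quote inside quotes
        System.out.println(parse("a,\"ends with \\\",c\n"));    // backslash right before the closing quote
        System.out.println(parse("a,\"line1\nline2\",c\n"));    // newline inside quotes
        try {
            parse("a,\"never closed,c\n");                      // unmatched quote
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Feeding the same handful of strings to each library is a quick way to see which of the behaviors listed above you are actually getting.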


Conclusion? I'll use the Skife CSV parser if I know there are no "new line" characters in my data. I'll use the Apache CSV parser if I know there are no backslashes in my data.
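
If you don't know in advance what your data contains, a quick scan can tell you. The sketch below is one way to do it, under two assumptions of mine: that "new line characters in my data" means line breaks inside quoted fields, and that quotes are escaped by doubling rather than with a backslash. The class and method names are hypothetical.

```java
// Hypothetical pre-check, for illustration only.
class CsvDataCheck {

    /** Returns true if a line break occurs inside a double-quoted field. */
    static boolean hasNewlineInsideQuotes(String text) {
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A doubled "" toggles twice, so the quoted state is unchanged.
                inQuotes = !inQuotes;
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the text contains a backslash anywhere. */
    static boolean hasBackslash(String text) {
        return text.indexOf('\\') >= 0;
    }

    /** Applies the rule of thumb from the conclusion above. */
    static String suggestParser(String text) {
        if (!hasNewlineInsideQuotes(text)) {
            return "Skife";
        }
        if (!hasBackslash(text)) {
            return "Apache Commons CSV";
        }
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(suggestParser("a,b\n1,2\n"));
        System.out.println(suggestParser("a,\"x\ny\"\n"));
    }
}
```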