This is an automated email from the ASF dual-hosted git repository.

ggregory pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/commons-csv.git

commit c7ae7ff9a40cd334c45df07cbe00cacbb2bead9f
Author: Gary Gregory <garydgreg...@gmail.com>
AuthorDate: Wed Mar 12 20:25:18 2025 -0400

    Migrate the User Guide to Javadoc
---
 src/main/javadoc/overview.html | 317 +++++++++++++++++++++++++++++++++++++++++
 src/site/xdoc/index.xml        |  17 +--
 src/site/xdoc/user-guide.xml   | 175 +----------------------
 3 files changed, 320 insertions(+), 189 deletions(-)

diff --git a/src/main/javadoc/overview.html b/src/main/javadoc/overview.html
new file mode 100644
index 00000000..46df7b2e
--- /dev/null
+++ b/src/main/javadoc/overview.html
@@ -0,0 +1,317 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ https://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<html>
+<head>
+<title>Apache Commons CSV Overview</title>
+</head>
+<body>
+ <img src="../images/commons-logo.png" alt="Apache Commons CSV">
+ <p>
+ You can find the Javadoc package list at the <a href="#all-packages-table">bottom of this page</a>.
+ </p>
+ <section>
+ <h1>Introducing Commons CSV</h1>
+ <p>Apache Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p>
+ <p>
+ Common CSV formats are predefined in the <a href="org/apache/commons/csv/CSVFormat.html">CSVFormat</a> class:
+ <table>
+ <caption>CSV Formats</caption>
+ <thead>
+ <tr>
+ <th scope="col">CSVFormat</th>
+ <th scope="col">Description</th>
+ <th scope="col">Since Version</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#DEFAULT">DEFAULT</a></td>
+ <td>IO for the Standard Comma Separated Value format, like <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a> but allowing
+ empty lines.
+ </td>
+ <td>1.0</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#EXCEL">EXCEL</a></td>
+ <td>IO for the <a href="https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba">Microsoft
+ Excel CSV.</a> format.
+ </td>
+ <td>1.0</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD">INFORMIX_UNLOAD</a></td>
+ <td>IO for the <a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD TO file_name</a>
+ command.
+ </td>
+ <td>1.3</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD_CSV">INFORMIX_UNLOAD_CSV</a></td>
+ <td>IO for the <a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD CSV TO
+ file_name</a> command with escaping disabled.
+ </td>
+ <td>1.3</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#MONGODB_CSV">MONGODB_CSV</a></td>
+ <td>IO for the <a href="https://docs.mongodb.com/manual/reference/program/mongoexport/">MongoDB CSV <code>mongoexport</code></a> command.
+ </td>
+ <td>1.7</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#MONGODB_TSV">MONGODB_TSV</a></td>
+ <td>IO for the <a href="https://docs.mongodb.com/manual/reference/program/mongoexport/">MongoDB Tab Separated Values (TSV)<code>mongoexport</code></a>
+ command.
+ </td>
+ <td>1.7</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#MYSQL">MYSQL</a></td>
+ <td>IO for the <a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump-delimited-text.html">MySQL CSV</a> format.
+ </td>
+ <td>1.0</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#ORACLE">ORACLE</a></td>
+ <td>IO for the <a href="https://docs.oracle.com/database/121/SUTIL/GUID-D1762699-8154-40F6-90DE-EFB8EB6A9AB0.htm#SUTIL4217">Oracle CSV</a> format
+ of the SQL*Loader utility.
+ </td>
+ <td>1.6</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#POSTGRESQL_CSV">POSTGRESQL_CSV</a></td>
+ <td>IO for the <a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL CSV</a> format used by the <code>COPY</code>
+ operation.
+ </td>
+ <td>1.5</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#POSTGRESQL_TEXT">POSTGRESQL_TEXT</a></td>
+ <td>IO for the <a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL Text</a> format used by the <code>COPY</code>
+ operation.
+ </td>
+ <td>1.5</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#RFC4180">RFC4180</a></td>
+ <td>IO for the RFC-4180 format defined by<a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a>.
+ </td>
+ <td>1.0</td>
+ </tr>
+ <tr>
+ <td><a href="org/apache/commons/csv/CSVFormat.html#TDF">TDF</a></td>
+ <td>IO for the <a href="https://en.wikipedia.org/wiki/Tab-separated_values">Tab Delimited Format</a> (also known as Tab Separated Values).
+ </td>
+ <td>1.0</td>
+ </tr>
+ </tbody>
+ </table>
+ <p>Custom formats can be created using a fluent style API.</p>
+ </section>
+ <section>
+ <h1>Parsing Standard CSV Files</h1>
+ <p>
+ Parsing files with Apache Commons CSV is relatively straight forward. Pick a
+ <code>CSVFormat</code>
+ and go from there.
+ </p>
+ <section>
+ <h2>Parsing an Excel CSV File</h2>
+ <p>To parse an Excel CSV file, write:</p>
+ <pre>
+ <code>
+Reader in = new FileReader("path/to/file.csv");
+Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
+for (CSVRecord record : records) {
+    String lastName = record.get("Last Name");
+    String firstName = record.get("First Name");
+}
+ </code>
+ </pre>
+ </section>
+ </section>
+ <section>
+ <h1>Parsing Custom CSV Files</h1>
+ <p>
+ You can define your own using IO rules by building your own CSVFormat instance. Starting with
+ <code>CSVFormat.builder()</code>
+ lets you start from a predefined format and customize. For example:
+ </p>
+ <pre>
+ <code>
+CSVFormat myFormat = CSVFormat.DEFAULT.builder()
+    .setCommentMarker('#')
+    .setEscape('+')
+    .setIgnoreSurroundingSpaces(true)
+    .setQuote('"')
+    .setQuoteMode(QuoteMode.ALL)
+    .get()
+ </code>
+ </pre>
+ </section>
+ <section>
+ <h1>Handling Byte Order Marks</h1>
+ <p>
+ To handle files that start with a Byte Order Mark (BOM), like some Excel CSV files, you need an extra step to deal with the optional BOM bytes. Using the
+ <a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html"> BOMInputStream </a> class from <a
+ href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a> simplifies this task; for example:
+ </p>
+ <pre>
+ <code>
+try (Reader reader = new InputStreamReader(BOMInputStream.builder()
+        .setPath(path)
+        .get(), "UTF-8");
+        CSVParser parser = CSVFormat.EXCEL.builder()
+                .setHeader()
+                .get()
+                .parse(reader)) {
+    for (CSVRecord record : parser) {
+        String string = record.get("ColumnA");
+        // ...
+    }
+}
+ </code>
+ </pre>
+ <p>You might find it handy to create something like this:</p>
+ <pre>
+ <code>
+/**
+ * Creates a reader capable of handling BOMs.
+ *
+ * @param path The path to read.
+ * @return a new InputStreamReader for UTF-8 bytes.
+ * @throws IOException if an I/O error occurs.
+ */
+public InputStreamReader newReader(final Path path) throws IOException {
+    return new InputStreamReader(BOMInputStream.builder()
+            .setPath(path)
+            .get(), StandardCharsets.UTF_8);
+}
+ </code>
+ </pre>
+ </section>
+ <section>
+ <h1>Using Headers</h1>
+ <p>
+ Apache Commons CSV provides several ways to access record values. The simplest way is to access values by their index in the record. However, columns in
+ CSV files often have a name, for example: ID, CustomerNo, Birthday, etc. The CSVFormat class provides an API for specifying these <i>header</i> names and
+ CSVRecord on the other hand has methods to access values by their corresponding header name.
+ </p>
+ <section>
+ <h2>Accessing column values by index</h2>
+ <p>To access a record value by index, no special configuration of the CSVFormat is necessary:</p>
+ <pre>
+ <code>
+Reader in = new FileReader("path/to/file.csv");
+Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
+for (CSVRecord record : records) {
+    String columnOne = record.get(0);
+    String columnTwo = record.get(1);
+}
+ </code>
+ </pre>
+ </section>
+ <section>
+ <h2>Defining a header manually</h2>
+ <p>Indices may not be the most intuitive way to access record values.
For this reason it is possible to assign names to each column in the file:</p>
+ <pre>
+ <code>
+Reader in = new FileReader("path/to/file.csv");
+Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
+    .setHeader("ID", "CustomerNo", "Name")
+    .build()
+    .parse(in);
+for (CSVRecord record : records) {
+    String id = record.get("ID");
+    String customerNo = record.get("CustomerNo");
+    String name = record.get("Name");
+}
+ </code>
+ </pre>
+ Note that column values can still be accessed using their index.
+ </section>
+ <section>
+ <h2>Using an enum to define a header</h2>
+ <p>Using String values all over the code to reference columns can be error prone. For this reason, it is possible to define an enum to specify header
+ names. Note that the enum constant names are used to access column values. This may lead to enums constant names which do not follow the Java coding
+ standard of defining constants in upper case with underscores:</p>
+ <pre>
+ <code>
+public enum Headers {
+    ID, CustomerNo, Name
+}
+Reader in = new FileReader("path/to/file.csv");
+Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
+    .setHeader(Headers.class)
+    .build()
+    .parse(in);
+for (CSVRecord record : records) {
+    String id = record.get(Headers.ID);
+    String customerNo = record.get(Headers.CustomerNo);
+    String name = record.get(Headers.Name);
+}
+ </code>
+ </pre>
+ Again it is possible to access values by their index and by using a String (for example "CustomerNo").
+ </section>
+ <section>
+ <h2>Header auto detection</h2>
+ <p>Some CSV files define header names in their first record.
If configured, Apache Commons CSV can parse the header names from the first record:</p>
+ <pre>
+ <code>
+Reader in = new FileReader("path/to/file.csv");
+Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
+    .setHeader()
+    .setSkipHeaderRecord(true)
+    .build()
+    .parse(in);
+for (CSVRecord record : records) {
+    String id = record.get("ID");
+    String customerNo = record.get("CustomerNo");
+    String name = record.get("Name");
+}
+ </code>
+ </pre>
+ This will use the values from the first record as header names and skip the first record when iterating.
+ </section>
+ <section>
+ <h2>Printing with headers</h2>
+ <p>To print a CSV file with headers, you specify the headers in the format:</p>
+ <pre>
+ <code>
+Appendable out = ...;
+CSVPrinter printer = CSVFormat.DEFAULT.builder()
+    .setHeader("H1", "H2")
+    .build()
+    .print(out);
+ </code>
+ </pre>
+ <p>To print a CSV file with JDBC column labels, you specify the ResultSet in the format:</p>
+ <pre>
+ <code>
+try (ResultSet resultSet = ...) {
+    CSVPrinter printer = CSVFormat.DEFAULT.builder()
+        .setHeader(resultSet)
+        .build()
+        .print(out);
+}
+ </code>
+ </pre>
+ </section>
+ </section>
+</body>
+</html>
diff --git a/src/site/xdoc/index.xml b/src/site/xdoc/index.xml
index 0e10975c..491a384b 100644
--- a/src/site/xdoc/index.xml
+++ b/src/site/xdoc/index.xml
@@ -24,26 +24,13 @@ limitations under the License.
 <!-- ================================================== -->
 <section name="Using Apache Commons CSV">
 <p>Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p>
- <p>The most common CSV formats are predefined in the <a href="apidocs/org/apache/commons/csv/CSVFormat.html">CSVFormat</a> class:
- <ul>
- <li><a href="https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba">Microsoft Excel</a></li>
- <li><a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD</a></li>
- <li><a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD CSV</a></li>
- <li><a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump-delimited-text.html">MySQL</a></li>
- <li><a href="https://docs.oracle.com/database/121/SUTIL/GUID-D1762699-8154-40F6-90DE-EFB8EB6A9AB0.htm#SUTIL4217">Oracle</a></li>
- <li><a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL CSV</a></li>
- <li><a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL Text</a></li>
- <li><a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a></li>
- <li><a href="https://en.wikipedia.org/wiki/Tab-separated_values">TDF</a></li>
- </ul>
- </p>
- <p>Custom formats can be created using a fluent style API.</p>
+ <p>Read the documentation starting with the <a href="apidocs/index.html">Javadoc Overview</a>.</p>
 </section>
 <!-- ================================================== -->
 <section name="Documentation">
 <p>
 An overview of the functionality is provided in the
-<a href="user-guide.html">user guide</a>.
+<a href="apidocs/index.html">user guide</a>.
 Various <a href="project-reports.html">project reports</a> are also available.
 </p>
 <p>
diff --git a/src/site/xdoc/user-guide.xml b/src/site/xdoc/user-guide.xml
index 3ec3dd9b..64d9a403 100644
--- a/src/site/xdoc/user-guide.xml
+++ b/src/site/xdoc/user-guide.xml
@@ -21,179 +21,6 @@ limitations under the License.
 <author email="d...@commons.apache.org">Apache Commons Documentation Team</author>
 </properties>
 <body>
- <!-- ================================================== -->
-
- <h1>Apache Commons CSV User Guide</h1>
-
- <macro name="toc">
- </macro>
-
- <section name="Parsing files">
-
- Parsing files with Apache Commons CSV is relatively straight forward.
- The CSVFormat class provides some commonly used CSV variants:
-
- <dl>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#DEFAULT">DEFAULT</a></dt><dd>Standard Comma Separated Value format, as for RFC4180 but allowing empty lines.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#EXCEL">EXCEL</a></dt><dd>The Microsoft Excel CSV format.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD">INFORMIX_UNLOAD<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD_CSV">INFORMIX_UNLOAD_CSV<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">CSV UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation (escaping is disabled.)</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_CSV<sup>1.7</sup></a></dt><dd>MongoDB CSV format used by
the <code>mongoexport</code> operation.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_TSV<sup>1.7</sup></a></dt><dd>MongoDB TSV format used by the <code>mongoexport</code> operation.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MYSQL</a></dt><dd>The MySQL CSV format.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#ORACLE">ORACLE<sup>1.6</sup></a></dt><dd>Default Oracle format used by the SQL*Loader utility.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_CSV">POSTGRESSQL_CSV<sup>1.5</sup></a></dt><dd>Default PostgreSQL CSV format used by the COPY operation.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_TEXT">POSTGRESSQL_TEXT<sup>1.5</sup></a></dt><dd>Default PostgreSQL text format used by the COPY operation.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#RFC4180">RFC-4180</a></dt><dd>The RFC-4180 format defined by <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC-4180</a>.</dd>
- <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#TDF">TDF</a></dt><dd>A tab delimited format.</dd>
- </dl>
-
- <subsection name="Example: Parsing an Excel CSV File">
- <p>To parse an Excel CSV file, write:</p>
- <source>Reader in = new FileReader("path/to/file.csv");
-Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
-for (CSVRecord record : records) {
-    String lastName = record.get("Last Name");
-    String firstName = record.get("First Name");
-}
- </source>
- </subsection>
- <subsection name="Handling Byte Order Marks">
- <p>
- To handle files that start with a Byte Order Mark (BOM) like
some Excel CSV files, you need an extra step to
- deal with these optional bytes.
- You can use the
- <a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">
- BOMInputStream
- </a>
- class from
- <a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a>
- for example:
- </p>
- <source>
-try (Reader reader = new InputStreamReader(BOMInputStream.builder()
-        .setPath(path)
-        .get(), "UTF-8");
-        CSVParser parser = CSVFormat.EXCEL.builder()
-                .setHeader()
-                .get()
-                .parse(reader)) {
-    for (final CSVRecord record : parser) {
-        final String string = record.get("ColumnA");
-        // ...
-    }
-}
- </source>
- <p>
- You might find it handy to create something like this:
- </p>
- <source>
-/**
- * Creates a reader capable of handling BOMs.
- *
- * @param path The path to read.
- * @return a new InputStreamReader for UTF-8 bytes.
- * @throws IOException if an I/O error occurs.
- */
-public InputStreamReader newReader(final Path path) throws IOException {
-    return new InputStreamReader(BOMInputStream.builder()
-            .setPath(path)
-            .get(), StandardCharsets.UTF_8);
-}
- </source>
- </subsection>
- </section>
- <section name="Working with headers">
- Apache Commons CSV provides several ways to access record values.
- The simplest way is to access values by their index in the record.
- However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc.
- The CSVFormat class provides an API for specifying these <i>header</i> names and CSVRecord on
- the other hand has methods to access values by their corresponding header name.
- <subsection name="Accessing column values by index">
- To access a record value by index, no special configuration of the CSVFormat is necessary:
- <source>Reader in = new FileReader("path/to/file.csv");
-Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
-for (CSVRecord record : records) {
-    String columnOne = record.get(0);
-    String columnTwo = record.get(1);
-}
- </source>
- </subsection>
- <subsection name="Defining a header manually">
- Indices may not be the most intuitive way to access record values. For this reason it is possible to
- assign names to each column in the file:
- <source>Reader in = new FileReader("path/to/file.csv");
-Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
-    .setHeader("ID", "CustomerNo", "Name")
-    .build()
-    .parse(in);
-for (CSVRecord record : records) {
-    String id = record.get("ID");
-    String customerNo = record.get("CustomerNo");
-    String name = record.get("Name");
-}
- </source>
- Note that column values can still be accessed using their index.
- </subsection>
- <subsection name="Using an enum to define a header">
- Using String values all over the code to reference columns can be error prone. For this reason,
- it is possible to define an enum to specify header names. Note that the enum constant names are
- used to access column values. This may lead to enums constant names which do not follow the Java
- coding standard of defining constants in upper case with underscores:
- <source>public enum Headers {
-    ID, CustomerNo, Name
-}
-Reader in = new FileReader("path/to/file.csv");
-Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
-    .setHeader(Headers.class)
-    .build()
-    .parse(in);
-for (CSVRecord record : records) {
-    String id = record.get(Headers.ID);
-    String customerNo = record.get(Headers.CustomerNo);
-    String name = record.get(Headers.Name);
-}
- </source>
- Again it is possible to access values by their index and by using a String (for example "CustomerNo").
- </subsection>
- <subsection name="Header auto detection">
- Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse
- the header names from the first record:
- <source>Reader in = new FileReader("path/to/file.csv");
-Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
-    .setHeader()
-    .setSkipHeaderRecord(true)
-    .build()
-    .parse(in);
-for (CSVRecord record : records) {
-    String id = record.get("ID");
-    String customerNo = record.get("CustomerNo");
-    String name = record.get("Name");
-}
- </source>
- This will use the values from the first record as header names and skip the first record when iterating.
- </subsection>
- <subsection name="Printing with headers">
- <p>
- To print a CSV file with headers, you specify the headers in the format:
- </p>
- <source>final Appendable out = ...;
-final CSVPrinter printer = CSVFormat.DEFAULT.builder()
-    .setHeader("H1", "H2")
-    .build()
-    .print(out);
- </source>
- <p>
- To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
- </p>
- <source>try (final ResultSet resultSet = ...) {
-    final CSVPrinter printer = CSVFormat.DEFAULT.builder()
-        .setHeader(resultSet)
-        .build()
-        .print(out);
-}
- </source>
- </subsection>
- </section>
+ <p>The User Guide migrated to the <a href="apidocs/index.html">Javadoc</a>.</p>
 </body>
 </document>
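
Note on the "Handling Byte Order Marks" section moved by this commit: the guide relies on BOMInputStream from Apache Commons IO. Where that dependency is not available, the same idea (consume the optional BOM before the parser sees the first header name) can be approximated with the JDK alone. The following is only a sketch and not part of this commit or of Commons CSV: the class BomSkippingReaders and its methods skipBom and readAll are hypothetical names, and it assumes the bytes were already decoded as UTF-8, so the BOM shows up as the single character U+FEFF.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not part of Commons CSV or Commons IO. */
public final class BomSkippingReaders {

    /** The BOM as it appears after UTF-8 decoding (assumption: input is UTF-8). */
    private static final int BOM = 0xFEFF;

    private BomSkippingReaders() {
    }

    /** Wraps the reader and consumes one leading U+FEFF character, if present. */
    public static Reader skipBom(final Reader in) {
        try {
            final PushbackReader reader = new PushbackReader(in, 1);
            final int first = reader.read();
            // Push back anything that is not a BOM so no data is lost.
            if (first != -1 && first != BOM) {
                reader.unread(first);
            }
            return reader;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the remaining characters into a String (demo and test convenience). */
    public static String readAll(final Reader in) {
        try {
            final StringBuilder text = new StringBuilder();
            for (int c = in.read(); c != -1; c = in.read()) {
                text.append((char) c);
            }
            return text.toString();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        // The first header name would otherwise start with an invisible U+FEFF.
        final Reader reader = skipBom(new StringReader("\uFEFFID,Name\n1,Ada\n"));
        System.out.print(readAll(reader));
    }
}
```

The reader returned by skipBom could then be handed to a CSVFormat parse call in place of the BOMInputStream-based reader shown in the guide. Unlike BOMInputStream, this sketch does not detect UTF-16 or UTF-32 BOMs at the byte level.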