chrisonoracle

A blog about Application Express and Oracle related topics.

Read Excel file in PL/SQL, using Java in the DB

We made some necessary internal changes in APEX 4.2.3, that stopped the Apex Listener Excel upload from working. As mentioned on our Known Issues page, a fix is available now for 4.2.3. While investigating the problem, I noticed that it is quite easy to parse Excel directly in the database, with Java and the Apache POI library. Below is an example how to use POI from PL/SQL. As mentioned, we do have a fix now, but maybe my little experiment is still useful to somebody.

First, I downloaded POI from here (I used version 3.9). After unzipping, I installed lib/commons-codec-1.5.jar, poi-ooxml-3.9-20121203.jar and poi-3.9-20121203.jar with loadjava. The package below has a pipelined table function that returns Excel cells, see the spec for an example.

create or replace package excelreader as
--------------------------------------------------------------------------------
--
--    NAME
--      excelreader.sql
--
--    DESCRIPTION
--      Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table
--      function.
--
--    SYNOPSIS
--      SQL> select xls.*
--             from my_data,
--                  table(excelreader.get_cells(my_data.blob_value)) xls
--            where my_data.id = 42;
--
--      SHEET_NAME              ROW_NUM    COL_NUM CELL_VALUE
--      -------------------- ---------- ---------- --------------------
--      Sheet1                        1          1 h1
--      Sheet1                        1          2 h2
--      Sheet1                        1          3 h3
--      Sheet1                        2          1 1
--      Sheet1                        2          2 two
--      Sheet1                        2          3 three
--
--    MODIFIED   (MM/DD/YYYY)
--    cneumuel    10/31/2013 - Created
--
--------------------------------------------------------------------------------

type t_cell is record (
    sheet_name varchar2(4000),
    row_num    number,
    col_num    number,
    cell_value varchar2(32767) );
type t_cells is table of t_cell;

--==============================================================================
-- pipelined table function to return cells of an excel (xls or xlsx) file
--==============================================================================
function get_cells (
    p_source in blob )
    return t_cells
    pipelined;

--==============================================================================
-- internal use only
--==============================================================================
procedure private_add_cell (
    p_sheet_name in varchar2,
    p_row_num    in number,
    p_col_num    in number,
    p_cell_value in varchar2 );

end excelreader;
/

create or replace package body excelreader as
--------------------------------------------------------------------------------
--
--    NAME
--      excelreader.plb
--
--    DESCRIPTION
--      Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table
--      function.
--
--    MODIFIED   (MM/DD/YYYY)
--    cneumuel    10/31/2013 - Created
--
--------------------------------------------------------------------------------

g_cells t_cells;

--==============================================================================
procedure java_generate_cells (
    p_source in blob )
is language java
name 'ExcelReader.generateCells(java.sql.Blob)';

--==============================================================================
function get_cells (
    p_source in blob )
    return t_cells
    pipelined
is
begin
    g_cells := new t_cells();

    if p_source is not null then
        java_generate_rows (
            p_source => p_source );
    end if;

    for i in 1 .. g_cells.count loop
        pipe row (g_cells(i));
    end loop;

    g_cells := null;
    return;
end get_cells;

--==============================================================================
procedure private_add_cell (
    p_sheet_name in varchar2,
    p_row_num    in number,
    p_col_num    in number,
    p_cell_value in varchar2 )
is
    l_row t_cell;
begin
    l_row.sheet_name := p_sheet_name;
    l_row.row_num    := p_row_num;
    l_row.col_num    := p_col_num;
    l_row.cell_value := p_cell_value;
    g_cells.extend;
    g_cells(g_cells.count) := l_row;
end private_add_cell;

end excelreader;
/
show err

--
-- the parsing code below is based on
--
-- https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/ss/examples/ToCSV.java
--
create or replace java source named "ExcelReader" as
import java.lang.*;
import java.io.InputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

public class ExcelReader {

    static void addCell (
        String pSheetName,
        int    pRowNum,
        int    pColNum,
        String pVarchar2Value )
        throws Exception
    {
        #sql {
            begin
                excelreader.private_add_cell (
                    p_sheet_name => :pSheetName,
                    p_row_num    => :pRowNum,
                    p_col_num    => :pColNum,
                    p_cell_value => :pVarchar2Value );
            end;
        };
    }

    public static void generateCells (
        java.sql.Blob pBlob )
        throws Exception
    {
        InputStream      lBlobStream = pBlob.getBinaryStream();
        HSSFWorkbook     lWorkbook   = new HSSFWorkbook(lBlobStream);
        FormulaEvaluator lEvaluator  = lWorkbook.getCreationHelper().createFormulaEvaluator();
        DataFormatter    lFormatter  = new DataFormatter(true);
        //---------------
        HSSFSheet        lSheet;
        int              lNumSheets = lWorkbook.getNumberOfSheets();
        //---------------
        int              lLastRowNum;
        Row              lRow;
        //---------------
        int              lLastCellNum;
        Cell             lCell;
        //---------------
        String           lValue;

        for (int lSheetNum=0; lSheetNum < lNumSheets; lSheetNum++) {
            //
            // process sheet #lSheetNum
            //
            lSheet = lWorkbook.getSheetAt(lSheetNum);

            if (lSheet.getPhysicalNumberOfRows() > 0) {
                lLastRowNum = lSheet.getLastRowNum();
                for (int lRowNum = 0; lRowNum <= lLastRowNum; lRowNum++) {
                    lRow = lSheet.getRow(lRowNum);
                    //
                    // process row #lRowNum
                    //
                    if (lRow != null) {
                        lLastCellNum = lRow.getLastCellNum();
                        for (int lCellNum = 0; lCellNum <= lLastCellNum; lCellNum++) {
                            //
                            // process cell #lCellNum
                            //
                            lCell = lRow.getCell(lCellNum);
                            if (lCell == null) {
                                lValue = "";
                            } else if (lCell.getCellType() != Cell.CELL_TYPE_FORMULA) {
                                lValue = lFormatter.formatCellValue(lCell);
                            } else {
                                lValue = lFormatter.formatCellValue(lCell, lEvaluator);
                            }
                            ExcelReader.addCell (
                                lWorkbook.getSheetName(lSheetNum),
                                lRowNum+1,
                                lCellNum+1,
                                lValue );
                        }
                    }
                }
            }
        }

        lBlobStream.close();
    }
};
/

That code is not very efficient. It calls back from Java to PL/SQL for each cell, but better solutions should not be difficult to implement. Btw, there exists a solution for XLSX only (i.e. no XLS), in pure PL/SQL.

About these ads

2 responses to “Read Excel file in PL/SQL, using Java in the DB

  1. Anton November 13, 2013 at 20:44

    Did you know that the PLSQL-solution you mentioned is used in this plugin http://www.apex-plugin.com/oracle-apex-plugins/process-type-plugin/excel2collections_271.html , which does do xls (and csv),?

  2. chrisonoracle November 14, 2013 at 09:30

    Anton, I think I read that you wrote that plugin, but did not know that you even had a solution for XLS. Impressive, I’ll have a look at this. Can it also evaluate formulas? The current Apex Listener implementation copies the formula into the collection.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.