chrisonoracle

A blog about Application Express and Oracle related topics.

Read Excel file in PL/SQL, using Java in the DB

We made some necessary internal changes in APEX 4.2.3, that stopped the Apex Listener Excel upload from working. As mentioned on our Known Issues page, a fix is available now for 4.2.3. While investigating the problem, I noticed that it is quite easy to parse Excel directly in the database, with Java and the Apache POI library. Below is an example how to use POI from PL/SQL. As mentioned, we do have a fix now, but maybe my little experiment is still useful to somebody.

First, I downloaded POI from here (I used version 3.9). After unzipping, I installed lib/commons-codec-1.5.jar, poi-ooxml-3.9-20121203.jar and poi-3.9-20121203.jar with loadjava. The package below has a pipelined table function that returns Excel cells, see the spec for an example.

create or replace package excelreader as
--------------------------------------------------------------------------------
--
--    NAME
--      excelreader.sql
--
--    DESCRIPTION
--      Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table
--      function.
--
--    SYNOPSIS
--      SQL> select xls.*
--             from my_data,
--                  table(excelreader.get_cells(my_data.blob_value)) xls
--            where my_data.id = 42;
--
--      SHEET_NAME              ROW_NUM    COL_NUM CELL_VALUE
--      -------------------- ---------- ---------- --------------------
--      Sheet1                        1          1 h1
--      Sheet1                        1          2 h2
--      Sheet1                        1          3 h3
--      Sheet1                        2          1 1
--      Sheet1                        2          2 two
--      Sheet1                        2          3 three
--
--    MODIFIED   (MM/DD/YYYY)
--    cneumuel    10/31/2013 - Created
--
--------------------------------------------------------------------------------

type t_cell is record (
    sheet_name varchar2(4000),
    row_num    number,
    col_num    number,
    cell_value varchar2(32767) );
type t_cells is table of t_cell;

--==============================================================================
-- pipelined table function to return cells of an excel (xls or xlsx) file
--==============================================================================
function get_cells (
    p_source in blob )
    return t_cells
    pipelined;

--==============================================================================
-- internal use only
--==============================================================================
procedure private_add_cell (
    p_sheet_name in varchar2,
    p_row_num    in number,
    p_col_num    in number,
    p_cell_value in varchar2 );

end excelreader;
/

create or replace package body excelreader as
--------------------------------------------------------------------------------
--
--    NAME
--      excelreader.plb
--
--    DESCRIPTION
--      Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table
--      function.
--
--    MODIFIED   (MM/DD/YYYY)
--    cneumuel    10/31/2013 - Created
--
--------------------------------------------------------------------------------

g_cells t_cells;

--==============================================================================
procedure java_generate_cells (
    p_source in blob )
is language java
name 'ExcelReader.generateCells(java.sql.Blob)';

--==============================================================================
function get_cells (
    p_source in blob )
    return t_cells
    pipelined
is
begin
    g_cells := new t_cells();

    if p_source is not null then
        java_generate_cells (
            p_source => p_source );
    end if;

    for i in 1 .. g_cells.count loop
        pipe row (g_cells(i));
    end loop;

    g_cells := null;
    return;
end get_cells;

--==============================================================================
procedure private_add_cell (
    p_sheet_name in varchar2,
    p_row_num    in number,
    p_col_num    in number,
    p_cell_value in varchar2 )
is
    l_row t_cell;
begin
    l_row.sheet_name := p_sheet_name;
    l_row.row_num    := p_row_num;
    l_row.col_num    := p_col_num;
    l_row.cell_value := p_cell_value;
    g_cells.extend;
    g_cells(g_cells.count) := l_row;
end private_add_cell;

end excelreader;
/
show err

--
-- the parsing code below is based on
--
-- https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/ss/examples/ToCSV.java
--
create or replace java source named "ExcelReader" as
import java.lang.*;
import java.io.InputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

public class ExcelReader {

    static void addCell (
        String pSheetName,
        int    pRowNum,
        int    pColNum,
        String pVarchar2Value )
        throws Exception
    {
        #sql {
            begin
                excelreader.private_add_cell (
                    p_sheet_name => :pSheetName,
                    p_row_num    => :pRowNum,
                    p_col_num    => :pColNum,
                    p_cell_value => :pVarchar2Value );
            end;
        };
    }

    public static void generateCells (
        java.sql.Blob pBlob )
        throws Exception
    {
        InputStream      lBlobStream = pBlob.getBinaryStream();
        HSSFWorkbook     lWorkbook   = new HSSFWorkbook(lBlobStream);
        FormulaEvaluator lEvaluator  = lWorkbook.getCreationHelper().createFormulaEvaluator();
        DataFormatter    lFormatter  = new DataFormatter(true);
        //---------------
        HSSFSheet        lSheet;
        int              lNumSheets = lWorkbook.getNumberOfSheets();
        //---------------
        int              lLastRowNum;
        Row              lRow;
        //---------------
        int              lLastCellNum;
        Cell             lCell;
        //---------------
        String           lValue;

        for (int lSheetNum=0; lSheetNum < lNumSheets; lSheetNum++) {
            //
            // process sheet #lSheetNum
            //
            lSheet = lWorkbook.getSheetAt(lSheetNum);

            if (lSheet.getPhysicalNumberOfRows() > 0) {
                lLastRowNum = lSheet.getLastRowNum();
                for (int lRowNum = 0; lRowNum <= lLastRowNum; lRowNum++) {
                    lRow = lSheet.getRow(lRowNum);
                    //
                    // process row #lRowNum
                    //
                    if (lRow != null) {
                        lLastCellNum = lRow.getLastCellNum();
                        for (int lCellNum = 0; lCellNum <= lLastCellNum; lCellNum++) {
                            //
                            // process cell #lCellNum
                            //
                            lCell = lRow.getCell(lCellNum);
                            if (lCell == null) {
                                lValue = "";
                            } else if (lCell.getCellType() != Cell.CELL_TYPE_FORMULA) {
                                lValue = lFormatter.formatCellValue(lCell);
                            } else {
                                lValue = lFormatter.formatCellValue(lCell, lEvaluator);
                            }
                            ExcelReader.addCell (
                                lWorkbook.getSheetName(lSheetNum),
                                lRowNum+1,
                                lCellNum+1,
                                lValue );
                        }
                    }
                }
            }
        }

        lBlobStream.close();
    }
};
/

That code is not very efficient. It calls back from Java to PL/SQL for each cell, but better solutions should not be difficult to implement. Btw, there exists a solution for XLSX only (i.e. no XLS), in pure PL/SQL.

12 responses to “Read Excel file in PL/SQL, using Java in the DB

  1. Anton November 13, 2013 at 20:44

    Did you know that the PLSQL-solution you mentioned is used in this plugin http://www.apex-plugin.com/oracle-apex-plugins/process-type-plugin/excel2collections_271.html , which does do xls (and csv),?

    • Bill October 14, 2014 at 17:09

      Hi Anton, Do you have a plugin that uses a table instead of a Collection?
      This would get around the Collection limit… I have 70 columns..
      Thank you, Bill

  2. chrisonoracle November 14, 2013 at 09:30

    Anton, I think I read that you wrote that plugin, but did not know that you even had a solution for XLS. Impressive, I’ll have a look at this. Can it also evaluate formulas? The current Apex Listener implementation copies the formula into the collection.

  3. daniyal November 18, 2014 at 13:12

    java_generate_rows invalid identifier

  4. daniyal November 19, 2014 at 08:44

    org.apache.poi.openxml4j.exceptions.InvalidFormatException: Can’t read content types part !

    InputStream lBlobStream = pBlob.getBinaryStream();

    XSSFWorkbook lWorkbook = new XSSFWorkbook(lBlobStream);

    error while executed the query.
    i think inputStream having some error

  5. chrisonoracle November 21, 2014 at 13:37

    Hi,
    as you probably already found out, there was a typo in the package body. It’s java_generate_cells and not java_generate_rows. I fixed that in the blog post – thanks for finding it! Not sure about the java error, though. I just tested my prototype again and could not reproduce the error you got.

  6. Apurva February 8, 2016 at 13:14

    the java class inside the package body does not compile and gives warning with status as invalid.

  7. lavs June 14, 2016 at 11:32

    In order to use #sql in the java code, should we being any setup / jars as I am getting compiler issues when I am seeing the java code alone on a Java IDE (Eclipse)

    • chrisonoracle June 14, 2016 at 11:44

      You just have to install the .jar files mentioned above with Oracle’s loadjava program and then run the script in sqlplus. Please post errors here, I have no idea what is in invalid state in your db, and why.

  8. PerplexedMan March 2, 2018 at 15:16

    If you want to deploy POI code in DB, you should check your Java DB version first:

    dbms_java.get_ojvm_property(PROPSTRING=>’java.version’)

    Default 11g (11.2.0.3.0) version is 1.5. Considering that POI 3.15 and later can’t be used on 11g because it requires commons-collections library developed with Java 1.6 specific code.

    POI 3.14 can be loaded but it comes compiled in Java 1.6 so you’ll get compilation error if you try to compile your custome package:
    org.apache.poi.hssf.usermodel.HSSFWorkbook
    bad class file: null
    class file has wrong version 50.0, should be 49.0

    You have been warned.

Leave a reply to daniyal Cancel reply