chrisonoracle
A blog about Application Express and Oracle related topics.
Read Excel file in PL/SQL, using Java in the DB
November 13, 2013
Posted by on We made some necessary internal changes in APEX 4.2.3, that stopped the Apex Listener Excel upload from working. As mentioned on our Known Issues page, a fix is available now for 4.2.3. While investigating the problem, I noticed that it is quite easy to parse Excel directly in the database, with Java and the Apache POI library. Below is an example how to use POI from PL/SQL. As mentioned, we do have a fix now, but maybe my little experiment is still useful to somebody.
First, I downloaded POI from here (I used version 3.9). After unzipping, I installed lib/commons-codec-1.5.jar, poi-ooxml-3.9-20121203.jar and poi-3.9-20121203.jar with loadjava. The package below has a pipelined table function that returns Excel cells, see the spec for an example.
create or replace package excelreader as -------------------------------------------------------------------------------- -- -- NAME -- excelreader.sql -- -- DESCRIPTION -- Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table -- function. -- -- SYNOPSIS -- SQL> select xls.* -- from my_data, -- table(excelreader.get_cells(my_data.blob_value)) xls -- where my_data.id = 42; -- -- SHEET_NAME ROW_NUM COL_NUM CELL_VALUE -- -------------------- ---------- ---------- -------------------- -- Sheet1 1 1 h1 -- Sheet1 1 2 h2 -- Sheet1 1 3 h3 -- Sheet1 2 1 1 -- Sheet1 2 2 two -- Sheet1 2 3 three -- -- MODIFIED (MM/DD/YYYY) -- cneumuel 10/31/2013 - Created -- -------------------------------------------------------------------------------- type t_cell is record ( sheet_name varchar2(4000), row_num number, col_num number, cell_value varchar2(32767) ); type t_cells is table of t_cell; --============================================================================== -- pipelined table function to return cells of an excel (xls or xlsx) file --============================================================================== function get_cells ( p_source in blob ) return t_cells pipelined; --============================================================================== -- internal use only --============================================================================== procedure private_add_cell ( p_sheet_name in varchar2, p_row_num in number, p_col_num in number, p_cell_value in varchar2 ); end excelreader; / create or replace package body excelreader as -------------------------------------------------------------------------------- -- -- NAME -- excelreader.plb -- -- DESCRIPTION -- Read Excel file (xls or xlsx) in PL/SQL, using a pipelined table -- function. -- -- MODIFIED (MM/DD/YYYY) -- cneumuel 10/31/2013 - Created -- -------------------------------------------------------------------------------- g_cells t_cells; --============================================================================== procedure java_generate_cells ( p_source in blob ) is language java name 'ExcelReader.generateCells(java.sql.Blob)'; --============================================================================== function get_cells ( p_source in blob ) return t_cells pipelined is begin g_cells := new t_cells(); if p_source is not null then java_generate_cells ( p_source => p_source ); end if; for i in 1 .. g_cells.count loop pipe row (g_cells(i)); end loop; g_cells := null; return; end get_cells; --============================================================================== procedure private_add_cell ( p_sheet_name in varchar2, p_row_num in number, p_col_num in number, p_cell_value in varchar2 ) is l_row t_cell; begin l_row.sheet_name := p_sheet_name; l_row.row_num := p_row_num; l_row.col_num := p_col_num; l_row.cell_value := p_cell_value; g_cells.extend; g_cells(g_cells.count) := l_row; end private_add_cell; end excelreader; / show err -- -- the parsing code below is based on -- -- https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/ss/examples/ToCSV.java -- create or replace java source named "ExcelReader" as import java.lang.*; import java.io.InputStream; import org.apache.poi.hssf.usermodel.HSSFWorkbook; import org.apache.poi.hssf.usermodel.HSSFSheet; import org.apache.poi.hssf.usermodel.HSSFRow; import org.apache.poi.hssf.usermodel.HSSFCell; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.apache.poi.ss.usermodel.Workbook; import org.apache.poi.ss.usermodel.Sheet; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.ss.usermodel.Cell; import org.apache.poi.ss.usermodel.DataFormatter; import org.apache.poi.ss.usermodel.FormulaEvaluator; import org.apache.poi.openxml4j.exceptions.InvalidFormatException; public class ExcelReader { static void addCell ( String pSheetName, int pRowNum, int pColNum, String pVarchar2Value ) throws Exception { #sql { begin excelreader.private_add_cell ( p_sheet_name => :pSheetName, p_row_num => :pRowNum, p_col_num => :pColNum, p_cell_value => :pVarchar2Value ); end; }; } public static void generateCells ( java.sql.Blob pBlob ) throws Exception { InputStream lBlobStream = pBlob.getBinaryStream(); HSSFWorkbook lWorkbook = new HSSFWorkbook(lBlobStream); FormulaEvaluator lEvaluator = lWorkbook.getCreationHelper().createFormulaEvaluator(); DataFormatter lFormatter = new DataFormatter(true); //--------------- HSSFSheet lSheet; int lNumSheets = lWorkbook.getNumberOfSheets(); //--------------- int lLastRowNum; Row lRow; //--------------- int lLastCellNum; Cell lCell; //--------------- String lValue; for (int lSheetNum=0; lSheetNum < lNumSheets; lSheetNum++) { // // process sheet #lSheetNum // lSheet = lWorkbook.getSheetAt(lSheetNum); if (lSheet.getPhysicalNumberOfRows() > 0) { lLastRowNum = lSheet.getLastRowNum(); for (int lRowNum = 0; lRowNum <= lLastRowNum; lRowNum++) { lRow = lSheet.getRow(lRowNum); // // process row #lRowNum // if (lRow != null) { lLastCellNum = lRow.getLastCellNum(); for (int lCellNum = 0; lCellNum <= lLastCellNum; lCellNum++) { // // process cell #lCellNum // lCell = lRow.getCell(lCellNum); if (lCell == null) { lValue = ""; } else if (lCell.getCellType() != Cell.CELL_TYPE_FORMULA) { lValue = lFormatter.formatCellValue(lCell); } else { lValue = lFormatter.formatCellValue(lCell, lEvaluator); } ExcelReader.addCell ( lWorkbook.getSheetName(lSheetNum), lRowNum+1, lCellNum+1, lValue ); } } } } } lBlobStream.close(); } }; /
That code is not very efficient. It calls back from Java to PL/SQL for each cell, but better solutions should not be difficult to implement. Btw, there exists a solution for XLSX only (i.e. no XLS), in pure PL/SQL.
Did you know that the PLSQL-solution you mentioned is used in this plugin http://www.apex-plugin.com/oracle-apex-plugins/process-type-plugin/excel2collections_271.html , which does do xls (and csv),?
Hi Anton, Do you have a plugin that uses a table instead of a Collection?
This would get around the Collection limit… I have 70 columns..
Thank you, Bill
Anton, I think I read that you wrote that plugin, but did not know that you even had a solution for XLS. Impressive, I’ll have a look at this. Can it also evaluate formulas? The current Apex Listener implementation copies the formula into the collection.
java_generate_rows invalid identifier
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Can’t read content types part !
InputStream lBlobStream = pBlob.getBinaryStream();
XSSFWorkbook lWorkbook = new XSSFWorkbook(lBlobStream);
error while executed the query.
i think inputStream having some error
Hi,
as you probably already found out, there was a typo in the package body. It’s java_generate_cells and not java_generate_rows. I fixed that in the blog post – thanks for finding it! Not sure about the java error, though. I just tested my prototype again and could not reproduce the error you got.
the java class inside the package body does not compile and gives warning with status as invalid.
Can you please provide the error and your DB version?
I have compiled this in plsql.. It is compiled but is in invalid state. Db version is oracle 11g. Can you please help !
In order to use #sql in the java code, should we being any setup / jars as I am getting compiler issues when I am seeing the java code alone on a Java IDE (Eclipse)
You just have to install the .jar files mentioned above with Oracle’s loadjava program and then run the script in sqlplus. Please post errors here, I have no idea what is in invalid state in your db, and why.
If you want to deploy POI code in DB, you should check your Java DB version first:
dbms_java.get_ojvm_property(PROPSTRING=>’java.version’)
Default 11g (11.2.0.3.0) version is 1.5. Considering that POI 3.15 and later can’t be used on 11g because it requires commons-collections library developed with Java 1.6 specific code.
POI 3.14 can be loaded but it comes compiled in Java 1.6 so you’ll get compilation error if you try to compile your custome package:
org.apache.poi.hssf.usermodel.HSSFWorkbook
bad class file: null
class file has wrong version 50.0, should be 49.0
You have been warned.