Oracle External Tables by Examples part 2 – HEADER and FOOTER


reading file with header and footer

I would like to dedicate Part 2 of the series about Oracle External Tables to reading files with header and footer ( as part 1, all these samples were tested on 10.2.0.4 on Sun10).

I love files with header and footer and really try to use it every time I can. This is probably personal preference and some developers like plain file with no header/no footer and separate control file (either ascii or XML). The reason why I like file with header and footer is very simple

  • you transfer only one file – sometimes less is better  :-)
  • it is very easy to see whether you received partial file just by looking at the TAIL of the file (think about your production support team during your development)
  • if the header/tail is done correctly and follows the same standards for all files, you get not only the actual data but also job name, when the job ran, who ran it, … in one single file (there is nothing to loose when you move files from folder to folder)

of course, there are some disadvantages as well …

  • speed (it is faster to load separate control file – see below)
  • you can add structured information to the control file (if you use XML) – just imagine that all your incoming files have uniform XML format control file
  • if you want to use control file(s) for generic statistics (some companies create centralized control files ETL and load all control files to one repository) – parsing footer is always more complex. If you ask why they would do that, then the answer is simple – this gives you excellent view on all your incoming files (sizes, timings, ….) – you can see how things change over time and plan space and network utilization better.

 

in general I try to follow simple formula  LESS CODING = SMALLER ROOM FOR ERROR

 

 

there are 3 solutions for reading files with header and footer using oracle external tables

 

1. use unix level tool and remove footer (and header if you wish)

I don’t like this method because it requires “touching” the file. I prefer to keep the file in a raw state without any manipulations (sometimes this is not possible of course, sometimes unix tools like syncsort perform much faster – it really depends of file structure).

 

2. reject limit

The other solution is simply setting SKIP 1 (for header) and  REJECT LIMIT 1 (for footer) – let oracle reject one row (footer) every time you read the data. I really don’t want to say that I don’t use this method, because I do, but I really try to avoid it. Just the idea of blindly rejecting 1 row every time you read the file does not make me feel good.

 

3. LOAD WHEN

This is my preferred method. You basically use WHERE clause (called LOAD WHEN) and load only records you want to load (skip HEADER and FOOTER or whatever records you want). 

here is small sample …

 

I will base the sample on 3 oracle directories which map to three unix (or windows) directories. I always recommend to separate actual data directory from LOG/BAD directories. This is very good security practice – oracle ID can have READ access to data files and READ/WRITE to LOG/BAD files. All my samples will use schema JIRI

CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_DATA" AS '/work/users/jiri/data';
CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_LOG" AS '/work/users/jiri/log';
CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_BAD" AS '/work/users/jiri/bad';

GRANT READ, WRITE ON DIRECTORY DIR_USERS_JIRI_DATA TO JIRI;
GRANT READ ON DIRECTORY DIR_USERS_JIRI_LOG TO JIRI;
GRANT READ ON DIRECTORY DIR_USERS_JIRI_BAD TO JIRI;

 

let’s assume you have simple ASCII comma separated file with LF (unix) end-of-line character, the file has header row, you can download sample file from here

 

CREATE TABLE JIRI.SAMPLE_02_BODY_EXT
(
   ROW_TYPE        VARCHAR2(1),
   PRODUCT         VARCHAR2(60),
   NUMBER_OF_UNITS NUMBER(10,0),
   TOTAL_PRICE     NUMBER(10,2)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DIR_USERS_JIRI_DATA
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY '\n'
    LOAD WHEN ( ROW_TYPE = 'B')
    SKIP 0
    LOGFILE DIR_USERS_JIRI_LOG:'sample_02.log'
    BADFILE DIR_USERS_JIRI_BAD:'sample_02.bad'
    NODISCARDFILE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION (DIR_USERS_JIRI_DATA:'sample_02.dat')
)
REJECT LIMIT 0
NOPARALLEL
NOMONITORING;
 
CREATE TABLE JIRI.SAMPLE_02_TAIL_EXT
(
   ROW_TYPE         VARCHAR2(1),
   JOB_NM           VARCHAR2(255),
   EXECUTED_BY      VARCHAR2(255),
   JOB_STATED_ON    VARCHAR2(255),
   SOURCE_SERVER_NM VARCHAR2(255)
  
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DIR_USERS_JIRI_DATA
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY '\n'
    LOAD WHEN ( ROW_TYPE = 'T')
    SKIP 0

 

 

This is very clean and easy way how to load data as well as trailing record(s) without using reject and any unix tool to strip the record. 

Please note that I did not have to use SKIP for skipping header row. Also I used excape characters - if you run samples above on unix server, you can use RECORDS DELIMITED BY NEWLINE instead.

 

As I mentioned before, there is a drawback of LOAD WHEN solution. If you have very large file then reading SAMPLE_02_TAIL_EXT might take long time (much longer than using separate control file).

About these ads
Tagged , , , ,

13 thoughts on “Oracle External Tables by Examples part 2 – HEADER and FOOTER

  1. Astrid says:

    Hi Jiri

    Thanks for the nice article.

    Please could you explain in more detail how the LOAD WHEN ( ROW_TYPE = ‘B’)
    logic works.

    What is row_type of ‘B’ as opposed to ‘T’ in the 2nd example.

    Tried researching this online and not found anything else about it.

    Astrid

  2. jiri says:

    ROW_TYPE is a column name in external table, B is a value in my sample file. Very simply say LOAD WHEN ( ROW_TYPE = ‘B’) mean “load record if field ROW_TYPE eq. value B”

    you can use it for any field, for example LOAD WHEN ( LAST_NAME = ‘Smith’ )

    in general (and that is my personal preference) I like when there is easy way to separate header/footer/body of the file. One way is to have first character in each row the file defining what section it is – H/B/T (header/body/tail).

    hope this helps
    jiri

  3. Astrid says:

    OK, that makes sense. Thanks for the prompt response.

  4. jiri says:

    one very small yet important and interesting detail …

    Oracle does not read the records you dont want to load and suppress using LOAD WHEN. It reads the records only till the condition (in my case only first field). This means

    1. loading is usually faster (depends on how many records you suppress)

    2. as you can see in my example the trailing record has different format. Since oracle does not read the full row, it does not throw an error

    LOAD_WHEN is not only great for separating body and tail, but very useful in case when the file is a mix of two formats (common in some mainframe files) – for example CUSTOMER INFO then X rows with corresponding invoices, another CUSTOMER INFO then X rows with corresponding invoices …. some people use unix syncsort to sort the file first, then split it into two files and load them separatelly. If you use LOAD_WHEN you can easily load the file without any unix manipulation.

  5. russsha says:

    thanks for your amazing blog. I was wondering if you could help with this nagging issue. My goal is to load the following data skipping the header rows (thats contains few lines). I could really use your expert opinion in this.

    start=Thu Jun 24 07:55:06 2010
    start_epoch=1277380506
    total_lines=19717
    c=/cmwrite/loader/cmd/ls1277380417-loade1-24168–682994762-19717-19717.cmd
    upload_to=db
    emailreport=xxx.russelbhai.com
    unsubcodes=u
    field=ESA_SOURCE,EAN_NUM,STORE_NUM,CHANNEL,FULL_NAME,FNAME,LNAME,COUNTRY,ADDRESS,ADDRESS_2,CITY,STATE,ZIP,PHONE,EMAIL,SOURCE,CELLID,IGNORE

    ROW_NUMBER|TYPE:MSG|ESA_SOURCE|EAN_NUM|STORE_NUM|CHANNEL|FULL_NAME|FNAME|LNAME|COUNTRY|ADDRESS|ADDRESS_2|CITY|STATE|ZIP|PHONE|EMAIL|SOURCE|CELLID|IGNORE
    1|MAIL:|lmts||258|7||||||||||12345|,nolimit@gu.com|P|123
    2|MAIL:|lmts||287|7||||||||||546252|,nolimit2@gu.com|P|236
    3|MAIL:|lmt||488|7|||||||||||nolimit3@gu.com0|P|585

  6. jiri says:

    I am out of town, I will look at it tomorrow. :-)

    Just completed scuba dive 100

  7. russsha says:

    haha..enjoy.

  8. jiri says:

    I have two questions

    1. Are you using Oracle 11.2 or older version?
    2. Is number of records in the header static? e.g. 10 rows?

  9. russsha says:

    1.yes
    2.no

  10. russsha says:

    I have tried tried including LOAD WHEN ( email != blanks) . However, that added header row of the body to the actual table. Is there a way to load this file without having to load big header and header row on the body ?
    the table would look like
    row_number | type:mssg | email|….
    row_number |type:msg|email..
    123|lmts|nolimit@gu.com|…..

  11. jiri says:

    With 11.2 yes using preprocessor. Will get back to you soon.

  12. jiri says:

    because your header is not static number of rows and because the header/body is not so easy to split using LOAD WHEN I would probably use unix SED command to get rid of the header.

    Since you have 11.2 you don’t need to run anything manually, Oracle will run unix command for you on a fly everytime you query external table – there are no temporary files created and even 20 people can run it at the same time (no problem there).

    1. look at my post about 11.2 preprocessor and maybe try that sample http://jiri.wordpress.com/2010/01/19/whats-new-in-oracle-11g-r2-external-tables-preprocessor/

    2. use this unix script instead the sample script in my post in the 11.2 preprocessor to get rid of the header (assuming your file name which has data is called test.txt

    basically sed will remove everything between words start and |IGNORE


    #!/bin/bash
    /bin/sed '/start/,/|IGNORE/d' test.txt

  13. rey says:

    why dont you use load when column1 = ‘ROW_NUMBER’

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: