Oracle External Tables by Examples part 3 – READSIZE


READSIZE parameter and what is it good for

 

READSIZE parameter is one of the less used and known parameters, if you look at Oracle Documentation you can see that it’s very small 512kb by default (READSIZE parameter in external table DDL is in bytes) and you might be asking yourself whether the load from external table can be faster if you increase it to …  128MB   ouch   I ran few tests on different files with different settings, I would like to share the results here. The exact use would look like something like this

CREATE TABLE ABC
(
  FIELD1        NUMBER(17,8),
  FIELD2        NUMBER(17,8)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY DIR_USERS_JIRI
     ACCESS PARAMETERS
       ( RECORDS DELIMITED BY NEWLINE
         READSIZE 67108864
         SKIP 0
         LOGFILE DIR_USERS_JIRI_LOG:'FILE_NAME.log'
         BADFILE DIR_USERS_JIRI_BAD:'FILE_NAME.bad'
         NODISCARDFILE
         FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
        )
     LOCATION (DIR_USERS_JIRI:'FILE_NAME.TXT')
  )
REJECT LIMIT 0
NOPARALLEL
NOMONITORING;

 

All tests ran on Oracle 10.2.0.4 on Solaris 10, each test was created is separate session.

 

FILE TYPES

FILE 1
comma separated ascii file
LF end-of-line
no header
no footer
total number of rows 20mm
total unix file size 1.3GB
row length approx 60 characters
total number of fields 12
all fields were numeric
all records were loaded (no condition set)
 

FILE 2
pipe separated ascii file
LF end-of-line
no header
no footer
total number of rows 18mm
total unix file size 5.8GB
row length approx 360 characters
total number of fields 133
most fields were numeric, some varchar2
all records were loaded (no condition set)

 

 

LOAD TYPES

 INSERT INTO

data were inserted to a table, the table was emtpy with pre-allocated space (TRUNCATE TABLE xxx REUSE STORAGE), the table had PCTFREE 0, there were no indexes, there were no constraints

a. simple insert – INSERT INTO xxx SELECT * FROM yyy

b. using APPEND hint – INSERT /*+ APPEND */ INTO xxx SELECT * FROM yyy

c. using PARALLEL hint (with alter session enable parallel dml) – - INSERT /*+ APPEND PARALLEL(A, 4, 1) */ INTO xxx a SELECT /*+ parallel (b, 4, 1) */ b.* FROM yyy b

 

CREATE TABLE

a. no parallel – CREATE TABLE xxx PCTFREE 0 AS SELECT * FROM yyy

b. using parallel – CREATE TABLE xxx PCTFREE 0 PARALLEL 4 AS SELECT /*+ parallel (b, 4, 1) */ b.* FROM yyy b

 

 

READSIZE

I used 8 different sizes from 512kb to 128mb

 

 

RESULTS

1. As you can see from results below, changing READSIZE does not significantly improve load speed and I would recommend not to change it unless the records (rows) you load from external tables are extra wide.

2. The results also show that using APPEND hint and parallel (parallel load uses always direct path – there is no need to add append hint in parallel loads) in general can significantly improve the load time. Please note actual number of parallels depends on your server and file system(s). In my case setting PARALLEL 8 had no speed increase in comparison to PARALLEL 4.

3. CREATE TABLE is slightly faster than INSERT and although I see using CREATE TABLE a lot in ETL, I would not recommend it for several reasons:

a. the load time usually does not decrease much using CREATE TABLE
b. with INSERT you work with static table and it allows you to pre-set grants (ETL process ID should be very restricted)
c. with INSERT you work with static table and it you cna turn on specific auditing
c.with INSERT you work with static table and it allows you to set additonal parameters (tablespaces, indexes, constraints, pctfree, …) independently on the load (DBAs can change it anytime)
d. in general I do not like to see any objects disaapear from the system

 

all load times are in seconds
FILE1 – INSERT
READSIZE no hints APPEND hint PARALLEL 4
524288 282 196 52
2097152 284 195 49
4194304 284 198 49
8388608 276 197 49
16777216 269 268 52
33554432 270 194 51
67108864 269 198 55
134217728 269 196 62

 

FILE 1 – CREATE TABLE
READSIZE no  parallel PARALLEL 4
524288 206 50
2097152 200 49
4194304 194 49
8388608 197 50
16777216 194 53
33554432 195 54
67108864 195 58
134217728 195 64

 

FILE 2 – INSERT
READSIZE no hints APPEND hint PARALLEL 4
524288 1717 1421 362
2097152 1741 1434 371
4194304 1744 1445 373
8388608 1769 1402 376
16777216 1763 1725 379
33554432 1772 1406 376
67108864 1756 1411 381
134217728 1762 1421 390

 

FILE 2 – CREATE TABLE
READSIZE no  parallel PARALLEL 4
524288 1383 352
2097152 1419 340
4194304 1399 342
8388608 1402 345
16777216 1417 350
33554432 1386 350
67108864 1414 357
134217728 1389 362

 

I hope these results help you during your next project. Feel free to share your experiences with READSIZE or any other parameters.

About these ads
Tagged , ,

3 thoughts on “Oracle External Tables by Examples part 3 – READSIZE

  1. Anonymous says:

    Hi,
    Try 30 processes with readsize 20MB.
    Cheers.

  2. Juan says:

    Hello. I have an error about the log file. Depending of the READ SIZE parameter, the number of the rows informed aren’t correct. For example, if I set READSIZE parameter to 10000 and I try to load a file with errors, the line informed in the log is 29 but it should be 28, the line informed in the log is 58 but it should be 59 and so.

    Any help?
    Thanks!

  3. jiri says:

    are you skipping header?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: