帳票開発の諸々 (Oracle BI Publisher Blog)

2013/03/16

MySQLでの小計と合計 (MySQL: Subtotal and Grand Total)

過去の「SQLで小計と合計を取得する (SQL: Subtotal and Grand Total)」では、オラクルDBでの集計を行いました。
たまたまMySQLを使う機会があったため、今回はMySQLを使用して小計と合計を求めます。構文と結果は以下の通りです。

List 1: createData.pl

mysql> select deptno, job, sum(sal) as sum_sal
    ->   from EMP
    ->   group by deptno, job with rollup;
+--------+-----------+---------+
| deptno | job       | sum_sal |
+--------+-----------+---------+
|     10 | CLERK     |    1300 |
|     10 | MANAGER   |    2450 |
|     10 | PRESIDENT |    5000 |
|     10 | NULL      |    8750 |
|     20 | ANALYST   |    6000 |
|     20 | CLERK     |    1100 |
|     20 | MANAGER   |    2975 |
|     20 | NULL      |   10075 |
|     30 | CLERK     |     950 |
|     30 | MANAGER   |    2850 |
|     30 | SALESMAN  |    5600 |
|     30 | NULL      |    9400 |
|   NULL | NULL      |   28225 |
+--------+-----------+---------+
13 rows in set (0.00 sec)

Dev Shed Forumsではcoalesceを使用したサンプルが紹介されています。
http://forums.devshed.com/mysql-help-4/group-by-with-rollup-replace-null-value-580155.html

List 2: Workaround 1

select coalesce( mth , 'GRAND TOTAL' ) as dates,
       sum( salary ) as sum_salary
  from ( select month(date) as mth,
                salary
           from tb1 ) as dt
 group by mth with rollup

stackoverflow では、ifnullを使用したサンプルが紹介されています。この例では複数列の集計を一度に行っています。
http://stackoverflow.com/questions/6822666/how-can-i-replace-null-category-titles-in-mysql-rollup-function

List 3: Workaround 2

SELECT IFNULL(name, 'MainTotal') AS name,
       IFNULL(location, 'SubTotal') AS location,
       SUM(IF(division='OEM',totalHours,NULL)) AS OEM,
       SUM(IF(division='A/M',totalHours,NULL)) AS AM,
       SUM(IF(division='SKF',totalHours,NULL)) AS SKF,
       SUM(IF(division='RE',totalHours,NULL)) AS RE,
       location as location
  from $databasetable
 group by location, name with rollup

なお、明記はされていませんが、これらの回避策は集計キーとなる列にNull値が含まれていない事を前提としていますので注意が必要です。上記のサンプルを使用する際には、「元データにNULLが含まれていない」という前提を確認して下さい。

なお、MySQLには、集計列を判別する為の関数（Oracleのgrouping関数に相当）が提供されていないようです。

参考資料：マニュアル（日本語: MySQL 5.1）
http://dev.mysql.com/doc/refman/5.1/ja/group-by-modifiers.html

[Summary]
Not only Oracle, but MySQL also provides rollup function. List 1, List 2 and List 3 shows the basic usage.
You may need to determine the summarized row, like Oracle "grouping" function. But MySQL does not provide that kind of function.

References:

Manual (English: MySQL 5.6)
http://dev.mysql.com/doc/refman/5.6/en/group-by-modifiers.html

Workaround (Dev Shed Forums)
http://forums.devshed.com/mysql-help-4/group-by-with-rollup-replace-null-value-580155.html

Workaround (stackoverflow)
http://stackoverflow.com/questions/6822666/how-can-i-replace-null-category-titles-in-mysql-rollup-function

2013/01/28

BI PublisherからMySQLに接続する (Connect to MySQL)

WebLogicにはあらかじめMySQL用のJDBCドライバが組み込まれています。

List 1: MySQL JDBC Driver

[oracle@server1 ~ ]$ find $WL_HOME/server/lib/ -name mysql*.jar
/opt/oracle/middleware/wlserver_10.3/server/lib/mysql-connector-java-commercial-5.1.14-bin.jar

パスも通されている為、別段の準備の必要もなく使用できます。これを利用して、Oracle BI PublisherからMySQLへの接続を確認します。

設定は以下の通りです。

Driver Class: com.mysql.Driver
Connection String: jdbc:mysql//{server_name}:{port}/{db_name}

Figure 1: Connection succeeded

問題なく接続できます。

[Summary]
MySQL JDBC driver is registered with WebLogic library by default, and is ready to connect always.
List 1 shows the connection settings.

2013/01/07

AWKによる集計性能その２ (AWK performance, Part 2)

前回に続き、AWKの性能検証を続けます。
今回はレコード（行）のサイズを256バイトに減らして計測を行います。

【検証環境】
前回と同様です。

【事前準備】
データ生成のソースは以下の通りです。
前回からの変更は、固定文字 "x" の数を減らしてサイズを調整する点のみです。

List 4: createData.pl

#!/usr/bin/perl

use strict;
use warnings;

foreach my $i ( 1 .. 1000000 ){
    print sprintf( "%010d,",     $i);                   # row number
    print "x" x 211 . ",";                              # fixed text (dummy)
    print sprintf( "id%02d,",  int( rand(100) ));       # Key-1: eg) country id
    print sprintf( "id%05d,",  int( rand(1000) ));      # Key-2: eg) branch id
    print sprintf( "id%010d,", int( rand(100000000) )); # Key-3: eg) customer id
    print sprintf( "%7d\n",    int( rand(1000000) ));   # value
}

生成結果は、1000万行で約2.4GBとなります。今回は4000万行、および1億行まで増やして検証します。

【検証結果】
各サイズのファイルに対し、検証を5回行った平均は以下の通りです。

Table 2: Result

File size	Records (rows)	Elapsed time (average, mm:ss)
2.4GB	10,000,000	1:25
9.6GB	40,000,000	5:39
24GB	100,000,000	14:22

Figure 2: Result (256 bytes record)

1億行まで、単位時間当たりの性能劣化はほとんどありません。

【多重実行】
続いて、多重実行の検証を行います。スクリプト例は以下の通りです。

List 5: parallel.sh

#!/usr/bin/bash
cat sampledata_a.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.a &
cat sampledata_b.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.b &
cat sampledata_c.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.c &
cat sampledata_d.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.d &
cat sampledata_e.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.e &
cat sampledata_f.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.f &
cat sampledata_g.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.g &
cat sampledata_h.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.sub.h &

wait

cat result.sub.[a-h] > result.txt

データは先ほどの256バイト長の2.4GBのテキストファイル（1ファイルあたり1千万行）を使用します。実行結果は以下の通りです。

Table 3: Result

Parallelism	Records (rows)	Elapsed time (average, mm:ss)
4	10,000,000 * 4 = 40 Million	1:27
8	10,000,000 * 8 = 80 Million	1:30
12	10,000,000 * 12 = 120 Million	1:36
16	10,000,000 * 16 = 160 Million	2:08

Figure 3: Result (Parallel Processing)

マシンの物理コア数（12コア）まではほぼリニアに性能が向上しています。概ね良好な結果が得られました。
16多重では処理時間が大幅に増加していますが、単位時間当たりの処理性能（処理行数）はほぼ変わっていません。

1億2千万件（12多重）の集計に1分30秒程度という処理性能は非常に魅力的であると言えます。

[Summary]
Again, AWK performance test. The record size is reduced to 256 bytes.

[Hardware]
Same as the previous post.

[Source code]
Please see List 4.

[Result]
Table 2 and Figure 2 show the number of target records and the average of process time.

[Parallel Processing]
List 5, the script performs parallel processing. Its results are shown in Table 3 and Figure 3.
120 million records are summarized in one and half minute (1:36). Quite nice.

2012/12/07

AWKによる集計性能 (AWK performance)

テキストファイルの集計はBigDataで、という世の中ですが、敢えて AWK での集計性能を計測しました。

【検証環境】
CPU： Xeon X5675 (6コア) x 2 (3.06 GHz)
メモリ： 96GB
ディスク： 300GB 10000RPM SAS x 4 (RAID 5)

【事前準備】
1行1024バイトのテキストデータをランダムに生成します。
100万行で約1GB、1000万行で約10GBとなります。

生成のソースは以下の通りです。

List 1: createData.pl

#!/usr/bin/perl

use strict;
use warnings;

foreach my $i ( 1 .. 1000000 ){
    print sprintf( "%010d,",     $i);                   # row number
    print "x" x 979 . ",";                              # fixed text (dummy)
    print sprintf( "id%02d,",  int( rand(100) ));       # Key-1: eg) country id
    print sprintf( "id%05d,",  int( rand(1000) ));      # Key-2: eg) branch id
    print sprintf( "id%010d,", int( rand(100000000) )); # Key-3: eg) customer id
    print sprintf( "%7d\n",    int( rand(1000000) ));   # value
}

データの内容のイメージは以下の通りです。

List 2: Data

$ perl createData.pl > sampledata.txt
$ tail -1 sampledata.txt 0001000000,xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,id56,id00603,id0060743162, 211271

このデータに対し、以下のコマンドで集計をかけます。
List 3: command

cat sampledata_1.txt | awk -F "," '{ s[$3 $4] += $6; c[$3 $4] += 1} END { for( x in s) print x,s[x],c[x] }' | sort > result.txt

【検証結果】
各サイズのファイルに対し、検証を5回行った平均は以下の通りです。

Table 1: Result

File size	Records (rows)	Elapsed time (average, mm:ss)
1GB	1,000,000	0:31
10GB	10,000,000	4:49
20GB	20,000,000	9:20

H/Wスペックが比較的高いサーバで実施した事もあり、20GBまで性能の劣化はほとんどありません。

Figure 1: Result

集計対象となるログファイルがさらに巨大である場合はHadoop等のBigData系技術の採用が有用ですが、数十GB程度のテキストであれば、AWKによる処理も検討できます。
AWKによる処理ではデータロード等の前処理の必要がないことも考慮すると、優位性はさらに高まると考えられます。

なお、多くの場合において、全ての処理・ビジネスロジックをAWKで実装する必要はありません。あくまでも「高速な前処理・中間集計」として割り切って使用する事がシステム全体の最適化につながります。

[Summary]
It may sounds old fashioned, but AWK is still good solution when you summarise text file.
In this post, I list the result of simple performance test.

[Hardware]
CPU： Xeon X5675 (6 cores) x 2 (3.06 GHz)
Memory: 96GB
Disk: 300GB 10000RPM SAS x 4 (RAID 5)

[Source code]
List 1 produces the sample data. The size of each text record is 1024 bytes. List 2 shows the data.
List 3 is the main part. It summarize the total and count by two key colums (3rd and 4th column).

[Result]
The test cases were performed 5 times each. Table 1 and Figure 1 show the target file size and the average of process time.
When you handle mid-size text files, you should consider about AWK first. It may provide you good cost effectiveness.
In most of the cases, you should NOT try to process whole business logic with AWK. Use AWK simply to get preliminary results, and to reduce the file size.

2012/11/16

HULFT新機能：ファイルトリガー

久しぶりにHULFTに出会いました。
帳票やデータファイルは「出力して終わり」と言う事は少なく、多くの場合、配信処理とセットで考える必要があります。
今回はHULFT 7.2から新機能として提供された「ファイルトリガ機能」を紹介します。

ファイルトリガは、「ファイルの状態（新規作成、変更、削除）を監視し、変化を検知したタイミングで任意のジョブを呼び出す」という地味な機能です。
概要については製品紹介のページを参照下さい（http://www.hulft.com/software/hulft_7/function.html）。
7回もバージョンアップするまでこの機能が実装されていなかったという事が個人的にはよろしくないとは思いますが、それはさておき、この機能がHULFT内に実装されたことで、ファイル連携周辺の運用設計をシンプルにすることができます。

最も簡単な例を以下に示します。
この設定では、ファイルトリガTRG_001がファイル /hulft/outbox/sample.txt を10秒間隔で監視し、ファイルが新規作成または更新された場合にジョブ JOB_001 を呼び出します。ジョブは配信設定 FILE_001 を指定してsample.txtを配信します。

Figure1: 配信設定

Figure2: ジョブ設定

Figure3: ファイルトリガ設定

※集信設定は割愛。

この様に、「アプリがファイルさえ出力してしまえば後はHULFT任せ」というシンプルな構成を取ることができます。
監視ツールでバッチの成否判定を一元管理するような場合は、HULFTの履歴ログ（utllistのSTATUS列）を監視する事で対応可能です。
ESB導入等の基盤要件が規定されていない場合には、検討の価値のある実装と言えます。

[Summary]
HULFT is one of the most popular file transfer middleware in Japan (Unfortunately, not so popular outside Japan). So, this post is for the engineers working in Japan. No English translation of this post.

2012/10/26

Hadoop + HiveからExcelへの帳票出力 (How to fetch data from Hadoop via Hive)

今回はHadoopおよびHiveを用いて抽出したデータを、Apache POIを使用してExcelシート上に出力します。

HiveはHiveQLというSQLに近い言語で開発が可能であり、JDBCドライバも提供されています。
過去の記事（Apache POI によるエクセルファイルの出力その1）のソースコードを流用することで、極めて簡単にHadoopとExcelを連携させることができます。

Figure 1: Overall image

【バージョン】
HadoopおよびHiveはそれぞれ現時点で入手可能な最新の安定版を使用します。

Hadoop: Release 1.0.4 (12 October, 2012)
Hive: Release 0.9.0 (30 April, 2012)

※Hive 0.9.0から、Hadoop1.xでの動作をサポートした様です。

また、HiveのJDBCドライバ（${HIVE_HOME}/lib/hive-jdbc-0.9.0.jar）がHadoop 1.xのAPIと不整合を起こしている様ですので、Hadoop 0.1xのhadoop-core-0.19.1.jarも併せて使用します。

【前準備】
サンプルプログラムのプロジェクトに、$HIVE_HOME/lib 配下のjarファイル、およびhadoop-core-0.19.1.jarを参照設定します。
※HiveのJDBCドライバ（hive-jdbc-0.9.0.jar）の依存するjarを特定していない為、今回は全てのjarを参照する事としています。

【実装】
ソースコードの変更は、DBコネクション生成に関わる部分のみです。

List 1: Main logic

package util;

import java.io.*;
import java.util.Iterator;
import java.sql.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;


public class Ora2Excel {

  // db objects
  private Connection con;
  private Statement  stm;
  
  // poi objects
  private Workbook wbk;
  
  public Ora2Excel() throws SQLException {
    DriverManager.registerDriver(new org.apache.hadoop.hive.jdbc.HiveDriver());
  }

  public void openDb(String userId, String password, String connString) throws SQLException {
    con = DriverManager.getConnection(connString, userId, password);
    stm = con.createStatement();
  }
  
  public void closeDb() throws SQLException {
    stm.close();
    con.close();
  }
  
  public void openBook(String fileName) throws IOException {
    wbk = new XSSFWorkbook(fileName);
  }
  
  public void saveBook(String fileName) throws IOException {
    FileOutputStream out = new FileOutputStream(fileName);
    wbk.write(out);
    out.close();
  }
  
  public void closeBook() {
    wbk = null;
  }
  
  public void extract(String sheetName, String sql) throws SQLException {
    Sheet wsh = wbk.getSheet(sheetName);
    ResultSet rst = stm.executeQuery(sql);
    int colCount = rst.getMetaData().getColumnCount();
    
    // determine the start position: search "$start"
    int rowStart = 0;
    int colStart = 0;
    Iterator<Row> iRow = wsh.rowIterator();
    while (iRow.hasNext() && rowStart + colStart == 0) {
      Row currentRow = (Row) iRow.next();
      Iterator<Cell> iCol = currentRow.cellIterator();
      while (iCol.hasNext()) {
        Cell currentCell = (Cell) iCol.next();
        if (currentCell.getCellType() == Cell.CELL_TYPE_STRING  && currentCell.getStringCellValue().trim().equalsIgnoreCase("$start")) {
          rowStart = currentCell.getRowIndex();
          colStart = currentCell.getColumnIndex();
          break;
        }
      }
    }
    
    // get "template row"
    Row templateRow = wsh.getRow(rowStart);
    
    // set cell values
    int idxRow = rowStart;
    while (rst.next()) {
      wsh.shiftRows(idxRow, wsh.getLastRowNum()+1, 1);
      
      Row r = wsh.createRow(idxRow);
      for (int idxCol = templateRow.getFirstCellNum(); idxCol < templateRow.getLastCellNum(); idxCol++) {
        Cell c = r.createCell(idxCol);
        
        if (idxCol >= colStart && idxCol - colStart < colCount) {
          int idxDbCol = idxCol-colStart + 1;
          switch(rst.getMetaData().getColumnType(idxDbCol)){
          case Types.NUMERIC:
            c.setCellValue(rst.getDouble(idxDbCol));
            break;
          case Types.DATE:
              c.setCellValue(rst.getDate(idxDbCol));
            break;
          case Types.TIMESTAMP:
              c.setCellValue(rst.getDate(idxDbCol));
            break;
          default:
            c.setCellValue(rst.getString(idxDbCol));
          }
        } else if (templateRow.getCell(idxCol).getCellType() == Cell.CELL_TYPE_FORMULA){
          c.setCellFormula(templateRow.getCell(idxCol).getCellFormula());
        } else if (templateRow.getCell(idxCol).getCellType() == Cell.CELL_TYPE_NUMERIC){
          c.setCellValue(templateRow.getCell(idxCol).getNumericCellValue());
        } else if (templateRow.getCell(idxCol).getCellType() == Cell.CELL_TYPE_STRING){
          c.setCellValue(templateRow.getCell(idxCol).getStringCellValue());
        } 
        c.setCellStyle(templateRow.getCell(idxCol).getCellStyle());
        
      }
      idxRow++;
    }
    rst.close();
    
    // remove the template row.
    wsh.removeRow(templateRow);
    wsh.shiftRows(idxRow,  wsh.getLastRowNum()+1, -1);
    
    // calculate formula cells
    XSSFFormulaEvaluator.evaluateAllFormulaCells((XSSFWorkbook) wbk);
  }
}

【テスト用ソース（呼び出し部）】

上記クラスを呼び出す側の例を以下に示します。

List 2: Calling Main Logic

import util.Ora2Excel;

public class Test1 {

  public static void main(String[] args) throws Exception {
    Ora2Excel o2x = new Ora2Excel();
    
    o2x.openDb("", "", "jdbc:hive://HiveServer:10000/default");
    
    o2x.openBook("c:\\template.xlsx");
    o2x.extract("Sheet1", "select deptno, empno, ename, job, mgr, hiredate, sal, comm from emp order by deptno, empno");
    o2x.saveBook("c:\\result.xlsx");
    o2x.closeBook();
    
    o2x.closeDb();
  }

HiveのJDBCドライバはまだ開発途上ではありますが、Hadoopから帳票を出力する際には極めて効率的なツールであると言えます。

参考URL:
https://cwiki.apache.org/Hive/hiveclient.html

[Summary]
Hive provides JDBC driver. List 1 shows how to get data from Hadoop via Hive.

The version of Hadoop and Hive are:

Hadoop: Release 1.0.4 (12 October, 2012)
Hive: Release 0.9.0 (30 April, 2012)

${HIVE_HOME}/lib/hive-jdbc-0.9.0.jar needs other jar files under ${HIVE_HOME}/lib.
Since it needs org/apache/hadoop/io/Writable that is not included in Hive 0.9.0, this time I bring hadoop-core-0.19.1.jar from another project.
To run the source code, you have to add these jar files above as External JAR.

Please see Export to Excel file via Apache POI, Part 1 for your reference.

Also, https://cwiki.apache.org/Hive/hiveclient.html is good reference.

2012/10/11

update文と副問い合わせ (UPDATE with sub query)

見逃されがちなUPDATE文の構文について説明します。

以下のSQLは、同一の表 sales に副問合せで3回アクセスしています。

List 1:

update customers cst
   set last_purchase_date = (select max(time_id) from sales where cust_id = cst.cust_id),
       total_amount       = (select sum(amount_sold) from sales where cust_id = cst.cust_id),
       annual_amount      = (select sum(case when time_id > to_date('20000101','yyyymmdd') then amount_sold else 0 end) from sales where cust_id = cst.cust_id)
 where cust_id = 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0        389          2           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0        389          2           1


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  CUSTOMERS (cr=389 pr=0 pw=0 time=2019 us)
         1          1          1   INDEX UNIQUE SCAN CUSTOMERS_PK (cr=2 pr=0 pw=0 time=20 us cost=1 size=40 card=1)(object id 76070)
         1          1          1   SORT AGGREGATE (cr=129 pr=0 pw=0 time=854 us)
       176        176        176    TABLE ACCESS BY GLOBAL INDEX ROWID SALES PARTITION: ROW LOCATION ROW LOCATION (cr=129 pr=0 pw=0 time=829 us cost=90 size=1690 card=130)
       176        176        176     INDEX RANGE SCAN SALES_CUST_IX (cr=3 pr=0 pw=0 time=21 us cost=3 size=0 card=130)(object id 76319)
         1          1          1   SORT AGGREGATE (cr=129 pr=0 pw=0 time=421 us)
       176        176        176    TABLE ACCESS BY GLOBAL INDEX ROWID SALES PARTITION: ROW LOCATION ROW LOCATION (cr=129 pr=0 pw=0 time=282 us cost=90 size=1300 card=130)
       176        176        176     INDEX RANGE SCAN SALES_CUST_IX (cr=3 pr=0 pw=0 time=10 us cost=3 size=0 card=130)(object id 76319)
         1          1          1   SORT AGGREGATE (cr=129 pr=0 pw=0 time=567 us)
       176        176        176    TABLE ACCESS BY GLOBAL INDEX ROWID SALES PARTITION: ROW LOCATION ROW LOCATION (cr=129 pr=0 pw=0 time=550 us cost=90 size=1300 card=130)
       176        176        176     INDEX RANGE SCAN SALES_CUST_IX (cr=3 pr=0 pw=0 time=192 us cost=3 size=0 card=130)(object id 76319)

この様なupdate文については、以下の構文への変換が効果的です。

List 2:

update customers cst
   set (last_purchase_date, total_amount, annual_amount)
       =
       (select max(time_id),
               sum(amount_sold),
               sum(case when time_id > to_date('20000101','yyyymmdd') then amount_sold else 0 end)
          from sales
         where cust_id = cst.cust_id)
 where cust_id = 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0        131          2           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0        131          2           1



Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  CUSTOMERS (cr=131 pr=0 pw=0 time=672 us)
         1          1          1   INDEX UNIQUE SCAN CUSTOMERS_PK (cr=2 pr=0 pw=0 time=11 us cost=1 size=40 card=1)(object id 76070)
         1          1          1   SORT AGGREGATE (cr=129 pr=0 pw=0 time=604 us)
       176        176        176    TABLE ACCESS BY GLOBAL INDEX ROWID SALES PARTITION: ROW LOCATION ROW LOCATION (cr=129 pr=0 pw=0 time=725 us cost=90 size=2340 card=130)
       176        176        176     INDEX RANGE SCAN SALES_CUST_IX (cr=3 pr=0 pw=0 time=12 us cost=3 size=0 card=130)(object id 76319)

実行計画からも、処理が改善された事が分かります。
sales 表への読み込みが3回→1回に減ったことで、論理読み込みも低減しています。

最近の潮流ではExadata等、足回り（H/W）の強化でボトルネックを解消する事が多くなっていますが、ミッションクリティカルなシステムでは、この様に処理そのものの改善で対応する事が先決です。

[summary]
List 1 shows bad UPDATE statement. It accesses sales table three times.
To reduce consistent read access, it should be written like List 2.

登録: 投稿 (Atom)