GT.M Update Helper Processes

Technical Bulletin: GT.M Update Helper Processes

May 03, 2006

Revision History
Revision 1.1	03 May 2006
Revision 1.0	06 September 2005

                        GT.M Group

                        Fidelity National Information Services, Inc.

                        2 West Liberty Boulevard, Suite 300

                        Malvern, PA  19355,

                        United States of America

                        GT.M Support: +1 (610) 578-4226

                        Switchboard: +1 (610) 296-8877

                        Fax: +1 (484) 595-5101

                        http://www.fis-gtm.com

                        gtmsupport@fnf.com

Table of Contents

Summary

Detailed Description

MUPIP Commands

DSE Commands

Average Blocks Read per 100 Records
Update Process Reserved Area
Pre read trigger factor
Update writer trigger factor

Typographical Conventions


Summary

On the secondary, it is now possible to start "helper" processes to improve the rate at which the transaction stream from the primary can be processed. mupip replicate -receiver -start now accepts an additional qualifier, -he[lpers][=m[,n]], where m is the total number of helper processes and n is the number of reader helper processes. Additional parameters in the database file header tune the performance of the update process and its helpers, and DSE can be used to modify these parameters. (D9E10-002497)


Detailed Description

GT.M replication can be thought of as a pipeline, in which a transaction ^[1] is committed at the primary, transported over a TCP connection to the secondary, and committed at the secondary. While there is buffering to handle load spikes, the sustained throughput of the pipeline is limited by the capacity of its narrowest stage. Except when the bottleneck is the first stage, a backlog of transactions builds up within the pipeline. ^[2] Note also that there is always a bottleneck that limits throughput; if there were no bottleneck, throughput would be infinite.
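As a rough illustration of this reasoning, the following Python sketch models the pipeline as a list of stage capacities in transactions per second. The function names and numbers are hypothetical and are not part of GT.M; the model assumes that the first stage (commits at the primary) sets the offered load, consistent with the primary never slowing down on account of replication.

```python
# Toy model of the replication pipeline; not GT.M code.
# Stage capacities are hypothetical, in transactions per second.

def pipeline_throughput(capacities):
    """Sustained throughput is limited by the narrowest stage."""
    return min(capacities)


def backlog_growth_rate(capacities):
    """Rate (tps) at which a backlog accumulates inside the pipeline.

    The first stage (commits at the primary) sets the offered load; any
    shortfall in a later stage accumulates as backlog. If the first stage
    is itself the bottleneck, no backlog builds up.
    """
    offered_load = capacities[0]
    return offered_load - pipeline_throughput(capacities)


if __name__ == "__main__":
    # primary commits, network, secondary update process
    stages = [5000, 8000, 3000]
    print(pipeline_throughput(stages))  # 3000
    print(backlog_growth_rate(stages))  # 2000
```

With the secondary update process as the slowest stage, the backlog grows by the difference between the primary's commit rate and the update process's rate, which is the situation helper processes are meant to address.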

Since GT.M has no control over the network from the primary to the secondary, it is not discussed here. If the network is the bottleneck, the only solution is to increase its capacity.

Unusual among database engines in having no daemon, GT.M performs best when multiple processes access the database and cooperate with one another to manage it. When GT.M replication is in use at a logical dual site deployment of an application, the processes at the primary need to execute business logic to compute database updates, whereas the processes at the secondary do not. Thus, if throughput at the primary is limited by the execution of business logic, the primary can be the bottleneck, and there would be no backlog. On the other hand, if throughput at the primary is limited by the rate at which the database can commit data, the multiple processes of the primary can conceivably outperform a secondary with a solitary update process, causing a backlog to build up.

To a first approximation, there are two ways that the multiple GT.M processes of a primary that executes business logic can outperform a secondary executing only one GT.M process on identical hardware:

In order to update a database, the database blocks to be updated must first be read from disk, into the operating system buffers and thence into the GT.M global buffer cache. On the primary, the execution of business logic will itself frequently bring the blocks to be updated into the global buffer cache, since the global variables to be updated are likely to be read by the application code before they are updated.
When updating a database, the database blocks and journal records generated by one process may well be written to disk by an entirely different process, which better exploits the IO parallelism of most modern operating systems.

For those situations in which the update process on the secondary is a bottleneck, GT.M V5.0-000 implements the concept of helper processes to increase database throughput on the secondary. There can be a maximum of 128 helper processes.

On the secondary, the receive server process communicates with the primary and feeds a stream of update records into the receive pool. The update process reads these update records and applies them to the journal and database files via the journal buffers and global buffer cache in shared memory. Helper processes operate as follows:

Reader helper processes read the update records in the receive pool and attempt to pre-fetch blocks to be updated into the global buffer cache, so that they are available for the update process when it needs them.
Writer helper processes flush dirty buffers from the global buffer cache to disk, exploiting the operating system's IO parallelism the way additional GT.M processes do on the primary.


MUPIP Commands

The primary interface for managing helper processes is MUPIP.

The command used to start the receiver server, mupip replicate -receiver -start, now takes an additional qualifier, -he[lpers][=m[,n]], to start helper processes.

If the qualifier is not used, or if -helpers=0[,n] is specified, no helper processes are started.
If the qualifier is used, but neither m nor n is specified, the default number of helper processes with the default proportion of roles is started. In V5.0-000, the default total number of helper processes is 8, of which 5 are reader helpers.
If the qualifier is used, and m is specified, but n is not specified, m helper processes are started of which floor(5*m/8) processes are reader helpers.
If both m and n are specified, m helper processes are started, of which n are reader helpers. If m<n, mupip starts m reader helpers and no writer helpers, effectively reducing n to m.
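The rules above for deriving the number of reader and writer helpers from -helpers[=m[,n]] can be summarized in a short Python sketch. The function helper_split is hypothetical and is not a MUPIP interface; rejecting an m above the 128-helper maximum (or negative values) is a choice made by this sketch, since this bulletin does not describe how mupip reports such errors.

```python
MAX_HELPERS = 128  # maximum number of helper processes in V5.0-000


def helper_split(qualifier_present, m=None, n=None):
    """Return (total, readers, writers) for -helpers[=m[,n]].

    Hypothetical helper that mirrors the rules described in this bulletin.
    """
    if not qualifier_present:
        return (0, 0, 0)            # qualifier omitted: no helpers
    if m is None:
        return (8, 5, 3)            # -helpers with no value: V5.0-000 defaults
    if m == 0:
        return (0, 0, 0)            # -helpers=0[,n]: no helpers
    if not 0 < m <= MAX_HELPERS or (n is not None and n < 0):
        raise ValueError(f"need 0 <= m <= {MAX_HELPERS} and n >= 0")
    if n is None:
        readers = (5 * m) // 8      # floor(5*m/8) reader helpers
    else:
        readers = min(n, m)         # n greater than m is reduced to m
    return (m, readers, m - readers)


if __name__ == "__main__":
    print(helper_split(True))         # (8, 5, 3)
    print(helper_split(True, 10))     # (10, 6, 4)
    print(helper_split(True, 3, 5))   # (3, 3, 0)
```

Note that the 5/8 rule applied to the default total of 8 yields the default of 5 readers, so the defaults and the m-only rule agree.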

On UNIX/Linux, helper processes are reported (by the ps command, for example) as mupip replicate -updhelper -reader and mupip replicate -updhelper -writer. On OpenVMS, readers have the prefix GTMUHR and writers have the prefix GTMUHW.

Shutting down the receiver server normally, with mupip replicate -receiver -shutdown, also shuts down all helper processes. The command mupip replicate -receiver -shutdown -he[lpers] shuts down only the helper processes, leaving the receiver server and update process to continue operating.


	Individual helper processes can be shut down with the mupip stop command. Fidelity recommends against this course of action except in the event of an unforeseen abnormal condition.

mupip replicate -receiver -checkhealth accepts the optional qualifier -he[lpers]. If -he[lpers] is specified, the status of the helper processes is displayed in addition to the status of the receiver server and update process.


DSE Commands

There are a number of parameters in the database file header that control the behavior of helper processes and can be tuned for performance. Although the performance of the update process with helper processes is believed to be not very sensitive to the values of these parameters over a broad range, each operating environment is different, and the helper processes must strike a balance. For example, if the reader helpers are not aggressive enough in bringing database blocks into the global buffer cache, this work is left to the update process; but if the reader helpers are too aggressive, the cache blocks they fill may be overwritten by the update process as it commits transactions that are earlier in the update stream. ^[3]

The DSE dump -fileheader -u[pdproc] command can be used to get a dump of the file header including these helper process parameters, and the DSE change -fileheader command can be used to modify the values of these parameters, e.g.:

scylla ~/demo 5:54pm 1048: dse dump -fileheader -updproc

File    /xyz/demo/mumps.dat
Region  DEFAULT


File            /xyz/demo/mumps.dat
Region          DEFAULT
Date/Time       20-MAY-2005 17:54:37 [$H = 60040,64477]
  Access method                          BG  Global Buffers                1024
  Reserved Bytes                          0  Block size (in bytes)         4096
  Maximum record size                  4080  Starting VBN                   129
  Maximum key size                      255  Total blocks            0x00000065
  Null subscripts                     NEVER  Free blocks             0x00000062
  Standard Null Collation             FALSE  Free space              0x00006000
  Last Record Backup     0x0000000000000001  Extension Count                100
  Last Database Backup   0x0000000000000001  Number of local maps             1
  Last Bytestream Backup 0x0000000000000001  Lock space              0x00000028
  In critical section            0x00000000  Timers pending                   0
  Cache freeze id                0x00000000  Flush timer            00:00:01:00
  Freeze match                   0x00000000  Flush trigger                  960
  Current transaction    0x0000000000000001  No. of writes/flush              7
  Maximum TN             0xFFFFFFFFDFFFFFFF  Certified for Upgrade to        V5
  Maximum TN Warn        0xFFFFFFFF5FFFFFFF  Desired DB Format               V5
  Master Bitmap Size                     64  Blocks to Upgrade       0x00000000
  Create in progress                  FALSE  Modified cache blocks            0
  Reference count                        11  Wait Disk                        0
  Journal State               [inactive] ON  Journal Before imaging        TRUE
  Journal Allocation                    100  Journal Extension              100
  Journal Buffer Size                   128  Journal Alignsize              128
  Journal AutoSwitchLimit           8388600  Journal Epoch Interval         300
  Journal Yield Limit                     8  Journal Sync IO              FALSE
  Journal File: /xyz/demo/mumps.mjl
  Mutex Hard Spin Count                 128  Mutex Sleep Spin Count         128
  Mutex Spin Sleep Time                2048  KILLs in progress                0
  Replication State                      ON  Region Seqno    0x0000000000000001
  Resync Seqno           0x0000000000000001  Resync trans    0x0000000000000001

  Upd reserved area [% global buffers]   50  Avg blks read per 100 records  200
  Pre read trigger factor [% upd rsrvd]  50  Upd writer trigger [%flshTrgr]  33

Average Blocks Read per 100 Records

The records in the update stream received from the primary describe logical updates to global variables. Each update involves reading one or more database blocks. Avg blks read per 100 records is an estimate of the number of database blocks that will be read for 100 update records. A good value to use is 100 times the average height of the tree on disk for a global variable. In V5.0-000, the default value is 200, which is a good approximation for a small global variable (one index block plus one data block). For very large databases, the value could be increased to as much as 400.
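One way to arrive at a value is sketched below in Python, assuming that each update record reads one block per level of its global variable's tree, so that the estimate is 100 times the mean tree height. The function name and the sample of tree heights are hypothetical; GT.M does not compute this value, and how representative tree heights are obtained is left to the reader.

```python
from statistics import mean


def avg_blks_read_estimate(tree_heights):
    """Estimate Avg blks read per 100 records as 100 x average tree height.

    tree_heights: height (index levels plus the data level) of the global
    variable tree touched by each update record in a representative sample.
    Assumes each update reads one block per level of its tree.
    """
    if not tree_heights:
        raise ValueError("need at least one tree height")
    return round(100 * mean(tree_heights))


if __name__ == "__main__":
    print(avg_blks_read_estimate([2, 2, 2]))     # 200, the V5.0-000 default
    print(avg_blks_read_estimate([2, 3, 4, 3]))  # 300
```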

The DSE command change -fileheader -avg_blks_read=n sets the value of Avg blks read per 100 Records to n for the current region.

Update Process Reserved Area

When so requested by the update process, reader helpers will read global variables referenced by records from the receive pool. The number of records read from the receive pool will be:

(100-upd_reserved_area)*No_of_global_buffers/avg_blks_read

In other words, this field is an approximate percentage (an integer value from 0 to 100) of the global buffers reserved for use by the update process; the reader helper processes leave at least this percentage of the global buffers for the update process. In V5.0-000, the default value is 50, i.e., 50% of the global buffers are reserved for the update process and up to 50% are filled by reader helper processes.
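The formula for the number of records the reader helpers read can be checked with a small Python sketch. The function name is hypothetical, and the use of truncating integer division is an assumption, since this bulletin does not say how GT.M rounds the result. With the 1024 global buffers shown in the DSE dump above and the default values of 50 and 200, the reader helpers would read 256 records, enough to fill the 512 unreserved buffers at 2 blocks per record.

```python
def reader_records_per_batch(upd_reserved_area, global_buffers, avg_blks_read):
    """Update records the reader helpers read before suspending.

    Implements (100 - upd_reserved_area) * global_buffers / avg_blks_read.
    Truncating integer division is an assumption of this sketch.
    """
    if not 0 <= upd_reserved_area <= 100:
        raise ValueError("upd_reserved_area must be a percentage (0 to 100)")
    if avg_blks_read <= 0:
        raise ValueError("avg_blks_read must be positive")
    return (100 - upd_reserved_area) * global_buffers // avg_blks_read


if __name__ == "__main__":
    # 1024 global buffers with the default reserved area and blocks estimate
    print(reader_records_per_batch(50, 1024, 200))  # 256
```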

The DSE command change -fileheader -upd_reserved_area=n sets the value of Upd reserved area to n for the current region.

Pre read trigger factor

When the reader helpers have read this number of update records from the receive pool, they suspend their reading. Whenever the update process has processed records accounting for Pre read trigger factor percent of the Upd reserved area, it signals the reader helper processes to resume processing journal records and reading global variables into the global buffer cache. In V5.0-000, the default value is 50, i.e., when the update process has processed 50% of the Upd reserved area global buffers, it triggers the reader helpers to resume, in case they are idle. The number of records processed by the update process before it signals the reader helpers to resume reading is:

upd_reserved_area*pre_read_trigger_factor*No_of_global_buffers/(avg_blks_read*100)
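A Python sketch of this formula follows; the function name is hypothetical, and truncating integer division is again an assumption. With 1024 global buffers and the default values (50, 50, 200), the update process would signal the reader helpers after 128 records: half of the 512 reserved buffers is 256 buffers, which at 2 blocks per record is 128 records.

```python
def preread_trigger_records(upd_reserved_area, pre_read_trigger_factor,
                            global_buffers, avg_blks_read):
    """Records the update process processes before signaling readers to resume.

    Implements upd_reserved_area * pre_read_trigger_factor * global_buffers
    / (avg_blks_read * 100); truncating division is an assumption.
    """
    if avg_blks_read <= 0:
        raise ValueError("avg_blks_read must be positive")
    return (upd_reserved_area * pre_read_trigger_factor * global_buffers
            // (avg_blks_read * 100))


if __name__ == "__main__":
    print(preread_trigger_records(50, 50, 1024, 200))  # 128
```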

The DSE command change -fileheader -pre_read_trigger_factor=n sets the value of Pre read trigger factor to n for the current region.

Update writer trigger factor

One of the parameters used by GT.M to manage the database is the flush trigger. One of several conditions that causes normal GT.M processes to initiate flushing dirty buffers from the database global buffer cache is the number of dirty buffers crossing the flush trigger. GT.M processes dynamically tune this value in normal use. In an attempt to never require the update process itself to flush dirty buffers, writer helper processes start flushing dirty buffers to disk when the number of dirty global buffers crosses Upd writer trigger factor percent of the flush trigger. In V5.0-000, the default value is 33, i.e., 33%.
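The writer helper threshold can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions of this sketch: that "crosses" means strictly exceeds, and that the percentage is truncated to an integer. Because GT.M tunes the flush trigger dynamically, the threshold moves with it; the example uses the flush trigger of 960 from the DSE dump above.

```python
def writer_trigger_threshold(flush_trigger, upd_writer_trigger_factor):
    """Dirty-buffer count at which writer helpers start flushing."""
    return flush_trigger * upd_writer_trigger_factor // 100


def writers_should_flush(dirty_buffers, flush_trigger, upd_writer_trigger_factor):
    """True once the dirty-buffer count exceeds the writer threshold."""
    return dirty_buffers > writer_trigger_threshold(flush_trigger,
                                                    upd_writer_trigger_factor)


if __name__ == "__main__":
    # Flush trigger 960 with the default Upd writer trigger factor of 33
    print(writer_trigger_threshold(960, 33))   # 316
    print(writers_should_flush(400, 960, 33))  # True
```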

The DSE command change -fileheader -upd_writer_trigger_factor=n sets the value of Upd writer trigger factor to n for the current region.


Typographical Conventions

Command Syntax: UNIX syntax (i.e., lowercase text and "-" for flags/qualifiers) is used throughout this document. VMS accepts both lowercase and uppercase text; flags/qualifiers should be preceded with "/".


	On OpenVMS, when both m and n are specified, the value of /helper must be a quoted string, as in /helper="m,n".

Reference Number: The reference numbers used to track software enhancements and customer support requests appear in parentheses ( ).

Platform Identifier: If a new feature or software enhancement does not apply to all platforms, the relevant platform appears in brackets [ ].


^[1]In the absence of GT.M transaction processing (TStart/TCommit), each individual update can be considered a miniature transaction.

^[2]The design of GT.M replication is such that the primary will never slow down, and the backlog at the primary is limited entirely by the disk space available for journal files.

^[3]At least on UNIX/Linux, as long as the system is not constrained in the amount of available RAM, it is probably better to err, if one must, on the side of being aggressive, since the blocks will be in the operating system's unified buffer cache.