Implementing checkpointing in long-running updates

Occasionally we need to run updates or inserts of large numbers of records, something which has this nasty habit of taking out locks that will prevent other sessions from updating or inserting in the same tables.  Also, if we terminate such an operation, we’ll have to start over from the beginning, wasting a lot of time.

Luckily, with Read Committed Snapshot Isolation, which is supported by AX, readers will not be blocked by writers, but writers will still be blocked by other writers.  Once SQL Server escalates the row-level locks to table-level locks, things start to go downhill rather quickly, and as we’re pretty much guaranteed not to have an outage window in which to run the update operation, someone is going to be on the phone soon.  Quite soon…

The solution is to break the operation into smaller transactions.

Introducing checkpointing

Checkpointing is a technique for establishing a point from which a long-running job can be resumed without losing any of the work done prior to the checkpoint.

This technique can be used only if it is not important that the entire set of changes be handled as a single transaction: for example, a mass update to initialise the value of a new field that is not yet being used.

What we need to do is break the update into smaller transactions that can be committed individually.  Each transaction should be small enough that we don’t take out too many or too broad locks before committing, and small enough that the transaction tracking overhead in SQL Server doesn’t cause any problems (for massive transactions, that overhead can become significant).  The transactions still need to be large enough that we don’t incur additional overhead from starting and committing new transactions too often.  In the example job below, I’ll use 500 records as the chunk size.

static void EB_InitWorkBasketPaymentDate(Args _args)
{
    EB_TransactionTable trans;
    int                 i;

    ttsBegin;

    // Include a where-clause to exclude any records we have already updated.
    // This makes the process resumable.
    while select forUpdate trans
        where trans.FieldToUpdate == ""
    {
        // Perform the field updates needed.
        // I call a calculation method here, but use whatever way you prefer.
        trans.FieldToUpdate = trans.calcFieldToUpdate();

        // This test is optional, of course, but it is beneficial to only call update()
        // when the value has actually changed.  It speeds up the operation
        // as it results in fewer database calls.  It also causes fewer locks.
        if (trans.FieldToUpdate != trans.orig().FieldToUpdate)
        {
            trans.doUpdate(); // Or update(), depending on your requirements

            i++; // Increase the counter
            if ((i mod 500) == 0) // Check whether we have reached a multiple of 500 updates
            {
                // This may look counter-intuitive, as ttsCommit comes before ttsBegin,
                // but we're committing the current transaction and then starting a new one.
                ttsCommit;
                ttsBegin;
                // If you're not sure how it works, step through it in the debugger
                // while monitoring the results in the database with SSMS or similar.
            }
        }
    }

    // Final commit.
    ttsCommit;
}

The trick is in the ttsCommit/ttsBegin pair in the middle of that code.

If the job is terminated part way through, all the work done up to the last ttsCommit will have been committed to the database and will not have to be done again.  In the example, I have included a where-clause that excludes any records that have already been updated, so the job simply picks up where it left off when it is restarted.
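If you want to check how much work remains before starting (or restarting) the job, one quick way is to count the records the where-clause would still pick up.  Below is a minimal sketch, reusing the hypothetical EB_TransactionTable and FieldToUpdate names from the example above:

static void EB_CountRemainingRecords(Args _args)
{
    EB_TransactionTable trans;

    // Count only the records the update job has not yet touched.
    // This where-clause must match the one used in the update job.
    select count(RecId) from trans
        where trans.FieldToUpdate == "";

    // With an aggregate select, the count ends up in the aggregated field.
    info(strFmt("%1 records still to be updated", trans.RecId));
}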

No XPO this time, as you should be able to just copy/paste and adapt the example to your needs.
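One adaptation worth considering: if you expect to experiment with the chunk size, pull it out into a local macro instead of hard-coding 500 in the mod test.  The little job below is just an illustration of the mechanism (the names are made up); in the real job, the info() call is where the ttsCommit/ttsBegin pair would sit.

static void EB_ChunkSizeMacroDemo(Args _args)
{
    int i;

    // Define the chunk size once so it is easy to tune.
    #define.ChunkSize(500)

    for (i = 1; i <= 2000; i++)
    {
        // Same test as in the update job, but with the macro instead of a literal.
        if ((i mod #ChunkSize) == 0)
        {
            info(strFmt("Checkpoint at record %1", i));
        }
    }
}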