Improve Performance of Query with Date Conditions Using Custom Indexed.

Have you ever met the error “APEX CPU time limit exceeded”, “Query Timeout” when querying just 1000 records? If your data is only a few thousand, you will not experiences the slowness. This is a common issue when your object has a massive set of data hundreds of thousands or millions. To improve the query performance in Salesforce, we usually use a selective SOQL with indexed fields / Custom indexed Fields. There are many situations and strategies to apply these indexed fields. I only discuss the use of custom indexed (External ID) to replace Date in query conditions.

Let start with creating a Performance Test Object with 1 Date Field and 10 Text Fields:

To demonstrate how the custom index help to improve the query performance, let’s generate 1 million records with randoms texts and Date.

//A batch to create 1000 records for each Start Date with random texts. 
public class GeneratePerformanceRecordBatch implements Database.Batchable<Date> {
    
    list<Date> processDate;
    public GeneratePerformanceRecordBatch(List<Date> lstDate){
        processDate = lstDate;
    }
    
    public Iterable<Date> start(Database.BatchableContext BC){
        return processDate;
    }
    
    public void execute (Database.BatchableContext BC, List<Date> scope){
        List<Performance_Test__c> lst1000 = new List<Performance_Test__c>();
        for(Date d : Scope){
            For(integer i = 0; i < 1000; i++){
                Performance_Test__c pt = new Performance_Test__c();
                pt.Start_Date__c = d;
                pt.Test_1__c = RandomString();
                pt.Test_2__c = RandomString();
                pt.Test_3__c = RandomString();
                pt.Test_4__c = RandomString();
                pt.Test_5__c = RandomString();
                pt.Test_6__c = RandomString();
                pt.Test_7__c = RandomString();
                pt.Test_8__c = RandomString();
                pt.Test_9__c = RandomString();
                pt.Test_10__c = RandomString();
                lst1000.add(pt);
            }
        }
        
        insert lst1000;
    }
    
    public void finish (Database.BatchableContext BC){
        
    }
    
    private string RandomString(){
        integer listLength = Constants.RanromStrings.Size() - 1;
        Integer randomNumber = Integer.valueOf(Math.random() * listLength);
        return Constants.RanromStrings[randomNumber];
    }
    
}

Once we have 1,000,000 records, let’s try to measure the performance by querying data for one date.

Decimal sTime = System.Now().GetTime();
Date d = Date.newInstance(2021, 07, 22);
List<Performance_Test__c> pts = [select Id, Test_1__c, Test_2__c, Test_3__c, Test_4__c, Start_Date__c from Performance_Test__c where Start_Date__c = :d];
Decimal fTime = System.Now().GetTime();

System.debug('Time To Query : ' +  (fTime - sTime) / 1000);

The result is around 1.6 seconds for 1000 records.

Then we generated 1 million records more; the result is around 3.5 seconds. So with the query alone took 3.5 seconds. If we have more logic in the business layer, It will reach the 10 seconds limit quickly.

What happens after we populate custom index for Start_Date__c field? Salesforce does not allow to use Date as external Id; however, we can use a text field or a number field. We can create either Number/Text field with format yyyyMMdd. I use the number because It easy to compare. For example, 20200723 (23/07/2020) is less than 20200823 (23/08/2020).

We can populate the index fields using a simple Batch process which can process up to 50 million records. We can keep the indexed field by creating Insert/update trigger to keep the indexed field up to date.

//Sample Batch Code to Generate a Number field as Date index with format yyyyMMdd
public class GenerateIndexBatch implements Database.Batchable<SObject> {
    
    string query;
    public GenerateIndexBatch(){
    }
    
    public Database.QueryLocator start(Database.BatchableContext BC){
        query = 'select id,Start_Date__c from Performance_Test__c';
        return Database.getQueryLocator(query);
    }
    
    public void execute (Database.BatchableContext BC, List<Performance_Test__c> scope){
        List<Performance_Test__c> updateIndex = new List<Performance_Test__c>();
        for(Performance_Test__c pt : Scope){
            DateTime dt = DateTime.newInstanceGmt(pt.Start_Date__c, Time.newInstance(12	, 0, 0, 0));
            Performance_Test__c p = new Performance_Test__c();
            p.id = pt.id;
            p.Date_Index__c = Decimal.valueOf(dt.formatGmt('yyyyMMdd'));
            updateIndex.add(p);
        }
        
        update updateIndex;
    }
    
    public void finish (Database.BatchableContext BC){
        
    }

}

It will take for a while to run through 2 million records.

What is the result when we change to query using the indexed field?

//Rather than using Start_Date__c =, we use Date_Index__c = a number now.
Decimal sTime = System.Now().GetTime();
List<Performance_Test__c> pts = [select Id, Test_1__c, Test_2__c, Test_3__c, Test_4__c, Start_Date__c from Performance_Test__c where Date_Index__c = 20200723];
Decimal fTime = System.Now().GetTime();
System.debug('Time To Query : ' +  (fTime - sTime) / 1000);

Compare to:

Let try another query with in operation.

We can see a dramatically improvement in term of Time of Query.

No Comments Yet.

Leave a comment