The Art of CRM Batch Jobs

The more CRM projects I do, the more fan I become of the scheduled batch jobs. They are not a silver bullet solution, but in many scenarios they have many benefits over more traditional code customization for CRM, like form scripting, plug-ins and workflows. Of course, the different ways of customizing a CRM or Dynamics 365 solution differs a lot on what you want to accomplish, and I see all of the above as important tools in any CRM dev’s tool chest.

You will also often run into cases where more than one of your tools will be valid solutions to your problems, and that is when you should stop, and consider the pros and cons of each tool available to you, rather than just selecting the hammer time after time. In situations like this, I lately find myself going for a batch job approach more and more often. Why that is I will try to explain in this post, but the key factors for me (if implemented right) are:

  • Scalability
  • Performance
  • Robustness

The typical batch job scenario

Unlike plug-ins, batch jobs are typically used for data processing that are not event driven, involving a large amount of records to be processed in a short amount of time. These are typically integration scenarios, or recalculation jobs. Most often, the data will be transactional.

Another typical scenario is that some logic should be executed on a record on a certain date, or when a certain condition is met, without any changes being made on that record at that time. For example, creating a reminder task for you to remind you of each contact you have who’s birthday it is today.

Plugins/workflows vs. Batch Jobs

There are a couple of caveats when working with plug-ins and workflows in CRM. Both types of customizations are built to be able to add some processing logic to save events on single records in CRM. They are not meant to be used for long running calculations or integrations, affecting many records in the CRM database. If you try to do this, you may run into the following challenges:

Both plug-ins and workflows have a 2 min timeout

Now, 2 min should be more than enough for anyone, but just think about it. You might have a business critical application running a plugin that for some reason runs very slow, and suddenly it just stops! Well written plug-ins or workflows shouldn’t run more than a couple of seconds tops, and it you get longer execution times than that, you should consider if you made the right approach.

Plug-ins and workflows execute in transaction (DB-locking)

This means that if something goes wrong, the state of the affected records gets rolled back which is nice. However it also means that the SQL server will ensure that whatever data used as input for your logic, remains in the state you retrieved it, until the transaction is saved. This means that whichever records you retrieve within the plug-in/workflow, will be locked for read and write until the transaction is done. If you have plugins waiting for external resources (web service replies) running on multiple records, you very quickly get into deadlock scenarios which cause very bad performance.

NB: The read-lock can be bypassed, if you add the no-lock=’true’ hint to your fetchXml queries or QueryExpressions. This way you can read locked data, well knowing that the state of the data you read might be rolled back if a CRM transaction should fail. No-lock is not supported in Linq to CRM queries…

Waiting workflows as a scheduler

By using waiting workflows, it is possible to run some logic on a record at a specific date, or when a condition is met. However, in a transactional setting, waiting workflows are really bad as they fill up the asyncOperationBase table. CRM is not particularly efficient at handling a huge amount of unfinished workflows, and so this is something you should try to avoid.

Transactions in Batch jobs

Ever since CRM 2015, it has been possible to execute multiple request against CRM in a single transaction, using the ExcecuteTransactionRequest. This is a powerful tool for making robust batch jobs. The biggest advantage is that you can retrieve all the data you want without locking any rows in the database, and still get the roll-back functionality when storing the changes in a very quick operation (short locking).

Just remember that if you really need to lock the data you fetch, until the transaction is finished, you need to use a plug-in or workflow. The CRM SDK does not offer any mean of making DB locks on reads.

Also, the ExcecuteTransactionRequest allows you to stack multiple requests together in one request, but does not enable you to insert logic on the result of on request, before moving to the next. This is typically a drawback when creating linked records, and you need the id of the first created record before you can link nr. 2 to it. The workaround is to assign the id’s in code, before calling create. If you want to do this, you should generate sequential guids using this guide. Otherwise, CRM will not be able to sort the records in the created order without an explicit ordering applied. Be aware that the generation of the sequential guids use the servers MAC address, and since the CRM’s internal guids are created on the SQL server, the guids you generate will in most cases not be in sequence with records created manually in CRM.

Best practices for developing batch jobs

Re-use your batch logic for user requests

I prefer to use batch jobs to process data in bulk on a schedule, but at the same time allow a user to trigger the same logic on individual records manually. This can help leverage the disadvantage of scheduled jobs, that data is updated with a significate delay (ie. Hourly or nightly). I typically do this by implementing the core of the processing logic in a workflow assembly, that is then referenced from the batch job application. The logic that is being run on an individual record, can then be exposed to a CRM user as a synchronous ad-hoc workflow or a custom action. Both are implemented in practically the same way, and can be run from a command bar (ribbon) button. However, if you want to make a button to trigger the logic, I prefer to use a custom action, as you would not have to hard code the workflow id in JavaScript.

Cache non-transactional data

Caching is really efficient in batch jobs, because by nature of the job, many of the records being processed by the job will require the same input information. Therefore, caching should be applied for all base and configuration data used by the job. I use MemoryCache.Default in .Net, and for large configuration tables I cache individual entries, while for smaller tables (less than 200 records?), I often cache the entire dataset. Just make sure not to cache data that changes on a regular basis.

Transactional data (salesorders, opportunities) should never be cached, as it changes all the time, and the amount of data you would end up caching would be too much. In most cases, you would not get a performance benefit for caching transactional data anyways.

NB: If you are re-using your processing logic for custom workflow activities, remember that custom types needs to be serialized before caching (See earlier post).

NBB: MemoryCache is not working in CRM Online plugins and workflows, due to plugins running in isolation, and the sandbox process is being restarted every few seconds! If you have exposed you logic to ad hoc user requests in a CRM Online environment, your code will not fail, but you should not expect running workflows to benefit from caching. I you are in an online scenario, you should make sure not to fetch more data than necessary instead.

Use pagination for retrieving data

Make sure that your bach job logic is scalable, by making sure the job won’t crash with timeouts or OutOfMemoryExceptions. Do this by using paging in all large queries, and make sure that the processing of each row is independent of the other rows in the retrieved page set. Also, for the paging to make sense, don’t keep the retrieved data in memory after you are done processing it.

Save the processing state to the record itself, and split your processing into several independent stages

Rather than creating processing jobs that are long and complex with many dependencies that can make the entire processing fail, try to split the entire process up into smaller steps that are independent of each other. To accomplish this, you should update the records you are working on with the current processing status/stage along the way, so the processing can continue in another run, without starting over from scratch. In order to truly be able to re-process any individual record from any state/stage, you need to write code where processing of one record is completely independent from other records, even if the records are related by nature. This is really useful when using pagination to retrieve the record set to work on, because you can never be sure that all the records related to each other is retrieved in the same page.

When the data you are working on is naturally related, and you cannot relate them in-memory because you have not loaded all the records in memory, you need to maintain the relationships on the database level. This is not as efficient as loading everything into memory, and doing all the processing in one go, but is scales a lot better. The approach also results in more robust solutions, as you can see the intermediate state of records that failed processing, rather than rolling a processing back to scratch. Having records with detailed error codes related to specific processing stages, makes error handling much easier. The chances that a user will actually look at an error and try to fix it (data errors), also increases drastically if the error message is inside CRM, rather than buried in a log file.

Consider an example:

Let’s say we have a invoicing batch job responsible for creating invoices and sending them to the ERP system.

The initial process, which is run daily, could be like the following:

  1. Query all salesorderdetail (orderlines), created the last 24 hours
  2. Create invoice with invoice lines for each salesorderdetail (orderlines)
    • The invoices are bundled by the related account
  3. When the invoices are created, they are sent to ERP through a web service, in a single call containing a list of invoices

The problems with this approach is the following:

  • If one salesorderdetail fails processing, nothing can be invoiced on that account.
  • If the processing logic is not built to expect errors, invoices will be left stranded in an error state, or an entire invoice might to be rolled back in case of errors (if saved in same transaction)
  • If the queried dataset has grown very large, the job might fail with timeout or OutOfMemoryExeptions.
    • Alternatively, you can use paging to avoid this, but then you will not end up with a single invoice per account, as individual lines on the same account might be fetched in separate pages

To improve on the initial design, we can do the following:

  1. Step – Create invoice lines:
    • Query all salesorderdetail (orderlines) in a specific state (i.e. No related invoice line, or invoice status == null)
      • Use paging to reduce the amount of memory used
    • For each salesorderdetail:
      • Create an invoice line
      • Do validation to ensure the record can and should be processed in the next step
      • Set the status on the invoice line (and salesorderdetail). This could also be a validation error code.
  2. Step – Bundling:
    • Query all invoice lines that are validated OK
      • Use paging to reduce the amount of memory used
    • Try to find an draft invoice in CRM
      • If one is found, add the invoice line to it
      • If not create a new invoice in draft state
  3. Step – Calc sums and Validate
    • Query all invoices in draft state
      • Use paging to reduce the amount of memory used
    • For each invoice:
      • Retrieve all invoice lines
      • Sum up amounts
    • Do final validation
    • Update status on invoice from draft to “ready to send”, or set an error code.
  4. Step – Send to ERP
    • Query all invoices in “ready to send”
    • Generate message and call ERP web service
    • Update status on all the invoices

Each of these steps can be run completely independently of each other. If an error occurs, or the processing stops, the processing can continue exactly where it stopped the next time. This is because the records in CRM contains the exact state of the processing, as opposed to keeping state in-memory.

By the way: These steps could be separate batch jobs running at different intervals, or they could be different stages in the same job. It is completely up to you. However, if you expose the job logic as workflows or custom actions to be called by a user manually, it would be a good idea to be able to manually force each step individually. That gives the most control for the user.

Retrieve first, and save last

If possible, retrieve all the un-cached data you need in as few queries as possible. Then process your logic in-memory. Finally, save all the changes using an ExcecuteTransactionRequest.

When retrieving the data for the job, see if you can get all the data you need by joining on the record set used as input for the processing. This will be the optimal solution in terms of performance. In general, when considering performance in batch jobs, you can ignore the in-memory processing and the initial fetch of records to work on. For any long running job, the only relevant performance factors are the fetches made while processing individual rows, along with saving the changes.

Thread safety

Based on the nature of the data you are processing, and the business criticality of the application, you might want to consider thread safety. What you want to avoid is having multiple simultaneous instances of the job running at the same time, processing the same data. This can especially be an issue if you use the strategy of fetching all the relevant data for the entire batch in one big query, and doing the rest of the processing in-memory. The processing can run over a long period of time, and without some locking mechanism, there can be a risk that the data changes without being reflected in the job, while it is running.

Even if you fetch the data for each individual record when it is being processed, and you check if the record still needs processing, there is a risk of double processing. This can happen if the processing is done in a web service layer that supports multiple simultaneous threads, and transaction data is being cached. What can happen in such a scenario, is that when you have two threads processing the same large dataset, the second thread will catch up on the first thread, because when all the required data is cached it can run much faster.

When threads are running in parallel I have seen a case where both threads check if the data should be processed, processes the data in both threads, and then updates the record’s status to prevent further processing. Imagine what this can do in an invoicing scenario!

However, in the most common cases, you don’t need thread safety in batch jobs, because you will have the job hosted in a way that only allows one concurrent thread. As long as a user doesn’t copy the job locally, or a developer decides to debug against production etc. 😉

If you think you need thread safety, I recommend two approaches:

  • Record level locking: If you would like to be able to run processing in multiple treads and you need to make 100 % sure that you have the current version of each record, you need the running thread to lock the individual records to be processed. You can do this by having the job call a web service, that performs the actual processing. In the web service application you can use System.Caching.MemoryCache to keep a list of records that are being processed, and that should be locked for other threads. Use the lock feature of C# whenever updating the list of locked records to avoid timing issues.
  • Global locking: If you don’t need the granularity above, you can implement a lock that makes sure that only one instance of the job can run at the same time, wherever it is being run from. Do this by creating a Lock entity in CRM, create a single record for each job type, and update this record to lock/unlock for processing. You might also want to have a new instance of the job unlock or take over an existing lock, as a failsafe, if the lock has been held for an abnormally long time (Previous job failed to release the lock).

Use impersonation

Plug-ins and workflows automatically use impersonation in the way that any create or update request will run in the context of the user that triggered the event.

When you move the logic out into a batch job (for performance reasons), you often want to have the same functionality. Therefore, consider impersonating the user set as owner of the main record used for processing input when making changes to that record. Impersonation is done by setting CallerId on the OrganizationServiceProxy object.

Deployment options

On-premise deployment

For an on-premise deployment, I usually go with a simple console application that is scheduled Windows Task Scheduler.

Some people tell me they prefer to implement the jobs as Windows Services because it should be more reliable and stable. However, it is not my experience that Windows services are more reliable than a Scheduled Task, so I prefer the Task Scheduler because it has so many scheduling options.

Consider the following when using the Task Scheduler:

  • The task needs to be set to execute whether the owning user is logged in or not.
  • If you need to run the task as a background task, set the owner to “SYSTEM”
    • NB: This will grant full system privileges to the task, so be careful. You also might not be able to open log files etc. generated on the file system.
    • You won’t see the console open when the task is running for background tasks
    • There might be a performance difference between running the task as a background or a foreground tasks. Server systems are typically configured to prioritize background tasks, while PC’s are configured to prioritize foreground tasks.
  • Ensure proper error handling. If the job fails completely, it needs to close itself down. Otherwise it will block further executions later.

For more complex scenarios, you could use Windows Task Scheduler in cooperation with a WCF service layer or WebAPI hosted on IIS.

Cloud deployment

The on-premise deployment is usually not very convenient if you are using Dynamics 365 Online. Instead you could use Azure WebJobs, which gives the same functionality as Scheduled Tasks, in an easy interface.

If you need a more complex set up, using web services, you can host a service application with Azure App Service Web App. You could also have a look at Azure Logic Apps which is really good for orchestration in between.