Primary Replica Jobs

2018-04-02 Wayne Sheffield 5 comments

This post talks about having primary replica jobs. That is, jobs that will only run on the primary replica of an availability group.

Overview

An Availability Group (AG) takes one or more databases, and copies all changes to those databases from the primary replica to all the secondary replicas. However, anything that is not part of those databases are not copied over. Microsoft documents all of these in this article. This includes jobs that run against a database in an AG. Such jobs might include index or statistics maintenance, partition switching, etc. In short, any job that modifies the database needs to run against the primary replica only.

If the AG fails over to another replica, you want these processes to just keep on running. This means that you need these jobs installed on all the replicas, in the same state (enabled) and with the same schedule. However, trying to do these actions will cause the job to fail on the secondary replicas, since it can’t write to the database. So we need a way so that the job always runs, but it only does the work if the current SQL Server instance is the primary replica for this AG. Synchronizing jobs, and the ability of those jobs to run, is beyond the scope of this article.

Creating primary replica jobs

Obviously, what we need to do is to create the job so that it is aware of whether it is running on the instance that is the primary replica, and only proceed if it is. This last part “only proceed if it is” gives us a clue of what we need to do. We need to put the check and the work into separate job steps, and only run the work if the check is successful.

Do you see the next clue here? “if the check is successful”. We need to run a check to determine whether this is a primary replica. If successful, run the next part of the job. If not, stop. While there isn’t a way to tell a job whether or not to continue from within a job step, there are a few ways that we can stop the job completely if we need to.

Stopping the job, method 1

The first method for how to stop a job is to just raise an error. This method requires that the job step be configured to stop the job successfully on a failure. The text of the job step is simply:

IF NOT EXISTS ( SELECT  1
            FROM    sys.dm_hadr_availability_group_states hags
            JOIN    sys.availability_groups ags ON hags.group_id = ags.group_id
            WHERE   hags.primary_replica = @@SERVERNAME
            AND     ags.name = 'MyAGName'
          )
BEGIN
    RAISERROR('Not the primary replica', 16, -1);
END;

IF NOT EXISTS ( SELECT 1

FROM sys.dm_hadr_availability_group_states hags

JOIN sys.availability_groups ags ON hags.group_id = ags.group_id

WHERE hags.primary_replica = @@SERVERNAME

AND ags.name = 'MyAGName'

)

BEGIN

RAISERROR('Not the primary replica', 16, -1);

END;

And the job step’s properties will look like this:

Job Step configuration

When the job runs, if it’s not the primary replica for the specified AG, then the job step fails. Because the job step does not fail the job, the job is successful. If you view the job history, it will look like:

Job history

What I like about this method is the simplicity and that everything is in the one job. What I don’t like is seeing yellow and red when viewing the job history. When it does run on the primary replica, the job history will be green, so you can tell when it was the primary replica.

Stopping the job, method 2

A second way to stop the job is to execute the msdb.dbo.sp_stop_job stored procedure. This requires you to know either the job name or the job id. While we could hard code this, I like to make things dynamic when possible.

There is a way to know the job_id of the job dynamically. That is by using a SQL Agent token. Note that although the documentation says that JOBNAME and STEPNAME are valid, I (and others) cannot get those to work. However, the JOBID token does work. The job step will look like this:

IF NOT EXISTS ( SELECT  1
            FROM    sys.dm_hadr_availability_group_states hags
            JOIN    sys.availability_groups ags ON hags.group_id = ags.group_id
            WHERE   hags.primary_replica = @@SERVERNAME
            AND	    ags.name = 'MyAGName'
          )
BEGIN
    DECLARE @job_id UNIQUEIDENTIFIER;
    SET @job_id = CONVERT(UNIQUEIDENTIFIER, CONVERT(VARBINARY(34), $(ESCAPE_SQUOTE(JOBID)), 1));
    EXECUTE msdb.dbo.sp_stop_job @job_id = @job_id;
END;

IF NOT EXISTS ( SELECT 1

FROM sys.dm_hadr_availability_group_states hags

JOIN sys.availability_groups ags ON hags.group_id = ags.group_id

WHERE hags.primary_replica = @@SERVERNAME

AND ags.name = 'MyAGName'

)

BEGIN

DECLARE @job_id UNIQUEIDENTIFIER;

SET @job_id = CONVERT(UNIQUEIDENTIFIER, CONVERT(VARBINARY(34), $(ESCAPE_SQUOTE(JOBID)), 1));

EXECUTE msdb.dbo.sp_stop_job @job_id = @job_id;

END;

If this step stops the job, then the job history reports being cancelled. If you view the job history, it will look like:

Stopping job with sp_stop_job

What I like about this method is the simplicity and that everything is in the one job. What I don’t like is seeing red when viewing the job history. When this does run on the primary replica, the red will (should) be green, so you can tell when it was the primary replica. This method is slightly more complex than the other methods, but not enough to really make a difference.

Start another job

Another way to handle this is to have this job start another job. In this case, the job with the test has the schedule attached to it, and if it is the primary replica, it starts another job. The second job needs to be disabled and not have a schedule. The code for this method would look like:

IF EXISTS ( SELECT  1
            FROM    sys.dm_hadr_availability_group_states hags
            JOIN    sys.availability_groups ags ON hags.group_id = ags.group_id
            WHERE   hags.primary_replica = @@SERVERNAME
            AND     ags.name = 'MyAGName'
          )
BEGIN
    EXECUTE msdb.dbo.sp_start_job @job_name = N'Job to run';
END;

IF EXISTS ( SELECT 1

FROM sys.dm_hadr_availability_group_states hags

JOIN sys.availability_groups ags ON hags.group_id = ags.group_id

WHERE hags.primary_replica = @@SERVERNAME

AND ags.name = 'MyAGName'

)

BEGIN

EXECUTE msdb.dbo.sp_start_job @job_name = N'Job to run';

END;

What I like about this method is the simplicity and that the job history shows all green. What I don’t like is having twice as many jobs and having to hard-code the job to be run. Additionally, you can’t tell by looking at the job history when the job ran on the primary replica.

Programming Notes

In all three of these methods, the code references the AG name (MyAGName). If you only have one AG on the instance, then this line is not necessary. However, since you can have more than one AG on an instance, and each AG can have a different primary replica, I’m showing how to handle this.

Additionally, there is another way to determine if the current server is the primary replica, but this is on a database-by-database method. Just use the system scalar function sys.fn_hadr_is_primary_replica and pass in the database name to check.

Conclusion

In this post, I have shown three different ways to create primary replica jobs that only run if it is on the primary replica. The job is installed and enabled on all the replicas, and the work will only be performed on the primary replica. All three of these methods are viable methods of accomplishing this. You can use whichever method that you prefer. The order that I like to use these methods is 1): RAISERROR (simplicity), 2): Start another job (don’t like seeing red on the job history), then 3): using sp_stop_job

SQL AG, Availability Group, Job, Primary, Replica

#1 | Written by Rob Ashton about 8 years ago. Reply

I implemented a system similar to your first concept in a previous role. The only variations I had were that:
1) I wanted the same job to work whether the instance was an AG cluster or not, and in the case of a non-AG cluster, or a standalone instance, just lump straight to doing the process.
2) I needed to be able to run the job on versions older than SQL Server 2012, so the checks against the DMVs had to be separated into steps that were called only if the version was high enough to warrant them.

The job/step green/yellow/red issue vexed me as well, and the only way I saw to improve the situation was to switch over to checking the go/no go decisions and running the process(es) involved in the overall job step as a single PowerShell script. Of course, that wasn’t going to be simple since not all of the instances ran on Windows Server 2008 or higher… *cough* Windows Server 2003/SQL Server 2005 *cough**cough*

Loading...
#2 | Written by ikour about 8 years ago. Reply

what should the privileges of the login be which is being used to run job? Unless it is sysadmin its not giving the same results.

Loading...
- #3 | Written by Wayne Sheffield about 8 years ago. Reply
  
  Good question. sys.availability_groups requires the “VIEW ANY DEFINITION” permission, and sys.dm_hadr_availability_group_states requires the “VIEW SERVER STATE” permission.
  I typically configure jobs to be owned by “sa”, so they will run with sysadmin.
  
  Loading...
#4 | Written by Bill Fritz about 5 years ago. Reply

Thanks Wayne! Very nice article. The error produced causes our monitoring software to think the job failed even if our Job Step says to Quit Reporting Success. So I had to go another route.
I also like the ability to use code that doesn’t have to change i.e. Database Name, Listener Name, or even Job Name etc. So I really like what you provided to use a SQL Agent token.

This gives me a nice clean AG_Test Step 1 that can be added to any SQL Agent Job.

IF (SELECT primary_replica FROM sys.dm_hadr_availability_group_states) is NULL
BEGIN
PRINT ‘This is a not an AG member server. –Not an AG Member Server and Job Execution should continue.’
END
ELSE IF (SELECT primary_replica FROM sys.dm_hadr_availability_group_states) = @@Servername
BEGIN
PRINT ‘This is a AG Primary Replica. –Do Nothing. This is an AG Primary Replica and Job Execution should continue.’
END
ELSE IF (Select primary_replica FROM sys.dm_hadr_availability_group_states) @@Servername
BEGIN
PRINT ‘This is a AG Secondary Replica. –Stop Job. This is an AG Secondary Replica and we do not want Job Execution to continue on this Node.’
DECLARE @job_id UNIQUEIDENTIFIER;
SET @job_id = CONVERT(UNIQUEIDENTIFIER, CONVERT(VARBINARY(34), $(ESCAPE_SQUOTE(JOBID)), 1));
EXECUTE msdb.dbo.sp_stop_job @job_id = @job_id;
END

The result is that if the Server executing the SQL Agent Job is not an AG member server or is an AG Primary, it does nothing. If it is an AG member server NOT the Primary, it cancels the Job.
The Job is stopped and the log shows as cancelled. No more error thrown to cause “false” failure alerts.
Thanks for providing a great explanation and helping me think through this. Hope this Job Step is useful for others in the community.
Bill

Loading...
- #5 | Written by Wayne Sheffield about 5 years ago. Reply
  
  Hi Bill,
  I really like what you’ve done here. Looks great!
  
  Loading...

Wayne Sheffield

Recent Posts

Top Posts

Archives

Categories

Blogroll

Primary Replica Jobs

Overview

Creating primary replica jobs

Stopping the job, method 1

Stopping the job, method 2

Start another job

Programming Notes

Conclusion

Like this:

Related

Leave a Reply to let me know how you liked this postCancel reply

Wayne Sheffield

Recent Posts

Top Posts

Archives

Categories

Tags

Blogroll

Primary Replica Jobs

Overview

Creating primary replica jobs

Stopping the job, method 1

Stopping the job, method 2

Start another job

Programming Notes

Conclusion

Share this:

Like this:

Related

Leave a Reply to let me know how you liked this postCancel reply