Thursday, June 28, 2012

MSSql Alerts - More than just notifications


Monitoring MSSQL server environments can be a tedious and demanding task especially when you are dealing with various versions across multiple servers. It is impossible to dedicate resources to manually monitor each instance however MSSQL does include out of box functionality to assist called MSSQL Alerts which is also available even in older versions of MSSQL such as MSSQL server 2000. MSSQL alerts can significantly assist in letting you know when a failure is imminent or has just occurred. That is just the start though, various other event driven functionality that you would like to implement can also be achieved by the different responses available in MSSQL Alerts. Alerts can also be manipulated to run when and how you would like them to run and furthermore the data that they provide can actually be collected and used for reporting.

MSSQL alerts are implemented via the MSSQL agent hence the agent has to be running in order for your alerts to fire. There’s a vast number or alerts available and these are listed under three main categories which may drill down even further.

  • SQL Server events - MSSQL has a set of message/error codes which are assigned a severity level, So multiple error numbers may have the same severity.  These alerts fire for a specific error number or can fire on a certain severity level which can encompass multiple error codes.  For a list of message/error codes, check out the sys.messages system table.
  • Performance conditions – Performance condition alerts assist in maintaining a smooth running MSSQL server and can fire when the threshold of a certain performance counter has been breached. So for example if you server CPU usage exceeds 80% an alert can be set to fire and notify you.
  • Windows Management Instrumentation (WMI) events – These alerts are set to fire with the use of the WMI monitor for a specific MSSQL server event. WMI allows for local and remote monitoring of servers and contain many MSSQL server events. WMI queries can be used to define alerts. An example of what you can alert on with WMI Alerts would be auditing failed logins.


Sql Alerts can be located under the MSSQL Agent in SSMS. Simply expand your SQL Agent and right click your alerts folder to create a new alert. It is possible to restrict your alert to a specific database and further restrict the alert to only fire if the message contains a certain phrase.




Whilst setting up your alert you have the option of setting the action to be taken when the alert is triggered. This can be set to notify an operator via email, pager or net send. Make sure DB Mail is set up and ticked in your MSSQL agent properties in order to receive mails. There is also an option to run a job, this opens up a world of possibilities and can be very useful as this can be used to resolve a problem without any human interaction like above. The jobs that are run can also be used to capture additional information that can be used to discover why the problem occurred. Notifications and job execution may both be ticked.




Lastly Comments can also be included into your alert. These comments will be included in the delivery method that you chose.


Manipulation

Alerts can be enabled and disabled via Sql queries. This can be very useful if you would like to only have an alert firing during a certain period of time or if a certain business rule is met. For example if you wanted your replication latency alert to fire only during business hours or when a job that inserts large amounts of data into the system is running, one could make use of a job to enable the Alert on a specific schedule or add steps to the beginning and end of a job that you would like to monitor which then contain the sp_update_alert system stored procedure eg

EXEC msdb.dbo.sp_update_alert @name=N'Replication Warning: Transactional replication latency (Threshold: latency)',
                        @enabled=1--<Bit to enable (1) or disable (0)

It is also possible to intentionally create an error which will trigger an alert by making use of the Raiserror() function. This can be useful if you would like to trigger an alert within scripts or jobs. Alerts can thus be further manipulated to include business rules and scheduling on an error level with dynamic comments. Here is an example of a custom Replication latency alert script which I wrote to get around a problem of a replication latency error not being raised in Certain Versions of MSSQL by using the Raiserror() function. Once the error is created the alert picks it up and acts accordingly.

declare @Latency varchar(8)
set @Latency = '00:00:10'

declare @LatencySec int
set @LatencySec = (datepart(hour,@Latency)/3600) + (datepart(minute,@Latency)/60) + (datepart(second,@Latency))

declare @TracerID int
exec sp_posttracertoken @publication = 'TestPublication',@tracer_token_id = @TracerID OUTPUT;

WAITFOR DELAY @Latency

declare @TokenTable table(
distributor_latency     int,
subscriber varchar(100),     
subscriber_db  varchar(100), 
subscriber_latency int,
overall_latency int)

Insert into @TokenTable
exec sp_helptracertokenhistory 'TestPublication',@TracerID

if (select isnull(overall_latency,@LatencySec) from @TokenTable) >= @LatencySec
            begin
            --Further rules can be added before error is raised
            raiserror (14161,10,1, '0', @Latency, 'Custom Comment?')
            end

select *,@TracerID from @TokenTable      
           
exec sp_deletetracertokenhistory 'TestPublication',@TracerID

exec sp_helptracertokens 'TestPublication'


Reporting

A great side effect of all this data being relayed to the operator is that one can use it for more than just problem solving but also as a base to report off and discover trends in your database server’s health. We can achieve this by different methods, Here are a few.
  • Errors are recorded in the MSSQL error log which can be read using the command  “xp_readerrorlog 0,1,'Error Number'” however lower severity errors have to be enabled to be written to the error log. The sp_altermessage can be used to alter message attributes in order to always write it to the error log. Results from each day can then be written to a table and reported off.
  • Alerts could call a job or a step can be added into the current job that the alert calls to write an entry to a history table which is being reported off.
  • Alert information is stored in the [msdb].[dbo].[sysalerts] which can also be used for reporting however the table only contains a single line entry per alert. In order to create a historical data table we have to poll this table periodically. Eg,


Table to store polled data.


CREATE TABLE [dbo].[AlertHistory](
   [ID] [int] IDENTITY(1,1) NOT NULL,
   [AlertID] [int] NULL,
   [name] [sysname] NULL,
   [event_source] [nvarchar](100) NULL,
   [event_category_id] [int] NULL,
   [event_id] [int] NULL,
   [message_id] [int] NULL,
   [severity] [int] NULL,
   [enabled] [tinyint] NULL,
   [delay_between_responses] [int] NULL,
   [last_occurrence_date] [int] NULL,
   [last_occurrence_time] [int] NULL,
   [last_response_date] [int] NULL,
   [last_response_time] [int] NULL,
   [notification_message] [nvarchar](512) NULL,
   [include_event_description] [tinyint] NULL,
   [database_name] [nvarchar](512) NULL,
   [event_description_keyword] [nvarchar](100) NULL,
   [occurrence_count] [int] NULL,
   [count_reset_date] [int] NULL,
   [count_reset_time] [int] NULL,
   [job_id] [uniqueidentifier] NULL,
   [has_notification] [int] NULL,
   [flags] [int] NULL,
   [performance_condition] [nvarchar](512) NULL,
   [category_id] [int] NULL,
 CONSTRAINT [PK_AlertHistory] PRIMARY KEY CLUSTERED
(
   [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_AlertIDLastODLastOT] ON [dbo].[AlertHistory]
(
   [AlertID] ASC,
   [last_occurrence_date] ASC,
   [last_occurrence_time] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO


Query to be placed in a job and run periodically depending on the data precision required

insert into AlertHistory
select * from sysalerts sa (nolock)
where not exists (
select *
from AlertHistory ah (nolock)
where ah.AlertID = sa.id
         and ah.last_occurrence_date = sa.last_occurrence_date
         and ah.last_occurrence_time = sa.last_occurrence_time
)

Once you have collected an adequate amount of data you can plug it into SSRS, Excel or whatever reporting mechanism you may fancy and create reports to analyse the data. Eg




Sql server alerts are a great way to let you know of problems when they occur so that you can get onto fixing and solving them before they turn into a disaster and with the added benefits of job runs as responses, alert manipulation and reporting/trend analysis one can definitely stay one step ahead…

Remember “Intellectuals solve problems, geniuses prevent them” – Albert Einstein

No comments:

Post a Comment