How does Grok determine what is anomalous?
Grok has a unique approach to reporting anomalies that is designed to cut down on the number of false positives. Grok first learns patterns in your data and builds models that predict what is likely to happen next in the metric data stream. Based on the internal predictions, the algorithms generate a raw anomaly score for each data point.
The raw anomaly scores are smoothed with a short moving average. The smoothed scores are then compared to a window of historical scores to determine how likely they are, given how predictable the stream has been in the past.
This allows Grok to determine the statistical likelihood that the system is behaving anomalously relative to the recent history of that particular data stream. Because this process normalizes the anomaly scores, Grok can confidently detect anomalies in both predictable and unpredictable data.
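The scoring pipeline described above can be sketched in Python. This is a minimal illustration in the spirit of the approach, not Grok's implementation: the smoothing window (10 scores), the history length (1,000 smoothed scores), and the Gaussian fit to that history are all assumptions chosen for the example.

```python
import math
from collections import deque
from statistics import mean, pstdev


def upper_tail(x: float, mu: float, sigma: float) -> float:
    """P(X > x) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


class AnomalyLikelihood:
    """Toy anomaly-likelihood estimator (assumed parameters, not Grok's)."""

    def __init__(self, smoothing: int = 10, history: int = 1000, min_sigma: float = 1e-6):
        self.recent = deque(maxlen=smoothing)   # raw scores for the moving average
        self.history = deque(maxlen=history)    # past smoothed scores
        self.min_sigma = min_sigma              # guards against a zero-width distribution

    def update(self, raw_score: float) -> float:
        """Add one raw anomaly score and return a likelihood in [0, 1]."""
        self.recent.append(raw_score)
        smoothed = mean(self.recent)            # short moving average of raw scores
        self.history.append(smoothed)
        mu = mean(self.history)
        sigma = max(pstdev(self.history), self.min_sigma)
        # A smoothed score far above its usual level has a tiny upper-tail
        # probability, so the returned likelihood approaches 1.
        return 1.0 - upper_tail(smoothed, mu, sigma)
```

Because each stream is judged against its own history, a noisy stream needs a much larger jump than a steady one to reach the same likelihood, which is the normalization described above.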
What should I do if Grok uncovers anomalous activity?
Each data point is assigned an anomaly score. An anomaly score indicates the severity of a particular anomaly event, but even the lowest-scoring events are worth further investigation. As a starting point, you could view the log files from the device in question to look for abnormal activity. The higher the anomaly score, the more likely it is that abnormal activity is occurring.
Just as with humans, symptoms observed in an environment are often the first indication of a problem that could develop in the future. The more metrics that show anomalous activity at the same time, the more likely it is that the device in question has an issue worth investigating.
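As a rough illustration of this triage guidance, the sketch below ranks devices by how many of their metrics are anomalous at the same moment, breaking ties by the highest score. The 0.9 threshold and the `(device, metric, score)` input format are hypothetical and are not part of Grok.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Reading = Tuple[str, str, float]  # (device, metric, anomaly score), hypothetical format


def rank_devices(readings: Iterable[Reading], threshold: float = 0.9) -> List[str]:
    """Order devices by number of concurrently anomalous metrics, then by peak score."""
    counts = defaultdict(int)
    peaks = defaultdict(float)
    for device, _metric, score in readings:
        if score >= threshold:  # threshold is an assumption, not a Grok default
            counts[device] += 1
            peaks[device] = max(peaks[device], score)
    return sorted(counts, key=lambda d: (counts[d], peaks[d]), reverse=True)


readings = [
    ("web-1", "cpu", 0.95), ("web-1", "disk", 0.92),
    ("web-2", "cpu", 0.97), ("db-1", "memory", 0.91), ("db-1", "cpu", 0.20),
]
print(rank_devices(readings))  # ['web-1', 'web-2', 'db-1']
```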
To help with investigation, you can use automation scripts to take action the moment Grok detects an anomaly. For example, if a server metric such as CPU Utilization shows anomalous data, you could trigger an automation that restarts the device in an attempt to mitigate the issue. You can also create an integration with a third-party tool: Grok can send anomaly scores to other tools, or it can take action within a tool.
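The trigger-and-act pattern in the CPU Utilization example can be sketched as a small hook that maps a metric to one or more actions. Everything here is hypothetical: the `AutomationHook` class, the 0.9 threshold, the `cpu_utilization` metric name, and the `restart_device` placeholder, which only returns a message instead of calling a real cloud API.

```python
from typing import Callable, Dict, List

Action = Callable[[str, float], str]  # (device, score) -> result message


class AutomationHook:
    """Runs registered actions when a metric's anomaly score crosses a threshold."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed cutoff, not a Grok setting
        self.actions: Dict[str, List[Action]] = {}

    def register(self, metric: str, action: Action) -> None:
        self.actions.setdefault(metric, []).append(action)

    def on_score(self, device: str, metric: str, score: float) -> List[str]:
        """Return the output of every action triggered by this score."""
        if score < self.threshold:
            return []
        return [action(device, score) for action in self.actions.get(metric, [])]


def restart_device(device: str, score: float) -> str:
    # Placeholder: a real action would run a script or call a cloud provider's API.
    return f"restart requested for {device} (score={score:.2f})"


hook = AutomationHook()
hook.register("cpu_utilization", restart_device)
print(hook.on_score("web-1", "cpu_utilization", 0.95))
```

Keeping actions as plain callables makes it easy to swap the placeholder for a script runner or an integration with a third-party tool.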
What is the probationary learning period?
Grok begins building models on the selected instances immediately. An instance is in ‘pending’ mode while its model is in the probationary learning phase, which generally lasts for the first one thousand records. During this phase, the model displays grey-scale bars.
The instance then moves to ‘active’, indicated by colored bars, when it is actively being monitored and can be used for accurate anomaly detection. The model continues learning beyond the probationary period for its entire life, adjusting and adapting to changes in your data.
When will I begin to receive notifications?
Notifications begin after the ‘probationary period’ has ended, that is, after the first thousand records have been received. We still display the anomalies found during the probationary period, but you will not receive notifications during that time.
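The probation rules can be summarized in a few lines of Python. This is a sketch only: it assumes a fixed 1,000-record probation (the actual length is only "generally" one thousand records), 0-based record indexes, and hypothetical function names.

```python
from typing import List, Tuple

PROBATION_RECORDS = 1000  # assumed fixed; the real probation is "generally" 1,000 records


def model_status(records_seen: int, probation: int = PROBATION_RECORDS) -> str:
    """'pending' (grey-scale bars) during probation, 'active' (colored bars) afterwards."""
    return "active" if records_seen >= probation else "pending"


def to_notify(anomalies: List[Tuple[int, float]],
              probation: int = PROBATION_RECORDS) -> List[Tuple[int, float]]:
    """Keep only anomalies whose 0-based record index falls after probation.

    Anomalies found earlier are still displayed; they just do not notify.
    """
    return [(index, score) for index, score in anomalies if index >= probation]


print(model_status(999), model_status(1000))  # pending active
print(to_notify([(10, 0.99), (999, 0.95), (1000, 0.97)]))  # [(1000, 0.97)]
```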
Why should I create an automation?
The automation feature gives operations teams a way to conduct initial troubleshooting for a service in which Grok has detected anomalous behavior. When a service failure occurs, operations teams usually carry out a series of initial steps to diagnose and resolve the issue, such as pulling error logs from the time of failure, restarting the server, or rerouting traffic through a load balancer. Many of these steps can be initiated with scripts and API calls to your cloud provider, and when Grok detects an issue, it can trigger such a script. This cycle of detection and action has the potential to save an ops team time so they can focus on building awesome software instead!
What is the difference between a step and a branch?
A step runs a script after the previous script has completed successfully with no errors, whereas a branch runs only if the script before it returned the specified conditional output. If your troubleshooting steps must occur in a specific order, add them as steps: each script will run only after the previous one succeeds. If a troubleshooting workflow needs to confirm a specific output from the previous script before moving forward, add a branch to check that output.
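The difference can be modeled with a sketch in which each node in a workflow is either a step (run if the previous script succeeded) or a branch (run only if the previous script's output matches an expected value). The `Node` and `run_workflow` names, the `(succeeded, output)` return convention, and the exact-match condition are assumptions for illustration, not Grok's automation format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Script = Callable[[], Tuple[bool, str]]  # returns (succeeded, output); assumed convention


@dataclass
class Node:
    name: str
    script: Script
    # None means a plain step; otherwise this is a branch that requires the
    # previous script to have produced exactly this output.
    expected_output: Optional[str] = None


def run_workflow(nodes: List[Node]) -> List[str]:
    """Run nodes in order and return the names of the nodes that actually ran."""
    ran: List[str] = []
    previous_output: Optional[str] = None
    for node in nodes:
        if node.expected_output is not None and previous_output != node.expected_output:
            break  # branch condition not met: stop the workflow here
        succeeded, output = node.script()
        ran.append(node.name)
        if not succeeded:
            break  # a failed script prevents any later step from running
        previous_output = output
    return ran


workflow = [
    Node("pull_logs", lambda: (True, "disk full")),
    Node("clear_disk", lambda: (True, "cleared"), expected_output="disk full"),
    Node("restart", lambda: (True, "ok")),
]
print(run_workflow(workflow))  # ['pull_logs', 'clear_disk', 'restart']
```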
Where can I view the output that resulted from a triggered automation script?
Script output and history can be viewed in two places: the automations list and the anomalies table. The automations list shows the full history in the context of each automation. The anomalies table shows any output an automation produced as the result of being triggered by a detected anomaly. The output can also be accessed via the Grok API.
Why am I not seeing any anomalies?
There are many reasons why this may occur. Some of the most common are:
- Anomalies are, by definition, abnormal. If your systems are functioning normally, you won’t see any anomalies. This is the most common reason.
- Grok learns and builds models from your data during its probationary learning period. It is possible that Grok has not yet received enough data to confidently detect anomalies.
- Grok has not detected any anomalies for that instance during the selected time period. Some data may look unusual, but if Grok has previously learned the pattern, it will not flag it as anomalous.
Try looking at the instance data over the last few days or weeks to see if similar patterns have previously occurred.
Why do similar metrics with similar patterns of data produce different anomalies?
Grok builds a single model per metric. The models for custom metrics estimate the value range of the data from the first few hundred records, and anomaly detection is based on patterns learned from past data. It is normal to see slight variations between models with similar data: even if two metrics' patterns look identical now, their past behavior was likely different, so the models make different predictions for new data.
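The effect of early range estimation can be illustrated with a small sketch. It assumes a simple min/max over the first `n_records` values (300 by default, standing in for "the first few hundred records") widened by a 10% margin. Grok's actual estimation method is not described here, so treat `estimate_range` as hypothetical.

```python
from typing import List, Tuple


def estimate_range(values: List[float], n_records: int = 300,
                   padding: float = 0.1) -> Tuple[float, float]:
    """Estimate a value range from the first n_records values, widened by a margin."""
    sample = values[:n_records]
    lo, hi = min(sample), max(sample)
    span = (hi - lo) or 1.0  # assumed fallback so constant data still gets a nonzero range
    return lo - padding * span, hi + padding * span


# Two metrics whose recent data is identical but whose early history differs.
recent = [25.0, 35.0, 30.0]
metric_a = [10.0, 20.0, 30.0, 40.0] + recent
metric_b = [10.0, 20.0, 30.0, 90.0] + recent
print(estimate_range(metric_a, n_records=4))
print(estimate_range(metric_b, n_records=4))
```

The same recent values sit at different positions within each estimated range, so two models can interpret identical new data differently, which is one way similar-looking metrics end up with different anomalies.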
I am having an issue that I can’t resolve. How do I contact support?
Please email Grok support at firstname.lastname@example.org