Application Down after Monitoring Alerts Implementation

I implemented an alert on my Compute Engine instance, and suddenly the application on that instance went down. I could not even SSH into it to check the issue at that moment.

Here are the symptoms:

1. Logs suddenly stopped after some minutes.

2. CPU and memory metrics stopped sending data. There was no spike in the metrics before they stopped.

3. The application on that VM stopped running.

4. Cannot SSH into the instance.

5. VM state: Running.

Here's the error logged before the logs stopped:

logs/monitoring.googleapis.com%2FViolationAutoResolveEventv1

From that, I assume the alert implementation is the cause.
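For reference, the log name above is URL-encoded (`%2F` stands for `/`). A minimal Python sketch that decodes it with the standard library; the variable names are only illustrative:

```python
from urllib.parse import unquote

# Log name exactly as it appears above (URL-encoded).
encoded_log_name = "logs/monitoring.googleapis.com%2FViolationAutoResolveEventv1"

# %2F is the percent-encoding of "/", so decoding yields the readable log ID.
decoded_log_name = unquote(encoded_log_name)
print(decoded_log_name)
```

The decoded ID sits under `monitoring.googleapis.com`, which suggests the entry belongs to Cloud Monitoring rather than to the application. The name hints at an alert incident resolving automatically, but that reading is an assumption.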

What I have done so far: restarted the instance and the service.

But I want to know what the problem is, because I want to implement alerts in the future without affecting my application.

 


Hello @bruleey, welcome to the Google Cloud Community.


@bruleey wrote:

I have implemented alert


What kind of alert? Which metrics did you use for it? This matters because creating an alert cannot break a VM by itself, especially if you are not using agent metrics: all metrics that do not come from an agent installed on the VM are collected from the hypervisor.
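To make the distinction concrete, here is a rough Python sketch that sorts a metric type by its prefix. The function name `metric_source` is hypothetical, and the prefix rule is a heuristic I am assuming, not an official classification: `agent.googleapis.com/` for metrics reported by the Ops Agent inside the VM, and `compute.googleapis.com/instance/` for instance metrics collected from outside the guest.

```python
def metric_source(metric_type: str) -> str:
    """Roughly classify a Cloud Monitoring metric type by its prefix (heuristic)."""
    if metric_type.startswith("agent.googleapis.com/"):
        # Reported by the Ops Agent running inside the VM.
        return "agent"
    if metric_type.startswith("compute.googleapis.com/instance/"):
        # Instance metrics collected from outside the guest OS.
        return "hypervisor"
    # Anything else is not covered by this heuristic.
    return "other"


print(metric_source("agent.googleapis.com/memory/percent_used"))
print(metric_source("compute.googleapis.com/instance/cpu/utilization"))
```

Only a metric in the first group depends on software running inside the VM.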
--
cheers,
DamianS

 

Hi @DamianS, thank you for asking.

This is the alert policy I've implemented:

Policy name: VM Instance - High Memory Utilization

Conditions: Policy violates when ANY condition is met

Severity: WARNING

Triggers when: Any time series crosses the threshold

Threshold: Above 70%

Retest window: 5 min

Note: I also have the Ops Agent installed. I created the policy based on a 'recommended alert' from GCP and only changed the threshold.
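For readers who want to see these settings in one place, below is a hedged Python sketch of roughly how they might map onto a Cloud Monitoring AlertPolicy body. The policy name, severity, threshold, and 5-minute window come from the list above; the filter, the metric type `agent.googleapis.com/memory/percent_used`, the `state` label, the condition display name, and the aggregation settings are assumptions, since the post does not show them.

```python
import json

# Sketch of the policy as an AlertPolicy-style dictionary.
# ASSUMED (not stated in the post): filter, metric type, "state" label,
# condition displayName, and the aggregation settings.
alert_policy = {
    "displayName": "VM Instance - High Memory Utilization",
    "combiner": "OR",            # policy violates when ANY condition is met
    "severity": "WARNING",
    "conditions": [
        {
            "displayName": "Memory utilization above 70%",  # assumed label
            "conditionThreshold": {
                "filter": (
                    'resource.type = "gce_instance" AND '
                    'metric.type = "agent.googleapis.com/memory/percent_used" AND '
                    'metric.labels.state = "used"'
                ),
                "comparison": "COMPARISON_GT",   # "above" the threshold
                "thresholdValue": 70,            # percent
                "duration": "300s",              # retest window of 5 min
                "trigger": {"count": 1},         # any time series crosses
                "aggregations": [
                    {"alignmentPeriod": "300s", "perSeriesAligner": "ALIGN_MEAN"}
                ],
            },
        }
    ],
}

# Print the body as JSON for inspection.
print(json.dumps(alert_policy, indent=2))
```

Nothing in this body runs on the VM itself; the only in-guest dependency is the agent that reports the memory metric.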

So based on that, I'm assuming that you are using an agent-based metric. See screenshot. If so, there is a possibility (as always with any kind of agent or tool) that the agent caused the VM to freeze. I have seen similar behavior on Azure and once on Google Cloud.
It might not be your case, as this happens rarely, but it does happen. What fits this situation better, IMHO, is that some other app, agent, or service, or the OS itself, caused the VM to freeze.

BTW, which OS version are you using? Linux or Windows?

DamianS_0-1715860163444.png

--
cheers,
DamianS