Channel: THWACK: Popular Discussions - Server & Application Monitor

Restarting a service when stopped and monitored not working


Hi everybody

 

I am very new to SolarWinds and have just today been trying to monitor a Windows service and create an alert to restart it if it stops. It won't work.

 

I just got off the phone with a SolarWinds technician who was very good at answering any questions I had, but we were both unable to get this to work.

 

Environment:

Windows Server 2008 R2 Servers

2003 AD Domain

Latest SAM

Server names: VMEV01 + VMSOLAR01

 

On VMEV01 there is a local admin account called evadmin, and using this account to monitor the VM and its services works a treat.

 

ISSUE:

An alert that triggers when a monitored service is stopped won't start the service again. This is the command: APM\APMServiceControl.exe ${ComponentId}. Looking in the backend database, the SolarWinds engineer showed me that the command ran successfully, but for some reason the service on the server did not start. It is worth mentioning that his alert and my alert were 100% identical.

 

Any ideas? I have attached some screenshots.


Audit Active Directory using SAM


Can you audit Active Directory using SAM?

 

i.e. Account changes, logins etc.

Alert on Active Directory account lockout


I've set up an event log monitor to watch for event ID 4740, and that works properly. I'm trying to figure out how to get the username of the locked-out account into the message of the email alert. Ideas?
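One possible approach, sketched under assumptions: a Windows PowerShell Monitor could read the newest 4740 event itself and put the account name into its "Message:" line, which an alert email can then include. The sketch assumes the first event property of a 4740 record holds the locked-out account name; check this against a real event on your own domain controller before relying on it. The function name Get-LockedOutAccount is hypothetical.

```powershell
# Sketch only: pull the locked-out account name from a 4740 event record.
function Get-LockedOutAccount {
    param(
        [Parameter(Mandatory = $true)]
        $EventRecord   # any object with a .Properties collection, e.g. from Get-WinEvent
    )
    # Assumption: property 0 of event 4740 is the target (locked-out) account name.
    return [string]$EventRecord.Properties[0].Value
}

# Example use inside the monitor script, run against a domain controller:
# $evt = Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4740 } -MaxEvents 1
# Write-Host "Message: Account locked out: $(Get-LockedOutAccount $evt)"
# Write-Host "Statistic: 1"
```

Keeping the extraction in a small function makes it easy to swap in a different property index, or a regex over the event message text, if your events are laid out differently.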

High Number of DCOM communication errors when APM fails to connect...


Team,

Is there a relationship between DCOM (Event ID 10009 - DCOM was unable to communicate with the computer) errors and Orion APM being unable to connect to a host for monitoring? If so, is there a way to fix DCOM so that there are not so many retries and errors in the event console or is there a way to fix Orion APM so that it sends a "graceful" error to a log?

Have any of you seen this?  I raised an issue with SolarWinds support as well.

Thanks for any ideas.

Windows Service Monitor - Unexpected error occured. Invalid class


I have a host that shows the error message "Unexpected error occured. Invalid class" for all Windows Service Monitors (using the application template Windows 2003-2008 so DTC, Network Connections, Protected Storage and Remote Registry).

 

I've tried the wmiadap /f command and rebuilt the WMI repository, etc to no avail.

 

Has anyone else experienced this issue?

PowerShell Remoting by FQDN instead of IP


I'm trying to use the  Windows PowerShell Monitor component (actually as part of the "SolarWinds Web Performance Monitor (WPM) Player" template) in the Remote Host Execution Mode.  The component attempts to connect using an IP instead of an FQDN, so this error is generated:

 

PowerShell script error. Connecting to remote server 172.10.10.31 failed with the following error message : The WinRM client cannot process the request. Default authentication may be used with an IP address under the following conditions: the transport is HTTPS or the destination is in the TrustedHosts list, and explicit credentials are provided. Use winrm.cmd to configure TrustedHosts. Note that computers in the TrustedHosts list might not be authenticated. For more information on how to set TrustedHosts run the following command: winrm help config. For more information, see the about_Remote_Troubleshooting Help topic.

 

If I specify HTTPS instead, then this error is returned:

 

PowerShell script error. Connecting to remote server 172.10.10.31 failed with the following error message : The server certificate on the destination computer (172.10.10.31:5986) has the following errors: The SSL certificate contains a common name (CN) that does not match the hostname.

 

Is there some tricky way to tell the component monitor to connect via a fully qualified domain name instead, or is this a feature request that needs to be made? I realize that I could go about modifying the TrustedHosts setting on all of my different pollers, but we'd prefer not to, and to be able to use HTTPS, as it's available for all of our connections.
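A workaround that might be worth trying, as a sketch under assumptions: run the component in Local Host execution mode and open the remote session from the script itself, resolving the node's IP to a host name first so the name can match the certificate CN. This assumes reverse DNS returns the same FQDN that is on the WinRM HTTPS certificate, and that ${IP} and ${CREDENTIAL} are expanded by SAM as in other script monitors. The helper name Resolve-NodeFqdn is hypothetical.

```powershell
# Sketch only: reverse-resolve an IP address to the host name DNS reports for it.
function Resolve-NodeFqdn {
    param([Parameter(Mandatory = $true)][string]$Address)
    # GetHostEntry throws if the address cannot be resolved.
    return [System.Net.Dns]::GetHostEntry($Address).HostName
}

# Example use in the monitor body (local execution mode):
# $fqdn   = Resolve-NodeFqdn '${IP}'
# $result = Invoke-Command -ComputerName $fqdn -UseSSL -Credential '${CREDENTIAL}' -ScriptBlock { hostname }
# Write-Host "Message: connected to $result"
# Write-Host "Statistic: 0"
```

If reverse DNS is unreliable in your environment, a node custom property holding the FQDN could replace the lookup.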

Any way to pull a polling engine status from SWQL?


I have a PowerShell script that balances the load between multiple polling servers on a regular basis, but I'd like it to exclude a polling server if the server is down or its engine status is down. I have created a SWQL query that gets the polling engines and checks for N.Status = 2, which I can use to exclude a polling server if it is actually down or the agent service is stopped. It doesn't exclude the polling engine if the SolarWinds services are stopped or the polling server is just not responding, though.

 

SWQL query is:

 

SELECT E.EngineID, E.ServerName, N.Status
FROM Orion.Engines E
INNER JOIN Orion.Nodes AS N ON E.IP = N.IP_Address
WHERE N.Status = 2

 

PowerShell is:

$Engines = Get-SwisData $swis 'Select E.EngineID,E.ServerName,N.Status From Orion.Engines E Inner Join Orion.Nodes as N ON E.IP = N.IP_Address Where N.Status = 2'

 

Is there a way to get the engine status like you get in Settings > All Settings > Polling Servers ?

 

Otherwise I'll have to use the Polling Completion or something as a proxy value for the polling engine status which is less exact.

 

Thanks

-Jim
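One idea, sketched under assumptions: if Orion.Engines exposes a last keep-alive timestamp (assumed here to be a property named KeepAlive, stored in UTC), the script could treat any engine whose keep-alive is older than a few minutes as unavailable. The function name Select-ResponsiveEngine and the five-minute default are hypothetical; verify the property name and its time zone in SWQL Studio first.

```powershell
# Sketch only: keep the engines whose last keep-alive is recent enough.
function Select-ResponsiveEngine {
    param(
        [Parameter(Mandatory = $true)][object[]]$Engines,
        [datetime]$Now = (Get-Date).ToUniversalTime(),
        [int]$MaxAgeMinutes = 5
    )
    # Assumption: each engine object carries a KeepAlive timestamp in UTC.
    $Engines | Where-Object { ($Now - [datetime]$_.KeepAlive).TotalMinutes -le $MaxAgeMinutes }
}

# Example use with the existing connection in $swis:
# $all    = Get-SwisData $swis 'SELECT EngineID, ServerName, KeepAlive FROM Orion.Engines'
# $usable = Select-ResponsiveEngine -Engines $all
```

Doing the age check in PowerShell rather than in the query keeps the threshold and the clock explicit, which makes the filter easy to test with made-up engine objects.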

Issues with solarwinds. Kinda a long read.


Has anyone had any recent issues with the new SolarWinds? I have done a migration from what was a relatively old but stable environment that just worked to a new and shiny environment that is not what I expected.

Though, now that I think about it, I mean there could be all kinds of reasons why things aren't working.  Some of these could include:

- Using more features.

- Building it into a retrospective real-life environment.

- Other external effects such as OS, network speed, storage devices, etc.

 

So, let me explain from the beginning, do a compare and contrast, and then an analysis.

 

Introduction.

We had an old, stable environment that was running SolarWinds 10.5, and then I did an upgrade to 11.5. It worked for its purpose (kind of), but I didn't like the way it was set up when I adopted it. I still don't, but that's another story.

It operated like the following:

- Nodes were added to solarwinds and then an alert was made for that node(s) eg - alert for customer x.

- Nodes were added to SAM on a per-node basis.

- pictures were used...I hate pictures. -.-

- Add-ons such as NCM and WPM were forgotten and left on separate servers with their own databases

- Network data/information was left as is to rot.

- Reporting was out of date

- Alerts were ignored.

- nodes were left or never added

- Clusters and the resources were forgotten

 

Since I built a new environment, I wanted to make the following happen:

- I now run a regular weekly scan on our environment to ensure things are not ignored/forgotten

- I have included all of our failover environments, their resources and their respective IP addresses.

- Reporting now has a process and rather than fill up on reports we don't need, I include a 30 day policy that deletes old reports.

- I wanted to start a process of having everything in a group and then working from that group, e.g. groups would be used for application monitoring, for dependencies, for network mapping, to build our entire environment.

- From there I would build the alerting based on logic and grouping of resources/nodes - ergo making it easier to manage.

- I have a daily report to show me things that have broken over the last week/day.

- I am forcing people to acknowledge alerts and act on them.

- NCM, IPAM and other networking applications, WPM are now integrated into solarwinds - thus making it easier to manage

- I have no pictures because I hate pictures. (I know some people might enjoy them, which is fine, but I have not had any input on what people within my organisation wanted. The way I see it, if you do not give me any requirements, or come to any meeting I have arranged, then you are telling me everything is fine.)

- I am encouraging people to notify me when work is being carried out so I may do my admin in solarwinds

- I would like it to be part/integrated into service-now.  If possible for things such as audit tracking of nodes.

 

So, that is my goal in the end. But I have run into some weird things. These particular things are annoying me somewhat, and SolarWinds has still not come back to me with a fix/hot patch/update/other. Neither have I found a real solution on THWACK:

 

- SAM does not add groups to templates. It sometimes kind of works after restarting a service (I think) on the SolarWinds box and then messing around with it (I haven't done it for a few months, so I would need to refresh my memory).

- Custom charts are breaking the SolarWinds box. You create them and then they suddenly bottle out the server(s). I did a schema cache fix, and that has so far proven unsuccessful. I am still working on this fix, though this issue has been known about for ages.

- Sometimes things work and sometimes they don't. I know that everyone's environment is different and there are lots of variables, because although there are some universals within environments, there are also some huge differences. That's fine. But why do some hot fixes work for some people and not for others? Again, I know why; it's just a niggle that I have had recently.

 

So that's my beef.


DNS record monitoring


Is there a way to monitor an A record which has many values for any additions or deletions?

 

For example:

record.store.com A 1.1.1.1

record.store.com A 1.1.1.2

record.store.com A 1.1.1.3

record.store.com A 1.1.1.4

 

I would like to know when something is added to or removed from record.store.com.

(It seems simple to monitor for a removal of a record)

 

Thanks for your incoming ideas.
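One way this could be approached, as a sketch under assumptions: a Windows PowerShell Monitor resolves the name on each poll and compares the answers with a known-good list, reporting both additions and removals. The function name Compare-RecordSet is hypothetical, the baseline here is maintained by hand, and the lookup goes through the poller's own resolver (so caching, or IPv6 answers, may affect what it sees).

```powershell
# Sketch only: report which values were added to or removed from a record set.
function Compare-RecordSet {
    param(
        [string[]]$Baseline = @(),
        [string[]]$Current = @()
    )
    $added   = @($Current  | Where-Object { $Baseline -notcontains $_ })
    $removed = @($Baseline | Where-Object { $Current  -notcontains $_ })
    [pscustomobject]@{ Added = $added; Removed = $removed }
}

# Example use in the monitor script:
# $baseline = '1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.4'
# $current  = [System.Net.Dns]::GetHostAddresses('record.store.com') | ForEach-Object { $_.IPAddressToString }
# $diff     = Compare-RecordSet -Baseline $baseline -Current $current
# Write-Host "Statistic: $($diff.Added.Count + $diff.Removed.Count)"
# Write-Host "Message: added [$($diff.Added -join ', ')] removed [$($diff.Removed -join ', ')]"
```

With the statistic set to the number of differences, a warning threshold of "greater than 0" would flag any addition or removal.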

What could cause a monitoring Agent to be stuck in "Update in progress" ?


I have 13 Windows and Linux monitoring agents that have been stuck in "Update in progress" status since at least 9/30. I tried restarting the agent service on a Linux node and a Windows node, but the status did not change. I cannot find any log files that might tell me what the problem is, so I'd like to cancel the upgrade and have Solarwinds try again.

 

Any ideas on how to resolve the issue are welcome. I attached some screenshots from the Agent management screen to better communicate the issue.

 

2016-10-06 08_23_52-Manage Agents.png

 

2016-10-06 08_34_26-Manage Agents.png

At the time that I noticed the issue, we were running on RC2 of NPM, SAM, NTA, and NCM. I upgraded to the GA release versions last night.

Version banner:

SolarWinds Orion Platform 2016.2.100, WPM 2.2.1, IPAM 4.3.2, SRM 6.3.0, VNQM 4.2.4, NCM 7.5.1, NPM 12.0.1, DPA 10.2.0, QoE 2.2.0, NTA 4.2.1, IVIM 2.1.2, SAM 6.3.0, NetPath 1.0.1 © 1999-2016 SolarWinds Worldwide, LLC. All Rights Reserved

Problems charting statistic data via a custom chart?


We made a quick and dirty little PowerShell script that we are executing on all of our polling engines via a PowerShell component monitor.

 

$ScriptBlock = { ( netstat -anob ) | Where-Object { $_ -like '*svchost.exe*' } | Measure-Object | Select-Object -ExpandProperty Count}
$count = Invoke-Command -ComputerName '${Node.Caption}' -Credential '${CREDENTIAL}' -ScriptBlock $ScriptBlock
Write-Host "Statistic:" $count
Write-Host "Message: Total connections " $count

 

(A reminder to make sure your template is set to x64 and the component is set to local execution with impersonation!)

 

When we try to graph the statistic data, though, it fails. But it only fails for the new custom chart widget (the one that lets you dynamically shift the time period, slide the period focus, etc.). It works using the old-style Multiple Object Chart, but that chart has a limited set of data summation options.

 

Has anyone else run into the same or similar problems with graphing this data series?  I verified that we could see the data series via this SWQL query and it works just like any of the other data series.  (Yes, our component is called svchost.exe connection count)

 

SELECT
    cs.component.application.node.caption
    ,cs.ComponentStatisticData
    ,CONCAT('http:\\monitoring.cardinalhealth.net',cs.component.DetailsURL) as URL
FROM Orion.APM.CurrentStatistics cs
WHERE
    ComponentName = 'svchost.exe connection count'
ORDER BY cs.ComponentStatisticData DESC

AppInsight of SQL - SQL Agent Job Component


I am working with the AppInsight for SQL template and trying to adjust component thresholds to reflect our event management requirements. One item that is particularly frustrating is the SQL Agent Job component. No matter what I supply for the warning or critical state, it does not appear to follow the same rules as other components in its effect on overall instance state.

 

For example, Buffer Cache Hit Ratio. If you leave the warning and critical thresholds empty (not configured) that component will always show green/up status regardless of current value collected. This in turn rolls up into the overall instance state contributing green/up status to the pool of other component monitors.

 

In the case of SQL Agent Job Info, if you leave the warning and critical threshold values empty, or set a very high number for the warning and critical states on this component, then as soon as one job fails the component is identified as critical and the instance is then identified as critical too.

 

Is there a way around this issue, minus disabling the SQL Agent Job component on the AppInsight template?

WMI monitoring through firewalls and NAT routers?


Hi clever people...

Has anyone managed to solve the problem of monitoring WMI from APM when the APM poller sits on the other side of a firewall from the device being monitored? If I disable all rules and make it any:any, then it works, but if I have a firewall blocking all but the single WMI port, and have NATs in place, then I am scuppered.

Any help would be great!

SolarWinds, what I would ideally need is a remote poller for APM that sits on the remote side of the firewall and router and reports back into my set of ALX pollers in my management network...

Any help?


Agent Deployment Firewall Ports help needed.


Hello!

 

I am trying to deploy an agent to a server across our WAN.

 

Ports that I have opened are 135 and 443 to the server and 17778 back as per the documentation.

 

This keeps failing with "Credentials test failed. Path not found" 

 

Analysing the network, we find that it is trying to connect on ports 445, 139 and 137.

 

Are ports 445, 139 and 137 the ones I should be opening?

Are there any other ports I should be using?
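While waiting for an authoritative port list, a quick reachability check from the Orion server can at least show which TCP ports get through the WAN firewall. The sketch below is only that, a sketch: Test-TcpPort is a hypothetical helper, 'server.example.local' is a placeholder, and it covers TCP only (port 137 is normally NetBIOS name service over UDP, so it is not tested here). Port 17778 runs in the opposite direction and would need the same check from the target server towards Orion.

```powershell
# Sketch only: returns $true if a TCP connection to the port succeeds within the timeout.
function Test-TcpPort {
    param(
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [Parameter(Mandatory = $true)][int]$Port,
        [int]$TimeoutMs = 2000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $async = $client.BeginConnect($ComputerName, $Port, $null, $null)
        # Give up if the connection attempt does not finish in time.
        if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMs)) { return $false }
        $client.EndConnect($async)   # throws if the connection was refused
        return $true
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}

# Example: check the TCP ports mentioned above.
# foreach ($p in 135, 139, 443, 445) {
#     "{0}: {1}" -f $p, (Test-TcpPort -ComputerName 'server.example.local' -Port $p)
# }
```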

custom sam?


how customized is your SAM?

show a screenshot?


Linux Agent Deployment



 

 

Add Node Wizard - Push Deployment

 

Deploying the Linux Agent to an individual machine is as simple as adding the node to Orion via the Add Node Wizard. To begin, navigate to [Settings -> All Settings -> Add Node], enter the IP address or fully qualified host name of the Linux host you'd like managed in the "Polling Hostname or IP Address" field, and select the "Windows Servers: Agent" radio button from the available "Polling Method" options. Next, enter the credentials that will be used to both connect to the Linux host and install the agent software. The credentials provided here should have 'root' or equivalent level permissions. Note that the credentials provided here are used only for initial deployment of the agent. Future password changes of the account credentials provided here will have no impact on the agent once it is deployed.

 

The Agent is deployed to the Linux host using a combination of SSH and SFTP requiring TCP port 22 be open from the Orion server (or additional polling engine) to the Linux endpoint you wish to manage for push deployment to function properly.

 

Agent Deployment Add Node.png

 

Once credentials are provided, click "Next" at the bottom of the page. You will then be prompted to start the installation of the Linux Agent. Click "Start Install" and the progress indicator will appear. When finished, you will be taken through the rest of the Add Node Wizard flow, where you can select which resources you wish to monitor on the host.

 

Install Agent Prompt: Install Agent Software.png
Installing Agent Progress Indicator: Installing Agent Software.png
List Resources: List Resources.png

 

 

Manual - Pull Deployment

 

In some scenarios it may not be possible for the Orion server to push the agent to the Linux host over SSH. This is not uncommon when the host you wish to manage resides behind a NAT or is hosted in the cloud. While firewall policy changes, port forwarding, or one-to-one address translations could be made to facilitate push deployment of the agent, in many cases it may be far easier to perform a manual deployment of the agent to those hosts.

 

The Linux Agent can be downloaded from the Orion web interface to the Linux host by going to [Settings -> All Settings -> Agent Settings -> Download Agent Software], selecting "Linux" from the options provided, and clicking "Next". In the following step of the wizard, select "Manual Install" and click "Next". In the third and final step of the wizard, select the Linux distribution you will be installing the Agent on, as well as the bitness of the OS (32 or 64 bit). Here you can also configure any advanced options the agent will use when it is installed, such as which polling engine the Agent should be associated with in Agent Initiated (Active) mode, or the listening port the Agent will use when running in Server Initiated (Passive) mode.

 

Select Agent Type: Select Agent Type.png
Select Deployment Method: Agent Deployement Type.png
Choose Agent Settings & Distribution: Agent Distribution.png

 

After selecting all the appropriate configuration options, click the "Generate Command" button at the bottom of the page. This will generate a dynamic installation command based upon the settings chosen above, which can then be copied and pasted into an SSH or X-Windows session on the Linux host. The Linux machine will then download and install the appropriate agent software from the Orion server using those pre-configured options.

 

Copy Install Command.png

Paste the generated command into your Linux terminal session and press 'enter' to start the download and install process.

Console Session Install Agent.png

The command will begin downloading and then installing the Agent onto the machine. When complete, the agent service will automatically start on the Linux host, register with the Orion server, and become a managed node.

Agent Download and Install.png

Once registered, select your newly added agent node and click "Choose Resources" in the 'Manage Agents' view to select the items on the node you would like to monitor.

Choose Resources.png

 

Mass Deployment - Repository

 

With the introduction of the Linux Agent in SAM 6.3 Beta 3, the Orion server and Additional Web Servers are now also Linux repositories for the Agent. This method can be utilized for mass Agent deployment using automation and orchestration tools such as Puppet and Chef. It also means you can use the same native built-in  package management tools for the Agent, as you would any other package on the Linux operating system. Now you can install the Linux Agent the same way you might install or update BIND or Apache, using 'yum', 'apt-get', or 'zypper' depending upon your distribution. To utilize 'yum', 'apt-get', or 'zypper' for installing the agent you must first register the Orion repository with your operating system. To do so, navigate to [Settings -> All Settings -> Agent Settings -> Download Agent Software -> Linux] and select 'Install via Package Management Tool (e.g. yum, apt)'.

 

yum.png

Based upon the distribution selected in the drop down (1), a command line string is dynamically generated. Simply click the 'Copy' button (2), then paste this string into an SSH session for the Linux host you want to add the repository to and press 'Enter'. Once the repository is added to the host, you can use the appropriate 'yum', 'apt-get', or 'zypper' command to install the agent from the repository. Simply click the 'Copy' button (3) and once again paste this command into the SSH session to install the Agent. The Agent will then be downloaded from the Orion repository and installed on the host.

 

 

 

Select Distribution: Linux Repository.png
Add Repository to Host: SSH Add Repository.png
Install Agent (copy/paste command from step #3 above): yum install.png
Configure Agent (IP/hostname, username and password): Configure Agent.png

 

When installation is complete, all that remains is to configure the agent by typing 'service swiagentd init'. This will bring up the Linux Agent configuration settings where you will need to provide the IP, hostname, or FQDN of the Orion server or Additional Polling Engine (option #2). Then enter your Orion 'admin' or equivalent credentials used to login to the Orion web interface (Options 4 & 5). When complete, save your changes and exit (Option 7). This will then register the agent with the Orion server and begin managing it as a node.

 


Powershell Exit Code 1 = Get Output Failed


I am having a tough time getting my script to correctly report the exit code to SAM 6.1.1. I have read The Basics of PowerShell (part 3) and I am using the required Message, Statistic and Exit codes. My script telnets to a server (using a custom-built telnet function), grabs a payload and verifies that it's valid. I want 3 exit codes: 0 - Payload is confirmed (Up), 3 - Payload is invalid (Critical), 1 - Any error or failure to connect (Down). Exit codes 0 and 3 report correctly, but 1 will not.

 

$error.clear()
$ErrorActionPreference= 'silentlycontinue'

#This builds a custom telnet function from http://community.spiceworks.com/scripts/show/1887-get-telnet-telnet-to-a-device-and-issue-commands
Function Get-Telnet
{  Param (
        [Parameter(ValueFromPipeline=$true)]
        [String[]]$Commands = @(""),
        [string]$RemoteHost = "",
        [string]$Port = "",
        [int]$WaitTime = 1000,
        [string]$OutputPath = ""
    )
    #Attach to the remote device, setup streaming requirements
    $Socket = New-Object System.Net.Sockets.TcpClient($RemoteHost, $Port)
    If ($Socket)
    {  $Stream = $Socket.GetStream()
        $Writer = New-Object System.IO.StreamWriter($Stream)
        $Buffer = New-Object System.Byte[] 1024
        $Encoding = New-Object System.Text.AsciiEncoding

        #Now start issuing the commands
        ForEach ($Command in $Commands)
        {  $Writer.WriteLine($Command)
            $Writer.Flush()
            Start-Sleep -Milliseconds $WaitTime
        }
        #All commands issued, but since the last command is usually going to be
        #the longest let's wait a little longer for it to finish
        Start-Sleep -Milliseconds ($WaitTime * 4)
        $Result = ""
        #Save all the results
        While($Stream.DataAvailable)
        {  $Read = $Stream.Read($Buffer, 0, 1024)
            $Result += ($Encoding.GetString($Buffer, 0, $Read))
        }
    }
    Else
    {  $Result = "Unable to connect to host: $($RemoteHost):$Port"
    }
    #Done, now save the results to a file
    $Result | Out-File $OutputPath
}

#This clears the content of the output file so it's not mistakenly read from a previous telnet test
Clear-Content "F:\scripts\TelnetOutput.txt"

#This will telnet to the remote server and issue the command to check the payload. The output must be written to a file because get-telnet cmdlet sucks.
Get-Telnet -RemoteHost ${IP} -Port "3575" -Commands "Blah" -OutputPath "F:\scripts\TelnetOutput.txt"
$TelnetOutput = Get-Content "F:\scripts\TelnetOutput.txt"

#Any error (including a failed connection) should report Down: statistic 1, exit code 1.
If ($error)
{
    Write-Host "Statistic: 1"
    Write-Host "Message: $($error[0])"
    Exit 1
}

#This looks for "Keyword" in the payload. If found, the statistic is 0 and the application is considered functional (Up); otherwise it is Critical.
If ($TelnetOutput -match "Keyword")
{
    Write-Host "Statistic: 0"
    Write-Host "Message: Valid Payload"
    Exit 0
}
Else
{
    Write-Host "Statistic: 3"
    Write-Host "Message: Invalid Payload"
    Exit 3
}


Tests for Exit 0 and 3 are successful with proper output. Test for exit 1 is done by turning off the application. The port is closed and the telnet connection will fail. Below is the error that I get for Exit 1 "Get Output Failed". It should result in Exit 1 - Component is Down.

GetOutPut.JPG

 

I found that if I changed "Exit 1" to any other number value, it will run successfully. Something is wrong with Exit 1.
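One thing that might be worth ruling out, offered as an assumption rather than a confirmed cause: in the error branch the "Message:" line is built from $error[0], which can span several lines and contain quotes, whereas the other two branches print short fixed strings. The sketch below flattens the text to a single line before printing it. Format-SamMessage is a hypothetical helper name, and this does not explain why other exit codes work with the same message.

```powershell
# Sketch only: collapse line breaks and runs of whitespace into single spaces.
function Format-SamMessage {
    param([string]$Text)
    return (($Text -replace '\s+', ' ').Trim())
}

# Example use in the error branch of the script above:
# Write-Host "Statistic: 1"
# Write-Host "Message: $(Format-SamMessage ($error[0].Exception.Message))"
# Exit 1
```

If the component still reports "Get Output Failed" with a one-line message, the message format can be ruled out and the exit code handling itself becomes the more likely suspect.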


