hoegaarden/splunk-firehose-nozzle (2024)

Splunk Nozzle

Cloud Foundry Firehose-to-Splunk Nozzle

Usage

The Splunk nozzle streams Cloud Foundry Firehose events to the Splunk HTTP Event Collector. Using pre-defined Splunk sourcetypes, the nozzle automatically parses the events and enriches them with additional metadata before forwarding them to Splunk. For detailed descriptions of each Firehose event type and its fields, refer to the underlying dropsonde protocol. Below is a mapping of each Firehose event type to its corresponding Splunk sourcetype. Refer to Searching Events for example Splunk searches.

Firehose event type | Splunk sourcetype | Description
Error | cf:error | An Error event represents an error in the originating process
HttpStartStop | cf:httpstartstop | An HttpStartStop event represents the whole lifecycle of an HTTP request
LogMessage | cf:logmessage | A LogMessage contains a "log line" and associated metadata
ContainerMetric | cf:containermetric | A ContainerMetric records resource usage of an app in a container
CounterEvent | cf:counterevent | A CounterEvent represents the increment of a counter
ValueMetric | cf:valuemetric | A ValueMetric indicates the value of a metric at an instant in time

In addition, logs from the nozzle itself are of sourcetype cf:splunknozzle.
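For scripts that build searches or route events by type, the mapping above can be captured in a small lookup. This is only an illustrative sketch; the helper name `sourcetype_for` is hypothetical and not part of the nozzle.

```python
# Firehose event type -> Splunk sourcetype, copied from the mapping above.
SOURCETYPES = {
    "Error": "cf:error",
    "HttpStartStop": "cf:httpstartstop",
    "LogMessage": "cf:logmessage",
    "ContainerMetric": "cf:containermetric",
    "CounterEvent": "cf:counterevent",
    "ValueMetric": "cf:valuemetric",
}

# Sourcetype used for the nozzle's own logs.
NOZZLE_SOURCETYPE = "cf:splunknozzle"


def sourcetype_for(event_type: str) -> str:
    """Return the sourcetype for a Firehose event type (hypothetical helper)."""
    try:
        return SOURCETYPES[event_type]
    except KeyError:
        raise ValueError(f"unknown Firehose event type: {event_type!r}") from None
```

For example, `sourcetype_for("LogMessage")` returns `cf:logmessage`, and an unknown event type raises `ValueError`.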

Setup

The Nozzle requires a client with the authorities doppler.firehose and cloud_controller.admin_read_only (the latter is only required if ADD_APP_INFO is true) and the grant types client_credentials and refresh_token. If cloud_controller.admin_read_only is not available in the system, use cloud_controller.admin instead.

You can either

  • Add the client manually using uaac
  • Add the client to the deployment manifest; see uaa.scim.users

Manifest example:

# Clients
uaa.clients:
  splunk-firehose:
    id: splunk-firehose
    override: true
    secret: splunk-firehose-secret
    authorized-grant-types: client_credentials,refresh_token
    authorities: doppler.firehose,cloud_controller.admin_read_only

uaac example:

uaac target https://uaa.[system domain url]
uaac token client get admin -s [admin client credentials secret]
uaac client add splunk-firehose \
  --name splunk-firehose \
  --secret [your_client_secret] \
  --authorized_grant_types client_credentials,refresh_token \
  --authorities doppler.firehose,cloud_controller.admin_read_only

cloud_controller.admin_read_only works with cf v241 or later. Earlier versions should use cloud_controller.admin instead.
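The authority rules in this section can be summarized as a small decision function. This is a sketch only: the function name and its parameters are hypothetical, and `cf_release` is assumed to be the numeric cf release version (such as 241).

```python
from typing import List


def required_authorities(cf_release: int, add_app_info: bool) -> List[str]:
    """List the UAA authorities the nozzle client needs (illustrative only)."""
    authorities = ["doppler.firehose"]
    if add_app_info:
        # admin_read_only exists from cf v241 on; older releases need admin.
        if cf_release >= 241:
            authorities.append("cloud_controller.admin_read_only")
        else:
            authorities.append("cloud_controller.admin")
    return authorities
```

With app info enrichment disabled, only `doppler.firehose` is returned; in every case the client also needs the grant types client_credentials and refresh_token.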

Environment Parameters

You can declare parameters by making a copy of the scripts/nozzle.sh.template.

  • DEBUG: Enable debug mode (forward to standard out instead of Splunk).

Cloud Foundry configuration parameters:

  • API_ENDPOINT: Cloud Foundry API endpoint address.
  • CLIENT_ID: UAA Client ID (Must have authorities and grant_types described above).
  • CLIENT_SECRET: Secret for Client ID.

Splunk configuration parameters:

  • SPLUNK_TOKEN: Splunk HTTP event collector token.
  • SPLUNK_HOST: Splunk HTTP event collector host. example: https://example.cloud.splunk.com:8088
  • SPLUNK_INDEX: The Splunk index events will be sent to. Warning: Setting an invalid index will cause events to be lost. This index must match one of the selected indexes for the Splunk HTTP event collector token used for the SPLUNK_TOKEN parameter.
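To see how the three Splunk parameters fit together, the sketch below builds (but does not send) an HTTP Event Collector request. The `/services/collector` path appears in the nozzle's own error logs further down, and `Authorization: Splunk <token>` is the standard HEC header. The function name and the payload shown are illustrative assumptions, not the nozzle's actual code.

```python
import json
import urllib.request


def build_hec_request(splunk_host, splunk_token, splunk_index, event, sourcetype):
    """Build a POST request for the HEC endpoint (hypothetical helper)."""
    payload = {
        "event": event,            # the event body itself
        "sourcetype": sourcetype,  # e.g. cf:logmessage
        "index": splunk_index,     # must be allowed for the HEC token
    }
    return urllib.request.Request(
        url=splunk_host.rstrip("/") + "/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + splunk_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; an index that the token is not allowed to write to is rejected by Splunk, which is why an invalid SPLUNK_INDEX leads to lost events.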

Advanced Configuration Features:

  • JOB_NAME: Tags nozzle log events with job name.

  • JOB_INDEX: Tags nozzle log events with job index.

  • JOB_HOST: Tags nozzle log events with job host.

  • SKIP_SSL_VALIDATION_CF: Skips SSL certificate validation for connection to Cloud Foundry. Secure communications will not check SSL certificates against a trusted certificate authority. This is recommended for dev environments only.

  • SKIP_SSL_VALIDATION_SPLUNK: Skips SSL certificate validation for connection to Splunk. Secure communications will not check SSL certificates against a trusted certificate authority. This is recommended for dev environments only.

  • FIREHOSE_SUBSCRIPTION_ID: Tags nozzle events with a Firehose subscription id. See https://docs.pivotal.io/pivotalcf/1-11/loggregator/log-ops-guide.html.

  • FIREHOSE_KEEP_ALIVE: Keep alive duration for the Firehose consumer.

  • ADD_APP_INFO: Enrich raw data with app info. A comma separated list of app metadata (AppName,OrgName,OrgGuid,SpaceName,SpaceGuid).

  • IGNORE_MISSING_APP: If the application is missing, then stop repeatedly querying application info from Cloud Foundry.

  • MISSING_APP_CACHE_INVALIDATE_TTL: How frequently the missing app info cache invalidates (in s/m/h. For example, 3600s or 60m or 1h).

  • APP_CACHE_INVALIDATE_TTL: How frequently the app info local cache invalidates (in s/m/h. For example, 3600s or 60m or 1h).

  • ORG_SPACE_CACHE_INVALIDATE_TTL: How frequently the org and space cache invalidates (in s/m/h. For example, 3600s or 60m or 1h).

  • APP_LIMITS: Restricts each request to the APP_LIMITS most recently updated apps when populating the app metadata cache.

  • BOLTDB_PATH: Bolt database path.

  • EVENTS: A comma separated list of events to include. Possible values: ValueMetric,CounterEvent,Error,LogMessage,HttpStartStop,ContainerMetric

  • EXTRA_FIELDS: Extra fields to annotate your events with (format is key:value,key:value).

  • FLUSH_INTERVAL: Time interval (in s/m/h. For example, 3600s or 60m or 1h) for flushing queue to Splunk regardless of CONSUMER_QUEUE_SIZE. Protects against stale events in low throughput systems.

  • CONSUMER_QUEUE_SIZE: Sets the internal consumer queue buffer size. Events will be pushed to Splunk after queue is full.

  • HEC_BATCH_SIZE: Set the batch size for the events to push to HEC (Splunk HTTP Event Collector).

  • HEC_RETRIES: Retry count for sending events to Splunk. Once the retries are exhausted, events are dropped, causing data loss.

  • HEC_WORKERS: Sets the number of Splunk HEC workers, to increase concurrency while ingesting into Splunk.

  • ENABLE_EVENT_TRACING: Enables event trace logging. Splunk events will now contain a UUID, Splunk Nozzle Event Counts, and a Subscription-ID for Splunk correlation searches.

  • SPLUNK_VERSION: The Splunk version that determines how HEC ingests metadata fields. Only required for Splunk version 6.3 or below.

  • STATUS_MONITOR_INTERVAL: Time interval (in s/m/h. For example, 3600s or 60m or 1h) for monitoring memory queue pressure. Use to help with back-pressure insights. (Increases CPU load. Use for insights purposes only) Default is 0s (Disabled).

    Please note

    The SPLUNK_VERSION configuration parameter is only required for Splunk version 6.3 or below. For those versions, deploy the nozzle via the CLI: update nozzle_manifest.yml with the Splunk version as an env variable (for example, SPLUNK_VERSION: 6.3) and deploy the nozzle as an app via the CLI.

    The tile only supports deployment for Splunk version 6.4 or above.
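Several of the parameters above take small string formats: durations such as 3600s, 60m or 1h, EXTRA_FIELDS as key:value pairs, and EVENTS as a comma-separated list. The sketch below shows one way to validate such values before deploying. It is not the nozzle's own parser; all function names are hypothetical, and only the single-unit duration forms shown in this document are handled.

```python
import re

# Event names accepted by EVENTS, as listed above.
VALID_EVENTS = {
    "ValueMetric", "CounterEvent", "Error",
    "LogMessage", "HttpStartStop", "ContainerMetric",
}
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_ttl(value: str) -> int:
    """Convert '3600s', '60m' or '1h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"expected <number>s|m|h, got {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def parse_extra_fields(value: str) -> dict:
    """Split 'key:value,key:value' into a dict."""
    fields = {}
    for pair in (p.strip() for p in value.split(",")):
        if not pair:
            continue
        key, sep, val = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        fields[key.strip()] = val.strip()
    return fields


def parse_events(value: str) -> list:
    """Split a comma-separated EVENTS value and reject unknown names."""
    events = [e.strip() for e in value.split(",") if e.strip()]
    unknown = [e for e in events if e not in VALID_EVENTS]
    if unknown:
        raise ValueError(f"unknown event types: {unknown}")
    return events
```

With these helpers, `parse_ttl("3600s")`, `parse_ttl("60m")` and `parse_ttl("1h")` all describe the same interval.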

Push as an App to Cloud Foundry

Push Splunk Firehose Nozzle as an application to Cloud Foundry. Please refer to the Setup section for details on user authentication.

  1. Download the latest release

    git clone https://github.com/cloudfoundry-community/splunk-firehose-nozzle.git
    cd splunk-firehose-nozzle
  2. Authenticate to Cloud Foundry

    cf login -a https://api.[your cf system domain] -u [your id]
  3. Copy the manifest template and fill in needed values (using the credentials created during setup)

    vim .circleci/ci_nozzle_manifest.yml
  4. Push the nozzle

    make deploy-nozzle

Dump application info to boltdb

In production there may be a large number of CF applications (say, tens of thousands). If the user wants to enrich application logs with application metadata (for example app name, space ID, space name, org ID and org name), querying all of that metadata from CF may take some time. With multiple instances of the Splunk nozzle deployed the situation is even worse, since each instance queries all application metadata and caches it in its local boltdb file. These queries put load on the CF system and can take a long time to finish. Users can run this tool to generate a copy of all application metadata and copy it to each Splunk nozzle deployment. Each Splunk nozzle can then pick up the cached copy and update the cache file incrementally afterwards.

Example of how to run the dump application info tool:

$ cd tools/dump_app_info
$ go build dump_app_info.go
$ ./dump_app_info --skip-ssl-validation --api-endpoint=https://<your api endpoint> --user=<api endpoint login username> --password=<api endpoint login password>

After populating the application info cache file, the user can copy it to the different Splunk nozzle deployments and start each Splunk nozzle so that it picks up the cache file, by specifying the correct "--boltdb-path" flag or "BOLTDB_PATH" environment variable.

Index routing

Index routing is a feature that can be used to send different Cloud Foundry logs to different indexes for better ACL and data retention control in Splunk.

Per application index routing via application manifest

In your app manifest, provide an environment variable called SPLUNK_INDEX and assign it the index you would like the app data to be sent to.

applications:
- name: console
  memory: 256M
  disk_quota: 256M
  host: console
  timeout: 180
  buildpack: https://github.com/SUSE/stratos-buildpack
  health-check-type: port
  env:
    SPLUNK_INDEX: testing_index
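Conceptually, per-application routing is an override: an app's own SPLUNK_INDEX wins over the index configured for the nozzle. The sketch below illustrates that idea only. The function is hypothetical, and falling back to the nozzle-wide index for apps without the variable is an assumption, not something this document states.

```python
def resolve_index(app_env: dict, default_index: str) -> str:
    """Pick the target index for an app's events (illustrative assumption)."""
    # An unset or empty SPLUNK_INDEX falls back to the nozzle-wide index.
    return app_env.get("SPLUNK_INDEX") or default_index
```

For the manifest above, `resolve_index({"SPLUNK_INDEX": "testing_index"}, "main")` yields `testing_index`.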

Index routing via Splunk configuration

Logs can be routed using fields such as app ID/name, space ID/name or org ID/name. Users can configure the Splunk configuration files props.conf and transforms.conf on Splunk indexers, or on Splunk Heavy Forwarders if deployed.

Below are a few sample configurations:

1. Route data from application ID 95930b4e-c16c-478e-8ded-5c6e9c5981f8 to a Splunk prod index:

$SPLUNK_HOME/etc/system/local/props.conf

[cf:logmessage]
TRANSFORMS-index_routing = route_data_to_index_by_field_cf_app_id

$SPLUNK_HOME/etc/system/local/transforms.conf

[route_data_to_index_by_field_cf_app_id]
REGEX = "(\w+)":"95930b4e-c16c-478e-8ded-5c6e9c5981f8"
DEST_KEY = _MetaData:Index
FORMAT = prod

2. Routing application logs from any Cloud Foundry orgs whose names are prefixed with sales to a Splunk sales index.

$SPLUNK_HOME/etc/system/local/props.conf

[cf:logmessage]
TRANSFORMS-index_routing = route_data_to_index_by_field_cf_org_name

$SPLUNK_HOME/etc/system/local/transforms.conf

[route_data_to_index_by_field_cf_org_name]
REGEX = "cf_org_name":"(sales.*)"
DEST_KEY = _MetaData:Index
FORMAT = sales
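Before deploying a transform it can help to try its REGEX against a few raw events. The sketch below does this for the patterns from samples 1 and 2 using Python's `re` module, which behaves the same as Splunk's regex engine for these simple patterns. The function names and sample events are hypothetical and only assume compact JSON without spaces after the colons.

```python
import re

# Patterns copied from the transforms.conf samples 1 and 2.
APP_ID_RULE = re.compile(r'"(\w+)":"95930b4e-c16c-478e-8ded-5c6e9c5981f8"')
ORG_NAME_RULE = re.compile(r'"cf_org_name":"(sales.*)"')


def matches_app_id_rule(raw_event: str) -> bool:
    """True if sample 1 would route this raw event to the prod index."""
    return APP_ID_RULE.search(raw_event) is not None


def matches_org_rule(raw_event: str) -> bool:
    """True if sample 2 would route this raw event to the sales index."""
    return ORG_NAME_RULE.search(raw_event) is not None
```

Note that the patterns match the raw text literally: an event serialized as `"cf_org_name": "sales-emea"` (with a space after the colon) would not match, and neither would an org named `presales`.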

3. Routing data from sourcetype cf:splunknozzle to index new_index:

$SPLUNK_HOME/etc/system/local/props.conf

[cf:splunknozzle]
TRANSFORMS-route_to_new_index = route_to_new_index

$SPLUNK_HOME/etc/system/local/transforms.conf

[route_to_new_index]
SOURCE_KEY = MetaData:Sourcetype
DEST_KEY = _MetaData:Index
REGEX = (sourcetype::cf:splunknozzle)
FORMAT = new_index

Troubleshooting

This topic describes how to troubleshoot Splunk Firehose Nozzle for Cloud Foundry.

1. I can't find my data!

Are you searching for events and not finding them or looking at a dashboard and seeing "No result found"? Check Splunk Nozzle app logs.

To view the nozzle's logs running on CF do the following:

  1. Log in as an admin via the CLI.
  2. Target the org created by the tile.
    cf target -o SPLUNK-NOZZLE-ORG
  3. View the recent app Splunk Nozzle logs (the version number installed by the tile will vary).
    cf logs --recent splunk-firehose-nozzle
  4. Alternatively, you can stream the app logs as they're emitted.
    cf logs splunk-firehose-nozzle

Here are a few common errors and possible resolutions:

Splunk configuration related errors:

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Unable to talk to Splunk","log_level":2,"data":{"error":"Post http://localhost:8088/services/collector: read tcp 10.0.0.0:62931-\u003elocalhost:8088: read: connection reset by peer"}}

This error usually occurs when SSL is enabled on the Splunk HEC endpoint. Confirm that you're using 'https' in the Splunk HEC URL.

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Unable to talk to Splunk","log_level":2,"data":{"error":"Non-ok response code [400] from splunk: {\"text\":\"Incorrect index\",\"code\":7,\"invalid-event-number\":1}"}}

This usually means the index value specified in the configuration doesn't exist on Splunk Host. Confirm that you're using the correct Splunk index value.

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Unable to talk to Splunk","log_level":2,"data":{"error":"Non-ok response code [403] from splunk: {\"text\":\"Invalid token\",\"code\":4}"}}

This can occur when the Splunk HEC Token value is invalid. Confirm that you're using a valid token.

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Unable to talk to Splunk","log_level":2,"data":{"error":"Post https://localhost:8088/services/collector: x509: cannot validate certificate for localhost because it doesn't contain any IP SANs"}}

This usually means that there was no valid SSL certificate found. Confirm that you're using a valid SSL certificate for the Splunk server, or set 'Skip SSL Validation' to true under Splunk settings.

Note: Disabling SSL validation is not recommended for production environments.

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Unable to talk to Splunk","log_level":2,"data":{"error":"Post https://localhost:8088/services/collector: dial tcp localhost:8088: getsockopt: connection refused"}}

This error can occur when the Splunk server is offline or when the Splunk HEC URL is not valid. Confirm both that the Splunk server is running and that you're using a valid URL.

Cloud Foundry configuration related errors:

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle","log_level":2,"data":{"error":"Error getting token: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"Bad credentials\"}"}}

This error can occur when the credentials provided for CF environment are invalid. Confirm that the API User and API Password each have access to the CF environment.

{"timestamp":"","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle","log_level":2,"data":{"error":"Could not get api /v2/info: Get https://api.cfendpoint.com/v2/info: x509: certificate signed by unknown authority"}}

This means that no valid SSL certificate was found. To remediate this error, provide a valid SSL certificate for Cloud Foundry or set 'Skip SSL Validation' to true under Cloud Foundry Settings.

Note: Disabling SSL validation is not recommended for production environments.
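When scanning many nozzle log lines, the error-to-resolution pairs above can be turned into a simple lookup. The sketch below parses one JSON log line and returns a hint based on a substring of `data.error`. The function name and the shortened hint texts are illustrative; the substrings are taken from the sample errors in this section.

```python
import json

# (substring of data.error, short hint) pairs, in the order listed above.
HINTS = [
    ("connection reset by peer", "Use https in the Splunk HEC URL if SSL is enabled."),
    ("Incorrect index", "Check that the configured Splunk index exists."),
    ("Invalid token", "Check the Splunk HEC token."),
    ("doesn't contain any IP SANs", "Use a valid Splunk SSL certificate."),
    ("connection refused", "Check that Splunk is running and the HEC URL is valid."),
    ("Bad credentials", "Check the Cloud Foundry credentials."),
    ("certificate signed by unknown authority", "Provide a valid Cloud Foundry SSL certificate."),
]


def diagnose(log_line: str):
    """Return a hint for a nozzle log line, or None if nothing matches."""
    record = json.loads(log_line)
    error = (record.get("data") or {}).get("error", "")
    for needle, hint in HINTS:
        if needle in error:
            return hint
    return None
```

Lines without a `data.error` field, or with an error not covered above, yield `None` and need manual inspection.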

The following troubleshooting tips assume you have access to Splunk to run basic searches against index _internal and the user-specified index for Firehose events.

2. Ensure Splunk Nozzle is forwarding events from the Firehose:

Search app logs of the Nozzle to confirm correct behavior:

sourcetype="cf:splunknozzle"

A correct setup logs a start message with configuration parameters of the Nozzle logged as a JSON object, for example:

data: {
  add-app-info: AppName,OrgName,OrgGuid,SpaceName,SpaceGuid
  api-endpoint: https://api.endpoint.com
  app-cache-ttl: 0
  app-limits: 0
  batch-size: 1000
  boltdb-path: cache.db
  branch: null
  buildos: null
  commit: null
  debug: false
  extra-fields:
  flush-interval: 5000000000
  hec-workers: 8
  ignore-missing-apps: true
  job-host:
  job-index: -1
  job-name: splunk-nozzle
  keep-alive: 25000000000
  missing-app-cache-ttl: 0
  queue-size: 10000
  retries: 2
  skip-ssl: true
  splunk-host: http://localhost:8088
  splunk-index: atomic
  splunk-version: 6.6
  subscription-id: splunk-firehose
  trace-logging: true
  status-monitor-interval: 0s
  version:
  wanted-events: ValueMetric,CounterEvent,Error,LogMessage,HttpStartStop,ContainerMetric
}
ip: 10.0.0.0
log_level: 1
logger_source: splunk-nozzle-logger
message: splunk-nozzle-logger.Running splunk-firehose-nozzle with following configuration variables
origin: splunk_nozzle
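The large integers shown for flush-interval and keep-alive look like durations expressed in nanoseconds, which is how a Go time.Duration appears when serialized as a plain number. Under that assumption (it is not stated in this document), a one-line conversion makes the start message easier to read; the helper name is hypothetical.

```python
def nanoseconds_to_seconds(value: int) -> float:
    """Convert a nanosecond count from the start message into seconds."""
    return value / 1_000_000_000
```

Read this way, the flush-interval of 5000000000 is 5 seconds and the keep-alive of 25000000000 is 25 seconds.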

Search app logs of the Nozzle for any errors:

sourcetype="cf:splunknozzle" data.error=*

Errors are logged with corresponding message and stacktrace.

3. Check for dropped events due to HTTP Event Collector availability:

As the Splunk Firehose Nozzle sends data to Splunk via HTTPS using the HTTP Event Collector, it is also susceptible to any network issues across the network path from point to point. Run the following search to determine if Splunk has indexed any events indicating issues with the HEC Endpoint.

 sourcetype="cf:splunknozzle" "dropping events"

4. Check for data loss inside the Splunk Firehose Nozzle:

If "Event Tracing" is enabled, extra metadata will be attached to events. This allows searches to calculate the percentage of data loss inside the Splunk Firehose Nozzle, if applicable.

Each instance of the Splunk Firehose Nozzle will run with a randomly generated UUID. The query below will display the message success rate for each UUID (Please update the index value based on your nozzle configuration).

index=main | stats count as total_events, min(nozzle-event-counter) as min_number, max(nozzle-event-counter) as max_number by uuid | eval event_number = max_number - min_number | eval success_percentage = total_events/event_number*100 | stats max(success_percentage) by uuid
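To make the arithmetic of this search explicit, the sketch below reproduces it over a list of events held in memory. It mirrors the query as written, including its use of max minus min as the expected event count. The function name and the sample data layout are hypothetical.

```python
from collections import defaultdict


def success_percentage_by_uuid(events):
    """Mirror the search above: total_events / (max - min) * 100 per uuid."""
    counters = defaultdict(list)
    for event in events:
        counters[event["uuid"]].append(event["nozzle-event-counter"])

    result = {}
    for uuid, values in counters.items():
        event_number = max(values) - min(values)
        if event_number == 0:
            # Division by zero: no percentage can be computed for this uuid.
            continue
        result[uuid] = len(values) / event_number * 100
    return result
```

For instance, a nozzle instance whose indexed events carry the counters 10, 11, 13, 14 and 20 has 5 events over a counter range of 10, giving 50 percent.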

Searching Events

Here are two short Splunk queries to start exploring some of the Cloud Foundry events in Splunk.

sourcetype="cf:valuemetric" | stats avg(value) by job_instance, name
sourcetype="cf:counterevent" | eval job_and_name=source+"-"+name | stats values(job_and_name)

Development

Software Requirements

Make sure you have the following installed on your workstation:

Software | Version
go | go1.12.x
glide | 0.12.x

Then install all dependent packages via Glide:

$ cd <REPO_ROOT_DIRECTORY>
$ make installdeps

Environment

For development against bosh-lite, copy tools/nozzle.sh.template to tools/nozzle.sh and supply missing values:

$ cp tools/nozzle.sh.template tools/nozzle.sh
$ chmod +x tools/nozzle.sh

Build project:

$ make VERSION=1.2.0

Run tests with Ginkgo

$ ginkgo -r

Run individual kinds of tests

$ make test # run all unit tests
$ make race # test for race conditions in the code
$ make vet  # examine the Go code
$ make cov  # code coverage test and code coverage HTML report

Or run all tests at once: unit tests, race condition test, code coverage, etc.

$ make testall

Run app

# this will run: go run main.go
$ ./tools/nozzle.sh

The Splunk Firehose Nozzle project is supported through Splunk Support, assuming the customer has a current Splunk support entitlement. Customers who do not have a current Splunk support entitlement can create a new issue in the project's issue tracker.
