
Understanding the Datalake

The Armor datalake is a centralized repository for storing data collected by Armor. With regard to vulnerabilities, the datalake contains the data for every report created for an environment, along with the historical data from each time those reports were run. This can be a large amount of data, so narrowing the scope of a search is critical to making sense of it.

Accessing the Datalake

Users can access the datalake in two ways:

Data Presentation

Data consists of documents stored in the datalake. Each document contains all the data related to that particular rule and resource. Below are examples of the table and JSON views:

The schema for these documents is based on the Elastic Common Schema (ECS). Refer to the link below for details and explanations of the fields:

Vulnerability schema - https://www.elastic.co/guide/en/ecs/1.5/ecs-vulnerability.html

Custom Fields:

vulnerability.published - the date an entry for the vulnerability was given a CVE
vulnerability.results - the criteria used to determine the presence of the vulnerability
vulnerability.cve - contains a link to the vulnerability's entry in the CVE database
vulnerability.solution - provides instructions, if any exist, for remediating the vulnerability
vulnerability.status - lists New if it is the first time a vulnerability is detected by a scan; Active if the vulnerability has been detected by two or more scans; Fixed if the vulnerability was detected in the previous scan but the most recent scan shows it as fixed; and Re-Opened if the vulnerability was previously verified as fixed but has been detected again
vulnerability.first_found - the date of the first scan in which the vulnerability was detected for a given server
vulnerability.last_found - the date of the most recent scan in which the vulnerability was detected for a given server
vulnerability.discovery - indicates whether the vulnerability was discovered through remote and/or authenticated scanning
vulnerability.pci_flag - a flag that indicates whether the vulnerability must be fixed to pass PCI compliance
vulnerability.patchable - contains a 1 if the vulnerability can be patched and a 0 if no patches currently exist for it
vulnerability.last_modification - the date on which the vulnerability's attributes (title, severity level, patch availability, CVSS scores, PCI relevance, etc.) were last modified
vulnerability.diagnosis - gives information about the technical details of the vulnerability, affected packages, severity scoring, and detection
vulnerability.vulnerability_type - indicates whether the detection was a potential vulnerability (vulnerabilities that cannot be fully verified but have at least one necessary condition for the vulnerability) or a vulnerability (the vulnerability can be fully verified)
vulnerability.consequences - provides information about the access an attacker who successfully exploits the vulnerability might gain
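
The four vulnerability.status values described above can be read as a small decision rule over a server's scan history. The sketch below is illustrative only and is not Armor's implementation: the function name derive_status and the list-of-booleans input are hypothetical, and the handling of a vulnerability that has been absent for more than one scan is an assumption, since the description above does not cover that case.

```python
from typing import List, Optional


def derive_status(detected: List[bool]) -> Optional[str]:
    """Return a status for one vulnerability on one server.

    detected[i] is True if scan i (oldest first) found the vulnerability.
    Hypothetical helper: it mirrors the field description, not Armor's code.
    """
    if not any(detected):
        return None  # never detected, so no status applies

    latest = detected[-1]
    previous = detected[-2] if len(detected) >= 2 else False
    seen_before = any(detected[:-1])

    if latest:
        if not seen_before:
            return "New"        # first scan to detect it
        if previous:
            return "Active"     # detected in consecutive scans
        return "Re-Opened"      # was fixed earlier, now detected again
    if previous:
        return "Fixed"          # detected last time, absent now
    return None  # assumption: absent for 2+ scans is not described in the source


print(derive_status([True]))               # New
print(derive_status([True, True]))         # Active
print(derive_status([True, False]))        # Fixed
print(derive_status([True, False, True]))  # Re-Opened
```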

Helpful Fields for Searching the Datalake

Field	Filter By
vulnerability.category	The type of system or architecture that the vulnerability affects. See https://qualysguard.qualys.com/qwebhelp/fo_portal/knowledgebase/vulnerability_categories.htm#P for a listing of potential categories
vulnerability.severity	1 through 5
host.hostname	the hostname of any servers in your account
vulnerability.report_id	a scan ID that can be used to show only the vulnerabilities associated with a specific scan
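
As a rough illustration of how these fields combine, the sketch below builds a filter that matches documents on several fields at once. It assumes the datalake accepts Elasticsearch-style query bodies, which this article does not confirm (only the document schema is stated to follow ECS). The function name build_filter_query and the hostname web-01 are hypothetical.

```python
import json
from typing import Any, Dict


def build_filter_query(filters: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Elasticsearch-style bool query with one exact-match
    clause per field. Assumes an Elasticsearch-compatible backend."""
    clauses = [{"term": {field: value}} for field, value in filters.items()]
    return {"query": {"bool": {"filter": clauses}}}


# Only severity-5 vulnerabilities on a single (hypothetical) host.
query = build_filter_query({
    "vulnerability.severity": 5,
    "host.hostname": "web-01",
})
print(json.dumps(query, indent=2))
```

Each entry corresponds to one filter added in the interface: a field, an "is" operator, and a value. All clauses must match for a document to be returned.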

Adding a Filter

To add a filter, click the Add Filter button.

[Screenshot: datalake add filter.png]

Then set the field to one of the helpful fields above, select the operator, enter the value, and click Save. The data is now filtered by the selected field, such as a specific reportId or rPolicy.

Viewing Datalake Aggregations

Please refer to Reports for custom aggregations, visualizations and custom reports.