Web application firewall (WAF)
Security, as one of the top priorities, cannot rely on merely a single service.
AWS WAF
Amazon Athena
Amazon Route 53
Amazon CloudFront
Terraform
The main reason for using any kind of firewall is that, once a project reaches a certain scale, it attracts a growing audience, and that audience includes attackers who look for vulnerabilities to exploit, including database vulnerabilities, cross-site scripting, HTTP floods and many others. Unfortunately, the list is almost endless.
AWS services
Amazon Web Services has a number of products that can counter these kinds of threats, AWS Network Firewall and AWS WAF (Web Application Firewall) to name but two. The main difference between them, among many others, lies in the OSI layers they work at: layers 3-4 and layer 7, respectively. AWS WAF inspects the traffic between external users and the web application and blocks malicious requests before they reach the application. It can be associated with resources such as an Application Load Balancer, API Gateway, AWS AppSync and CloudFront distributions.
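As a minimal sketch of such an association in Terraform (the tool used later in this article), the fragment below attaches a web ACL to a load balancer. It assumes a hypothetical Application Load Balancer `aws_lb.app` defined elsewhere and refers to the `aws_wafv2_web_acl.this` resource from the Terraform example in this article. CloudFront is the exception: a distribution points at the web ACL through its own `web_acl_id` argument, and the web ACL has to use the CLOUDFRONT scope.

```hcl
# Regional resources (Application Load Balancer, API Gateway stage, AppSync API)
# are attached to a web ACL with a dedicated association resource.
resource "aws_wafv2_web_acl_association" "alb" {
  resource_arn = aws_lb.app.arn             # hypothetical ALB defined elsewhere
  web_acl_arn  = aws_wafv2_web_acl.this.arn # web ACL from the Terraform example
}
```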

AWS WAF offers various kinds of rules (managed rule groups, your own rules and rule groups) and actions that can be applied (allow, block, count). In our project, we decided to use AWS Managed Rules, such as AWSManagedRulesSQLiRuleSet, AWSManagedRulesCommonRuleSet, AWSManagedRulesAmazonIpReputationList and AWSManagedRulesKnownBadInputsRuleSet, as well as our own rules for rate limits. Each AWS managed rule group consists of many sub rules; for example, AWSManagedRulesCommonRuleSet also contains rules against cross-site scripting, size restrictions, bad bots etc.
Using Terraform
As a matter of good practice, it’s better to describe all infrastructure as code from the very start.
Example of Terraform code
resource "aws_wafv2_web_acl" "this" {
  name        = var.web_acl_name
  description = var.web_acl_description
  scope       = var.scope

  default_action {
    allow {}
  }

  // custom rule based on waf rule group
  rule {
    name     = "user_defined_rules"
    priority = 1

    override_action {
      count {}
    }

    statement {
      rule_group_reference_statement {
        arn = aws_wafv2_rule_group.custom_rules_group.arn
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = var.metrics_enabled
      sampled_requests_enabled   = var.metrics_enabled
      metric_name                = "custom_xss_rule"
    }
  }

  // managed rules based on managed-rules variable
  dynamic "rule" {
    for_each = var.managed_rules
    iterator = object

    content {
      name     = lookup(object.value, "name")
      priority = lookup(object.value, "priority")

      override_action {
        dynamic "count" {
          for_each = lookup(object.value, "override_action", {}) == "count" ? [1] : []
          content {}
        }
        dynamic "none" {
          for_each = lookup(object.value, "override_action", {}) == "none" ? [1] : []
          content {}
        }
      }

      statement {
        managed_rule_group_statement {
          name        = lookup(object.value, "name")
          vendor_name = "AWS"
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = var.metrics_enabled
        sampled_requests_enabled   = var.metrics_enabled
        metric_name                = "metric-name-${lookup(object.value, "name")}"
      }
    }
  }

  // rate based rules
  dynamic "rule" {
    for_each = var.rate_based_rules
    iterator = object

    content {
      name     = lookup(object.value, "name")
      priority = lookup(object.value, "priority")

      action {
        dynamic "count" {
          for_each = lookup(object.value, "action", {}) == "count" ? [1] : []
          content {}
        }
        dynamic "block" {
          for_each = lookup(object.value, "action", {}) == "block" ? [1] : []
          content {}
        }
      }

      statement {
        rate_based_statement {
          limit              = lookup(object.value, "limit")
          aggregate_key_type = "IP"

          scope_down_statement {
            byte_match_statement {
              field_to_match {
                uri_path {}
              }
              positional_constraint = "CONTAINS"
              search_string         = lookup(object.value, "search_string")
              text_transformation {
                priority = 0
                type     = "NONE"
              }
            }
          }
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = var.metrics_enabled
        sampled_requests_enabled   = var.metrics_enabled
        metric_name                = "rate-based-${lookup(object.value, "name")}"
      }
    }
  }

  tags = var.tags

  visibility_config {
    cloudwatch_metrics_enabled = var.metrics_enabled
    metric_name                = var.web_acl_metric_name
    sampled_requests_enabled   = var.metrics_enabled
  }
}
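The example reads its rules from `var.managed_rules` and `var.rate_based_rules`, whose definitions are not shown. As a sketch only, the declarations below are one shape that is consistent with the `lookup` calls above. The attribute names are taken from those calls, while the types, priorities, limit and the `/login` path are illustrative assumptions rather than the project's real values.

```hcl
# Hypothetical declarations; only the attribute names come from the example above.
variable "managed_rules" {
  description = "AWS managed rule groups to attach to the web ACL."
  type = list(object({
    name            = string
    priority        = number
    override_action = string # "count" (observe only) or "none" (enforce)
  }))
  default = [
    { name = "AWSManagedRulesCommonRuleSet", priority = 10, override_action = "count" },
    { name = "AWSManagedRulesSQLiRuleSet", priority = 20, override_action = "count" },
    { name = "AWSManagedRulesAmazonIpReputationList", priority = 30, override_action = "count" },
    { name = "AWSManagedRulesKnownBadInputsRuleSet", priority = 40, override_action = "count" },
  ]
}

variable "rate_based_rules" {
  description = "Per-IP rate limits for requests whose URI path contains search_string."
  type = list(object({
    name          = string
    priority      = number
    action        = string # "count" or "block"
    limit         = number # allowed requests per IP in the evaluation window
    search_string = string
  }))
  default = [
    # Illustrative values; priorities must be unique across all rules in the web ACL.
    { name = "login-rate-limit", priority = 50, action = "count", limit = 1000, search_string = "/login" },
  ]
}
```

Starting every entry with `count` matches the log-collection phase described in the next section; switching an entry to `none` (or `block` for a rate-based rule) later turns enforcement on for that rule only.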
Typical logs flow
It should also be noted that rolling out AWS WAF in real conditions on large projects is a rather time-consuming, iterative process. It can’t simply be enabled ‘out of the box’, and false positives are usually to blame. The most common practice is to follow this scheme: collect logs in count mode, analyze them and adjust the AWS WAF rules based on that analysis. Logs are collected over a period of time that depends on many factors, including traffic volume.
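To collect those logs, logging has to be enabled on the web ACL. The fragment below is a sketch that assumes logs are delivered straight to an S3 bucket; the article does not say which delivery mechanism the project used, and Kinesis Data Firehose is another option. The bucket name is hypothetical, but AWS WAF does require the destination name to start with `aws-waf-logs-`.

```hcl
# Hypothetical log bucket; the "aws-waf-logs-" prefix is mandatory for WAF log destinations.
resource "aws_s3_bucket" "waf_logs" {
  bucket = "aws-waf-logs-example-project"
}

# Send the logs of the web ACL from the Terraform example to that bucket.
resource "aws_wafv2_web_acl_logging_configuration" "this" {
  resource_arn            = aws_wafv2_web_acl.this.arn
  log_destination_configs = [aws_s3_bucket.waf_logs.arn]
}
```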


Everything depends on analysis
Once the logs are in Amazon S3, one quite effective option for analysis is Amazon Athena. This service allows you to create a table from the data in a bucket and run SQL queries against it.
Example of logs received from AWS WAF:
{
"timestamp": 1612420137433,
"formatVersion": 1,
"webaclId": "***************",
"terminatingRuleId": "Default_Action",
"terminatingRuleType": "REGULAR",
"action": "ALLOW",
"terminatingRuleMatchDetails": [],
"httpSourceName": "CF",
"httpSourceId": "****************",
"ruleGroupList": [
{
"ruleGroupId": "****************",
"terminatingRule": null,
"nonTerminatingMatchingRules": [],
"excludedRules": null
},
{
"ruleGroupId": "AWS#AWSManagedRulesSQLiRuleSet",
"terminatingRule": null,
"nonTerminatingMatchingRules": [],
"excludedRules": null
},
{
"ruleGroupId": "AWS#AWSManagedRulesCommonRuleSet",
"terminatingRule": {
"ruleId": "GenericRFI_BODY",
"action": "BLOCK",
"ruleMatchDetails": null
},
"nonTerminatingMatchingRules": [],
"excludedRules": null
},
{
"ruleGroupId": "AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingRule": null,
"nonTerminatingMatchingRules": [],
"excludedRules": null
},
{
"ruleGroupId": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
"terminatingRule": null,
"nonTerminatingMatchingRules": [],
"excludedRules": null
}
],
"rateBasedRuleList": [],
"nonTerminatingMatchingRules": [
{
"ruleId": "AWSManagedRulesCommonRuleSet",
"action": "COUNT",
"ruleMatchDetails": []
}
],
"requestHeadersInserted": null,
"responseCodeSent": null,
"httpRequest": {
"clientIp": "***************",
"country": "**",
"headers": [
{
"name": "user-agent",
"value": "ReactorNetty/0.9.12.RELEASE"
},
{
"name": "host",
"value": "***************"
},
{
"name": "Accept",
"value": "application/json"
},
{
"name": "Content-Type",
"value": "application/json"
},
{
"name": "content-length",
"value": "1317"
}
],
"uri": "***************",
"args": "",
"httpVersion": "HTTP/1.1",
"httpMethod": "POST",
"requestId": "***************"
}
}
Amazon Athena table creation (from AWS documentation):
CREATE EXTERNAL TABLE `waf_logs`(
`timestamp` bigint,
`formatversion` int,
`webaclid` string,
`terminatingruleid` string,
`terminatingruletype` string,
`action` string,
`terminatingrulematchdetails` array<
struct<
conditiontype:string,
location:string,
matcheddata:array<string>
>
>,
`httpsourcename` string,
`httpsourceid` string,
`rulegrouplist` array<
struct<
rulegroupid:string,
terminatingrule:struct<
ruleid:string,
action:string,
rulematchdetails:string
>,
nonterminatingmatchingrules:array<
struct<
ruleid:string,
action:string,
rulematchdetails:array<
struct<
conditiontype:string,
location:string,
matcheddata:array<string>
>
>
>
>,
excludedrules:string
>
>,
`ratebasedrulelist` array<
struct<
ratebasedruleid:string,
limitkey:string,
maxrateallowed:int
>
>,
`nonterminatingmatchingrules` array<
struct<
ruleid:string,
action:string
>
>,
`httprequest` struct<
clientip:string,
country:string,
headers:array<
struct<
name:string,
value:string
>
>,
uri:string,
args:string,
httpversion:string,
httpmethod:string,
requestid:string
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('paths'='action,formatVersion,httpRequest,httpSourceId,httpSourceName,nonTerminatingMatchingRules,rateBasedRuleList,ruleGroupList,terminatingRuleId,terminatingRuleMatchDetails,terminatingRuleType,timestamp,webaclId')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://athenawaflogs/WebACL/'
Sample SQL query for analysis
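-- rule_number and header_number are placeholders: replace them with the 1-based
-- array index of the rule group and of the host header you want to inspect.
-- waf_logs_for_report is the table name used in our project; use the name of
-- the table you created (waf_logs in the statement above).
SELECT COUNT(httpRequest.clientIp) AS count, httpRequest.clientIp,
  ruleGroupList[rule_number].ruleGroupId AS managed_group,
  ruleGroupList[rule_number].terminatingRule.ruleId AS rule_id,
  httpRequest.headers[header_number].value AS host,
  httpRequest.uri AS uri
FROM waf_logs_for_report
WHERE ruleGroupList[rule_number].terminatingRule.action = 'BLOCK'
GROUP BY httpRequest.clientIp, ruleGroupList[rule_number].ruleGroupId,
  ruleGroupList[rule_number].terminatingRule.ruleId, httpRequest.headers[header_number].value,
  httpRequest.uri
-- most frequent matches first
ORDER BY count DESC
LIMIT 100;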
Next steps
After such analysis, we can see which sub rules produced the largest number of false positives, adjust them and repeat the collection and analysis of logs. After several iterations, once the overwhelming majority of false positives is gone, we can switch to block mode while intensively monitoring the logs, so that we can roll back quickly if anything unforeseen happens.
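As an illustration of what such a correction can look like, the standalone sketch below enforces AWSManagedRulesCommonRuleSet but keeps one sub rule in count mode. `GenericRFI_BODY` is used only because it appears in the sample log above; whether it was a false positive in the project is not stated. The resource name, metric names and priority are hypothetical, and `rule_action_override` assumes a recent AWS provider version (older versions used a different block for excluding rules).

```hcl
# Sketch: enforce a managed rule group, except for one sub rule under investigation.
resource "aws_wafv2_web_acl" "example" {
  name  = "example-web-acl" # hypothetical name
  scope = "REGIONAL"        # use "CLOUDFRONT" for CloudFront distributions

  default_action {
    allow {}
  }

  rule {
    name     = "AWSManagedRulesCommonRuleSet"
    priority = 10

    # "none" keeps the actions defined inside the managed rule group (block mode).
    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

        # Keep a single noisy sub rule in count mode instead of blocking.
        rule_action_override {
          name = "GenericRFI_BODY"
          action_to_use {
            count {}
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      sampled_requests_enabled   = true
      metric_name                = "common-rule-set"
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    sampled_requests_enabled   = true
    metric_name                = "example-web-acl"
  }
}
```

Rolling back is then a matter of switching `none {}` back to `count {}` and applying the change.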
In conclusion, security should be one of the top priorities and cannot be based on one service only. Rather, it should combine several services and security best practices, which helps to avoid negative consequences for the project as a whole.