{"id":8571,"date":"2024-01-22T19:38:19","date_gmt":"2024-01-22T19:38:19","guid":{"rendered":"https:\/\/www.satup.xyz\/index.php\/2024\/01\/22\/designing-effective-data-security-schemas-by-renae-kang-jan-2024\/"},"modified":"2024-01-22T19:38:19","modified_gmt":"2024-01-22T19:38:19","slug":"designing-effective-data-security-schemas-by-renae-kang-jan-2024","status":"publish","type":"post","link":"https:\/\/www.satup.xyz\/index.php\/2024\/01\/22\/designing-effective-data-security-schemas-by-renae-kang-jan-2024\/","title":{"rendered":"Designing Effective Data Security Schemas | by Renae Kang | Jan, 2024"},"content":{"rendered":"<p><br \/>\n<\/p>\n<div>\n<figure class=\"nq nr ns nt nu nv nn no paragraph-image\">\n<div role=\"button\" tabindex=\"0\" class=\"nw nx fg ny bg nz\">\n<div class=\"nn no np\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 
700px\" type=\"image\/webp\"\/><source data-testid=\"og\" srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*nimiEMPSVnUk7ANar1Zxog.jpeg 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"\/><img fetchpriority=\"high\" alt=\"\" class=\"bg mv oa c\" width=\"700\" height=\"467\" loading=\"eager\" role=\"presentation\"\/><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"d681\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Cybersecurity tools generate a vast amount of logging data, each with its own unique data definitions and field names. Meanwhile, enterprises often collect logs from different sources, such as network devices, servers, applications, and user activity. These logs can be massive in size, reaching petabytes or even exabytes, and they often contain a wide variety of fields with varying definitions and data types. 
In fact, Adobe collects tens of terabytes of security tooling and related log data every day.<\/p>\n<p id=\"c2d8\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">That\u2019s why data correlation is essential to finding the usable nuggets of information that help us make the best security control decisions possible. In short, it helps us pinpoint the actionable \u201cneedles in our logging haystack.\u201d Though analyzing individual datasets is relatively straightforward, correlating data across multiple logs can pose a significant challenge because of the heterogeneity of field names and data formats across different logs.<\/p>\n<p id=\"2eea\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Field normalization is a crucial step in enabling effective correlation. As such, one of the most important steps for identifying misconfigurations, attack paths, and vulnerabilities across systems is to develop a standardized security data schema. In this blog post, I will explain the importance of field normalization and share tangible methods to address this data schema problem effectively.<\/p>\n<p id=\"0c64\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Normalizing Fields for Better Analysis<\/strong><\/p>\n<p id=\"598f\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Field normalization is a widely accepted method used to homogenize the name, definition, and data type of a particular field across different datasets. 
While investigating security incidents, it\u2019s important to analyze data from multiple logs to quickly identify the root causes of attacks.<\/p>\n<p id=\"b61f\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">The benefits of field normalization include:<\/p>\n<ul class=\"\">\n<li id=\"c62d\" class=\"ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow ox oy oz bj\"><strong class=\"od gs\">Data Integration<\/strong>: Field normalization streamlines the integration of logs from various sources by standardizing field names and their definitions, eliminating the need for intricate transformations during analysis.<\/li>\n<li id=\"9b1f\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\"><strong class=\"od gs\">Improved Correlation<\/strong>: Inconsistent data formats can obscure valuable relationships in logs, hindering pattern detection and correlation. Normalization makes it easier to detect patterns and identify potential attacks or anomalies.<\/li>\n<li id=\"eb7e\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\"><strong class=\"od gs\">Better Productivity<\/strong>: Standardized field names and data formats simplify analysis and reduce complexity. Our researchers and incident response teams can focus on extracting insights without being stalled by data inconsistencies.<\/li>\n<li id=\"65ba\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\"><strong class=\"od gs\">Data Error Reduction<\/strong>: Consistent data representations help prevent errors from data misinterpretation, supporting accurate and reliable analysis. 
Error checking can also be built into the schema pipeline.<\/li>\n<li id=\"655a\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\"><strong class=\"od gs\">Data Portability<\/strong>: Standardized field names and formats promote seamless collaboration among different security teams, enabling efficient data sharing and enhancing incident response and threat mitigation strategies.<\/li>\n<\/ul>\n<p id=\"ee52\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Designing Base Classes<\/strong><\/p>\n<p id=\"eede\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">After conducting comprehensive research on Adobe\u2019s in-house datasets and industry-standard security stack tools, our team developed an extensible modular framework for defining variables within the data schema.<\/p>\n<p id=\"5b11\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">In our approach, we researched several security data sources to identify common security challenges that enterprises face, such as cloud misconfigurations, container security issues, network attack patterns, and identity misuse. Once we pinpointed the key focus areas, we designed <em class=\"pf\">base classes<\/em>, logical constructs tailored to these specific needs. These base classes serve as the building blocks, each evolving to encompass multiple objects, variables, or column names. The flexibility of this approach allows seamless transformation of logs, adapting them to various data lakes and SIEM solutions. 
In the following table, we present a selection of sample base classes:<\/p>\n<figure class=\"nq nr ns nt nu nv nn no paragraph-image\">\n<div role=\"button\" tabindex=\"0\" class=\"nw nx fg ny bg nz\">\n<div class=\"nn no pg\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*V989s2K4n058o6nAKEK2Ow.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" type=\"image\/webp\"\/><source data-testid=\"og\" srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*V989s2K4n058o6nAKEK2Ow.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*V989s2K4n058o6nAKEK2Ow.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*V989s2K4n058o6nAKEK2Ow.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*V989s2K4n058o6nAKEK2Ow.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*V989s2K4n058o6nAKEK2Ow.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*V989s2K4n058o6nAKEK2Ow.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*V989s2K4n058o6nAKEK2Ow.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"\/><img loading=\"lazy\" alt=\"\" class=\"bg mv oa c\" width=\"700\" height=\"304\" loading=\"lazy\" role=\"presentation\"\/><\/picture><\/div>\n<\/div><figcaption class=\"ph fc pi nn no pj pk be b bf z dt\">Base Class Definitions<\/figcaption><\/figure>\n<p id=\"8a73\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Here is a logical illustration of what a base class might look like:<\/p>\n<pre class=\"nq nr ns nt nu pl pm pn bo po ba bj\"><span id=\"8eb6\" class=\"pp pq gr pm b bf pr ps l pt pu\">{ <br\/>\"source\": { <br\/>\"number\": \"07\", <br\/>\"organizationName\": \"adobe\", <br\/>\"domain\": \"example.com\", <br\/>\"ip\": \"120.105.139.131\", <br\/>\"user\": { <br\/>\"fullName\": \"ABC XYZ\", <br\/>\"id\": \"00uxfurz4oTUMQKJPJUXPCFDRTT\" <br\/>}, <br\/>\"customField\": \"Custom Value\", <br\/>\"customField1\": \"Custom Value 1\", <br\/>\"custom_Field2\": [ <br\/>\"Value3\", <br\/>\"Value4\" <br\/>] <br\/>} <br\/>}<\/span><\/pre>\n<p id=\"bb07\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">As you can see above, it is possible to define custom variables within a specific class. 
We can also add metadata, such as data type, timestamp, log source, and data storage location, to make the data easier to understand and to aid further analysis.<\/p>\n<p id=\"a6a0\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Establishing Field Nomenclature<\/strong><\/p>\n<p id=\"1027\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Field nomenclature is vital in designing a security schema. In essence, field nomenclature refers to the naming conventions used for fields in a database or data lake. Consistent and meaningful field names are essential for effective correlation and analysis.<\/p>\n<p id=\"a689\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Here are some field nomenclature best practices that we follow at Adobe:<\/p>\n<ul class=\"\">\n<li id=\"5481\" class=\"ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow ox oy oz bj\">Field names should not start with a number or symbol.<\/li>\n<li id=\"3472\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">Field names should be unique within a table.<\/li>\n<li id=\"c301\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">Every field name should start with a lowercase letter; multi-word names should use camelCase. 
(Example: <em class=\"pf\">full name \u2192 fullName<\/em>)<\/li>\n<li id=\"e9fd\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">Any table-specific field names that don\u2019t map directly to base fields should follow the syntax: <em class=\"pf\">&lt;table_name&gt;_&lt;field_name&gt;<\/em> (Example: <em class=\"pf\">dns_account_id<\/em>)<\/li>\n<li id=\"cb80\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">If fields have sublevels, the flattening should follow this syntax: <em class=\"pf\">source.user.fullName \u2192 source_user_fullName<\/em>. Here, source is the top-level class name, user is a first-level subfield of source, and fullName is a second-level subfield of source (a first-level subfield of the user struct).<\/li>\n<li id=\"9f3f\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">Fields may (or may not) have sublevels. Non-nested field: <em class=\"pf\">source_number<\/em>. Nested field: <em class=\"pf\">source_user_fullName<\/em><\/li>\n<li id=\"dd34\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow ox oy oz bj\">When transforming column names, the object should be flattened within a base class using an underscore. 
(Example: <em class=\"pf\">source.number \u2192 source_number<\/em>)<\/li>\n<\/ul>\n<p id=\"22d8\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Field Mapping to Data Source<\/strong><\/p>\n<p id=\"698a\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">In the table below, we\u2019ve outlined a sample field-level mapping across various datasets, all sharing a common definition.<\/p>\n<figure class=\"nq nr ns nt nu nv nn no paragraph-image\">\n<div role=\"button\" tabindex=\"0\" class=\"nw nx fg ny bg nz\">\n<div class=\"nn no pv\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*2_GkY8x0LMTuet0YqB1UvQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" type=\"image\/webp\"\/><source data-testid=\"og\" 
srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*2_GkY8x0LMTuet0YqB1UvQ.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*2_GkY8x0LMTuet0YqB1UvQ.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*2_GkY8x0LMTuet0YqB1UvQ.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*2_GkY8x0LMTuet0YqB1UvQ.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*2_GkY8x0LMTuet0YqB1UvQ.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*2_GkY8x0LMTuet0YqB1UvQ.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*2_GkY8x0LMTuet0YqB1UvQ.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"\/><img loading=\"lazy\" alt=\"\" class=\"bg mv oa c\" width=\"700\" height=\"284\" loading=\"lazy\" role=\"presentation\"\/><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"02c7\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Schematizing and Flattening Data<\/strong><\/p>\n<figure class=\"nq nr ns nt nu nv nn no paragraph-image\">\n<div role=\"button\" tabindex=\"0\" class=\"nw nx fg ny bg nz\">\n<div class=\"nn no pw\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 750w, 
https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*gTbrs81vE6sitpGnSnz20w.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" type=\"image\/webp\"\/><source data-testid=\"og\" srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*gTbrs81vE6sitpGnSnz20w.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*gTbrs81vE6sitpGnSnz20w.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*gTbrs81vE6sitpGnSnz20w.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*gTbrs81vE6sitpGnSnz20w.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*gTbrs81vE6sitpGnSnz20w.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*gTbrs81vE6sitpGnSnz20w.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*gTbrs81vE6sitpGnSnz20w.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 
100vw, 700px\"\/><img loading=\"lazy\" alt=\"\" class=\"bg mv oa c\" width=\"700\" height=\"239\" loading=\"lazy\" role=\"presentation\"\/><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"1821\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">In this approach, the field normalization process consists of two main steps: <strong class=\"od gs\">Schematization<\/strong> and <strong class=\"od gs\">Flattening<\/strong>.<\/p>\n<ol class=\"\">\n<li id=\"9d1d\" class=\"ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow px oy oz bj\"><strong class=\"od gs\">Schematization<\/strong> involves identifying and extracting critical fields or variables of interest from raw logs. These fields are then transformed into a structured base-field format, in which each field is mapped to its corresponding variable in the raw logs.<\/li>\n<li id=\"d178\" class=\"ob oc gr od b hp pa of og hs pb oi oj ok pc om on oo pd oq or os pe ou ov ow px oy oz bj\"><strong class=\"od gs\">Flattening<\/strong> takes the schematized data and further simplifies its structure by eliminating nested fields. Each variable within a base field is extracted and given its own unique field name, following the nomenclature defined above. This removes the hierarchical structure of the data and makes it easier for end users to query.<\/li>\n<\/ol>\n<p id=\"adf6\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Together, schematizing and flattening normalize the data, ensuring consistency in field names and formats. 
This prepared data enables deep analysis, uncovering insights and patterns related to security incidents or anomalies.<\/p>\n<p id=\"f121\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Table Structure in the Data Lake<\/strong><\/p>\n<p id=\"dc2f\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Once the schema is ready, you can divide the tables by event aggregation; this division should be data-source-agnostic. All the tables will have common base fields, along with any table-specific fields.<\/p>\n<p id=\"2fb2\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Example<\/strong>: All the network logs, VPC flow logs, Azure logs, and Google Cloud Platform logs can reside in a common table. Because the fields are mapped to the base fields, we can write complex correlations across them.<\/p>\n<figure class=\"nq nr ns nt nu nv nn no paragraph-image\">\n<div role=\"button\" tabindex=\"0\" class=\"nw nx fg ny bg nz\">\n<div class=\"nn no py\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and 
(max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" type=\"image\/webp\"\/><source data-testid=\"og\" srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*wybX9Xw_9qYZ2EIzGD8O3Q.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"\/><img loading=\"lazy\" alt=\"\" class=\"bg mv oa c\" width=\"700\" height=\"261\" loading=\"lazy\" role=\"presentation\"\/><\/picture><\/div>\n<\/div>\n<\/figure>\n<p id=\"b4c1\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Writing Correlations on the Data Lake<\/strong><\/p>\n<p id=\"67d2\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo 
op oq or os ot ou ov ow gk bj\">Once you\u2019ve loaded data into tables, you can query it using the normalized field names. From the user\u2019s perspective, only the network events are queried; internally, all possible network sources are schematized into a master table.<\/p>\n<p id=\"fc15\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">The following example query helps identify network events from a given source IP, regardless of cloud or datacenter origin, focusing on SSH events (port 22) that have been BLOCKED.<\/p>\n<pre class=\"nq nr ns nt nu pl pm pn bo po ba bj\"><span id=\"eef6\" class=\"pp pq gr pm b bf pr ps l pt pu\">SELECT * FROM network_master_table <br\/>WHERE event_time BETWEEN '2023-12-01' AND '2023-12-10' <br\/>AND source_ip = '10.0.0.10' AND source_port = '22' AND event_outcome = 'BLOCKED'<\/span><\/pre>\n<p id=\"16df\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\"><strong class=\"od gs\">Conclusion<\/strong><\/p>\n<p id=\"5b05\" class=\"pw-post-body-paragraph ob oc gr od b hp oe of og hs oh oi oj ok ol om on oo op oq or os ot ou ov ow gk bj\">Field normalization is crucial in enhancing security analysis by addressing the diversity in field names and data formats commonly found within security logs. By improving data integration, enhancing correlation, and reducing errors, field normalization facilitates more efficient data sharing among security teams. 
In turn, utilizing base classes, employing techniques such as schematization and flattening, and transforming raw logs all help enable more effective security analytics.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/blog.developer.adobe.com\/designing-effective-data-security-schemas-d33af7b205e0?source=rss----9342990108af---4\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cybersecurity tools generate a vast amount of logging data, each with its own unique data definitions and field names. Meanwhile, enterprises often collect logs from different sources, such as network devices, servers, applications, and user activity. These logs can be massive in size, reaching petabytes or even exabytes, and they often contain a wide variety [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":8572,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[19],"tags":[],"class_list":["post-8571","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-graphics-design"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/posts\/8571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/comments?post=8571"}],"version-history":[{"count":0,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/posts\/8571\/revisions"}],"wp:featuredmedia":[{"embeddable":tru
e,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/media\/8572"}],"wp:attachment":[{"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/media?parent=8571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/categories?post=8571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.satup.xyz\/index.php\/wp-json\/wp\/v2\/tags?post=8571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}