October 24, 2018

Input Validation - OWASP Java HTML Sanitizer

Why Sanitize?

Good input validation stops attackers from getting into your application through the UI. The approaches to input validation, ordered from strongest to weakest, are:

  1. White List. Accept only known good characters and send an error back to the user for anything else.
  2. Sanitizing. Accept only known good characters, silently remove the others, and proceed.
  3. Escape. Accept only known good characters, escape the others, and proceed.
  4. Black List. Accept everything except predefined bad characters.

https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet
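
To make the difference between the four approaches concrete, here is a minimal sketch in plain Java. It is not taken from the OWASP cheat sheet or from any library. The class name InputStrategies, the "known good" set (ASCII letters, digits, and space), the black list ("<" and ">"), and the use of numeric character references for escaping are all assumptions chosen for illustration.

```java
import java.util.Optional;

// Hypothetical sketch: the class name, the allowed set, and the black list
// are illustrative assumptions, not part of any OWASP API.
class InputStrategies {

    // Assumed "known good" set: ASCII letters, digits, and the space character.
    static boolean isAllowed(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
                || (cp >= '0' && cp <= '9') || cp == ' ';
    }

    // 1. White list: reject the whole input if any character is not allowed.
    //    An empty result means the caller should send an error to the user.
    public static Optional<String> whiteList(String input) {
        for (int cp : input.codePoints().toArray()) {
            if (!isAllowed(cp)) {
                return Optional.empty();
            }
        }
        return Optional.of(input);
    }

    // 2. Sanitizing: silently drop every character that is not allowed.
    public static String sanitize(String input) {
        StringBuilder out = new StringBuilder();
        for (int cp : input.codePoints().toArray()) {
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // 3. Escape: keep allowed characters, replace the rest with a numeric
    //    character reference such as &#60; for '<'.
    public static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (int cp : input.codePoints().toArray()) {
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    // 4. Black list: remove only the predefined bad characters ('<' and '>'
    //    in this sketch) and let everything else through.
    public static String blackList(String input) {
        return input.replace("<", "").replace(">", "");
    }

    public static void main(String[] args) {
        String input = "a<b>&c";
        System.out.println(whiteList(input).isPresent()); // false
        System.out.println(sanitize(input));              // abc
        System.out.println(escape(input));                // a&#60;b&#62;&#38;c
        System.out.println(blackList(input));             // ab&c
    }
}
```

Note how the black list lets the "&" through because nobody thought to list it, which is why it sits at the bottom of the ranking.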

OWASP Java HTML Sanitizer

OWASP has a free sanitizing library that has been tested thoroughly. https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project


<dependency>
 <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
 <artifactId>owasp-java-html-sanitizer</artifactId>
 <version>20180219.1</version>
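 <!-- test scope limits the library to test code; remove this line to use it in application code -->
 <scope>test</scope>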
</dependency>

import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

String untrustedHTML = "<html><p>hello</p></html>";

// Sanitizers.FORMATTING allows common formatting elements, currently these
// "b", "i", "font", "s", "u", "o", "sup", "sub", "ins", "del", "strong",
// "strike", "tt", "code", "big", "small", "br", "span", "em"

// Sanitizers.BLOCKS allows common blocks elements, currently these
// "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li",
// "blockquote"

// Combine the two predefined policies: inline formatting plus block elements.
PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS);
String safeHTML = policy.sanitize(untrustedHTML);
System.out.println("safeHTML='" + safeHTML + "'");

Output:

safeHTML='<p>hello</p>'
