Upgrade to commons-lang 3.0 to correctly escape characters from the Unicode Supplementary Multilingual Plane

Description

If a story file contains characters from the Unicode Supplementary Multilingual Plane (or any other code point from U+10000 upwards), those characters are not HTML-escaped correctly in the HTML report generated for the test run.

For example, given a story file with the following scenario:
------------
Scenario: Some scenario
Given some situation
When I do something
Then the result is 𐐆
------------
(The "dagger"-type character is actually code point U+10406 - see http://en.wikibooks.org/wiki/Unicode/Character_reference/10000-10FFF)

The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - these are the two surrogate-pair code points (used in UTF-16 only), so the character is rendered as gibberish in HTML. The escape should be &#66566;
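
The difference between the two escapes can be reproduced without commons-lang. The following is only a minimal sketch of the mechanism, not the actual StringEscapeUtils code: the class and method names are hypothetical, and it only handles non-ASCII characters (it does not escape <, > or &). It assumes the faulty behaviour comes from escaping each UTF-16 code unit on its own rather than each Unicode code point.

```java
// Hypothetical illustration class; not part of JBehave or commons-lang.
final class SupplementaryEscapeSketch {

    // Escapes every UTF-16 code unit separately. A supplementary character
    // is stored as two surrogate code units, so it yields two references.
    static String escapePerChar(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes every Unicode code point, so a surrogate pair is combined
    // into a single numeric character reference.
    static String escapePerCodePoint(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 1 or 2 chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String step = "Then the result is " + new String(Character.toChars(0x10406));
        System.out.println(escapePerChar(step));      // Then the result is &#55297;&#56326;
        System.out.println(escapePerCodePoint(step)); // Then the result is &#66566;
    }
}
```

Iterating with codePointAt and Character.charCount is what keeps the surrogate pair together; any escaper that walks the string one char at a time will split it.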

NOTE: This is NOT a bug in JBehave per se - the bug is in the StringEscapeUtils class of commons-lang. A related bug has already been raised (and fixed) in commons-lang: https://issues.apache.org/jira/browse/LANG-617. Although the commons-lang bug report relates to XML escaping rather than HTML escaping, it seems likely that the fix will cover both. Unfortunately, the fix is in commons-lang 3.0...

Assignee

Mauro Talevi

Reporter

Alistair Dutton

Labels

None

Components

Fix versions

Affects versions

Priority

Minor