Thoughts on cleaning text by modifying rx_ephox.js

All

Are there any caveats to cleaning user input by modifying rx_ephox.js? The file is:

/Rhythmyx_6_5_2/rx_resources/ephox/rx_ephox.js

function getEphoxDocument(sEditorName)
{
    var doc = _editLiveInstance.getContentForEditableSection(
        _getSectionDivNameByFieldName(sEditorName));
    if(rxIsEditLiveDocEmpty(doc))
        return "";

    //replace user input html
    doc = doc.replace("color=blue","");

    alert("bp 7: getEphoxDocument:" + doc);

    return doc;
}
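
One caveat with the replace call above: when the first argument is a plain string, JavaScript's String.prototype.replace only replaces the first occurrence, and the match is case-sensitive. Below is a minimal sketch that uses a global, case-insensitive regex instead. stripBlueColor is a made-up helper name, and the pattern assumes the attribute appears as color=blue with optional quotes.

```javascript
// Hypothetical helper (not part of rx_ephox.js): removes every
// color=blue attribute, whether unquoted, single- or double-quoted.
function stripBlueColor(doc) {
    // \1 requires the closing quote to match the opening one (or none);
    // the lookahead avoids touching values such as color=bluegreen.
    return doc.replace(/\s*color=(["']?)blue\1(?=[\s>\/])/gi, "");
}
```

Like the original line, this also matches the text color=blue outside of a tag, so it is a heuristic rather than a real HTML parse.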
//======= a useful regex for replacing tags:
//var re = new RegExp('<'+tag+'[^><]*>|<.'+tag+'[^><]*>','g');
//obj.value = obj.value.replace(re,'');

// from http://tim.mackey.ie/CleanWordHTMLUsingRegularExpressions.aspx
/// <summary>
/// Removes all FONT and SPAN tags, and all Class and Style attributes.
/// Designed to get rid of non-standard Microsoft Word HTML tags.
/// </summary>
//private string CleanHtml(string html)
//{
// // start by completely removing all unwanted tags
// html = Regex.Replace(html, @"<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>", "", RegexOptions.IgnoreCase);
// // then run another pass over the html (twice), removing unwanted attributes
// html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
// html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
// return html;
//}
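
Since rx_ephox.js is JavaScript, the C# routine above would have to be ported before it could be called from getEphoxDocument. Here is a minimal sketch of what that port might look like, assuming the two regexes carry over unchanged. cleanWordHtml is a made-up name, not something that ships with Rhythmyx or EditLive.

```javascript
// Hypothetical JavaScript port of the commented-out C# CleanHtml routine.
function cleanWordHtml(html) {
    // Start by completely removing all unwanted tags (opening and closing).
    html = html.replace(/<\/?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>/gi, "");

    // Each pass strips at most one unwanted attribute per tag, which is
    // why the original runs it twice.
    var attr = /<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|"[^"]*"|[^\s>]+)([^>]*)>/gi;
    html = html.replace(attr, "<$1$2>");
    html = html.replace(attr, "<$1$2>");
    return html;
}
```

It could then be called as doc = cleanWordHtml(doc); in place of the single replace. Tags carrying more than two unwanted attributes would need further passes, and the stripped attributes leave stray spaces behind inside the tag.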

Here’s my updated function (attached). It does some simple tag replacements. If it detects certain strings, it strips every tag except the p tag.
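
For anyone who can't open the attachment, the general idea can be sketched roughly like this. It is an illustration of the approach, not the attached code: the marker strings and all three function names are assumptions made up for the example.

```javascript
// Hypothetical marker strings suggesting content pasted from Word;
// the attached function may check for different ones.
var WORD_MARKERS = ["mso-", "MsoNormal", "<o:p"];

function looksLikeWordHtml(html) {
    for (var i = 0; i < WORD_MARKERS.length; i++) {
        if (html.indexOf(WORD_MARKERS[i]) !== -1) return true;
    }
    return false;
}

// Drops every tag except <p> and </p>; p tags also lose their attributes.
function keepOnlyParagraphs(html) {
    return html.replace(/<(\/?)([a-zA-Z][\w:-]*)[^>]*>/g, function (match, slash, name) {
        return name.toLowerCase() === "p" ? "<" + slash + "p>" : "";
    });
}

function cleanIfWord(html) {
    return looksLikeWordHtml(html) ? keepOnlyParagraphs(html) : html;
}
```

This sketch leaves HTML comments (including Word's conditional comments) untouched, so those would need a separate replace.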

Note that rx_ephox.js is overwritten on upgrade, so you would need to back up your changes before upgrading and then merge them back into the file afterward.

Newest version attached.