Ephox - cleaning non-Word pasted text

Has anyone else found a way to clean text that is pasted in from MS Outlook or copied from another webpage (i.e. not cleaned by the MS Word filter in Ephox)?

I am trying to remove the following opening and closing tags (and leave the text between the opening and closing tags) as I did in my eWebEditPro configuration file:
u, span, font
and ideally the following attributes of any tag:
dir, class

Thanks

Hi Brenda

If you look in the ephox config file:

RXROOT/rx_resources/ephox/elj_config.xml

The two options below need to be set to clean.


   <wordImport styleOption="clean"/>
   <htmlImport styleOption="clean"/>

Cheers
James

Thanks, James. I do have those set in my config file.

The problem is that content copied from Outlook, email, or other non-Word editors (RTF files?) still has font, span, etc. tags, and content copied from Word or webpages still has underline tags, which interfere with our stylesheets.

Brenda,

If you’re looking for a manual solution, I usually find that pasting the text into Notepad (or another simple text editor) does an effective job of stripping extraneous formatting. I then copy the stripped-down text from the simple text editor and paste it to the target location. I use this technique frequently.

RLJII

I appreciate that there are workarounds, but I’m looking for a more automated solution with end results similar to what eWebEditPro’s cleaning accomplishes.

Has anyone done anything with the “paste filter” that I’ve seen mentioned on both this forum and Ephox’s? I’m not familiar enough with regular expressions to write my own in this case.

I have over 50 end-users and I don’t think it’s fair to start asking them to copy from one program (the original) to another (i.e. Notepad) and then to another (RX CE). End-users expect to see enhanced functionality when we “upgrade” our software rather than the addition of another manual step. Moreover, if end-users occasionally miss that step after we switch to Ephox, the consistent look and feel of our site is compromised.

Just food for thought.

Google will convert Word to HTML or text:

I was thinking about writing a Java service to automate the whole process. Then maybe I could hook it into the CMS somehow.
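As a rough idea of what the cleaning step in such a service might look like, here is a minimal sketch. It assumes the goal is the one Brenda described: drop u, span and font tags but keep the text inside them, and strip dir and class attributes from any remaining tag. PasteCleaner and clean are names I made up for illustration, not part of Rhythmyx or Ephox, and regexes are not a real HTML parser, so treat this as a starting point only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class PasteCleaner {
    // Opening or closing <u>, <span>, <font> tags with any attributes.
    // The \b keeps tags such as <ul> from matching the "u" alternative.
    private static final Pattern UNWANTED_TAGS =
        Pattern.compile("</?(?:u|span|font)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Any opening tag, so attribute removal only happens inside tags.
    private static final Pattern ANY_TAG = Pattern.compile("<[a-zA-Z][^>]*>");

    // A dir or class attribute: double-quoted, single-quoted, or unquoted value.
    private static final Pattern UNWANTED_ATTRS = Pattern.compile(
        "\\s+(?:dir|class)\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>]+)",
        Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        // Step 1: remove the unwanted tags, leaving their inner text in place.
        String withoutTags = UNWANTED_TAGS.matcher(html).replaceAll("");

        // Step 2: rewrite each remaining tag without dir/class attributes.
        Matcher m = ANY_TAG.matcher(withoutTags);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = UNWANTED_ATTRS.matcher(m.group()).replaceAll("");
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling PasteCleaner.clean on a pasted fragment would return the cleaned markup as a string. How this gets hooked into the CMS is still an open question.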

This table holds the callout and body HTML fragments, so a regex could be run against it to strip out certain HTML tags:

http://forum.percussion.com/showthread.php?t=858

This utility might be useful: http://www.supershareware.com/info/detagger.html

Also, we could probably create a JSP page that calls some JavaScript to remove tags, like this:
http://javascript.internet.com/snippets/remove-html-tags.html