Tab "& nbsp;" becomes & #160; characters and is published as Â Â Â Â Â Â

picaza1 · May 26, 2011, 8:17am

(ignore extra spaces …)

Problem 1:
a. If I have & n b s p ; entered in Ephox Live edit “code view”, and switch to “design view”, then each & n b s p ; is auto-converted to & # 1 6 0 ;
b. If I have <td></td> anywhere it also auto-inserts a & # 1 6 0 ;.
c. Apparently some folks have noticed this with tabs as well (maybe from pasting MS Office docs, which I am not doing).

If I could stop this behavior, I could quit here. Has anyone come across this and found a solution? I have checked the Ephox configuration located in [rx root]\rx_resources\ephox\elj_config.xml, which does have the meta tag containing:
content="text/html; charset=UTF-8" http-equiv="Content-Type"

Someone will surely point me to Ephox sites, but I am not finding anything there either. And once the conversion and/or addition of characters to & # 1 6 0 ; happens, the next problem is on publishing the page.

Problem 2: “& # 1 6 0 ;” should be a perfectly OK character in my final code (.cfm, .htm, .asp). However, the act of Rhythmyx publishing converts the & # 1 6 0 ; to Â. This is in the final static source: it has nothing to do with which browser is used, etc.; it is now part of the published document. So perhaps there are some settings for Rhythmyx publishing somewhere that determine the character set and translate the Ephox output (peppered with & # 1 6 0 ; 's) into final web docs peppered with Â 's.

I assume Problem 1 is Ephox (settings?) and Problem 2 is Rhythmyx (settings?)

Thanks for any hints, ideas, and even just plain sympathy.

Nick_Clark · May 26, 2011, 2:17pm

What character encoding is your template and/or global template set to? It should be set to UTF-8.

Check this line at the template level and global template level if you use one:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Let me know if this fixes your issue.

picaza1 · May 27, 2011, 9:09am

Yes, it is in the global template, in the head section. Sadly, this would only be something the browser uses for displaying. All of the problems happen beforehand, during the “publishing” of the page. Once those silly Â’s are there, there is nothing in the global template that can help.

Aside: while “previewing” pages (not published), I get NOTHING where the & # 1 6 0 ; 's are.

Rushing · May 27, 2011, 11:07am

This is not the right section to look in.

Look on the General tab, middle-right, for the drop-down box labeled “Character Set:”. It should have “UTF-8” selected.

picaza1 · May 31, 2011, 9:33am

Yes, that has been selected. We are thinking that the XML namespaces that are automatically included in the Ephox live edit are the problem, but we are having a difficult time tracking down how to eliminate them. Auto-inserted into our live edit is:

<div xmlns:st2="urn:www.microsoft.com/smarttags2" xmlns:st1="urn:www.microsoft.com/smarttags" xmlns:o="urn:www.microsoft.com/office" class="rxbodyfield" xmlns:x="urn:www.microsoft.com/excel" xmlns:w="urn:www.microsoft.com/word">

jitendra · May 31, 2011, 10:09am

I’m curious, when you preview the page template with a 301 context (and preview=public), do you see the strangeness? Or is it only when the item actually gets published?

picaza1 · May 31, 2011, 11:14am

See start of thread. Two separate problems that seem to add up to trouble.

Publishing does take the & #160; to the Â.

jitendra · May 31, 2011, 12:01pm

Right, you mention that the character is part of the “final static source” on publish, and you also mention that when previewing the page you get “NOTHING” where the spaces are. By changing the context and the publish variables, you are essentially “seeing” what the publisher sees. I am just trying to determine if there is something weird going on in that stage (as opposed to a generic preview).

picaza1 · May 31, 2011, 12:25pm

It does seem something weird is going on in that stage (the publishing), as the & #160; is completely replaced by the Â (which I suppose means somewhere it is “misunderstood”). This happens ONLY in live-edit/Ephox boxes. In other parts of the page, neither “& #160;” nor “& nbsp;” is replaced.

picaza1 · May 31, 2011, 2:26pm

example: 
Code in template page, NOT in ephox:
	#field_if_set("<h1>" "displaytitle" "& #160;& nbsp;</h1>" )

Final published .cfm:

     <h1>MY DISPLAY TITLE& #160;&  nbsp;</h1>

That comes through … does this help limit where to look for problems?

jitendra · May 31, 2011, 3:06pm

Sorry, I must be missing something here, and perhaps my original question wasn’t fleshed out. When you preview the template regularly (right-click on the content item > Preview > P - whatever, with the URL parameters including sys_context=0 and sys_itemfilter=preview), you’re saying that the source of that page doesn’t show the & #160; or the space in Ephox fields? Also, when you change the context in the URL to 301 and sys_itemfilter to publish, do you get the same behavior (nothing in the source view of that page for those encoded spaces)?

The next thing I would check is how those are being encoded in the database (i.e., whether the DB is UTF-8 and whether the Ephox field text is stored as you expect, with those spaces and not Â).

picaza1 · June 1, 2011, 8:56am

Yes, that is, it shows a true empty space, as in

<td> </td> as opposed to <td></td>

with no character strings that imply a space.

I don’t know what you mean about the 301; I will try that. We will check the DB. However, if I repeatedly access the field (from this and other computers) I continue to get the & #160;, so I must assume for now that it is not being stored in the DB as Â.

Part one of the original post still throws me: if I could eliminate these & #160; items altogether, I might at least be able to move on. I am starting to think that has to do with the Ephox settings on the local computer, as opposed to the server where all else lives.

picaza1 · June 2, 2011, 11:12am

Getting there…

Part one: in these files:
Z:\Rhythmyx\rx_resources\ephox\elj_config.xml AND
Z:\Rhythmyx\sys_resources\ephox\elj_config.xml

I have now set:
outputXHTML="false"

Now I keep my & nbsp; while in the editor.

But they are still stripped out upon publish, AND REPLACED with some little “space” that I can see in MS Word (when I view hidden characters) but that the browsers do not recognize.

Perhaps that is what was going on all along, and whether it starts out as & nbsp; or as & #160; is moot.

Why are these replaced? How can I stop that? I want my & nbsp; and/or “normal” spaces in there.

picaza1 · June 2, 2011, 1:42pm

Workaround:

Since the act of publishing changes all of the & nbsp; to the strange MS Word space, I just reverse that afterwards. For example, we have a live edit field called cibody:

## Read the live edit field as a string
#set($add_code = $sys.item.getProperty("rx:cibody").String)
## Swap every "funny space" back to the entity
#set($add_code = $add_code.replaceAll("PASTE THE FUNNY SPACE HERE", '& nbsp;'))
$add_code

You may need to copy and paste that “funny space” yourself if you have had this problem. I can’t paste it to this forum successfully.
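
One way to find out what that “funny space” actually is: print the code point of every non-ASCII character in the published text. This is a minimal Python sketch, not Velocity; the sample string and the helper name describe_non_ascii are made up for illustration.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Hypothetical sample: a table cell holding one literal non-breaking space.
sample = "<td>\u00a0</td>"
for ch, code, name in describe_non_ascii(sample):
    print(repr(ch), code, name)
```

Once the code point is known, the character can be written as an escape (for example \u00A0) instead of being pasted.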

Rushing · June 2, 2011, 2:31pm

You’re seeing the actual non-breaking space character being output. You can try this (untested):

#set($result = $sys.item.getProperty('rx:XXX').String.replace("\u00A0"," "))

… or something like it.

picaza1 · June 6, 2011, 9:42am

I have a long list of characters I am replacing; I just didn’t show them all: copyright, trademark, some quotes, etc. So I have repeated “String.replace” calls.
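
A chain of repeated replace calls can in principle be collapsed into one table-driven pass. The sketch below is Python rather than Velocity and relies on the standard library’s HTML entity table; the function name to_named_entities and the sample text are hypothetical, and whether the same approach can be wired into a Rhythmyx template is an assumption that has not been checked here.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace each non-ASCII character that has a named HTML entity with that entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            # ASCII, and characters without a named entity, pass through unchanged.
            out.append(ch)
    return "".join(out)


# Hypothetical sample: copyright sign, non-breaking space, trademark, curly quotes.
print(to_named_entities("\u00a9 2011\u00a0Example\u2122 \u201cquoted\u201d"))
```

Restricting the pass to code points above 127 leaves markup characters such as < and & untouched, so existing tags and entities in the field are not double-escaped.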

Stephen_Bolton · June 7, 2011, 2:27pm

There are many misunderstandings about character encodings that cause these sorts of issues, and fixes applied without understanding the core reason often cause more problems than they solve. The problem I think we are seeing here is that &nbsp; and &#160; both represent the actual non-breaking space character in XML. When a file containing these is loaded into memory as a DOM object, both end up as the Unicode non-breaking space character. After this point there is no difference between these two representations and the non-breaking space character included directly in a UTF-8 document. Also, &nbsp; is only valid if it is combined with an HTML DTD that defines this entity. It is the process of converting this internal DOM object back to XML or HTML that decides what form the character takes in the output.
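
The equivalence described above can be checked with a short Python sketch. It uses Python’s standard parsers as a stand-in; that Ephox and Rhythmyx behave the same way internally is an assumption, not something confirmed here.

```python
import html
import xml.etree.ElementTree as ET

NBSP = "\u00a0"

# In HTML, both spellings decode to the same single character.
assert html.unescape("&nbsp;") == NBSP
assert html.unescape("&#160;") == NBSP

# An XML parser accepts the numeric reference without any DTD...
cell = ET.fromstring("<td>&#160;</td>")
assert cell.text == NBSP

# ...but rejects &nbsp; because no DTD has defined that entity.
try:
    ET.fromstring("<td>&nbsp;</td>")
    nbsp_parsed = True
except ET.ParseError:
    nbsp_parsed = False

# The serializer, not the source text, decides the output form.
as_ascii = ET.tostring(cell, encoding="us-ascii")  # numeric character reference
as_utf8 = ET.tostring(cell, encoding="utf-8")      # raw bytes 0xC2 0xA0
print(nbsp_parsed, as_ascii, as_utf8)
```

The same in-memory element serializes to a numeric reference when the target encoding is ASCII and to two raw bytes when the target is UTF-8.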

What is happening here is that the characters in Ephox are converted into an XML DOM object internally, and when the page is rendered the document is output as UTF-8, where the character does not need to be turned into an entity. In most cases the default web server setup is configured to read HTML files using the single-byte character set ISO-8859-1 or Microsoft’s equivalent. It therefore sees the two bytes required for the non-breaking space in UTF-8 as two separate characters, one of which is the Â you see. You can set up your web server to expect files to be UTF-8, or, if you are outputting JSP or ASP(X) files, you can add a header that tells the server the file is UTF-8. Most of the time it is best to use UTF-8 all the way through the system; that way you should not get into these issues. Alternatively, you can specify the encoding on the template to match the web server encoding, but you may then run into issues with characters that have no single-byte equivalent.
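
The byte-level effect can be reproduced in a few lines of Python. This is only an illustration of the encoding mismatch, assuming the reader of the file is a single-byte decoder; it does not involve Rhythmyx or any web server.

```python
NBSP = "\u00a0"

utf8_bytes = NBSP.encode("utf-8")          # two bytes: 0xC2 0xA0
misread = utf8_bytes.decode("iso-8859-1")  # each byte becomes its own character

print(utf8_bytes)  # b'\xc2\xa0'

# 0xC2 is "Â" in ISO-8859-1; 0xA0 is still a non-breaking space.
assert misread == "\u00c2\u00a0"

# Windows-1252 maps these two bytes the same way.
assert utf8_bytes.decode("cp1252") == misread

# Reading the same bytes as UTF-8 recovers the single character.
assert misread.encode("iso-8859-1").decode("utf-8") == NBSP
```

The published bytes are valid UTF-8 throughout; the Â only appears when something downstream decodes them with a single-byte character set.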

picaza1 · June 9, 2011, 3:17pm

We have UTF-8 entered every place we can find. Maybe there is a list of places this is set, set, and reset?

I can’t read all of your post, but we do need to keep & nbsp; placeholders across the board, as this is a content management system for many kinds of users (some of whom will style by adding extra spaces).

picaza1 · June 16, 2011, 10:58am

I have now set:
outputXHTML="true", because "false" created other problems:

http://forum.percussion.com/showthread.php?10651-EditLive-ephox-class-camelhumps-are-flatening

IC1 · June 30, 2011, 10:25am

Have you found the solution to this problem? I am having the same issue and no resolution. Thanks