How many pieces are there in the UTF-8 puzzle?

I'm blogging this partly as a reminder to myself, since I always forget one piece.

  • You need to specify that your database tables use UTF-8.
  • You need to use setEncoding() on the form and URL scopes to set them to UTF-8 (in Application.cfc).
  • You need to set the charset of the output content type to UTF-8.
  • You need to set pageEncoding to UTF-8 with cfprocessingdirective (on any CFML page that needs it, so you might as well set it on all of them).
  • Your datasource setup, at least for MySQL, must have useUnicode=true&characterEncoding=utf-8 in the Connection String textarea under Advanced Settings.
That last piece is the one I tend to forget when I set up a new server. The rest lives in code, so once it's in place it works on any server, but that connection string gets me (almost) every time!
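Pulled together, the code-side pieces might look something like the sketch below. It assumes a tag-based Application.cfc and a MySQL datasource; the application name "utf8Demo", the datasource name "myDSN", the template name "setup.cfm" and the "messages" table are hypothetical placeholders. Each commented section is a separate file. The connection string setting still has to be entered in the ColdFusion Administrator, so it can't appear here.

```cfml
<!--- File: Application.cfc (sketch; "utf8Demo" is a placeholder name) --->
<cfcomponent output="false">
    <cfset this.name = "utf8Demo">

    <cffunction name="onRequestStart" returnType="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true">
        <!--- Decode incoming form and URL values as UTF-8 --->
        <cfset setEncoding("form", "utf-8")>
        <cfset setEncoding("url", "utf-8")>
        <!--- Send every response with a UTF-8 charset in its Content-Type header --->
        <cfcontent type="text/html; charset=utf-8">
        <cfreturn true>
    </cffunction>
</cfcomponent>

<!--- File: setup.cfm (hypothetical one-off template) --->
<cfprocessingdirective pageEncoding="utf-8">
<!--- The directive above only affects the file it is in, so it goes at the top
      of every template. "myDSN" and "messages" are hypothetical names; the
      datasource still needs useUnicode=true&characterEncoding=utf-8 in its
      connection string in the ColdFusion Administrator. --->
<cfquery datasource="myDSN">
    CREATE TABLE messages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        body TEXT
    ) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
</cfquery>
```

Putting setEncoding() and cfcontent in onRequestStart means they run once per request for the whole application, whereas the cfprocessingdirective has to be repeated per file because it tells the compiler how to read that file's source.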

Comments
What about collation on the MySQL table? Any special needs?
# Posted By CF_MAN | 3/9/09 3:18 PM
Thanks for the concise reminder! That last one is really helpful.
# Posted By Dan Sorensen | 3/9/09 3:19 PM
Great reminder. We do a lot of translations as part of our application. With MS SQL, you need to go to the data source, click "Show Advanced Settings" and check "Enable High ASCII characters and Unicode for data sources configured for non-Latin characters", or you end up with ???????? in the DB.
# Posted By Paul | 3/9/09 4:14 PM
setEncoding() Usage:

"Use this function when the character encoding of the input to a form or the character encoding of a URL is *not* in UTF-8 encoding."

So why bother including that as one of your points? :)

pageEncoding in cfprocessingdirective is needed when the .cfm file has no BOM. Eclipse/CFEclipse doesn't save a BOM, but good old Windows Notepad does. :)
# Posted By Henry Ho | 3/9/09 4:53 PM
Great tips, thanks. I was wondering why there were some weird black boxes on my pages. I will try this.
# Posted By Raul Riera | 3/9/09 5:29 PM
Weird black boxes: make sure your Windows machine has the necessary font pack installed as well. Asian fonts are not installed by default.
# Posted By Henry Ho | 3/10/09 4:21 PM
BOM / processingdirective / Eclipse: yes, this is a pain.

Adding processingdirectives to every template to ensure the compiler processes them as UTF-8 just seems silly to me.

Dreamweaver can also save files with a BOM. Sadly, Eclipse does not, and I guess the new CF IDE (Bolt), which is built on Eclipse, will not either.

Let's hope Adobe just makes the CF compiler use Unicode as the default.
# Posted By johans | 3/10/09 11:13 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.9.2.002.