Sending emails with strange characters to strange people, or at least people with non-ascii names.
Every few months at my employer Zest Software we have an evening of "eten en weten". Literally that is Dutch for "eating and knowing". Let's call it "Food for thought". We eat together and several of us hold presentations on subjects that are in some way related to our work. For example: Django, common Dutch language mistakes, how we use subversion, or local site hooks and the many interesting ways in which they can break when migrating from Plone 2.5 to 3. I managed to squeeze that last one into a lightning talk of a few minutes; you really don't want to know. ;-) (In case you do want to know, take a look at Products.Plone3Cleaners).
It is probably about time for a new "eten en weten", so it is probably also about time I uploaded my talk from last time about international emails. I talked about some basic terminology and what can go wrong, pointed to the python email module, and showed how to send a complete message, including some details that you can forget as long as you use the proper methods. After all, foreign languages are difficult enough already.
Two terms widely used are:
internationalization: i18n (an "i", 18 letters, an "n")
localization: l10n (an "l", 10 letters, an "n")
Roughly said, in a Plone context, internationalization is making sure the content or the UI is translated into several languages. Localization is making sure that 3 May 2009 is 05-03-2009 in the USA and 03.05.2009 in Germany.
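To make the localization half concrete, here is a minimal sketch in plain Python 3 (not Plone code). The `format_date` helper and its two hand-written patterns are made up for this illustration; a real application would take the patterns from its locale data instead.

```python
from datetime import date

# Hypothetical, hand-written patterns for two countries.
# Real applications get these from locale data, not from a dict.
DATE_PATTERNS = {
    'US': '%m-%d-%Y',  # month first
    'DE': '%d.%m.%Y',  # day first, dots as separators
}


def format_date(day, country):
    """Format a date using the pattern registered for the country."""
    return day.strftime(DATE_PATTERNS[country])


may_third = date(2009, 5, 3)
us_text = format_date(may_third, 'US')
de_text = format_date(may_third, 'DE')
```

The same date object gives two different strings; only the presentation changes, not the data.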
These two terms are not really the focus here though. The point is: how do you make sure that an email sent from Plone (or any python application really, if you ignore some details) with a Chinese name as From address, a Japanese name as To address, a Russian Subject and a Korean body text is delivered without errors?
Now do not think: "I live and work in America, I only need ascii." Don't you have Spanish colleagues? Some friends from your year abroad at that French university? A few Chinese clients? You could use only ascii, but you might regret that:
Repeat after me: "utf-8 is not unicode", "utf-8 is not unicode", "utf-8 is not unicode":
    >>> type('ascii')
    <type 'str'>
    >>> type('utf-8')
    <type 'str'>
    >>> type(u'unicode')
    <type 'unicode'>
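The session above is Python 2. As a rough sketch of the same point in Python 3, where the two types are called `str` (unicode text) and `bytes` (encoded data): utf-8 is just one way of turning the text into bytes, and reading those bytes with the wrong character set gives exactly the kind of garbage you sometimes see in emails.

```python
# Python 3: str is unicode text, bytes is encoded data.
text = 'René'                         # four characters of text
encoded = text.encode('utf-8')        # five bytes: the é takes two bytes in utf-8

# Decoding with the wrong character set does not fail, it silently
# produces the wrong characters.
misread = encoded.decode('latin-1')

# Decoding with the right character set gives the original text back.
roundtrip = encoded.decode('utf-8')
```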
Sending an email in Plone goes something like this:
    from Products.CMFCore.utils import getToolByName

    charset = portal.getProperty('email_charset', 'ISO-8859-1')
    mailhost = getToolByName(portal, 'MailHost')
    mailhost.send(message=msg, mto=address, mfrom=mfrom,
                  subject=subject, charset=charset)
Hard to read headers:
From: RenXX Artois
Hard to read body text:
lettere accentate: ÃÂ² ÃÂ¹Ã¢
To: undisclosed recipients
No email body: C
The To and From fields should have something like this:
Maurits van Rees <email@example.com>
The standard python email package has nice utilities for this:
    >>> from email.Utils import parseaddr
    >>> from email.Utils import formataddr
    >>> formataddr(('Maurits van Rees', 'email@example.com'))
    'Maurits van Rees <email@example.com>'
    >>> parseaddr('Maurits van Rees <email@example.com>')
    ('Maurits van Rees', 'email@example.com')
These functions can get confused by strange characters. You can guard against that by parsing the address that you have just formatted and seeing if the parsed information still makes sense:
    from_address = portal.getProperty('email_from_address', '')
    from_name = portal.getProperty('email_from_name', '')
    mfrom = formataddr((from_name, from_address))
    # parseaddr returns a (name, address) tuple; compare the address part.
    if parseaddr(mfrom)[1] != from_address:
        # formataddr probably got confused
        # by special characters.
        mfrom = from_address
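The same guard can be tried outside Plone. This is a minimal sketch for Python 3, where the module is called `email.utils` in lower case; the function name `build_from_header` is made up for the example.

```python
from email.utils import formataddr, parseaddr


def build_from_header(name, address):
    """Return 'Name <address>', or just the address if the round trip fails."""
    candidate = formataddr((name, address))
    # parseaddr returns a (name, address) tuple; only the address part
    # has to survive the round trip.
    if parseaddr(candidate)[1] != address:
        return address
    return candidate


plain = build_from_header('Maurits van Rees', 'email@example.com')
no_name = build_from_header('', 'email@example.com')
```

With an empty name, `formataddr` already returns the bare address, so the fallback only kicks in when the formatted value cannot be parsed back.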
The python email.Charset module has interesting information about how email headers and body text should be encoded depending on the input character set. Some examples (QP is quoted printable):
    input charset   header enc   body enc   output conv
    iso-8859-1      QP           QP         None
    iso-8859-15     QP           QP         None
    windows-1252    QP           QP         None
    us-ascii        None         None       None
    big5            BASE64       BASE64     None
    euc-jp          BASE64       None       iso-2022-jp
    iso-2022-jp     BASE64       None       None
    utf-8           SHORTEST     BASE64     utf-8
    ...
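You can ask the module for this information yourself. The sketch below assumes Python 3, where the module is `email.charset` in lower case; the `describe` helper and the `ENCODING_NAMES` dict are made up for the example. One difference with the table: where the table says None for the output conversion, a `Charset` object reports the input character set itself as its output character set.

```python
from email.charset import BASE64, QP, SHORTEST, Charset

# Map the module's integer constants back to readable names.
ENCODING_NAMES = {QP: 'QP', BASE64: 'BASE64', SHORTEST: 'SHORTEST', None: 'None'}


def describe(charset_name):
    """Return (header encoding, body encoding, output charset) for a charset."""
    charset = Charset(charset_name)
    return (
        ENCODING_NAMES[charset.header_encoding],
        ENCODING_NAMES[charset.body_encoding],
        charset.output_charset,
    )


latin_info = describe('iso-8859-1')
utf_info = describe('utf-8')
japanese_info = describe('euc-jp')
```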
This information is used when creating email headers:
    >>> from email.Charset import Charset
    >>> latin = Charset('iso-8859-1')
    >>> utf = Charset('utf-8')
    >>> latin.header_encode('René Artois')
    u'=?iso-8859-1?q?Ren=C3=A9_Artois?='
    >>> utf.header_encode('René Artois')
    '=?utf-8?q?Ren=C3=A9_Artois?='
and encoding body text:
    >>> latin.get_body_encoding()
    'quoted-printable'
    >>> latin.body_encode('René Artois')
    'Ren=C3=A9 Artois'
    >>> utf.get_body_encoding()
    'base64'
    >>> utf.body_encode('René Artois')
    'UmVuw6kgQXJ0b2lz\n'
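To convince yourself that nothing is lost, you can decode the encoded body again. This is a small sketch for Python 3 (module `email.charset`), using the standard `quopri` and `base64` modules to play the role of the receiving email program; the variable names are only for illustration.

```python
import base64
import quopri
from email.charset import Charset

text = 'René Artois'
latin = Charset('iso-8859-1')
utf = Charset('utf-8')

# Encode the body the way each character set prescribes.
latin_body = latin.body_encode(text)   # quoted-printable text
utf_body = utf.body_encode(text)       # base64 text

# Undo the transfer encoding, then decode the bytes with the charset.
latin_decoded = quopri.decodestring(latin_body.encode('ascii')).decode('iso-8859-1')
utf_decoded = base64.b64decode(utf_body).decode('utf-8')
```

Both encoded bodies contain only ascii characters, which is the whole point: they survive any mail server on the way.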
This may look confusing. Surely if you get an email with a text or subject like this it is unreadable? No, your email program should be smart enough to display this to you in a readable fashion. No need for the funny face:
Instead of using email.Charset for formatting headers you normally use the email.Header module:
    >>> from email.Header import Header
    >>> subject = 'Re: René'.decode('latin-1')
    >>> subject
    u'Re: Ren\xc3\xa9'
    >>> subject = Header(subject, 'latin-1')
    >>> subject
    <email.Header.Header instance at 0xb79ed90c>
    >>> print subject
    =?iso-8859-1?q?Re=3A_Ren=C3=A9?=
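In Python 3 the module is `email.header` in lower case and the subject is already unicode text, so no decoding step is needed. A minimal sketch, which also decodes the header again the way an email program would:

```python
from email.header import Header, decode_header, make_header

subject_text = 'Re: René'

# Encode the subject for use in a message; the result is plain ascii.
encoded_subject = Header(subject_text, 'utf-8').encode()

# Decode it again, like the receiving email program does.
decoded_subject = str(make_header(decode_header(encoded_subject)))
```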
You will need to know which character set the body text has, or at least in which character set it can be encoded without errors. This snippet tries three character sets:
    charset = portal.getProperty('email_charset', 'ISO-8859-1')
    # message is assumed to be a unicode string here.
    for body_charset in 'US-ASCII', charset, 'UTF-8':
        try:
            message = message.encode(body_charset)
        except UnicodeError:
            pass
        else:
            break
If the message only contains ascii characters, then at the end of this snippet the message is encoded in ascii and the body_charset variable is 'US-ASCII'.
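The same fallback, written as a standalone function, could look like this. It is a sketch for Python 3 without the Plone property lookup; the function name `encode_body` and its `preferred` parameter are made up for the example.

```python
def encode_body(text, preferred='ISO-8859-1'):
    """Encode text with the first character set that can represent it.

    Returns a (encoded bytes, charset name) tuple.
    """
    for body_charset in ('US-ASCII', preferred, 'UTF-8'):
        try:
            return text.encode(body_charset), body_charset
        except UnicodeError:
            # This character set cannot represent the text; try the next one.
            continue
    raise ValueError('text cannot be encoded in any of the candidate charsets')


ascii_result = encode_body('Hello world')
latin_result = encode_body('René Artois')
utf_result = encode_body('日本語')
```

Trying ascii first keeps simple messages as simple as possible, and utf-8 at the end catches practically everything else.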
We have done all the hard work with the Headers so now we can use the 'send' method:
    from email.MIMEText import MIMEText

    # Create the message.
    # 'plain' stands for Content-Type: text/plain
    msg = MIMEText(message, 'plain', body_charset)
    msg['From'] = email_from
    msg['To'] = email_to
    msg['Subject'] = subject
    msg = msg.as_string()
    mailhost = getToolByName(portal, 'MailHost')
    mailhost.send(message=msg)
Easier is to use the secureSend method; using the Header class is not needed then, as secureSend takes care of that:
    email_msg = MIMEText(message, 'plain', body_charset)
    mailhost.secureSend(message=email_msg, mto=email_to,
                        mfrom=email_from, subject=subject,
                        charset=header_charset)
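If you want to try the whole thing outside Plone, the following sketch for Python 3 builds a complete message with only the standard library and parses it back, without actually sending anything. The `build_message` function, the names and the example.org addresses are all made up for the illustration.

```python
from email import message_from_string
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText
from email.utils import formataddr, parseaddr


def build_message(sender, recipient, subject, body, charset='utf-8'):
    """Build a text/plain message; sender and recipient are (name, address)."""
    msg = MIMEText(body, 'plain', charset)
    # formataddr encodes non-ascii names for us in Python 3.
    msg['From'] = formataddr(sender)
    msg['To'] = formataddr(recipient)
    msg['Subject'] = Header(subject, charset)
    return msg.as_string()


raw = build_message(
    sender=('任 伟', 'ren@example.org'),
    recipient=('山田 太郎', 'yamada@example.org'),
    subject='Привет',
    body='안녕하세요',
)

# Parse the raw text again, like the receiving side would.
parsed = message_from_string(raw)
parsed_subject = str(make_header(decode_header(parsed['Subject'])))
parsed_body = parsed.get_payload(decode=True).decode('utf-8')
parsed_to_address = parseaddr(parsed['To'])[1]
```

The raw message is pure ascii, yet subject and body come out exactly as they went in.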
Now international email sending should work: