From: Barry Warsaw Date: Fri, 4 Oct 2002 17:24:24 +0000 (+0000) Subject: Backporting of email 2.4 from Python 2.3. Many newly added modules, X-Git-Tag: v2.2.2b1~75 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=18ff7954685b8388912df74faaec06f82fcd8cfc;p=thirdparty%2FPython%2Fcpython.git Backporting of email 2.4 from Python 2.3. Many newly added modules, some updated modules, updated documentation, and updated tests. Note that Lib/test/regrtest.py added test_email_codecs to the expected skips for all platforms. Also note that test_email_codecs.py differs slightly from its Python 2.3 counterpart due to the difference in package location for TestSkipped. --- diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex index 5ba0ceaea252..47bbf5be6da6 100644 --- a/Doc/lib/email.tex +++ b/Doc/lib/email.tex @@ -1,4 +1,4 @@ -% Copyright (C) 2001 Python Software Foundation +% Copyright (C) 2001,2002 Python Software Foundation % Author: barry@zope.com (Barry Warsaw) \section{\module{email} --- @@ -19,13 +19,10 @@ such as \refmodule{rfc822}, \refmodule{mimetools}, \refmodule{multifile}, and other non-standard packages such as \module{mimecntl}. It is specifically \emph{not} designed to do any sending of email messages to SMTP (\rfc{2821}) servers; that is the -function of the \refmodule{smtplib} module\footnote{For this reason, -line endings in the \module{email} package are always native line -endings. The \module{smtplib} module is responsible for converting -from native line endings to \rfc{2821} line endings, just as your mail -server would be responsible for converting from \rfc{2821} line -endings to native line endings when it stores messages in a local -mailbox.}. +function of the \refmodule{smtplib} module. The \module{email} +package attempts to be as RFC-compliant as possible, supporting in +addition to \rfc{2822}, such MIME-related RFCs as +\rfc{2045}-\rfc{2047}, and \rfc{2231}. 
The primary distinguishing feature of the \module{email} package is that it splits the parsing and generating of email messages from the @@ -42,21 +39,20 @@ and parsing message field values, creating RFC-compliant dates, etc. The following sections describe the functionality of the \module{email} package. The ordering follows a progression that should be common in applications: an email message is read as flat -text from a file or other source, the text is parsed to produce an -object model representation of the email message, this model is -manipulated, and finally the model is rendered back into -flat text. +text from a file or other source, the text is parsed to produce the +object structure of the email message, this structure is manipulated, +and finally rendered back into flat text. -It is perfectly feasible to create the object model out of whole cloth ---- i.e. completely from scratch. From there, a similar progression -can be taken as above. +It is perfectly feasible to create the object structure out of whole +cloth --- i.e. completely from scratch. From there, a similar +progression can be taken as above. Also included are detailed specifications of all the classes and modules that the \module{email} package provides, the exception classes you might encounter while using the \module{email} package, some auxiliary utilities, and a few examples. For users of the older -\module{mimelib} package, from which the \module{email} package is -descended, a section on differences and porting is provided. +\module{mimelib} package, or previous versions of the \module{email} +package, a section on differences and porting is provided. \begin{seealso} \seemodule{smtplib}{SMTP protocol client} @@ -72,133 +68,13 @@ descended, a section on differences and porting is provided. 
\input{emailgenerator} \subsection{Creating email and MIME objects from scratch} +\input{emailmimebase} -Ordinarily, you get a message object tree by passing some text to a -parser, which parses the text and returns the root of the message -object tree. However you can also build a complete object tree from -scratch, or even individual \class{Message} objects by hand. In fact, -you can also take an existing tree and add new \class{Message} -objects, move them around, etc. This makes a very convenient -interface for slicing-and-dicing MIME messages. - -You can create a new object tree by creating \class{Message} -instances, adding payloads and all the appropriate headers manually. -For MIME messages though, the \module{email} package provides some -convenient classes to make things easier. Each of these classes -should be imported from a module with the same name as the class, from -within the \module{email} package. E.g.: - -\begin{verbatim} -import email.MIMEImage.MIMEImage -\end{verbatim} - -or - -\begin{verbatim} -from email.MIMEText import MIMEText -\end{verbatim} - -Here are the classes: - -\begin{classdesc}{MIMEBase}{_maintype, _subtype, **_params} -This is the base class for all the MIME-specific subclasses of -\class{Message}. Ordinarily you won't create instances specifically -of \class{MIMEBase}, although you could. \class{MIMEBase} is provided -primarily as a convenient base class for more specific MIME-aware -subclasses. - -\var{_maintype} is the \mailheader{Content-Type} major type -(e.g. \mimetype{text} or \mimetype{image}), and \var{_subtype} is the -\mailheader{Content-Type} minor type -(e.g. \mimetype{plain} or \mimetype{gif}). \var{_params} is a parameter -key/value dictionary and is passed directly to -\method{Message.add_header()}. - -The \class{MIMEBase} class always adds a \mailheader{Content-Type} header -(based on \var{_maintype}, \var{_subtype}, and \var{_params}), and a -\mailheader{MIME-Version} header (always set to \code{1.0}). 
-\end{classdesc} - -\begin{classdesc}{MIMEAudio}{_audiodata\optional{, _subtype\optional{, - _encoder\optional{, **_params}}}} - -A subclass of \class{MIMEBase}, the \class{MIMEAudio} class is used to -create MIME message objects of major type \mimetype{audio}. -\var{_audiodata} is a string containing the raw audio data. If this -data can be decoded by the standard Python module \refmodule{sndhdr}, -then the subtype will be automatically included in the -\mailheader{Content-Type} header. Otherwise you can explicitly specify the -audio subtype via the \var{_subtype} parameter. If the minor type could -not be guessed and \var{_subtype} was not given, then \exception{TypeError} -is raised. - -Optional \var{_encoder} is a callable (i.e. function) which will -perform the actual encoding of the audio data for transport. This -callable takes one argument, which is the \class{MIMEAudio} instance. -It should use \method{get_payload()} and \method{set_payload()} to -change the payload to encoded form. It should also add any -\mailheader{Content-Transfer-Encoding} or other headers to the message -object as necessary. The default encoding is \emph{Base64}. See the -\refmodule{email.Encoders} module for a list of the built-in encoders. - -\var{_params} are passed straight through to the \class{MIMEBase} -constructor. -\end{classdesc} - -\begin{classdesc}{MIMEImage}{_imagedata\optional{, _subtype\optional{, - _encoder\optional{, **_params}}}} - -A subclass of \class{MIMEBase}, the \class{MIMEImage} class is used to -create MIME message objects of major type \mimetype{image}. -\var{_imagedata} is a string containing the raw image data. If this -data can be decoded by the standard Python module \refmodule{imghdr}, -then the subtype will be automatically included in the -\mailheader{Content-Type} header. Otherwise you can explicitly specify the -image subtype via the \var{_subtype} parameter. 
If the minor type could -not be guessed and \var{_subtype} was not given, then \exception{TypeError} -is raised. - -Optional \var{_encoder} is a callable (i.e. function) which will -perform the actual encoding of the image data for transport. This -callable takes one argument, which is the \class{MIMEImage} instance. -It should use \method{get_payload()} and \method{set_payload()} to -change the payload to encoded form. It should also add any -\mailheader{Content-Transfer-Encoding} or other headers to the message -object as necessary. The default encoding is \emph{Base64}. See the -\refmodule{email.Encoders} module for a list of the built-in encoders. - -\var{_params} are passed straight through to the \class{MIMEBase} -constructor. -\end{classdesc} - -\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, - _charset\optional{, _encoder}}}} - -A subclass of \class{MIMEBase}, the \class{MIMEText} class is used to -create MIME objects of major type \mimetype{text}. \var{_text} is the -string for the payload. \var{_subtype} is the minor type and defaults -to \mimetype{plain}. \var{_charset} is the character set of the text and is -passed as a parameter to the \class{MIMEBase} constructor; it defaults -to \code{us-ascii}. No guessing or encoding is performed on the text -data, but a newline is appended to \var{_text} if it doesn't already -end with a newline. - -The \var{_encoding} argument is as with the \class{MIMEImage} class -constructor, except that the default encoding for \class{MIMEText} -objects is one that doesn't actually modify the payload, but does set -the \mailheader{Content-Transfer-Encoding} header to \code{7bit} or -\code{8bit} as appropriate. -\end{classdesc} - -\begin{classdesc}{MIMEMessage}{_msg\optional{, _subtype}} -A subclass of \class{MIMEBase}, the \class{MIMEMessage} class is used to -create MIME objects of main type \mimetype{message}. 
\var{_msg} is used as -the payload, and must be an instance of class \class{Message} (or a -subclass thereof), otherwise a \exception{TypeError} is raised. - -Optional \var{_subtype} sets the subtype of the message; it defaults -to \mimetype{rfc822}. -\end{classdesc} +\subsection{Internationalized headers} +\input{emailheaders} + +\subsection{Representing character sets} +\input{emailcharsets} \subsection{Encoders} \input{emailencoders} @@ -212,6 +88,87 @@ to \mimetype{rfc822}. \subsection{Iterators} \input{emailiter} +\subsection{Differences from \module{email} v1 (up to Python 2.2.1)} + +Version 1 of the \module{email} package was bundled with Python +releases up to Python 2.2.1. Version 2 was developed for the Python +2.3 release, and backported to Python 2.2.2. It was also available as +a separate distutils based package. \module{email} version 2 is +almost entirely backward compatible with version 1, with the +following differences: + +\begin{itemize} +\item The \module{email.Header} and \module{email.Charset} modules + have been added. + +\item The pickle format for \class{Message} instances has changed. + Since this was never (and still isn't) formally defined, this + isn't considered a backward incompatibility. However if your + application pickles and unpickles \class{Message} instances, be + aware that in \module{email} version 2, \class{Message} + instances now have private variables \var{_charset} and + \var{_default_type}. + +\item Several methods in the \class{Message} class have been + deprecated, or their signatures changed. Also, many new methods + have been added. See the documentation for the \class{Message} + class for details. The changes should be completely backward + compatible. + +\item The object structure has changed in the face of + \mimetype{message/rfc822} content types. In \module{email} + version 1, such a type would be represented by a scalar payload, + i.e. 
the container message's \method{is_multipart()} returned + false, \method{get_payload()} was not a list object, but a single + \class{Message} instance. + + This structure was inconsistent with the rest of the package, so + the object representation for \mimetype{message/rfc822} content + types was changed. In \module{email} version 2, the container + \emph{does} return \code{True} from \method{is_multipart()}, and + \method{get_payload()} returns a list containing a single + \class{Message} item. + + Note that this is one place that backward compatibility could + not be completely maintained. However, if you're already + testing the return type of \method{get_payload()}, you should be + fine. You just need to make sure your code doesn't do a + \method{set_payload()} with a \class{Message} instance on a + container with a content type of \mimetype{message/rfc822}. + +\item The \class{Parser} constructor's \var{strict} argument was + added, and its \method{parse()} and \method{parsestr()} methods + grew a \var{headersonly} argument. The \var{strict} flag was + also added to functions \function{email.message_from_file()} + and \function{email.message_from_string()}. + +\item \method{Generator.__call__()} is deprecated; use + \method{Generator.flatten()} instead. The \class{Generator} + class has also grown the \method{clone()} method. + +\item The \class{DecodedGenerator} class in the + \module{email.Generator} module was added. + +\item The intermediate base classes \class{MIMENonMultipart} and + \class{MIMEMultipart} have been added, and interposed in the + class hierarchy for most of the other MIME-related derived + classes. + +\item The \var{_encoder} argument to the \class{MIMEText} constructor + has been deprecated. Encoding now happens implicitly based + on the \var{_charset} argument. + +\item The following functions in the \module{email.Utils} module have + been deprecated: \function{dump_address_pairs()}, + \function{decode()}, and \function{encode()}. 
The following + functions have been added to the module: + \function{make_msgid()}, \function{decode_rfc2231()}, + \function{encode_rfc2231()}, and \function{decode_params()}. + +\item The non-public function \function{email.Iterators._structure()} + was added. +\end{itemize} + \subsection{Differences from \module{mimelib}} The \module{email} package was originally prototyped as a separate @@ -222,7 +179,9 @@ method names are more consistent, and some methods or modules have either been added or removed. The semantics of some of the methods have also changed. For the most part, any functionality available in \module{mimelib} is still available in the \refmodule{email} package, -albeit often in a different way. +albeit often in a different way. Backward compatibility between +the \module{mimelib} package and the \module{email} package was not a +priority. Here is a brief description of the differences between the \module{mimelib} and the \refmodule{email} packages, along with hints on @@ -235,47 +194,65 @@ addition, the top-level package has the following differences: \begin{itemize} \item \function{messageFromString()} has been renamed to \function{message_from_string()}. + \item \function{messageFromFile()} has been renamed to \function{message_from_file()}. + \end{itemize} The \class{Message} class has the following differences: \begin{itemize} \item The method \method{asString()} was renamed to \method{as_string()}. + \item The method \method{ismultipart()} was renamed to \method{is_multipart()}. + \item The \method{get_payload()} method has grown a \var{decode} optional argument. + \item The method \method{getall()} was renamed to \method{get_all()}. + \item The method \method{addheader()} was renamed to \method{add_header()}. + \item The method \method{gettype()} was renamed to \method{get_type()}. + \item The method\method{getmaintype()} was renamed to \method{get_main_type()}. + \item The method \method{getsubtype()} was renamed to \method{get_subtype()}. 
+ \item The method \method{getparams()} was renamed to \method{get_params()}. Also, whereas \method{getparams()} returned a list of strings, \method{get_params()} returns a list of 2-tuples, effectively the key/value pairs of the parameters, split on the \character{=} sign. + \item The method \method{getparam()} was renamed to \method{get_param()}. + \item The method \method{getcharsets()} was renamed to \method{get_charsets()}. + \item The method \method{getfilename()} was renamed to \method{get_filename()}. + \item The method \method{getboundary()} was renamed to \method{get_boundary()}. + \item The method \method{setboundary()} was renamed to \method{set_boundary()}. + \item The method \method{getdecodedpayload()} was removed. To get similar functionality, pass the value 1 to the \var{decode} flag of the {get_payload()} method. + \item The method \method{getpayloadastext()} was removed. Similar functionality is supported by the \class{DecodedGenerator} class in the \refmodule{email.Generator} module. + \item The method \method{getbodyastext()} was removed. You can get similar functionality by creating an iterator with \function{typed_subpart_iterator()} in the @@ -302,12 +279,15 @@ The following modules and classes have been changed: \item The \class{MIMEBase} class constructor arguments \var{_major} and \var{_minor} have changed to \var{_maintype} and \var{_subtype} respectively. + \item The \code{Image} class/module has been renamed to \code{MIMEImage}. The \var{_minor} argument has been renamed to \var{_subtype}. + \item The \code{Text} class/module has been renamed to \code{MIMEText}. The \var{_minor} argument has been renamed to \var{_subtype}. + \item The \code{MessageRFC822} class/module has been renamed to \code{MIMEMessage}. Note that an earlier version of \module{mimelib} called this class/module \code{RFC822}, but @@ -336,294 +316,20 @@ MIME messages. 
First, let's see how to create and send a simple text message: -\begin{verbatim} -# Import smtplib for the actual sending function -import smtplib - -# Here are the email pacakge modules we'll need -from email import Encoders -from email.MIMEText import MIMEText - -# Open a plain text file for reading -fp = open(textfile) -# Create a text/plain message, using Quoted-Printable encoding for non-ASCII -# characters. -msg = MIMEText(fp.read(), _encoder=Encoders.encode_quopri) -fp.close() - -# me == the sender's email address -# you == the recipient's email address -msg['Subject'] = 'The contents of %s' % textfile -msg['From'] = me -msg['To'] = you - -# Send the message via our own SMTP server. Use msg.as_string() with -# unixfrom=0 so as not to confuse SMTP. -s = smtplib.SMTP() -s.connect() -s.sendmail(me, [you], msg.as_string(0)) -s.close() -\end{verbatim} +\verbatiminput{email-simple.py} Here's an example of how to send a MIME message containing a bunch of -family pictures: - -\begin{verbatim} -# Import smtplib for the actual sending function -import smtplib - -# Here are the email pacakge modules we'll need -from email.MIMEImage import MIMEImage -from email.MIMEBase import MIMEBase - -COMMASPACE = ', ' - -# Create the container (outer) email message. -# me == the sender's email address -# family = the list of all recipients' email addresses -msg = MIMEBase('multipart', 'mixed') -msg['Subject'] = 'Our family reunion' -msg['From'] = me -msg['To'] = COMMASPACE.join(family) -msg.preamble = 'Our family reunion' -# Guarantees the message ends in a newline -msg.epilogue = '' - -# Assume we know that the image files are all in PNG format -for file in pngfiles: - # Open the files in binary mode. Let the MIMEIMage class automatically - # guess the specific image type. - fp = open(file, 'rb') - img = MIMEImage(fp.read()) - fp.close() - msg.attach(img) - -# Send the email via our own SMTP server. 
-s = smtplib.SMTP() -s.connect() -s.sendmail(me, family, msg.as_string(unixfrom=0)) -s.close() -\end{verbatim} +family pictures that may be residing in a directory: + +\verbatiminput{email-mime.py} Here's an example\footnote{Thanks to Matthew Dixon Cowles for the original inspiration and examples.} of how to send the entire contents of a directory as an email message: -\begin{verbatim} -#!/usr/bin/env python - -"""Send the contents of a directory as a MIME message. - -Usage: dirmail [options] from to [to ...]* - -Options: - -h / --help - Print this message and exit. - - -d directory - --directory=directory - Mail the contents of the specified directory, otherwise use the - current directory. Only the regular files in the directory are sent, - and we don't recurse to subdirectories. - -`from' is the email address of the sender of the message. - -`to' is the email address of the recipient of the message, and multiple -recipients may be given. - -The email is sent by forwarding to your local SMTP server, which then does the -normal delivery process. Your local machine must be running an SMTP server. 
-""" - -import sys -import os -import getopt -import smtplib -# For guessing MIME type based on file name extension -import mimetypes - -from email import Encoders -from email.Message import Message -from email.MIMEAudio import MIMEAudio -from email.MIMEBase import MIMEBase -from email.MIMEImage import MIMEImage -from email.MIMEText import MIMEText - -COMMASPACE = ', ' - - -def usage(code, msg=''): - print >> sys.stderr, __doc__ - if msg: - print >> sys.stderr, msg - sys.exit(code) - - -def main(): - try: - opts, args = getopt.getopt(sys.argv[1:], 'hd:', ['help', 'directory=']) - except getopt.error, msg: - usage(1, msg) - - dir = os.curdir - for opt, arg in opts: - if opt in ('-h', '--help'): - usage(0) - elif opt in ('-d', '--directory'): - dir = arg - - if len(args) < 2: - usage(1) - - sender = args[0] - recips = args[1:] - - # Create the enclosing (outer) message - outer = MIMEBase('multipart', 'mixed') - outer['Subject'] = 'Contents of directory %s' % os.path.abspath(dir) - outer['To'] = COMMASPACE.join(recips) - outer['From'] = sender - outer.preamble = 'You will not see this in a MIME-aware mail reader.\n' - # To guarantee the message ends with a newline - outer.epilogue = '' - - for filename in os.listdir(dir): - path = os.path.join(dir, filename) - if not os.path.isfile(path): - continue - # Guess the Content-Type: based on the file's extension. Encoding - # will be ignored, although we should check for simple things like - # gzip'd or compressed files - ctype, encoding = mimetypes.guess_type(path) - if ctype is None or encoding is not None: - # No guess could be made, or the file is encoded (compressed), so - # use a generic bag-of-bits type. 
- ctype = 'application/octet-stream' - maintype, subtype = ctype.split('/', 1) - if maintype == 'text': - fp = open(path) - # Note: we should handle calculating the charset - msg = MIMEText(fp.read(), _subtype=subtype) - fp.close() - elif maintype == 'image': - fp = open(path, 'rb') - msg = MIMEImage(fp.read(), _subtype=subtype) - fp.close() - elif maintype == 'audio': - fp = open(path, 'rb') - msg = MIMEAudio(fp.read(), _subtype=subtype) - fp.close() - else: - fp = open(path, 'rb') - msg = MIMEBase(maintype, subtype) - msg.add_payload(fp.read()) - fp.close() - # Encode the payload using Base64 - Encoders.encode_base64(msg) - # Set the filename parameter - msg.add_header('Content-Disposition', 'attachment', filename=filename) - outer.attach(msg) - - fp = open('/tmp/debug.pck', 'w') - import cPickle - cPickle.dump(outer, fp) - fp.close() - # Now send the message - s = smtplib.SMTP() - s.connect() - s.sendmail(sender, recips, outer.as_string(0)) - s.close() - - -if __name__ == '__main__': - main() -\end{verbatim} +\verbatiminput{email-dir.py} And finally, here's an example of how to unpack a MIME message like the one above, into a directory of files: -\begin{verbatim} -#!/usr/bin/env python - -"""Unpack a MIME message into a directory of files. - -Usage: unpackmail [options] msgfile - -Options: - -h / --help - Print this message and exit. - - -d directory - --directory=directory - Unpack the MIME message into the named directory, which will be - created if it doesn't already exist. - -msgfile is the path to the file containing the MIME message. 
-""" - -import sys -import os -import getopt -import errno -import mimetypes -import email - - -def usage(code, msg=''): - print >> sys.stderr, __doc__ - if msg: - print >> sys.stderr, msg - sys.exit(code) - - -def main(): - try: - opts, args = getopt.getopt(sys.argv[1:], 'hd:', ['help', 'directory=']) - except getopt.error, msg: - usage(1, msg) - - dir = os.curdir - for opt, arg in opts: - if opt in ('-h', '--help'): - usage(0) - elif opt in ('-d', '--directory'): - dir = arg - - try: - msgfile = args[0] - except IndexError: - usage(1) - - try: - os.mkdir(dir) - except OSError, e: - # Ignore directory exists error - if e.errno <> errno.EEXIST: raise - - fp = open(msgfile) - msg = email.message_from_file(fp) - fp.close() - - counter = 1 - for part in msg.walk(): - # multipart/* are just containers - if part.get_main_type() == 'multipart': - continue - # Applications should really sanitize the given filename so that an - # email message can't be used to overwrite important files - filename = part.get_filename() - if not filename: - ext = mimetypes.guess_extension(part.get_type()) - if not ext: - # Use a generic bag-of-bits extension - ext = '.bin' - filename = 'part-%03d%s' % (counter, ext) - counter += 1 - fp = open(os.path.join(dir, filename), 'wb') - fp.write(part.get_payload(decode=1)) - fp.close() - - -if __name__ == '__main__': - main() -\end{verbatim} +\verbatiminput{email-unpack.py} diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex index 3e247a925634..cd54d68be9b1 100644 --- a/Doc/lib/emailencoders.tex +++ b/Doc/lib/emailencoders.tex @@ -17,8 +17,8 @@ set the \mailheader{Content-Transfer-Encoding} header as appropriate. 
Here are the encoding functions provided: \begin{funcdesc}{encode_quopri}{msg} -Encodes the payload into \emph{Quoted-Printable} form and sets the -\code{Content-Transfer-Encoding:} header to +Encodes the payload into quoted-printable form and sets the +\mailheader{Content-Transfer-Encoding} header to \code{quoted-printable}\footnote{Note that encoding with \method{encode_quopri()} also encodes all tabs and space characters in the data.}. @@ -27,11 +27,11 @@ printable data, but contains a few unprintable characters. \end{funcdesc} \begin{funcdesc}{encode_base64}{msg} -Encodes the payload into \emph{Base64} form and sets the +Encodes the payload into base64 form and sets the \mailheader{Content-Transfer-Encoding} header to \code{base64}. This is a good encoding to use when most of your payload is unprintable data since it is a more compact form than -Quoted-Printable. The drawback of Base64 encoding is that it +quoted-printable. The drawback of base64 encoding is that it renders the text non-human readable. \end{funcdesc} diff --git a/Doc/lib/emailexc.tex b/Doc/lib/emailexc.tex index 492924462ce0..824a276f1738 100644 --- a/Doc/lib/emailexc.tex +++ b/Doc/lib/emailexc.tex @@ -21,7 +21,7 @@ a message, this class is derived from \exception{MessageParseError}. It can be raised from the \method{Parser.parse()} or \method{Parser.parsestr()} methods. -Situations where it can be raised include finding a \emph{Unix-From} +Situations where it can be raised include finding an envelope header after the first \rfc{2822} header of the message, finding a continuation line before the first \rfc{2822} header is found, or finding a line in the headers which is neither a header or a continuation @@ -35,7 +35,8 @@ It can be raised from the \method{Parser.parse()} or \method{Parser.parsestr()} methods. Situations where it can be raised include not being able to find the -starting or terminating boundary in a \mimetype{multipart/*} message. 
+starting or terminating boundary in a \mimetype{multipart/*} message +when strict parsing is used. \end{excclassdesc} \begin{excclassdesc}{MultipartConversionError}{} @@ -45,4 +46,9 @@ message's \mailheader{Content-Type} main type is not either \mimetype{multipart} or missing. \exception{MultipartConversionError} multiply inherits from \exception{MessageError} and the built-in \exception{TypeError}. + +Since \method{Message.add_payload()} is deprecated, this exception is +rarely raised in practice. However the exception may also be raised +if the \method{attach()} method is called on an instance of a class +derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}). \end{excclassdesc} diff --git a/Doc/lib/emailgenerator.tex b/Doc/lib/emailgenerator.tex index 63ceb73d1d79..96eb2687a9e4 100644 --- a/Doc/lib/emailgenerator.tex +++ b/Doc/lib/emailgenerator.tex @@ -1,11 +1,11 @@ \declaremodule{standard}{email.Generator} -\modulesynopsis{Generate flat text email messages from a message object tree.} +\modulesynopsis{Generate flat text email messages from a message structure.} One of the most common tasks is to generate the flat text of the email -message represented by a message object tree. You will need to do +message represented by a message object structure. You will need to do this if you want to send your message via the \refmodule{smtplib} module or the \refmodule{nntplib} module, or print the message on the -console. Taking a message object tree and producing a flat text +console. Taking a message object structure and producing a flat text document is the job of the \class{Generator} class. Again, as with the \refmodule{email.Parser} module, you aren't limited @@ -13,10 +13,9 @@ to the functionality of the bundled generator; you could write one from scratch yourself. 
However the bundled generator knows how to generate most email in a standards-compliant way, should handle MIME and non-MIME email messages just fine, and is designed so that the -transformation from flat text, to an object tree via the -\class{Parser} class, -and back to flat text, is idempotent (the input is identical to the -output). +transformation from flat text, to a message structure via the +\class{Parser} class, and back to flat text, is idempotent (the input +is identical to the output). Here are the public methods of the \class{Generator} class: @@ -25,16 +24,18 @@ Here are the public methods of the \class{Generator} class: The constructor for the \class{Generator} class takes a file-like object called \var{outfp} for an argument. \var{outfp} must support the \method{write()} method and be usable as the output file in a -Python 2.0 extended print statement. +Python extended print statement. -Optional \var{mangle_from_} is a flag that, when true, puts a \samp{>} -character in front of any line in the body that starts exactly as -\samp{From } (i.e. \code{From} followed by a space at the front of the -line). This is the only guaranteed portable way to avoid having such -lines be mistaken for \emph{Unix-From} headers (see +Optional \var{mangle_from_} is a flag that, when \code{True}, puts a +\samp{>} character in front of any line in the body that starts exactly as +\samp{From }, i.e. \code{From} followed by a space at the beginning of the +line. This is the only guaranteed portable way to avoid having such +lines be mistaken for a Unix mailbox format envelope header separator (see \ulink{WHY THE CONTENT-LENGTH FORMAT IS BAD} {http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html} -for details). +for details). \var{mangle_from_} defaults to \code{True}, but you +might want to set this to \code{False} if you are not writing Unix +mailbox format files. Optional \var{maxheaderlen} specifies the longest length for a non-continued header. 
When a header line is longer than @@ -47,20 +48,28 @@ recommended (but not required) by \rfc{2822}. The other public \class{Generator} methods are: -\begin{methoddesc}[Generator]{__call__}{msg\optional{, unixfrom}} -Print the textual representation of the message object tree rooted at +\begin{methoddesc}[Generator]{flatten}{msg\optional{, unixfrom}} +Print the textual representation of the message object structure rooted at \var{msg} to the output file specified when the \class{Generator} -instance was created. Sub-objects are visited depth-first and the +instance was created. Subparts are visited depth-first and the resulting text will be properly MIME encoded. Optional \var{unixfrom} is a flag that forces the printing of the -\emph{Unix-From} (a.k.a. envelope header or \code{From_} header) -delimiter before the first \rfc{2822} header of the root message -object. If the root object has no \emph{Unix-From} header, a standard -one is crafted. By default, this is set to 0 to inhibit the printing -of the \emph{Unix-From} delimiter. +envelope header delimiter before the first \rfc{2822} header of the +root message object. If the root object has no envelope header, a +standard one is crafted. By default, this is set to \code{False} to +inhibit the printing of the envelope delimiter. + +Note that for subparts, no envelope header is ever printed. -Note that for sub-objects, no \emph{Unix-From} header is ever printed. +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Generator]{clone}{fp} +Return an independent clone of this \class{Generator} instance with +the exact same options. + +\versionadded{2.2.2} \end{methoddesc} \begin{methoddesc}[Generator]{write}{s} @@ -74,3 +83,59 @@ As a convenience, see the methods \method{Message.as_string()} and \code{str(aMessage)}, a.k.a. \method{Message.__str__()}, which simplify the generation of a formatted string representation of a message object. For more detail, see \refmodule{email.Message}. 
+ +The \module{email.Generator} module also provides a derived class, +called \class{DecodedGenerator} which is like the \class{Generator} +base class, except that non-\mimetype{text} parts are substituted with +a format string representing the part. + +\begin{classdesc}{DecodedGenerator}{outfp\optional{, mangle_from_\optional{, + maxheaderlen\optional{, fmt}}}} + +This class, derived from \class{Generator}, walks through all the +subparts of a message. If the subpart is of main type +\mimetype{text}, then it prints the decoded payload of the subpart. +Optional \var{mangle_from_} and \var{maxheaderlen} are as with the +\class{Generator} base class. + +If the subpart is not of main type \mimetype{text}, optional \var{fmt} +is a format string that is used instead of the message payload. +\var{fmt} is expanded with the following keywords, \samp{\%(keyword)s} +format: + +\begin{itemize} +\item \code{type} -- Full MIME type of the non-\mimetype{text} part + +\item \code{maintype} -- Main MIME type of the non-\mimetype{text} part + +\item \code{subtype} -- Sub-MIME type of the non-\mimetype{text} part + +\item \code{filename} -- Filename of the non-\mimetype{text} part + +\item \code{description} -- Description associated with the + non-\mimetype{text} part + +\item \code{encoding} -- Content transfer encoding of the + non-\mimetype{text} part + +\end{itemize} + +The default value for \var{fmt} is \code{None}, meaning + +\begin{verbatim} +[Non-text (%(type)s) part of message omitted, filename %(filename)s] +\end{verbatim} + +\versionadded{2.2.2} +\end{classdesc} + +\subsubsection{Deprecated methods} + +The following methods are deprecated in \module{email} version 2. +They are documented here for completeness. + +\begin{methoddesc}[Generator]{__call__}{msg\optional{, unixfrom}} +This method is identical to the \method{flatten()} method.
+ +\deprecated{2.2.2}{Use the \method{flatten()} method instead.} +\end{methoddesc} diff --git a/Doc/lib/emailiter.tex b/Doc/lib/emailiter.tex index eed98bef92f5..9180ac293eed 100644 --- a/Doc/lib/emailiter.tex +++ b/Doc/lib/emailiter.tex @@ -29,3 +29,35 @@ Thus, by default \function{typed_subpart_iterator()} returns each subpart that has a MIME type of \mimetype{text/*}. \end{funcdesc} +The following function has been added as a useful debugging tool. It +should \emph{not} be considered part of the supported public interface +for the package. + +\begin{funcdesc}{_structure}{msg\optional{, fp\optional{, level}}} +Prints an indented representation of the content types of the +message object structure. For example: + +\begin{verbatim} +>>> msg = email.message_from_file(somefile) +>>> _structure(msg) +multipart/mixed + text/plain + text/plain + multipart/digest + message/rfc822 + text/plain + message/rfc822 + text/plain + message/rfc822 + text/plain + message/rfc822 + text/plain + message/rfc822 + text/plain + text/plain +\end{verbatim} + +Optional \var{fp} is a file-like object to print the output to. It +must be suitable for Python's extended print statement. \var{level} +is used internally. +\end{funcdesc} diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex index ecf24eb7faaa..bfd86647cbbd 100644 --- a/Doc/lib/emailmessage.tex +++ b/Doc/lib/emailmessage.tex @@ -12,12 +12,12 @@ values where the field name and value are separated by a colon. The colon is not part of either the field name or the field value. Headers are stored and returned in case-preserving form but are -matched case-insensitively. There may also be a single -\emph{Unix-From} header, also known as the envelope header or the +matched case-insensitively. There may also be a single envelope +header, also known as the \emph{Unix-From} header or the \code{From_} header. 
The payload is either a string in the case of -simple message objects, a list of \class{Message} objects for -multipart MIME documents, or a single \class{Message} instance for -\mimetype{message/rfc822} type objects. +simple message objects or a list of \class{Message} objects for +MIME container documents (e.g. \mimetype{multipart/*} and +\mimetype{message/rfc822}). \class{Message} objects provide a mapping style interface for accessing the message headers, and an explicit interface for accessing @@ -33,98 +33,116 @@ The constructor takes no arguments. \end{classdesc} \begin{methoddesc}[Message]{as_string}{\optional{unixfrom}} -Return the entire formatted message as a string. Optional -\var{unixfrom}, when true, specifies to include the \emph{Unix-From} -envelope header; it defaults to 0. +Return the entire message flattened as a string. When optional +\var{unixfrom} is \code{True}, the envelope header is included in the +returned string. \var{unixfrom} defaults to \code{False}. \end{methoddesc} -\begin{methoddesc}[Message]{__str__()}{} -Equivalent to \method{aMessage.as_string(unixfrom=1)}. +\begin{methoddesc}[Message]{__str__}{} +Equivalent to \method{as_string(unixfrom=True)}. \end{methoddesc} \begin{methoddesc}[Message]{is_multipart}{} -Return 1 if the message's payload is a list of sub-\class{Message} -objects, otherwise return 0. When \method{is_multipart()} returns 0, -the payload should either be a string object, or a single -\class{Message} instance. +Return \code{True} if the message's payload is a list of +sub-\class{Message} objects, otherwise return \code{False}. When +\method{is_multipart()} returns \code{False}, the payload should be a string +object. \end{methoddesc} \begin{methoddesc}[Message]{set_unixfrom}{unixfrom} -Set the \emph{Unix-From} (a.k.a envelope header or \code{From_} -header) to \var{unixfrom}, which should be a string.
\end{methoddesc} \begin{methoddesc}[Message]{get_unixfrom}{} -Return the \emph{Unix-From} header. Defaults to \code{None} if the -\emph{Unix-From} header was never set. -\end{methoddesc} - -\begin{methoddesc}[Message]{add_payload}{payload} -Add \var{payload} to the message object's existing payload. If, prior -to calling this method, the object's payload was \code{None} -(i.e. never before set), then after this method is called, the payload -will be the argument \var{payload}. - -If the object's payload was already a list -(i.e. \method{is_multipart()} returns 1), then \var{payload} is -appended to the end of the existing payload list. - -For any other type of existing payload, \method{add_payload()} will -transform the new payload into a list consisting of the old payload -and \var{payload}, but only if the document is already a MIME -multipart document. This condition is satisfied if the message's -\mailheader{Content-Type} header's main type is either -\mimetype{multipart}, or there is no \mailheader{Content-Type} -header. In any other situation, -\exception{MultipartConversionError} is raised. +Return the message's envelope header. Defaults to \code{None} if the +envelope header was never set. \end{methoddesc} \begin{methoddesc}[Message]{attach}{payload} -Synonymous with \method{add_payload()}. +Add the given \var{payload} to the current payload, which must be +\code{None} or a list of \class{Message} objects before the call. +After the call, the payload will always be a list of \class{Message} +objects. If you want to set the payload to a scalar object (e.g. a +string), use \method{set_payload()} instead. \end{methoddesc} \begin{methoddesc}[Message]{get_payload}{\optional{i\optional{, decode}}} -Return the current payload, which will be a list of \class{Message} -objects when \method{is_multipart()} returns 1, or a scalar (either a -string or a single \class{Message} instance) when -\method{is_multipart()} returns 0. 
+Return a reference to the current payload, which will be a list of +\class{Message} objects when \method{is_multipart()} is \code{True}, or a +string when \method{is_multipart()} is \code{False}. If the +payload is a list and you mutate the list object, you modify the +message's payload in place. -With optional \var{i}, \method{get_payload()} will return the +With optional argument \var{i}, \method{get_payload()} will return the \var{i}-th element of the payload, counting from zero, if -\method{is_multipart()} returns 1. An \exception{IndexError} will be raised -if \var{i} is less than 0 or greater than or equal to the number of -items in the payload. If the payload is scalar -(i.e. \method{is_multipart()} returns 0) and \var{i} is given, a +\method{is_multipart()} is \code{True}. An \exception{IndexError} +will be raised if \var{i} is less than 0 or greater than or equal to +the number of items in the payload. If the payload is a string +(i.e. \method{is_multipart()} is \code{False}) and \var{i} is given, a \exception{TypeError} is raised. Optional \var{decode} is a flag indicating whether the payload should be decoded or not, according to the \mailheader{Content-Transfer-Encoding} header. -When true and the message is not a multipart, the payload will be +When \code{True} and the message is not a multipart, the payload will be decoded if this header's value is \samp{quoted-printable} or \samp{base64}. If some other encoding is used, or \mailheader{Content-Transfer-Encoding} header is missing, the payload is returned as-is (undecoded). If the message is -a multipart and the \var{decode} flag is true, then \code{None} is -returned. +a multipart and the \var{decode} flag is \code{True}, then \code{None} is +returned. The default for \var{decode} is \code{False}. \end{methoddesc} -\begin{methoddesc}[Message]{set_payload}{payload} +\begin{methoddesc}[Message]{set_payload}{payload\optional{, charset}} Set the entire message object's payload to \var{payload}.
It is the -client's responsibility to ensure the payload invariants. +client's responsibility to ensure the payload invariants. Optional +\var{charset} sets the message's default character set; see +\method{set_charset()} for details. + +\versionchanged[\var{charset} argument added]{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{set_charset}{charset} +Set the character set of the payload to \var{charset}, which can +either be a \class{Charset} instance (see \refmodule{email.Charset}), a +string naming a character set, +or \code{None}. If it is a string, it will be converted to a +\class{Charset} instance. If \var{charset} is \code{None}, the +\code{charset} parameter will be removed from the +\mailheader{Content-Type} header. Anything else will generate a +\exception{TypeError}. + +The message will be assumed to be of type \mimetype{text/*} encoded with +\code{charset.input_charset}. It will be converted to +\code{charset.output_charset} +and encoded properly, if needed, when generating the plain text +representation of the message. MIME headers +(\mailheader{MIME-Version}, \mailheader{Content-Type}, +\mailheader{Content-Transfer-Encoding}) will be added as needed. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_charset}{} +Return the \class{Charset} instance associated with the message's payload. +\versionadded{2.2.2} \end{methoddesc} The following methods implement a mapping-like interface for accessing -the message object's \rfc{2822} headers. Note that there are some +the message's \rfc{2822} headers. Note that there are some semantic differences between these methods and a normal mapping (i.e. dictionary) interface. For example, in a dictionary there are no duplicate keys, but here there may be duplicate message headers. Also, in dictionaries there is no guaranteed order to the keys returned by -\method{keys()}, but in a \class{Message} object, there is an explicit -order. 
These semantic differences are intentional and are biased -toward maximal convenience. +\method{keys()}, but in a \class{Message} object, headers are always +returned in the order they appeared in the original message, or were +added to the message later. Any header deleted and then re-added is +always appended to the end of the header list. + +These semantic differences are intentional and are biased toward +maximal convenience. -Note that in all cases, any optional \emph{Unix-From} header the message -may have is not included in the mapping interface. +Note that in all cases, any envelope header present in the message is +not included in the mapping interface. \begin{methoddesc}[Message]{__len__}{} Return the total number of headers, including duplicates. @@ -161,8 +179,7 @@ fields. Note that this does \emph{not} overwrite or delete any existing header with the same name. If you want to ensure that the new header is the only one present in the message with field name -\var{name}, first use \method{__delitem__()} to delete all named -fields, e.g.: +\var{name}, delete the field first, e.g.: \begin{verbatim} del msg['subject'] @@ -177,32 +194,21 @@ present in the headers. \end{methoddesc} \begin{methoddesc}[Message]{has_key}{name} -Return 1 if the message contains a header field named \var{name}, -otherwise return 0. +Return true if the message contains a header field named \var{name}, +otherwise return false. \end{methoddesc} \begin{methoddesc}[Message]{keys}{} -Return a list of all the message's header field names. These keys -will be sorted in the order in which they were added to the message -via \method{__setitem__()}, and may contain duplicates. Any fields -deleted and then subsequently re-added are always appended to the end -of the header list. +Return a list of all the message's header field names. \end{methoddesc} \begin{methoddesc}[Message]{values}{} -Return a list of all the message's field values.
These will be sorted -in the order in which they were added to the message via -\method{__setitem__()}, and may contain duplicates. Any fields -deleted and then subsequently re-added are always appended to the end -of the header list. +Return a list of all the message's field values. \end{methoddesc} \begin{methoddesc}[Message]{items}{} -Return a list of 2-tuples containing all the message's field headers and -values. These will be sorted in the order in which they were added to -the message via \method{__setitem__()}, and may contain duplicates. -Any fields deleted and then subsequently re-added are always appended -to the end of the header list. +Return a list of 2-tuples containing all the message's field headers +and values. \end{methoddesc} \begin{methoddesc}[Message]{get}{name\optional{, failobj}} @@ -214,12 +220,7 @@ if the named header is missing (defaults to \code{None}). Here are some additional useful methods: \begin{methoddesc}[Message]{get_all}{name\optional{, failobj}} -Return a list of all the values for the field named \var{name}. These -will be sorted in the order in which they were added to the message -via \method{__setitem__()}. Any fields -deleted and then subsequently re-added are always appended to the end -of the list. - +Return a list of all the values for the field named \var{name}. If there are no such named headers in the message, \var{failobj} is returned (defaults to \code{None}). \end{methoddesc} @@ -227,8 +228,8 @@ returned (defaults to \code{None}). \begin{methoddesc}[Message]{add_header}{_name, _value, **_params} Extended header setting. This method is similar to \method{__setitem__()} except that additional header parameters can be -provided as keyword arguments. \var{_name} is the header to set and -\var{_value} is the \emph{primary} value for the header. +provided as keyword arguments. \var{_name} is the header field to add +and \var{_value} is the \emph{primary} value for the header. 
For each item in the keyword argument dictionary \var{_params}, the key is taken as the parameter name, with underscores converted to @@ -249,43 +250,84 @@ Content-Disposition: attachment; filename="bud.gif" \end{verbatim} \end{methoddesc} -\begin{methoddesc}[Message]{get_type}{\optional{failobj}} -Return the message's content type, as a string of the form -\mimetype{maintype/subtype} as taken from the -\mailheader{Content-Type} header. -The returned string is coerced to lowercase. +\begin{methoddesc}[Message]{replace_header}{_name, _value} +Replace a header. Replace the first header found in the message that +matches \var{_name}, retaining header order and field name case. If +no matching header was found, a \exception{KeyError} is raised. -If there is no \mailheader{Content-Type} header in the message, -\var{failobj} is returned (defaults to \code{None}). +\versionadded{2.2.2} \end{methoddesc} -\begin{methoddesc}[Message]{get_main_type}{\optional{failobj}} -Return the message's \emph{main} content type. This essentially returns the -\var{maintype} part of the string returned by \method{get_type()}, with the -same semantics for \var{failobj}. +\begin{methoddesc}[Message]{get_content_type}{} +Return the message's content type. The returned string is coerced to +lower case of the form \mimetype{maintype/subtype}. If there was no +\mailheader{Content-Type} header in the message the default type as +given by \method{get_default_type()} will be returned. Since +according to \rfc{2045}, messages always have a default type, +\method{get_content_type()} will always return a value. + +\rfc{2045} defines a message's default type to be +\mimetype{text/plain} unless it appears inside a +\mimetype{multipart/digest} container, in which case it would be +\mimetype{message/rfc822}. If the \mailheader{Content-Type} header +has an invalid type specification, \rfc{2045} mandates that the +default type be \mimetype{text/plain}. 
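The default-type rules described for get_content_type() can be checked with a short sketch. It assumes the module spelling of current Python releases (email.message, lowercase) rather than email.Message as in this patch, and the three message objects are made-up examples.

```python
from email.message import Message  # spelled email.Message in this patch

bare = Message()                                 # no Content-Type header at all

digest_part = Message()
digest_part.set_default_type("message/rfc822")   # as for a multipart/digest subpart

broken = Message()
broken["Content-Type"] = "not-a-valid-type"      # no maintype/subtype slash

# Each call falls back to a default instead of returning None.
types = [m.get_content_type() for m in (bare, digest_part, broken)]
print(types)
```

Because a value is always returned, callers do not need the failobj fallback that the older get_type() method required.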
+ +\versionadded{2.2.2} \end{methoddesc} -\begin{methoddesc}[Message]{get_subtype}{\optional{failobj}} -Return the message's sub-content type. This essentially returns the -\var{subtype} part of the string returned by \method{get_type()}, with the -same semantics for \var{failobj}. +\begin{methoddesc}[Message]{get_content_maintype}{} +Return the message's main content type. This is the +\mimetype{maintype} part of the string returned by +\method{get_content_type()}. + +\versionadded{2.2.2} \end{methoddesc} -\begin{methoddesc}[Message]{get_params}{\optional{failobj\optional{, header}}} +\begin{methoddesc}[Message]{get_content_subtype}{} +Return the message's sub-content type. This is the \mimetype{subtype} +part of the string returned by \method{get_content_type()}. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_default_type}{} +Return the default content type. Most messages have a default content +type of \mimetype{text/plain}, except for messages that are subparts +of \mimetype{multipart/digest} containers. Such subparts have a +default content type of \mimetype{message/rfc822}. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{set_default_type}{ctype} +Set the default content type. \var{ctype} should either be +\mimetype{text/plain} or \mimetype{message/rfc822}, although this is +not enforced. The default content type is not stored in the +\mailheader{Content-Type} header. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_params}{\optional{failobj\optional{, + header\optional{, unquote}}}} Return the message's \mailheader{Content-Type} parameters, as a list. The elements of the returned list are 2-tuples of key/value pairs, as split on the \character{=} sign. The left hand side of the \character{=} is the key, while the right hand side is the value. If there is no \character{=} sign in the parameter the value is the empty -string. 
The value is always unquoted with \method{Utils.unquote()}. +string, otherwise the value is as described in \method{get_param()} and is +unquoted if optional \var{unquote} is \code{True} (the default). Optional \var{failobj} is the object to return if there is no \mailheader{Content-Type} header. Optional \var{header} is the header to search instead of \mailheader{Content-Type}. + +\versionchanged[\var{unquote} argument added]{2.2.2} \end{methoddesc} \begin{methoddesc}[Message]{get_param}{param\optional{, - failobj\optional{, header}}} + failobj\optional{, header\optional{, unquote}}}} Return the value of the \mailheader{Content-Type} header's parameter \var{param} as a string. If the message has no \mailheader{Content-Type} header or if there is no such parameter, then \var{failobj} is @@ -293,20 +335,80 @@ returned (defaults to \code{None}). Optional \var{header} if given, specifies the message header to use instead of \mailheader{Content-Type}. + +Parameter keys are always compared case insensitively. The return +value can either be a string, or a 3-tuple if the parameter was +\rfc{2231} encoded. When it's a 3-tuple, the elements of the value are of +the form \code{(CHARSET, LANGUAGE, VALUE)}, where \code{LANGUAGE} may +be the empty string. Your application should be prepared to deal with +3-tuple return values, which it can convert to a Unicode string like +so: + +\begin{verbatim} +param = msg.get_param('foo') +if isinstance(param, tuple): + param = unicode(param[2], param[0]) +\end{verbatim} + +In any case, the parameter value (either the returned string, or the +\code{VALUE} item in the 3-tuple) is always unquoted, unless +\var{unquote} is set to \code{False}. + +\versionchanged[\var{unquote} argument added, and 3-tuple return value +possible]{2.2.2} \end{methoddesc} -\begin{methoddesc}[Message]{get_charsets}{\optional{failobj}} -Return a list containing the character set names in the message. 
If -the message is a \mimetype{multipart}, then the list will contain one -element for each subpart in the payload, otherwise, it will be a list -of length 1. +\begin{methoddesc}[Message]{set_param}{param, value\optional{, + header\optional{, requote\optional{, charset\optional{, language}}}}} -Each item in the list will be a string which is the value of the -\code{charset} parameter in the \mailheader{Content-Type} header for the -represented subpart. However, if the subpart has no -\mailheader{Content-Type} header, no \code{charset} parameter, or is not of -the \mimetype{text} main MIME type, then that item in the returned list -will be \var{failobj}. +Set a parameter in the \mailheader{Content-Type} header. If the +parameter already exists in the header, its value will be replaced +with \var{value}. If the \mailheader{Content-Type} header has not yet +been defined for this message, it will be set to \mimetype{text/plain} +and the new parameter value will be appended as per \rfc{2045}. + +Optional \var{header} specifies an alternative header to +\mailheader{Content-Type}, and all parameters will be quoted as +necessary unless optional \var{requote} is \code{False} (the default +is \code{True}). + +If optional \var{charset} is specified, the parameter will be encoded +according to \rfc{2231}. Optional \var{language} specifies the RFC +2231 language, defaulting to the empty string. Both \var{charset} and +\var{language} should be strings. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{del_param}{param\optional{, header\optional{, + requote}}} +Remove the given parameter completely from the +\mailheader{Content-Type} header. The header will be re-written in +place without the parameter or its value. All values will be quoted +as necessary unless \var{requote} is \code{False} (the default is +\code{True}). Optional \var{header} specifies an alternative to +\mailheader{Content-Type}.
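A minimal sketch of the set_param() and del_param() behaviour described above. It assumes the email.message module spelling of current Python releases, and the charset values are chosen only for illustration.

```python
from email.message import Message  # spelled email.Message in this patch

msg = Message()

# No Content-Type header exists yet, so one is created as text/plain
# and the parameter is appended to it.
msg.set_param("charset", "utf-8")
created_type = msg.get_content_type()
charset_after_set = msg.get_param("charset")

# The parameter already exists, so its value is replaced in place.
msg.set_param("charset", "us-ascii")
charset_after_replace = msg.get_param("charset")

# The header is rewritten without the parameter or its value.
msg.del_param("charset")
charset_after_delete = msg.get_param("charset")

print(msg["Content-Type"])
```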
+ +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{set_type}{type\optional{, header}\optional{, + requote}} +Set the main type and subtype for the \mailheader{Content-Type} +header. \var{type} must be a string in the form +\mimetype{maintype/subtype}, otherwise a \exception{ValueError} is +raised. + +This method replaces the \mailheader{Content-Type} header, keeping all +the parameters in place. If \var{requote} is \code{False}, this +leaves the existing header's quoting as is, otherwise the parameters +will be quoted (the default). + +An alternative header can be specified in the \var{header} argument. +When the \mailheader{Content-Type} header is set a +\mailheader{MIME-Version} header is also added. + +\versionadded{2.2.2} \end{methoddesc} \begin{methoddesc}[Message]{get_filename}{\optional{failobj}} @@ -326,11 +428,10 @@ returned string will always be unquoted as per \end{methoddesc} \begin{methoddesc}[Message]{set_boundary}{boundary} -Set the \code{boundary} parameter of the \mailheader{Content-Type} header -to \var{boundary}. \method{set_boundary()} will always quote -\var{boundary} so you should not quote it yourself. A -\exception{HeaderParseError} is raised if the message object has no -\mailheader{Content-Type} header. +Set the \code{boundary} parameter of the \mailheader{Content-Type} +header to \var{boundary}. \method{set_boundary()} will always quote +\var{boundary} if necessary. A \exception{HeaderParseError} is raised +if the message object has no \mailheader{Content-Type} header. Note that using this method is subtly different than deleting the old \mailheader{Content-Type} header and adding a new one with the new boundary @@ -340,19 +441,45 @@ However, it does \emph{not} preserve any continuation lines which may have been present in the original \mailheader{Content-Type} header. 
\end{methoddesc} +\begin{methoddesc}[Message]{get_content_charset}{\optional{failobj}} +Return the \code{charset} parameter of the \mailheader{Content-Type} +header. If there is no \mailheader{Content-Type} header, or if that +header has no \code{charset} parameter, \var{failobj} is returned. + +Note that this method differs from \method{get_charset()} which +returns the \class{Charset} instance for the default encoding of the +message body. + +\versionadded{2.2.2} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_charsets}{\optional{failobj}} +Return a list containing the character set names in the message. If +the message is a \mimetype{multipart}, then the list will contain one +element for each subpart in the payload, otherwise, it will be a list +of length 1. + +Each item in the list will be a string which is the value of the +\code{charset} parameter in the \mailheader{Content-Type} header for the +represented subpart. However, if the subpart has no +\mailheader{Content-Type} header, no \code{charset} parameter, or is not of +the \mimetype{text} main MIME type, then that item in the returned list +will be \var{failobj}. +\end{methoddesc} + \begin{methoddesc}[Message]{walk}{} The \method{walk()} method is an all-purpose generator which can be used to iterate over all the parts and subparts of a message object tree, in depth-first traversal order. You will typically use -\method{walk()} as the iterator in a \code{for ... in} loop; each +\method{walk()} as the iterator in a \code{for} loop; each iteration returns the next subpart. 
-Here's an example that prints the MIME type of every part of a message -object tree: +Here's an example that prints the MIME type of every part of a +multipart message structure: \begin{verbatim} >>> for part in msg.walk(): ->>> print part.get_type('text/plain') +>>> print part.get_content_type() multipart/report text/plain message/delivery-status @@ -380,7 +507,8 @@ the headers but before the first boundary string, it assigns this text to the message's \var{preamble} attribute. When the \class{Generator} is writing out the plain text representation of a MIME message, and it finds the message has a \var{preamble} attribute, it will write this -text in the area between the headers and the first boundary. +text in the area between the headers and the first boundary. See +\refmodule{email.Parser} and \refmodule{email.Generator} for details. Note that if the message object has no preamble, the \var{preamble} attribute will be \code{None}. @@ -401,3 +529,59 @@ practical sense. The upshot is that if you want to ensure that a newline get printed after your closing \mimetype{multipart} boundary, set the \var{epilogue} to the empty string. \end{datadesc} + +\subsubsection{Deprecated methods} + +The following methods are deprecated in \module{email} version 2. +They are documented here for completeness. + +\begin{methoddesc}[Message]{add_payload}{payload} +Add \var{payload} to the message object's existing payload. If, prior +to calling this method, the object's payload was \code{None} +(i.e. never before set), then after this method is called, the payload +will be the argument \var{payload}. + +If the object's payload was already a list +(i.e. \method{is_multipart()} returns 1), then \var{payload} is +appended to the end of the existing payload list. + +For any other type of existing payload, \method{add_payload()} will +transform the new payload into a list consisting of the old payload +and \var{payload}, but only if the document is already a MIME +multipart document. 
This condition is satisfied if the message's +\mailheader{Content-Type} header's main type is either +\mimetype{multipart}, or there is no \mailheader{Content-Type} +header. In any other situation, +\exception{MultipartConversionError} is raised. + +\deprecated{2.2.2}{Use the \method{attach()} method instead.} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_type}{\optional{failobj}} +Return the message's content type, as a string of the form +\mimetype{maintype/subtype} as taken from the +\mailheader{Content-Type} header. +The returned string is coerced to lowercase. + +If there is no \mailheader{Content-Type} header in the message, +\var{failobj} is returned (defaults to \code{None}). + +\deprecated{2.2.2}{Use the \method{get_content_type()} method instead.} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_main_type}{\optional{failobj}} +Return the message's \emph{main} content type. This essentially returns the +\var{maintype} part of the string returned by \method{get_type()}, with the +same semantics for \var{failobj}. + +\deprecated{2.2.2}{Use the \method{get_content_maintype()} method instead.} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_subtype}{\optional{failobj}} +Return the message's sub-content type. This essentially returns the +\var{subtype} part of the string returned by \method{get_type()}, with the +same semantics for \var{failobj}. 
+ +\deprecated{2.2.2}{Use the \method{get_content_subtype()} method instead.} +\end{methoddesc} + diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex index 40ce8530282b..706ecbbf1f3b 100644 --- a/Doc/lib/emailparser.tex +++ b/Doc/lib/emailparser.tex @@ -1,20 +1,20 @@ \declaremodule{standard}{email.Parser} \modulesynopsis{Parse flat text email messages to produce a message - object tree.} + object structure.} -Message object trees can be created in one of two ways: they can be +Message object structures can be created in one of two ways: they can be created from whole cloth by instantiating \class{Message} objects and -stringing them together via \method{add_payload()} and +stringing them together via \method{attach()} and \method{set_payload()} calls, or they can be created by parsing a flat text representation of the email message. The \module{email} package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return -to you the root \class{Message} instance of the object tree. For +to you the root \class{Message} instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME -messages, the root object will return true from its +messages, the root object will return \code{True} from its \method{is_multipart()} method, and the subparts can be accessed via the \method{get_payload()} and \method{walk()} methods. @@ -27,61 +27,95 @@ message object trees any way it finds necessary. The primary parser class is \class{Parser} which parses both the headers and the payload of the message. In the case of \mimetype{multipart} messages, it will recursively parse the body of -the container message. 
The \module{email.Parser} module also provides -a second class, called \class{HeaderParser} which can be used if -you're only interested in the headers of the message. -\class{HeaderParser} can be much faster in this situations, since it -does not attempt to parse the message body, instead setting the -payload to the raw body as a string. \class{HeaderParser} has the -same API as the \class{Parser} class. +the container message. Two modes of parsing are supported, +\emph{strict} parsing, which will usually reject any non-RFC compliant +message, and \emph{lax} parsing, which attempts to adjust for common +MIME formatting problems. + +The \module{email.Parser} module also provides a second class, called +\class{HeaderParser} which can be used if you're only interested in +the headers of the message. \class{HeaderParser} can be much faster in +these situations, since it does not attempt to parse the message body, +instead setting the payload to the raw body as a string. +\class{HeaderParser} has the same API as the \class{Parser} class. \subsubsection{Parser class API} -\begin{classdesc}{Parser}{\optional{_class}} -The constructor for the \class{Parser} class takes a single optional +\begin{classdesc}{Parser}{\optional{_class\optional{, strict}}} +The constructor for the \class{Parser} class takes an optional argument \var{_class}. This must be a callable factory (such as a function or a class), and it is used whenever a sub-message object needs to be created. It defaults to \class{Message} (see \refmodule{email.Message}). The factory will be called without arguments. + +The optional \var{strict} flag specifies whether strict or lax parsing +should be performed. Normally, when things like MIME terminating +boundaries are missing, or when messages contain other formatting +problems, the \class{Parser} will raise a +\exception{MessageParseError}. 
However, when lax parsing is enabled, +the \class{Parser} will attempt to work around such broken formatting +to produce a usable message structure (this doesn't mean +\exception{MessageParseError}s are never raised; some ill-formatted +messages just can't be parsed). The \var{strict} flag defaults to +\code{False} since lax parsing usually provides the most convenient +behavior. + +\versionchanged[The \var{strict} flag was added]{2.2.2} \end{classdesc} The other public \class{Parser} methods are: -\begin{methoddesc}[Parser]{parse}{fp} +\begin{methoddesc}[Parser]{parse}{fp\optional{, headersonly}} Read all the data from the file-like object \var{fp}, parse the resulting text, and return the root message object. \var{fp} must support both the \method{readline()} and the \method{read()} methods on file-like objects. The text contained in \var{fp} must be formatted as a block of \rfc{2822} -style headers and header continuation lines, optionally preceeded by a -\emph{Unix-From} header. The header block is terminated either by the +style headers and header continuation lines, optionally preceded by an +envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts). + +Optional \var{headersonly} is as with the \method{parsestr()} method. + +\versionchanged[The \var{headersonly} flag was added]{2.2.2} \end{methoddesc} -\begin{methoddesc}[Parser]{parsestr}{text} +\begin{methoddesc}[Parser]{parsestr}{text\optional{, headersonly}} Similar to the \method{parse()} method, except it takes a string object instead of a file-like object. Calling this method on a string is exactly equivalent to wrapping \var{text} in a \class{StringIO} instance first and calling \method{parse()}. + +Optional \var{headersonly} is a flag specifying whether to stop +parsing after reading the headers or not. 
The default is \code{False}, +meaning it parses the entire contents of the file. + +\versionchanged[The \var{headersonly} flag was added]{2.2.2} \end{methoddesc} -Since creating a message object tree from a string or a file object is -such a common task, two functions are provided as a convenience. They -are available in the top-level \module{email} package namespace. +Since creating a message object structure from a string or a file +object is such a common task, two functions are provided as a +convenience. They are available in the top-level \module{email} +package namespace. + +\begin{funcdesc}{message_from_string}{s\optional{, _class\optional{, strict}}} +Return a message object structure from a string. This is exactly +equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} and +\var{strict} are interpreted as with the \class{Parser} class constructor. -\begin{funcdesc}{message_from_string}{s\optional{, _class}} -Return a message object tree from a string. This is exactly -equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} is -interpreted as with the \class{Parser} class constructor. +\versionchanged[The \var{strict} flag was added]{2.2.2} \end{funcdesc} -\begin{funcdesc}{message_from_file}{fp\optional{, _class}} -Return a message object tree from an open file object. This is exactly -equivalent to \code{Parser().parse(fp)}. Optional \var{_class} is -interpreted as with the \class{Parser} class constructor. +\begin{funcdesc}{message_from_file}{fp\optional{, _class\optional{, strict}}} +Return a message object structure from an open file object. This +is exactly equivalent to \code{Parser().parse(fp)}. Optional +\var{_class} and \var{strict} are interpreted as with the +\class{Parser} class constructor. 
+ +\versionchanged[The \var{strict} flag was added]{2.2.2} \end{funcdesc} Here's an example of how you might use this at an interactive Python @@ -99,15 +133,20 @@ Here are some notes on the parsing semantics: \begin{itemize} \item Most non-\mimetype{multipart} type messages are parsed as a single message object with a string payload. These objects will return - 0 for \method{is_multipart()}. -\item One exception is for \mimetype{message/delivery-status} type - messages. Because the body of such messages consist of - blocks of headers, \class{Parser} will create a non-multipart - object containing non-multipart subobjects for each header - block. -\item Another exception is for \mimetype{message/*} types (more - general than \mimetype{message/delivery-status}). These are - typically \mimetype{message/rfc822} messages, represented as a - non-multipart object containing a singleton payload which is - another non-multipart \class{Message} instance. + \code{False} for \method{is_multipart()}. Their + \method{get_payload()} method will return a string object. + +\item All \mimetype{multipart} type messages will be parsed as a + container message object with a list of sub-message objects for + their payload. The outer container message will return + \code{True} for \method{is_multipart()} and its + \method{get_payload()} method will return the list of + \class{Message} subparts. + +\item Most messages with a content type of \mimetype{message/*} + (e.g. \mimetype{message/delivery-status} and + \mimetype{message/rfc822}) will also be parsed as a container + object containing a list payload of length 1. Their + \method{is_multipart()} method will return \code{True}. The + single element in the list payload will be a sub-message object. \end{itemize} diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex index 75f37987049f..80f0acfd3713 100644 --- a/Doc/lib/emailutil.tex +++ b/Doc/lib/emailutil.tex @@ -6,7 +6,7 @@ package. 
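Editorial note on the emailparser.tex hunks: the parsing semantics listed in the notes (string payloads for non-multipart messages, list payloads for multipart and message/* types) and the headersonly flag can be checked with a short script. This is a minimal sketch, not part of the commit. It assumes a current Python 3 interpreter, where the module is spelled email.parser rather than email.Parser, and the three sample messages are invented for illustration.

```python
import email
from email.parser import Parser

# Invented sample messages, used only for illustration.
PLAIN = "From: a@example.com\nSubject: hello\n\nJust text.\n"
MIXED = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "part two\n"
    "--XX--\n"
)
WRAPPED = "Content-Type: message/rfc822\n\nSubject: inner\n\ninner body\n"

plain = email.message_from_string(PLAIN)
mixed = email.message_from_string(MIXED)
wrapped = email.message_from_string(WRAPPED)
# headersonly=True stops after the header block; the body stays a raw string.
headers_only = Parser().parsestr(MIXED, headersonly=True)

samples = [
    ("plain", plain),
    ("mixed", mixed),
    ("wrapped", wrapped),
    ("headers_only", headers_only),
]
for name, msg in samples:
    payload = msg.get_payload()
    if msg.is_multipart():
        kind = "list of %d sub-message(s)" % len(payload)
    else:
        kind = type(payload).__name__
    print(name, msg.is_multipart(), kind)
```

The same multipart text yields a container object when parsed fully and a plain string payload when parsed with headersonly, which is why that flag is the cheaper choice when only the headers are of interest.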
\begin{funcdesc}{quote}{str} Return a new string with backslashes in \var{str} replaced by two -backslashes and double quotes replaced by backslash-double quote. +backslashes, and double quotes replaced by backslash-double quote. \end{funcdesc} \begin{funcdesc}{unquote}{str} @@ -21,10 +21,10 @@ Parse address -- which should be the value of some address-containing field such as \mailheader{To} or \mailheader{Cc} -- into its constituent \emph{realname} and \emph{email address} parts. Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of -\code{(None, None)} is returned. +\code{('', '')} is returned. \end{funcdesc} -\begin{funcdesc}{dump_address_pair}{pair} +\begin{funcdesc}{formataddr}{pair} The inverse of \method{parseaddr()}, this takes a 2-tuple of the form \code{(realname, email_address)} and returns the string value suitable for a \mailheader{To} or \mailheader{Cc} header. If the first element of @@ -48,27 +48,6 @@ all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs) \end{verbatim} \end{funcdesc} -\begin{funcdesc}{decode}{s} -This method decodes a string according to the rules in \rfc{2047}. It -returns the decoded string as a Python unicode string. -\end{funcdesc} - -\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}} -This method encodes a string according to the rules in \rfc{2047}. It -is not actually the inverse of \function{decode()} since it doesn't -handle multiple character sets or multiple string parts needing -encoding. In fact, the input string \var{s} must already be encoded -in the \var{charset} character set (Python can't reliably guess what -character set a string might be encoded in). The default -\var{charset} is \samp{iso-8859-1}. - -\var{encoding} must be either the letter \character{q} for -Quoted-Printable or \character{b} for Base64 encoding. If -neither, a \exception{ValueError} is raised. 
Both the \var{charset} and -the \var{encoding} strings are case-insensitive, and coerced to lower -case in the returned string. -\end{funcdesc} - \begin{funcdesc}{parsedate}{date} Attempts to parse a date according to the rules in \rfc{2822}. however, some mailers don't follow that format as specified, so @@ -106,7 +85,7 @@ common use. \end{funcdesc} \begin{funcdesc}{formatdate}{\optional{timeval\optional{, localtime}}} -Returns a date string as per Internet standard \rfc{2822}, e.g.: +Returns a date string as per \rfc{2822}, e.g.: \begin{verbatim} Fri, 09 Nov 2001 01:08:47 -0000 @@ -116,7 +95,48 @@ Optional \var{timeval} if given is a floating point time value as accepted by \function{time.gmtime()} and \function{time.localtime()}, otherwise the current time is used. -Optional \var{localtime} is a flag that when true, interprets +Optional \var{localtime} is a flag that when \code{True}, interprets \var{timeval}, and returns a date relative to the local timezone instead of UTC, properly taking daylight savings time into account. +The default is \code{False} meaning UTC is used. +\end{funcdesc} + +\begin{funcdesc}{make_msgid}{\optional{idstring}} +Returns a string suitable for an \rfc{2822}-compliant +\mailheader{Message-ID} header. Optional \var{idstring} if given, is +a string used to strengthen the uniqueness of the message id. +\end{funcdesc} + +\begin{funcdesc}{decode_rfc2231}{s} +Decode the string \var{s} according to \rfc{2231}. +\end{funcdesc} + +\begin{funcdesc}{encode_rfc2231}{s\optional{, charset\optional{, language}}} +Encode the string \var{s} according to \rfc{2231}. Optional +\var{charset} and \var{language}, if given is the character set name +and language name to use. If neither is given, \var{s} is returned +as-is. If \var{charset} is given but \var{language} is not, the +string is encoded using the empty string for \var{language}. \end{funcdesc} + +\begin{funcdesc}{decode_params}{params} +Decode parameters list according to \rfc{2231}. 
\var{params} is a +sequence of 2-tuples containing elements of the form +\code{(content-type, string-value)}. +\end{funcdesc} + +The following functions have been deprecated: + +\begin{funcdesc}{dump_address_pair}{pair} +\deprecated{2.2.2}{Use \function{formataddr()} instead.} +\end{funcdesc} + +\begin{funcdesc}{decode}{s} +\deprecated{2.2.2}{Use \method{Header.decode_header()} instead.} +\end{funcdesc} + + +\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}} +\deprecated{2.2.2}{Use \method{Header.encode()} instead.} +\end{funcdesc} + diff --git a/Lib/email/Encoders.py b/Lib/email/Encoders.py index d9cd42d45cb0..5460fdb956bb 100644 --- a/Lib/email/Encoders.py +++ b/Lib/email/Encoders.py @@ -1,17 +1,38 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Module containing encoding functions for Image.Image and Text.Text. """ import base64 -from quopri import encodestring as _encodestring # Helpers -def _qencode(s): - return _encodestring(s, quotetabs=1) +try: + from quopri import encodestring as _encodestring + + def _qencode(s): + enc = _encodestring(s, quotetabs=1) + # Must encode spaces, which quopri.encodestring() doesn't do + return enc.replace(' ', '=20') +except ImportError: + # Python 2.1 doesn't have quopri.encodestring() + from cStringIO import StringIO + import quopri as _quopri + + def _qencode(s): + if not s: + return s + hasnewline = (s[-1] == '\n') + infp = StringIO(s) + outfp = StringIO() + _quopri.encode(infp, outfp, quotetabs=1) + # Python 2.x's encode() doesn't encode spaces even when quotetabs==1 + value = outfp.getvalue().replace(' ', '=20') + if not hasnewline and value[-1] == '\n': + return value[:-1] + return value def _bencode(s): @@ -30,7 +51,7 @@ def _bencode(s): def encode_base64(msg): """Encode the message's payload in Base64. - Also, add an appropriate Content-Transfer-Encoding: header. 
+ Also, add an appropriate Content-Transfer-Encoding header. """ orig = msg.get_payload() encdata = _bencode(orig) @@ -40,9 +61,9 @@ def encode_base64(msg): def encode_quopri(msg): - """Encode the message's payload in Quoted-Printable. + """Encode the message's payload in quoted-printable. - Also, add an appropriate Content-Transfer-Encoding: header. + Also, add an appropriate Content-Transfer-Encoding header. """ orig = msg.get_payload() encdata = _qencode(orig) @@ -52,8 +73,12 @@ def encode_quopri(msg): def encode_7or8bit(msg): - """Set the Content-Transfer-Encoding: header to 7bit or 8bit.""" + """Set the Content-Transfer-Encoding header to 7bit or 8bit.""" orig = msg.get_payload() + if orig is None: + # There's no payload. For backwards compatibility we use 7bit + msg['Content-Transfer-Encoding'] = '7bit' + return # We play a trick to make this go fast. If encoding to ASCII succeeds, we # know the data must be 7bit, otherwise treat it as 8bit. try: diff --git a/Lib/email/Errors.py b/Lib/email/Errors.py index 71d7663c3471..93485dedff7f 100644 --- a/Lib/email/Errors.py +++ b/Lib/email/Errors.py @@ -1,4 +1,4 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """email package exception classes. @@ -7,7 +7,7 @@ class MessageError(Exception): - """Base class for errors in this module.""" + """Base class for errors in the email package.""" class MessageParseError(MessageError): diff --git a/Lib/email/Generator.py b/Lib/email/Generator.py index 8849d20ae489..7f05218d4859 100644 --- a/Lib/email/Generator.py +++ b/Lib/email/Generator.py @@ -1,4 +1,4 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Classes to generate plain text from a message object tree. 
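Editorial note on the Encoders.py hunks: the behaviour they describe (a 7bit fallback when there is no payload, and spaces encoded as =20 in quoted-printable output) can be observed from the outside. This is a minimal sketch, not part of the commit. It assumes a current Python 3 interpreter, where the module is spelled email.encoders and the class lives in email.message; the payload strings are invented.

```python
import base64
from email import encoders
from email.message import Message

# No payload at all: encode_7or8bit() falls back to 7bit.
empty = Message()
encoders.encode_7or8bit(empty)

# Pure ASCII payload: also 7bit.
ascii_msg = Message()
ascii_msg.set_payload("plain ascii\n")
encoders.encode_7or8bit(ascii_msg)

# encode_base64() rewrites the payload and sets the header.
b64 = Message()
b64.set_payload("hello")
encoders.encode_base64(b64)

# encode_quopri() also encodes embedded spaces.
qp = Message()
qp.set_payload("a b")
encoders.encode_quopri(qp)

for label, msg in [("empty", empty), ("ascii", ascii_msg), ("b64", b64), ("qp", qp)]:
    print(label, msg["Content-Transfer-Encoding"])
print(base64.b64decode(b64.get_payload()))
print(repr(qp.get_payload()))
```

Each encoder adds the Content-Transfer-Encoding header itself, so calling two encoders on the same message would leave two such headers; callers are expected to pick one.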
@@ -8,12 +8,21 @@ import time import re import random -from types import ListType, StringType +from types import ListType from cStringIO import StringIO -# Intrapackage imports -import Message -import Errors +from email.Header import Header + +try: + from email._compat22 import _isstring +except SyntaxError: + from email._compat21 import _isstring + +try: + True, False +except NameError: + True = 1 + False = 0 EMPTYSTRING = '' SEMISPACE = '; ' @@ -38,14 +47,15 @@ class Generator: # Public interface # - def __init__(self, outfp, mangle_from_=1, maxheaderlen=78): + def __init__(self, outfp, mangle_from_=True, maxheaderlen=78): """Create the generator for message flattening. outfp is the output file-like object for writing the message to. It must have a write() method. - Optional mangle_from_ is a flag that, when true, escapes From_ lines - in the body of the message by putting a `>' in front of them. + Optional mangle_from_ is a flag that, when True (the default), escapes + From_ lines in the body of the message by putting a `>' in front of + them. Optional maxheaderlen specifies the longest length for a non-continued header. When a header line is longer (in characters, with tabs @@ -57,21 +67,20 @@ class Generator: """ self._fp = outfp self._mangle_from_ = mangle_from_ - self.__first = 1 self.__maxheaderlen = maxheaderlen def write(self, s): # Just delegate to the file object self._fp.write(s) - def __call__(self, msg, unixfrom=0): + def flatten(self, msg, unixfrom=False): """Print the message object tree rooted at msg to the output file specified when the Generator instance was created. unixfrom is a flag that forces the printing of a Unix From_ delimiter before the first object in the message tree. If the original message has no From_ delimiter, a `standard' one is crafted. By default, this - is 0 to inhibit the printing of any From_ delimiter. + is False to inhibit the printing of any From_ delimiter. Note that for subobjects, no From_ line is printed. 
""" @@ -82,6 +91,13 @@ class Generator: print >> self._fp, ufrom self._write(msg) + # For backwards compatibility, but this is slower + __call__ = flatten + + def clone(self, fp): + """Clone this generator with the exact same options.""" + return self.__class__(fp, self._mangle_from_, self.__maxheaderlen) + # # Protected interface - undocumented ;/ # @@ -115,23 +131,19 @@ class Generator: def _dispatch(self, msg): # Get the Content-Type: for the message, then try to dispatch to - # self._handle_maintype_subtype(). If there's no handler for the full - # MIME type, then dispatch to self._handle_maintype(). If that's - # missing too, then dispatch to self._writeBody(). - ctype = msg.get_type() - if ctype is None: - # No Content-Type: header so try the default handler - self._writeBody(msg) - else: - # We do have a Content-Type: header. - specific = UNDERSCORE.join(ctype.split('/')).replace('-', '_') - meth = getattr(self, '_handle_' + specific, None) + # self._handle__(). If there's no handler for the + # full MIME type, then dispatch to self._handle_(). If + # that's missing too, then dispatch to self._writeBody(). + main = msg.get_content_maintype() + sub = msg.get_content_subtype() + specific = UNDERSCORE.join((main, sub)).replace('-', '_') + meth = getattr(self, '_handle_' + specific, None) + if meth is None: + generic = main.replace('-', '_') + meth = getattr(self, '_handle_' + generic, None) if meth is None: - generic = msg.get_main_type().replace('-', '_') - meth = getattr(self, '_handle_' + generic, None) - if meth is None: - meth = self._writeBody - meth(msg) + meth = self._writeBody + meth(msg) # # Default handlers @@ -139,12 +151,6 @@ class Generator: def _write_headers(self, msg): for h, v in msg.items(): - # We only write the MIME-Version: header for the outermost - # container message. Unfortunately, we can't use same technique - # as for the Unix-From above because we don't know when - # MIME-Version: will occur. 
- if h.lower() == 'mime-version' and not self.__first: - continue # RFC 2822 says that lines SHOULD be no more than maxheaderlen # characters wide, so we're well within our rights to split long # headers. @@ -160,7 +166,7 @@ class Generator: # Find out whether any lines in the header are really longer than # maxheaderlen characters wide. There could be continuation lines # that actually shorten it. Also, replace hard tabs with 8 spaces. - lines = [s.replace('\t', SPACE8) for s in text.split('\n')] + lines = [s.replace('\t', SPACE8) for s in text.splitlines()] for line in lines: if len(line) > maxheaderlen: break @@ -168,52 +174,12 @@ class Generator: # No line was actually longer than maxheaderlen characters, so # just return the original unchanged. return text - rtn = [] - for line in text.split('\n'): - # Short lines can remain unchanged - if len(line.replace('\t', SPACE8)) <= maxheaderlen: - rtn.append(line) - SEMINLTAB.join(rtn) - else: - oldlen = len(text) - # Try to break the line on semicolons, but if that doesn't - # work, try to split on folding whitespace. - while len(text) > maxheaderlen: - i = text.rfind(';', 0, maxheaderlen) - if i < 0: - break - rtn.append(text[:i]) - text = text[i+1:].lstrip() - if len(text) <> oldlen: - # Splitting on semis worked - rtn.append(text) - return SEMINLTAB.join(rtn) - # Splitting on semis didn't help, so try to split on - # whitespace. 
- parts = re.split(r'(\s+)', text) - # Watch out though for "Header: longnonsplittableline" - if parts[0].endswith(':') and len(parts) == 3: - return text - first = parts.pop(0) - sublines = [first] - acc = len(first) - while parts: - len0 = len(parts[0]) - len1 = len(parts[1]) - if acc + len0 + len1 < maxheaderlen: - sublines.append(parts.pop(0)) - sublines.append(parts.pop(0)) - acc += len0 + len1 - else: - # Split it here, but don't forget to ignore the - # next whitespace-only part - rtn.append(EMPTYSTRING.join(sublines)) - del parts[0] - first = parts.pop(0) - sublines = [first] - acc = len(first) - rtn.append(EMPTYSTRING.join(sublines)) - return NLTAB.join(rtn) + # The `text' argument already has the field name prepended, so don't + # provide it here or the first line will get folded too short. + h = Header(text, maxlinelen=maxheaderlen, + # For backwards compatibility, we use a hard tab here + continuation_ws='\t') + return h.encode() # # Handlers for writing types and subtypes @@ -223,7 +189,10 @@ class Generator: payload = msg.get_payload() if payload is None: return - if not isinstance(payload, StringType): + cset = msg.get_charset() + if cset is not None: + payload = cset.body_encode(payload) + if not _isstring(payload): raise TypeError, 'string payload expected: %s' % type(payload) if self._mangle_from_: payload = fcre.sub('>From ', payload) @@ -232,28 +201,30 @@ class Generator: # Default body handler _writeBody = _handle_text - def _handle_multipart(self, msg, isdigest=0): + def _handle_multipart(self, msg): # The trick here is to write out each part separately, merge them all # together, and then make sure that the boundary we've chosen isn't # present in the payload. msgtexts = [] - # BAW: kludge for broken add_payload() semantics; watch out for - # multipart/* MIME types with None or scalar payloads. 
subparts = msg.get_payload() if subparts is None: - # Nothing has every been attached + # Nothing has ever been attached boundary = msg.get_boundary(failobj=_make_boundary()) print >> self._fp, '--' + boundary print >> self._fp, '\n' print >> self._fp, '--' + boundary + '--' return + elif _isstring(subparts): + # e.g. a non-strict parse of a message with no starting boundary. + self._fp.write(subparts) + return elif not isinstance(subparts, ListType): # Scalar payload subparts = [subparts] for part in subparts: s = StringIO() - g = self.__class__(s, self._mangle_from_, self.__maxheaderlen) - g(part, unixfrom=0) + g = self.clone(s) + g.flatten(part, unixfrom=False) msgtexts.append(s.getvalue()) # Now make sure the boundary we've selected doesn't appear in any of # the message texts. @@ -274,14 +245,8 @@ class Generator: # First boundary is a bit different; it doesn't have a leading extra # newline. print >> self._fp, '--' + boundary - if isdigest: - print >> self._fp # Join and write the individual parts joiner = '\n--' + boundary + '\n' - if isdigest: - # multipart/digest types effectively add an extra newline between - # the boundary and the body part. 
- joiner += '\n' self._fp.write(joiner.join(msgtexts)) print >> self._fp, '\n--' + boundary + '--', # Write out any epilogue @@ -290,9 +255,6 @@ class Generator: print >> self._fp self._fp.write(msg.epilogue) - def _handle_multipart_digest(self, msg): - self._handle_multipart(msg, isdigest=1) - def _handle_message_delivery_status(self, msg): # We can't just write the headers directly to self's file object # because this will leave an extra newline between the last header @@ -300,8 +262,8 @@ class Generator: blocks = [] for part in msg.get_payload(): s = StringIO() - g = self.__class__(s, self._mangle_from_, self.__maxheaderlen) - g(part, unixfrom=0) + g = self.clone(s) + g.flatten(part, unixfrom=False) text = s.getvalue() lines = text.split('\n') # Strip off the unnecessary trailing empty line @@ -316,11 +278,12 @@ class Generator: def _handle_message(self, msg): s = StringIO() - g = self.__class__(s, self._mangle_from_, self.__maxheaderlen) - # A message/rfc822 should contain a scalar payload which is another - # Message object. Extract that object, stringify it, and write that - # out. - g(msg.get_payload(), unixfrom=0) + g = self.clone(s) + # The payload of a message/rfc822 part should be a multipart sequence + # of length 1. The zeroth element of the list should be the Message + # object for the subpart. Extract that object, stringify it, and + # write it out. + g.flatten(msg.get_payload(0), unixfrom=False) self._fp.write(s.getvalue()) @@ -331,7 +294,7 @@ class DecodedGenerator(Generator): Like the Generator base class, except that non-text parts are substituted with a format string representing the part. """ - def __init__(self, outfp, mangle_from_=1, maxheaderlen=78, fmt=None): + def __init__(self, outfp, mangle_from_=True, maxheaderlen=78, fmt=None): """Like Generator.__init__() except that an additional optional argument is allowed. 
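Editorial note on the Generator.py hunks: the flatten() and clone() methods introduced above can be exercised as follows. This is a minimal sketch, not part of the commit. It assumes a current Python 3 interpreter, where the module is spelled email.generator and StringIO comes from io; the sample message is invented.

```python
import io
import email
from email.generator import Generator

# Invented message whose body starts with "From ", to show From_ mangling.
msg = email.message_from_string("Subject: demo\n\nFrom here on, plain text.\n")

out = io.StringIO()
gen = Generator(out, mangle_from_=True, maxheaderlen=78)
gen.flatten(msg, unixfrom=False)
flat = out.getvalue()

# clone() builds a second generator with the same options, writing to a
# different file-like object.
out2 = io.StringIO()
gen.clone(out2).flatten(msg)

print(flat)
print(out2.getvalue() == flat)
```

Using clone() rather than calling the class constructor directly keeps subclass options intact when a generator has to flatten sub-parts into temporary buffers, which is how the multipart handlers in this diff use it.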
@@ -363,7 +326,7 @@ class DecodedGenerator(Generator): for part in msg.walk(): maintype = part.get_main_type('text') if maintype == 'text': - print >> self, part.get_payload(decode=1) + print >> self, part.get_payload(decode=True) elif maintype == 'multipart': # Just skip this pass @@ -390,7 +353,7 @@ def _make_boundary(text=None): return boundary b = boundary counter = 0 - while 1: + while True: cre = re.compile('^--' + re.escape(b) + '(--)?$', re.MULTILINE) if not cre.search(text): break diff --git a/Lib/email/Iterators.py b/Lib/email/Iterators.py index a64495d9b069..3ecd632ecf32 100644 --- a/Lib/email/Iterators.py +++ b/Lib/email/Iterators.py @@ -1,33 +1,25 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Various types of useful iterators and generators. """ -from __future__ import generators -from cStringIO import StringIO -from types import StringType +import sys - - -def body_line_iterator(msg): - """Iterate over the parts, returning string payloads line-by-line.""" - for subpart in msg.walk(): - payload = subpart.get_payload() - if type(payload) is StringType: - for line in StringIO(payload): - yield line +try: + from email._compat22 import body_line_iterator, typed_subpart_iterator +except SyntaxError: + # Python 2.1 doesn't have generators + from email._compat21 import body_line_iterator, typed_subpart_iterator -def typed_subpart_iterator(msg, maintype='text', subtype=None): - """Iterate over the subparts with a given MIME type. - - Use `maintype' as the main MIME type to match against; this defaults to - "text". Optional `subtype' is the MIME subtype to match against; if - omitted, only the main type is matched. 
- """ - for subpart in msg.walk(): - if subpart.get_main_type('text') == maintype: - if subtype is None or subpart.get_subtype('plain') == subtype: - yield subpart +def _structure(msg, fp=None, level=0): + """A handy debugging aid""" + if fp is None: + fp = sys.stdout + tab = ' ' * (level * 4) + print >> fp, tab + msg.get_content_type() + if msg.is_multipart(): + for subpart in msg.get_payload(): + _structure(subpart, fp, level+1) diff --git a/Lib/email/MIMEAudio.py b/Lib/email/MIMEAudio.py index 57b7b6da99d1..dda7689a4c87 100644 --- a/Lib/email/MIMEAudio.py +++ b/Lib/email/MIMEAudio.py @@ -6,9 +6,9 @@ import sndhdr from cStringIO import StringIO -import MIMEBase -import Errors -import Encoders +from email import Errors +from email import Encoders +from email.MIMENonMultipart import MIMENonMultipart @@ -37,7 +37,7 @@ def _whatsnd(data): -class MIMEAudio(MIMEBase.MIMEBase): +class MIMEAudio(MIMENonMultipart): """Class for generating audio/* MIME documents.""" def __init__(self, _audiodata, _subtype=None, @@ -46,7 +46,7 @@ class MIMEAudio(MIMEBase.MIMEBase): _audiodata is a string containing the raw audio data. If this data can be decoded by the standard Python `sndhdr' module, then the - subtype will be automatically included in the Content-Type: header. + subtype will be automatically included in the Content-Type header. Otherwise, you can specify the specific audio subtype via the _subtype parameter. If _subtype is not given, and no subtype can be guessed, a TypeError is raised. @@ -55,17 +55,17 @@ class MIMEAudio(MIMEBase.MIMEBase): transport of the image data. It takes one argument, which is this Image instance. It should use get_payload() and set_payload() to change the payload to the encoded form. It should also add any - Content-Transfer-Encoding: or other headers to the message as + Content-Transfer-Encoding or other headers to the message as necessary. The default encoding is Base64. 
Any additional keyword arguments are passed to the base class - constructor, which turns them into parameters on the Content-Type: + constructor, which turns them into parameters on the Content-Type header. """ if _subtype is None: _subtype = _whatsnd(_audiodata) if _subtype is None: raise TypeError, 'Could not find audio MIME subtype' - MIMEBase.MIMEBase.__init__(self, 'audio', _subtype, **_params) + MIMENonMultipart.__init__(self, 'audio', _subtype, **_params) self.set_payload(_audiodata) _encoder(self) diff --git a/Lib/email/MIMEBase.py b/Lib/email/MIMEBase.py index 33216f6acb5e..7485d855c4fa 100644 --- a/Lib/email/MIMEBase.py +++ b/Lib/email/MIMEBase.py @@ -1,10 +1,10 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Base class for MIME specializations. """ -import Message +from email import Message diff --git a/Lib/email/MIMEImage.py b/Lib/email/MIMEImage.py index 963da23a2de8..5306e537066d 100644 --- a/Lib/email/MIMEImage.py +++ b/Lib/email/MIMEImage.py @@ -1,4 +1,4 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Class representing image/* type MIME documents. @@ -6,14 +6,13 @@ import imghdr -# Intrapackage imports -import MIMEBase -import Errors -import Encoders +from email import Errors +from email import Encoders +from email.MIMENonMultipart import MIMENonMultipart -class MIMEImage(MIMEBase.MIMEBase): +class MIMEImage(MIMENonMultipart): """Class for generating image/* type MIME documents.""" def __init__(self, _imagedata, _subtype=None, @@ -22,7 +21,7 @@ class MIMEImage(MIMEBase.MIMEBase): _imagedata is a string containing the raw image data. If this data can be decoded by the standard Python `imghdr' module, then the - subtype will be automatically included in the Content-Type: header. + subtype will be automatically included in the Content-Type header. 
Otherwise, you can specify the specific image subtype via the _subtype parameter. @@ -30,17 +29,17 @@ class MIMEImage(MIMEBase.MIMEBase): transport of the image data. It takes one argument, which is this Image instance. It should use get_payload() and set_payload() to change the payload to the encoded form. It should also add any - Content-Transfer-Encoding: or other headers to the message as + Content-Transfer-Encoding or other headers to the message as necessary. The default encoding is Base64. Any additional keyword arguments are passed to the base class - constructor, which turns them into parameters on the Content-Type: + constructor, which turns them into parameters on the Content-Type header. """ if _subtype is None: _subtype = imghdr.what(None, _imagedata) if _subtype is None: raise TypeError, 'Could not guess image MIME subtype' - MIMEBase.MIMEBase.__init__(self, 'image', _subtype, **_params) + MIMENonMultipart.__init__(self, 'image', _subtype, **_params) self.set_payload(_imagedata) _encoder(self) diff --git a/Lib/email/MIMEMessage.py b/Lib/email/MIMEMessage.py index fc4b2c6bdc9a..2042dd97529f 100644 --- a/Lib/email/MIMEMessage.py +++ b/Lib/email/MIMEMessage.py @@ -1,15 +1,15 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Class representing message/* MIME documents. """ -import Message -import MIMEBase +from email import Message +from email.MIMENonMultipart import MIMENonMultipart -class MIMEMessage(MIMEBase.MIMEBase): +class MIMEMessage(MIMENonMultipart): """Class representing message/* MIME documents.""" def __init__(self, _msg, _subtype='rfc822'): @@ -22,7 +22,11 @@ class MIMEMessage(MIMEBase.MIMEBase): default is "rfc822" (this is defined by the MIME standard, even though the term "rfc822" is technically outdated by RFC 2822). 
""" - MIMEBase.MIMEBase.__init__(self, 'message', _subtype) + MIMENonMultipart.__init__(self, 'message', _subtype) if not isinstance(_msg, Message.Message): raise TypeError, 'Argument is not an instance of Message' - self.set_payload(_msg) + # It's convenient to use this base class method. We need to do it + # this way or we'll get an exception + Message.Message.attach(self, _msg) + # And be sure our default type is set correctly + self.set_default_type('message/rfc822') diff --git a/Lib/email/MIMEText.py b/Lib/email/MIMEText.py index ccce9fb5b1a3..d91b93df3d18 100644 --- a/Lib/email/MIMEText.py +++ b/Lib/email/MIMEText.py @@ -1,19 +1,20 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Class representing text/* type MIME documents. """ -import MIMEBase -from Encoders import encode_7or8bit +import warnings +from email.MIMENonMultipart import MIMENonMultipart +from email.Encoders import encode_7or8bit -class MIMEText(MIMEBase.MIMEBase): +class MIMEText(MIMENonMultipart): """Class for generating text/* type MIME documents.""" def __init__(self, _text, _subtype='plain', _charset='us-ascii', - _encoder=encode_7or8bit): + _encoder=None): """Create a text/* type MIME document. _text is the string for this message object. If the text does not end @@ -21,21 +22,27 @@ class MIMEText(MIMEBase.MIMEBase): _subtype is the MIME sub content type, defaulting to "plain". - _charset is the character set parameter added to the Content-Type: - header. This defaults to "us-ascii". - - _encoder is a function which will perform the actual encoding for - transport of the text data. It takes one argument, which is this - Text instance. It should use get_payload() and set_payload() to - change the payload to the encoded form. It should also add any - Content-Transfer-Encoding: or other headers to the message as - necessary. 
The default encoding doesn't actually modify the payload, - but it does set Content-Transfer-Encoding: to either `7bit' or `8bit' - as appropriate. + _charset is the character set parameter added to the Content-Type + header. This defaults to "us-ascii". Note that as a side-effect, the + Content-Transfer-Encoding header will also be set. + + The use of the _encoder is deprecated. The encoding of the payload, + and the setting of the character set parameter now happens implicitly + based on the _charset argument. If _encoder is supplied, then a + DeprecationWarning is used, and the _encoder functionality may + override any header settings indicated by _charset. This is probably + not what you want. """ - MIMEBase.MIMEBase.__init__(self, 'text', _subtype, - **{'charset': _charset}) - if _text and _text[-1] <> '\n': + MIMENonMultipart.__init__(self, 'text', _subtype, + **{'charset': _charset}) + if _text and not _text.endswith('\n'): _text += '\n' - self.set_payload(_text) - _encoder(self) + self.set_payload(_text, _charset) + if _encoder is not None: + warnings.warn('_encoder argument is obsolete.', + DeprecationWarning, 2) + # Because set_payload() with a _charset will set its own + # Content-Transfer-Encoding header, we need to delete the + # existing one or will end up with two of them. :( + del self['content-transfer-encoding'] + _encoder(self) diff --git a/Lib/email/Message.py b/Lib/email/Message.py index 91931a11e2da..87ab309885cc 100644 --- a/Lib/email/Message.py +++ b/Lib/email/Message.py @@ -1,69 +1,117 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Basic message object for the email package object model. 
""" -from __future__ import generators - import re -import base64 -import quopri +import warnings from cStringIO import StringIO -from types import ListType +from types import ListType, TupleType, StringType # Intrapackage imports -import Errors -import Utils +from email import Errors +from email import Utils +from email import Charset SEMISPACE = '; ' + +try: + True, False +except NameError: + True = 1 + False = 0 + +# Regular expression used to split header parameters. BAW: this may be too +# simple. It isn't strictly RFC 2045 (section 5.1) compliant, but it catches +# most headers found in the wild. We may eventually need a full fledged +# parser eventually. paramre = re.compile(r'\s*;\s*') +# Regular expression that matches `special' characters in parameters, the +# existance of which force quoting of the parameter value. +tspecials = re.compile(r'[ \(\)<>@,;:\\"/\[\]\?=]') + + + +# Helper functions +def _formatparam(param, value=None, quote=True): + """Convenience function to format and return a key=value pair. + + This will quote the value if needed or if quote is true. + """ + if value is not None and len(value) > 0: + # TupleType is used for RFC 2231 encoded parameter values where items + # are (charset, language, value). charset is a string, not a Charset + # instance. + if isinstance(value, TupleType): + # Encode as per RFC 2231 + param += '*' + value = Utils.encode_rfc2231(value[2], value[0], value[1]) + # BAW: Please check this. I think that if quote is set it should + # force quoting even if not necessary. + if quote or tspecials.search(value): + return '%s="%s"' % (param, Utils.quote(value)) + else: + return '%s=%s' % (param, value) + else: + return param + + +def _unquotevalue(value): + if isinstance(value, TupleType): + return value[0], value[1], Utils.unquote(value[2]) + else: + return Utils.unquote(value) class Message: - """Basic message object for use inside the object tree. + """Basic message object. 
A message object is defined as something that has a bunch of RFC 2822 - headers and a payload. If the body of the message is a multipart, then - the payload is a list of Messages, otherwise it is a string. + headers and a payload. It may optionally have an envelope header + (a.k.a. Unix-From or From_ header). If the message is a container (i.e. a + multipart or a message/rfc822), then the payload is a list of Message + objects, otherwise it is a string. - These objects implement part of the `mapping' interface, which assumes + Message objects implement part of the `mapping' interface, which assumes there is exactly one occurrance of the header per message. Some headers - do in fact appear multiple times (e.g. Received:) and for those headers, + do in fact appear multiple times (e.g. Received) and for those headers, you must use the explicit API to set or get all the headers. Not all of the mapping methods are implemented. - """ def __init__(self): self._headers = [] self._unixfrom = None self._payload = None + self._charset = None # Defaults for multipart messages self.preamble = self.epilogue = None + # Default content type + self._default_type = 'text/plain' def __str__(self): """Return the entire formatted message as a string. - This includes the headers, body, and `unixfrom' line. + This includes the headers, body, and envelope header. """ - return self.as_string(unixfrom=1) + return self.as_string(unixfrom=True) - def as_string(self, unixfrom=0): + def as_string(self, unixfrom=False): """Return the entire formatted message as a string. - Optional `unixfrom' when true, means include the Unix From_ envelope + Optional `unixfrom' when True, means include the Unix From_ envelope header. 
""" - from Generator import Generator + from email.Generator import Generator fp = StringIO() g = Generator(fp) - g(self, unixfrom=unixfrom) + g.flatten(self, unixfrom=unixfrom) return fp.getvalue() def is_multipart(self): - """Return true if the message consists of multiple parts.""" - if type(self._payload) is ListType: - return 1 - return 0 + """Return True if the message consists of multiple parts.""" + if isinstance(self._payload, ListType): + return True + return False # # Unix From_ line @@ -82,36 +130,52 @@ class Message: If the current payload is empty, then the current payload will be made a scalar, set to the given value. + + Note: This method is deprecated. Use .attach() instead. """ + warnings.warn('add_payload() is deprecated, use attach() instead.', + DeprecationWarning, 2) if self._payload is None: self._payload = payload - elif type(self._payload) is ListType: + elif isinstance(self._payload, ListType): self._payload.append(payload) elif self.get_main_type() not in (None, 'multipart'): raise Errors.MultipartConversionError( - 'Message main Content-Type: must be "multipart" or missing') + 'Message main content type must be "multipart" or missing') else: self._payload = [self._payload, payload] - # A useful synonym - attach = add_payload - - def get_payload(self, i=None, decode=0): - """Return the current payload exactly as is. + def attach(self, payload): + """Add the given payload to the current payload. - Optional i returns that index into the payload. + The current payload will always be a list of objects after this method + is called. If you want to set the payload to a scalar object, use + set_payload() instead. + """ + if self._payload is None: + self._payload = [payload] + else: + self._payload.append(payload) - Optional decode is a flag indicating whether the payload should be - decoded or not, according to the Content-Transfer-Encoding: header. 
- When true and the message is not a multipart, the payload will be - decoded if this header's value is `quoted-printable' or `base64'. If - some other encoding is used, or the header is missing, the payload is - returned as-is (undecoded). If the message is a multipart and the - decode flag is true, then None is returned. + def get_payload(self, i=None, decode=False): + """Return a reference to the payload. + + The payload will either be a list object or a string. If you mutate + the list object, you modify the message's payload in place. Optional + i returns that index into the payload. + + Optional decode is a flag (defaulting to False) indicating whether the + payload should be decoded or not, according to the + Content-Transfer-Encoding header. When True and the message is not a + multipart, the payload will be decoded if this header's value is + `quoted-printable' or `base64'. If some other encoding is used, or + the header is missing, the payload is returned as-is (undecoded). If + the message is a multipart and the decode flag is True, then None is + returned. """ if i is None: payload = self._payload - elif type(self._payload) is not ListType: + elif not isinstance(self._payload, ListType): raise TypeError, i else: payload = self._payload[i] @@ -127,10 +191,60 @@ class Message: # unchanged. return payload + def set_payload(self, payload, charset=None): + """Set the payload to the given value. - def set_payload(self, payload): - """Set the payload to the given value.""" + Optional charset sets the message's default character set. See + set_charset() for details. + """ self._payload = payload + if charset is not None: + self.set_charset(charset) + + def set_charset(self, charset): + """Set the charset of the payload to a given character set. + + charset can be a Charset instance, a string naming a character set, or + None. If it is a string it will be converted to a Charset instance. 
+ If charset is None, the charset parameter will be removed from the + Content-Type field. Anything else will generate a TypeError. + + The message will be assumed to be of type text/* encoded with + charset.input_charset. It will be converted to charset.output_charset + and encoded properly, if needed, when generating the plain text + representation of the message. MIME headers (MIME-Version, + Content-Type, Content-Transfer-Encoding) will be added as needed. + + """ + if charset is None: + self.del_param('charset') + self._charset = None + return + if isinstance(charset, StringType): + charset = Charset.Charset(charset) + if not isinstance(charset, Charset.Charset): + raise TypeError, charset + # BAW: should we accept strings that can serve as arguments to the + # Charset constructor? + self._charset = charset + if not self.has_key('MIME-Version'): + self.add_header('MIME-Version', '1.0') + if not self.has_key('Content-Type'): + self.add_header('Content-Type', 'text/plain', + charset=charset.get_output_charset()) + else: + self.set_param('charset', charset.get_output_charset()) + if not self.has_key('Content-Transfer-Encoding'): + cte = charset.get_body_encoding() + if callable(cte): + cte(self) + else: + self.add_header('Content-Transfer-Encoding', cte) + + def get_charset(self): + """Return the Charset instance associated with the message's payload. + """ + return self._charset # # MAPPING INTERFACE (partial) @@ -170,8 +284,8 @@ class Message: newheaders.append((k, v)) self._headers = newheaders - def __contains__(self, key): - return key.lower() in [k.lower() for k, v in self._headers] + def __contains__(self, name): + return name.lower() in [k.lower() for k, v in self._headers] def has_key(self, name): """Return true if the message contains the header.""" @@ -182,8 +296,9 @@ class Message: """Return a list of all the message's header field names. These will be sorted in the order they appeared in the original - message, and may contain duplicates. 
Any fields deleted and - re-inserted are always appended to the header list. + message, or were added to the message, and may contain duplicates. + Any fields deleted and re-inserted are always appended to the header + list. """ return [k for k, v in self._headers] @@ -191,8 +306,9 @@ class Message: """Return a list of all the message's header values. These will be sorted in the order they appeared in the original - message, and may contain duplicates. Any fields deleted and - re-inserted are always appended to the header list. + message, or were added to the message, and may contain duplicates. + Any fields deleted and re-inserted are always appended to the header + list. """ return [v for k, v in self._headers] @@ -200,8 +316,9 @@ class Message: """Get all the message's header fields and values. These will be sorted in the order they appeared in the original - message, and may contain duplicates. Any fields deleted and - re-inserted are always appended to the header list. + message, or were added to the message, and may contain duplicates. + Any fields deleted and re-inserted are always appended to the header + list. """ return self._headers[:] @@ -250,30 +367,49 @@ class Message: Example: msg.add_header('content-disposition', 'attachment', filename='bud.gif') - """ parts = [] for k, v in _params.items(): if v is None: parts.append(k.replace('_', '-')) else: - parts.append('%s="%s"' % (k.replace('_', '-'), v)) + parts.append(_formatparam(k.replace('_', '-'), v)) if _value is not None: parts.insert(0, _value) self._headers.append((_name, SEMISPACE.join(parts))) + def replace_header(self, _name, _value): + """Replace a header. + + Replace the first matching header found in the message, retaining + header order and case. If no matching header was found, a KeyError is + raised. 
+ """ + _name = _name.lower() + for i, (k, v) in zip(range(len(self._headers)), self._headers): + if k.lower() == _name: + self._headers[i] = (k, _value) + break + else: + raise KeyError, _name + + # + # These methods are silently deprecated in favor of get_content_type() and + # friends (see below). They will be noisily deprecated in email 3.0. + # + def get_type(self, failobj=None): """Returns the message's content type. The returned string is coerced to lowercase and returned as a single - string of the form `maintype/subtype'. If there was no Content-Type: + string of the form `maintype/subtype'. If there was no Content-Type header in the message, failobj is returned (defaults to None). """ missing = [] value = self.get('content-type', missing) if value is missing: return failobj - return paramre.split(value)[0].lower() + return paramre.split(value)[0].lower().strip() def get_main_type(self, failobj=None): """Return the message's main content type if present.""" @@ -281,10 +417,9 @@ class Message: ctype = self.get_type(missing) if ctype is missing: return failobj - parts = ctype.split('/') - if len(parts) > 0: - return ctype.split('/')[0] - return failobj + if ctype.count('/') <> 1: + return failobj + return ctype.split('/')[0] def get_subtype(self, failobj=None): """Return the message's content subtype if present.""" @@ -292,10 +427,73 @@ class Message: ctype = self.get_type(missing) if ctype is missing: return failobj - parts = ctype.split('/') - if len(parts) > 1: - return ctype.split('/')[1] - return failobj + if ctype.count('/') <> 1: + return failobj + return ctype.split('/')[1] + + # + # Use these three methods instead of the three above. + # + + def get_content_type(self): + """Return the message's content type. + + The returned string is coerced to lower case of the form + `maintype/subtype'. If there was no Content-Type header in the + message, the default type as given by get_default_type() will be + returned. 
Since according to RFC 2045, messages always have a default + type this will always return a value. + + RFC 2045 defines a message's default type to be text/plain unless it + appears inside a multipart/digest container, in which case it would be + message/rfc822. + """ + missing = [] + value = self.get('content-type', missing) + if value is missing: + # This should have no parameters + return self.get_default_type() + ctype = paramre.split(value)[0].lower().strip() + # RFC 2045, section 5.2 says if its invalid, use text/plain + if ctype.count('/') <> 1: + return 'text/plain' + return ctype + + def get_content_maintype(self): + """Return the message's main content type. + + This is the `maintype' part of the string returned by + get_content_type(). + """ + ctype = self.get_content_type() + return ctype.split('/')[0] + + def get_content_subtype(self): + """Returns the message's sub-content type. + + This is the `subtype' part of the string returned by + get_content_type(). + """ + ctype = self.get_content_type() + return ctype.split('/')[1] + + def get_default_type(self): + """Return the `default' content type. + + Most messages have a default content type of text/plain, except for + messages that are subparts of multipart/digest containers. Such + subparts have a default content type of message/rfc822. + """ + return self._default_type + + def set_default_type(self, ctype): + """Set the `default' content type. + + ctype should be either "text/plain" or "message/rfc822", although this + is not enforced. The default content type is not stored in the + Content-Type header. + """ + self._default_type = ctype def _get_params_preserve(self, failobj, header): # Like get_params() but preserves the quoting of values. 
BAW: @@ -308,103 +506,236 @@ class Message: for p in paramre.split(value): try: name, val = p.split('=', 1) + name = name.strip() + val = val.strip() except ValueError: # Must have been a bare attribute - name = p + name = p.strip() val = '' params.append((name, val)) + params = Utils.decode_params(params) return params - def get_params(self, failobj=None, header='content-type'): - """Return the message's Content-Type: parameters, as a list. + def get_params(self, failobj=None, header='content-type', unquote=True): + """Return the message's Content-Type parameters, as a list. The elements of the returned list are 2-tuples of key/value pairs, as split on the `=' sign. The left hand side of the `=' is the key, while the right hand side is the value. If there is no `=' sign in - the parameter the value is the empty string. The value is always - unquoted. + the parameter the value is the empty string. The value is as + described in the get_param() method. - Optional failobj is the object to return if there is no Content-Type: + Optional failobj is the object to return if there is no Content-Type header. Optional header is the header to search instead of - Content-Type: + Content-Type. If unquote is True, the value is unquoted. """ missing = [] params = self._get_params_preserve(missing, header) if params is missing: return failobj - return [(k, Utils.unquote(v)) for k, v in params] - - def get_param(self, param, failobj=None, header='content-type'): - """Return the parameter value if found in the Content-Type: header. - - Optional failobj is the object to return if there is no Content-Type: - header. Optional header is the header to search instead of - Content-Type: - - Parameter keys are always compared case insensitively. Values are - always unquoted. 
+ if unquote: + return [(k, _unquotevalue(v)) for k, v in params] + else: + return params + + def get_param(self, param, failobj=None, header='content-type', + unquote=True): + """Return the parameter value if found in the Content-Type header. + + Optional failobj is the object to return if there is no Content-Type + header, or the Content-Type header has no such parameter. Optional + header is the header to search instead of Content-Type. + + Parameter keys are always compared case insensitively. The return + value can either be a string, or a 3-tuple if the parameter was RFC + 2231 encoded. When it's a 3-tuple, the elements of the value are of + the form (CHARSET, LANGUAGE, VALUE), where LANGUAGE may be the empty + string. Your application should be prepared to deal with these, and + can convert the parameter to a Unicode string like so: + + param = msg.get_param('foo') + if isinstance(param, tuple): + param = unicode(param[2], param[0]) + + In any case, the parameter value (either the returned string, or the + VALUE item in the 3-tuple) is always unquoted, unless unquote is set + to False. """ if not self.has_key(header): return failobj for k, v in self._get_params_preserve(failobj, header): if k.lower() == param.lower(): - return Utils.unquote(v) + if unquote: + return _unquotevalue(v) + else: + return v return failobj + def set_param(self, param, value, header='Content-Type', requote=True, + charset=None, language=''): + """Set a parameter in the Content-Type header. + + If the parameter already exists in the header, its value will be + replaced with the new value. + + If header is Content-Type and has not yet been defined for this + message, it will be set to "text/plain" and the new parameter and + value will be appended as per RFC 2045. + + An alternate header can specified in the header argument, and all + parameters will be quoted as necessary unless requote is False. + + If charset is specified, the parameter will be encoded according to RFC + 2231. 
Optional language specifies the RFC 2231 language, defaulting + to the empty string. Both charset and language should be strings. + """ + if not isinstance(value, TupleType) and charset: + value = (charset, language, value) + + if not self.has_key(header) and header.lower() == 'content-type': + ctype = 'text/plain' + else: + ctype = self.get(header) + if not self.get_param(param, header=header): + if not ctype: + ctype = _formatparam(param, value, requote) + else: + ctype = SEMISPACE.join( + [ctype, _formatparam(param, value, requote)]) + else: + ctype = '' + for old_param, old_value in self.get_params(header=header, + unquote=requote): + append_param = '' + if old_param.lower() == param.lower(): + append_param = _formatparam(param, value, requote) + else: + append_param = _formatparam(old_param, old_value, requote) + if not ctype: + ctype = append_param + else: + ctype = SEMISPACE.join([ctype, append_param]) + if ctype <> self.get(header): + del self[header] + self[header] = ctype + + def del_param(self, param, header='content-type', requote=True): + """Remove the given parameter completely from the Content-Type header. + + The header will be re-written in place without the parameter or its + value. All values will be quoted as necessary unless requote is + False. Optional header specifies an alternative to the Content-Type + header. + """ + if not self.has_key(header): + return + new_ctype = '' + for p, v in self.get_params(header, unquote=requote): + if p.lower() <> param.lower(): + if not new_ctype: + new_ctype = _formatparam(p, v, requote) + else: + new_ctype = SEMISPACE.join([new_ctype, + _formatparam(p, v, requote)]) + if new_ctype <> self.get(header): + del self[header] + self[header] = new_ctype + + def set_type(self, type, header='Content-Type', requote=True): + """Set the main type and subtype for the Content-Type header. + + type must be a string in the form "maintype/subtype", otherwise a + ValueError is raised. 
+ + This method replaces the Content-Type header, keeping all the + parameters in place. If requote is False, this leaves the existing + header's quoting as is. Otherwise, the parameters will be quoted (the + default). + + An alternative header can be specified in the header argument. When + the Content-Type header is set, we'll always also add a MIME-Version + header. + """ + # BAW: should we be strict? + if not type.count('/') == 1: + raise ValueError + # Set the Content-Type, you get a MIME-Version + if header.lower() == 'content-type': + del self['mime-version'] + self['MIME-Version'] = '1.0' + if not self.has_key(header): + self[header] = type + return + params = self.get_params(header, unquote=requote) + del self[header] + self[header] = type + # Skip the first param; it's the old type. + for p, v in params[1:]: + self.set_param(p, v, header, requote) + def get_filename(self, failobj=None): """Return the filename associated with the payload if present. - The filename is extracted from the Content-Disposition: header's + The filename is extracted from the Content-Disposition header's `filename' parameter, and it is unquoted. """ missing = [] filename = self.get_param('filename', missing, 'content-disposition') if filename is missing: return failobj - return Utils.unquote(filename.strip()) + if isinstance(filename, TupleType): + # It's an RFC 2231 encoded parameter + newvalue = _unquotevalue(filename) + return unicode(newvalue[2], newvalue[0]) + else: + newvalue = _unquotevalue(filename.strip()) + return newvalue def get_boundary(self, failobj=None): """Return the boundary associated with the payload if present. - The boundary is extracted from the Content-Type: header's `boundary' + The boundary is extracted from the Content-Type header's `boundary' parameter, and it is unquoted. 
""" missing = [] boundary = self.get_param('boundary', missing) if boundary is missing: return failobj - return Utils.unquote(boundary.strip()) + if isinstance(boundary, TupleType): + # RFC 2231 encoded, so decode. It better end up as ascii + return unicode(boundary[2], boundary[0]).encode('us-ascii') + return _unquotevalue(boundary.strip()) def set_boundary(self, boundary): - """Set the boundary parameter in Content-Type: to 'boundary'. + """Set the boundary parameter in Content-Type to 'boundary'. - This is subtly different than deleting the Content-Type: header and + This is subtly different than deleting the Content-Type header and adding a new one with a new boundary parameter via add_header(). The main difference is that using the set_boundary() method preserves the - order of the Content-Type: header in the original message. + order of the Content-Type header in the original message. - HeaderParseError is raised if the message has no Content-Type: header. + HeaderParseError is raised if the message has no Content-Type header. """ missing = [] params = self._get_params_preserve(missing, 'content-type') if params is missing: - # There was no Content-Type: header, and we don't know what type + # There was no Content-Type header, and we don't know what type # to set it to, so raise an exception. - raise Errors.HeaderParseError, 'No Content-Type: header found' + raise Errors.HeaderParseError, 'No Content-Type header found' newparams = [] - foundp = 0 + foundp = False for pk, pv in params: if pk.lower() == 'boundary': newparams.append(('boundary', '"%s"' % boundary)) - foundp = 1 + foundp = True else: newparams.append((pk, pv)) if not foundp: - # The original Content-Type: header had no boundary attribute. + # The original Content-Type header had no boundary attribute. # Tack one one the end. BAW: should we raise an exception # instead??? 
newparams.append(('boundary', '"%s"' % boundary)) - # Replace the existing Content-Type: header with the new value + # Replace the existing Content-Type header with the new value newheaders = [] for h, v in self._headers: if h.lower() == 'content-type': @@ -420,27 +751,36 @@ class Message: newheaders.append((h, v)) self._headers = newheaders - def walk(self): - """Walk over the message tree, yielding each subpart. + try: + from email._compat22 import walk + except SyntaxError: + # Must be using Python 2.1 + from email._compat21 import walk + + def get_content_charset(self, failobj=None): + """Return the charset parameter of the Content-Type header. - The walk is performed in depth-first order. This method is a - generator. + If there is no Content-Type header, or if that header has no charset + parameter, failobj is returned. """ - yield self - if self.is_multipart(): - for subpart in self.get_payload(): - for subsubpart in subpart.walk(): - yield subsubpart + missing = [] + charset = self.get_param('charset', missing) + if charset is missing: + return failobj + if isinstance(charset, TupleType): + # RFC 2231 encoded, so decode it, and it better end up as ascii. + return unicode(charset[2], charset[0]).encode('us-ascii') + return charset def get_charsets(self, failobj=None): """Return a list containing the charset(s) used in this message. - The returned list of items describes the Content-Type: headers' + The returned list of items describes the Content-Type headers' charset parameter for this message and all the subparts in its payload. Each item will either be a string (the value of the charset parameter - in the Content-Type: header of that part) or the value of the + in the Content-Type header of that part) or the value of the 'failobj' parameter (defaults to None), if the part does not have a main MIME type of "text", or the charset is not defined. @@ -448,4 +788,4 @@ class Message: one for the container message (i.e. 
self), so that a non-multipart message will still return a list of length 1. """ - return [part.get_param('charset', failobj) for part in self.walk()] + return [part.get_content_charset(failobj) for part in self.walk()] diff --git a/Lib/email/Parser.py b/Lib/email/Parser.py index 2f131d6b3a92..119a90dbf491 100644 --- a/Lib/email/Parser.py +++ b/Lib/email/Parser.py @@ -4,20 +4,26 @@ """A parser of RFC 2822 and MIME email messages. """ +import re from cStringIO import StringIO from types import ListType -# Intrapackage imports -import Errors -import Message +from email import Errors +from email import Message EMPTYSTRING = '' NL = '\n' +try: + True, False +except NameError: + True = 1 + False = 0 + class Parser: - def __init__(self, _class=Message.Message): + def __init__(self, _class=Message.Message, strict=False): """Parser of RFC 2822 and MIME email messages. Creates an in-memory object tree representing the email message, which @@ -32,17 +38,39 @@ class Parser: _class is the class to instantiate for new message objects when they must be created. This class must have a constructor that can take zero arguments. Default is Message.Message. + + Optional strict tells the parser to be strictly RFC compliant or to be + more forgiving in parsing of ill-formatted MIME documents. When + non-strict mode is used, the parser will try to make up for missing or + erroneous boundaries and other peculiarities seen in the wild. + Default is non-strict parsing. """ self._class = _class + self._strict = strict + + def parse(self, fp, headersonly=False): + """Create a message structure from the data in a file. - def parse(self, fp): + Reads all the data from the file and returns the root of the message + structure. Optional headersonly is a flag specifying whether to stop + parsing after reading the headers or not. The default is False, + meaning it parses the entire contents of the file. 
+ """ root = self._class() self._parseheaders(root, fp) - self._parsebody(root, fp) + if not headersonly: + self._parsebody(root, fp) return root - def parsestr(self, text): - return self.parse(StringIO(text)) + def parsestr(self, text, headersonly=False): + """Create a message structure from a string. + + Returns the root of the message structure. Optional headersonly is a + flag specifying whether to stop parsing after reading the headers or + not. The default is False, meaning it parses the entire contents of + the file. + """ + return self.parse(StringIO(text), headersonly=headersonly) def _parseheaders(self, container, fp): # Parse the headers, returning a list of header/value pairs. None as @@ -50,20 +78,30 @@ class Parser: lastheader = '' lastvalue = [] lineno = 0 - while 1: - line = fp.readline()[:-1] - if not line or not line.strip(): + while True: + # Don't strip the line before we test for the end condition, + # because whitespace-only header lines are RFC compliant + # continuation lines. + line = fp.readline() + if not line: + break + line = line.splitlines()[0] + if not line: break + # Ignore the trailing newline lineno += 1 # Check for initial Unix From_ line if line.startswith('From '): if lineno == 1: container.set_unixfrom(line) continue - else: + elif self._strict: raise Errors.HeaderParseError( 'Unix-from in headers after first rfc822 header') - # + else: + # ignore the wierdly placed From_ line + # XXX: maybe set unixfrom anyway? or only if not already? + continue # Header continuation line if line[0] in ' \t': if not lastheader: @@ -78,8 +116,15 @@ class Parser: # instead of raising the exception). i = line.find(':') if i < 0: - raise Errors.HeaderParseError( - 'Not a header, not a continuation') + if self._strict: + raise Errors.HeaderParseError( + "Not a header, not a continuation: ``%s''"%line) + elif lineno == 1 and line.startswith('--'): + # allow through duplicate boundary tags. 
+ continue + else: + raise Errors.HeaderParseError( + "Not a header, not a continuation: ``%s''"%line) if lastheader: container[lastheader] = NL.join(lastvalue) lastheader = line[:i] @@ -100,51 +145,99 @@ class Parser: if boundary: preamble = epilogue = None # Split into subparts. The first boundary we're looking for won't - # have the leading newline since we're at the start of the body - # text. + # always have a leading newline since we're at the start of the + # body text, and there's not always a preamble before the first + # boundary. separator = '--' + boundary payload = fp.read() - start = payload.find(separator) - if start < 0: - raise Errors.BoundaryError( - "Couldn't find starting boundary: %s" % boundary) + # We use an RE here because boundaries can have trailing + # whitespace. + mo = re.search( + r'(?P<sep>' + re.escape(separator) + r')(?P<ws>[ \t]*)', + payload) + if not mo: + if self._strict: + raise Errors.BoundaryError( + "Couldn't find starting boundary: %s" % boundary) + container.set_payload(payload) + return + start = mo.start() if start > 0: # there's some pre-MIME boundary preamble preamble = payload[0:start] - start += len(separator) + 1 + isdigest - terminator = payload.find('\n' + separator + '--', start) - if terminator < 0: + # Find out what kind of line endings we're using + start += len(mo.group('sep')) + len(mo.group('ws')) + cre = re.compile('\r\n|\r|\n') + mo = cre.search(payload, start) + if mo: + start += len(mo.group(0)) + # We create a compiled regexp first because we need to be able to + # specify the start position, and the module function doesn't + # support this signature.
:( + cre = re.compile('(?P<sep>\r\n|\r|\n)' + + re.escape(separator) + '--') + mo = cre.search(payload, start) + if mo: + terminator = mo.start() + linesep = mo.group('sep') + if mo.end() < len(payload): + # There's some post-MIME boundary epilogue + epilogue = payload[mo.end():] + elif self._strict: raise Errors.BoundaryError( - "Couldn't find terminating boundary: %s" % boundary) - if terminator+len(separator)+3 < len(payload): - # there's some post-MIME boundary epilogue - epilogue = payload[terminator+len(separator)+3:] + "Couldn't find terminating boundary: %s" % boundary) + else: + # Handle the case of no trailing boundary. Check that it ends + # in a blank line. Some cases (spamspamspam) don't even have + # that! + mo = re.search('(?P<sep>\r\n|\r|\n){2}$', payload) + if not mo: + mo = re.search('(?P<sep>\r\n|\r|\n)$', payload) + if not mo: + raise Errors.BoundaryError( + 'No terminating boundary and no trailing empty line') + linesep = mo.group('sep') + terminator = len(payload) # We split the textual payload on the boundary separator, which - # includes the trailing newline. If the container is a + # includes the trailing newline. If the container is a # multipart/digest then the subparts are by default message/rfc822 - # instead of text/plain. In that case, they'll have an extra - # newline before the headers to distinguish the message's headers - # from the subpart headers. - if isdigest: - separator += '\n\n' - else: - separator += '\n' - parts = payload[start:terminator].split('\n' + separator) + # instead of text/plain. In that case, they'll have a optional + # block of MIME headers, then an empty line followed by the + # message headers.
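The boundary handling in this hunk can be illustrated outside the parser with a small standalone sketch in modern Python 3. This is not the patched `Parser._parsebody`: the function name `split_multipart_body` and the use of `ValueError` in place of `Errors.BoundaryError` are assumptions made for illustration, and the non-strict fallbacks (missing start or terminating boundary) are left out.

```python
import re


def split_multipart_body(payload, boundary):
    """Return (preamble, parts, epilogue) for a multipart body string.

    Hypothetical helper: it mirrors the regex-based approach of the patch,
    but always raises when a boundary is missing (strict behaviour only).
    """
    separator = '--' + boundary
    # The opening boundary may be followed by trailing spaces or tabs.
    mo = re.search(
        '(?P<sep>' + re.escape(separator) + r')(?P<ws>[ \t]*)', payload)
    if mo is None:
        raise ValueError("Couldn't find starting boundary: %s" % boundary)
    # Anything before the first boundary is the preamble.
    preamble = payload[:mo.start()] or None
    start = mo.end()
    # Skip the line ending that follows the opening boundary, whatever
    # convention (CRLF, CR or LF) the message uses.
    mo = re.compile(r'\r\n|\r|\n').match(payload, start)
    if mo:
        start = mo.end()
    # The closing boundary is the separator followed by '--'; the line
    # ending in front of it tells us which convention is in use.
    closing = re.compile(r'(?P<sep>\r\n|\r|\n)' + re.escape(separator) + '--')
    mo = closing.search(payload, start)
    if mo is None:
        raise ValueError("Couldn't find terminating boundary: %s" % boundary)
    linesep = mo.group('sep')
    # Anything after the closing boundary is the epilogue.
    epilogue = payload[mo.end():] or None
    # Split the remaining text on the inner boundary lines.
    parts = re.split(
        linesep + re.escape(separator) + r'[ \t]*' + linesep,
        payload[start:mo.start()])
    return preamble, parts, epilogue
```

Calling it on `'preamble\n--XYZ  \npart one\n--XYZ\npart two\n--XYZ--\nepilogue\n'` with boundary `'XYZ'` separates the preamble, the two part bodies and the epilogue, even though the first boundary line carries trailing whitespace.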
+ parts = re.split( + linesep + re.escape(separator) + r'[ \t]*' + linesep, + payload[start:terminator]) for part in parts: - msgobj = self.parsestr(part) + if isdigest: + if part[0] == linesep: + # There's no header block so create an empty message + # object as the container, and lop off the newline so + # we can parse the sub-subobject + msgobj = self._class() + part = part[1:] + else: + parthdrs, part = part.split(linesep+linesep, 1) + # msgobj in this case is the "message/rfc822" container + msgobj = self.parsestr(parthdrs, headersonly=1) + # while submsgobj is the message itself + submsgobj = self.parsestr(part) + msgobj.attach(submsgobj) + msgobj.set_default_type('message/rfc822') + else: + msgobj = self.parsestr(part) container.preamble = preamble container.epilogue = epilogue - # Ensure that the container's payload is a list - if not isinstance(container.get_payload(), ListType): - container.set_payload([msgobj]) - else: - container.add_payload(msgobj) + container.attach(msgobj) + elif container.get_main_type() == 'multipart': + # Very bad. A message is a multipart with no boundary! + raise Errors.BoundaryError( + 'multipart message with no defined boundary') elif container.get_type() == 'message/delivery-status': # This special kind of type contains blocks of headers separated # by a blank line. 
We'll represent each header block as a # separate Message object blocks = [] - while 1: + while True: blockmsg = self._class() self._parseheaders(blockmsg, fp) if not len(blockmsg): @@ -160,9 +253,9 @@ class Parser: except Errors.HeaderParseError: msg = self._class() self._parsebody(msg, fp) - container.add_payload(msg) + container.attach(msg) else: - container.add_payload(fp.read()) + container.set_payload(fp.read()) diff --git a/Lib/email/Utils.py b/Lib/email/Utils.py index 3d4828723781..b619c6b79851 100644 --- a/Lib/email/Utils.py +++ b/Lib/email/Utils.py @@ -1,25 +1,61 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """Miscellaneous utilities. """ import time +import socket import re +import random +import os +import warnings +from cStringIO import StringIO +from types import ListType -from rfc822 import unquote, quote, parseaddr -from rfc822 import dump_address_pair -from rfc822 import AddrlistClass as _AddrlistClass -from rfc822 import parsedate_tz, parsedate, mktime_tz +from rfc822 import quote +from rfc822 import AddressList as _AddressList +from rfc822 import mktime_tz + +# We need wormarounds for bugs in these methods in older Pythons (see below) +from rfc822 import parsedate as _parsedate +from rfc822 import parsedate_tz as _parsedate_tz + +try: + True, False +except NameError: + True = 1 + False = 0 + +try: + from quopri import decodestring as _qdecode +except ImportError: + # Python 2.1 doesn't have quopri.decodestring() + def _qdecode(s): + import quopri as _quopri + + if not s: + return s + infp = StringIO(s) + outfp = StringIO() + _quopri.decode(infp, outfp) + value = outfp.getvalue() + if not s.endswith('\n') and value.endswith('\n'): + return value[:-1] + return value -from quopri import decodestring as _qdecode import base64 # Intrapackage imports -from Encoders import _bencode, _qencode +from email.Encoders import _bencode, _qencode COMMASPACE = ', 
' +EMPTYSTRING = '' UEMPTYSTRING = u'' +CRLF = '\r\n' + +specialsre = re.compile(r'[][\()<>@,:;".]') +escapesre = re.compile(r'[][\()"]') @@ -36,19 +72,53 @@ def _bdecode(s): # newline". Blech! if not s: return s - hasnewline = (s[-1] == '\n') value = base64.decodestring(s) - if not hasnewline and value[-1] == '\n': + if not s.endswith('\n') and value.endswith('\n'): return value[:-1] return value + +def fix_eols(s): + """Replace all line-ending characters with \r\n.""" + # Fix newlines with no preceding carriage return + s = re.sub(r'(?', name) + return '%s%s%s <%s>' % (quotes, name, quotes, address) + return address + +# For backwards compatibility +def dump_address_pair(pair): + warnings.warn('Use email.Utils.formataddr() instead', + DeprecationWarning, 2) + return formataddr(pair) + + def getaddresses(fieldvalues): """Return a list of (REALNAME, EMAIL) for each fieldvalue.""" all = COMMASPACE.join(fieldvalues) - a = _AddrlistClass(all) - return a.getaddrlist() + a = _AddressList(all) + return a.addresslist @@ -64,30 +134,26 @@ ecre = re.compile(r''' def decode(s): - """Return a decoded string according to RFC 2047, as a unicode string.""" + """Return a decoded string according to RFC 2047, as a unicode string. + + NOTE: This function is deprecated. Use Header.decode_header() instead. + """ + warnings.warn('Use Header.decode_header() instead.', DeprecationWarning, 2) + # Intra-package import here to avoid circular import problems. 
+ from email.Header import decode_header + L = decode_header(s) + if not isinstance(L, ListType): + # s wasn't decoded + return s + rtn = [] - parts = ecre.split(s, 1) - while parts: - # If there are less than 4 parts, it can't be encoded and we're done - if len(parts) < 5: - rtn.extend(parts) - break - # The first element is any non-encoded leading text - rtn.append(parts[0]) - charset = parts[1] - encoding = parts[2].lower() - atom = parts[3] - # The next chunk to decode should be in parts[4] - parts = ecre.split(parts[4]) - # The encoding must be either `q' or `b', case-insensitive - if encoding == 'q': - func = _qdecode - elif encoding == 'b': - func = _bdecode + for atom, charset in L: + if charset is None: + rtn.append(atom) else: - func = _identity - # Decode and get the unicode in the charset - rtn.append(unicode(func(atom), charset)) + # Convert the string to Unicode using the given encoding. Leave + # Unicode conversion errors to strict. + rtn.append(unicode(atom, charset)) # Now that we've decoded everything, we just need to join all the parts # together into the final string. return UEMPTYSTRING.join(rtn) @@ -96,6 +162,7 @@ def decode(s): def encode(s, charset='iso-8859-1', encoding='q'): """Encode a string according to RFC 2047.""" + warnings.warn('Use Header.Header.encode() instead.', DeprecationWarning, 2) encoding = encoding.lower() if encoding == 'q': estr = _qencode(s) @@ -107,7 +174,7 @@ def encode(s, charset='iso-8859-1', encoding='q'): -def formatdate(timeval=None, localtime=0): +def formatdate(timeval=None, localtime=False): """Returns a date string as specified by RFC 2822, e.g.: Fri, 09 Nov 2001 01:08:47 -0000 @@ -115,7 +182,7 @@ def formatdate(timeval=None, localtime=0): Optional timeval if given is a floating point time value as accepted by gmtime() and localtime(), otherwise the current time is used. 
- Optional localtime is a flag that when true, interprets timeval, and + Optional localtime is a flag that when True, interprets timeval, and returns a date relative to the local timezone instead of UTC, properly taking daylight savings time into account. """ @@ -150,3 +217,124 @@ def formatdate(timeval=None, localtime=0): 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'][now[1] - 1], now[0], now[3], now[4], now[5], zone) + + + +def make_msgid(idstring=None): + """Returns a string suitable for RFC 2822 compliant Message-ID, e.g: + + <20020201195627.33539.96671@nightshade.la.mastaler.com> + + Optional idstring if given is a string used to strengthen the + uniqueness of the message id. + """ + timeval = time.time() + utcdate = time.strftime('%Y%m%d%H%M%S', time.gmtime(timeval)) + pid = os.getpid() + randint = random.randrange(100000) + if idstring is None: + idstring = '' + else: + idstring = '.' + idstring + idhost = socket.getfqdn() + msgid = '<%s.%s.%s%s@%s>' % (utcdate, pid, randint, idstring, idhost) + return msgid + + + +# These functions are in the standalone mimelib version only because they've +# subsequently been fixed in the latest Python versions. We use this to worm +# around broken older Pythons. +def parsedate(data): + if not data: + return None + return _parsedate(data) + + +def parsedate_tz(data): + if not data: + return None + return _parsedate_tz(data) + + +def parseaddr(addr): + addrs = _AddressList(addr).addresslist + if not addrs: + return '', '' + return addrs[0] + + +# rfc822.unquote() doesn't properly de-backslash-ify in Python pre-2.3. 
+def unquote(str): + """Remove quotes from a string.""" + if len(str) > 1: + if str.startswith('"') and str.endswith('"'): + return str[1:-1].replace('\\\\', '\\').replace('\\"', '"') + if str.startswith('<') and str.endswith('>'): + return str[1:-1] + return str + + + +# RFC2231-related functions - parameter encoding and decoding +def decode_rfc2231(s): + """Decode string according to RFC 2231""" + import urllib + charset, language, s = s.split("'", 2) + s = urllib.unquote(s) + return charset, language, s + + +def encode_rfc2231(s, charset=None, language=None): + """Encode string according to RFC 2231. + + If neither charset nor language is given, then s is returned as-is. If + charset is given but not language, the string is encoded using the empty + string for language. + """ + import urllib + s = urllib.quote(s, safe='') + if charset is None and language is None: + return s + if language is None: + language = '' + return "%s'%s'%s" % (charset, language, s) + + +rfc2231_continuation = re.compile(r'^(?P<name>\w+)\*((?P<num>[0-9]+)\*?)?$') + +def decode_params(params): + """Decode parameters list according to RFC 2231. + + params is a sequence of 2-tuples containing (content type, string value). + """ + new_params = [] + # maps parameter's name to a list of continuations + rfc2231_params = {} + # params is a sequence of 2-tuples containing (content_type, string value) + name, value = params[0] + new_params.append((name, value)) + # Cycle through each of the rest of the parameters.
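The RFC 2231 helpers added in this hunk can be approximated in Python 3 as follows. This is a sketch under assumptions, not the package's code: it uses `urllib.parse.quote` and `urllib.parse.unquote` in place of the Python 2 `urllib` calls in the patch, and the names `encode_param` and `decode_param` are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_param(s, charset=None, language=None):
    """Percent-encode s and prefix it with charset'language' (RFC 2231).

    Hypothetical stand-in for the patch's encode_rfc2231().
    """
    s = quote(s, safe='')
    if charset is None and language is None:
        # Nothing to tag the value with, so return it percent-encoded only.
        return s
    if language is None:
        language = ''
    return "%s'%s'%s" % (charset, language, s)


def decode_param(s):
    """Split charset'language'value and undo the percent-encoding.

    Hypothetical stand-in for the patch's decode_rfc2231(); like the
    original, it assumes the two apostrophes are present.
    """
    charset, language, value = s.split("'", 2)
    return charset, language, unquote(value)


encoded = encode_param('my file.txt', 'us-ascii', 'en')
print(encoded)
print(decode_param(encoded))
```

Splitting with a maximum of two splits keeps any apostrophes inside the value itself intact, which is why the original code calls `s.split("'", 2)` rather than a plain `split("'")`.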
+ for name, value in params[1:]: + value = unquote(value) + mo = rfc2231_continuation.match(name) + if mo: + name, num = mo.group('name', 'num') + if num is not None: + num = int(num) + rfc2231_param1 = rfc2231_params.setdefault(name, []) + rfc2231_param1.append((num, value)) + else: + new_params.append((name, '"%s"' % quote(value))) + if rfc2231_params: + for name, continuations in rfc2231_params.items(): + value = [] + # Sort by number + continuations.sort() + # And now append all values in num order + for num, continuation in continuations: + value.append(continuation) + charset, language, value = decode_rfc2231(EMPTYSTRING.join(value)) + new_params.append((name, + (charset, language, '"%s"' % quote(value)))) + return new_params diff --git a/Lib/email/__init__.py b/Lib/email/__init__.py index c13495b4b887..4c892ff3fe0b 100644 --- a/Lib/email/__init__.py +++ b/Lib/email/__init__.py @@ -1,35 +1,72 @@ -# Copyright (C) 2001 Python Software Foundation +# Copyright (C) 2001,2002 Python Software Foundation # Author: barry@zope.com (Barry Warsaw) """A package for parsing, handling, and generating email messages. """ -__version__ = '1.0' - -__all__ = ['Encoders', - 'Errors', - 'Generator', - 'Iterators', - 'MIMEAudio', - 'MIMEBase', - 'MIMEImage', - 'MIMEMessage', - 'MIMEText', - 'Message', - 'Parser', - 'Utils', - 'message_from_string', - 'message_from_file', - ] +__version__ = '2.4' + +__all__ = [ + 'base64MIME', + 'Charset', + 'Encoders', + 'Errors', + 'Generator', + 'Header', + 'Iterators', + 'Message', + 'MIMEAudio', + 'MIMEBase', + 'MIMEImage', + 'MIMEMessage', + 'MIMEMultipart', + 'MIMENonMultipart', + 'MIMEText', + 'Parser', + 'quopriMIME', + 'Utils', + 'message_from_string', + 'message_from_file', + ] + +try: + True, False +except NameError: + True = 1 + False = 0 -# Some convenience routines -from Parser import Parser as _Parser -from Message import Message as _Message +# Some convenience routines. 
Don't import Parser and Message as side-effects +# of importing email since those cascadingly import most of the rest of the +# email package. +def message_from_string(s, _class=None, strict=False): + """Parse a string into a Message object model. + + Optional _class and strict are passed to the Parser constructor. + """ + from email.Parser import Parser + if _class is None: + from email.Message import Message + _class = Message + return Parser(_class, strict=strict).parsestr(s) -def message_from_string(s, _class=_Message): - return _Parser(_class).parsestr(s) +def message_from_file(fp, _class=None, strict=False): + """Read a file and parse its contents into a Message object model. -def message_from_file(fp, _class=_Message): - return _Parser(_class).parse(fp) + Optional _class and strict are passed to the Parser constructor. + """ + from email.Parser import Parser + if _class is None: + from email.Message import Message + _class = Message + return Parser(_class, strict=strict).parse(fp) + + + +# Patch encodings.aliases to recognize 'ansi_x3.4_1968' which isn't a standard +# alias in Python 2.1.3, but is used by the email package test suite. 
+from encodings.aliases import aliases # The aliases dictionary +if not aliases.has_key('ansi_x3.4_1968'): + aliases['ansi_x3.4_1968'] = 'ascii' +del aliases # Not needed any more diff --git a/Lib/test/regrtest.py b/Lib/test/regrtest.py index 2d806f16c75a..2b1eedb5e0b1 100755 --- a/Lib/test/regrtest.py +++ b/Lib/test/regrtest.py @@ -484,6 +484,7 @@ _expectations = { test_curses test_dbm test_dl + test_email_codecs test_fcntl test_fork1 test_gdbm @@ -511,6 +512,7 @@ _expectations = { test_cl test_curses test_dl + test_email_codecs test_gl test_imgfile test_largefile @@ -534,6 +536,7 @@ _expectations = { test_curses test_dbm test_dl + test_email_codecs test_fcntl test_fork1 test_gl @@ -567,6 +570,7 @@ _expectations = { test_cd test_cl test_dl + test_email_codecs test_gl test_imgfile test_largefile @@ -591,6 +595,7 @@ _expectations = { test_cd test_cl test_dl + test_email_codecs test_gl test_imgfile test_largefile @@ -616,6 +621,7 @@ _expectations = { test_cd test_cl test_dl + test_email_codecs test_fork1 test_gettext test_gl @@ -652,6 +658,7 @@ _expectations = { test_crypt test_dbm test_dl + test_email_codecs test_fcntl test_fork1 test_gdbm @@ -690,6 +697,7 @@ _expectations = { test_cl test_curses test_dl + test_email_codecs test_gdbm test_gl test_imgfile diff --git a/Lib/test/test_email.py b/Lib/test/test_email.py index 5155680bfc41..5df59606086a 100644 --- a/Lib/test/test_email.py +++ b/Lib/test/test_email.py @@ -1,1132 +1,11 @@ # Copyright (C) 2001,2002 Python Software Foundation # email package unit tests -import os -import time import unittest -import base64 -from cStringIO import StringIO -from types import StringType - -import email - -from email.Parser import Parser, HeaderParser -from email.Generator import Generator, DecodedGenerator -from email.Message import Message -from email.MIMEAudio import MIMEAudio -from email.MIMEText import MIMEText -from email.MIMEImage import MIMEImage -from email.MIMEBase import MIMEBase -from email.MIMEMessage import 
MIMEMessage -from email import Utils -from email import Errors -from email import Encoders -from email import Iterators - -from test_support import findfile, __file__ as test_support_file - - -NL = '\n' -EMPTYSTRING = '' -SPACE = ' ' - - - -def openfile(filename): - path = os.path.join(os.path.dirname(test_support_file), 'data', filename) - return open(path) - - - -# Base test class -class TestEmailBase(unittest.TestCase): - def _msgobj(self, filename): - fp = openfile(filename) - try: - msg = email.message_from_file(fp) - finally: - fp.close() - return msg +# The specific tests now live in Lib/email/test +from email.test.test_email import suite -# Test various aspects of the Message class's API -class TestMessageAPI(TestEmailBase): - def test_get_all(self): - eq = self.assertEqual - msg = self._msgobj('msg_20.txt') - eq(msg.get_all('cc'), ['ccc@zzz.org', 'ddd@zzz.org', 'eee@zzz.org']) - eq(msg.get_all('xx', 'n/a'), 'n/a') - - def test_get_charsets(self): - eq = self.assertEqual - - msg = self._msgobj('msg_08.txt') - charsets = msg.get_charsets() - eq(charsets, [None, 'us-ascii', 'iso-8859-1', 'iso-8859-2', 'koi8-r']) - - msg = self._msgobj('msg_09.txt') - charsets = msg.get_charsets('dingbat') - eq(charsets, ['dingbat', 'us-ascii', 'iso-8859-1', 'dingbat', - 'koi8-r']) - - msg = self._msgobj('msg_12.txt') - charsets = msg.get_charsets() - eq(charsets, [None, 'us-ascii', 'iso-8859-1', None, 'iso-8859-2', - 'iso-8859-3', 'us-ascii', 'koi8-r']) - - def test_get_filename(self): - eq = self.assertEqual - - msg = self._msgobj('msg_04.txt') - filenames = [p.get_filename() for p in msg.get_payload()] - eq(filenames, ['msg.txt', 'msg.txt']) - - msg = self._msgobj('msg_07.txt') - subpart = msg.get_payload(1) - eq(subpart.get_filename(), 'dingusfish.gif') - - def test_get_boundary(self): - eq = self.assertEqual - msg = self._msgobj('msg_07.txt') - # No quotes! 
- eq(msg.get_boundary(), 'BOUNDARY') - - def test_set_boundary(self): - eq = self.assertEqual - # This one has no existing boundary parameter, but the Content-Type: - # header appears fifth. - msg = self._msgobj('msg_01.txt') - msg.set_boundary('BOUNDARY') - header, value = msg.items()[4] - eq(header.lower(), 'content-type') - eq(value, 'text/plain; charset=us-ascii; boundary="BOUNDARY"') - # This one has a Content-Type: header, with a boundary, stuck in the - # middle of its headers. Make sure the order is preserved; it should - # be fifth. - msg = self._msgobj('msg_04.txt') - msg.set_boundary('BOUNDARY') - header, value = msg.items()[4] - eq(header.lower(), 'content-type') - eq(value, 'multipart/mixed; boundary="BOUNDARY"') - # And this one has no Content-Type: header at all. - msg = self._msgobj('msg_03.txt') - self.assertRaises(Errors.HeaderParseError, - msg.set_boundary, 'BOUNDARY') - - def test_get_decoded_payload(self): - eq = self.assertEqual - msg = self._msgobj('msg_10.txt') - # The outer message is a multipart - eq(msg.get_payload(decode=1), None) - # Subpart 1 is 7bit encoded - eq(msg.get_payload(0).get_payload(decode=1), - 'This is a 7bit encoded message.\n') - # Subpart 2 is quopri - eq(msg.get_payload(1).get_payload(decode=1), - '\xa1This is a Quoted Printable encoded message!\n') - # Subpart 3 is base64 - eq(msg.get_payload(2).get_payload(decode=1), - 'This is a Base64 encoded message.') - # Subpart 4 has no Content-Transfer-Encoding: header. 
- eq(msg.get_payload(3).get_payload(decode=1), - 'This has no Content-Transfer-Encoding: header.\n') - - def test_decoded_generator(self): - eq = self.assertEqual - msg = self._msgobj('msg_07.txt') - fp = openfile('msg_17.txt') - try: - text = fp.read() - finally: - fp.close() - s = StringIO() - g = DecodedGenerator(s) - g(msg) - eq(s.getvalue(), text) - - def test__contains__(self): - msg = Message() - msg['From'] = 'Me' - msg['to'] = 'You' - # Check for case insensitivity - self.failUnless('from' in msg) - self.failUnless('From' in msg) - self.failUnless('FROM' in msg) - self.failUnless('to' in msg) - self.failUnless('To' in msg) - self.failUnless('TO' in msg) - - def test_as_string(self): - eq = self.assertEqual - msg = self._msgobj('msg_01.txt') - fp = openfile('msg_01.txt') - try: - text = fp.read() - finally: - fp.close() - eq(text, msg.as_string()) - fullrepr = str(msg) - lines = fullrepr.split('\n') - self.failUnless(lines[0].startswith('From ')) - eq(text, NL.join(lines[1:])) - - def test_bad_param(self): - msg = email.message_from_string("Content-Type: blarg; baz; boo\n") - self.assertEqual(msg.get_param('baz'), '') - - def test_missing_filename(self): - msg = email.message_from_string("From: foo\n") - self.assertEqual(msg.get_filename(), None) - - def test_bogus_filename(self): - msg = email.message_from_string( - "Content-Disposition: blarg; filename\n") - self.assertEqual(msg.get_filename(), '') - - def test_missing_boundary(self): - msg = email.message_from_string("From: foo\n") - self.assertEqual(msg.get_boundary(), None) - - def test_get_params(self): - eq = self.assertEqual - msg = email.message_from_string( - 'X-Header: foo=one; bar=two; baz=three\n') - eq(msg.get_params(header='x-header'), - [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]) - msg = email.message_from_string( - 'X-Header: foo; bar=one; baz=two\n') - eq(msg.get_params(header='x-header'), - [('foo', ''), ('bar', 'one'), ('baz', 'two')]) - eq(msg.get_params(), None) - msg = 
email.message_from_string( - 'X-Header: foo; bar="one"; baz=two\n') - eq(msg.get_params(header='x-header'), - [('foo', ''), ('bar', 'one'), ('baz', 'two')]) - - def test_get_param(self): - eq = self.assertEqual - msg = email.message_from_string( - "X-Header: foo=one; bar=two; baz=three\n") - eq(msg.get_param('bar', header='x-header'), 'two') - eq(msg.get_param('quuz', header='x-header'), None) - eq(msg.get_param('quuz'), None) - msg = email.message_from_string( - 'X-Header: foo; bar="one"; baz=two\n') - eq(msg.get_param('foo', header='x-header'), '') - eq(msg.get_param('bar', header='x-header'), 'one') - eq(msg.get_param('baz', header='x-header'), 'two') - - def test_get_param_funky_continuation_lines(self): - msg = self._msgobj('msg_22.txt') - self.assertEqual(msg.get_payload(1).get_param('name'), 'wibble.JPG') - - def test_has_key(self): - msg = email.message_from_string('Header: exists') - self.failUnless(msg.has_key('header')) - self.failUnless(msg.has_key('Header')) - self.failUnless(msg.has_key('HEADER')) - self.failIf(msg.has_key('headeri')) - - - -# Test the email.Encoders module -class TestEncoders(unittest.TestCase): - def test_encode_noop(self): - eq = self.assertEqual - msg = MIMEText('hello world', _encoder=Encoders.encode_noop) - eq(msg.get_payload(), 'hello world\n') - eq(msg['content-transfer-encoding'], None) - - def test_encode_7bit(self): - eq = self.assertEqual - msg = MIMEText('hello world', _encoder=Encoders.encode_7or8bit) - eq(msg.get_payload(), 'hello world\n') - eq(msg['content-transfer-encoding'], '7bit') - msg = MIMEText('hello \x7f world', _encoder=Encoders.encode_7or8bit) - eq(msg.get_payload(), 'hello \x7f world\n') - eq(msg['content-transfer-encoding'], '7bit') - - def test_encode_8bit(self): - eq = self.assertEqual - msg = MIMEText('hello \x80 world', _encoder=Encoders.encode_7or8bit) - eq(msg.get_payload(), 'hello \x80 world\n') - eq(msg['content-transfer-encoding'], '8bit') - - def test_encode_base64(self): - eq = self.assertEqual 
- msg = MIMEText('hello world', _encoder=Encoders.encode_base64) - eq(msg.get_payload(), 'aGVsbG8gd29ybGQK\n') - eq(msg['content-transfer-encoding'], 'base64') - - def test_encode_quoted_printable(self): - eq = self.assertEqual - msg = MIMEText('hello world', _encoder=Encoders.encode_quopri) - eq(msg.get_payload(), 'hello=20world\n') - eq(msg['content-transfer-encoding'], 'quoted-printable') - - - -# Test long header wrapping -class TestLongHeaders(unittest.TestCase): - def test_header_splitter(self): - msg = MIMEText('') - # It'd be great if we could use add_header() here, but that doesn't - # guarantee an order of the parameters. - msg['X-Foobar-Spoink-Defrobnit'] = ( - 'wasnipoop; giraffes="very-long-necked-animals"; ' - 'spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"') - sfp = StringIO() - g = Generator(sfp) - g(msg) - self.assertEqual(sfp.getvalue(), openfile('msg_18.txt').read()) - - def test_no_semis_header_splitter(self): - msg = Message() - msg['From'] = 'test@dom.ain' - refparts = [] - for i in range(10): - refparts.append('<%d@dom.ain>' % i) - msg['References'] = SPACE.join(refparts) - msg.set_payload('Test') - sfp = StringIO() - g = Generator(sfp) - g(msg) - self.assertEqual(sfp.getvalue(), """\ -From: test@dom.ain -References: <0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain> -\t<5@dom.ain> <6@dom.ain> <7@dom.ain> <8@dom.ain> <9@dom.ain> - -Test""") - - def test_no_split_long_header(self): - msg = Message() - msg['From'] = 'test@dom.ain' - refparts = [] - msg['References'] = 'x' * 80 - msg.set_payload('Test') - sfp = StringIO() - g = Generator(sfp) - g(msg) - self.assertEqual(sfp.getvalue(), """\ -From: test@dom.ain -References: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - -Test""") - - - -# Test mangling of "From " lines in the body of a message -class TestFromMangling(unittest.TestCase): - def setUp(self): - self.msg = Message() - self.msg['From'] = 'aaa@bbb.org' - self.msg.add_payload("""\ 
-From the desk of A.A.A.: -Blah blah blah -""") - - def test_mangled_from(self): - s = StringIO() - g = Generator(s, mangle_from_=1) - g(self.msg) - self.assertEqual(s.getvalue(), """\ -From: aaa@bbb.org - ->From the desk of A.A.A.: -Blah blah blah -""") - - def test_dont_mangle_from(self): - s = StringIO() - g = Generator(s, mangle_from_=0) - g(self.msg) - self.assertEqual(s.getvalue(), """\ -From: aaa@bbb.org - -From the desk of A.A.A.: -Blah blah blah -""") - - - -# Test the basic MIMEAudio class -class TestMIMEAudio(unittest.TestCase): - def setUp(self): - # In Python, audiotest.au lives in Lib/test not Lib/test/data - fp = open(findfile('audiotest.au')) - try: - self._audiodata = fp.read() - finally: - fp.close() - self._au = MIMEAudio(self._audiodata) - - def test_guess_minor_type(self): - self.assertEqual(self._au.get_type(), 'audio/basic') - - def test_encoding(self): - payload = self._au.get_payload() - self.assertEqual(base64.decodestring(payload), self._audiodata) - - def checkSetMinor(self): - au = MIMEAudio(self._audiodata, 'fish') - self.assertEqual(im.get_type(), 'audio/fish') - - def test_custom_encoder(self): - eq = self.assertEqual - def encoder(msg): - orig = msg.get_payload() - msg.set_payload(0) - msg['Content-Transfer-Encoding'] = 'broken64' - au = MIMEAudio(self._audiodata, _encoder=encoder) - eq(au.get_payload(), 0) - eq(au['content-transfer-encoding'], 'broken64') - - def test_add_header(self): - eq = self.assertEqual - unless = self.failUnless - self._au.add_header('Content-Disposition', 'attachment', - filename='audiotest.au') - eq(self._au['content-disposition'], - 'attachment; filename="audiotest.au"') - eq(self._au.get_params(header='content-disposition'), - [('attachment', ''), ('filename', 'audiotest.au')]) - eq(self._au.get_param('filename', header='content-disposition'), - 'audiotest.au') - missing = [] - eq(self._au.get_param('attachment', header='content-disposition'), '') - unless(self._au.get_param('foo', failobj=missing, - 
header='content-disposition') is missing) - # Try some missing stuff - unless(self._au.get_param('foobar', missing) is missing) - unless(self._au.get_param('attachment', missing, - header='foobar') is missing) - - - -# Test the basic MIMEImage class -class TestMIMEImage(unittest.TestCase): - def setUp(self): - fp = openfile('PyBanner048.gif') - try: - self._imgdata = fp.read() - finally: - fp.close() - self._im = MIMEImage(self._imgdata) - - def test_guess_minor_type(self): - self.assertEqual(self._im.get_type(), 'image/gif') - - def test_encoding(self): - payload = self._im.get_payload() - self.assertEqual(base64.decodestring(payload), self._imgdata) - - def checkSetMinor(self): - im = MIMEImage(self._imgdata, 'fish') - self.assertEqual(im.get_type(), 'image/fish') - - def test_custom_encoder(self): - eq = self.assertEqual - def encoder(msg): - orig = msg.get_payload() - msg.set_payload(0) - msg['Content-Transfer-Encoding'] = 'broken64' - im = MIMEImage(self._imgdata, _encoder=encoder) - eq(im.get_payload(), 0) - eq(im['content-transfer-encoding'], 'broken64') - - def test_add_header(self): - eq = self.assertEqual - unless = self.failUnless - self._im.add_header('Content-Disposition', 'attachment', - filename='dingusfish.gif') - eq(self._im['content-disposition'], - 'attachment; filename="dingusfish.gif"') - eq(self._im.get_params(header='content-disposition'), - [('attachment', ''), ('filename', 'dingusfish.gif')]) - eq(self._im.get_param('filename', header='content-disposition'), - 'dingusfish.gif') - missing = [] - eq(self._im.get_param('attachment', header='content-disposition'), '') - unless(self._im.get_param('foo', failobj=missing, - header='content-disposition') is missing) - # Try some missing stuff - unless(self._im.get_param('foobar', missing) is missing) - unless(self._im.get_param('attachment', missing, - header='foobar') is missing) - - - -# Test the basic MIMEText class -class TestMIMEText(unittest.TestCase): - def setUp(self): - self._msg = 
MIMEText('hello there') - - def test_types(self): - eq = self.assertEqual - unless = self.failUnless - eq(self._msg.get_type(), 'text/plain') - eq(self._msg.get_param('charset'), 'us-ascii') - missing = [] - unless(self._msg.get_param('foobar', missing) is missing) - unless(self._msg.get_param('charset', missing, header='foobar') - is missing) - - def test_payload(self): - self.assertEqual(self._msg.get_payload(), 'hello there\n') - self.failUnless(not self._msg.is_multipart()) - - - -# Test a more complicated multipart/mixed type message -class TestMultipartMixed(unittest.TestCase): - def setUp(self): - fp = openfile('PyBanner048.gif') - try: - data = fp.read() - finally: - fp.close() - - container = MIMEBase('multipart', 'mixed', boundary='BOUNDARY') - image = MIMEImage(data, name='dingusfish.gif') - image.add_header('content-disposition', 'attachment', - filename='dingusfish.gif') - intro = MIMEText('''\ -Hi there, - -This is the dingus fish. -''') - container.add_payload(intro) - container.add_payload(image) - container['From'] = 'Barry ' - container['To'] = 'Dingus Lovers ' - container['Subject'] = 'Here is your dingus fish' - - now = 987809702.54848599 - timetuple = time.localtime(now) - if timetuple[-1] == 0: - tzsecs = time.timezone - else: - tzsecs = time.altzone - if tzsecs > 0: - sign = '-' - else: - sign = '+' - tzoffset = ' %s%04d' % (sign, tzsecs / 36) - container['Date'] = time.strftime( - '%a, %d %b %Y %H:%M:%S', - time.localtime(now)) + tzoffset - self._msg = container - self._im = image - self._txt = intro - - def test_hierarchy(self): - # convenience - eq = self.assertEqual - unless = self.failUnless - raises = self.assertRaises - # tests - m = self._msg - unless(m.is_multipart()) - eq(m.get_type(), 'multipart/mixed') - eq(len(m.get_payload()), 2) - raises(IndexError, m.get_payload, 2) - m0 = m.get_payload(0) - m1 = m.get_payload(1) - unless(m0 is self._txt) - unless(m1 is self._im) - eq(m.get_payload(), [m0, m1]) - unless(not m0.is_multipart()) 
-        unless(not m1.is_multipart())
-
-    def test_no_parts_in_a_multipart(self):
-        outer = MIMEBase('multipart', 'mixed')
-        outer['Subject'] = 'A subject'
-        outer['To'] = 'aperson@dom.ain'
-        outer['From'] = 'bperson@dom.ain'
-        outer.preamble = ''
-        outer.epilogue = ''
-        outer.set_boundary('BOUNDARY')
-        msg = MIMEText('hello world')
-        self.assertEqual(outer.as_string(), '''\
-Content-Type: multipart/mixed; boundary="BOUNDARY"
-MIME-Version: 1.0
-Subject: A subject
-To: aperson@dom.ain
-From: bperson@dom.ain
-
---BOUNDARY
-
-
---BOUNDARY--
-''')
-
-    def test_one_part_in_a_multipart(self):
-        outer = MIMEBase('multipart', 'mixed')
-        outer['Subject'] = 'A subject'
-        outer['To'] = 'aperson@dom.ain'
-        outer['From'] = 'bperson@dom.ain'
-        outer.preamble = ''
-        outer.epilogue = ''
-        outer.set_boundary('BOUNDARY')
-        msg = MIMEText('hello world')
-        outer.attach(msg)
-        self.assertEqual(outer.as_string(), '''\
-Content-Type: multipart/mixed; boundary="BOUNDARY"
-MIME-Version: 1.0
-Subject: A subject
-To: aperson@dom.ain
-From: bperson@dom.ain
-
---BOUNDARY
-Content-Type: text/plain; charset="us-ascii"
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-
-hello world
-
---BOUNDARY--
-''')
-
-    def test_seq_parts_in_a_multipart(self):
-        outer = MIMEBase('multipart', 'mixed')
-        outer['Subject'] = 'A subject'
-        outer['To'] = 'aperson@dom.ain'
-        outer['From'] = 'bperson@dom.ain'
-        outer.preamble = ''
-        outer.epilogue = ''
-        msg = MIMEText('hello world')
-        outer.attach([msg])
-        outer.set_boundary('BOUNDARY')
-        self.assertEqual(outer.as_string(), '''\
-Content-Type: multipart/mixed; boundary="BOUNDARY"
-MIME-Version: 1.0
-Subject: A subject
-To: aperson@dom.ain
-From: bperson@dom.ain
-
---BOUNDARY
-Content-Type: text/plain; charset="us-ascii"
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-
-hello world
-
---BOUNDARY--
-''')
-
-
-
-# Test some badly formatted messages
-class TestNonConformant(TestEmailBase):
-    def test_parse_missing_minor_type(self):
-        eq = self.assertEqual
-        msg = self._msgobj('msg_14.txt')
-        eq(msg.get_type(), 'text')
-        eq(msg.get_main_type(), 'text')
-        self.failUnless(msg.get_subtype() is None)
-
-    def test_bogus_boundary(self):
-        fp = openfile('msg_15.txt')
-        try:
-            data = fp.read()
-        finally:
-            fp.close()
-        p = Parser()
-        # Note, under a future non-strict parsing mode, this would parse the
-        # message into the intended message tree.
-        self.assertRaises(Errors.BoundaryError, p.parsestr, data)
-
-
-
-# Test RFC 2047 header encoding and decoding
-class TestRFC2047(unittest.TestCase):
-    def test_iso_8859_1(self):
-        eq = self.assertEqual
-        s = '=?iso-8859-1?q?this=20is=20some=20text?='
-        eq(Utils.decode(s), 'this is some text')
-        s = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?='
-        eq(Utils.decode(s), u'Keld_J\xf8rn_Simonsen')
-        s = '=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=' \
-            '=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?='
-        eq(Utils.decode(s), 'If you can read this you understand the example.')
-        s = '=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?='
-        eq(Utils.decode(s),
-           u'\u05dd\u05d5\u05dc\u05e9 \u05df\u05d1 \u05d9\u05dc\u05d8\u05e4\u05e0')
-        s = '=?iso-8859-1?q?this=20is?= =?iso-8859-1?q?some=20text?='
-        eq(Utils.decode(s), u'this is some text')
-
-    def test_encode_header(self):
-        eq = self.assertEqual
-        s = 'this is some text'
-        eq(Utils.encode(s), '=?iso-8859-1?q?this=20is=20some=20text?=')
-        s = 'Keld_J\xf8rn_Simonsen'
-        eq(Utils.encode(s), '=?iso-8859-1?q?Keld_J=F8rn_Simonsen?=')
-        s1 = 'If you can read this yo'
-        s2 = 'u understand the example.'
-        eq(Utils.encode(s1, encoding='b'),
-           '=?iso-8859-1?b?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=')
-        eq(Utils.encode(s2, charset='iso-8859-2', encoding='b'),
-           '=?iso-8859-2?b?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=')
-
-
-
-# Test the MIMEMessage class
-class TestMIMEMessage(TestEmailBase):
-    def setUp(self):
-        fp = openfile('msg_11.txt')
-        self._text = fp.read()
-        fp.close()
-
-    def test_type_error(self):
-        self.assertRaises(TypeError, MIMEMessage, 'a plain string')
-
-    def test_valid_argument(self):
-        eq = self.assertEqual
-        subject = 'A sub-message'
-        m = Message()
-        m['Subject'] = subject
-        r = MIMEMessage(m)
-        eq(r.get_type(), 'message/rfc822')
-        self.failUnless(r.get_payload() is m)
-        eq(r.get_payload()['subject'], subject)
-
-    def test_generate(self):
-        # First craft the message to be encapsulated
-        m = Message()
-        m['Subject'] = 'An enclosed message'
-        m.add_payload('Here is the body of the message.\n')
-        r = MIMEMessage(m)
-        r['Subject'] = 'The enclosing message'
-        s = StringIO()
-        g = Generator(s)
-        g(r)
-        self.assertEqual(s.getvalue(), """\
-Content-Type: message/rfc822
-MIME-Version: 1.0
-Subject: The enclosing message
-
-Subject: An enclosed message
-
-Here is the body of the message.
-""")
-
-    def test_parse_message_rfc822(self):
-        eq = self.assertEqual
-        msg = self._msgobj('msg_11.txt')
-        eq(msg.get_type(), 'message/rfc822')
-        eq(len(msg.get_payload()), 1)
-        submsg = msg.get_payload()
-        self.failUnless(isinstance(submsg, Message))
-        eq(submsg['subject'], 'An enclosed message')
-        eq(submsg.get_payload(), 'Here is the body of the message.\n')
-
-    def test_dsn(self):
-        eq = self.assertEqual
-        unless = self.failUnless
-        # msg 16 is a Delivery Status Notification, see RFC XXXX
-        msg = self._msgobj('msg_16.txt')
-        eq(msg.get_type(), 'multipart/report')
-        unless(msg.is_multipart())
-        eq(len(msg.get_payload()), 3)
-        # Subpart 1 is a text/plain, human readable section
-        subpart = msg.get_payload(0)
-        eq(subpart.get_type(), 'text/plain')
-        eq(subpart.get_payload(), """\
-This report relates to a message you sent with the following header fields:
-
-  Message-id: <002001c144a6$8752e060$56104586@oxy.edu>
-  Date: Sun, 23 Sep 2001 20:10:55 -0700
-  From: "Ian T. Henry"
-  To: SoCal Raves
-  Subject: [scr] yeah for Ians!!
-
-Your message cannot be delivered to the following recipients:
-
-  Recipient address: jangel1@cougar.noc.ucla.edu
-  Reason: recipient reached disk quota
-
-""")
-        # Subpart 2 contains the machine parsable DSN information. It
-        # consists of two blocks of headers, represented by two nested Message
-        # objects.
-        subpart = msg.get_payload(1)
-        eq(subpart.get_type(), 'message/delivery-status')
-        eq(len(subpart.get_payload()), 2)
-        # message/delivery-status should treat each block as a bunch of
-        # headers, i.e. a bunch of Message objects.
-        dsn1 = subpart.get_payload(0)
-        unless(isinstance(dsn1, Message))
-        eq(dsn1['original-envelope-id'], '0GK500B4HD0888@cougar.noc.ucla.edu')
-        eq(dsn1.get_param('dns', header='reporting-mta'), '')
-        # Try a missing one
-        eq(dsn1.get_param('nsd', header='reporting-mta'), None)
-        dsn2 = subpart.get_payload(1)
-        unless(isinstance(dsn2, Message))
-        eq(dsn2['action'], 'failed')
-        eq(dsn2.get_params(header='original-recipient'),
-           [('rfc822', ''), ('jangel1@cougar.noc.ucla.edu', '')])
-        eq(dsn2.get_param('rfc822', header='final-recipient'), '')
-        # Subpart 3 is the original message
-        subpart = msg.get_payload(2)
-        eq(subpart.get_type(), 'message/rfc822')
-        subsubpart = subpart.get_payload()
-        unless(isinstance(subsubpart, Message))
-        eq(subsubpart.get_type(), 'text/plain')
-        eq(subsubpart['message-id'],
-           '<002001c144a6$8752e060$56104586@oxy.edu>')
-
-    def test_epilogue(self):
-        fp = openfile('msg_21.txt')
-        try:
-            text = fp.read()
-        finally:
-            fp.close()
-        msg = Message()
-        msg['From'] = 'aperson@dom.ain'
-        msg['To'] = 'bperson@dom.ain'
-        msg['Subject'] = 'Test'
-        msg.preamble = 'MIME message\n'
-        msg.epilogue = 'End of MIME message\n'
-        msg1 = MIMEText('One')
-        msg2 = MIMEText('Two')
-        msg.add_header('Content-Type', 'multipart/mixed', boundary='BOUNDARY')
-        msg.add_payload(msg1)
-        msg.add_payload(msg2)
-        sfp = StringIO()
-        g = Generator(sfp)
-        g(msg)
-        self.assertEqual(sfp.getvalue(), text)
-
-
-
-# A general test of parser->model->generator idempotency. IOW, read a message
-# in, parse it into a message object tree, then without touching the tree,
-# regenerate the plain text. The original text and the transformed text
-# should be identical. Note: that we ignore the Unix-From since that may
-# contain a changed date.
-class TestIdempotent(unittest.TestCase):
-    def _msgobj(self, filename):
-        fp = openfile(filename)
-        try:
-            data = fp.read()
-        finally:
-            fp.close()
-        msg = email.message_from_string(data)
-        return msg, data
-
-    def _idempotent(self, msg, text):
-        eq = self.assertEquals
-        s = StringIO()
-        g = Generator(s, maxheaderlen=0)
-        g(msg)
-        eq(text, s.getvalue())
-
-    def test_parse_text_message(self):
-        eq = self.assertEquals
-        msg, text = self._msgobj('msg_01.txt')
-        eq(msg.get_type(), 'text/plain')
-        eq(msg.get_main_type(), 'text')
-        eq(msg.get_subtype(), 'plain')
-        eq(msg.get_params()[1], ('charset', 'us-ascii'))
-        eq(msg.get_param('charset'), 'us-ascii')
-        eq(msg.preamble, None)
-        eq(msg.epilogue, None)
-        self._idempotent(msg, text)
-
-    def test_parse_untyped_message(self):
-        eq = self.assertEquals
-        msg, text = self._msgobj('msg_03.txt')
-        eq(msg.get_type(), None)
-        eq(msg.get_params(), None)
-        eq(msg.get_param('charset'), None)
-        self._idempotent(msg, text)
-
-    def test_simple_multipart(self):
-        msg, text = self._msgobj('msg_04.txt')
-        self._idempotent(msg, text)
-
-    def test_MIME_digest(self):
-        msg, text = self._msgobj('msg_02.txt')
-        self._idempotent(msg, text)
-
-    def test_mixed_with_image(self):
-        msg, text = self._msgobj('msg_06.txt')
-        self._idempotent(msg, text)
-
-    def test_multipart_report(self):
-        msg, text = self._msgobj('msg_05.txt')
-        self._idempotent(msg, text)
-
-    def test_dsn(self):
-        msg, text = self._msgobj('msg_16.txt')
-        self._idempotent(msg, text)
-
-    def test_preamble_epilogue(self):
-        msg, text = self._msgobj('msg_21.txt')
-        self._idempotent(msg, text)
-
-    def test_multipart_one_part(self):
-        msg, text = self._msgobj('msg_23.txt')
-        self._idempotent(msg, text)
-
-    def test_content_type(self):
-        eq = self.assertEquals
-        # Get a message object and reset the seek pointer for other tests
-        msg, text = self._msgobj('msg_05.txt')
-        eq(msg.get_type(), 'multipart/report')
-        # Test the Content-Type: parameters
-        params = {}
-        for pk, pv in msg.get_params():
-            params[pk] = pv
-        eq(params['report-type'], 'delivery-status')
-        eq(params['boundary'], 'D1690A7AC1.996856090/mail.example.com')
-        eq(msg.preamble, 'This is a MIME-encapsulated message.\n\n')
-        eq(msg.epilogue, '\n\n')
-        eq(len(msg.get_payload()), 3)
-        # Make sure the subparts are what we expect
-        msg1 = msg.get_payload(0)
-        eq(msg1.get_type(), 'text/plain')
-        eq(msg1.get_payload(), 'Yadda yadda yadda\n')
-        msg2 = msg.get_payload(1)
-        eq(msg2.get_type(), None)
-        eq(msg2.get_payload(), 'Yadda yadda yadda\n')
-        msg3 = msg.get_payload(2)
-        eq(msg3.get_type(), 'message/rfc822')
-        self.failUnless(isinstance(msg3, Message))
-        msg4 = msg3.get_payload()
-        self.failUnless(isinstance(msg4, Message))
-        eq(msg4.get_payload(), 'Yadda yadda yadda\n')
-
-    def test_parser(self):
-        eq = self.assertEquals
-        msg, text = self._msgobj('msg_06.txt')
-        # Check some of the outer headers
-        eq(msg.get_type(), 'message/rfc822')
-        # Make sure there's exactly one thing in the payload and that's a
-        # sub-Message object of type text/plain
-        msg1 = msg.get_payload()
-        self.failUnless(isinstance(msg1, Message))
-        eq(msg1.get_type(), 'text/plain')
-        self.failUnless(isinstance(msg1.get_payload(), StringType))
-        eq(msg1.get_payload(), '\n')
-
-
-
-# Test various other bits of the package's functionality
-class TestMiscellaneous(unittest.TestCase):
-    def test_message_from_string(self):
-        fp = openfile('msg_01.txt')
-        try:
-            text = fp.read()
-        finally:
-            fp.close()
-        msg = email.message_from_string(text)
-        s = StringIO()
-        # Don't wrap/continue long headers since we're trying to test
-        # idempotency.
-        g = Generator(s, maxheaderlen=0)
-        g(msg)
-        self.assertEqual(text, s.getvalue())
-
-    def test_message_from_file(self):
-        fp = openfile('msg_01.txt')
-        try:
-            text = fp.read()
-            fp.seek(0)
-            msg = email.message_from_file(fp)
-            s = StringIO()
-            # Don't wrap/continue long headers since we're trying to test
-            # idempotency.
-            g = Generator(s, maxheaderlen=0)
-            g(msg)
-            self.assertEqual(text, s.getvalue())
-        finally:
-            fp.close()
-
-    def test_message_from_string_with_class(self):
-        unless = self.failUnless
-        fp = openfile('msg_01.txt')
-        try:
-            text = fp.read()
-        finally:
-            fp.close()
-        # Create a subclass
-        class MyMessage(Message):
-            pass
-
-        msg = email.message_from_string(text, MyMessage)
-        unless(isinstance(msg, MyMessage))
-        # Try something more complicated
-        fp = openfile('msg_02.txt')
-        try:
-            text = fp.read()
-        finally:
-            fp.close()
-        msg = email.message_from_string(text, MyMessage)
-        for subpart in msg.walk():
-            unless(isinstance(subpart, MyMessage))
-
-    def test_message_from_file_with_class(self):
-        unless = self.failUnless
-        # Create a subclass
-        class MyMessage(Message):
-            pass
-
-        fp = openfile('msg_01.txt')
-        try:
-            msg = email.message_from_file(fp, MyMessage)
-        finally:
-            fp.close()
-        unless(isinstance(msg, MyMessage))
-        # Try something more complicated
-        fp = openfile('msg_02.txt')
-        try:
-            msg = email.message_from_file(fp, MyMessage)
-        finally:
-            fp.close()
-        for subpart in msg.walk():
-            unless(isinstance(subpart, MyMessage))
-
-    def test__all__(self):
-        module = __import__('email')
-        all = module.__all__
-        all.sort()
-        self.assertEqual(all, ['Encoders', 'Errors', 'Generator', 'Iterators',
-                               'MIMEAudio', 'MIMEBase', 'MIMEImage',
-                               'MIMEMessage', 'MIMEText', 'Message', 'Parser',
-                               'Utils',
-                               'message_from_file', 'message_from_string'])
-
-    def test_formatdate(self):
-        now = 1005327232.109884
-        gm_epoch = time.gmtime(0)[0:3]
-        loc_epoch = time.localtime(0)[0:3]
-        # When does the epoch start?
-        if gm_epoch == (1970, 1, 1):
-            # traditional Unix epoch
-            matchdate = 'Fri, 09 Nov 2001 17:33:52 -0000'
-        elif loc_epoch == (1904, 1, 1):
-            # Mac epoch
-            matchdate = 'Sat, 09 Nov 1935 16:33:52 -0000'
-        else:
-            matchdate = "I don't understand your epoch"
-        gdate = Utils.formatdate(now)
-        self.assertEqual(gdate, matchdate)
-
-    def test_formatdate_localtime(self):
-        now = 1005327232.109884
-        ldate = Utils.formatdate(now, localtime=1)
-        zone = ldate.split()[5]
-        offset = int(zone[1:3]) * 3600 + int(zone[-2:]) * 60
-        # Remember offset is in seconds west of UTC, but the timezone is in
-        # minutes east of UTC, so the signs differ.
-        if zone[0] == '+':
-            offset = -offset
-        if time.daylight and time.localtime(now)[-1]:
-            toff = time.altzone
-        else:
-            toff = time.timezone
-        self.assertEqual(offset, toff)
-
-    def test_parsedate_none(self):
-        self.assertEqual(Utils.parsedate(''), None)
-
-    def test_parseaddr_empty(self):
-        self.assertEqual(Utils.parseaddr('<>'), ('', ''))
-        self.assertEqual(Utils.dump_address_pair(Utils.parseaddr('<>')), '')
-
-
-
-# Test the iterator/generators
-class TestIterators(TestEmailBase):
-    def test_body_line_iterator(self):
-        eq = self.assertEqual
-        # First a simple non-multipart message
-        msg = self._msgobj('msg_01.txt')
-        it = Iterators.body_line_iterator(msg)
-        lines = list(it)
-        eq(len(lines), 6)
-        eq(EMPTYSTRING.join(lines), msg.get_payload())
-        # Now a more complicated multipart
-        msg = self._msgobj('msg_02.txt')
-        it = Iterators.body_line_iterator(msg)
-        lines = list(it)
-        eq(len(lines), 43)
-        eq(EMPTYSTRING.join(lines), openfile('msg_19.txt').read())
-
-    def test_typed_subpart_iterator(self):
-        eq = self.assertEqual
-        msg = self._msgobj('msg_04.txt')
-        it = Iterators.typed_subpart_iterator(msg, 'text')
-        lines = [subpart.get_payload() for subpart in it]
-        eq(len(lines), 2)
-        eq(EMPTYSTRING.join(lines), """\
-a simple kind of mirror
-to reflect upon our own
-a simple kind of mirror
-to reflect upon our own
-""")
-
-    def test_typed_subpart_iterator_default_type(self):
-        eq = self.assertEqual
-        msg = self._msgobj('msg_03.txt')
-        it = Iterators.typed_subpart_iterator(msg, 'text', 'plain')
-        lines = []
-        subparts = 0
-        for subpart in it:
-            subparts += 1
-            lines.append(subpart.get_payload())
-        eq(subparts, 1)
-        eq(EMPTYSTRING.join(lines), """\
-
-Hi,
-
-Do you like this message?
-
--Me
-""")
-
-
-class TestParsers(unittest.TestCase):
-    def test_header_parser(self):
-        eq = self.assertEqual
-        # Parse only the headers of a complex multipart MIME document
-        p = HeaderParser()
-        fp = openfile('msg_02.txt')
-        msg = p.parse(fp)
-        eq(msg['from'], 'ppp-request@zzz.org')
-        eq(msg['to'], 'ppp@zzz.org')
-        eq(msg.get_type(), 'multipart/mixed')
-        eq(msg.is_multipart(), 0)
-        self.failUnless(isinstance(msg.get_payload(), StringType))
-
-
-
-def suite():
-    suite = unittest.TestSuite()
-    suite.addTest(unittest.makeSuite(TestMessageAPI))
-    suite.addTest(unittest.makeSuite(TestEncoders))
-    suite.addTest(unittest.makeSuite(TestLongHeaders))
-    suite.addTest(unittest.makeSuite(TestFromMangling))
-    suite.addTest(unittest.makeSuite(TestMIMEAudio))
-    suite.addTest(unittest.makeSuite(TestMIMEImage))
-    suite.addTest(unittest.makeSuite(TestMIMEText))
-    suite.addTest(unittest.makeSuite(TestMultipartMixed))
-    suite.addTest(unittest.makeSuite(TestNonConformant))
-    suite.addTest(unittest.makeSuite(TestRFC2047))
-    suite.addTest(unittest.makeSuite(TestMIMEMessage))
-    suite.addTest(unittest.makeSuite(TestIdempotent))
-    suite.addTest(unittest.makeSuite(TestMiscellaneous))
-    suite.addTest(unittest.makeSuite(TestIterators))
-    suite.addTest(unittest.makeSuite(TestParsers))
-    return suite
-
-
-
-def test_main():
-    from test_support import run_suite
-    run_suite(suite())
-
 if __name__ == '__main__':
-    test_main()
+    unittest.main(defaultTest='suite')