The GNUstep MIME parser. This is a collection of Objective-C
classes for representing MIME (and HTTP) documents and
managing conversions to and from convenient internal
formats.
The idea is to center around two classes:
document
A container for the actual data (and headers) of a
MIME/HTTP document; it is also used to create raw
MIME data for sending.
parser
An object that can be fed data and will parse it into a
document. This object also provides various utility
methods and an API that permits overriding in order
to extend the functionality to cope with new document
types.
Coding contexts are objects used by the parser to
store the state of decoding incoming data while it is
being incrementally parsed. The most rudimentary
context is used for decoding plain text and
binary data (ie data which is not really decoded at
all); all other decoding work is done by subclasses.
This class is intended to provide a wrapper for MIME
messages permitting easy access to the contents of
a message and providing a basis for parsing and unparsing
messages that have arrived via email or as a web
document.
The class keeps track of all the document headers, and
provides methods for modifying and examining the
headers that apply to a document.
Return the MIME character set name corresponding to
the specified string encoding. As a special
case, returns "us-ascii" if enc is zero.
Returns nil if enc
cannot be mapped to a charset. NB. The
correspondence between charsets and
encodings is not a direct one-to-one mapping, so
successive calls to
+encodingFromCharset:
and
+charsetFromEncoding:
may not produce the original input.
Decode the source data from base64
encoding and return the result. The
source data is expected to be ASCII text
and may be multiple lines or a line of any length
(decoding is very tolerant).
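The tolerant decoding described above can be pictured with a standalone C sketch (plain C is also valid Objective-C). It assumes that "tolerant" means skipping any byte outside the base64 alphabet, such as CR, LF or spaces, and stopping at the first '=' pad. The function name `b64_decode_tolerant` is hypothetical; this is an illustration, not the GSMime implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if the character
 * is not part of the base64 alphabet. */
static int b64_value(unsigned char c)
{
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

/* Decode len bytes of base64 text at src into dst and return the number
 * of bytes written. dst needs room for at least (len / 4) * 3 + 3 bytes.
 * Bytes outside the alphabet (CR, LF, spaces, junk) are skipped, so the
 * input may be split into lines of any length; decoding stops at the
 * first '=' padding character. */
size_t b64_decode_tolerant(const char *src, size_t len, unsigned char *dst)
{
  unsigned long acc = 0;  /* bits not yet written out */
  int bits = 0;           /* how many low bits of acc are valid */
  size_t out = 0;
  size_t i;

  assert(src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    {
      int v;

      if (src[i] == '=')
        break;                       /* padding ends the data */
      v = b64_value((unsigned char)src[i]);
      if (v < 0)
        continue;                    /* tolerate line breaks and junk */
      acc = (acc << 6) | (unsigned long)v;
      bits += 6;
      if (bits >= 8)
        {
          bits -= 8;
          dst[out++] = (unsigned char)(acc >> bits);
          acc &= (1UL << bits) - 1;  /* keep only the leftover bits */
        }
    }
  return out;
}
```

For example, `b64_decode_tolerant("SGVs\r\nbG8=", 10, buf)` writes the five bytes of "Hello" despite the embedded line break.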
Converts the base64 encoded data in
source to a decoded ASCII or UTF8 string
using the
+decodeBase64:
method. If the encoded data does not represent an
ASCII or UTF8 string, you should use the
+decodeBase64:
method directly.
Convenience method to return an autoreleased
document using the specified content,
type, and name value. This
calls the
-setContent:type:name:
method to set up the document.
Encode the source data to base64 encoding
and return the result. The resulting data is
ASCII text and contains only the base64 encoded
values with no line breaks or extraneous data. This
is base64 encoded data in its general format as
mandated in RFC 3548. If the data is to be used as
part of a MIME document body, line breaks must be
introduced at 76 byte intervals (GSMime does
this when automatically encoding data for you). If the
data is to be used in a PEM document, line breaks must
be introduced at 64 byte intervals.
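A minimal C sketch of the encoding and line-breaking rules above. The hypothetical `b64_encode` function takes a `wrap` column: 0 gives the unbroken RFC 3548 form, 76 suits a MIME body and 64 a PEM document. It illustrates the format only and is not GSMime's code.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from src as base64 into dst and return the number of
 * characters written (no terminating NUL is added). If wrap is non-zero
 * a CRLF is inserted after every wrap output characters. dst needs
 * 4 * ((len + 2) / 3) bytes plus 2 for each line break. */
size_t b64_encode(const unsigned char *src, size_t len, char *dst, size_t wrap)
{
  size_t i, out = 0, col = 0;

  assert(dst != NULL && (src != NULL || len == 0));
  for (i = 0; i < len; i += 3)
    {
      size_t rem = len - i;            /* bytes left, 1, 2 or more */
      unsigned long n = (unsigned long)src[i] << 16;
      char quad[4];
      int k;

      if (rem > 1) n |= (unsigned long)src[i + 1] << 8;
      if (rem > 2) n |= (unsigned long)src[i + 2];
      quad[0] = b64_alphabet[(n >> 18) & 63];
      quad[1] = b64_alphabet[(n >> 12) & 63];
      quad[2] = rem > 1 ? b64_alphabet[(n >> 6) & 63] : '=';
      quad[3] = rem > 2 ? b64_alphabet[n & 63] : '=';
      for (k = 0; k < 4; k++)
        {
          if (wrap > 0 && col == wrap)
            {
              dst[out++] = '\r';       /* break before the next char */
              dst[out++] = '\n';
              col = 0;
            }
          dst[out++] = quad[k];
          col++;
        }
    }
  return out;
}
```

Breaks are inserted only between characters, never after the last one, so a caller building a MIME body would still append its own terminating CRLF.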
Converts the ASCII or UTF8 string
source into base64 encoded data using the
+encodeBase64:
method. If the original data is not an ASCII or
UTF8 string, you should use the
+encodeBase64:
method directly.
Return the string encoding corresponding to the
specified MIME character set name. As a
special case, returns NSASCIIStringEncoding if
charset is nil.
Returns 0 if charset cannot be found.
NB. We treat iso-10646-ucs-2 as utf-16, which
should work for most text, but is not strictly
correct. The correspondence between
charsets and encodings is not a direct one-to-one
mapping, so successive calls to
+encodingFromCharset:
and
+charsetFromEncoding:
may not produce the original input.
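The non-reversible mapping noted above can be shown with a small lookup table. In this hypothetical C sketch the `enum enc` constants stand in for string encodings and the table contents are illustrative only: because both "iso-8859-1" and "latin1" map to one encoding, converting "latin1" to an encoding and back yields "iso-8859-1".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for string encoding constants; 0 means
 * "no encoding", as in the methods described above. */
enum enc { ENC_NONE = 0, ENC_ASCII, ENC_LATIN1, ENC_UTF8, ENC_UTF16 };

/* Illustrative table: several names may map to one encoding, and the
 * first name listed for an encoding is the one charset_from_enc()
 * returns. */
static const struct { const char *name; enum enc e; } charsets[] = {
  { "us-ascii",        ENC_ASCII },
  { "iso-8859-1",      ENC_LATIN1 },
  { "latin1",          ENC_LATIN1 },
  { "utf-8",           ENC_UTF8 },
  { "utf-16",          ENC_UTF16 },
  { "iso-10646-ucs-2", ENC_UTF16 },  /* approximated as utf-16 */
};
#define NCHARSETS (sizeof charsets / sizeof charsets[0])

/* Case-insensitive ASCII string equality. */
static int same_name(const char *a, const char *b)
{
  while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b))
    {
      a++;
      b++;
    }
  return *a == '\0' && *b == '\0';
}

enum enc enc_from_charset(const char *name)
{
  size_t i;

  if (name == NULL)
    return ENC_ASCII;                /* special case: nil means ASCII */
  for (i = 0; i < NCHARSETS; i++)
    if (same_name(name, charsets[i].name))
      return charsets[i].e;
  return ENC_NONE;                   /* charset not found */
}

const char *charset_from_enc(enum enc e)
{
  size_t i;

  if (e == ENC_NONE)
    return "us-ascii";               /* special case: zero means us-ascii */
  for (i = 0; i < NCHARSETS; i++)
    if (charsets[i].e == e)
      return charsets[i].name;
  return NULL;                       /* no charset for this encoding */
}
```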
This method may be called to add a header to the
document. The header must be a mutable
dictionary object that contains at least the
fields that are standard for all headers.
Certain well-known headers are restricted to one
occurrence in an email, and when extra copies
are added they replace originals.
The mime-version header is special... it is inserted
before any other mime headers rather than being
added at the end.
Search the content of this document to locate a part
whose content ID matches the specified key.
Recursively descend into other documents. Wraps
the supplied key in angle brackets if they
are not present. Return nil if no
match is found, the matching GSMimeDocument
otherwise.
Search the content of this document to locate a part
whose content-type name or content-disposition name
matches the specified key. Recursively
descend into other documents. Return
nil if no match is found, the matching
GSMimeDocument otherwise.
Search the content of this document to locate all
parts whose content-type name or content-disposition
name matches the specified key. Do
NOT recurse into other documents.
Return nil if no match is found, an
array of matching GSMimeDocument instances otherwise.
Converts any binary parts of the receiver's
content to be base64 (or quoted-printable for text
parts) encoded rather than 8bit or binary encoded...
a convenience method to make the results of the
-rawMimeData
method safe for sending via routes which only
support 7bit data.
Converts any base64 (or quoted-printable) encoded
parts of the receiver's content to be binary encoded
instead... a convenience method to shrink down the
size of the message when converted to data using the
-rawMimeData
method.
Return the content as an NSData object (unless it is
multipart). Perform conversion from text
to data using the charset specified in the content-type
header, or infer the charset, and update the header
accordingly. If the content can not be
represented as a plain NSData object, this
method returns nil.
Make a probably unique string suitable for use as
the boundary parameter in the content of a multipart
document.
This implementation provides base64 encoded data
consisting of an MD5 digest of some pseudo
random stuff, plus an incrementing counter. The
inclusion of the counter guarantees that we
won't produce two identical strings in the same run
of the program.
The boundary has a suffix of '=_' to ensure it's not
mistaken for quoted-printable data.
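A C sketch of the boundary recipe above, with one stated substitution: instead of base64-encoding an MD5 digest, the hypothetical `make_boundary` hex-encodes a value mixed from `rand()`, which is not cryptographically strong. As in the description, a per-process counter keeps results distinct within one run, and the `=_` suffix keeps the boundary from looking like quoted-printable data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Write a probably-unique multipart boundary into buf and return its
 * length (as snprintf does). The counter makes every result in one
 * run distinct; the pseudo-random part makes clashes with other runs
 * or with message content unlikely. Not thread-safe. */
int make_boundary(char *buf, size_t size)
{
  static unsigned long counter = 0;
  static int seeded = 0;
  unsigned long noise;

  assert(buf != NULL && size > 0);
  if (!seeded)
    {
      srand((unsigned)time(NULL));
      seeded = 1;
    }
  noise = (((unsigned long)rand() << 16) ^ (unsigned long)rand()) & 0xFFFFFFFFUL;
  counter++;
  /* 8 hex digits of noise, at least 8 of counter, then "=_" so the
   * boundary cannot be mistaken for quoted-printable data. */
  return snprintf(buf, size, "%08lx%08lx=_", noise, counter);
}
```

The static counter is not thread-safe; a multithreaded caller would need a lock or an atomic counter.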
Create new content ID header, set it as the content
ID of the document and return it. This is a
convenience method which simply places angle
brackets around an
[NSProcessInfo -globallyUniqueString]
to form the header value.
Create new message ID header, set it as the message
ID of the document and return it. This is a
convenience method which simply places angle
brackets around an
[NSProcessInfo -globallyUniqueString]
to form the header value.
Return an NSData object representing the MIME
document as raw data ready to be sent via an email
system. Calls
-rawMimeData:
with the isOuter flag set to YES.
Return an NSData object representing the MIME
document as raw data ready to be sent via an
email system.
The isOuter flag denotes whether this
document is the outermost part of a MIME
message, or is a part of a multipart message.
The fold number specifies the column at
which lines are considered to be 'long', and get
broken/folded.
During generation of the document this method will
perform some consistency checks and try to
automatically generate missing header
information needed to build the mime data
(eg. filling in the boundary parameter in the
content-type header for multipart
documents). However, you should not
depend on automatic behaviors but should fill in
as much detail as possible before generating data.
Convenience method calling
-setContent:type:name:
to set document content and type with a
nil value for name... useful for
top-level documents rather than parts within a
document (parts should really be named).
Convenience method to set the content of the
document along with creating a content-type
header for it.
The type parameter may be a simple common
content type (text, multipart, or
application), in which case the default
subtype for that type is used.
Alternatively it may be the full value of a
content type header, which will
be parsed into 'type', 'subtype' and 'parameters'.
NB. In this case, if the parsed data
contains a 'name' parameter and the
name argument is non-nil, the argument
value will override the parsed value.
Using this method imposes a few extra checks and
restrictions on the combination of content
and type/subtype you may use... so you may want to
use the more primitive methods in order to bypass
these checks if you are using unusual
type/subtype information or if you need to
provide additional parameters in the header.
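To illustrate how a full type value might be split into 'type' and 'subtype', here is a C sketch. The function name `split_content_type` is hypothetical, and so are the default subtypes (plain, mixed, octet-stream) it assigns to a bare "text", "multipart" or "application"; the real GSMime defaults may differ. Parameters are handed back unparsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy s[0..n) into dst without leading/trailing spaces or tabs.
 * Fails (-1) if the trimmed text is empty or does not fit in cap. */
static int copy_trimmed(char *dst, size_t cap, const char *s, size_t n)
{
  while (n > 0 && (*s == ' ' || *s == '\t'))
    {
      s++;
      n--;
    }
  while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t'))
    n--;
  if (n == 0 || n >= cap)
    return -1;
  memcpy(dst, s, n);
  dst[n] = '\0';
  return 0;
}

/* Split a Content-Type value such as "text/html; charset=utf-8" into
 * type and subtype. *params is set to the text after the first ';'
 * (unparsed) or NULL. A bare "text", "multipart" or "application" gets
 * an assumed default subtype. Returns 0 on success, -1 on failure. */
int split_content_type(const char *value, char *type, size_t tcap,
                       char *subtype, size_t scap, const char **params)
{
  const char *semi, *slash;
  size_t end;

  assert(value != NULL && params != NULL);
  semi = strchr(value, ';');
  end = semi ? (size_t)(semi - value) : strlen(value);
  slash = memchr(value, '/', end);
  *params = semi ? semi + 1 : NULL;

  if (slash == NULL)
    {
      const char *def;

      if (copy_trimmed(type, tcap, value, end) != 0)
        return -1;
      if (strcmp(type, "text") == 0)
        def = "plain";              /* assumed default */
      else if (strcmp(type, "multipart") == 0)
        def = "mixed";              /* assumed default */
      else if (strcmp(type, "application") == 0)
        def = "octet-stream";       /* assumed default */
      else
        return -1;                  /* no default for other types */
      return copy_trimmed(subtype, scap, def, strlen(def));
    }
  if (copy_trimmed(type, tcap, value, (size_t)(slash - value)) != 0)
    return -1;
  return copy_trimmed(subtype, scap, slash + 1,
                      end - (size_t)(slash - value) - 1);
}
```

Only the first ';' is considered, so this sketch does not cope with quoted parameter values that themselves contain ';'.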
Convenience method to set the content type of
the document without altering any content. The
supplied newType may be full type
information including subtype and parameters
as found after the colon in a mime Content-Type
header.
Warning: the underscore at the start of the
name of this instance variable indicates that, even
though it is not technically private, it is
intended for internal use within the package, and
you should not use the variable in other code.
Makes the value into a quoted string if necessary (ie
if it contains any special / non-token characters). If
flag is YES then the value is
made into a quoted string even if it does not contain
special characters.
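A C sketch of the quoting rule, assuming the RFC 2045 definition of "special" characters (the tspecials plus space, control and non-ASCII bytes) and backslash-escaping of '"' and '\' inside the quotes. `make_quoted` is a hypothetical name; quoting an empty value is an extra choice made here because a token cannot be empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-zero for bytes that may not appear in an RFC 2045 token:
 * control characters, space, non-ASCII bytes and the tspecials. */
static int is_special(unsigned char c)
{
  return c <= 32 || c >= 127 || strchr("()<>@,;:\\\"/[]?=", c) != NULL;
}

/* Copy value to dst, wrapping it in double quotes if it contains a
 * special character or if force is non-zero; '"' and '\' inside the
 * quotes are backslash-escaped. dst needs 2 * strlen(value) + 3 bytes.
 * Returns the length written (excluding the terminating NUL). */
size_t make_quoted(const char *value, int force, char *dst)
{
  size_t i, out = 0, len;
  int quote = force;

  assert(value != NULL && dst != NULL);
  len = strlen(value);
  if (len == 0)
    quote = 1;                       /* a token cannot be empty */
  for (i = 0; i < len && !quote; i++)
    if (is_special((unsigned char)value[i]))
      quote = 1;
  if (!quote)
    {
      memcpy(dst, value, len + 1);   /* valid token: copy unchanged */
      return len;
    }
  dst[out++] = '"';
  for (i = 0; i < len; i++)
    {
      if (value[i] == '"' || value[i] == '\\')
        dst[out++] = '\\';
      dst[out++] = value[i];
    }
  dst[out++] = '"';
  dst[out] = '\0';
  return out;
}
```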
Convert the supplied string to a standardized token
by removing all illegal characters. If
preserve is NO then the
result is converted to lowercase. Returns an
autoreleased (and possibly modified) copy of
the original.
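A matching C sketch of token standardization, again assuming the RFC 2045 definition of token characters; `make_token` is a hypothetical name and the GSMime method may accept a slightly different character set.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy src to dst keeping only RFC 2045 token characters (printable
 * ASCII except space and the tspecials). Unless preserve is non-zero
 * the result is lowercased. dst needs strlen(src) + 1 bytes.
 * Returns dst. */
char *make_token(const char *src, int preserve, char *dst)
{
  char *out = dst;

  assert(src != NULL && dst != NULL);
  for (; *src != '\0'; src++)
    {
      unsigned char c = (unsigned char)*src;

      if (c <= 32 || c >= 127 || strchr("()<>@,;:\\\"/[]?=", c) != NULL)
        continue;                    /* drop illegal character */
      *out++ = (char)(preserve ? c : tolower(c));
    }
  *out = '\0';
  return dst;
}
```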
Returns the full value of the header including any
parameters and preserving case. This is an
unfolded (long) line with no
escape sequences (ie contains a unicode string not
necessarily plain ASCII). If you just
want the plain value excluding any parameters, use the
-value
method instead.
Returns the parameters of this header... a
dictionary whose keys are strings preserving the
case originally used to set the values or all
lowercase depending on the preserve
argument.
Returns the full text of the header, built from its
component parts, and including a terminating
CR-LF. If preserve is
YES then we attempt to build the text
using the same case as it was originally parsed/set
from, otherwise we use common conventions of
capitalising the header names and using
lowercase parameter names. If
fold is greater than zero, lines with more
than the specified number of characters are considered
'long' and are folded into multiple lines.
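The folding behaviour can be sketched in C, assuming RFC 2822 style folding in which a CRLF is inserted before a space or tab so that the whitespace starts the continuation line. `fold_header` is hypothetical; this sketch makes no attempt to keep a header name on the same line as its value, so its fold points may differ from GSMime's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fold an unfolded header line src (no CRLF) so that, where possible,
 * no output line is longer than fold characters, then append CRLF.
 * Words longer than fold are left intact; fold <= 0 disables folding.
 * dst needs 3 * strlen(src) + 3 bytes. Returns the length written. */
size_t fold_header(const char *src, int fold, char *dst)
{
  size_t len, out = 0, start = 0;

  assert(src != NULL && dst != NULL);
  len = strlen(src);
  while (fold > 0 && len - start > (size_t)fold)
    {
      size_t p = 0, i;   /* p: where to fold; 0 means "not found" */

      /* Last whitespace that keeps this line within fold characters. */
      for (i = start + 1; i <= start + (size_t)fold; i++)
        if (src[i] == ' ' || src[i] == '\t')
          p = i;
      /* Otherwise the first whitespace after an over-long word. */
      for (i = start + (size_t)fold + 1; p == 0 && i < len; i++)
        if (src[i] == ' ' || src[i] == '\t')
          p = i;
      if (p == 0)
        break;                           /* nowhere left to fold */
      memcpy(dst + out, src + start, p - start);
      out += p - start;
      dst[out++] = '\r';
      dst[out++] = '\n';
      start = p;                         /* whitespace begins next line */
    }
  memcpy(dst + out, src + start, len - start);
  out += len - start;
  dst[out++] = '\r';
  dst[out++] = '\n';
  dst[out] = '\0';
  return out;
}
```

For example, folding "Subject: aaa bbb ccc" at 10 yields the lines "Subject:", " aaa bbb" and " ccc", each ended by CRLF.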
Method to store specific information for particular
types of header. This is used for non-standard parts
of headers. Setting a nil value for
o will remove any existing value set using
k as its key.
Sets a parameter of this header... converts name to
lowercase and removes illegal characters.
If a nil parameter value is supplied,
removes any parameter with the specified key.
This class provides support for parsing MIME messages
into GSMimeDocument objects. Each parser object
maintains an associated document into which data
is stored.
You supply the document to be parsed as one or more
data items passed to the
-parse:
method, and (if the method keeps returning
YES) you give it a final
nil argument to mark the end of the
document.
On completion of parsing a valid document, the
[GSMimeParser -mimeDocument]
method returns the resulting parsed document.
If you need to parse faulty documents (eg where a faulty
mail client has produced an email which does not
conform to the MIME standards), you should look at
the
-setBuggyQuotes:
and
-setDefaultCharset:
methods, which are designed to cope with the most
common faults.
Return a coding context object to be used for
decoding data according to the scheme specified in
the header.
The default implementation supports the following
transfer encodings specified in either a
transfer-encoding or
content-transfer-encoding header:
base64
quoted-printable
binary (no coding actually performed)
7bit (no coding actually performed)
8bit (no coding actually performed)
chunked (for HTTP/1.1)
x-uuencode
To add new coding schemes to the parser, you need to
override this method to return a new coding
context for your scheme when the info
argument indicates that this is appropriate.
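As an example of the work a coding context performs, here is a C sketch of quoted-printable decoding, one of the schemes listed above. It assumes a tolerant policy in which a malformed '=' sequence is copied through unchanged; `qp_decode` is a hypothetical name, and unlike a real coding context it works on a complete buffer rather than incrementally.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Decode len bytes of quoted-printable text at src into dst (which
 * needs len bytes) and return the decoded length. "=XX" becomes one
 * byte, '=' at the end of a line (CRLF or bare LF) is a soft line
 * break and is removed, and any other '=' is copied through as-is. */
size_t qp_decode(const char *src, size_t len, unsigned char *dst)
{
  size_t i = 0, out = 0;

  assert(src != NULL && dst != NULL);
  while (i < len)
    {
      if (src[i] != '=')
        {
          dst[out++] = (unsigned char)src[i++];
          continue;
        }
      if (i + 1 < len && src[i + 1] == '\n')
        {
          i += 2;                        /* soft break, bare LF */
          continue;
        }
      if (i + 2 < len && src[i + 1] == '\r' && src[i + 2] == '\n')
        {
          i += 3;                        /* soft break, CRLF */
          continue;
        }
      if (i + 2 < len && hexval(src[i + 1]) >= 0 && hexval(src[i + 2]) >= 0)
        {
          dst[out++] = (unsigned char)(hexval(src[i + 1]) * 16 + hexval(src[i + 2]));
          i += 3;
          continue;
        }
      dst[out++] = (unsigned char)src[i++];  /* malformed: keep '=' */
    }
  return out;
}
```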
Return the data accumulated in the parser. If the
parser is still parsing headers, this will be the
header data read so far. If the parser has parsed the
body of the message, this will be the data of the
body, with any transfer encoding removed.
Decodes the raw data from the specified range in
the source data object and appends it to the
destination data object. The context object
provides information about the content encoding
type in use, and the state of the decoding
operation.
This method may be called repeatedly to
incrementally decode information as it
arrives on some communications channel. It should
be called with a nil source data item (or
with the atEnd flag of the context set to
YES) in order to flush any
information held in the context to the output
data object.
You may override this method in order to implement
additional coding schemes, but usually it
should be enough for you to implement a custom
GSMimeCodingContext subclass for
this method to use.
Correct the size of the output buffer (shrink back
from the original allocation to the actual unchunked
size).
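A one-shot C sketch of HTTP/1.1 chunked decoding; unlike the incremental contexts described above, it needs the whole body at once. Because decoded output is never longer than its chunked form, `dst` can be allocated at `len` bytes and trimmed to the returned size afterwards, mirroring the shrink-back described above. `chunked_decode` is hypothetical; chunk extensions are skipped and trailers are not parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Decode a complete chunked body (len bytes at src) into dst, which
 * needs at most len bytes. Returns the decoded length, or -1 if the
 * input is malformed or truncated. */
long chunked_decode(const char *src, size_t len, char *dst)
{
  size_t i = 0, out = 0;

  assert(src != NULL && dst != NULL);
  for (;;)
    {
      size_t size = 0;
      int digits = 0, v;

      /* Chunk-size line: hex digits, optional ";extension", CRLF. */
      while (i < len && (v = hexval(src[i])) >= 0)
        {
          if (size > SIZE_MAX / 16)
            return -1;                   /* absurdly large chunk size */
          size = size * 16 + (size_t)v;
          digits++;
          i++;
        }
      if (digits == 0)
        return -1;
      while (i < len && src[i] != '\n')
        i++;                             /* skip extension and CR */
      if (i >= len)
        return -1;
      i++;
      if (size == 0)
        return (long)out;                /* last chunk; trailers ignored */
      if (size > len - i)
        return -1;                       /* truncated chunk data */
      memcpy(dst + out, src + i, size);
      out += size;
      i += size;
      /* Chunk data must be followed by CRLF (a bare LF is tolerated). */
      if (i < len && src[i] == '\r')
        i++;
      if (i >= len || src[i] != '\n')
        return -1;
      i++;
    }
}
```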
This method may be called to tell the parser that it
should not expect to parse any headers, and that the
data it will receive is body data. If the parser
is already in the body, or is complete, this method has
no effect. This is for use when some other
utility has been used to parse headers, and you
have set the headers of the document owned by the
parser accordingly. You can then use the
GSMimeParser to read the body data into the
document.
Returns YES if the document parsing is
known to be completed successfully. Returns
NO if either more data is needed, or if
the parser encountered an error.
This method is called repeatedly to pass raw mime
data into the parser. It returns YES as
long as it wants more data to
complete parsing of a document, and
NO if parsing is complete, either due
to having reached the end of a document or due to an
error.
Since it is not always possible to determine if the
end of a MIME document has been reached from its
content, the method may need to be called with a
nil or empty argument after you have
passed all the data to it... this tells it that
the data is complete.
The parser attempts to be as flexible as possible and
to continue parsing wherever it can. If an error
occurs in parsing, the
-isComplete
method will always return NO, even
after the
-parse:
method has been called with a nil
argument.
A multipart document will be parsed to content
consisting of an NSArray of GSMimeDocument
instances representing each part.
Otherwise, a document will become content of
type NSData, unless it is of content type
text, in which case it will be an NSString.
If a document has no content type
specified, it will be treated as text,
unless it is identifiable as a file (eg. it
has a content-disposition header containing a
filename parameter).
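The rules above reduce to a small decision function. In this C sketch `classify_content` and `enum content_kind` are hypothetical, and it assumes that a part with no content type but with a filename becomes binary data (NSData).

```c
#include <assert.h>
#include <string.h>

enum content_kind { CONTENT_PARTS, CONTENT_STRING, CONTENT_DATA };

/* type is the lowercase top-level content type, or NULL when the part
 * has no Content-Type header; has_filename is non-zero when a
 * content-disposition filename parameter is present. */
enum content_kind classify_content(const char *type, int has_filename)
{
  if (type == NULL)                     /* untyped: text unless a file */
    return has_filename ? CONTENT_DATA : CONTENT_STRING;
  if (strcmp(type, "multipart") == 0)
    return CONTENT_PARTS;               /* NSArray of GSMimeDocument */
  if (strcmp(type, "text") == 0)
    return CONTENT_STRING;              /* NSString */
  return CONTENT_DATA;                  /* NSData */
}
```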
This method is called to parse a header line
for the current document, split its
contents into a GSMimeHeader object, and add
that information to the document. The method
is normally used internally by the
-parse:
method, but you may also call it to parse an
entire header line and add it to the document
(this may be useful in conjunction with the
-expectNoHeaders
method, to parse a document body data into a
document where the headers are available from a
separate source).
The standard implementation of this method scans the
header name and then calls
-scanHeaderBody:into:
to complete the parsing of the header.
This method also performs consistency checks on
headers scanned so it is recommended that it is
not overridden, but that subclasses override
-scanHeaderBody:into:
to implement custom scanning.
As a special case, for HTTP support, this method also
parses lines in the format of HTTP responses as if
they were headers named http. The
resulting header object contains additional
object values -
Parses headers from the supplied data returning
YES if more data is needed before the
end of the headers is reached. If
body is not NULL and the end of the
headers were reached leaving some unused data, that
remaining data is returned. NB. The
returned data is a reference to part of the
original memory buffer provided in d,
so you must copy it if you intend to use it after
modifying or deallocating the original data.
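A C sketch of the end-of-headers search and the "remaining data" caveat, assuming the blank line may be either CRLF CRLF or LF LF. The hypothetical `scan_headers_end` returns 1 while more data is needed, echoing the YES convention above, and the body pointer it reports aliases the caller's buffer rather than copying it.

```c
#include <assert.h>
#include <stddef.h>

/* Search the len bytes at d for the blank line that ends a header
 * block. Returns 1 if more data is needed, 0 if the end was found.
 * On success, if body is not NULL, *body and *body_len describe the
 * bytes after the blank line. *body points INTO d, so it must be
 * copied before d is modified or freed. */
int scan_headers_end(const char *d, size_t len,
                     const char **body, size_t *body_len)
{
  size_t i;

  assert(d != NULL || len == 0);
  for (i = 0; i < len; i++)
    {
      size_t after = 0;      /* offset just past the blank line */

      if (d[i] != '\n')
        continue;
      if (i + 1 < len && d[i + 1] == '\n')
        after = i + 2;       /* LF LF */
      else if (i + 2 < len && d[i + 1] == '\r' && d[i + 2] == '\n')
        after = i + 3;       /* CRLF CRLF */
      if (after != 0)
        {
          if (body != NULL)
            {
              assert(body_len != NULL);
              *body = d + after;
              *body_len = len - after;
            }
          return 0;
        }
    }
  return 1;
}
```

A caller feeding data incrementally would rescan the accumulated buffer, since the blank line may be split across two reads.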
This method is called to parse a header line and
split its contents into the supplied
GSMimeHeader
instance.
On entry, the header (info) is already
partially filled, the name is a lowercase
representation of the header name. The
scanner must be set to a scan
location immediately after the colon in the
original header string (ie to the header value
string).
If the header is parsed successfully, the method
should return YES, otherwise
NO.
You would not normally call this method directly
yourself, but may override it to support
parsing of new headers. If you do call
this yourself, you need to be aware that it may
change the state of the document in the parser.
You should be aware of the parsing that the
standard implementation performs, and that
needs to be done for certain headers in
order to permit the parser to work generally -
content-disposition
Value
The content disposition (excluding
parameters) as a lowercase string.
A convenience method to scan past any whitespace in the
scanner in preparation for scanning
something more interesting that comes after it.
Returns YES if any space was read,
NO otherwise.
A convenience method to use a scanner (that is
set up to scan a header line) to scan in a special
character that terminated a token previously
scanned. If the token was terminated by whitespace
and no other special character, the string returned
will contain a single space character.
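The two scanning helpers can be sketched in C over a plain string and index. `skip_space` and `scan_special` are hypothetical names; special characters are assumed to be the RFC 2045 tspecials, and a NUL return stands for "no terminator found".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Advance *pos past spaces and tabs in s. Returns 1 if any
 * whitespace was skipped, 0 otherwise. */
int skip_space(const char *s, size_t *pos)
{
  size_t start;

  assert(s != NULL && pos != NULL);
  start = *pos;
  while (s[*pos] == ' ' || s[*pos] == '\t')
    (*pos)++;
  return *pos > start;
}

/* Called just after a token has been scanned: returns the special
 * character that terminated it (consuming it), ' ' if the token was
 * ended only by whitespace, or '\0' if neither applies. */
char scan_special(const char *s, size_t *pos)
{
  int spaced = skip_space(s, pos);
  char c = s[*pos];

  if (c != '\0' && strchr("()<>@,;:\\\"/[]?=", c) != NULL)
    {
      (*pos)++;
      return c;
    }
  return spaced ? ' ' : '\0';
}
```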
Method to inform the parser that the data it is
parsing is likely to contain fields with buggy use
of backslash quotes... and it should try to be tolerant
of them and treat them as if they were escaped
backslashes. This is for use with things like
Microsoft Internet Explorer, which puts the
backslashes used as file path separators in
parameters without quoting them.
This is a method to inform the parser that body parts
with no content-type header (which are treated as
text/plain) should use the specified
character set rather than the default
(us-ascii). This also controls the
parsing of headers... in a legal MIME document
these must consist solely of us-ascii characters, but
setting a different default character set (such as
latin1) will permit many illegal header lines
(produced by faulty mail clients) to be parsed.
HTTP requests use headers in the latin1
character set, so this is the header line
character set used most commonly by faulty
clients.
Method to inform the parser that the data it is
parsing is an HTTP document rather than true MIME.
This method is called internally if the parser detects
an HTTP response line at the start of the headers it is
parsing.
Tries to flush any queued messages to the SMTP
server, completing by the specified
limit date. If limit is
nil then a date in the distant future
is used. If the queue is emptied in time, this
method returns YES, otherwise it
returns NO.
Add the message to the queue of emails to be
sent by the receiver. Also adds an envelope ID
string to be used to uniquely identify the
message for delivery receipting purposes.
For this to work, the SMTP gateway being used
must support the SMTP service extension for delivery
status notification (RFC 3461).
Set the host for the SMTP server. If this is not set
(or is set to nil) then the
GSMimeSMTPClientHost user default is
used. If the host is nil or an empty
string then 'localhost' is used.
Set the host name the SMTP client uses to identify itself to
the server. If this is not set (or is set to
nil) then the GSMimeSMTPClientIdentity
user default is used. If the identity is
nil or an empty string then the name of
the current host is used.
Sets the maximum number of messages which may remain
in the queue. If this is exceeded then any unsuccessful
send attempt results in excess queued messages
being discarded as unsent. The method returns
the previous setting.
Set the originator for any emails sent by the SMTP
client. This overrides the value in the
'from' header of an email. If this is not set
(or is set to nil) then the
GSMimeSMTPClientOriginator user
default is used. If the originator is
nil or an empty string then the value
in the 'from' header of the email is used.
Set the port for the SMTP server. If this is not set
(or is set to nil) then the
GSMimeSMTPClientPort user default is
used. If the port is not an integer in the 1-65535
range, then '25' (the default SMTP port) is used.
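The port fallback rule can be sketched in C. `smtp_port` is a hypothetical helper that receives the setting as a string (NULL when unset) and, unlike GSMimeSMTPClient, does not consult any user default.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Return the port to use for the string setting value: the parsed
 * number if it is an integer in the range 1-65535, otherwise 25 (the
 * default SMTP port). A NULL or empty value also yields 25. */
int smtp_port(const char *value)
{
  char *end;
  long n;

  if (value == NULL || *value == '\0')
    return 25;
  errno = 0;
  n = strtol(value, &end, 10);
  if (errno != 0 || *end != '\0' || n < 1 || n > 65535)
    return 25;                          /* not a valid port number */
  return (int)n;
}
```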
Set the username for authentication to the SMTP server.
If this is not set (or is set to nil) then
the GSMimeSMTPClientUsername user default is used. If
the username is nil or an empty string
then authentication is not attempted.
Instances of the GSMimeSerializer class are used to
serialise GSMimeDocument objects to NSMutableData
objects, producing data in a form suitable for
sending as an email over the SMTP protocol or in
other forms.
Sets the content transfer encoding used
when 8bit data needs to be sent in a 7bit safe form.
Setting a nil/empty encoding
reverts to the default (base64). Setting an
unknown/inapplicable
encoding raises an exception.
This method allows you to control the
position at which lines in headers and the
body data are wrapped. RFC 2822 says that the
absolute maximum (except for 'binary' content
transfer encoding) is 998 (excluding CRLF), but
the recommended maximum is 78 so we use that by
default. Setting any ridiculously
short value (less than 20) or an
excessively long value
(greater than the 998 character limit supported by
SMTP) actually sets a value of zero, meaning that
there is no limit.
Sets the content transfer encoding used
when 8bit text needs to be sent in a 7bit safe form.
Setting a nil/empty encoding
reverts to the default (quoted-printable).
Setting an unknown/inapplicable encoding
raises an exception.