libtld 2.0.14
A library to determine the Top-Level Domain name of any Internet URI.
Parts of one email.
Public Member Functions | |
tld_result | parse (const std::string &email) |
Parse one email to a tld_email_t object. | |
tld_result | parse_group (const std::string &group) |
Parse a group including comments. | |
Public Attributes | |
std::string | f_canonicalized_email = std::string() |
The email including the display name. | |
std::string | f_domain = std::string() |
The domain part of the email address. | |
std::string | f_email_only = std::string() |
The complete email address without display name. | |
std::string | f_fullname = std::string() |
The user full or display name. | |
std::string | f_group = std::string() |
The group this email was defined in. | |
std::string | f_original_email = std::string() |
The email as read from the source. | |
std::string | f_username = std::string() |
The user being named in this email address. | |
When parsing a list of email addresses, each entry can include a display name, a user name, and a domain. The user name and domain are mandatory; the display name is optional. The list may also include comments and group names.
This structure is used internally to store the emails, and it is what callers receive when they query the individual emails with the next() or tld_email_next() functions.
Note that in the list of emails, a new group is announced by itself. This means an entry may have only the f_group field defined.
The fields of this structure use the same encoding as the input, which is expected to be UTF-8 unless otherwise defined in the emails themselves. The current version does not decode international characters, although we plan to do so in a future version. This means the results should always be treated as valid UTF-8 even if, for now, they are just ASCII.
The parse() function replaces the list of emails, in effect invalidating all the pointers to the tld_email objects that still exist.
tld_result tld_email_list::tld_email_t::parse | ( | const std::string & | email | ) |
The email parameter is expected to represent exactly one email. This function is expected to be used only by the tld_email_list parser, with valid data. It is definitely not forbidden to call this function yourself, but you may find it more difficult to use directly.
The canonicalized email address in the list of resulting emails has its domain canonicalized using the tld_domain_to_lowercase() function. This means the domain will be in lowercase and special characters (including UTF-8 characters) will be transformed to %XX notation.
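To make the "lowercase with %XX notation" idea concrete, here is a minimal sketch. It is not the implementation of tld_domain_to_lowercase(): the helper name sketch_domain_to_lowercase is hypothetical, only ASCII letters are lowercased, and the choice of uppercase hexadecimal digits is an assumption.

```cpp
#include <string>

// Hypothetical helper, NOT part of libtld.
// Assumptions: only ASCII letters are lowercased; every byte that is not
// a letter, a digit, '-' or '.' is written as %XX with uppercase hex digits.
std::string sketch_domain_to_lowercase(const std::string &domain)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string result;
    for(unsigned char c : domain)
    {
        if(c >= 'A' && c <= 'Z')
        {
            // ASCII uppercase letter: convert to lowercase
            result += static_cast<char>(c - 'A' + 'a');
        }
        else if((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
             || c == '-' || c == '.')
        {
            // already acceptable as is
            result += static_cast<char>(c);
        }
        else
        {
            // anything else, including each byte of a UTF-8 sequence
            result += '%';
            result += hex[c >> 4];
            result += hex[c & 0x0F];
        }
    }
    return result;
}
```

With this sketch, a domain written with mixed case comes out entirely in lowercase, and each byte of a multi-byte UTF-8 character becomes its own %XX triplet.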
std::logic_error | This exception is raised if a quoted string or a comment has an unexpected character in it. If you are calling this function directly then you may get this exception. If you called the parse() function of the tld_email_list then this exception should never happen because the previous level captures those errors already (hence the exception.) |
[in] | email | The email to be parsed. |
Definition at line 972 of file tld_emails.cpp.
References tld_email_list::count(), f_canonicalized_email, f_domain, f_email_only, f_fullname, f_original_email, f_username, tld_email_list::quote_string(), tld(), tld_domain_to_lowercase(), TLD_RESULT_INVALID, TLD_RESULT_NULL, and TLD_RESULT_SUCCESS.
Referenced by tld_email_list::parse_all_emails().
tld_result tld_email_list::tld_email_t::parse_group | ( | const std::string & | group | ) |
This function parses a group name, removes comments and double spaces, and replaces all white spaces with character 0x20.
The function also verifies that the input string does not include characters that are considered illegal in a group name such as controls.
Note that the name of the group cannot be empty because when this function is called, the name is expected to precede the colon (:) character.
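The clean-up rules described above can be sketched as follows. This is an illustration only, not the code found in tld_emails.cpp: the helper name sketch_normalize_group is hypothetical, comments are assumed to be parenthesized and possibly nested, a comment is assumed to act as a separator, and backslash escapes inside comments are ignored for brevity.

```cpp
#include <string>

// Hypothetical helper, NOT the libtld implementation.
// Returns false when the group name is invalid; otherwise stores the
// cleaned-up name in 'out'.
bool sketch_normalize_group(const std::string &group, std::string &out)
{
    std::string result;
    int depth = 0;              // nesting level of "(...)" comments
    bool pending_space = false; // a separator was seen but not yet written
    for(unsigned char c : group)
    {
        if(c == '(')
        {
            ++depth;
            pending_space = true; // assumption: a comment separates words
            continue;
        }
        if(depth > 0)
        {
            if(c == ')')
            {
                --depth;
            }
            continue; // drop everything inside a comment
        }
        if(c == ')')
        {
            return false; // closing parenthesis without an opening one
        }
        if(c == ' ' || c == '\t' || c == '\r' || c == '\n')
        {
            pending_space = true; // collapse the whole run of white spaces
            continue;
        }
        if(c < 0x20 || c == 0x7F)
        {
            return false; // control characters are illegal in a group name
        }
        if(pending_space && !result.empty())
        {
            result += ' '; // one 0x20 per run, never at the start
        }
        pending_space = false;
        result += static_cast<char>(c);
    }
    if(depth != 0 || result.empty())
    {
        return false; // unterminated comment or empty name
    }
    out = result;
    return true;
}
```

Trailing white space never reaches the output because a separator is only written when another character follows it.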
std::logic_error | This exception is raised if the function detects an invalid comment. This function is not expected to be called directly so comments should never be wrong since these are checked in the parse_all_emails() function and thus cannot logically be wrong here. |
[in] | group | The name of the group to be parsed. |
Definition at line 1389 of file tld_emails.cpp.
References tld_email_list::count(), TLD_RESULT_INVALID, and TLD_RESULT_SUCCESS.
Referenced by tld_email_list::parse_all_emails().
tld_email_list::tld_email_t::f_canonicalized_email = std::string() |
This field is the canonicalized email address with its display name. However, the email address still does not include the group name. If you want to reconstruct the entire input, groups have to be added manually before each canonicalized email.
The display name is written between double quotes if any of its characters are not atom characters. This ensures the display name can safely be reparsed.
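The quoting rule can be illustrated with a short sketch. The helper names below are hypothetical and this is not the code of tld_email_list::quote_string(). The set of atom characters is taken from the "atext" definition of RFC 5322, and treating a space as a non-atom character (so that a name containing a space gets quoted) is an assumption of this sketch.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true for RFC 5322 "atext" characters
// (letters, digits, and a fixed set of symbols).
bool sketch_is_atom_char(unsigned char c)
{
    if((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
    {
        return true;
    }
    return c != '\0' && std::strchr("!#$%&'*+-/=?^_`{|}~", c) != nullptr;
}

// Hypothetical helper, NOT the libtld implementation: returns the name
// unchanged when it only contains atom characters, otherwise returns it
// between double quotes with '"' and '\' escaped by a backslash.
std::string sketch_quote_display_name(const std::string &name)
{
    bool needs_quotes = false;
    std::string escaped;
    for(unsigned char c : name)
    {
        if(c == '"' || c == '\\')
        {
            escaped += '\\'; // keep the quoted string parseable
        }
        if(!sketch_is_atom_char(c))
        {
            needs_quotes = true;
        }
        escaped += static_cast<char>(c);
    }
    return needs_quotes ? '"' + escaped + '"' : name;
}
```

Under these assumptions a single-word name is returned as is, while a name such as Wilke, Alexis is returned between double quotes because of the comma and the space.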
Note that comments are not included here.
Definition at line 236 of file tld.h.
Referenced by parse().
tld_email_list::tld_email_t::f_domain = std::string() |
This parameter is always defined (except in a group definition) and represents the server handling the mailbox for the email address. The domain is always checked for validity with the tld() function. So if the user typed an address such as:
The email parser returns an error because the domain name m2osw is not valid. It should be m2osw.com or some other similar extension.
All the emails are checked in this way so only valid domains are accepted. Note that this also prevents someone from using an IP address as the destination server. So email addresses such as:
Are not considered valid and should never be used anyway.
Definition at line 234 of file tld.h.
Referenced by parse().
tld_email_list::tld_email_t::f_email_only = std::string() |
This field holds the complete email address. You can use this email address as is to send emails to that user, although it is customary to include the display name when available. The email is canonical in the sense that it has no fluff added (no group name, no comments, no white spaces.)
Note that if the name includes characters that are not part of the atom set of characters, then it will be written between double quotes (i.e. the name of the user could include a space, a comma, etc.)
Similarly, the domain name could include characters that cannot be represented with an atom, although that's unlikely for a valid domain name. In that case, the domain is written between square brackets.
Definition at line 235 of file tld.h.
Referenced by parse().
tld_email_list::tld_email_t::f_fullname = std::string() |
This parameter is called the display name of the email. In most cases it is the full name of the owner of the email address. For example, in the following email address:
The full name is "Wilke, Alexis".
It is common to find empty full names. Your interpretation of the full name as a human is likely to be correct. However, assuming a common format is most certainly incorrect. For example, in "Wilke, Alexis", treating "Alexis" as a first name is only an assumption. In a display name such as "Albert George, Jr." the "Jr." is not the first name. There is no definition of how the display name should be presented.
Definition at line 232 of file tld.h.
Referenced by parse().
tld_email_list::tld_email_t::f_group = std::string() |
The name of the group is most often empty since few people make use of that parameter in lists of emails. However, when a group is defined, one of the "emails" represents the group by itself, meaning that only this field is defined (all others are empty strings.) This is very important to remember because otherwise you will misinterpret an entry. It also means that if you have just one email, but it is defined in a group, then the number of emails returned is 2.
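The following sketch shows how a caller might tell a group entry apart from a real address and why one email in a group yields two entries. The sketch_email structure is a hypothetical stand-in that only mirrors two of the fields documented here, the address uses the reserved example.com domain, and the assumption that an address inside a group also carries the group name comes from the f_group brief description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of two of the documented fields; NOT the libtld
// tld_email_t structure.
struct sketch_email
{
    std::string f_group;
    std::string f_email_only;
};

// An entry that announces a group has only f_group defined.
bool sketch_is_group_marker(const sketch_email &e)
{
    return !e.f_group.empty() && e.f_email_only.empty();
}

// Count the entries that carry an actual address, skipping group markers.
std::size_t sketch_count_addresses(const std::vector<sketch_email> &entries)
{
    std::size_t count = 0;
    for(const auto &e : entries)
    {
        if(!sketch_is_group_marker(e))
        {
            ++count;
        }
    }
    return count;
}
```

For a list made of one group marker followed by one address in that group, the container holds two entries while sketch_count_addresses() reports a single address.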
Definition at line 230 of file tld.h.
Referenced by tld_email_list::parse_all_emails().
tld_email_list::tld_email_t::f_original_email = std::string() |
The original email field has the complete email as it appeared in the source. This means this field includes the comments and additional spaces. It can be used to reconstruct the original string except for the possible trimming that was done before and after the email (the parser removes the leading and ending white spaces, new lines, and carriage returns.)
In general, this is only used for display so that users see the email the way they expect to see it.
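The trimming mentioned above can be sketched as follows. The helper name sketch_trim is hypothetical and this is not the parser's own code. Treating the tab character as white space is an assumption.

```cpp
#include <string>

// Hypothetical helper, NOT the libtld implementation: removes leading and
// trailing spaces, tabs, carriage returns, and new lines, and leaves
// everything in between (comments, inner spaces) untouched.
std::string sketch_trim(const std::string &s)
{
    const char *white_spaces = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(white_spaces);
    if(first == std::string::npos)
    {
        return std::string(); // nothing but white spaces
    }
    const std::string::size_type last = s.find_last_not_of(white_spaces);
    return s.substr(first, last - first + 1);
}
```

Only the ends of the string are modified, which is why comments and inner spacing remain available for display.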
Definition at line 231 of file tld.h.
Referenced by parse().
tld_email_list::tld_email_t::f_username = std::string() |
This parameter is always defined (except in a group definition) and represents the user name of the email address. This is the user as defined on the destination machine. Under a Unix system it is the user as listed in /etc/passwd.
The character set limitations of the target machine are not known when we parse an email. It is expected that the destination generates an error if the character set is not supported. On our end, the final result is always UTF-8.
Definition at line 233 of file tld.h.
Referenced by parse().
This document is part of the Snap! Websites Project.
Copyright by Made to Order Software Corp.