libtld 2.0.14
A library to determine the Top-Level Domain name of any Internet URI.
The public header of the libtld library.
#include <string>
#include <vector>
#include <stdexcept>
Classes
Kind | Name | Description |
---|---|---|
class | invalid_domain | Exception thrown when querying for data of an invalid domain. |
struct | tld_email | Parts of one email. |
class | tld_email_list | The C++ side of the email list implementation. |
struct | tld_email_list::tld_email_t | Parts of one email. |
struct | tld_enumeration_state | |
struct | tld_info | Set of information returned by the tld() function. |
class | tld_object | Class used to ease the use of the tld() function in C++. |
struct | tld_tag_definition | |
Macros
Macro | Value | Description |
---|---|---|
LIBTLD_EXPORT | | The export API used by MS-Windows DLLs. |
LIBTLD_VERSION | "2.0.14" | The version of the library as a string. |
LIBTLD_VERSION_MAJOR | 2 | The major version as a number. |
LIBTLD_VERSION_MINOR | 0 | The minor version as a number. |
LIBTLD_VERSION_PATCH | 14 | The patch version as a number. |
VALID_URI_ASCII_ONLY | 0x0001 | Whether to check that the URI only includes ASCII. |
VALID_URI_NO_SPACES | 0x0002 | Whether to check that the URI does not include any spaces. |
Functions
Return type | Function | Description |
---|---|---|
LIBTLD_EXPORT enum tld_result | tld (const char *uri, struct tld_info *info) | Get information about the TLD for the specified URI. |
LIBTLD_EXPORT enum tld_result | tld_check_uri (const char *uri, struct tld_info *info, const char *protocols, int flags) | Check that a URI is valid. |
LIBTLD_EXPORT void | tld_clear_info (struct tld_info *info) | Clear the info structure. |
LIBTLD_EXPORT char * | tld_domain_to_lowercase (const char *domain) | Transform a domain with a TLD to lowercase before processing. |
LIBTLD_EXPORT struct tld_email_list * | tld_email_alloc () | Allocate a list of emails object. |
LIBTLD_EXPORT int | tld_email_count (struct tld_email_list *list) | Return the number of emails found after a parse. |
LIBTLD_EXPORT void | tld_email_free (struct tld_email_list *list) | Free the list of emails. |
LIBTLD_EXPORT int | tld_email_next (struct tld_email_list *list, struct tld_email *e) | Retrieve the next email. |
LIBTLD_EXPORT enum tld_result | tld_email_parse (struct tld_email_list *list, const char *emails, int flags) | Parse a list of emails in the email list object. |
LIBTLD_EXPORT void | tld_email_rewind (struct tld_email_list *list) | Rewind the reading of the emails. |
LIBTLD_EXPORT void | tld_free_tlds () | Clear the allocated TLD file. |
LIBTLD_EXPORT enum tld_result | tld_get_tag (struct tld_info *info, int tag_idx, struct tld_tag_definition *tag) | |
LIBTLD_EXPORT const struct tld_file * | tld_get_tlds () | Return a pointer to the current list of TLDs. |
LIBTLD_EXPORT enum tld_result | tld_load_tlds (const char *filename, int fallback) | Load a TLDs file as the file to be used by the tld() function. |
LIBTLD_EXPORT enum tld_result | tld_next_tld (struct tld_enumeration_state *state, struct tld_info *info) | Read the next TLD and return its info. |
LIBTLD_EXPORT const char * | tld_status_to_string (enum tld_status status) | Transform the status to a string. |
LIBTLD_EXPORT int | tld_tag_count (struct tld_info *info) | |
LIBTLD_EXPORT const char * | tld_version () | Return the version of the library. |
LIBTLD_EXPORT enum tld_category | tld_word_to_category (const char *word, int n) | This is for backward compatibility. |
This file declares most of the functions, objects, structures, etc. that the libtld library makes publicly available. The newer version also offers a compiler and file headers to handle .ini files and compile them into a .tld file.
Definition in file tld.h.
#define LIBTLD_EXPORT
This definition is used to mark functions and classes as exported from the library. This allows other programs to automatically use functions defined in the library.
The LIBTLD_EXPORT macro may be set to dllexport or dllimport, depending on whether you are compiling the library or intend to link against it.
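A dllexport/dllimport switch like this is commonly written as a small preprocessor block. The sketch below shows that general pattern with hypothetical names (MYLIB_EXPORT, MYLIB_BUILDING, mylib_answer). libtld's actual definition of LIBTLD_EXPORT may differ.

```cpp
#include <cassert>

// Hypothetical names. Defining MYLIB_BUILDING here pretends that this file is
// part of the library itself, so its symbols are exported rather than imported.
#define MYLIB_BUILDING

#if defined(_WIN32)
#  if defined(MYLIB_BUILDING)
#    define MYLIB_EXPORT __declspec(dllexport)   // compiling the DLL
#  else
#    define MYLIB_EXPORT __declspec(dllimport)   // linking against the DLL
#  endif
#else
#  define MYLIB_EXPORT                            // no decoration needed here
#endif

// Declaration as it would appear in the public header.
extern "C" MYLIB_EXPORT int mylib_answer(void);

// Definition as it would appear in the library sources.
int mylib_answer(void)
{
    return 42;
}
```

A client program would include the same header without defining MYLIB_BUILDING. On MS-Windows its declarations would then become dllimport.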
#define LIBTLD_VERSION "2.0.14"
This definition represents the version of the libtld header you are compiling against. You can compare it to the value returned by the tld_version() function to make sure that everything is compatible. If the versions differ, the tld_info structure may have changed.
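The comparison described above amounts to a string equality check. The sketch below uses a hypothetical helper, same_libtld_version(), which is not part of libtld. With the library you would call it as same_libtld_version(LIBTLD_VERSION, tld_version()).

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper, not part of libtld: returns true when the version the
// program was compiled against matches the version reported at run time.
bool same_libtld_version(const char *compiled, const char *runtime)
{
    if (compiled == nullptr || runtime == nullptr) {
        return false;
    }
    if (std::strcmp(compiled, runtime) != 0) {
        std::fprintf(stderr, "libtld version mismatch: compiled against %s, running %s\n",
                     compiled, runtime);
        return false;
    }
    return true;
}
```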
#define LIBTLD_VERSION_MAJOR 2
#define LIBTLD_VERSION_MINOR 0
#define LIBTLD_VERSION_PATCH 14
#define VALID_URI_ASCII_ONLY 0x0001
By default the tld_check_uri() function accepts any extended character (i.e. bytes 0x80 and above). This flag can be used to refuse such characters.
#define VALID_URI_NO_SPACES 0x0002
By default the tld_check_uri() function accepts spaces as valid characters in a URI, whether they are explicit (" ") or written as "+" or "%20". This flag can be used to refuse all spaces, which means "+" and "%20" are refused as well.
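The sketch below makes the two flags concrete by applying only the character rules described above to a string. It is a hypothetical helper: uri_chars_pass_flags() is not a libtld function. It also ignores everything else tld_check_uri() verifies, such as the protocol and the TLD. With the real library you would simply pass VALID_URI_ASCII_ONLY | VALID_URI_NO_SPACES as the flags argument.

```cpp
#include <cassert>

// Flag values as documented in tld.h. They are guarded so the sketch still
// compiles if the real header is also included.
#ifndef VALID_URI_ASCII_ONLY
#define VALID_URI_ASCII_ONLY 0x0001
#endif
#ifndef VALID_URI_NO_SPACES
#define VALID_URI_NO_SPACES 0x0002
#endif

// Hypothetical illustration of the documented character rules only;
// this is not libtld's implementation.
bool uri_chars_pass_flags(const char *uri, int flags)
{
    for (const char *p = uri; *p != '\0'; ++p) {
        unsigned char const c = static_cast<unsigned char>(*p);
        if ((flags & VALID_URI_ASCII_ONLY) != 0 && c >= 0x80) {
            return false;  // extended (non-ASCII) byte
        }
        if ((flags & VALID_URI_NO_SPACES) != 0) {
            if (c == ' ' || c == '+') {
                return false;  // explicit space or "+"
            }
            if (c == '%' && p[1] == '2' && p[2] == '0') {
                return false;  // "%20"
            }
        }
    }
    return true;
}
```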
enum tld_category
enum tld_email_field_type
enum tld_result
Enumerator | Description |
---|---|
TLD_RESULT_SUCCESS | Success! The TLD of the specified URI is valid. This result is returned when the URI includes a valid TLD. The function also saves valid results in the tld_info structure. You can accept this URI as valid. |
TLD_RESULT_INVALID | The TLD was found, but it is marked as invalid. This result represents a TLD that is not valid as is for a URI, but that is defined in the TLD data. The function saves further information in the tld_info structure. There you can check the category, status, and other parameters to determine what the TLD really represents. It may be possible to use such a TLD, although as far as web addresses are concerned, these are not considered valid. As mentioned in the statuses, some of them may mean that the TLD can be replaced by another one that works (e.g. for a country whose name changed). |
TLD_RESULT_NULL | The input URI is empty. The tld() function returns this value whenever the input URI pointer is NULL or points to the empty string (""). Obviously, no TLD is found in this case. |
TLD_RESULT_NO_TLD | The input URI has no TLD defined. This error is returned whenever the URI does not include at least one period (.). Local URIs are considered valid and generally do not include a period (e.g. "localhost", "my-computer", "johns-computer", etc.). We expect that the tld() function would not be called with such URIs. A valid Internet URI must include a TLD. |
TLD_RESULT_BAD_URI | The URI includes characters that are not accepted by the function. This value is returned if a character or a sequence of characters is found to be incompatible. At this time, tld() returns this error if two periods (.) are found one after another. Detection will be extended over time to catch invalid characters (anything outside of [-a-zA-Z0-9.%]). Note that the URI should not start or end with a period. This error will also be returned (at some point) when the function detects such problems. |
TLD_RESULT_NOT_FOUND | The URI has a TLD that could not be determined. The TLD of the URI was searched in the TLD data and could not be found there. This means the TLD is not a valid Internet TLD. |
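A caller typically branches on the tld_result value. The sketch below maps each result in the table above to a short explanation. So that it compiles without libtld, it declares a stand-in enum with the same enumerator names inside a hypothetical sketch namespace, and it does not rely on the real numeric values. Accepting only TLD_RESULT_SUCCESS is one possible policy, not a rule imposed by the library.

```cpp
#include <cassert>

namespace sketch {

// Stand-in copy of the enumerator names documented for enum tld_result.
// The numeric values of the real enum are not relied upon.
enum tld_result {
    TLD_RESULT_SUCCESS,
    TLD_RESULT_INVALID,
    TLD_RESULT_NULL,
    TLD_RESULT_NO_TLD,
    TLD_RESULT_BAD_URI,
    TLD_RESULT_NOT_FOUND
};

// Short explanation of each result, following the descriptions above.
const char *explain(tld_result r)
{
    switch (r) {
    case TLD_RESULT_SUCCESS:   return "valid TLD, the URI can be accepted";
    case TLD_RESULT_INVALID:   return "known TLD marked invalid, inspect category and status";
    case TLD_RESULT_NULL:      return "NULL or empty URI";
    case TLD_RESULT_NO_TLD:    return "no period, so no TLD (e.g. localhost)";
    case TLD_RESULT_BAD_URI:   return "unsupported characters, e.g. two periods in a row";
    case TLD_RESULT_NOT_FOUND: return "TLD not present in the TLD data";
    }
    return "unknown result";
}

// One possible policy: only accept URIs whose TLD is fully valid.
bool accept_as_internet_domain(tld_result r)
{
    return r == TLD_RESULT_SUCCESS;
}

} // namespace sketch
```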
enum tld_status
extern LIBTLD_EXPORT enum tld_result tld (const char *uri, struct tld_info *info)
The tld() function searches for the specified URI in the TLD descriptions. The results are saved in the info parameter for later interpretation (i.e. extraction of the domain name, sub-domains and the exact TLD).
The function extracts the last extension of the URI. For example, for a URI ending in ".co.uk", the function first extracts ".uk". With that extension, it searches the list of official TLDs. If it is not found, an error is returned and the info parameter is set to unknown.
When found, the function checks whether that TLD (".uk" in our previous example) accepts sub-TLDs (second, third, fourth and fifth level TLDs). If so, it extracts the next TLD entry (the ".co" in our previous example) and searches for that second-level TLD. If found, it tries again with the third level, and so on until all the possible TLDs are exhausted. At that point, it returns the last TLD it found. In the case of ".co.uk", it returns the information of the ".co" second-level TLD.
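The walk described above can be sketched as follows. This is a simplified illustration, not libtld's code. The three-entry table is hypothetical; the real data comes from the loaded .tld file. Unlike tld(), the sketch also folds the "no period" and "unknown TLD" cases (TLD_RESULT_NO_TLD and TLD_RESULT_NOT_FOUND) into a single -1 return value.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical miniature TLD table standing in for the compiled TLD data.
static const char *const known_tlds[] = { ".uk", ".co.uk", ".com" };

static bool is_known_tld(const char *suffix)
{
    for (const char *tld : known_tlds) {
        if (std::strcmp(suffix, tld) == 0) {
            return true;
        }
    }
    return false;
}

// Walks the domain from right to left, one period at a time, and returns the
// offset of the period starting the deepest known TLD. Returns -1 when the
// last extension is unknown or the domain has no period at all.
long find_tld_offset(const char *domain)
{
    long found = -1;
    std::size_t i = std::strlen(domain);
    while (i > 0) {
        --i;
        if (domain[i] != '.') {
            continue;
        }
        if (!is_known_tld(domain + i)) {
            break;                      // next level is not a TLD: stop here
        }
        found = static_cast<long>(i);   // remember the deepest TLD so far
    }
    return found;
}
```

With this table, find_tld_offset("www.example.co.uk") stops at ".co.uk", so everything before that offset is the domain name and its sub-domains.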
All the comparisons are done in lowercase. This is because all the data is saved in lowercase and we expect the input of the tld() function to already be in lowercase. If your input may contain uppercase characters, make sure to call the tld_domain_to_lowercase() function first. That function makes a lowercase duplicate of your domain name. It understands %XX sequences (since the URI is expected to still be encoded) and properly handles UTF-8 characters in order to determine the lowercase characters of the input. Note that tld_domain_to_lowercase() returns a newly allocated pointer that you are responsible for freeing once you are done with it.
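The info structure includes the f_category, f_status, f_country, f_tld, and f_offset fields, which describe the TLD that was found.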
Assuming that you always get valid URIs, you should get one of these results: TLD_RESULT_SUCCESS or TLD_RESULT_INVALID.
Other results are returned when the input string is considered invalid.
[in] | uri | The URI to be checked. |
[out] | info | A pointer to a tld_info structure to save the result. |
Definition at line 1113 of file tld.cpp.
References tld_info::f_offset, tld_info::f_status, tld_info::f_tld, g_tld_file, search(), tld(), tld_clear_info(), tld_load_tlds_if_not_loaded(), TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, TLD_STATUS_EXCEPTION, and TLD_STATUS_VALID.
Referenced by tld_email_list::tld_email_t::parse(), PHP_FUNCTION(), search(), tld_object::set_domain(), tld(), tld_check_uri(), tld_file_to_json(), and tld_next_tld().
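extern LIBTLD_EXPORT enum tld_result tld_check_uri (const char *uri, struct tld_info *info, const char *protocols, int flags)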
This function very quickly parses a URI to determine whether it is valid.
Note that it does not (currently) support local naming conventions which means that a host such as "localhost" will fail the test.
The protocols parameter can be set to a comma-separated list of protocol names that are considered valid. For example, for the HTTP protocol one could use "http,https". To accept any protocol, use an asterisk: "*". Each protocol name may only contain letters, digits, or underscores ([0-9A-Za-z_]+) and must be at least one character long.
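As a rough illustration of how such a protocols list can be matched, here is a hypothetical helper, protocol_accepted(), that is not part of libtld. Two behaviors are assumptions of this sketch: it compares names case-insensitively, as RFC 3986 does for schemes, and it treats only a protocols string of exactly "*" as a wildcard. libtld's own matching may differ on both points.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if name matches [0-9A-Za-z_]+ (at least one character).
static bool valid_protocol_name(const std::string &name)
{
    if (name.empty()) {
        return false;
    }
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_') {
            return false;
        }
    }
    return true;
}

// Case-insensitive equality (an assumption of this sketch).
static bool same_ci(const std::string &a, const std::string &b)
{
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i]))
                != std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// True if protocol appears in the comma-separated protocols list,
// or if protocols is "*" and protocol is a syntactically valid name.
bool protocol_accepted(const std::string &protocols, const std::string &protocol)
{
    if (!valid_protocol_name(protocol)) {
        return false;
    }
    if (protocols == "*") {
        return true;
    }
    std::size_t start = 0;
    while (start <= protocols.size()) {
        std::size_t comma = protocols.find(',', start);
        if (comma == std::string::npos) {
            comma = protocols.size();
        }
        if (same_ci(protocols.substr(start, comma - start), protocol)) {
            return true;
        }
        start = comma + 1;
    }
    return false;
}
```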
The flags parameter can be set to 0, VALID_URI_ASCII_ONLY, VALID_URI_NO_SPACES, or both flags OR'ed together.
The return value is generally TLD_RESULT_BAD_URI when an invalid character is found in the URI string. The TLD_RESULT_NULL is returned if the URI is a NULL pointer or an empty string. Other results may be returned by the tld() function. If a result other than TLD_RESULT_SUCCESS is returned then the info structure may or may not be updated.
[in] | uri | The URI whose validity is being checked. |
[out] | info | The resulting information about the URI domain and TLD. |
[in] | protocols | List of comma separated protocols accepted. |
[in] | flags | A set of flags to tell the function what is valid/invalid. |
Definition at line 1311 of file tld.cpp.
References tld_info::f_offset, tld_info::f_tld, tld(), tld_clear_info(), TLD_RESULT_BAD_URI, TLD_RESULT_NULL, VALID_URI_ASCII_ONLY, and VALID_URI_NO_SPACES.
Referenced by check_uri(), and PHP_FUNCTION().
extern LIBTLD_EXPORT void tld_clear_info (struct tld_info *info)
This function initializes the info structure with defaults. The different TLD functions that make use of this structure will generally call this function first to represent a failure case.
Note that by default the category and status are set to undefined (TLD_CATEGORY_UNDEFINED and TLD_STATUS_UNDEFINED). Also the country and tld pointer are set to NULL and thus they cannot be used as strings.
[out] | info | The tld_info structure to clear. |
Definition at line 705 of file tld.cpp.
References tld_info::f_category, tld_info::f_country, tld_info::f_offset, tld_info::f_status, tld_info::f_tld, TLD_CATEGORY_UNDEFINED, and TLD_STATUS_UNDEFINED.
Referenced by tld(), tld_check_uri(), and tld_next_tld().
extern LIBTLD_EXPORT char * tld_domain_to_lowercase (const char *domain)
This function will transform the input domain name to lowercase. You should call this function before you call the tld() function to make sure that the input data is in lowercase.
This function interprets %XX input sequences and transforms them into characters. It further converts UTF-8 characters to wide characters to be able to determine their lowercase version.
In C++, you may wrap the returned pointer in a std::unique_ptr<> whose deleter calls free(), so you do not have to call free() by hand (especially if your code may throw exceptions).
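The sketch below illustrates that ownership pattern by wrapping a malloc()'ed string in a std::unique_ptr whose deleter calls std::free(). So that it compiles without libtld, it uses a hypothetical stand-in, sketch_domain_to_lowercase(), which only lowercases ASCII letters. The real tld_domain_to_lowercase() also decodes %XX sequences and handles UTF-8. With libtld you would hand the result of tld_domain_to_lowercase() to the same smart pointer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for tld_domain_to_lowercase(): returns a malloc()'ed
// copy with ASCII A-Z lowercased (no %XX decoding, no UTF-8 handling).
char *sketch_domain_to_lowercase(const char *domain)
{
    if (domain == nullptr) {
        return nullptr;
    }
    std::size_t const len = std::strlen(domain);
    char *result = static_cast<char *>(std::malloc(len + 1));
    if (result == nullptr) {
        return nullptr;
    }
    for (std::size_t i = 0; i <= len; ++i) {  // <= also copies the '\0'
        char const c = domain[i];
        result[i] = (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }
    return result;
}

// Deleter that releases memory obtained from malloc().
struct free_deleter {
    void operator()(char *p) const noexcept { std::free(p); }
};

using free_ptr = std::unique_ptr<char, free_deleter>;

std::string lowercase_copy(const char *domain)
{
    free_ptr lower(sketch_domain_to_lowercase(domain));
    if (!lower) {
        return std::string();
    }
    return std::string(lower.get());  // even if this throws, lower is freed
}
```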
[in] | domain | The input domain to convert to lowercase. |
Definition at line 489 of file tld_domain_to_lowercase.c.
References tld_mbtowc(), and tld_wctomb().
Referenced by tld_email_list::tld_email_t::parse().
extern LIBTLD_EXPORT struct tld_email_list * tld_email_alloc ()
This function allocates a list of emails object that can then be used to parse a string representing a list of emails and retrieve those emails with the use of the tld_email_next() function.
Definition at line 1480 of file tld_emails.cpp.
References tld_email_list::tld_email_list().
Referenced by PHP_FUNCTION().
extern LIBTLD_EXPORT int tld_email_count (struct tld_email_list *list)
This function returns the number of emails that were found in the list of emails passed to the tld_email_parse() function.
[in] | list | The email list object. |
Definition at line 1525 of file tld_emails.cpp.
References list().
extern LIBTLD_EXPORT void tld_email_free (struct tld_email_list *list)
This function frees the list of emails allocated by tld_email_alloc(). Afterward, the list pointer is no longer valid.
[in] | list | The list to be freed. |
Definition at line 1493 of file tld_emails.cpp.
References list().
Referenced by PHP_FUNCTION().
extern LIBTLD_EXPORT int tld_email_next (struct tld_email_list *list, struct tld_email *e)
This function retrieves the next email found when parsing the emails passed to the tld_email_parse() function. The function returns 1 when another email is available. It returns 0 when no more emails exist, in which case the e parameter is not set. The function can be called any number of times after it returned zero (0).
[in] | list | The list from which the email is to be read. |
[out] | e | The buffer where the email is to be written. |
Definition at line 1559 of file tld_emails.cpp.
References list().
Referenced by PHP_FUNCTION().
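extern LIBTLD_EXPORT enum tld_result tld_email_parse (struct tld_email_list *list, const char *emails, int flags)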
This function parses the emails listed in the emails parameter and saves the results, as a list of emails, in the list object.
[in] | list | The list of emails object. |
[in] | emails | The list of emails to be parsed. |
[in] | flags | The flags are used to change the behavior of the parser. |
Definition at line 1511 of file tld_emails.cpp.
References list(), and tld_email_list::parse().
Referenced by PHP_FUNCTION().
extern LIBTLD_EXPORT void tld_email_rewind (struct tld_email_list *list)
This function resets the position to the start of the list. The next call to the tld_email_next() function will return the first email again.
[in] | list | The email list object to reset. |
Definition at line 1538 of file tld_emails.cpp.
References list().
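With libtld, a typical sequence is tld_email_alloc(), then tld_email_parse(), then a loop on tld_email_next() until it returns 0, and finally tld_email_free(). The read-loop contract of tld_email_next() and tld_email_rewind() can be mimicked without the library. The class below is a hypothetical mock, not libtld's tld_email_list. It returns 1 while emails remain, returns 0 on every later call without touching the output, and starts over after rewind().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock of the documented cursor contract; not libtld's class.
class email_cursor_mock {
public:
    explicit email_cursor_mock(std::vector<std::string> emails)
        : f_emails(std::move(emails)) {}

    int count() const { return static_cast<int>(f_emails.size()); }

    // Returns 1 and sets out while emails remain; afterward returns 0
    // on every call and leaves out untouched.
    int next(std::string &out)
    {
        if (f_pos >= f_emails.size()) {
            return 0;
        }
        out = f_emails[f_pos++];
        return 1;
    }

    // The next call to next() returns the first email again.
    void rewind() { f_pos = 0; }

private:
    std::vector<std::string> f_emails;
    std::size_t f_pos = 0;
};

// Typical read loop: count how many emails next() hands out.
int drain(email_cursor_mock &list)
{
    int n = 0;
    std::string e;
    while (list.next(e) == 1) {
        ++n;
    }
    return n;
}
```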
extern LIBTLD_EXPORT void tld_free_tlds ()
Once you are done with the library and if you want to make sure you do not have a memory leak, you can use this function to delete the TLD file which resides in memory.
You can still use the library afterward, either by calling the tld_load_tlds() function or by calling any function that calls tld(). In the latter case, the default .tld file (or the fallback) is loaded again. However, you cannot use tld_info or similar structures obtained before this call, because some of their pointers may no longer be valid: they point directly into the TLD file data.
Definition at line 828 of file tld.cpp.
References g_tld_file.
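extern LIBTLD_EXPORT enum tld_result tld_get_tag (struct tld_info *info, int tag_idx, struct tld_tag_definition *tag)
extern LIBTLD_EXPORT const struct tld_file * tld_get_tlds ()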
This function returns the list of TLDs that were loaded by the tld_load_tlds() function. If the TLDs were not yet loaded, then the function returns a nullptr.
The structure must be considered 100% read-only. It is possible that the TLDs were loaded from the tld_data.c buffer which means it is read-only data from the library.
Definition at line 809 of file tld.cpp.
References g_tld_file.
extern LIBTLD_EXPORT enum tld_result tld_load_tlds (const char *filename, int fallback)
This function loads the specified filename as the current set of data to be used by the tld() function.
You generally do not need to call this function: it is automatically called with a null pointer, which loads the default file.
The fallback flag can be set to true (the default) to fall back to the static version of the data compiled into the library. The fallback is used if the specified or default external file cannot be loaded.
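The file-then-fallback logic can be sketched as below. This is a hypothetical illustration, not libtld's loader. The names load_with_fallback, tld_source and builtin_data are invented here, whereas libtld reports its outcome through enum tld_result. Also, the real function substitutes its default .tld path when filename is NULL. This sketch has no such path, so a null filename goes straight to the fallback.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <vector>

// Which data source ended up being used (hypothetical).
enum class tld_source { external_file, builtin, none };

// Hypothetical stand-in for the data compiled into the library.
static const char builtin_data[] = "builtin";

tld_source load_with_fallback(const char *filename, bool fallback, std::vector<char> &data)
{
    if (filename != nullptr) {
        std::ifstream in(filename, std::ios::binary);
        if (in) {
            data.assign(std::istreambuf_iterator<char>(in),
                        std::istreambuf_iterator<char>());
            return tld_source::external_file;
        }
    }
    // The external file could not be used: fall back to the built-in copy.
    if (fallback) {
        data.assign(builtin_data, builtin_data + sizeof builtin_data - 1);
        return tld_source::builtin;
    }
    data.clear();
    return tld_source::none;
}
```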
[in] | filename | The file to load or NULL to load the default. |
[in] | fallback | Whether to fallback to the internal data if the input file cannot be loaded. |
Definition at line 744 of file tld.cpp.
References g_tld_file, tld_get_static_tlds_buffer_size(), TLD_RESULT_INVALID, TLD_RESULT_NOT_FOUND, and TLD_RESULT_SUCCESS.
Referenced by tld_load_tlds_if_not_loaded().
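extern LIBTLD_EXPORT enum tld_result tld_next_tld (struct tld_enumeration_state *state, struct tld_info *info)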
This function is used to read all the TLDs one at a time.
To read the first TLD, make sure the state structure is cleared before the first time you call the tld_next_tld() function.
The function may return various values, and it is important to check them to know the state of the info parameter. In particular, TLD_RESULT_INVALID means that the returned domain name is considered to exist but is currently not a valid domain name (e.g. it could be a deprecated or unused intermediate).
When TLD_RESULT_SUCCESS is returned, info is considered to be a valid domain name.
[in] | state | The current state. Reset it to get the very first domain name. |
[out] | info | The structure where the information of the next domain name is saved. |
Definition at line 888 of file tld.cpp.
References tld_description::f_end_offset, tld_info::f_offset, tld_info::f_status, tld_info::f_tld, g_tld_file, tld(), tld_clear_info(), tld_load_tlds_if_not_loaded(), TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, and TLD_STATUS_VALID.
extern LIBTLD_EXPORT const char * tld_status_to_string (enum tld_status status)
The status returned in a tld_info can be converted to a string using this function. This is useful to print out an error message.
[in] | status | The status to convert to a string. |
Definition at line 49 of file tld_strings.c.
References TLD_STATUS_DEPRECATED, TLD_STATUS_EXCEPTION, TLD_STATUS_INFRASTRUCTURE, TLD_STATUS_PROPOSED, TLD_STATUS_RESERVED, TLD_STATUS_UNDEFINED, TLD_STATUS_UNUSED, and TLD_STATUS_VALID.
Referenced by tld_file_to_json().
extern LIBTLD_EXPORT int tld_tag_count (struct tld_info *info)
extern LIBTLD_EXPORT const char * tld_version ()
This function returns the version of this library. The version is defined with three numbers: <major>.<minor>.<patch>.
You should be able to use the libversion to compare different libtld versions and know which one is the newest version.
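A plain string comparison is not enough to order versions: strcmp() would rank "2.0.14" below "2.0.9" because '1' sorts before '9'. The sketch below parses the three numbers and compares them numerically. It is a hypothetical helper, not part of libtld, and it only handles the plain <major>.<minor>.<patch> form.

```cpp
#include <cassert>
#include <cstdio>
#include <tuple>

// Hypothetical helpers, not part of libtld.
struct version_triplet {
    int f_major = 0;
    int f_minor = 0;
    int f_patch = 0;
};

// Parses "<major>.<minor>.<patch>"; returns false if the string does not match.
bool parse_version(const char *s, version_triplet &v)
{
    return s != nullptr
        && std::sscanf(s, "%d.%d.%d", &v.f_major, &v.f_minor, &v.f_patch) == 3;
}

// Returns -1, 0 or 1 when a is older than, equal to, or newer than b.
int compare_versions(const version_triplet &a, const version_triplet &b)
{
    auto const ta = std::tie(a.f_major, a.f_minor, a.f_patch);
    auto const tb = std::tie(b.f_major, b.f_minor, b.f_patch);
    if (ta < tb) {
        return -1;
    }
    if (tb < ta) {
        return 1;
    }
    return 0;
}
```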
Definition at line 1646 of file tld.cpp.
References LIBTLD_VERSION.
Referenced by main().
extern LIBTLD_EXPORT enum tld_category tld_word_to_category (const char *word, int n)
A single category is often not enough, because one TLD may belong to several groups at once (a TLD that is at the same time a group, a country, a language, and an entrepreneurial TLD can very well exist!).
The idea is to be backward compatible for anyone who was using the old category value. This function converts the specified word into a category. The word is not expected to be null-terminated, hence the n parameter specifying its length.
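Comparing a word that is not null-terminated needs care. A bare strncmp(word, keyword, n) would wrongly accept "count" as "country", because it only checks the first n bytes. The sketch below checks the length first. word_equals() is a hypothetical helper, not the code used by tld_word_to_category(), and the keywords used with it here are only illustrative.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of libtld: compares a length-delimited word
// (not NUL-terminated) with a NUL-terminated keyword.
bool word_equals(const char *word, int n, const char *keyword)
{
    if (word == nullptr || keyword == nullptr || n < 0) {
        return false;
    }
    std::size_t const len = std::strlen(keyword);
    // Checking the length first prevents a prefix from matching a longer keyword.
    return len == static_cast<std::size_t>(n) && std::memcmp(word, keyword, len) == 0;
}
```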
[in] | word | The word to convert. |
[in] | n | The exact number of characters (bytes) in the word. |
Definition at line 103 of file tld_strings.c.
References TLD_CATEGORY_BRAND, TLD_CATEGORY_CONTACT, TLD_CATEGORY_COUNTRY, TLD_CATEGORY_ENTREPRENEURIAL, TLD_CATEGORY_GROUP, TLD_CATEGORY_INTERNATIONAL, TLD_CATEGORY_LANGUAGE, TLD_CATEGORY_LOCATION, TLD_CATEGORY_PROFESSIONALS, TLD_CATEGORY_REGION, TLD_CATEGORY_TECHNICAL, and TLD_CATEGORY_UNDEFINED.
This document is part of the Snap! Websites Project.
Copyright by Made to Order Software Corp.