libtld 2.0.14
A library to determine the Top-Level Domain name of any Internet URI.
tld.h File Reference

The public header of the libtld library.

#include <string>
#include <vector>
#include <stdexcept>

Classes

class  invalid_domain
 Exception thrown when querying for data of an invalid domain.
 
struct  tld_email
 Parts of one email.
 
class  tld_email_list
 The C++ side of the email list implementation.
 
struct  tld_email_list::tld_email_t
 Parts of one email.
 
struct  tld_enumeration_state
 
struct  tld_info
 Set of information returned by the tld() function.
 
class  tld_object
 Class used to ease the use of the tld() function in C++.
 
struct  tld_tag_definition
 

Macros

#define LIBTLD_EXPORT
 The export API used by MS-Windows DLLs.
 
#define LIBTLD_VERSION   "2.0.14"
 The version of the library as a string.
 
#define LIBTLD_VERSION_MAJOR   2
 The major version as a number.
 
#define LIBTLD_VERSION_MINOR   0
 The minor version as a number.
 
#define LIBTLD_VERSION_PATCH   14
 The patch version as a number.
 
#define VALID_URI_ASCII_ONLY   0x0001
 Whether to check that the URI only includes ASCII.
 
#define VALID_URI_NO_SPACES   0x0002
 Whether to check that the URI does not include any spaces.
 

Enumerations

enum  tld_category {
  TLD_CATEGORY_INTERNATIONAL , TLD_CATEGORY_PROFESSIONALS , TLD_CATEGORY_LANGUAGE , TLD_CATEGORY_GROUP ,
  TLD_CATEGORY_REGION , TLD_CATEGORY_TECHNICAL , TLD_CATEGORY_COUNTRY , TLD_CATEGORY_LOCATION ,
  TLD_CATEGORY_ENTREPRENEURIAL , TLD_CATEGORY_BRAND , TLD_CATEGORY_CONTACT , TLD_CATEGORY_UNDEFINED
}
 
enum  tld_email_field_type {
  TLD_EMAIL_FIELD_TYPE_INVALID = -1 , TLD_EMAIL_FIELD_TYPE_UNKNOWN , TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST , TLD_EMAIL_FIELD_TYPE_MAILBOX ,
  TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST , TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT
}
 
enum  tld_result {
  TLD_RESULT_SUCCESS , TLD_RESULT_INVALID , TLD_RESULT_NULL , TLD_RESULT_NO_TLD ,
  TLD_RESULT_BAD_URI , TLD_RESULT_NOT_FOUND
}
 
enum  tld_status {
  TLD_STATUS_VALID , TLD_STATUS_PROPOSED , TLD_STATUS_DEPRECATED , TLD_STATUS_UNUSED ,
  TLD_STATUS_RESERVED , TLD_STATUS_INFRASTRUCTURE , TLD_STATUS_EXAMPLE , TLD_STATUS_UNDEFINED ,
  TLD_STATUS_EXCEPTION = 100
}
 

Functions

LIBTLD_EXPORT enum tld_result tld (const char *uri, struct tld_info *info)
 Get information about the TLD for the specified URI.
 
LIBTLD_EXPORT enum tld_result tld_check_uri (const char *uri, struct tld_info *info, const char *protocols, int flags)
 Check that a URI is valid.
 
LIBTLD_EXPORT void tld_clear_info (struct tld_info *info)
 Clear the info structure.
 
LIBTLD_EXPORT char * tld_domain_to_lowercase (const char *domain)
 Transform a domain with a TLD to lowercase before processing.
 
LIBTLD_EXPORT struct tld_email_list * tld_email_alloc ()
 Allocate a list of emails object.
 
LIBTLD_EXPORT int tld_email_count (struct tld_email_list *list)
 Return the number of emails found after a parse.
 
LIBTLD_EXPORT void tld_email_free (struct tld_email_list *list)
 Free the list of emails.
 
LIBTLD_EXPORT int tld_email_next (struct tld_email_list *list, struct tld_email *e)
 Retrieve the next email.
 
LIBTLD_EXPORT enum tld_result tld_email_parse (struct tld_email_list *list, const char *emails, int flags)
 Parse a list of emails in the email list object.
 
LIBTLD_EXPORT void tld_email_rewind (struct tld_email_list *list)
 Rewind the reading of the emails.
 
LIBTLD_EXPORT void tld_free_tlds ()
 Clear the allocated TLD file.
 
LIBTLD_EXPORT enum tld_result tld_get_tag (struct tld_info *info, int tag_idx, struct tld_tag_definition *tag)
 
LIBTLD_EXPORT const struct tld_file * tld_get_tlds ()
 Return a pointer to the current list of TLDs.
 
LIBTLD_EXPORT enum tld_result tld_load_tlds (const char *filename, int fallback)
 Load a TLDs file as the file to be used by the tld() function.
 
LIBTLD_EXPORT enum tld_result tld_next_tld (struct tld_enumeration_state *state, struct tld_info *info)
 Read the next TLD and return its info.
 
LIBTLD_EXPORT const char * tld_status_to_string (enum tld_status status)
 Transform the status to a string.
 
LIBTLD_EXPORT int tld_tag_count (struct tld_info *info)
 
LIBTLD_EXPORT const char * tld_version ()
 Return the version of the library.
 
LIBTLD_EXPORT enum tld_category tld_word_to_category (const char *word, int n)
 This is for backward compatibility.
 

Detailed Description

This file declares most of the functions, objects, structures, etc. publicly available from the libtld library. The newer version also offers a compiler and file headers to handle .ini files and compile them in a .tld file.

Definition in file tld.h.

Macro Definition Documentation

◆ LIBTLD_EXPORT

#define LIBTLD_EXPORT

This definition is used to mark functions and classes as exported from the library. This allows other programs to automatically use functions defined in the library.

The LIBTLD_EXPORT may be set to dllexport or dllimport depending on whether you compile the library or you intend to link against it.

Definition at line 41 of file tld.h.

◆ LIBTLD_VERSION

#define LIBTLD_VERSION   "2.0.14"

This definition represents the version of the libtld header you are compiling against. You can compare it to the returned value of the tld_version() function to make sure that everything is compatible (i.e. if the version is not the same, then the tld_info structure may have changed.)

Definition at line 51 of file tld.h.
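
The comparison described above can be sketched as a plain string comparison. This is only an illustration: the helper name libtld_versions_match() is hypothetical and not part of the library, and it assumes that "compatible" means the two version strings are identical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtld): returns 1 when the version
 * string compiled into the program and the one reported at run time are
 * identical, 0 otherwise (including when either pointer is NULL). */
static int libtld_versions_match(const char *header_version,
                                 const char *runtime_version)
{
    if (header_version == NULL || runtime_version == NULL)
    {
        return 0;
    }
    return strcmp(header_version, runtime_version) == 0;
}
```

In a program linked against the library, the call would presumably be libtld_versions_match(LIBTLD_VERSION, tld_version()).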

◆ LIBTLD_VERSION_MAJOR

#define LIBTLD_VERSION_MAJOR   2

This definition represents the major version of the libtld header you are compiling against.

Definition at line 48 of file tld.h.

◆ LIBTLD_VERSION_MINOR

#define LIBTLD_VERSION_MINOR   0

This definition represents the minor version of the libtld header you are compiling against.

Definition at line 49 of file tld.h.

◆ LIBTLD_VERSION_PATCH

#define LIBTLD_VERSION_PATCH   14

This definition represents the patch version of the libtld header you are compiling against. Some people call this number the release number.

Definition at line 50 of file tld.h.

◆ VALID_URI_ASCII_ONLY

#define VALID_URI_ASCII_ONLY   0x0001

By default the tld_check_uri() function accepts any extended character (i.e. characters over 0x80). This flag can be used to refuse such characters.

Definition at line 126 of file tld.h.

◆ VALID_URI_NO_SPACES

#define VALID_URI_NO_SPACES   0x0002

By default the tld_check_uri() function accepts spaces as valid characters in a URI (whether they are explicit " ", or written as "+" or "%20".) This flag can be used to refuse all spaces (i.e. this means the "+" and "%20" are also refused.)

Definition at line 127 of file tld.h.
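
Since the two flags are distinct bits, they can be combined with a bitwise OR. The sketch below redefines the macros locally, with the values documented above, only so that it compiles on its own; the helper make_check_flags() is hypothetical.

```c
/* Local copies of the documented values so the sketch is self-contained;
 * with the real header included, these guards leave its definitions alone. */
#ifndef VALID_URI_ASCII_ONLY
#define VALID_URI_ASCII_ONLY 0x0001
#endif
#ifndef VALID_URI_NO_SPACES
#define VALID_URI_NO_SPACES 0x0002
#endif

/* Hypothetical helper: build the flags argument from two booleans. */
static int make_check_flags(int ascii_only, int no_spaces)
{
    int flags = 0;
    if (ascii_only)
    {
        flags |= VALID_URI_ASCII_ONLY;
    }
    if (no_spaces)
    {
        flags |= VALID_URI_NO_SPACES;
    }
    return flags;
}
```

The resulting value is what would be passed as the flags parameter of tld_check_uri().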

Enumeration Type Documentation

◆ tld_category

Enumerator
TLD_CATEGORY_INTERNATIONAL 

International TLDs.

This category represents TLDs that can be used by anyone anywhere in the world. In some cases, these have some limits (i.e. only a museum can register a .museum TLD.) However, the most well known international extension is .com and this one has absolutely no restrictions.

TLD_CATEGORY_PROFESSIONALS 

Professional TLDs.

This category is offered to professionals. Some countries already offer second-level domain name registrations for professionals and either way they are not used very much. These are reserved for people such as accountants, attorneys, and doctors.

Only people who have a license with a government can register a .pro domain name.

TLD_CATEGORY_LANGUAGE 

Language specific TLDs.

At time of writing, there is one language extension: .cat for the Catalan language. The idea of the language extensions is to offer a language, rather than a country, a way to have a website that all the people on the Earth can read in their language.

TLD_CATEGORY_GROUP 

Groups specific TLDs.

The concept of groups is similar to the language grouping, but in this case it may refer to a specific group of people (but not based on anything such as ethnicity).

Examples of groups are .kids and .gay.

TLD_CATEGORY_REGION 

Region specific TLDs.

It has been proposed, like the .eu, to have extensions based on well defined regions such as .asia for all of Asia. We currently also have .aq for Antarctica (from the French spelling, Antarctique). Some proposed regions are .africa and city names such as .paris and .wien.

Old TLDs that were assigned to countries but are no longer in use because the country disappeared (in general, it was split in two and both new countries have different names), as well as future regions, appear in this category.

We keep old TLDs because it is not unlikely that such will be used every now and then and they can, in this way, cleanly be refused by your software.

TLD_CATEGORY_TECHNICAL 

Technical extensions are considered internal.

These are likely valid (i.e. the .arpa is valid) but are used for technical reasons and not for regular URIs. So they are present but must certainly be ignored by your software.

To avoid returning TLD_RESULT_SUCCESS when a TLD with such a category is found, we mark these with the TLD_STATUS_INFRASTRUCTURE.

TLD_CATEGORY_COUNTRY 

A country extension.

Most of the extensions are country extensions. Country extensions are generally further broken down with second-level domain names. Some countries even have third, fourth, and fifth level domain names.

TLD_CATEGORY_LOCATION 

Another region specific TLDs.

This category is not currently used and probably won't be since TLD_CATEGORY_REGION is more than sufficient for this purpose.

TLD_CATEGORY_ENTREPRENEURIAL 

A private extension.

Some private companies and individuals purchased domains that they then use as a TLD reselling sub-domains from that main domain name.

For example, the ".blogspot.com" domain is offered by blogspot as a TLD to their users. This gives the users the capability to define a cookie at the ".blogspot.com" level but not directly under ".com". In other words, two distinct sites such as:

  • "a.blogspot.com", and
  • "b.blogspot.com"

cannot share their cookies. Yet, ".com" by itself is also a top-level domain name that anyone can use.

TLD_CATEGORY_BRAND 

The TLD is owned and represents a brand.

This category is used to mark top level domain names that are specific to one company. Note that certain TLDs are owned by companies now, but they are not automatically marked as a brand (i.e. ".lol").

TLD_CATEGORY_CONTACT 

The attached TLD has contact information.

Some TLDs are submitted to Mozilla by someone who becomes the point of contact for the corresponding TLDs. In most cases, this is the name and email of that contact person.

TLD_CATEGORY_UNDEFINED 

The TLD was not found.

This category is used to initialize the information structure and is used to show that the TLD was not found.

Definition at line 53 of file tld.h.

◆ tld_email_field_type

Enumerator
TLD_EMAIL_FIELD_TYPE_INVALID 

The input of email_field_type() was not valid.

An email field is expected to be valid ASCII characters. This error is returned if invalid characters are found.

TLD_EMAIL_FIELD_TYPE_UNKNOWN 

The input does not represent valid emails.

The email_field_type() function returns this value if the input field does not represent what is considered a field with email addresses. If you are parsing many email fields, you probably want to see this as a soft error (i.e. an error saying that the field can be skipped as far as the TLD library is concerned.)

TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST 

The input represents a mailbox list.

The fields FROM and RESENT-FROM are viewed as mailbox lists. These fields may include a list of email addresses.

TLD_EMAIL_FIELD_TYPE_MAILBOX 

The input represents a mailbox.

The fields SENDER and RESENT-SENDER are viewed as mailbox fields. These are expected to include only one email address.

TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST 

The input represents a mandatory list of mailboxes.

The fields TO, CC, REPLY-TO, RESENT-TO, and RESENT-CC are viewed as address list fields. These are expected to include any number of email addresses.

TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT 

The input represents an optional list of email addresses.

The fields BCC and RESENT-BCC are viewed as optional mailbox fields. These may not exist, be empty, or have one or more email addresses.

Definition at line 161 of file tld.h.

◆ tld_result

enum tld_result
Enumerator
TLD_RESULT_SUCCESS 

Success! The TLD of the specified URI is valid.

This result is returned when the URI includes a valid TLD. The function further includes valid results in the tld_info structure.

You can accept this URI as valid.

TLD_RESULT_INVALID 

The TLD was found, but it is marked as invalid.

This result represents a TLD that is not valid as is for a URI, but it was defined in the TLD data. The function includes further information in the tld_info structure. There you can check the category, status, and other parameters to determine what the TLD really represents.

It may be possible to use such a TLD, although as far as web addresses are concerned, these are not considered valid. As mentioned in the statuses, some may mean that the TLD can be changed for another and work (i.e. a country name that changed.)

TLD_RESULT_NULL 

The input URI is empty.

The tld() function returns this value whenever the input URI pointer is NULL or the empty string (""). Obviously, no TLD is found in this case.

TLD_RESULT_NO_TLD 

The input URI has no TLD defined.

Whenever the URI does not include at least one period (.), this error is returned. Local URIs are considered valid and don't generally include a period (i.e. "localhost", "my-computer", "johns-computer", etc.) We expect that the tld() function would not be called with such URIs.

A valid Internet URI must include a TLD.

TLD_RESULT_BAD_URI 

The URI includes characters that are not accepted by the function.

This value is returned if a character is found to be incompatible or a sequence of characters is found incompatible.

At this time, tld() returns this error if two periods (.) are found one after another. The errors will be increased with time to detect invalid characters (anything outside of [-a-zA-Z0-9.%].)

Note that the URI should not start or end with a period. This error will also be returned (at some point) when the function detects such problems.

TLD_RESULT_NOT_FOUND 

The URI has a TLD that could not be determined.

The TLD of the URI was searched in the TLD data and could not be found there. This means the TLD is not a valid Internet TLD.

Definition at line 91 of file tld.h.
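
The input conditions named above (NULL or empty input, no period, consecutive periods, leading or trailing period) can also be checked by the caller before calling tld(). The following is a minimal sketch of such a pre-check, not the library's own validation; the enumeration host_precheck and the function precheck_host() are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-side classification (not part of libtld). */
enum host_precheck
{
    HOST_OK,           /* worth passing to tld() */
    HOST_EMPTY,        /* compare with TLD_RESULT_NULL */
    HOST_NO_PERIOD,    /* compare with TLD_RESULT_NO_TLD */
    HOST_BAD_PERIODS   /* compare with TLD_RESULT_BAD_URI */
};

static enum host_precheck precheck_host(const char *host)
{
    size_t len;

    if (host == NULL || *host == '\0')
    {
        return HOST_EMPTY;
    }
    if (strchr(host, '.') == NULL)
    {
        /* local names such as "localhost" end up here */
        return HOST_NO_PERIOD;
    }
    len = strlen(host);
    if (host[0] == '.' || host[len - 1] == '.' || strstr(host, "..") != NULL)
    {
        return HOST_BAD_PERIODS;
    }
    return HOST_OK;
}
```

A host that passes this pre-check can still be rejected by tld(), for instance with TLD_RESULT_NOT_FOUND or TLD_RESULT_INVALID.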

◆ tld_status

enum tld_status
Enumerator
TLD_STATUS_VALID 

The TLD is currently valid.

This status represents a TLD that is currently fully valid and supported by the owners.

These can be part of URIs representing valid resources.

TLD_STATUS_PROPOSED 

The TLD was proposed but not yet accepted.

The TLD is nearly considered valid; at least, it is in the process of being accepted. The TLD will not work until officially accepted.

No valid URIs can include this TLD until it becomes TLD_STATUS_VALID.

TLD_STATUS_DEPRECATED 

The TLD was once in use.

This status is used by TLDs that were valid (TLD_STATUS_VALID) at some point in time and were changed to another TLD, rendering the old one useless (or incorrect in the case of a country name change.)

This status means such URIs are not to be considered valid. However, it may be possible to emit a 301 (in terms of HTTP protocol) to fix the problem.

TLD_STATUS_UNUSED 

The TLD was officially assigned but not put to use.

This special status is used for all the TLDs that were assigned to a specific entity, but never actually put to use. Many smaller countries (especially islands) are assigned this status.

Unused TLDs are not valid in any URI until marked valid.

TLD_STATUS_RESERVED 

The TLD is reserved so no one can use it.

This special case forces the specified TLDs into a "do not use" list. Such TLDs may be used by people who wish they were official, but they are not considered legal.

A reserved TLD may represent a second TLD that was assigned to a specific country or other category. It may be possible to do a transfer from that TLD to the official TLD (i.e. Great Britain was assigned .gb, but instead uses .uk; URIs with .gb could be transformed with .uk and checked for validity.)

TLD_STATUS_INFRASTRUCTURE 

These TLDs are reserved for the Internet infrastructure.

These TLDs cannot be used with standard URIs. These are used to make the Internet functional instead.

All URIs for standard resources must refuse these URIs.

TLD_STATUS_UNDEFINED 

Special status to indicate we did not find the TLD.

The info structure is returned with an undefined status whenever the TLD could not be found in the list of existing TLDs. This means the URI is completely invalid. (The only exception would be if you support some internal TLDs.)

URIs that cannot get a TLD_STATUS_VALID should all be considered invalid. But those marked as TLD_STATUS_UNDEFINED are completely invalid. This being said, you may want to make sure you passed the correct string. The URI must be just and only the set of sub-domains, the domain, and the TLDs. No protocol, slashes, colons, paths, query strings, or anchors are accepted in the URI.

TLD_STATUS_EXCEPTION 

Special status to indicate an exception which is not directly a TLD.

When a NIC decides to change their setup it can generate exceptions. For example, the UK first made use of .uk and as such offered a few customers to use .uk. Later they decided to only offer second level domain names such as the .co.uk and .ac.uk. This generated a few exceptions on the .uk domain name. For example, the police.uk domain was in use at that time and thus it was an exception. We reference it as ".police.uk" in our data file yet the TLD in that case is just ".uk".

Note
The .uk top domain is now available to anyone. Another example that is still in place is the .ar.

Definition at line 69 of file tld.h.
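
One way an application might act on these statuses is sketched below. The enumeration is a local mirror of the one documented above, present only so the sketch compiles on its own; real code would include the libtld header instead. The uri_policy type, the function policy_for_status() and the three-way policy itself are assumptions made for illustration, in particular the conservative choice to reject every status not explicitly listed.

```c
/* Local mirror of the documented enumeration (remove it when including
 * the real header). */
enum tld_status
{
    TLD_STATUS_VALID, TLD_STATUS_PROPOSED, TLD_STATUS_DEPRECATED,
    TLD_STATUS_UNUSED, TLD_STATUS_RESERVED, TLD_STATUS_INFRASTRUCTURE,
    TLD_STATUS_EXAMPLE, TLD_STATUS_UNDEFINED,
    TLD_STATUS_EXCEPTION = 100
};

/* Hypothetical application-level decision. */
enum uri_policy { URI_ACCEPT, URI_TRY_REPLACEMENT, URI_REJECT };

static enum uri_policy policy_for_status(enum tld_status status)
{
    switch (status)
    {
    case TLD_STATUS_VALID:
        return URI_ACCEPT;

    case TLD_STATUS_DEPRECATED:  /* a redirect to the new TLD may be possible */
    case TLD_STATUS_RESERVED:    /* e.g. .gb could be retried as .uk */
        return URI_TRY_REPLACEMENT;

    default:
        /* proposed, unused, infrastructure, example, undefined and
         * anything else: rejected here as a conservative assumption */
        return URI_REJECT;
    }
}
```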

Function Documentation

◆ tld()

LIBTLD_EXPORT enum tld_result tld ( char const *  uri,
struct tld_info *  info 
)
extern

The tld() function searches for the specified URI in the TLD descriptions. The results are saved in the info parameter for later interpretation (i.e. extraction of the domain name, sub-domains and the exact TLD.)

The function extracts the last extension of the URI. For example, in the following:

example.co.uk

the function first extracts ".uk". With that extension, it searches the list of official TLDs. If not found, an error is returned and the info parameter is set to unknown.

When found, the function checks whether that TLD (".uk" in our previous example) accepts sub-TLDs (second, third, fourth and fifth level TLDs.) If so, it extracts the next TLD entry (the ".co" in our previous example) and searches for that second level TLD. If found, it again tries with the third level, etc. until all the possible TLDs were exhausted. At that point, it returns the last TLD it found. In case of ".co.uk", it returns the information of the ".co" TLD, second-level domain name.
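
The right-to-left search described above can be illustrated with a toy version. This is a sketch under strong assumptions and not how the library stores or searches its data: the table demo_suffixes is a tiny hypothetical stand-in for the TLD data, and both function names are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, tiny stand-in for the TLD data (NOT the library's table). */
static const char *const demo_suffixes[] = { "uk", "co.uk", "ac.uk", "com" };

static int demo_is_known_suffix(const char *suffix)
{
    size_t i;
    for (i = 0; i < sizeof(demo_suffixes) / sizeof(demo_suffixes[0]); ++i)
    {
        if (strcmp(suffix, demo_suffixes[i]) == 0)
        {
            return 1;
        }
    }
    return 0;
}

/* Walk the periods from right to left and remember the left-most one
 * whose suffix is still known. Returns the offset of that period, or -1
 * when not even the last extension is known. */
static int demo_find_tld_offset(const char *host)
{
    int best = -1;
    size_t i = strlen(host);
    while (i > 0)
    {
        --i;
        if (host[i] != '.')
        {
            continue;
        }
        if (!demo_is_known_suffix(host + i + 1))
        {
            break;  /* this level is not a TLD, keep the previous match */
        }
        best = (int)i;
    }
    return best;
}
```

With this toy table, "example.co.uk" first matches "uk", then "co.uk", and stops there because no period is left to examine.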

All the comparisons are done in lowercase. This is because all the data is saved in lowercase and we expect the input of the tld() function to already be in lowercase. If you have a doubt and your input may actually be in uppercase, make sure to call the tld_domain_to_lowercase() function first. That function makes a duplicate of your domain name in lowercase. It understands the %XX characters (since the URI is expected to still be encoded) and properly handles UTF-8 characters in order to define the lowercase characters of the input. Note that the tld_domain_to_lowercase() function returns a newly allocated pointer that you are responsible for freeing once you are done with it.

Warning
If you call tld() with the pointer returned by tld_domain_to_lowercase(), keep in mind that the tld() function saves pointers of the input string directly in the tld_info structure. In other words, you want to free() that string AFTER you are done with the tld_info structure.

The info structure includes:

  • f_category – the category of TLD, unless set to TLD_CATEGORY_UNDEFINED, it is considered valid
  • f_status – the status of the TLD, unless set to TLD_STATUS_UNDEFINED, it was defined from the tld_data.xml file; however, only those marked as TLD_STATUS_VALID are considered to currently be in use, all the other statuses can be used by your software, one way or another, but it should not be accepted as valid in a URI
  • f_country – if the category is set to TLD_CATEGORY_COUNTRY then this pointer is set to the name of the country
  • f_tld – is set to the full TLD of your domain name; this is a pointer WITHIN your uri string so make sure you keep your URI string valid if you intend to use this f_tld string
  • f_offset – the offset to the first period within the domain name TLD (i.e. in our previous example, it would be the offset to the first period in ".co.uk", so in "example.co.uk" the offset would be 7. Assuming you prepend "www." to have the URI "www.example.co.uk" then the offset would be 11.)
Note
In our previous example, the ".uk" TLD is properly used: it includes a second level domain name (".co".) The URI "example.uk" should have returned TLD_RESULT_INVALID since .uk by itself was not supposed to be acceptable. This changed a few years ago. The good thing is that it resolves some problems as some companies were given a simple ".uk" TLD and these were exceptions the library does not need to support anymore. There are still some countries, such as ".bd", which do not accept second level names, so "example.bd" does return an error (TLD_RESULT_INVALID).
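
The use of f_offset can be illustrated without the library: given a host name and the offset of the first period of its TLD, the sub-domains and domain follow from pointer arithmetic. The structure host_parts and the function split_host() below are hypothetical names for this sketch, and the offset is passed in by hand rather than read from a tld_info structure. The caller is assumed to pass an offset that lies inside the string.

```c
#include <stddef.h>

/* Hypothetical result type: pointers into the caller's string. */
struct host_parts
{
    const char *sub;        /* sub-domains, trailing period excluded */
    size_t      sub_len;    /* 0 when there are no sub-domains */
    const char *domain;     /* domain name, not NUL-terminated */
    size_t      domain_len;
    const char *tld;        /* starts at the period, NUL-terminated */
};

static struct host_parts split_host(const char *host, int tld_offset)
{
    struct host_parts p;
    const char *tld = host + tld_offset;
    const char *s = tld;

    /* move back to the start of the label that precedes the TLD */
    while (s > host && s[-1] != '.')
    {
        --s;
    }
    p.tld = tld;
    p.domain = s;
    p.domain_len = (size_t)(tld - s);
    p.sub = host;
    p.sub_len = s > host ? (size_t)(s - host) - 1 : 0;
    return p;
}
```

For "www.example.co.uk" with an offset of 11, this yields the sub-domain "www", the domain "example" and the TLD ".co.uk". All pointers refer to the original string, so that string has to outlive the result.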

Assuming that you always get valid URIs, you should get one of those results:

  • TLD_RESULT_SUCCESS – success! the URI is valid and the TLD was properly determined; use the f_tld or f_offset to extract the TLD domain and sub-domains
  • TLD_RESULT_INVALID – known TLD, but not currently valid; this result is returned when we know that the TLD is not to be accepted

Other results are returned when the input string is considered invalid.

Note
The function only accepts a bare URI, in other words: no protocol, no path, no anchor, no query string, and still URI encoded. Also, it should not start and/or end with a period or you are likely to get an invalid response. (i.e. don't use any of ".example.co.uk.", "example.co.uk.", nor ".example.co.uk")
/* TLD library -- TLD example
* Copyright (c) 2011-2025 Made to Order Software Corp. All Rights Reserved
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include <libtld/tld.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    const char *uri = "WWW.Example.Co.Uk";
    char *uri_lowercase;
    struct tld_info info;
    enum tld_result r;
    if(argc > 1)
    {
        uri = argv[1];
    }
    // if your input may include uppercase characters and you
    // do not have an easy way to compute the lowercase before
    // calling tld(), call the tld_domain_to_lowercase() function
    uri_lowercase = tld_domain_to_lowercase(uri);
    r = tld(uri_lowercase, &info);
    if(r == TLD_RESULT_SUCCESS)
    {
        // search backward for the period that precedes the domain name
        const char *s = uri_lowercase + info.f_offset - 1;
        while(s > uri_lowercase)
        {
            if(*s == '.')
            {
                ++s;
                break;
            }
            --s;
        }
        // here uri_lowercase points to your sub-domains, the length is
        // "s - uri_lowercase"
        // if uri_lowercase == s then there are no sub-domains
        // s points to the domain name, the length is "info.f_tld - s"
        // and info.f_tld points to the TLD
        //
        // When TLD_RESULT_SUCCESS is returned the domain cannot be an
        // empty string; also the TLD cannot be empty, however, there
        // may be no sub-domains.
        printf("Sub-domain(s): \"%.*s\"\n", (int)(s - uri_lowercase), uri_lowercase);
        printf("Domain: \"%.*s\"\n", (int)(info.f_tld - s), s);
        printf("TLD: \"%s\"\n", info.f_tld);
        // info.f_tld points inside uri_lowercase, so free it last
        free(uri_lowercase);
        return 0;
    }
    free(uri_lowercase);
    return 1;
}
// vim: ts=4 sw=4 et
Parameters
[in]  uri  The URI to be checked.
[out]  info  A pointer to a tld_info structure to save the result.
Returns
One of the TLD_RESULT_... enumeration values.

Definition at line 1113 of file tld.cpp.

References tld_info::f_offset, tld_info::f_status, tld_info::f_tld, g_tld_file, search(), tld(), tld_clear_info(), tld_load_tlds_if_not_loaded(), TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, TLD_STATUS_EXCEPTION, and TLD_STATUS_VALID.

Referenced by tld_email_list::tld_email_t::parse(), PHP_FUNCTION(), search(), tld_object::set_domain(), tld(), tld_check_uri(), tld_file_to_json(), and tld_next_tld().

◆ tld_check_uri()

LIBTLD_EXPORT enum tld_result tld_check_uri ( const char *  uri,
struct tld_info *  info,
const char *  protocols,
int  flags 
)
extern

This function very quickly parses a URI to determine whether it is valid.

Note that it does not (currently) support local naming conventions which means that a host such as "localhost" will fail the test.

The protocols variable can be set to a list of protocol names that are considered valid. For example, for HTTP protocol one could use "http,https". To accept any protocol use an asterisk as in: "*". The protocol must be only characters, digits, or underscores ([0-9A-Za-z_]+) and it must be at least one character.
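
The rules stated above for the protocols parameter can be made concrete with a small matcher. This is a sketch of the documented rules, not the library's implementation; the function names are hypothetical, and the case-sensitive comparison is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A protocol name must be at least one character of [0-9A-Za-z_]. */
static int demo_protocol_name_ok(const char *name, size_t len)
{
    size_t i;
    if (len == 0)
    {
        return 0;
    }
    for (i = 0; i < len; ++i)
    {
        if (!isalnum((unsigned char)name[i]) && name[i] != '_')
        {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 when protocol appears in the comma separated list, or when
 * the list is "*"; comparison is case sensitive (an assumption). */
static int demo_protocol_allowed(const char *protocol, const char *list)
{
    size_t plen = strlen(protocol);
    if (!demo_protocol_name_ok(protocol, plen))
    {
        return 0;
    }
    if (strcmp(list, "*") == 0)
    {
        return 1;
    }
    while (*list != '\0')
    {
        const char *comma = strchr(list, ',');
        size_t len = comma != NULL ? (size_t)(comma - list) : strlen(list);
        if (len == plen && strncmp(list, protocol, len) == 0)
        {
            return 1;
        }
        list += len;
        if (*list == ',')
        {
            ++list;  /* skip the separator */
        }
    }
    return 0;
}
```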

The flags can be set to the following values; OR them together to set multiple flags at the same time:

  • VALID_URI_ASCII_ONLY – refuse characters that are not in the first 127 range (we expect the URI to be UTF-8 encoded and any byte with bit 7 set is considered invalid if this flag is set, including encoded bytes such as %A0)
  • VALID_URI_NO_SPACES – refuse spaces whether they are encoded with + or %20 or verbatim.

The return value is generally TLD_RESULT_BAD_URI when an invalid character is found in the URI string. The TLD_RESULT_NULL is returned if the URI is a NULL pointer or an empty string. Other results may be returned by the tld() function. If a result other than TLD_RESULT_SUCCESS is returned then the info structure may or may not be updated.

Parameters
[in]  uri  The URI whose validity is being checked.
[out]  info  The resulting information about the URI domain and TLD.
[in]  protocols  List of comma separated protocols accepted.
[in]  flags  A set of flags to tell the function what is valid/invalid.
Returns
The result of the operation, TLD_RESULT_SUCCESS if the URI is valid.
See also
tld()
Todo:
The following is WRONG:
  • the domain %XX are not being checked properly, as it stands the characters following % can be anything!
  • the tld() function must be called with the characters still encoded; if you look at the data, you will see that I kept the data encoded (i.e. with the %XX characters)
  • what could be checked (which I guess could be for the entire domain name) is whether the entire string represents valid UTF-8; I don't think I'm currently doing so here. (I have such functions in the tld_domain_to_lowercase() now)

Definition at line 1311 of file tld.cpp.

References tld_info::f_offset, tld_info::f_tld, tld(), tld_clear_info(), TLD_RESULT_BAD_URI, TLD_RESULT_NULL, VALID_URI_ASCII_ONLY, and VALID_URI_NO_SPACES.

Referenced by check_uri(), and PHP_FUNCTION().

◆ tld_clear_info()

LIBTLD_EXPORT void tld_clear_info ( struct tld_info *  info)
extern

This function initializes the info structure with defaults. The different TLD functions that make use of this structure will generally call this function first to represent a failure case.

Note that by default the category and status are set to undefined (TLD_CATEGORY_UNDEFINED and TLD_STATUS_UNDEFINED). Also the country and tld pointer are set to NULL and thus they cannot be used as strings.

Parameters
[out]  info  The tld_info structure to clear.

Definition at line 705 of file tld.cpp.

References tld_info::f_category, tld_info::f_country, tld_info::f_offset, tld_info::f_status, tld_info::f_tld, TLD_CATEGORY_UNDEFINED, and TLD_STATUS_UNDEFINED.

Referenced by tld(), tld_check_uri(), and tld_next_tld().

◆ tld_domain_to_lowercase()

LIBTLD_EXPORT char * tld_domain_to_lowercase ( const char *  domain)
extern

This function will transform the input domain name to lowercase. You should call this function before you call the tld() function to make sure that the input data is in lowercase.

This function interprets the %XX input data and transforms that to characters. The function further converts UTF-8 characters to wide characters to be able to determine the lowercase version.

Warning
The function allocates a new buffer to save the result in it. You are responsible for freeing that buffer. So the following code is wrong:
struct tld_info info;
tld(tld_domain_to_lowercase(domain), &info);
// WRONG: tld_domain_to_lowercase() leaked a heap buffer

In C++ you may use an std::unique_ptr<> with free as the deleter to not have to bother with the call by hand (especially if you have possible exceptions in your code):

std::unique_ptr<char, void(*)(char *)> lowercase_domain(tld_domain_to_lowercase(domain.c_str()), reinterpret_cast<void(*)(char *)>(&::free));
Parameters
[in]  domain  The input domain to convert to lowercase.
Returns
A pointer to the resulting conversion, NULL if the buffer cannot be allocated or the input data is considered invalid.

Definition at line 489 of file tld_domain_to_lowercase.c.

References tld_mbtowc(), and tld_wctomb().

Referenced by tld_email_list::tld_email_t::parse().

◆ tld_email_alloc()

LIBTLD_EXPORT struct tld_email_list * tld_email_alloc ( )
extern

This function allocates a list of emails object that can then be used to parse a string representing a list of emails and retrieve those emails with the use of the tld_email_next() function.

Note
The object is a C++ class.
Returns
A pointer to a list of emails object.
See also
tld_email_next()

Definition at line 1480 of file tld_emails.cpp.

References tld_email_list::tld_email_list().

Referenced by PHP_FUNCTION().

◆ tld_email_count()

LIBTLD_EXPORT int tld_email_count ( struct tld_email_list *  list)
extern

This function returns the number of emails that were found in the list of emails passed to the tld_email_parse() function.

Parameters
[in]  list  The email list object.
Returns
The number of emails defined in the object, it may be zero.

Definition at line 1525 of file tld_emails.cpp.

References list().

◆ tld_email_free()

LIBTLD_EXPORT void tld_email_free ( struct tld_email_list *  list)
extern

This function frees the list of emails as allocated by the tld_email_alloc(). Afterward the list pointer is not valid anymore.

Parameters
[in] list  The list to be freed.

Definition at line 1493 of file tld_emails.cpp.

References list().

Referenced by PHP_FUNCTION().

◆ tld_email_next()

LIBTLD_EXPORT int tld_email_next ( struct tld_email_list *  list,
struct tld_email *  e 
)
extern

This function retrieves the next email found when parsing the emails passed to the tld_email_parse() function. The function returns 1 when another email was defined. It returns 0 when no more emails exist, in which case the e parameter is not set. The function can be called any number of times after it has returned zero (0).

Parameters
[in] list  The list from which the email is to be read.
[out] e  The buffer where the email is to be written.
Returns
The function returns 0 if the end of the list was reached; it returns 1 if e was defined with the next email.
See also
tld_email_parse()

Definition at line 1559 of file tld_emails.cpp.

References list().

Referenced by PHP_FUNCTION().

◆ tld_email_parse()

LIBTLD_EXPORT enum tld_result tld_email_parse ( struct tld_email_list *  list,
char const *  emails,
int  flags 
)
extern

This function parses the emails listed in the emails parameter and saves the result, as a list of emails, in the list object.

Parameters
[in] list  The email list object.
[in] emails  The list of emails to be parsed.
[in] flags  The flags are used to change the behavior of the parser.
Returns
TLD_RESULT_SUCCESS if the emails were parsed successfully, another TLD_RESULT_... value when an error is detected.

Definition at line 1511 of file tld_emails.cpp.

References list(), and tld_email_list::parse().

Referenced by PHP_FUNCTION().

◆ tld_email_rewind()

LIBTLD_EXPORT void tld_email_rewind ( struct tld_email_list *  list)
extern

This function resets the position to the start of the list. The next call to the tld_email_next() function will return the first email again.
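
The contract shared by tld_email_count(), tld_email_next() and tld_email_rewind() can be modeled with a small cursor class. This is a hypothetical model written for illustration, not libtld code: email_cursor and its members are assumed names, and plain strings stand in for the tld_email structure.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the cursor contract; this is not libtld code.
class email_cursor
{
public:
    explicit email_cursor(std::vector<std::string> emails)
        : f_emails(std::move(emails))
    {
    }

    // Mimics tld_email_next(): returns true and sets `e` while entries
    // remain; returns false and leaves `e` untouched at the end.
    // Calling it again after the end keeps returning false.
    bool next(std::string & e)
    {
        if(f_pos >= f_emails.size())
        {
            return false;
        }
        e = f_emails[f_pos];
        ++f_pos;
        return true;
    }

    // Mimics tld_email_rewind(): the following next() returns the
    // first entry again.
    void rewind()
    {
        f_pos = 0;
    }

    // Mimics tld_email_count(): the number of entries, possibly zero.
    std::size_t count() const
    {
        return f_emails.size();
    }

private:
    std::vector<std::string>    f_emails;
    std::size_t                 f_pos = 0;
};
```

With the real API, the equivalent loop would call tld_email_next() with the list and the address of a tld_email structure until the function returns 0.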

Parameters
[in] list  The email list object to reset.

Definition at line 1538 of file tld_emails.cpp.

References list().

◆ tld_free_tlds()

LIBTLD_EXPORT void tld_free_tlds ( )
extern

Once you are done with the library and if you want to make sure you do not have a memory leak, you can use this function to delete the TLD file which resides in memory.

You can also re-use the library later by either calling the tld_load_tlds() function or just functions that call tld() in which case you'll get the default .tld file loaded or the fallback. However, you cannot use the tld_info and other such structures after this call. Some of the pointers found in those structures may not be valid anymore since we use pointers directly to the TLD file data.

Definition at line 828 of file tld.cpp.

References g_tld_file.

◆ tld_get_tag()

LIBTLD_EXPORT enum tld_result tld_get_tag ( struct tld_info *  info,
int  tag_idx,
struct tld_tag_definition *  tag 
)
extern

Definition at line 1695 of file tld.cpp.

◆ tld_get_tlds()

LIBTLD_EXPORT const struct tld_file * tld_get_tlds ( )
extern

This function returns the list of TLDs that were loaded by the tld_load_tlds() function. If the TLDs were not yet loaded, then the function returns a nullptr.

The structure must be considered 100% read-only. It is possible that the TLDs were loaded from the tld_data.c buffer which means it is read-only data from the library.

Warning
Calling tld_free_tlds() invalidates the pointer returned by this function, since it releases all the allocated buffers, including the one this pointer refers to.
Returns
A pointer to the in memory tld_file structure or nullptr.

Definition at line 809 of file tld.cpp.

References g_tld_file.

◆ tld_load_tlds()

LIBTLD_EXPORT enum tld_result tld_load_tlds ( char const *  filename,
int  fallback 
)
extern

This function loads the specified filename as the current set of data to be used by the tld() function.

You generally do not need to call this function; it is called automatically with a null pointer, which loads the default file as expected.

The fallback flag can be set to true (the default) to fall back to the static version of the data compiled into the library. This is used if the specified or default external file cannot be loaded.

Warning
You can call this function at any time to switch between .tld files. However, any structure filled in before such a call must be considered invalid, since some string pointers in those structures may still point into the old buffer.
Parameters
[in] filename  The file to load or NULL to load the default.
[in] fallback  Whether to fall back to the internal data if the input file cannot be loaded.
Returns
A tld_result representing the success or failure: TLD_RESULT_SUCCESS for success, TLD_RESULT_INVALID for errors where the file could not be read, and TLD_RESULT_NOT_FOUND if the file is not found.

Definition at line 744 of file tld.cpp.

References g_tld_file, tld_get_static_tlds_buffer_size(), TLD_RESULT_INVALID, TLD_RESULT_NOT_FOUND, and TLD_RESULT_SUCCESS.

Referenced by tld_load_tlds_if_not_loaded().

◆ tld_next_tld()

LIBTLD_EXPORT enum tld_result tld_next_tld ( struct tld_enumeration_state *  state,
struct tld_info *  info 
)
extern

This function is used to read all the TLDs one at a time.

To read the first TLD, make sure the state structure is cleared the first time you call the tld_next_tld() function:

struct tld_enumeration_state state = {};
struct tld_info info;
for(;;)
{
    enum tld_result r = tld_next_tld(&state, &info);
    if(r == TLD_RESULT_NOT_FOUND)
    {
        // you already found the last TLD
        return;
    }
    ...
}

The function may return various values, and it is important to check the returned value to know the state of the info parameter. In particular, TLD_RESULT_INVALID means that the returned domain name is considered to exist but is currently not a valid domain name (e.g. it could be a deprecated or unused intermediate).

  • TLD_RESULT_SUCCESS – if the returned info is considered to be a valid domain name.
  • TLD_RESULT_INVALID – if the code found a domain name which is not currently considered valid (deprecated, unused, reserved, etc.)
  • TLD_RESULT_NULL – if one of the input pointers is null; in that case nothing is done.
  • TLD_RESULT_NO_TLD – if the file includes more levels than available in the state structure
  • TLD_RESULT_BAD_URI – if some error is detected which is neither a null pointer nor too many levels in the file
  • TLD_RESULT_NOT_FOUND – if no more results are available (i.e. you reached the end of the list)
Note
The tld_info.f_tld will be a pointer to the tld_enumeration_state.f_domain and the tld_info.f_offset is changed to point at the start of the computed domain name.
Parameters
[in] state  The current state. Reset to get the very first domain name.
[in] info  The structure where the information of the next domain name is saved.
Returns
This function returns one of the TLD_RESULT_... values as indicated above.

Definition at line 888 of file tld.cpp.

References tld_description::f_end_offset, tld_info::f_offset, tld_info::f_status, tld_info::f_tld, g_tld_file, tld(), tld_clear_info(), tld_load_tlds_if_not_loaded(), TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, and TLD_STATUS_VALID.

◆ tld_status_to_string()

LIBTLD_EXPORT const char * tld_status_to_string ( enum tld_status  status)
extern

The status returned in a tld_info can be converted to a string using this function. This is useful to print out an error message.

Parameters
[in] status  The status to convert to a string.
Returns
A string representing the input tld_status.

Definition at line 49 of file tld_strings.c.

References TLD_STATUS_DEPRECATED, TLD_STATUS_EXCEPTION, TLD_STATUS_INFRASTRUCTURE, TLD_STATUS_PROPOSED, TLD_STATUS_RESERVED, TLD_STATUS_UNDEFINED, TLD_STATUS_UNUSED, and TLD_STATUS_VALID.

Referenced by tld_file_to_json().

◆ tld_tag_count()

LIBTLD_EXPORT int tld_tag_count ( struct tld_info *  info)
extern

Definition at line 1675 of file tld.cpp.

◆ tld_version()

LIBTLD_EXPORT const char * tld_version ( )
extern

This function returns the version of this library. The version is defined with three numbers: <major>.<minor>.<patch>.

You should be able to use the libversion to compare different libtld versions and know which one is the newest version.
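
As an illustration of why a numeric comparison is needed, here is a minimal sketch that parses two <major>.<minor>.<patch> strings and compares them field by field. It is not part of libtld and does not use libversion; version_t, parse_version() and compare_versions() are hypothetical helpers. A plain string comparison would sort "2.0.14" before "2.0.9", which is the wrong order.

```cpp
#include <cstdio>
#include <tuple>

// Hypothetical helper: a parsed "<major>.<minor>.<patch>" triple.
struct version_t
{
    int f_major = 0;
    int f_minor = 0;
    int f_patch = 0;
};

// Parse a version string such as the one returned by tld_version().
// Returns false if the string does not start with three dot-separated
// numbers; `v` may be partially modified in that case.
bool parse_version(char const * s, version_t & v)
{
    if(s == nullptr)
    {
        return false;
    }
    return std::sscanf(s, "%d.%d.%d", &v.f_major, &v.f_minor, &v.f_patch) == 3;
}

// Return -1, 0 or 1 if `a` is older than, equal to, or newer than `b`.
int compare_versions(version_t const & a, version_t const & b)
{
    // tuples compare lexicographically: major first, then minor, then patch
    auto const lhs(std::tie(a.f_major, a.f_minor, a.f_patch));
    auto const rhs(std::tie(b.f_major, b.f_minor, b.f_patch));
    if(lhs < rhs)
    {
        return -1;
    }
    return lhs == rhs ? 0 : 1;
}
```

At compile time, the LIBTLD_VERSION_MAJOR, LIBTLD_VERSION_MINOR and LIBTLD_VERSION_PATCH macros give the same three numbers without any parsing.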

Returns
A constant string with the version of the library.

Definition at line 1646 of file tld.cpp.

References LIBTLD_VERSION.

Referenced by main().

◆ tld_word_to_category()

LIBTLD_EXPORT enum tld_category tld_word_to_category ( const char *  word,
int  n 
)
extern

Many times, a simple category is not useful because one TLD may actually be part of multiple groups (e.g. a TLD that is at once a group, a country, a language, and an entrepreneurial TLD can very well exist!)

The idea is to be backward compatible for anyone who was using the old category value. This function converts the specified word into a category. The word is not expected to be null-terminated, hence the parameter n to specify its length.
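
The pointer plus length convention can be sketched as follows (C++17, for std::string_view). Everything in this block is hypothetical: the category enumeration, the three words in the table, and both functions are stand-ins written for illustration, so the words actually recognized by tld_word_to_category() may differ.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical category type and word table; the real values are the
// TLD_CATEGORY_... enumerations and the words known to libtld.
enum class category
{
    country,
    language,
    brand,
    undefined
};

// `word` points at `n` bytes and does not need a terminating '\0'.
category word_to_category(char const * word, int n)
{
    if(word == nullptr || n <= 0)
    {
        return category::undefined;
    }
    std::string_view const w(word, static_cast<std::size_t>(n));
    if(w == "country")  return category::country;
    if(w == "language") return category::language;
    if(w == "brand")    return category::brand;
    return category::undefined;
}

// Walk a comma-separated list of words without copying or terminating
// them, and return the category of the first recognized word.
category first_known_category(std::string_view tags)
{
    while(!tags.empty())
    {
        std::size_t const comma(tags.find(','));
        std::string_view const word(tags.substr(0, comma));
        category const c(word_to_category(word.data(), static_cast<int>(word.size())));
        if(c != category::undefined)
        {
            return c;
        }
        if(comma == std::string_view::npos)
        {
            break;
        }
        tags.remove_prefix(comma + 1);
    }
    return category::undefined;
}
```

Because the length is explicit, a caller can pass a slice of a larger buffer directly, which avoids allocating a temporary null-terminated copy of each word.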

Parameters
[in] word  The word to convert.
[in] n  The exact number of characters (bytes) in the word.
Returns
The corresponding TLD_CATEGORY_... or TLD_CATEGORY_UNDEFINED if the word could not be converted.

Definition at line 103 of file tld_strings.c.

References TLD_CATEGORY_BRAND, TLD_CATEGORY_CONTACT, TLD_CATEGORY_COUNTRY, TLD_CATEGORY_ENTREPRENEURIAL, TLD_CATEGORY_GROUP, TLD_CATEGORY_INTERNATIONAL, TLD_CATEGORY_LANGUAGE, TLD_CATEGORY_LOCATION, TLD_CATEGORY_PROFESSIONALS, TLD_CATEGORY_REGION, TLD_CATEGORY_TECHNICAL, and TLD_CATEGORY_UNDEFINED.

This document is part of the Snap! Websites Project.

Copyright by Made to Order Software Corp.