libtld 2.0.14
A library to determine the Top-Level Domain name of any Internet URI.
tld_domain_to_lowercase.c File Reference

Force lowercase for all characters in the domain name.

#include "libtld/tld.h"
#include "tld_data.h"
#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

Functions

static int tld_byte_in (const char **s)
 Read one byte of data.
 
static int tld_byte_out (char **s, int *max_length, char byte)
 Output one byte, URI-encoding it when required.
 
static int tld_dec2hex (int d)
 Transform a number to a hexadecimal digit.
 
char * tld_domain_to_lowercase (const char *domain)
 Transform a domain with a TLD to lowercase before processing.
 
static int tld_hex2dec (char c)
 Transform a hexadecimal digit to a number.
 
static wint_t tld_mbtowc (const char **s)
 Transform a multi-byte UTF-8 character to a wide character.
 
static int tld_wctomb (wint_t wc, char **s, int *max_length)
 Convert a wide character to UTF-8.
 

Detailed Description

This file includes the functions used to convert a domain name to lowercase, whatever case it arrives in. The input domain name is expected to still be URL encoded and to be valid UTF-8.

Definition in file tld_domain_to_lowercase.c.

Function Documentation

◆ tld_byte_in()

static int tld_byte_in ( const char **  s)
static

The tld_byte_in() function reads one byte. The byte may either be a %XX escape or a plain byte. The input may contain UTF-8 characters.

The input pointer (s) gets incremented automatically as required.

Parameters
[in]  s  The pointer to a string pointer where the byte to read is.
Returns
The byte or -1 if an error occurs.
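
The behaviour described above can be illustrated with a minimal sketch. This is not the library's code: the names sketch_hex_value() and sketch_byte_in() are hypothetical, and returning 0 without advancing at the end of the string is an assumption.

```c
/* Hypothetical helper: value of a hexadecimal digit, or -1 if c is not one. */
static int sketch_hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read one byte from *s, decoding a %XX escape when one is present.
 * *s is advanced past the consumed characters.
 * Returns 0..255, or -1 when '%' is not followed by two hex digits.
 * Assumption: at the end of the string, return 0 and leave *s alone. */
static int sketch_byte_in(const char **s)
{
    int hi, lo;
    unsigned char c = (unsigned char) **s;

    if (c == '\0') return 0;
    ++*s;
    if (c != '%') return c;          /* plain byte */

    hi = sketch_hex_value(**s);      /* a '\0' here yields -1, so no overrun */
    if (hi < 0) return -1;
    ++*s;
    lo = sketch_hex_value(**s);
    if (lo < 0) return -1;
    ++*s;
    return hi * 16 + lo;
}
```

Calling sketch_byte_in() repeatedly on "a%C3%a9" yields 'a', 0xC3, 0xA9 and then 0.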

Definition at line 115 of file tld_domain_to_lowercase.c.

References tld_hex2dec().

Referenced by tld_mbtowc().

◆ tld_byte_out()

static int tld_byte_out ( char **  s,
int *  max_length,
char  byte 
)
static

This function ensures that the byte being output is properly defined according to URI encoding rules. This means all the characters get converted to %XX except the few that can be encoded as is (i.e. some of the ASCII characters).

Parameters
[in,out]  s  The output string where the character is saved.
[in,out]  max_length  The length of s, adjusted each time s is incremented.
[in]  byte  The byte to output in s.
Returns
0 if no error occurs, -1 on buffer overflow.
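
A rough sketch of this kind of output routine follows. It is not the library's implementation. The name sketch_byte_out() is hypothetical. The set of bytes written as is (ASCII letters, digits, '-' and '.') is an assumption, and so is treating max_length as the number of bytes still free.

```c
/* Write one byte to *s, as is when it is in the assumed "safe" set,
 * otherwise as a three character %XX escape.
 * *s and *max_length are adjusted by the number of characters written.
 * Returns 0 on success, -1 when the remaining space is too small. */
static int sketch_byte_out(char **s, int *max_length, char byte)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char b = (unsigned char) byte;
    int plain = (b >= 'a' && b <= 'z')
             || (b >= 'A' && b <= 'Z')
             || (b >= '0' && b <= '9')
             || b == '-' || b == '.';
    int needed = plain ? 1 : 3;

    if (*max_length < needed) return -1;   /* would overflow the buffer */

    if (plain) {
        *(*s)++ = (char) b;
    } else {
        *(*s)++ = '%';
        *(*s)++ = hex[b >> 4];             /* high nibble */
        *(*s)++ = hex[b & 0x0F];           /* low nibble */
    }
    *max_length -= needed;
    return 0;
}
```

The space check happens before anything is written, so a failed call leaves the buffer and both in/out parameters untouched.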

Definition at line 166 of file tld_domain_to_lowercase.c.

References tld_dec2hex().

Referenced by tld_wctomb().

◆ tld_dec2hex()

static int tld_dec2hex ( int  d)
static

This function transforms the specified number into a hexadecimal digit. The number must be a value between 0 and 15.

Parameters
[in]  d  A number from 0 to 15 to convert to a hexadecimal digit.
Returns
The character 0-9 or A-F.
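
As an illustration of the arithmetic, here is a small sketch with hypothetical names (sketch_dec2hex() and sketch_byte_to_hex()). It assumes an ASCII-compatible character set and does not check that d is within 0 to 15.

```c
/* Map 0..15 to '0'..'9' or 'A'..'F'. The caller must pass a value in range. */
static int sketch_dec2hex(int d)
{
    return d < 10 ? '0' + d : 'A' + (d - 10);
}

/* Split a byte into two hexadecimal digits, high nibble first. */
static void sketch_byte_to_hex(unsigned char byte, char out[2])
{
    out[0] = (char) sketch_dec2hex(byte >> 4);
    out[1] = (char) sketch_dec2hex(byte & 0x0F);
}
```

For the byte 0xC3 this produces the digits 'C' and '3', the two characters that follow '%' in a URI escape.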

Definition at line 92 of file tld_domain_to_lowercase.c.

Referenced by tld_byte_out().

◆ tld_domain_to_lowercase()

char * tld_domain_to_lowercase ( const char *  domain)

This function will transform the input domain name to lowercase. You should call this function before you call the tld() function to make sure that the input data is in lowercase.

This function interprets the %XX input data and transforms it to characters. The function further converts UTF-8 characters to wide characters to be able to determine the lowercase version.

Warning
The function allocates a new buffer to hold the result. You are responsible for freeing that buffer, so the following code is wrong:
struct tld_info info;
tld(tld_domain_to_lowercase(domain), &info);
// WRONG: tld_domain_to_lowercase() leaked a heap buffer

In C++ you may use a std::unique_ptr<> with free as the deleter so you do not have to make the call by hand (especially if your code may throw exceptions):

std::unique_ptr<char, void(*)(char *)> lowercase_domain(tld_domain_to_lowercase(domain.c_str()), reinterpret_cast<void(*)(char *)>(&::free));
Parameters
[in]  domain  The input domain to convert to lowercase.
Returns
A pointer to the resulting conversion, NULL if the buffer cannot be allocated or the input data is considered invalid.
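
The ownership rule from the warning can be sketched in plain C: keep the returned pointer, check it for NULL, use it, then free it. So that the block compiles without libtld, sketch_domain_to_lowercase() is a hypothetical ASCII-only stand-in that only mimics the allocation contract. With the real library you would call tld_domain_to_lowercase() in its place and pass the result to tld().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: ASCII only, no %XX or UTF-8 handling.
 * Like the documented function it returns a malloc()'ed buffer,
 * or NULL on failure, and the caller owns the result. */
static char *sketch_domain_to_lowercase(const char *domain)
{
    size_t len, i;
    char *result;

    if (domain == NULL) return NULL;
    len = strlen(domain);
    result = malloc(len + 1);
    if (result == NULL) return NULL;
    for (i = 0; i < len; ++i) {
        result[i] = (char) tolower((unsigned char) domain[i]);
    }
    result[len] = '\0';
    return result;
}

/* Correct pattern: store the pointer so it can be released afterwards.
 * Returns 1 or 0 for the comparison, -1 when the conversion failed. */
static int sketch_lowercase_equals(const char *domain, const char *expected)
{
    char *lower = sketch_domain_to_lowercase(domain);
    int same;

    if (lower == NULL) return -1;          /* invalid input or out of memory */
    same = strcmp(lower, expected) == 0;   /* real code would call tld(lower, &info) here */
    free(lower);                           /* the caller releases the buffer */
    return same;
}
```

The difference from the wrong snippet is only the named variable: without it there is nothing left to pass to free().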

Definition at line 489 of file tld_domain_to_lowercase.c.

References tld_mbtowc(), and tld_wctomb().

Referenced by tld_email_list::tld_email_t::parse().

◆ tld_hex2dec()

static int tld_hex2dec ( char  c)
static

This function transforms the specified character c to a number from 0 to 15.

The function supports upper and lower case.

Parameters
[in]  c  A hexadecimal character to transform to a number.
Returns
The number corresponding to the hexadecimal character or -1 if the character is not 0-9, A-F, nor a-f.

Definition at line 61 of file tld_domain_to_lowercase.c.

Referenced by tld_byte_in().

◆ tld_mbtowc()

static wint_t tld_mbtowc ( const char **  s)
static

This function transforms a UTF-8 encoded character, which may use 1 to 4 bytes, to a wide character (31 bit).

Bug:
This function transforms letters to lowercase on the fly (one by one) which may not always be correct in Unicode (some languages make use of multiple characters to properly calculate various things such as uppercase and lowercase characters.)
Parameters
[in]  s  A pointer to a string with possible UTF-8 bytes.
Returns
The corresponding UTF-32 character in lowercase, the NUL character ('\0') when the end of the string is reached, or -1 if the input is invalid.
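
A simplified sketch of such a decoder is shown below. It is not the library's code: sketch_mbtowc() is a hypothetical name, it reads raw bytes rather than going through tld_byte_in(), it returns a long instead of wint_t, and it relies on towlower(), whose result for non-ASCII characters depends on the current locale.

```c
#include <wctype.h>

/* Decode one UTF-8 sequence (1 to 4 bytes) from *s and lowercase it.
 * Returns the code point, 0 at the end of the string, or -1 when the
 * sequence is malformed. *s is advanced only on success. */
static long sketch_mbtowc(const char **s)
{
    const unsigned char *p = (const unsigned char *) *s;
    long wc;
    int extra, i;

    if (p[0] == 0) return 0;
    if (p[0] < 0x80)                       { wc = p[0];        extra = 0; }
    else if (p[0] >= 0xC2 && p[0] <= 0xDF) { wc = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0)        { wc = p[0] & 0x0F; extra = 2; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { wc = p[0] & 0x07; extra = 3; }
    else return -1;                        /* stray continuation or bad lead byte */

    for (i = 1; i <= extra; ++i) {
        /* A '\0' is not a continuation byte, so a truncated sequence
         * stops here without reading past the terminator. */
        if ((p[i] & 0xC0) != 0x80) return -1;
        wc = (wc << 6) | (p[i] & 0x3F);
    }

    /* Reject overlong forms, UTF-16 surrogates and out of range values. */
    if ((extra == 2 && wc < 0x800) || (extra == 3 && wc < 0x10000)
     || wc > 0x10FFFF || (wc >= 0xD800 && wc <= 0xDFFF)) {
        return -1;
    }

    *s += extra + 1;
    return (long) towlower((wint_t) wc);
}
```

Because each character is lowercased on its own, the sketch shares the limitation noted in the Bug paragraph above.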

Definition at line 298 of file tld_domain_to_lowercase.c.

References tld_byte_in().

Referenced by tld_domain_to_lowercase().

◆ tld_wctomb()

static int tld_wctomb ( wint_t  wc,
char **  s,
int *  max_length 
)
static

This function quickly transforms a wide character to UTF-8. The output buffer is pointed to by s and has max_length bytes left for output.

The function returns -1 if the character cannot be converted. These are the main reasons for failure:

  • the input wide character is not valid (out of bounds)
  • the input wide character is a UTF-16 surrogate (0xD800 to 0xDFFF)
  • the output buffer is full
  • the low 16 bits of the character are 0xFFFE or 0xFFFF

The function automatically adjusts the output buffer and max_length parameters.

Parameters
[in]  wc  The wide character to convert.
[in,out]  s  The pointer to the output string pointer.
[in,out]  max_length  The size of the output string buffer.
Returns
Zero on success, -1 on error.
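
The failure cases listed above can be illustrated with the following sketch. It is an approximation under stated assumptions, not the library's implementation: sketch_wctomb() is a hypothetical name, it takes an unsigned long instead of wint_t, it treats 0x10FFFF as the upper bound, and it writes raw UTF-8 bytes instead of passing each byte through tld_byte_out().

```c
/* Encode wc as UTF-8 into *s, which has *max_length bytes left.
 * On success *s and *max_length are adjusted and 0 is returned.
 * On failure nothing is written and -1 is returned. */
static int sketch_wctomb(unsigned long wc, char **s, int *max_length)
{
    int n, i;
    unsigned char lead;

    if (wc > 0x10FFFF) return -1;                 /* out of bounds (assumed limit) */
    if (wc >= 0xD800 && wc <= 0xDFFF) return -1;  /* UTF-16 surrogate */
    if ((wc & 0xFFFF) >= 0xFFFE) return -1;       /* ends with 0xFFFE or 0xFFFF */

    if (wc < 0x80)         { n = 1; lead = 0x00; }
    else if (wc < 0x800)   { n = 2; lead = 0xC0; }
    else if (wc < 0x10000) { n = 3; lead = 0xE0; }
    else                   { n = 4; lead = 0xF0; }

    if (*max_length < n) return -1;               /* output buffer is full */

    /* Fill the continuation bytes from the end, 6 bits at a time. */
    for (i = n - 1; i > 0; --i) {
        (*s)[i] = (char) (0x80 | (wc & 0x3F));
        wc >>= 6;
    }
    (*s)[0] = (char) (lead | wc);

    *s += n;
    *max_length -= n;
    return 0;
}
```

With U+20AC the sketch writes the three bytes 0xE2 0x82 0xAC and reduces max_length by 3.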

Definition at line 388 of file tld_domain_to_lowercase.c.

References tld_byte_out().

Referenced by tld_domain_to_lowercase().

This document is part of the Snap! Websites Project.

Copyright by Made to Order Software Corp.