Official Mafiree Blogs

Keeping you informed about Databases

Choosing the right character set
By Murali   |   June 14, 2019   |   Posted in : MySQL

                 
Modern applications often store data in many different languages. For those who know how computer encoding works, ASCII (American Standard Code for Information Interchange) was the original default encoding standard. This 7-bit encoding, stored in one byte per character, covers the letters of the English alphabet, the digits and the most commonly used special characters (!, ., * and so on). Over time, many non-English speakers also started using computers, and computers eventually had to support their languages as well. That required a byte-stream encoding able to represent a wide range of characters, English and non-English alike. UTF-8 is that encoding, and MySQL's original implementation of it is called utf8 (also known as utf8mb3).

In this blog post, we’ll look at the character set options available in MySQL and how to choose the right one.


What is a character set?

MySQL includes character set support that enables you to store data using a variety of character sets and perform comparisons according to a variety of collations. You can specify character sets at the server, database, table, and column level.
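
As a rough sketch of what "server, database, table, and column level" looks like in practice, the statements below set or inspect the character set at each level. The names shop, customers, name and country_code are hypothetical and exist only for illustration.

```sql
-- Server level: inspect the defaults the server is running with.
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- Database level: new tables in this database inherit these defaults.
-- (shop is a hypothetical database name.)
CREATE DATABASE shop
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Table level, with one column overriding the table default.
CREATE TABLE shop.customers (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name         VARCHAR(100) NOT NULL,
  -- Column level: two-letter codes only need ASCII.
  country_code CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

The most specific level wins: a column without its own CHARACTER SET clause uses the table default, a table without one uses the database default, and so on up to the server.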


What is a collation?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations.
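
To make the effect of a collation concrete, here is a small sketch that compares the same two strings under two different utf8mb4 collations. The column aliases are made up for readability, and the `_utf8mb4` introducer is used so the result does not depend on the connection character set.

```sql
-- List the collations that belong to one character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Same two strings, two collations of the same character set.
SELECT
  _utf8mb4'a' COLLATE utf8mb4_general_ci = _utf8mb4'A' AS equal_case_insensitive,  -- 1
  _utf8mb4'a' COLLATE utf8mb4_bin        = _utf8mb4'A' AS equal_binary;            -- 0
```

A `_ci` collation treats 'a' and 'A' as equal, while the `_bin` collation compares the underlying code points and therefore does not.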


How does a character set differ from a collation?

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.

What is the Unicode character set?

The Unicode Standard includes characters from the Basic Multilingual Plane (BMP) and supplementary characters that lie outside the BMP.

MySQL supports these Unicode character sets:

utf8 (utf8mb3) - A UTF-8 encoding of the Unicode character set using one to three bytes per character (BMP characters only).
utf8mb4 - A UTF-8 encoding of the Unicode character set using one to four bytes per character.
ucs2 - The UCS-2 encoding of the Unicode character set using two bytes per character.
utf16 - The UTF-16 encoding of the Unicode character set using two or four bytes per character.
utf32 - The UTF-32 encoding of the Unicode character set using four bytes per character.
 

Comparison of UTF-8, UTF-16 and UTF-32
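
The byte counts listed above can be checked directly in MySQL. The query below is a minimal sketch: it takes three sample characters, given as raw UTF-8 bytes so the result does not depend on the client encoding, and reports how many bytes each one needs in utf8mb4, utf16 and utf32. The alias names (samples, s and the column aliases) are arbitrary.

```sql
SELECT
  HEX(s)                          AS utf8_bytes_hex,
  CHAR_LENGTH(s)                  AS characters,
  LENGTH(s)                       AS utf8mb4_bytes,
  LENGTH(CONVERT(s USING utf16))  AS utf16_bytes,
  LENGTH(CONVERT(s USING utf32))  AS utf32_bytes
FROM (
  SELECT _utf8mb4 x'41' AS s              -- U+0041, Latin capital letter A
  UNION ALL
  SELECT _utf8mb4 x'E282AC'               -- U+20AC, euro sign (inside the BMP)
  UNION ALL
  SELECT _utf8mb4 x'F09F9883'             -- U+1F603, an emoji (outside the BMP)
) AS samples;
```

Each row is a single character. UTF-8 needs 1, 3 and 4 bytes respectively, UTF-16 needs 2, 2 and 4 bytes, and UTF-32 always needs 4 bytes. The last row is the case that the three-byte utf8 character set cannot store.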

 
Why is utf8mb4 better than utf8?


For a very long time, MySQL’s default character set was latin1, which handles English and most Western European text and common punctuation reasonably well. However, it has limited support for other languages, and it cannot store modern emoji characters. Starting with MySQL 8.0, the default is utf8mb4.

In MySQL, utf8 is an alias for utf8mb3. This character set only stores UTF-8 encoded characters that use 1 to 3 bytes. That is enough to cover the most commonly used characters, but it is not suitable for applications that accept user input where any character can be submitted (like emojis, which use 4 bytes). MySQL 5.5 and later provide a character set called utf8mb4. It fully supports Unicode, including supplementary (astral) characters, so it handles emoji and the Chinese characters that are missing from utf8.

When we try to insert characters that occupy 4 bytes (like emoji and some Chinese characters) into a utf8 column, MySQL throws the error below.

 

Incorrect string value: '\xF0\x9F\x98\x83 <...' for column 'summary' at row 1
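
The bytes in that message, F0 9F 98 83, are the UTF-8 encoding of U+1F603, a single emoji. The following sketch shows one way the error might be reproduced and avoided. The table names demo_mb3 and demo_mb4 are hypothetical, and the failure is what I would expect with strict SQL mode enabled; without strict mode MySQL typically stores a truncated value and raises a warning instead.

```sql
-- demo_mb3 and demo_mb4 are hypothetical tables used only for this sketch.
CREATE TABLE demo_mb3 (summary VARCHAR(50)) CHARACTER SET utf8;
CREATE TABLE demo_mb4 (summary VARCHAR(50)) CHARACTER SET utf8mb4;

-- The emoji is written as raw UTF-8 bytes so the client encoding does not matter.
-- Expected to fail in strict mode with "Incorrect string value" (error 1366):
INSERT INTO demo_mb3 (summary) VALUES (_utf8mb4 x'F09F9883');

-- Expected to succeed, because utf8mb4 can store 4-byte characters:
INSERT INTO demo_mb4 (summary) VALUES (_utf8mb4 x'F09F9883');

-- Clean up the demo tables.
DROP TABLE demo_mb3, demo_mb4;
```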

 

You can move from “utf8” to “utf8mb4” using the following SQL commands:
 

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
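
The VARCHAR(191) length in the last statement is a common choice for indexed columns: older InnoDB row formats limit an index key prefix to 767 bytes, and 767 / 4 rounds down to 191 characters at up to 4 bytes each. After converting, it is worth checking that nothing was left behind and that the connection itself uses utf8mb4. The queries below are a sketch that reuses the placeholder name database_name from the statements above.

```sql
-- Find text columns in the schema that are still not utf8mb4.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8mb4';

-- Check the character sets used by the server and the current connection.
SHOW VARIABLES LIKE 'character_set%';

-- Make the current connection send and receive utf8mb4.
SET NAMES utf8mb4;
```

If the connection character set is still utf8 or latin1, 4-byte characters can be damaged on the way in even though the table is able to store them, so the application's connection settings need the same change.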

 

Verdict!

If you use MySQL or MariaDB, please use utf8mb4 instead of utf8. It supports supplementary characters, and recent MySQL versions have been optimized for it, so it is also the better choice for performance.

 




Copyright © 2019 - All Rights Reserved - Mafiree