Resolve Question Mark Character in DB (RQMCD)

This happened to me while I was restoring user comment data on one of my blog posts. One commenter had used Russian (Cyrillic) characters in his nickname, and when the data was inserted into the database, the nickname came out as a row of question marks.

Yup, it turned into question marks. Why? After looking into it, I found that the problem was the character set and collation ("COLLATION" in phpMyAdmin) of the database and table, which were still at the default, latin1_swedish_ci. The latin1 character set has no codes for Arabic or Russian (Cyrillic) letters, so those characters were replaced with question marks (????) when they were saved.
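A minimal Python sketch of that loss, assuming the server substitutes "?" for every character the column's character set cannot hold. Python's `errors="replace"` handler does the same substitution when encoding, so it models the effect; it is not MySQL's own conversion code, and the sample names are made up.

```python
def store_as_latin1(text: str) -> str:
    """Simulate writing text into a latin1 column and reading it back.

    Characters latin1 cannot represent are replaced with "?" on the way in,
    so the original letters are gone for good.
    """
    raw = text.encode("latin-1", errors="replace")  # unmappable chars -> b"?"
    return raw.decode("latin-1")


print(store_as_latin1("Дмитрий"))  # ??????? (Cyrillic is not in latin1)
print(store_as_latin1("café"))     # café (é exists in latin1)
print(store_as_latin1("مرحبا"))    # ????? (Arabic is not in latin1)
```

Because "Иван" and "Петр" both come back as "????", the stored value no longer says what was typed, so affected rows have to be restored from their original source after the column is converted.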

Solution

I switched the column from the default latin1_swedish_ci collation to utf8_general_ci by changing its character set to utf8 with a query in the phpMyAdmin SQL console. Here is the code:

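-- VARCHAR(255) is a placeholder: use the column's existing type and attributes
ALTER TABLE `your_table_name` CHANGE `your_column_name` `your_column_name` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;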
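`CHANGE` takes the old column name, the new name and a complete column definition, so the name appears twice and everything the column already had (type, NOT NULL, DEFAULT) must be restated. The helper below is a hypothetical sketch that assembles such a statement; the table `comments`, column `nickname` and type `VARCHAR(100)` are illustrative assumptions, not values from this post.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def change_column_sql(table: str, column: str, column_type: str,
                      attributes: str = "",
                      charset: str = "utf8",
                      collation: str = "utf8_general_ci") -> str:
    """Build ALTER TABLE ... CHANGE for one column, keeping its name.

    column_type is the bare type (e.g. "VARCHAR(100)"); CHARACTER SET and
    COLLATE follow it directly. attributes holds the rest of the existing
    definition (e.g. "NOT NULL DEFAULT ''"), because CHANGE replaces it all.
    """
    col = quote_ident(column)
    sql = (f"ALTER TABLE {quote_ident(table)} CHANGE {col} {col} "
           f"{column_type} CHARACTER SET {charset} COLLATE {collation}")
    if attributes:
        sql += " " + attributes
    return sql + ";"


print(change_column_sql("comments", "nickname", "VARCHAR(100)", "NOT NULL"))
```

Copy the existing definition from `SHOW CREATE TABLE` rather than guessing it, otherwise attributes such as NOT NULL are silently dropped.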

As a result, special characters and symbols, such as Greek letters, now display correctly.

Explanation

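The default character set for MySQL on (mt) Media Temple is latin1, with the default collation latin1_swedish_ci. latin1 is a common encoding for Western European (Latin-script) text, but it cannot represent other scripts. You can change the encoding: utf8 is the usual choice when you need non-Latin characters, since it can represent Cyrillic, Arabic, Greek, Chinese and other scripts.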

You can change the character set and collation of your database and tables either through the phpMyAdmin "Operations" tab or from the command line. Here's how:

Log in to phpMyAdmin.
Select your database from the list on the left.
Click “Operations” in the top set of tabs.
In the Collation box, select utf8_general_ci from the dropdown menu.
Click Go.
Do the same for your tables as necessary, since changing the database collation on its own only sets the default for tables created later.

Command Line

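On a Plesk server, you can log in as the admin user, reading the password from /etc/psa/.psa.shadow:

mysql -u admin -p`cat /etc/psa/.psa.shadow`

On other servers, run mysql -u your_username -p instead and enter your database password when prompted.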

Check that the character set you want (utf8) is available:

mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+

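Run the following command to change your database's default character set and collation (existing tables keep their current settings; the new defaults apply to tables created afterwards):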

ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;

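Run the following command to change a table's default character set and collation and convert its existing text columns as well (a plain ALTER TABLE tablename CHARACTER SET ... only changes the default for columns added later):

ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;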
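When several tables need fixing, the statements can be generated rather than typed one by one. This is a hypothetical sketch: the database `blog` and tables `comments` and `posts` are assumptions, and it uses `ALTER TABLE ... CONVERT TO CHARACTER SET`, which also converts existing text columns, whereas a bare `CHARACTER SET` clause only changes the table default.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def conversion_script(database: str, tables: List[str],
                      charset: str = "utf8",
                      collation: str = "utf8_general_ci") -> List[str]:
    """Statements that change the database default, then convert each
    listed table (and its existing text columns) to the new character set."""
    db = quote_ident(database)
    statements = [f"ALTER DATABASE {db} CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for stmt in conversion_script("blog", ["comments", "posts"]):
    print(stmt)
```

According to the MySQL manual, CONVERT TO may widen VARCHAR and TEXT columns (for example TEXT to MEDIUMTEXT) so they can still hold as many characters as before, so review the resulting table definitions afterwards.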

About utf8_general_ci

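utf8_general_ci is the simpler of the two common utf8 collations. utf8_unicode_ci instead uses the standard Unicode Collation Algorithm and supports so-called expansions and ligatures: for example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is treated as "ss", and Œ (U+0152 LATIN CAPITAL LIGATURE OE) is treated as "OE". utf8_general_ci does not support these expansions.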
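Python has no built-in Unicode Collation Algorithm, but its `str.casefold()` performs a comparable expansion for ß, which makes the idea of one character comparing equal to two easy to see. This is an analogy only: case folding is not the UCA that utf8_unicode_ci uses, and it does not expand Œ.

```python
def loose_key(s: str) -> str:
    # str.casefold() applies Unicode full case folding, which expands ß to "ss".
    return s.casefold()


print(loose_key("Straße") == loose_key("STRASSE"))  # True: ß expands to "ss"
print("Straße".lower() == "STRASSE".lower())        # False: lower() keeps ß
```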

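In the Unicode standard, UTF-8 can encode every Unicode character using 1 to 4 bytes: 1 byte for ASCII, 2 bytes for accented Latin letters and for Greek, Cyrillic, Arabic and Hebrew, 3 bytes for most Chinese, Japanese and Korean characters, and 4 bytes for characters outside the Basic Multilingual Plane, such as emoji and rare CJK ideographs.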
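The byte counts above can be checked directly by encoding single characters to UTF-8. The sample characters are arbitrary picks from each group.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = [("A", "ASCII"), ("é", "accented Latin"), ("Ж", "Cyrillic"),
           ("ع", "Arabic"), ("中", "Chinese"), ("😀", "emoji")]
for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```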

In MySQL, however, the utf8 character set (default collation utf8_general_ci) stores at most 3 bytes per character, so it cannot hold 4-byte characters. For those, MySQL 5.5.3 and later provide the utf8mb4 character set.
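A quick pre-check before inserting text into a MySQL utf8 column, sketched under the assumption that "fits" means every character lies in the Basic Multilingual Plane (U+0000 to U+FFFF), which is exactly the set of characters that need 3 bytes or fewer in UTF-8. The function name is hypothetical.

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the Basic Multilingual Plane, i.e.
    needs at most 3 bytes in UTF-8, the limit of MySQL's utf8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_mysql_utf8("Привет, 世界"))  # True
print(fits_mysql_utf8("nice post 😀"))  # False: the emoji needs 4 bytes
```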

As for storage, utf8 is a variable-width encoding: a character that needs 1 byte takes only 1 byte, unlike utf32, which always uses 4 bytes per character (ucs2 always uses 2).
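To make the space difference concrete, this sketch counts the bytes the text itself needs in each encoding. It ignores MySQL's own row and length-prefix overhead, and the sample strings are arbitrary.

```python
def storage_bytes(text: str) -> dict:
    """Bytes needed for the text itself in each encoding (no row overhead)."""
    return {
        "utf8": len(text.encode("utf-8")),
        # "utf-32-le" rather than "utf-32" so Python does not prepend a 4-byte BOM
        "utf32": len(text.encode("utf-32-le")),
    }


print(storage_bytes("hello"))   # {'utf8': 5, 'utf32': 20}
print(storage_bytes("Привет"))  # {'utf8': 12, 'utf32': 24}
```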
