If ‘ls’ or ‘find’ shows you files with ‘?’ marks in their names, most likely the names contain characters from another language (Chinese, Hebrew, Japanese, Russian).

I’m not just talking about ‘?’. I’m also talking about characters like this: é

For example:

# ls -l
total 0
drwxrwx---+ 1 admin admin 34 Nov 1 07:27 data1
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ??????????????????
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ??????
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ??????
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ??????
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ??????

It should show up like this (the fix below shows you how to get to this point):

# ls -l
total 0
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 バックアップ
drwxrwx---+ 1 admin admin 34 Nov 1 07:27 data1
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 写真
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 動画
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 文書
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 音楽

SIDENOTE:
I apologize if the names above say anything bad. They are just an example, and I don’t know the language.

To fix this, there are 2 steps.

You need to set UTF-8 in PuTTY and in bash’s environment variables.

SIDENOTE:
Your system might already have Step 1 or Step 2 done. But both steps need to be complete.

Step 1) “PuTTY / terminal settings: set the encoding of the terminal program.”

Setting UTF-8 in PuTTY is half the solution (it lets the PuTTY program show you the special characters).

If you skip Step 1 and go straight to Step 2, all the characters from other languages will still show up as ?? or other bad characters.

SIDENOTE:
If you’re not using PuTTY, you must figure out how to set the character encoding of your terminal program to UTF-8.

SIDENOTE:
UTF-8 supports a lot of characters: it can encode every Unicode code point, more than 1.1 million of them. ASCII only has 128 characters (7 bits), and the 8-bit “extended ASCII” code pages such as ISO-8859-1 have 256. For English we can get away with those 128 or 256 characters. But for other languages we need UTF-8, or else the terminal tries to show the bytes of those characters using the wrong letters.
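To make the byte counts concrete, here is a small sketch that prints how many bytes a few characters take in UTF-8. The helper name byte_count is made up for this example, and the octal escapes are just the UTF-8 bytes of each character written out by hand.

```shell
# byte_count FORMAT: print how many bytes printf produces for FORMAT.
# FORMAT is a printf format string, here made of octal escapes.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

# A is plain ASCII, so it is a single byte.
echo "A (U+0041): $(byte_count 'A') byte(s)"
# The accented e from the intro is two bytes: C3 A9.
echo "e-acute (U+00E9): $(byte_count '\303\251') byte(s)"
# A CJK character such as U+5199 is three bytes: E5 86 99.
echo "CJK (U+5199): $(byte_count '\345\206\231') byte(s)"
```

A terminal that treats every byte as its own character will show two or three wrong letters for each of the non-ASCII characters here.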

What to do:
Configuration -> Window -> Translation -> Remote character set -> UTF-8

Now you’re done with this step.
You will now see UTF-8 characters from programs in the Linux environment that already output UTF-8.
For example “journalctl” or “cat” will show you UTF-8 characters. But things like “ls” and “find” will not.
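A quick way to check Step 1 on its own is to send known UTF-8 bytes straight to the terminal, without involving “ls” or any locale variables. This is only a sketch; the function name utf8_probe is made up, and the bytes are the UTF-8 encoding of the two characters U+5199 U+771F (one of the directory names in the example listing).

```shell
# Print six raw bytes (E5 86 99 E7 9C 9F) followed by a newline.
# printf copies the bytes as they are, so the result depends only on
# how the terminal program decodes them.
utf8_probe() {
    printf '\345\206\231\347\234\237\n'
}

utf8_probe
```

If the terminal is set to UTF-8 you should see two CJK characters. If you see several accented Latin letters or other junk instead, the terminal is still decoding with a different character set.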

Sidenote: The things you actually need to do to start a clean UTF-8 environment are listed under “What to do” so that they stand out.

Step 2) “Setting the environment language/locale variables for apps that look at those variables to find out their encoding.”

Now we need to set UTF-8 inside the shell. Lots of programs like “ls” and “find” look at the environment variables “LC_ALL”, “LC_CTYPE” and “LANG” to know which character encoding to use. They have been programmed to look at those variables (through the C library’s setlocale call). Why more than one variable? They have different jobs: LANG is the default for everything, the individual LC_* variables (LC_CTYPE is the one for character encoding) each override LANG for one category, and LC_ALL overrides all of them. LANGUAGE is a GNU extension that only picks the language of translated messages, not the encoding. Setting the three below just makes sure nothing else on the system overrides your choice.

What to do: set the environment variables to UTF-8 by running this:
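The lookup order that the C library follows for the character encoding is LC_ALL first, then LC_CTYPE, then LANG, with empty values counting as unset. Here is a rough sketch of that order; the function name effective_ctype is made up, and the final fallback to “C” is what glibc does (POSIX leaves the default up to the implementation).

```shell
# effective_ctype LC_ALL LC_CTYPE LANG
# Print the locale name that wins for character encoding.
# The first non-empty argument is used; "C" is the fallback.
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf 'C\n'
    fi
}

# Show what the current shell's settings resolve to.
effective_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"
```

This is why a stray LC_ALL=C somewhere in a startup file beats a correct LANG setting.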
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
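The exports only help if the en_US.UTF-8 locale is actually installed. If it is not, programs quietly fall back to the plain “C” locale. The sketch below checks the output of “locale -a” for a given name. The function name locale_available is made up, and the comparison ignores case and the hyphen because many systems list the locale as en_US.utf8.

```shell
# locale_available NAME: read locale names on stdin (one per line, as
# printed by "locale -a") and succeed if NAME is among them.
locale_available() {
    wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    while IFS= read -r name; do
        have=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -d '-')
        [ "$have" = "$wanted" ] && return 0
    done
    return 1
}

# Guarded so the check does not break on systems without the locale tool.
if command -v locale >/dev/null 2>&1 &&
   locale -a 2>/dev/null | locale_available en_US.UTF-8; then
    echo "en_US.UTF-8 is installed"
else
    echo "en_US.UTF-8 not found; look at 'locale -a' for another UTF-8 locale"
fi
```

Any other installed UTF-8 locale (for example one for your own language) should work just as well in the export lines.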

Now you’re done. Your “ls” and other tools can find out which encoding to use by looking at those environment variables (they already look at them; you don’t need to tell the apps to do so).

To undo Step 2 and get back to how things were (seeing question mark files):
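To see the effect of the variables without touching your real files, you can compare the two locales on a scratch directory. This is a sketch that assumes GNU ls; the variable names are made up, and the -q option is used to force the “print ? for non-printable characters” behavior even when the output is not a terminal.

```shell
# Make a scratch directory holding one directory whose name is the
# six UTF-8 bytes E5 86 99 E7 9C 9F (two CJK characters).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/$(printf '\345\206\231\347\234\237')"

# In the C locale every byte above 127 is non-printable,
# so ls prints one '?' per byte.
c_listing=$(LC_ALL=C ls -q "$demo_dir")
echo "C locale:     $c_listing"

# With a UTF-8 locale the same bytes are two printable characters.
# If en_US.UTF-8 is not installed, this falls back to the C behavior.
utf8_listing=$(LC_ALL=en_US.UTF-8 ls -q "$demo_dir")
echo "UTF-8 locale: $utf8_listing"

rm -rf "$demo_dir"
```

The six question marks in the first line match the ?????? entries in the listing at the top: each of those names is two characters, three bytes each.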

unset LC_ALL
unset LANG
unset LANGUAGE

SIDENOTE:
export puts the variables into the environment of the current shell and its child processes (it is not system-wide). unset removes the variables. you cant

SIDENOTE:
You can make this persistent across shell sessions by putting the export lines into your ~/.bashrc
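If you script this, it is worth making sure the lines are not added twice. Here is one possible sketch; the function names append_once and persist_utf8_locale are made up, and the file to edit is passed as an argument so nothing is touched until you call it.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical
# line is already there.
append_once() {
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF -- "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# persist_utf8_locale FILE: add the three export lines to FILE.
persist_utf8_locale() {
    for name in LC_ALL LANG LANGUAGE; do
        append_once "$1" "export $name=en_US.UTF-8"
    done
}
```

Call it as persist_utf8_locale ~/.bashrc and then open a new shell (or source the file) for the change to take effect.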

SIDENOTE:
You need to export the variables, not just set them. So doing this has no effect on the programs you run (unless the variables were already exported earlier):

LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8

So you must do this:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

Why do we need to export and not simply set a variable? This article explains it well: http://askubuntu.com/questions/205688/whats-the-difference-between-set-export-and-env-and-when-should-i-use-each. A plain assignment creates a variable that only the current shell can see. When you use export, the variable becomes part of the environment, so the current shell and every child process it starts get a copy (parent shells are never affected). Programs like “ls” run as child processes of your shell, so you need to export the variables for them to see them. The unset command removes both exported and regular variables.
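You can watch the difference between a plain assignment and export with a throwaway variable. In this sketch DEMO_VAR and the *_result names are made up for the demonstration, and “sh -c” stands in for any child program such as “ls”.

```shell
# Start from a clean state.
unset DEMO_VAR

# 1) Plain assignment: the variable lives in this shell only.
DEMO_VAR=hello
plain_result=$(sh -c 'printf %s "${DEMO_VAR-not set}"')
echo "after plain assignment, child sees: $plain_result"

# 2) export: the variable is copied into the environment of children.
export DEMO_VAR
exported_result=$(sh -c 'printf %s "${DEMO_VAR-not set}"')
echo "after export, child sees: $exported_result"

# 3) unset removes it again, exported or not.
unset DEMO_VAR
unset_result=$(sh -c 'printf %s "${DEMO_VAR-not set}"')
echo "after unset, child sees: $unset_result"
```

The child reports “not set”, then “hello”, then “not set” again.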

SIDENOTE:
Here is an article on Step 2: http://perlgeek.de/en/article/set-up-a-clean-utf8-environment. Most of the articles on the web about setting the encoding in Linux assume that you already completed Step 1, so don’t expect them to mention anything about setting the Translation correctly in PuTTY, as there are other ways to log in to a system.

SIDENOTE:
If you did Step 2 but not Step 1, you might get more than just ??? marks, like this, but the characters will still be incorrect:

# ls -l
total 0
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ããã¯ã¢ãã
drwxrwx---+ 1 admin admin 34 Nov 1 07:27 data1
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 åç
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 åç»
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 ææ
drwxrwx---+ 1 guest guest 20 Oct 31 22:57 é³æ¥½
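Garbage like the listing above is what you get when UTF-8 bytes are decoded one byte at a time with a Latin-1 style character set. You can reproduce the effect with iconv. This is a sketch that assumes the iconv tool is installed; the function name misread_as_latin1 is made up.

```shell
# misread_as_latin1: treat the bytes on stdin as ISO-8859-1 (Latin-1)
# and re-encode them as UTF-8. This mimics a terminal that decodes
# every byte as a separate character.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# C3 A9 is the UTF-8 encoding of one accented e. Misread as Latin-1 it
# becomes two characters: A-tilde (C3) and the copyright sign (A9).
printf '\303\251' | misread_as_latin1
echo
```

On a correctly set up UTF-8 terminal this prints two wrong characters instead of the single accented e, which is the same kind of damage as in the directory names above.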
