We also have StandardCharsets. It’s great to have a class with these constants. But UTF-8, UTF-16BE and UTF-16LE are all encodings of the same set of characters: Unicode.
Why didn’t they just call them Encoding and StandardEncodings? Those would be the correct names. Beginners are already confused enough about these things.
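To make the point concrete, here is a minimal sketch that encodes one and the same Unicode character with three of the StandardCharsets constants. The class name and the hex helper are made up for this illustration; only getBytes and the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one character set (Unicode), three encodings.
class SameCharacterDifferentBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A single Unicode character: U+00E9 (e with acute accent).
        String text = "\u00e9";
        Charset[] encodings = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE
        };
        for (Charset encoding : encodings) {
            System.out.println(encoding.name() + ": " + hex(text.getBytes(encoding)));
        }
        // Prints:
        // UTF-8: C3 A9
        // UTF-16BE: 00 E9
        // UTF-16LE: E9 00
    }
}
```

The character is the same in all three cases; only the bytes differ. That is what you would expect from something called an encoding, not from something called a character set.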
The Javadoc of the Charset class has a section on terminology:
The name of this class is taken from the terms used in RFC 2278. In that document a charset is defined as the combination of one or more coded character sets and a character-encoding scheme. (This definition is confusing; some other software systems define charset as a synonym for coded character set.)
Java documentation of Charset
And in RFC 2278 itself you can find a historical note, which explains why “charset” was used instead of “encoding”:
HISTORICAL NOTE: The term “character set” was originally used in MIME to describe such straightforward schemes as US-ASCII and ISO-8859-1 which consist of a small set of characters and a simple one-to-one mapping from single octets to single characters. Multi-octet character encoding schemes and switching techniques make the situation much more complex. As such, the definition of this term was revised to emphasize both the conversion aspect of the process, and the term itself has been changed to “charset” to emphasize that it is not, after all, just a set of characters. A discussion of these issues as well as specification of standard terminology for use in the IETF appears in RFC 2130.
RFC 2278
So the name exists only because, in the early days, there were no separate encodings. A character set was simply an ordered set of characters, and you would always store each character’s number directly as a single byte (often using only 7 bits).
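Now we have the UTF encodings, which don’t work like that: the same Unicode character can take anywhere from one to four bytes, depending on the character and on the encoding. The distinction between charset and encoding is important.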
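The difference between the old one-byte-per-character schemes and UTF can be sketched in a few lines. The class and method names are made up for this illustration; it assumes nothing beyond String.getBytes and the StandardCharsets constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: fixed one-byte mapping versus variable-length UTF-8.
class OneByteVersusMany {

    // In ISO-8859-1 the stored byte is simply the character's number.
    static int latin1Byte(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1)[0] & 0xFF;
    }

    // Number of bytes UTF-8 needs for the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Old style: character number and byte value are identical.
        System.out.println(latin1Byte('A') == 0x41);      // true
        System.out.println(latin1Byte('\u00e9') == 0xE9); // true

        // UTF-8: the byte count depends on the character.
        System.out.println(utf8Length("A"));              // 1
        System.out.println(utf8Length("\u00e9"));         // 2 (U+00E9)
        System.out.println(utf8Length("\u20ac"));         // 3 (U+20AC, euro sign)
        System.out.println(utf8Length("\uD83D\uDE00"));   // 4 (U+1F600, an emoji)
    }
}
```

With ISO-8859-1 you can get away with treating “character set” and “encoding” as the same thing. With UTF-8 you cannot, because the bytes no longer match the character numbers.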