This is an
excerpt from the book
Advanced PL/SQL: The Definitive Reference by Boobal Ganesan.
Unicode is a universal encoded
character set that allows us to store characters from
multiple languages. Unicode covers all characters,
irrespective of the program, language or platform, and
assigns a unique code value to each of them for processing.
Supplementary Characters
The initial version of Unicode used
16 bits for encoding each character. With 2 bytes per
character, only 65,536 characters in total could be
represented. This was not sufficient to represent all the
characters in the world. To overcome this limitation, the
Unicode standard defined the supplementary characters. The
supplementary characters are the characters in the Unicode
character set that lie outside the Basic Multilingual Plane
(BMP). The BMP consists of the first 65,536 code values in
the Unicode character set (U+0000 to U+FFFF), and the
remaining code values (U+10000 to U+10FFFF) are used by the
supplementary characters.
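The BMP boundary described above can be checked with a few lines of code. The following is a minimal sketch in Python, used here purely for illustration; the names `BMP_LAST`, `UNICODE_LAST` and `is_supplementary` are hypothetical and do not come from the book.

```python
# Illustrative sketch only: the constant and function names are assumptions.
BMP_LAST = 0xFFFF          # last code value of the Basic Multilingual Plane
UNICODE_LAST = 0x10FFFF    # last code value of the Unicode code space


def is_supplementary(ch: str) -> bool:
    """Return True if the single character ch lies outside the BMP."""
    return BMP_LAST < ord(ch) <= UNICODE_LAST


# One ASCII letter, one CJK ideograph and one emoji.
for ch in ["A", "\u4e2d", "\U0001F600"]:
    kind = "supplementary" if is_supplementary(ch) else "BMP"
    print(f"U+{ord(ch):04X} is a {kind} character")
```

The BMP holds `BMP_LAST + 1` code values, which is the 65,536 figure quoted above.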
Unicode Encodings
There is more than one Unicode
encoding standard, and they differ in the way the
characters are represented by binary codes.
Converting between these standards can be done using a
simple algorithm based on bitwise operations. The different
encoding schemes are described below.
UTF-8 Encoding
UTF-8 is an 8-bit variable-width
encoding method which is a strict superset of 7-bit
ASCII. This means that all the characters
from 7-bit ASCII are available in UTF-8
with the same code values. One Unicode character in this
encoding method can occupy 8, 16, 24 or 32 bits
(1 to 4 bytes). This encoding method is widely used on UNIX
platforms, in HTML and by most internet browsers. The main
advantage of UTF-8 encoding is the ease of migration from
7-bit ASCII, because existing ASCII data is already valid
UTF-8.
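The variable width of UTF-8 and its compatibility with ASCII can be observed directly. This is a minimal sketch in Python, chosen only for illustration; the helper name `utf8_length` and the sample characters are assumptions, not part of the book.

```python
# Illustrative sketch only: utf8_length and the samples are assumptions.
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter e with acute accent",
    "\u4e2d": "CJK ideograph",
    "\U0001F600": "supplementary character (emoji)",
}

for ch, label in samples.items():
    size = utf8_length(ch)
    print(f"U+{ord(ch):04X} ({label}): {size} byte(s), {size * 8} bits")

# ASCII text produces identical bytes in both encodings.
ascii_text = "PL/SQL"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII bytes identical in UTF-8:", same_bytes)
```

The four samples take 1, 2, 3 and 4 bytes respectively, which matches the 8-bit to 32-bit range described above.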
UCS-2 Encoding
This is a 16-bit fixed-width encoding
method, where each character is 16 bits in size regardless of
the language. UCS-2 can encode only the characters defined up
to Unicode standard 3.0, so it cannot represent
supplementary characters. The main
advantage of this encoding method is the faster processing
of strings, as all characters are of the same size.
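The processing advantage of a fixed width comes from the fact that the character at position i always starts at byte offset 2 * i, so no scanning is needed. The sketch below illustrates this in Python under one stated assumption: Python has no dedicated UCS-2 codec, so `utf-16-be` is used as a stand-in, which yields the same bytes as long as the text contains only BMP characters. The function name `ucs2_char_at` is hypothetical.

```python
# Illustrative sketch only. Assumption: "utf-16-be" stands in for UCS-2,
# which is valid here because the sample text contains only BMP characters.
def ucs2_char_at(data: bytes, index: int) -> str:
    """Return the character at index from big-endian 16-bit fixed-width data."""
    offset = index * 2                       # every character is 2 bytes wide
    unit = int.from_bytes(data[offset:offset + 2], "big")
    return chr(unit)


text = "Oracle\u4e2d\u6587"                  # six Latin letters, two CJK characters
data = text.encode("utf-16-be")

print(len(text), "characters stored in", len(data), "bytes")
print("Character at index 6: U+%04X" % ord(ucs2_char_at(data, 6)))
```

Direct offset arithmetic of this kind is not possible with a variable-width encoding, where locating the n-th character requires scanning from the start.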
UTF-16 Encoding
UTF-16 is a 16-bit variable-width
encoding of Unicode. It is essentially an extension of the
UCS-2 method that adds support for the supplementary
characters defined in Unicode 3.1.
One character can be either 16 bits
or 32 bits in this encoding, and the supplementary characters
are represented in 32 bits. The main advantage of this
encoding method is lower memory consumption for Asian text:
most Asian characters occupy 16 bits in UTF-16,
whereas UTF-8 needs 24 bits to store the same
character.
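The 32-bit form of a supplementary character in UTF-16 is a pair of 16-bit units (a surrogate pair) derived from the code value with simple bitwise operations. The following Python sketch shows that calculation and the storage comparison with UTF-8; the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and the sample text is an assumption.

```python
# Illustrative sketch only: the function names are assumptions.
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code value into its two 16-bit UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 | (offset >> 10)           # upper 10 bits
    low = 0xDC00 | (offset & 0x3FF)          # lower 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine two 16-bit UTF-16 units into the original code value."""
    return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF))


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")

# Storage comparison for two CJK characters from the BMP.
asian_text = "\u4e2d\u6587"
utf16_size = len(asian_text.encode("utf-16-be"))
utf8_size = len(asian_text.encode("utf-8"))
print(f"UTF-16: {utf16_size} bytes, UTF-8: {utf8_size} bytes")
```

For this two-character sample, UTF-16 needs 4 bytes and UTF-8 needs 6 bytes, which reflects the 16-bit versus 24-bit difference described above.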
Copyright © 1996 - 2020
All rights reserved by
Burleson
Oracle ®
is the registered trademark of Oracle Corporation.