Invalid data usually occurs in a database because the
NLS_LANG parameter is not set properly on the
client. The NLS_LANG value should reflect the
client operating system code page. For example, in an
English Windows environment, the code page is WE8MSWIN1252.
When the NLS_LANG parameter is set properly,
the database can automatically convert incoming data from
the client operating system. When the NLS_LANG
parameter is not set properly, then the data coming into the
database is not converted properly. For example, suppose
that the database character set is AL32UTF8, the client is
an English Windows operating system, and the NLS_LANG
setting on the client is AL32UTF8. Data coming into the
database is encoded in WE8MSWIN1252 and is not converted to
AL32UTF8 data because the NLS_LANG setting on
the client matches the database character set. Thus Oracle
assumes that no conversion is necessary, and invalid data is
entered into the database.
This can lead to
two possible data inconsistency problems. One problem occurs
when a database contains data from a character set that is
different from the database character set but the same code
points exist in both character sets. For example, if the
database character set is WE8ISO8859P1 and the
NLS_LANG setting of the Chinese Windows NT client is
SIMPLIFIED CHINESE_CHINA.WE8ISO8859P1, then all multibyte
Chinese data (from the ZHS16GBK character set) is stored as
multiples of single-byte WE8ISO8859P1 data. This means that
Oracle treats these characters as single-byte WE8ISO8859P1
characters.