Friday, March 4, 2011

Some Important Points in JAVA


Internationalization is the process of designing software so that it can be adapted (localized) to various languages and regions easily, cost-effectively, and in particular without engineering changes to the software. This generally involves isolating the parts of a program that are dependent on language and culture. For example, the text of error messages must be kept separate from program source code because it must be translated during localization.
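As a rough illustration of keeping message text apart from program logic, the sketch below uses the JDK's java.util.ListResourceBundle. The class names, the key "file.notFound", and the German text are made up for this example, and the hand-written bundleFor method is an assumption that stands in for the name-based lookup ResourceBundle.getBundle would normally do, only so that everything fits in one file.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

final class MessageCatalogSketch {

    // English texts (hypothetical). Translators edit bundles like this one,
    // never the program logic further down.
    static final class MessagesEn extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "file.notFound", "File not found" } };
        }
    }

    // German texts (hypothetical) for the same keys.
    static final class MessagesDe extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "file.notFound", "Datei nicht gefunden" } };
        }
    }

    // Simplified selection by language; a real application would more likely
    // call ResourceBundle.getBundle(baseName, locale) instead.
    static ResourceBundle bundleFor(Locale locale) {
        return "de".equals(locale.getLanguage()) ? new MessagesDe() : new MessagesEn();
    }

    // The calling code only knows the key, not the translated text.
    static String message(String key, Locale locale) {
        return bundleFor(locale).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(message("file.notFound", Locale.US));
        System.out.println(message("file.notFound", Locale.GERMANY));
    }
}
```

Adding another language in this sketch means adding another bundle class, while the message method stays unchanged.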
Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as user interface labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.
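The culture-specific formatting mentioned above can be tried out with java.text.NumberFormat and java.text.DateFormat. This is only a small sketch: the class and method names are invented for the example, and the exact date and currency strings depend on the locale data shipped with the JRE in use.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

final class LocaleFormattingSketch {

    // Plain number with locale-specific grouping and decimal separators.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Monetary value with the locale's currency symbol and layout.
    static String formatMoney(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Long-form date in the locale's conventions.
    static String formatDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.LONG, locale).format(date);
    }

    public static void main(String[] args) {
        Date now = new Date();
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.JAPAN };
        for (Locale locale : locales) {
            System.out.println(locale
                    + " | " + formatNumber(1234567.891, locale)
                    + " | " + formatMoney(1234.5, locale)
                    + " | " + formatDate(now, locale));
        }
    }
}
```

For example, the same value 1234567.891 comes out as 1,234,567.891 for Locale.US and as 1.234.567,891 for Locale.GERMANY.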
It's best to start internationalization right from the beginning, when you determine the requirements for your software. To design the flexibility into your software that's necessary to enable easy localization, you need to understand how the requirements differ among all the countries and languages (locales) that you plan to support. You can use the Sun Software Product Internationalization Taxonomy to guide you in this process. The Java Tutorial also provides a simple Checklist that helps you identify some common issues. Once you have identified the requirements, the Internationalization Trail of the Java Tutorial and other materials referenced from the Java Internationalization page can help you find appropriate solutions for design and implementation.
Sun's JREs support the euro: they let you type the euro character, render it, convert it to and from numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system; see the documentation for Windows and Solaris. For formatting with a currency symbol, Sun's JREs from version 1.4 use the euro as the default currency for the member countries of the European Monetary Union, while for Sun's JRE 1.3.1 you need to select locales with the "EURO" variant.
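A quick way to check the default currency on a given JRE is java.util.Currency, which was added in version 1.4. The sketch below assumes a 1.4 or later runtime; the class and method names are invented for illustration.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

final class EuroSketch {

    // The euro sign is U+20AC; written as an escape so the source file
    // encoding does not matter.
    static final char EURO_SIGN = '\u20AC';

    // ISO 4217 code of the default currency for a country locale.
    static String currencyCodeFor(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    // Formats an amount using the locale's default currency.
    static String formatAsCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.GERMANY, Locale.FRANCE, Locale.ITALY };
        for (Locale locale : locales) {
            System.out.println(locale + " uses " + currencyCodeFor(locale)
                    + ", e.g. " + formatAsCurrency(9.99, locale));
        }
    }
}
```

Whether the euro sign actually shows up correctly on the console still depends on the console encoding and fonts of the host operating system.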

The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences.
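Because a String is a UTF-16 sequence, its length counts 16-bit code units rather than characters. The following sketch, which assumes J2SE 5 or later for the code point methods, shows the difference using U+1D11E (a musical G clef) as a sample character outside the 16-bit range; the class and method names are made up for the example.

```java
final class CodeUnitSketch {

    // U+1D11E MUSICAL SYMBOL G CLEF, which does not fit in a single char.
    static final int G_CLEF = 0x1D11E;

    // Builds the two-character text "A" followed by the G clef.
    static String sample() {
        return new StringBuilder("A").appendCodePoint(G_CLEF).toString();
    }

    // Number of 16-bit char values (UTF-16 code units).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));

        // Walk the string one code point at a time instead of one char at a time.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X uses %d char(s)%n", cp, Character.charCount(cp));
            i += Character.charCount(cp);
        }
    }
}
```

Here the sample string has three code units but only two code points.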
Unicode is an international character set standard which supports all of the major scripts of the world, as well as common technical symbols. The original Unicode specification defined characters as fixed-width 16-bit entities, but the Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF. You can learn more about the Unicode standard at the Unicode Consortium web site.
Character handling in J2SE 5 is based on version 4.0 of the Unicode standard. This includes support for supplementary characters, which has been specified by the JSR 204 expert group and implemented throughout the JDK. See the article Supplementary Characters in the Java Platform, the Java Specification Request 204, or the Character class documentation for more information.
J2SE 1.4 uses version 3.0 of the Unicode standard, and J2SE 1.3 uses version 2.1. They generally don't support supplementary characters.
A coded character set is a character set (a collection of characters) where each character has been assigned a unique number. At the core of the Unicode standard is a coded character set that assigns the letter "A" the hexadecimal number 0041 and the character "€" (the symbol for the euro currency) the hexadecimal number 20AC. The Unicode standard always uses hexadecimal numbers, and writes them with the prefix "U+", so the number for "A" is written as "U+0041".
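To see the numbers behind individual characters, the "U+" notation can be reproduced with String.format. This is just a small sketch and uPlus is a hypothetical helper name.

```java
final class CodePointNotationSketch {

    // Renders a code point in the "U+" notation: hexadecimal, at least four digits.
    static String uPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(uPlus('A'));        // the letter A
        System.out.println(uPlus('\u20AC'));   // the euro sign
        System.out.println(uPlus(0x1D11E));    // a character needing five hex digits
    }
}
```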
Code points are the numbers that can be used in a coded character set. A coded character set defines a range of valid code points, but doesn't necessarily assign characters to all those code points. The valid code points for Unicode are U+0000 to U+10FFFF. Unicode 4.0 assigns characters to 96,382 of these 1,114,112 code points.
Supplementary characters are characters with code points in the range U+10000 to U+10FFFF, that is, those characters that could not be represented in the original 16-bit design of Unicode. The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Thus, each Unicode character is either in the BMP or a supplementary character.
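The ranges described above can be checked with static methods of java.lang.Character (available since J2SE 5). The sketch below uses an invented classify helper; whether a particular code point has a character assigned depends on the Unicode version the runtime implements, so isAssigned is shown without a fixed expectation.

```java
final class CodePointRangeSketch {

    // Sorts an int into the three cases: outside Unicode, BMP, or supplementary.
    static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";               // outside U+0000..U+10FFFF
        }
        return Character.isSupplementaryCodePoint(codePoint)
                ? "supplementary"           // U+10000..U+10FFFF
                : "BMP";                    // U+0000..U+FFFF
    }

    // Number of valid code points: U+0000 through U+10FFFF inclusive.
    static int totalCodePoints() {
        return Character.MAX_CODE_POINT + 1;
    }

    // True if the runtime's Unicode version assigns a character to the code point.
    static boolean isAssigned(int codePoint) {
        return Character.isDefined(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = { 0x0041, 0x20AC, 0xFFFF, 0x10000, 0x1D11E, 0x10FFFF, 0x110000 };
        for (int cp : samples) {
            System.out.printf("%X -> %s%n", cp, classify(cp));
        }
        System.out.println("valid code points: " + totalCodePoints());
        System.out.println("U+0041 assigned: " + isAssigned(0x0041));
    }
}
```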
A character encoding scheme is a mapping from the numbers of one or more coded character sets to sequences of one or more fixed-width code units. The most commonly used code units are 8-bit bytes, but 16-bit or 32-bit integers can also be used for internal processing. UTF-32, UTF-16, and UTF-8 are character encoding schemes for the coded character set of the Unicode standard.
A character encoding is a mapping from a set of characters to sequences of code units. It applies a character encoding scheme to one or more coded character sets. Some commonly used character encodings are UTF-8, ISO-8859-1, GB18030, and Shift_JIS.
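To make the difference between encodings concrete, the sketch below encodes the same characters with several charsets and prints the resulting bytes. It assumes Java 7 or later for java.nio.charset.StandardCharsets; the hex helper is a hypothetical name. Note that String.getBytes silently substitutes a replacement byte when the charset cannot represent a character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {

    // Encodes the text with the given charset and shows the bytes in hexadecimal.
    static String hex(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println("A in UTF-8:         " + hex("A", StandardCharsets.UTF_8));
        System.out.println("euro in UTF-8:      " + hex(euro, StandardCharsets.UTF_8));
        System.out.println("euro in UTF-16BE:   " + hex(euro, StandardCharsets.UTF_16BE));
        // ISO-8859-1 has no euro sign, so the encoder substitutes '?' (0x3F).
        System.out.println("euro in ISO-8859-1: " + hex(euro, StandardCharsets.ISO_8859_1));
    }
}
```

The euro sign takes three 8-bit code units in UTF-8 (E2 82 AC) but a single 16-bit code unit in UTF-16 (20 AC).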
UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). This may seem similar in concept to multi-byte encodings, but there is an important difference: The values U+D800 to U+DFFF are reserved for use in UTF-16; no characters are assigned to them as code points. This means that software can tell for each individual code unit in a string whether it represents a one-unit character or whether it is the first or second unit of a two-unit character. This is a significant improvement over some traditional multi-byte character encodings, where the byte value 0x41 could mean the letter "A" or be the second byte of a two-byte character.
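The surrogate mechanism can be worked through by hand. The sketch below splits a supplementary code point into its two code units using the standard UTF-16 arithmetic (subtract 0x10000, then put the upper 10 bits into the high surrogate and the lower 10 bits into the low surrogate) and compares the result with Character.toChars. The method names toSurrogates and role are invented for the example.

```java
import java.util.Arrays;

final class SurrogateSketch {

    // Splits a supplementary code point into a high and a low surrogate.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                  // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));      // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));     // lower 10 bits
        return new char[] { high, low };
    }

    // Tells from a single code unit which part of a character it is.
    static String role(char unit) {
        if (Character.isHighSurrogate(unit)) {
            return "first unit of a pair";
        }
        if (Character.isLowSurrogate(unit)) {
            return "second unit of a pair";
        }
        return "complete character";
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // U+1D11E MUSICAL SYMBOL G CLEF
        char[] manual = toSurrogates(gClef);
        System.out.printf("%04X %04X%n", (int) manual[0], (int) manual[1]);
        System.out.println("matches Character.toChars: "
                + Arrays.equals(manual, Character.toChars(gClef)));

        for (char unit : new char[] { 'A', manual[0], manual[1] }) {
            System.out.printf("%04X: %s%n", (int) unit, role(unit));
        }
    }
}
```

For U+1D11E the two code units are D834 and DD1E, and each of them can be identified as the first or second half of a pair without looking at its neighbours.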

A locale is a geographic or political region that shares the same language and customs. In the Java platform, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.
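As a small illustration of a locale-sensitive operation, the sketch below compares plain String ordering, which goes by UTF-16 code unit values, with ordering through java.text.Collator for a given Locale. The class name, the helper collatesBefore, and the sample words are made up for this example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class LocaleSketch {

    // True if a sorts before b under the collation rules of the given locale.
    static boolean collatesBefore(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b) < 0;
    }

    public static void main(String[] args) {
        Locale france = Locale.FRANCE;
        System.out.println(france.getLanguage() + "_" + france.getCountry());

        String[] words = { "zebra", "Zoo", "apple", "Apple" };

        // String.compareTo compares code unit values, so all capitals come first.
        String[] byCodeUnit = words.clone();
        Arrays.sort(byCodeUnit);
        System.out.println("by code unit: " + Arrays.toString(byCodeUnit));

        // A Collator orders text the way readers in the locale expect.
        String[] byLocale = words.clone();
        Arrays.sort(byLocale, Collator.getInstance(Locale.US));
        System.out.println("by collator:  " + Arrays.toString(byLocale));
    }
}
```

With plain compareTo, "Zoo" sorts before "apple" because 'Z' has a smaller code unit value than 'a'; the collator for Locale.US puts "apple" first.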
