I need to encode and decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are handed to me with a byte order mark (BOM), and I need to produce encoded byte arrays with a BOM as well.
Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstandings. I realize that with a BOM present, big endian should work just as well, but I don't want to swim against the current in the Windows world.
As an example, here is a method that encodes a java.lang.String as little-endian UTF-16 with a BOM:
    public static byte[] encodeString(String message) {
        byte[] tmp = null;
        try {
            tmp = message.getBytes("UTF-16LE");
        } catch (UnsupportedEncodingException e) {
            // should not be possible: every JVM must support UTF-16LE
            AssertionError ae =
                new AssertionError("Could not encode UTF-16LE");
            ae.initCause(e);
            throw ae;
        }
        // use brute force method to add BOM (FF FE = little endian)
        byte[] utf16lemessage = new byte[2 + tmp.length];
        utf16lemessage[0] = (byte) 0xFF;
        utf16lemessage[1] = (byte) 0xFE;
        System.arraycopy(tmp, 0,
                         utf16lemessage, 2,
                         tmp.length);
        return utf16lemessage;
    }
What is the best way to do this in Java? Ideally, I'd like to avoid copying the entire byte array into a new one that has two extra bytes allocated at the start.
The same goes for decoding such a string, although that is much more straightforward using this java.lang.String constructor:
    public String(byte[] bytes,
                  int offset,
                  int length,
                  String charsetName)
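
To make the decoding side concrete, here is a minimal sketch of how I'd expect to use that constructor: skip the two BOM bytes through the offset parameter and decode the rest as UTF-16LE, with no intermediate array. The class name Utf16Decoding and the method decodeString are just placeholders for illustration, and I'm using the Charset overload of the constructor rather than the charsetName one so there is no checked exception to handle. It also assumes the input is always little endian.

```java
import java.nio.charset.StandardCharsets;

class Utf16Decoding {
    // Hypothetical helper: decodes little-endian UTF-16 bytes,
    // skipping a leading LE BOM (FF FE) if one is present.
    static String decodeString(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 2
                && bytes[0] == (byte) 0xFF
                && bytes[1] == (byte) 0xFE) {
            offset = 2; // step over the BOM instead of copying the array
        }
        return new String(bytes, offset, bytes.length - offset,
                          StandardCharsets.UTF_16LE);
    }
}
```

A big-endian BOM (FE FF) is not handled here; it would be decoded as an ordinary character, so input from a non-Windows peer would need an extra check.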