Byte-order modifiers are one of the Perl 5.10 features farther along in perl5100delta, after the really big features. To a pack format that depends on the native byte order, you can append a < or a > to specify that the format is little-endian or big-endian, respectively. This allows you to handle endianness in the formats that don’t already have specific versions for each byte order, as well as apply endianness to groups.
Before you think about the < and > modifiers, consider those that already specify the endianness. The n and N formats specify an unsigned short or long in “network order”, which is big-endian. The v and V formats specify the same things, but in “VAX order”, which is little endian.
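To see what those four formats do without depending on your own machine, you can pack a known number and look at the bytes that come out. This is only a small sketch; hex_bytes is a helper made up for this example, not something pack provides:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: show a packed string as hex digits
sub hex_bytes { return unpack 'H*', $_[0] }

# Network (big-endian) order puts the most significant byte first
say 'n: ', hex_bytes( pack 'n', 0xAABB );        # aabb
say 'N: ', hex_bytes( pack 'N', 0xAABBCCDD );    # aabbccdd

# VAX (little-endian) order puts the least significant byte first
say 'v: ', hex_bytes( pack 'v', 0xAABB );        # bbaa
say 'V: ', hex_bytes( pack 'V', 0xAABBCCDD );    # ddccbbaa
```

Because these formats name their byte order, the output is the same on every architecture.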
Here’s a test program that takes some bytes, which you specify in a string using the hex representation of each character (just like pack would). Once you have the string, you use both N and V to unpack it and find out which one matches your system. The L format always uses the byte order of the local architecture:
use 5.010;
my $string = "\xAA\xBB\xCC\xDD";
foreach my $format ( qw(N V) ) {
my $number = unpack $format, $string;
say sprintf "%s is 0x%X", $format, $number;
say "Your native format is $format" if $number == unpack 'L', $string;
}
The output shows that the little-endian order switches the bytes around, and that this program ran on a little-endian machine (in this case, a MacBook Air, which uses Intel processors):
N is 0xAABBCCDD
V is 0xDDCCBBAA
Your native format is V
With those formats, you need to know which order your data use, either by knowing the architecture or getting the producer of the data to tell you the format. For instance, UTF-16 text files can start with a byte order mark, 0xFEFF; that’s a short integer (two bytes). If the file was written in big-endian order and you read that short on a big-endian machine, you get 0xFEFF. If you read the same file on a little-endian machine, you get 0xFFFE because it switches the bytes around as you saw before.
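As a rough illustration of the byte order mark idea, you can always read the first two bytes with n, which is big-endian no matter where you run it, and then see which of the two values you got. The utf16_byte_order subroutine is a hypothetical helper written for this sketch, and it only looks at the mark itself:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: guess the byte order of UTF-16 data from its
# first two bytes. Returns 'unknown' if there is no byte order mark.
sub utf16_byte_order {
    my( $bytes ) = @_;
    return 'unknown' if length( $bytes ) < 2;

    # n always reads a big-endian short, whatever the native order is
    my $bom = unpack 'n', $bytes;

    return 'big-endian'    if $bom == 0xFEFF;
    return 'little-endian' if $bom == 0xFFFE;
    return 'unknown';
}

say utf16_byte_order( "\xFE\xFF\x00\x41" );    # big-endian
say utf16_byte_order( "\xFF\xFE\x41\x00" );    # little-endian
```

Reading with a format that fixes the byte order means the test gives the same answer on any machine.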
The other pack formats use the native byte order, so until now you haven’t had a way to specify the order in which to interpret the bytes. These formats have always used the native architecture (meaning they get it wrong when the data come from the other architecture):
| Format | Description |
|---|---|
| s, S | signed and unsigned shorts (two bytes) |
| i, I | signed and unsigned integers (at least four bytes) |
| l, L | signed and unsigned longs |
| q, Q | signed and unsigned quads (if you have a 64-bit perl) |
| j, J | signed and unsigned Perl internal integers |
| f | single-precision floating-point value |
| d | double-precision floating-point value |
| F | Perl internal floating-point value |
| D | long-double-precision floating-point value |
| p, P | pointers to a null-terminated string and a structure |
Perl 5.10 lets you specify the architecture these formats should use. You can use big-endian values even if you are using a little-endian machine. Suppose you have π encoded as a single-precision floating-point value in big-endian even though you have a little-endian machine. The native format reads those bytes in the wrong order, as this program shows:
use 5.010;
my $pi_string = "\x40\x49\x0F\xDA"; # 3.14159250259399 in big-endian
foreach my $format ( qw(f f< f>) ) {
my $number = unpack $format, $pi_string;
say sprintf "%s is %f", $format, $number;
}
The f and f< give the non-π results. The f assumes the native, little-endian format while the f< makes it explicit. The f> specifies big-endian format despite the native architecture, and it gets the right value (with normal floating-point rounding error):
f is -10082865224089600.000000
f< is -10082865224089600.000000
f> is 3.141593
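The modifiers work the same way when you pack. Here’s a short sketch that packs a few values in both orders and prints the resulting bytes; it assumes your perl stores floats in the usual IEEE 754 format, and hex_bytes is a helper invented for this example:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: show a packed string as hex digits
sub hex_bytes { return unpack 'H*', $_[0] }

# A signed 16-bit short: 258 is 0x0102
say 's> ', hex_bytes( pack 's>', 258 );    # 0102
say 's< ', hex_bytes( pack 's<', 258 );    # 0201

# A signed 32-bit long: -2 is 0xFFFFFFFE in two's complement
say 'l> ', hex_bytes( pack 'l>', -2 );     # fffffffe
say 'l< ', hex_bytes( pack 'l<', -2 );     # feffffff

# A single-precision float: 1.5 is 0x3FC00000 (assuming IEEE 754)
say 'f> ', hex_bytes( pack 'f>', 1.5 );    # 3fc00000
say 'f< ', hex_bytes( pack 'f<', 1.5 );    # 0000c03f
```

Since every format here carries an explicit modifier, none of the output depends on the native byte order.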
You can also apply these modifiers to groups so that they affect all of the modifiable formats in that group. This example tries combinations of unsigned shorts in either format:
use 5.010;
my $string = "\xAA\xBB\xCC\xDD";
foreach my $format ( qw| SS S<S> S>S< (SS)> (SS)< | ) {
my( $first, $second ) = unpack $format, $string;
say sprintf "%5s is 0x%X 0x%X", $format, $first, $second;
}
The output shows you how the S format changes based on which architecture you tell pack to use:
   SS is 0xBBAA 0xDDCC
 S<S> is 0xBBAA 0xCCDD
 S>S< is 0xAABB 0xDDCC
(SS)> is 0xAABB 0xCCDD
(SS)< is 0xBBAA 0xDDCC
You still have to know which architecture your data are in, but at least you can tell Perl which format you want.
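A group modifier is handy when a whole record arrives in one byte order. The record layout below is made up for illustration (an unsigned 16-bit type, a signed 32-bit offset, and a single-precision scale, all big-endian), as are the read_record and write_record names; it also assumes IEEE 754 floats:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical layout: every field in the group is big-endian
my $template = '(Slf)>';

# Hypothetical helpers that unpack and pack one record
sub read_record  { return unpack '(Slf)>', $_[0] }
sub write_record { return pack   '(Slf)>', @_ }

# type = 2, offset = -2, scale = 1.5, written in big-endian order
my $record = "\x00\x02" . "\xFF\xFF\xFF\xFE" . "\x3F\xC0\x00\x00";

my( $type, $offset, $scale ) = read_record( $record );
say "type=$type offset=$offset scale=$scale";    # type=2 offset=-2 scale=1.5

# Packing the same values with the same template gives back the bytes
say 'round trip ',
    ( write_record( $type, $offset, $scale ) eq $record ? 'matches' : 'differs' );
```

One modifier on the group saves you from repeating > on each format, and the record reads the same on either kind of machine.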
Things to remember
- Most pack formats rely on the native architecture
- Perl 5.10 introduces the < and > modifiers so you can specify the architecture
- The < specifies little-endian because the little side touches the specifier
- The > specifies big-endian because the big side touches the specifier