How to find a file’s image type by looking at its bytes (in Java)

Byte arrays and byte streams get used a lot in Java, but it’s rare to do anything with individual bytes. Many file types start with a fixed sequence of bytes, though, and can be quickly identified by them.

In my case, I had a byte array, extracted from a file with the Guava library’s Files.toByteArray(file), and wanted to make sure it was a JPEG before sending it down to a Flex front end. JPEGs start with the two bytes 0xFF and 0xD8. So I printed out the first two bytes in my array and found… -1 and -40.
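To reproduce that surprise without a real file, here is a minimal sketch that hard-codes the two JPEG start bytes instead of reading them with Files.toByteArray. The class and method names are made up for illustration.

```java
class SignedBytesDemo {
    // The two bytes a JPEG starts with, written as Java byte literals.
    // The casts are needed because 0xFF and 0xD8 do not fit in a signed byte.
    static final byte[] JPEG_START = {(byte) 0xFF, (byte) 0xD8};

    // Hypothetical helper: shows the first two bytes the way Java prints them.
    static String describeFirstTwo(byte[] data) {
        return data[0] + " and " + data[1];
    }

    public static void main(String[] args) {
        System.out.println(describeFirstTwo(JPEG_START)); // prints "-1 and -40"
    }
}
```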

Java’s byte type is signed, so it holds -128 to 127 rather than 0 to 255. So what do the values of -1 and -40 mean? Java uses two’s complement for its negative numbers. Take the number -1. 1 is 0000 0001 in binary, so to calculate its negative, -1, with two’s complement, you invert 0000 0001, getting 1111 1110, then increment by 1, yielding 1111 1111. That’s the bit pattern behind a Java byte of -1.
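The invert-then-increment recipe can be checked directly in code. This is only a sketch: negate and bits are hypothetical helper names, and bits masks with 0xFF purely so that exactly eight bits get printed.

```java
class TwosComplementDemo {
    // Negate a value the two's complement way: invert every bit, then add 1.
    static byte negate(byte value) {
        return (byte) (~value + 1);
    }

    // Render the low 8 bits of a byte as a zero-padded binary string.
    static String bits(byte value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte one = 1;
        System.out.println(bits(one));         // prints "00000001"
        System.out.println(negate(one));       // prints "-1"
        System.out.println(bits(negate(one))); // prints "11111111"
    }
}
```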

In your code, to get a positive int from a two’s complement negative byte, you can do a bitwise & with 0xFF (1111 1111):

int unsignedByte = myByte & 0xFF;

So, if you had, say, -3, you would take the bit pattern of -3 (3 in binary is 0000 0011, invert to get 1111 1100, then add 1 for 1111 1101) and combine it with 0xFF (in binary, 1111 1111) using &. The bitwise & operator evaluates the bits like this:

1111 1101
1111 1111
---- ----
1111 1101

The Java byte value -3 has the bit pattern 1111 1101, which is 253 when read as an unsigned number. If you simply cast your byte to an int instead

int wrongByte = (int) myByte;

the sign is carried over into the wider type, so a -3 byte just becomes a -3 int.
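Putting the two conversions side by side makes the difference easy to see. The helper names below are hypothetical; Byte.toUnsignedInt is a standard library method available from Java 8 on and does the same job as the mask.

```java
class UnsignedByteDemo {
    // A plain cast keeps the sign: the byte is sign-extended to 32 bits.
    static int widenWithCast(byte b) {
        return (int) b;
    }

    // Masking keeps only the low 8 bits, giving a value from 0 to 255.
    static int widenWithMask(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = -3;
        System.out.println(widenWithCast(b));      // prints "-3"
        System.out.println(widenWithMask(b));      // prints "253"
        System.out.println(Byte.toUnsignedInt(b)); // prints "253"
    }
}
```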

Going back to the bytes I needed, a -1 Java byte is 1111 1111, or 255 once masked. For the second byte of my file, I got -40, which is 1101 1000, or 216. The simplest, laziest way to convert an int to a hex string in Java is probably Integer.toHexString, which gives us ff and d8 for 255 and 216, that is, 0xFF and 0xD8. So my test file was, in fact, a JPEG.
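Wrapped up as a reusable check, the whole thing might look like the sketch below. JpegSniffer, looksLikeJpeg and toHex are names invented for this example, and the check only looks at the first two bytes, so it says nothing about whether the rest of the file is a valid image.

```java
class JpegSniffer {
    // True if the data starts with the JPEG start bytes 0xFF 0xD8.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    // Format one byte as an upper-case, two-digit hex string such as "0xD8".
    static String toHex(byte b) {
        return String.format("0x%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        byte[] firstBytes = {-1, -40};
        System.out.println(toHex(firstBytes[0]));      // prints "0xFF"
        System.out.println(toHex(firstBytes[1]));      // prints "0xD8"
        System.out.println(looksLikeJpeg(firstBytes)); // prints "true"
    }
}
```

In the setup described above, the array passed to looksLikeJpeg would be the one returned by Files.toByteArray(file).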


6 Responses to How to find a file’s image type by looking at its bytes (in Java)

  1. Do you know ImageIO can do that?
    Moreover, if the image is remote, it only downloads the first bytes to tell you if it’s a JPEG.

    The bad thing is you don’t get to play with bits, and you seem to like that.

    • tborthwick says:

      Sure, with ImageIO, you can do something like

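      // Needs java.util.Iterator, java.awt.image.BufferedImage, javax.imageio.ImageIO,
      // javax.imageio.ImageReader and javax.imageio.stream.ImageInputStream.
      String formatName = null;
      ImageInputStream iis = ImageIO.createImageInputStream(imageFile);
      Iterator<ImageReader> iter = ImageIO.getImageReaders(iis);
      if (iter.hasNext()) {
          ImageReader reader = iter.next();
          reader.setInput(iis);
          BufferedImage image = reader.read(0); // decodes the whole image
          formatName = reader.getFormatName();
      }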

      And then look at formatName. There’s probably a more elegant way to do it with ImageIO but I think that’ll work.

  2. Mr President says:

    There is a Java-based document type validation framework, JHOVE. You can explore that.

