// Protocol Buffers - Google's data interchange format
// Copyright 2008 Google Inc.  All rights reserved.
// https://developers.google.com/protocol-buffers/
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

package com.google.protobuf;

RawMessageInfo stores the same amount of information as MessageInfo but in a more compact format.
/**
 * RawMessageInfo stores the same amount of information as {@link MessageInfo} but in a more compact
 * format.
 */
final class RawMessageInfo implements MessageInfo {

  private final MessageLite defaultInstance;
The compact format packs everything in a String object and an Object[] array. The String object encodes the field number, field type, hasbits offset, oneof index, etc., whereas the Object[] array contains field references, class references, instance references, etc.

The String object encodes a sequence of integers into UTF-16 characters. Each int is encoded into 1 to 3 UTF-16 characters depending on its unsigned value:

  • 1 char: [c1: 0x0000 - 0xD7FF] = int of the same value.
  • 2 chars: [c1: 0xE000 - 0xFFFF], [c2: 0x0000 - 0xD7FF] = (c2 << 13) | (c1 & 0x1FFF)
  • 3 chars: [c1: 0xE000 - 0xFFFF], [c2: 0xE000 - 0xFFFF], [c3: 0x0000 - 0xD7FF] = (c3 << 26) | ((c2 & 0x1FFF) << 13) | (c1 & 0x1FFF)

Note that we don't use the UTF-16 surrogate range [0xD800 - 0xDFFF] because surrogates have to come in pairs to form a valid UTF-16 char sequence and don't help us encode values more efficiently.
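The encoding rules above can be sketched as a small round-trip helper. This is a minimal illustration, not the generator's actual code: the class name `PackedIntSketch` and both method names are hypothetical, the encoder is inferred from the decoding formulas, and the decoder follows the same loop the constructor below uses to read the flags.

```java
final class PackedIntSketch {
  /** Appends one non-negative int as 1 to 3 chars, lowest 13 bits first. */
  static void encode(int value, StringBuilder out) {
    if (value < 0) {
      throw new IllegalArgumentException("negative value: " + value);
    }
    // While the remaining value does not fit the final-char range [0x0000, 0xD7FF],
    // emit a continuation char in [0xE000, 0xFFFF] carrying the low 13 bits.
    while (value >= 0xD800) {
      out.append((char) (0xE000 | (value & 0x1FFF)));
      value >>>= 13;
    }
    out.append((char) value);
  }

  /** Decodes one int starting at pos[0] and advances pos[0] past it. */
  static int decode(String s, int[] pos) {
    int value = s.charAt(pos[0]++);
    if (value < 0xD800) {
      return value; // 1 char: the int has the same value as the char.
    }
    int result = value & 0x1FFF;
    int shift = 13;
    // Every char >= 0xD800 is a continuation contributing 13 more bits.
    while ((value = s.charAt(pos[0]++)) >= 0xD800) {
      result |= (value & 0x1FFF) << shift;
      shift += 13;
    }
    // The final char (< 0xD800) supplies the highest bits.
    return result | (value << shift);
  }

  public static void main(String[] args) {
    int[] samples = {0, 0xD7FF, 0xD800, 1 << 26, Integer.MAX_VALUE};
    StringBuilder sb = new StringBuilder();
    for (int v : samples) {
      encode(v, sb);
    }
    String packed = sb.toString();
    int[] pos = {0};
    for (int v : samples) {
      System.out.println(v + " -> " + decode(packed, pos));
    }
  }
}
```

Because continuation chars always have their top three bits set, the decoder can tell a continuation from a final char with a single comparison against 0xD800.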

The integer sequence encoded in the String object has the following layout:

  • [0]: flags, flags & 0x1 = is proto2?, flags & 0x2 = is MessageSet wire format?
  • [1]: field count, if 0, this is the end of the integer sequence and the corresponding Object[] array should be null.
  • [2]: oneof count
  • [3]: hasbits count, how many hasbits integers are generated.
  • [4]: min field number
  • [5]: max field number
  • [6]: total number of entries that need to be allocated
  • [7]: map field count
  • [8]: repeated field count, this doesn't include map fields.
  • [9]: size of checkInitialized array
  • [...]: field entries
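As a rough sketch of how the header layout above could be read, the following class decodes the leading integers into named fields. The class name `RawInfoHeaderSketch` and all field names are hypothetical and chosen only to mirror the list; treating the remaining header values as 0 when the field count is 0 is an assumption of this sketch.

```java
final class RawInfoHeaderSketch {
  final int flags;
  final int fieldCount;
  final int oneofCount;
  final int hasBitsCount;
  final int minFieldNumber;
  final int maxFieldNumber;
  final int numEntries;
  final int mapFieldCount;
  final int repeatedFieldCount;
  final int checkInitializedCount;
  /** Char index in the String at which the first field entry starts. */
  final int fieldEntriesStart;

  private final String info;
  private int pos;

  RawInfoHeaderSketch(String info) {
    this.info = info;
    flags = next();
    fieldCount = next();
    if (fieldCount == 0) {
      // The integer sequence ends here; this sketch reports the rest as 0.
      oneofCount = 0;
      hasBitsCount = 0;
      minFieldNumber = 0;
      maxFieldNumber = 0;
      numEntries = 0;
      mapFieldCount = 0;
      repeatedFieldCount = 0;
      checkInitializedCount = 0;
    } else {
      oneofCount = next();
      hasBitsCount = next();
      minFieldNumber = next();
      maxFieldNumber = next();
      numEntries = next();
      mapFieldCount = next();
      repeatedFieldCount = next();
      checkInitializedCount = next();
    }
    fieldEntriesStart = pos;
  }

  /** Decodes the next packed int (1 to 3 chars) and advances the cursor. */
  private int next() {
    int value = info.charAt(pos++);
    if (value < 0xD800) {
      return value;
    }
    int result = value & 0x1FFF;
    int shift = 13;
    while ((value = info.charAt(pos++)) >= 0xD800) {
      result |= (value & 0x1FFF) << shift;
      shift += 13;
    }
    return result | (value << shift);
  }

  public static void main(String[] args) {
    // A hand-built header: proto2 flag set, 2 fields, 0 oneofs, 1 hasbits int,
    // field numbers 1..2, 2 entries, no map, repeated, or checkInitialized fields.
    String info = new String(new char[] {1, 2, 0, 1, 1, 2, 2, 0, 0, 0});
    RawInfoHeaderSketch h = new RawInfoHeaderSketch(info);
    System.out.println("proto2=" + ((h.flags & 0x1) != 0)
        + " fields=" + h.fieldCount
        + " entriesStart=" + h.fieldEntriesStart);
  }
}
```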

Each field entry starts with a field number and the field type:

  • [0]: field number
  • [1]: field type with extra bits:
    • v & 0xFF = field type as defined in the FieldType class
    • v & 0x100 = is required?
    • v & 0x200 = is checkUtf8?
    • v & 0x400 = needs isInitialized check?
    • v & 0x800 = is map field with proto2 enum value?
If the file is proto2 and this is a singular field:
  • [2]: hasbits offset
If the field is in a oneof:
  • [2]: oneof index
For other types, the field entry only has field number and field type.
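The "field type with extra bits" word can be unpacked with plain bit masks. The sketch below only restates the masks listed above; the class name `FieldTypeBitsSketch`, the constant names, and the `describe` helper are hypothetical, and the numeric type ids are left uninterpreted because their meaning is defined by the FieldType class.

```java
final class FieldTypeBitsSketch {
  static final int TYPE_MASK = 0xFF;
  static final int REQUIRED_BIT = 0x100;
  static final int CHECK_UTF8_BIT = 0x200;
  static final int NEEDS_IS_INITIALIZED_BIT = 0x400;
  static final int MAP_WITH_PROTO2_ENUM_VALUE_BIT = 0x800;

  /** Low 8 bits: the field type id as defined in the FieldType class. */
  static int fieldType(int v) {
    return v & TYPE_MASK;
  }

  static boolean isRequired(int v) {
    return (v & REQUIRED_BIT) != 0;
  }

  static boolean checksUtf8(int v) {
    return (v & CHECK_UTF8_BIT) != 0;
  }

  static boolean needsIsInitializedCheck(int v) {
    return (v & NEEDS_IS_INITIALIZED_BIT) != 0;
  }

  static boolean isMapWithProto2EnumValue(int v) {
    return (v & MAP_WITH_PROTO2_ENUM_VALUE_BIT) != 0;
  }

  /** Renders the type id followed by the names of the set flag bits. */
  static String describe(int v) {
    StringBuilder sb = new StringBuilder("type=" + fieldType(v));
    if (isRequired(v)) sb.append(" required");
    if (checksUtf8(v)) sb.append(" checkUtf8");
    if (needsIsInitializedCheck(v)) sb.append(" needsIsInitializedCheck");
    if (isMapWithProto2EnumValue(v)) sb.append(" mapWithProto2EnumValue");
    return sb.toString();
  }

  public static void main(String[] args) {
    // Type id 9 with the required and checkUtf8 bits set.
    System.out.println(describe(0x309));
  }
}
```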

The Object[] array has 3 sections:

  • ---- oneof section ----
    • [0]: value field for oneof 1.
    • [1]: case field for oneof 1.
    • ...
    • [.]: value field for oneof n.
    • [.]: case field for oneof n.
  • ---- hasbits section ----
    • [.]: hasbits field 1
    • [.]: hasbits field 2
    • ...
    • [.]: hasbits field n
  • ---- field section ----
    • [...]: field entries
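Given the three sections above, the index of each slot in the Object[] array follows from the oneof count and hasbits count in the header. The helper below is a sketch of that arithmetic only; the class and method names are hypothetical, and it assumes zero-based oneof and hasbits indices (so "oneof 1" in the list corresponds to index 0).

```java
final class ObjectSectionsSketch {
  /** Each oneof occupies two slots: the value field, then the case field. */
  static int oneofValueFieldIndex(int oneofIndex) {
    return 2 * oneofIndex;
  }

  static int oneofCaseFieldIndex(int oneofIndex) {
    return 2 * oneofIndex + 1;
  }

  /** The hasbits section follows the oneof section, one slot per hasbits int. */
  static int hasBitsFieldIndex(int oneofCount, int hasBitsIndex) {
    return 2 * oneofCount + hasBitsIndex;
  }

  /** The field section starts right after the hasbits section. */
  static int fieldSectionStart(int oneofCount, int hasBitsCount) {
    return 2 * oneofCount + hasBitsCount;
  }

  public static void main(String[] args) {
    int oneofCount = 2;
    int hasBitsCount = 1;
    System.out.println("case field of second oneof: " + oneofCaseFieldIndex(1));
    System.out.println("first hasbits field: " + hasBitsFieldIndex(oneofCount, 0));
    System.out.println("field section starts at: " + fieldSectionStart(oneofCount, hasBitsCount));
  }
}
```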

In the Object[] array, field entries are ordered in the same way as field entries in the String object. The size of each entry is determined by the field type.

  • Oneof field:
    • Oneof message field:
      • [0]: message class reference.
    • Oneof enum field in proto2:
      • [0]: EnumLiteMap
    • For all other oneof fields, field entry in the Object[] array is empty.
  • Repeated message field:
    • [0]: field reference
    • [1]: message class reference
  • Proto2 singular/repeated enum field:
    • [0]: field reference
    • [1]: EnumLiteMap
  • Map field with a proto2 enum value:
    • [0]: field reference
    • [1]: map default entry instance
    • [2]: EnumLiteMap
  • Map field with other value types:
    • [0]: field reference
    • [1]: map default entry instance
  • All other field types:
    • [0]: field reference

In order to read the field info from this compact format, a reader needs to progress through the String object and the Object[] array simultaneously.
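To illustrate why the two cursors must advance together, the sketch below computes where each field's entry begins in the Object[] array from the per-kind slot counts listed above. It is a simplification under stated assumptions: `EntryKind` is a hypothetical classification standing in for the decoded field type and flag bits from the String, and the class and method names are not part of the library.

```java
import java.util.Arrays;
import java.util.List;

final class ObjectSlotsSketch {
  /** Hypothetical entry kinds with the number of Object[] slots each consumes. */
  enum EntryKind {
    ONEOF_MESSAGE(1),              // message class reference
    ONEOF_PROTO2_ENUM(1),          // EnumLiteMap
    ONEOF_OTHER(0),                // no slots
    REPEATED_MESSAGE(2),           // field reference, message class reference
    PROTO2_ENUM(2),                // field reference, EnumLiteMap
    MAP_WITH_PROTO2_ENUM_VALUE(3), // field reference, default entry, EnumLiteMap
    MAP_OTHER(2),                  // field reference, default entry
    OTHER(1);                      // field reference

    final int objectSlots;

    EntryKind(int objectSlots) {
      this.objectSlots = objectSlots;
    }
  }

  /**
   * Returns, for each field entry in String order, the Object[] index at which
   * its entry starts. Entries with zero slots share the next entry's index.
   */
  static int[] objectOffsets(int fieldSectionStart, List<EntryKind> kinds) {
    int[] offsets = new int[kinds.size()];
    int cursor = fieldSectionStart;
    for (int i = 0; i < kinds.size(); i++) {
      offsets[i] = cursor;
      cursor += kinds.get(i).objectSlots;
    }
    return offsets;
  }

  public static void main(String[] args) {
    List<EntryKind> kinds = Arrays.asList(
        EntryKind.OTHER,
        EntryKind.MAP_WITH_PROTO2_ENUM_VALUE,
        EntryKind.ONEOF_OTHER,
        EntryKind.REPEATED_MESSAGE);
    System.out.println(Arrays.toString(objectOffsets(3, kinds)));
  }
}
```

Since an entry's size in the Object[] array is only known after its type word has been decoded from the String, neither structure can be indexed randomly without first walking the entries in order.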

  /**
   * The compact format packs everything in a String object and an Object[] array. The String
   * object encodes the field number, field type, hasbits offset, oneof index, etc., whereas the
   * Object[] array contains field references, class references, instance references, etc.
   *
   * <p>The String object encodes a sequence of integers into UTF-16 characters. Each int is
   * encoded into 1 to 3 UTF-16 characters depending on its unsigned value:
   *
   * <ul>
   *   <li>1 char: [c1: 0x0000 - 0xD7FF] = int of the same value.
   *   <li>2 chars: [c1: 0xE000 - 0xFFFF], [c2: 0x0000 - 0xD7FF] = (c2 << 13) | (c1 & 0x1FFF)
   *   <li>3 chars: [c1: 0xE000 - 0xFFFF], [c2: 0xE000 - 0xFFFF], [c3: 0x0000 - 0xD7FF] = (c3 << 26)
   *       | ((c2 & 0x1FFF) << 13) | (c1 & 0x1FFF)
   * </ul>
   *
   * <p>Note that we don't use the UTF-16 surrogate range [0xD800 - 0xDFFF] because surrogates have
   * to come in pairs to form a valid UTF-16 char sequence and don't help us encode values more
   * efficiently.
   *
   * <p>The integer sequence encoded in the String object has the following layout:
   *
   * <ul>
   *   <li>[0]: flags, flags & 0x1 = is proto2?, flags & 0x2 = is MessageSet wire format?
   *   <li>[1]: field count, if 0, this is the end of the integer sequence and the corresponding
   *       Object[] array should be null.
   *   <li>[2]: oneof count
   *   <li>[3]: hasbits count, how many hasbits integers are generated.
   *   <li>[4]: min field number
   *   <li>[5]: max field number
   *   <li>[6]: total number of entries that need to be allocated
   *   <li>[7]: map field count
   *   <li>[8]: repeated field count, this doesn't include map fields.
   *   <li>[9]: size of checkInitialized array
   *   <li>[...]: field entries
   * </ul>
   *
   * <p>Each field entry starts with a field number and the field type:
   *
   * <ul>
   *   <li>[0]: field number
   *   <li>[1]: field type with extra bits:
   *       <ul>
   *         <li>v & 0xFF = field type as defined in the FieldType class
   *         <li>v & 0x100 = is required?
   *         <li>v & 0x200 = is checkUtf8?
   *         <li>v & 0x400 = needs isInitialized check?
   *         <li>v & 0x800 = is map field with proto2 enum value?
   *       </ul>
   * </ul>
   *
   * If the file is proto2 and this is a singular field:
   *
   * <ul>
   *   <li>[2]: hasbits offset
   * </ul>
   *
   * If the field is in a oneof:
   *
   * <ul>
   *   <li>[2]: oneof index
   * </ul>
   *
   * For other types, the field entry only has field number and field type.
   *
   * <p>The Object[] array has 3 sections:
   *
   * <ul>
   *   <li>---- oneof section ----
   *       <ul>
   *         <li>[0]: value field for oneof 1.
   *         <li>[1]: case field for oneof 1.
   *         <li>...
   *         <li>[.]: value field for oneof n.
   *         <li>[.]: case field for oneof n.
   *       </ul>
   *   <li>---- hasbits section ----
   *       <ul>
   *         <li>[.]: hasbits field 1
   *         <li>[.]: hasbits field 2
   *         <li>...
   *         <li>[.]: hasbits field n
   *       </ul>
   *   <li>---- field section ----
   *       <ul>
   *         <li>[...]: field entries
   *       </ul>
   * </ul>
   *
   * <p>In the Object[] array, field entries are ordered in the same way as field entries in the
   * String object. The size of each entry is determined by the field type.
   *
   * <ul>
   *   <li>Oneof field:
   *       <ul>
   *         <li>Oneof message field:
   *             <ul>
   *               <li>[0]: message class reference.
   *             </ul>
   *         <li>Oneof enum field in proto2:
   *             <ul>
   *               <li>[0]: EnumLiteMap
   *             </ul>
   *         <li>For all other oneof fields, field entry in the Object[] array is empty.
   *       </ul>
   *   <li>Repeated message field:
   *       <ul>
   *         <li>[0]: field reference
   *         <li>[1]: message class reference
   *       </ul>
   *   <li>Proto2 singular/repeated enum field:
   *       <ul>
   *         <li>[0]: field reference
   *         <li>[1]: EnumLiteMap
   *       </ul>
   *   <li>Map field with a proto2 enum value:
   *       <ul>
   *         <li>[0]: field reference
   *         <li>[1]: map default entry instance
   *         <li>[2]: EnumLiteMap
   *       </ul>
   *   <li>Map field with other value types:
   *       <ul>
   *         <li>[0]: field reference
   *         <li>[1]: map default entry instance
   *       </ul>
   *   <li>All other field types:
   *       <ul>
   *         <li>[0]: field reference
   *       </ul>
   * </ul>
   *
   * <p>In order to read the field info from this compact format, a reader needs to progress through
   * the String object and the Object[] array simultaneously.
   */
  private final String info;

  private final Object[] objects;
  private final int flags;

  RawMessageInfo(MessageLite defaultInstance, String info, Object[] objects) {
    this.defaultInstance = defaultInstance;
    this.info = info;
    this.objects = objects;
    // Decode only the first integer of the sequence, which holds the flags.
    int position = 0;
    int value = (int) info.charAt(position++);
    if (value < 0xD800) {
      // Single-char encoding: the char value is the int value.
      flags = value;
    } else {
      // Multi-char encoding: each char >= 0xD800 contributes its low 13 bits,
      // and the final char (< 0xD800) supplies the highest bits.
      int result = value & 0x1FFF;
      int shift = 13;
      while ((value = info.charAt(position++)) >= 0xD800) {
        result |= (value & 0x1FFF) << shift;
        shift += 13;
      }
      flags = result | (value << shift);
    }
  }

  String getStringInfo() {
    return info;
  }

  Object[] getObjects() {
    return objects;
  }

  @Override
  public MessageLite getDefaultInstance() {
    return defaultInstance;
  }

  @Override
  public ProtoSyntax getSyntax() {
    return (flags & 0x1) == 0x1 ? ProtoSyntax.PROTO2 : ProtoSyntax.PROTO3;
  }

  @Override
  public boolean isMessageSetWireFormat() {
    return (flags & 0x2) == 0x2;
  }
}