Jump to content

UBJSON: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
No edit summary
Importing Wikidata short description: "Data serialization format"
 
(33 intermediate revisions by 15 users not shown)
Line 1: Line 1:
{{Short description|Data serialization format}}
{{notability|Products|date=March 2014}}
{{unreferenced|date=March 2014}}
{{refimprove|date=October 2019}}

{{Infobox software
{{Infobox software
| name = UBJSON
| name = UBJSON
| author = The Buzz Media, LLC
| author = Riyad Kalla
| genre = Data interchange
| genre = Data interchange
| license = [http://ubjson.org/#license Apache 2.0]
| license = [http://ubjson.org/#license Apache 2.0]
Line 9: Line 10:
| operating_system = Any
| operating_system = Any
| platform = [[Cross-platform]]
| platform = [[Cross-platform]]
| latest_release_version = Draft 12
| status = Active
| latest_release_version = Draft 9
| website = {{URL|http://ubjson.org/}}
| website = {{URL|http://ubjson.org/}}
}}
}}
Line 17: Line 17:


==Rationale and Objectives==
==Rationale and Objectives==
UBJSON is a proposed successor to [[BSON]], [[BJSON]] and others, all of which may have one or both of the following problems:
UBJSON is a proposed successor to [[BSON]], [[BJSON]] and others. UBJSON has the following goals:


* Complete compatibility with the JSON specification there is a 1:1 mapping between standard JSON and UBJSON.
* Extra data types have been included that have no equivalent in the original JSON specification, leaving room for incompatibilities between implementations.
* Ease of implementation only including data types that are widely supported in popular programming languages so that there are no problems with certain languages not being supported well.
* Higher performance or smaller representations have been achieved at the cost of extra complexity, making implementations more difficult and impeding adoption.
* Ease of use it can be quickly understood and adopted.
* Speed and efficiency UBJSON uses data representations that are (roughly) 30% smaller than their compacted JSON counterparts and are optimized for fast parsing. Streamed serialisation is supported, meaning that the transfer of UBJSON over a network connection can start sending data before the final size of the data is known.


==Data types and syntax==
UBJSON has these goals:
UBJSON data can be either a ''value'' or a ''container''.


===Value types===
* Complete compatibility with the JSON specification - there is a 1:1 mapping between standard JSON and UBJSON.
UBJSON uses a single binary tuple to represent all JSON value types:<ref name="ValueTypesSpecification">{{cite web|accessdate=20 July 2019|title=Value Types &#124; Universal Binary JSON Specification|url=http://ubjson.org/type-reference/value-types/}}</ref>


type [length] [data]
* Ease of implementation - only including data types that are widely supported in popular programming languages so that there are no problems with certain languages not being supported well.


Each element in the tuple is defined as:
* Ease of use - it can be quickly understood and adopted.


====type====
* Speed and efficiency - UBJSON uses data representations that are (roughly) 30% smaller than their compacted JSON counterparts and are optimized for fast parsing. Streamed serialisation is supported, meaning that the transfer of UBJSON over a network connection can start sending data before the final size of the data is known.
The type is a 1-byte [[ASCII]] character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types. There is also a [[no-op]] type used for stream keep-alive.


* [[Nullable type|Null]]: <code>Z</code>
==Data types and syntax==
* [[No-op]]: <code>N</code> - no operation, to be ignored by the receiving end
The current (incomplete) specification of UBJSON can be summarised as follows.
* [[Boolean algebra|Boolean types]]: true (<code>T</code>) and false (<code>F</code>)
* Numeric types: [[8-bit|int8]] (<code>i</code>), uint8 (<code>U</code>), [[16-bit|int16]] (<code>I</code>), [[32-bit|int32]] (<code>l</code>), [[64-bit|int64]] (<code>L</code>), [[Single-precision floating-point format|float32]] (<code>d</code>), [[Double-precision floating-point format|float64]] (<code>D</code>), and [[Arbitrary-precision arithmetic|high-precision]] (<code>H</code>)
* [[ASCII]] character: <code>C</code>
* [[UTF-8]] string: <code>S</code>


High-precision numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.
UBJSON uses a single binary tuple to represent all JSON data types (both value and container types):


type [length] [data]
====length (optional)====
The length is an integer number (e.g. uint8, or int64) encoding the size of the data payload in bytes. It is used for strings, high-precision numbers and optionally containers. They are omitted for other types.


Length is encoded following the same convention as integers, thus including its own type. For example, the string <code>hello</code> is encoded as <code>S</code>,<code>U</code>,0x05,<code>h</code>,<code>e</code>,<code>l</code>,<code>l</code>,<code>o</code>.
Each element in the tuple is defined as:


===type===
====data (optional)====
A sequence of bytes representing the actual [[binary data]] for this type of value. All numbers are in [[big-endian]] order.
The type is a 1-byte [[ASCII]] character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types and the two JSON container types. There is also a no-op (used for stream keep-alive) and an end-of-container marker, used when a container of (as yet) unknown size had previously been started.


===Container types===
* string: s or S
Similarly to JSON, UBJSON defines two container types: [[Array data type|''array'']] and [[Associative array|''object'']].<ref name="ContainerTypesSpecification">{{cite web|accessdate=20 July 2019|title=Container Types &#124; Universal Binary JSON Specification|url=http://ubjson.org/type-reference/container-types/}}</ref>
* number: B, i, I, L, d, D, h or H - there are seven specialisations: byte (B), int16 (i), int32 (I), int64 (L), float (d), double (D), huge (H)
* true: T
* false: F
* null: Z
* object container: o or O
* array container: a or A
* no-op: N - no operation, to be ignored by the receiving end
* end of container: E


Arrays are ordered sequences of elements, represented as a <code>[</code> followed by zero or more elements of value and container type and a trailing <code>]</code>.
Huge numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.


Objects are labeled sets of elements, represented as a <code>{</code> followed by zero or more key-value pairs and a trailing <code>}</code>. Each key is a string with the <code>S</code> character omitted, and each "value" can be any element of value or container type.
===length (optional)===
The length is a 1-byte or 4-byte value based on the type specified. These are used for strings, huge numbers and container/array blocks. They are omitted for other types.


Alternatively, arrays and objects may indicate the number of elements they contain as <code>#</code> followed by an integer number before their first element, in which case the trailing <code>]</code> or <code>}</code> is omitted. Additionally, if all elements have the same type, the types can be omitted and replaced by a single <code>$</code> followed by the type, in which case the element count must follow immediately. For example, the array <nowiki>["a","b","c"]</nowiki> may be represented as <code>[</code>,<code>$</code>,<code>C</code>,<code>#</code>,<code>U</code>,0x03,<code>a</code>,<code>b</code>,<code>c</code>.
* 1-byte: An unsigned byte (0 to 254) indicating the length of the data payload following it, for small items.
* 1-byte: The byte value 255 indicating the container that follows has an (as yet) unknown size.
* 4-byte: An unsigned integer (0 to 2<sup>31</sup>-1) indicating the length of the data payload following it, for larger items.


===Binary data===
The 1 and 4 byte lengths are easily differentiated because lower-case type characters are used in the 1-byte case, otherwise upper-case type characters are used.
While there is no explicit binary type, binary data is stored in a [[Strong and weak typing|strongly typed]] array of uint8 values. This ensures binary efficiency while maintaining compatibility with JSON, even though JSON has no direct support for binary data.<ref name="BinaryDataSpecification">{{cite web|accessdate=20 July 2019|title=Binary Data &#124; Universal Binary JSON Specification|url=http://ubjson.org/type-reference/binary-data/}}</ref><ref name="Wolfram"/>

===data (optional)===
A sequence of bytes representing the actual binary data for this type of value. All numbers are sent in [[big-endian]] order.


==Representation==
==Representation==
The [[MIME type]] 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system.
The [[MIME type]] 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system.<ref name="Wolfram"/>

==Software support==
* [[Teradata]] Database<ref name="Teradata">{{cite web|accessdate=20 July 2019|title=UBJSON Storage Format|url=https://docs.teradata.com/reader/C8cVEJ54PO4~YXWXeXGvsA/b9kd0QOTMB3uZp9z5QT2aw}}</ref>
* The [[Wolfram Language]] introduced support for UBJSON in 2017, with version 11.1 of the language.<ref name="Wolfram">{{cite web|accessdate=20 July 2019|title=UBJSON (.ubj)&mdash;Wolfram Language Documentation|url=https://reference.wolfram.com/language/ref/format/UBJSON.html}}</ref>


==See also==
==See also==
* [[Comparison of data serialization formats]]
*[[JSON]]
*[[BSON]]
* [[JSON]]
*[[Protocol Buffers]]
* [[CBOR]]
* [[Smile_(data_interchange_format)|Smile]] (binary JSON)
*[[Apache Thrift]]
* [[Protocol Buffers]]
* [[Action Message Format]]
* [[Apache Thrift]]
* [[MessagePack]]
* [[Document-oriented database]] (e.g. [[MongoDB]])
* [[Abstract Syntax Notation One]] (ASN.1)
* [[Wireless Binary XML]] (WBXML)
* [[Efficient XML Interchange]]

==References==
{{Reflist}}


==External links==
==External links==
*{{Official website|ubjson.org}}
* {{Official website}}

{{Data Exchange}}


[[Category:Data serialization formats]]
[[Category:Data serialization formats]]

Latest revision as of 07:34, 16 January 2024

UBJSON
Original author(s)Riyad Kalla
Stable release
Draft 12
Written inVarious languages
Operating systemAny
PlatformCross-platform
TypeData interchange
LicenseApache 2.0
Websiteubjson.org

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

Rationale and Objectives[edit]

UBJSON is a proposed successor to BSON, BJSON and others. UBJSON has the following goals:

  • Complete compatibility with the JSON specification – there is a 1:1 mapping between standard JSON and UBJSON.
  • Ease of implementation – only including data types that are widely supported in popular programming languages so that there are no problems with certain languages not being supported well.
  • Ease of use – it can be quickly understood and adopted.
  • Speed and efficiency – UBJSON uses data representations that are (roughly) 30% smaller than their compacted JSON counterparts and are optimized for fast parsing. Streamed serialisation is supported, meaning that the transfer of UBJSON over a network connection can start sending data before the final size of the data is known.

Data types and syntax[edit]

UBJSON data can be either a value or a container.

Value types[edit]

UBJSON uses a single binary tuple to represent all JSON value types:[1]

   type [length] [data]

Each element in the tuple is defined as:

type[edit]

The type is a 1-byte ASCII character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types. There is also a no-op type used for stream keep-alive.

High-precision numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.

length (optional)[edit]

The length is an integer number (e.g. uint8, or int64) encoding the size of the data payload in bytes. It is used for strings, high-precision numbers and optionally containers. They are omitted for other types.

Length is encoded following the same convention as integers, thus including its own type. For example, the string hello is encoded as S,U,0x05,h,e,l,l,o.

data (optional)[edit]

A sequence of bytes representing the actual binary data for this type of value. All numbers are in big-endian order.

Container types[edit]

Similarly to JSON, UBJSON defines two container types: array and object.[2]

Arrays are ordered sequences of elements, represented as a [ followed by zero or more elements of value and container type and a trailing ].

Objects are labeled sets of elements, represented as a { followed by zero or more key-value pairs and a trailing }. Each key is a string with the S character omitted, and each "value" can be any element of value or container type.

Alternatively, arrays and objects may indicate the number of elements they contain as # followed by an integer number before their first element, in which case the trailing ] or } is omitted. Additionally, if all elements have the same type, the types can be omitted and replaced by a single $ followed by the type, in which case the element count must follow immediately. For example, the array ["a","b","c"] may be represented as [,$,C,#,U,0x03,a,b,c.

Binary data[edit]

While there is no explicit binary type, binary data is stored in a strongly typed array of uint8 values. This ensures binary efficiency while maintaining compatibility with JSON, even though JSON has no direct support for binary data.[3][4]

Representation[edit]

The MIME type 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system.[4]

Software support[edit]

See also[edit]

References[edit]

  1. ^ "Value Types | Universal Binary JSON Specification". Retrieved 20 July 2019.
  2. ^ "Container Types | Universal Binary JSON Specification". Retrieved 20 July 2019.
  3. ^ "Binary Data | Universal Binary JSON Specification". Retrieved 20 July 2019.
  4. ^ a b c "UBJSON (.ubj)—Wolfram Language Documentation". Retrieved 20 July 2019.
  5. ^ "UBJSON Storage Format". Retrieved 20 July 2019.

External links[edit]