My Technical Explorations: February 2012

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards. The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services. Many application programming interfaces (APIs) have been developed for software developers to use to process XML data, and several schema systems exist to aid in the definition of XML-based languages.

Processor and Application

The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to as an XML parser.
As the document is parsed, the data in the document becomes available to the application using the parser, and suddenly we are within an XML-aware application!
An XML parser converts an XML document into an XML DOM object - which can then be manipulated with JavaScript.
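As a rough illustration of the processor/application split, the sketch below uses Python's standard-library `xml.dom.minidom` in place of a browser's JavaScript DOM. The sample document and the variable names are assumptions made up for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
xml_text = (
    "<books>"
    "<book id='1'>XML Basics</book>"
    "<book id='2'>JSON Basics</book>"
    "</books>"
)

# The parser (the "processor") turns raw text into a DOM tree.
doc = parseString(xml_text)

# The application then works with structured nodes instead of raw text.
books = doc.getElementsByTagName("book")
titles = [node.firstChild.data for node in books]
ids = [node.getAttribute("id") for node in books]

print(titles)
print(ids)
```

The application code never touches angle brackets; it only sees elements, attributes, and text nodes handed over by the parser.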
In web applications using XMLHttpRequest, communication is permitted only with the same origin that provides the page, so the server is trusted. But it might not be competent. If the server is not strict in its JSON encoding, or if it does not rigorously validate all of its inputs, then it could deliver invalid JSON text carrying a dangerous script. The eval function would execute that script, unleashing its malice. To defend against this, a JSON parser should be used. A JSON parser will recognize only JSON text, rejecting all scripts.
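The same idea can be sketched outside the browser. The example below uses Python's standard `json` module rather than JavaScript; the helper name `parse_untrusted` and the sample strings are assumptions for illustration. A strict parser accepts JSON text and refuses anything that is really code.

```python
import json

def parse_untrusted(text):
    """Return the decoded value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Anything that is not pure JSON (for example a script) is rejected
        # rather than executed.
        return None

good = parse_untrusted('{"user": "alice", "admin": false}')
bad = parse_untrusted('alert("pwned")')

print(good)
print(bad)
```

Unlike an eval-style approach, the parser never runs the input; text that does not match the JSON grammar simply fails to parse.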

XML parser

Both SAX and DOM are used to parse XML documents. Each has advantages and disadvantages, and either can be used depending on the situation.

SAX
• Parses node by node
• Doesn’t store the XML in memory
• We can’t insert or delete a node
• SAX is an event-based parser
• SAX stands for Simple API for XML
• Doesn’t preserve comments
• SAX generally runs a little faster than DOM

DOM
• Stores the entire XML document in memory before processing
• Occupies more memory
• We can insert or delete nodes
• Can traverse in any direction
• DOM is a tree-model parser
• DOM stands for Document Object Model
• Preserves comments
• DOM generally runs a little slower than SAX

If we only need to find a node and don’t need to insert or delete anything, we can go with SAX; otherwise we should use DOM, provided we have enough memory.
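The contrast can be sketched with Python's standard library, which ships both styles of parser. The sample XML, the `ItemCounter` handler, and the variable names are assumptions for illustration only.

```python
import xml.sax
from xml.dom.minidom import parseString

# Hypothetical sample document.
XML_TEXT = "<order><item>pen</item><item>ink</item></order>"

class ItemCounter(xml.sax.ContentHandler):
    """SAX style: react to events as they stream past; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters.
        if name == "item":
            self.count += 1

counter = ItemCounter()
xml.sax.parseString(XML_TEXT.encode("utf-8"), counter)

# DOM style: the whole tree is in memory, so nodes can be inserted.
doc = parseString(XML_TEXT)
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("paper"))
doc.documentElement.appendChild(new_item)
dom_count = len(doc.getElementsByTagName("item"))

print(counter.count, dom_count)
```

The SAX handler only counts what it sees and cannot change the document, while the DOM version pays for holding the tree but can add the third item.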

JavaScript Object Notation (JSON) is a lightweight, text-based open standard designed for human-readable data interchange. It is easy for machines to parse and generate. It is based on a subset of the JavaScript programming language. When working together with jQuery and ASP.NET MVC in building web applications, it provides an efficient mechanism to exchange data between the web browser and the web server.

At the browser side, the data is stored and manipulated as JavaScript objects (JSON objects). At the server side, if ASP.NET MVC is used, the data is stored and manipulated as .NET objects.
• When the browser loads data from the server, the .NET objects need to be serialized into JSON strings and passed to the browser. The browser will de-serialize the JSON strings into easy-to-use JavaScript JSON objects.
• When the browser sends data to the server, the JSON objects need to be serialized into JSON strings and then de-serialized into .NET objects at the server side.
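The round trip described above can be sketched in a few lines. This is not ASP.NET MVC or jQuery code; it is a minimal stand-in using Python's `json` module, with a hypothetical `Student` class playing the role of the server-side object.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Student:
    """Hypothetical server-side type, standing in for a .NET object."""
    name: str
    score: int

def to_json(student):
    # Serialize: in-memory object -> JSON string for the wire.
    return json.dumps(asdict(student))

def from_json(text):
    # De-serialize: JSON string from the wire -> in-memory object.
    return Student(**json.loads(text))

wire = to_json(Student("Ann", 90))
restored = from_json(wire)

print(wire)
print(restored)
```

Only the string in `wire` crosses the network; each side rebuilds its own native object from it.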

Comparing JSON to XML

Both JSON and XML can be used to represent native, in-memory objects in a text-based, human-readable, data exchange format. Furthermore, the two data exchange formats are isomorphic: given text in one format, an equivalent one is conceivable in the other. For example, when calling one of Yahoo!'s publicly accessible web services, you can indicate via a querystring parameter whether the response should be formatted as XML or JSON. Therefore, when deciding upon a data exchange format, it's not a simple matter of choosing one over the other as a silver bullet, but rather what format has the characteristics that make it the best choice for a particular application. For example, XML has its roots in marking up document text and tends to shine very well in that space (as is evident with XHTML). JSON, on the other hand, has its roots in programming language types and structures and therefore provides a more natural and readily available mapping to exchange structured data. Beyond these two starting points, the following table will help you to understand and compare the key characteristics of XML and JSON.

Key Characteristic Differences between XML and JSON

Characteristic	XML	JSON
Data types	Does not provide any notion of data types. One must rely on XML Schema for adding type information.	Provides scalar data types and the ability to express structured data through arrays and objects.
Support for arrays	Arrays have to be expressed by conventions, for example through the use of an outer placeholder element that models the arrays contents as inner elements. Typically, the outer element uses the plural form of the name used for inner elements.	Native array support.
Support for objects	Objects have to be expressed by conventions, often through a mixed use of attributes and elements.	Native object support.
Null support	Requires use of xsi:nil on elements in an XML instance document plus an import of the corresponding namespace.	Natively recognizes the null value.
Comments	Native support and usually available through APIs.	Not supported.
Namespaces	Supports namespaces, which eliminates the risk of name collisions when combining documents. Namespaces also allow existing XML-based standards to be safely extended.	No concept of namespaces. Naming collisions are usually avoided by nesting objects or using a prefix in an object member name (the former is preferred in practice).
Formatting decisions	Complex. Requires a greater effort to decide how to map application types to XML elements and attributes. Can create heated debates whether an element-centric or attribute-centric approach is better.	Simple. Provides a much more direct mapping for application data. The only exception may be the absence of date/time literal.
Size	Documents tend to be lengthy in size, especially when an element-centric approach to formatting is used.	Syntax is very terse and yields formatted text where most of the space is consumed (rightly so) by the represented data.
Parsing in JavaScript	Requires an XML DOM implementation and additional application code to map text back into JavaScript objects.	No additional application code required to parse text; can use JavaScript's eval function, though a dedicated JSON parser is the safer choice for untrusted text.
Learning curve	Generally tends to require use of several technologies in concert: XPath, XML Schema, XSLT, XML Namespaces, the DOM, and so on.	Very simple technology stack that is already familiar to developers with a background in JavaScript or other dynamic programming languages.
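To make the rows on arrays and null support concrete, the sketch below encodes one record both ways and decodes each with Python's standard library. The record, the element names, and the mapping code are assumptions for illustration; the XML side follows the conventions the table describes (a plural outer element for the array, `xsi:nil` for null).

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# JSON: the array and the null are part of the syntax.
json_text = '{"name": "Ann", "phones": ["555-0100", "555-0101"], "email": null}'

# XML: the array is a convention (plural outer element) and null needs xsi:nil.
xml_text = (
    '<person xmlns:xsi="' + XSI + '">'
    "<name>Ann</name>"
    "<phones><phone>555-0100</phone><phone>555-0101</phone></phones>"
    '<email xsi:nil="true"/>'
    "</person>"
)

from_json = json.loads(json_text)

# The XML needs hand-written mapping code to reach the same structure.
root = ET.fromstring(xml_text)
email = root.find("email")
from_xml = {
    "name": root.findtext("name"),
    "phones": [phone.text for phone in root.find("phones")],
    "email": None if email.get("{%s}nil" % XSI) == "true" else email.text,
}

print(from_json == from_xml)
```

One `json.loads` call yields the list and the null directly, while the XML version depends on the application knowing which elements represent arrays and how null is marked.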

JSON is a relatively new data exchange format and does not have the years of adoption or vendor support that XML enjoys today (although JSON is catching up quickly). The following table highlights the current state of affairs in the XML and JSON spaces.

Support Differences between XML and JSON

Support	XML	JSON
Tools	Enjoys a mature set of tools widely available from many industry vendors.	Rich tool support—such as editors and formatters—is scarce.
Microsoft .NET Framework	Very good and mature support since version 1.0 of the .NET Framework. XML support is available as part of the Base Class Library (BCL). For unmanaged environments, there is MSXML.	None so far, except an initial implementation as part of ASP.NET AJAX.
Platform and language	Parsers and formatters are widely available on many platforms and languages (commercial and open source implementations).	Parsers and formatters are available already on many platforms and in many languages. Consult json.org for a good set of references. Most implementations for now tend to be open source projects.
Integrated language	Industry vendors are currently experimenting with support literally within languages. See Microsoft's LINQ project for more information.	Is natively supported in JavaScript/ECMAScript only.

Note: Neither table is meant to be a comprehensive list of comparison points. There are further angles on which both data formats can be compared, but we felt that these key points should be sufficient to build an initial impression.

Jackson is a JSON processor (JSON parser + JSON generator) written in Java. It is:

• Streaming (reading, writing)
• Fast (measured to be faster than any other Java JSON parser and data binder)
• Powerful (full data binding for common JDK classes as well as any Java bean class, Collection, Map or Enum)
• Zero-dependency (does not rely on other packages beyond JDK)
• Open Source (LGPL or AL)
• Fully conformant
• Extremely configurable

Beyond basic JSON reading/writing (parsing, generating), it also offers a full node-based Tree Model, as well as full OJM (Object/JSON Mapper) data binding functionality. In other words, it combines a streaming API, a DOM-like Tree Model, and JAXB-style object bindings. This range of parsing approaches allows developers to choose the right approach for the task at hand.

Yet Another JSON Library. YAJL is a small event-driven (SAX-style) JSON parser written in ANSI C, and a small validating JSON generator. YAJL is released under the ISC license.

Features
• Simple interface: Largely because YAJL is event driven, the interface is very concise object oriented C. The interface is not cluttered with data representation, that bit is left up to higher level code. Indeed it should be possible to port most existing JSON libraries onto YAJL if so desired.
• Portable: It's all ANSI C. It's been successfully compiled on debian linux, OSX 10.4 i386 & ppc, OSX 10.5 i386, winXP, FreeBSD 4.10, FreeBSD 6.1 amd64, FreeBSD 7 i386, and windows vista. More platforms and binaries as time permits.
• Stream parsing: YAJL remembers all state required to support restarting parsing. This allows parsing to occur incrementally as data is read off a disk or network.
• Fast: A second motivation for writing YAJL was that many available free JSON parsers fall over on large or complex inputs. YAJL is careful to minimize memory copying and input re-scanning when possible. The result is a parser that should be fast enough for most applications or tunable for any application. On my mac pro (2.66 ghz) it takes 1s to verify a 60meg json file. Minimizing that same file with json_reformat takes 4s.
• Low resource consumption: Largely because YAJL deals with streams, it's possible to parse JSON in low memory environments. Oftentimes with other parsers an application must hold both the input text and the memory representation of the tree in memory at one time. With YAJL you can incrementally read the input stream and hold only the in memory representation. Or for filtering or validation tasks, it's not required to hold the entire input text in memory.

Gson is a Java library that can be used to convert Java objects into their JSON representation. It can also be used to convert a JSON string to an equivalent Java object. Gson is an open-source project hosted at http://code.google.com/p/google-gson. Gson can work with arbitrary Java objects, including pre-existing objects whose source code you do not have.

Goals for Gson
• Provide easy-to-use mechanisms like toString() and constructor (factory method) to convert Java to JSON and vice versa
• Allow pre-existing unmodifiable objects to be converted to and from JSON
• Allow custom representations for objects
• Support arbitrarily complex objects
• Generate compact and readable JSON output

FlexJson
There are a few catches when using this library. To serialize a POJO correctly you have to exclude deep serialization of class properties and explicitly include collections, as they are not serialized by default. The library has no problems with Date handling or with generated Groovy classes. I haven’t noticed any support for resolution of circular references. The library is easily reachable in Maven repos. According to its authors, the main advantage of this library is the ability to pick and choose specific properties and structures that should be converted to/from JSON.

JSON-Lib
Very configurable, but has some glitches – for example, deserializing Dates doesn’t work out of the box and you have to provide type handlers. It has several strategies for coping with circular references. Very good documentation. Integration with Groovy. Easily reachable from Maven repos (but beware, you have to provide classifier=jdk15). This library burned me in production – it has really bad performance stats.

JsonMarshaller
It has problems with serialization / deserialization of Groovy objects (throws an exception regarding ASM). It has no built-in Date support. It requires you to place annotations into your model (or DTO) classes, which might be rather uncomfortable (and maybe unacceptable) in some cases. Documentation is quite poor. It is not available in Maven repos.

Json Smart
Very simplistic and small library – POJO deserialization first arrives in version 2, which is currently in beta. Almost nothing is configurable, and documentation is poor. The library is not reachable in Maven repos. It is currently not possible to deserialize Dates; moreover, there is no option to add a custom type handler, so deserializing an object that contains dates is not possible at all.

Protostuff – JSON
Powerful library requiring a rather complicated setup when not using the RuntimeSchema generator. In its standard setup I believe the library is meant for much more than I’ve used it for. JSON transformation is just one piece of the work it can do (it can convert to YAML, XML and more). It had no problems with Dates and Groovy classes. The library is accessible in Maven repos.

XStream
Library originally used to serialize and deserialize to / from XML; internally it uses Jettison to transform data to / from JSON. Easy to use, highly customizable. Supports resolution of circular references. The library handles Dates out of the box, has no problem with Groovy classes and is available in Maven repos.

Jackson and Gson are the most complete Java JSON packages regarding actual data binding support; many other packages only provide primitive Map/List (or equivalent tree model) binding. Both have complete support for generic types, as well as enough configurability for many common use cases.
Since I am more familiar with Jackson, here are some aspects where I think Jackson has more complete support than Gson (apologies if I miss a Gson feature):
• Extensive annotation support; including full inheritance, and advanced "mix-in" annotations (associate annotations with a class for cases where you can not directly add them)
• Streaming (incremental) reading, writing, for ultra-high performance (or memory-limited) use cases; can mix with data binding (bind sub-trees)
• Tree model (DOM-like access); can convert between various models (tree <-> java object <-> stream)
• Can use any constructors (or static factory methods), not just default constructor
• Field and getter/setter access (earlier gson versions only used fields, this may have changed)
• Out-of-box JAX-RS support
• Interoperability: can also use JAXB annotations, has support/work-arounds for common packages (joda, ibatis, cglib), JVM languages (groovy, clojure, scala)
• Ability to force static (declared) type handling for output
• Support for deserializing polymorphic types (Jackson 1.5) -- can serialize AND deserialize things like List correctly (with additional type information)
• Integrated support for binary content (base64 to/from JSON Strings)
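The last point, carrying binary content as base64 inside JSON strings, can be sketched generically. This is not Jackson's API; it is a plain Python illustration of the encoding idea, and the field name `data` and the sample bytes are assumptions.

```python
import base64
import json

# Hypothetical binary payload; JSON has no native byte type.
payload = bytes([0, 255, 16, 32])

# Writing: encode the bytes as base64 text so they fit in a JSON string.
wire = json.dumps({"data": base64.b64encode(payload).decode("ascii")})

# Reading: decode the JSON, then turn the base64 text back into bytes.
restored = base64.b64decode(json.loads(wire)["data"])

print(wire)
print(restored == payload)
```

A library with integrated support performs these two conversions automatically whenever a byte array is bound to a JSON string.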

The above content is drawn from the following sources:
http://www.codeproject.com/KB/ajax/JsonMVC.aspx?display=Print
http://www.json.org/
http://jackson.codehaus.org/
http://msdn.microsoft.com/en-us/library/bb299886.aspx
http://lloyd.github.com/yajl/
https://sites.google.com/site/gson/gson-user-guide#TOC-Overview
http://en.wikipedia.org/wiki/XML

For more performance statistics:
http://blog.novoj.net/2012/02/05/json-java-parsers-generators-microbenchmark/
http://www.cowtowncoder.com/blog/archives/cat_performance.html
http://zoomquiet.org/res/scrapbook/ZqFLOSS/data/20091111011019/index.html

Sunday, February 12, 2012

XML vs JSON