Monday, July 7, 2008

Protocol Buffers, our serialized structured data, released as Open Source



One of the core pieces of infrastructure at Google is something called Protocol Buffers. We are really pleased to be open sourcing the system, but what are these buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then use specially generated source code to easily write and read your structured data to and from a variety of data streams, in a variety of languages. You can even update your data structure without breaking deployed programs that were compiled against the "old" format.
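Part of the size advantage comes from how integers are stored on the wire. As a rough illustration, and not the library's own code, the sketch below hand-rolls the base-128 "varint" encoding used by the protocol buffer wire format. Each byte carries seven bits of the value, lowest bits first, and the high bit is set when more bytes follow, so small numbers take a single byte. The function name EncodeVarint is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, hand-rolled varint encoder (not the protobuf library API).
// Each output byte holds 7 bits of the value, least significant group first;
// the high bit (0x80) signals that another byte follows.
std::vector<uint8_t> EncodeVarint(uint64_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));  // last byte: high bit clear
  return out;
}
```

For example, 1 encodes as the single byte 0x01, while 300 needs two bytes, 0xAC 0x02. A fixed-width 32-bit integer would always take four.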
It is probably best to take a peek at some code. The first thing you need to do is define a message type in a .proto file, which can look like the following:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
Detailed documentation on this language is available if you want to learn more.
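The numbers after each field (= 1, = 2, and so on) are what actually goes on the wire, not the field names. As a hand-written illustration with hypothetical helper names (the real library generates this logic for you), the sketch below encodes a Person with only name and id set. Each field is prefixed by a key, (field_number << 3) | wire_type, where wire type 0 means varint and wire type 2 means length-delimited; a string is written as its length followed by its raw bytes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names; not protoc-generated code

// Base-128 varint: 7 bits per byte, high bit set while more bytes follow.
void AppendVarint(std::vector<uint8_t>& out, uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

constexpr uint32_t kVarint = 0;           // wire type for int32, enums, ...
constexpr uint32_t kLengthDelimited = 2;  // wire type for strings, sub-messages

// Key = (field number << 3) | wire type, itself written as a varint.
void AppendKey(std::vector<uint8_t>& out, uint32_t field, uint32_t wire_type) {
  AppendVarint(out, (static_cast<uint64_t>(field) << 3) | wire_type);
}

// Encodes the "name = 1" and "id = 2" fields of Person.
std::vector<uint8_t> EncodePerson(const std::string& name, int32_t id) {
  std::vector<uint8_t> out;
  AppendKey(out, 1, kLengthDelimited);  // name = 1
  AppendVarint(out, name.size());       // length prefix
  out.insert(out.end(), name.begin(), name.end());
  AppendKey(out, 2, kVarint);  // id = 2
  // int32 values are sign-extended to 64 bits, so a negative id takes 10 bytes.
  AppendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(id)));
  return out;
}

}  // namespace sketch
```

With name "John Doe" and id 1234 this yields 13 bytes: 0x0A 0x08, the eight characters of the name, then 0x10 0xD2 0x09. Because readers match on field numbers, renaming a field in the .proto file does not change the encoded data.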

Once you have defined a message type, you run the protocol buffer compiler on the file to generate data access classes for your language of choice (Java, C++, or Python in this release).

Then you can easily work with the data, for example in C++:
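#include <fstream>
#include "person.pb.h"  // generated by the compiler; this name assumes the file is person.proto

int main() {
  Person person;
  person.set_name("John Doe");
  person.set_id(1234);
  person.set_email("jdoe@example.com");
  std::fstream output("myfile", std::ios::out | std::ios::binary);
  // SerializeToOstream returns true on success.
  return person.SerializeToOstream(&output) ? 0 : 1;
}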
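The compatibility promise from the start of the post follows from this key-value layout: a reader that meets a field number it does not know can use the wire type to skip over it. The sketch below is a hypothetical, hand-written "old" reader that only knows name (1) and id (2). Given a message that also carries email (3), it skips the unknown field instead of failing. It handles only wire types 0 and 2, which is all this Person example needs. In practice the generated parser (for example ParseFromIstream) does this for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names; not the protobuf library API

struct OldPerson {  // an "old" view of Person that predates the email field
  std::string name;
  int32_t id = 0;
};

// Reads a varint starting at pos and advances pos; nullopt if truncated.
std::optional<uint64_t> ReadVarint(const std::vector<uint8_t>& in, size_t& pos) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    uint8_t byte = in[pos++];
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return value;
  }
  return std::nullopt;
}

// Parses fields 1 and 2; any other field number is skipped using its wire type.
std::optional<OldPerson> ParseOldPerson(const std::vector<uint8_t>& in) {
  OldPerson person;
  size_t pos = 0;
  while (pos < in.size()) {
    auto key = ReadVarint(in, pos);
    if (!key) return std::nullopt;
    uint64_t field = *key >> 3;
    uint64_t wire_type = *key & 0x7;
    if (wire_type == 0) {  // varint
      auto v = ReadVarint(in, pos);
      if (!v) return std::nullopt;
      if (field == 2) person.id = static_cast<int32_t>(*v);
    } else if (wire_type == 2) {  // length-delimited
      auto len = ReadVarint(in, pos);
      if (!len || *len > in.size() - pos) return std::nullopt;
      size_t n = static_cast<size_t>(*len);
      if (field == 1) person.name.assign(in.begin() + pos, in.begin() + pos + n);
      pos += n;  // unknown length-delimited fields are skipped here
    } else {
      return std::nullopt;  // other wire types are left out of this sketch
    }
  }
  return person;
}

}  // namespace sketch
```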
We sat down with Kenton Varda, a software engineer who worked on the open source effort, to get his take on Protocol Buffers, how we ended up with them, how they compare to other solutions, and more:
