Design in PostgreSQL, document-oriented API (Part 1)
This article is a translation; the original, by Rob Conery, is here.
Postgres, as many of you know, supports JSON as a data storage type, and with the release of 9.4 it can now store JSON as jsonb — a binary format.
That's great news for those who want to step beyond simply "storing JSON as text". jsonb supports indexing with a GIN index, and it has a dedicated query operator that can take advantage of that index.
Who cares?
It was fun to discover jsonb in Postgres and see what it can do. That, in its own way, is a problem: familiarity and tinkering alone are not enough to get real work done.
The point is that other systems (such as RethinkDB) ship with a huge amount of built-in functionality to help you save documents, query those documents, and optimize those queries. Postgres also has some interesting capabilities in this direction, but what you get out of the box is a tiny bit... lacking, to be honest.
Look at this query:
select document_field -> 'my_key' from my_docs
where document_field @> '{"some_key" : "some_value"}';
It reveals some of the strangeness of working with JSON in Postgres: it's all strings. SQL itself has no notion of JSON, so you have to format it as a string. That, in turn, means that working with JSON directly in SQL is a pain. Of course, a good query tool eases the problem to some extent... but it still exists.
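For example (a sketch against the hypothetical my_docs table from the query above), even reading a value back forces you to think about strings: `->` returns jsonb while `->>` returns text, and comparisons are written against string literals:

```sql
-- -> returns the value as jsonb; ->> returns it as text
select document_field -> 'my_key'  from my_docs;
select document_field ->> 'my_key' from my_docs;

-- comparisons against literals are also written as strings
select * from my_docs
where (document_field ->> 'some_key') = 'some_value';
```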
Moreover, how you store a document is rather open-ended. Do you use a single jsonb field? Or several fields within a larger table structure? It's all up to you, which is good, of course, but too much freedom of choice can also be paralyzing.
So why bother with any of this? If you want a document-oriented database, use a document-oriented database. I agree with that... but there is one really compelling reason to use Postgres (at least for me)...

Postgres is ACID-compliant, so you can count on it to take your data and, most likely, not lose it.
In addition, Postgres is a relational database, which means that if you eventually want to move to a more rigorous schema, you can. There are plenty of reasons to choose Postgres; for now, let's assume the choice is made and it's time to start working with documents and jsonb.
API
As for me, I would like more functions that support the idea of working with documents. At the moment we have built-in tools for dealing with the JSON types, but nothing that supports a higher level of abstraction.
That doesn't mean we can't build such an API ourselves... which is what I did. Here goes...
Document-oriented table
I want to store documents in a table that also holds metadata and supports additional ways of working with the information, namely Full Text Search.
The structure of the table may vary — which is exactly why we're building this abstraction! Let's start with this:
create table my_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz not null default now()
);
There is some duplication here. The document itself is stored in the body field, including its id, which is also stored as the primary key (this is necessary because it's still Postgres). I accept the duplication, however, for the following reasons:
- it's how document-oriented systems do it
- this API belongs to me, so I can make sure everything stays in sync
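Since the id lives in two places, a quick sanity check (a sketch against the my_docs table above) can confirm the copies stay in sync:

```sql
-- any row where the id inside the document disagrees with the
-- primary key means the two copies have drifted apart
select id, body ->> 'id' as body_id
from my_docs
where (body ->> 'id')::int is distinct from id;
```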
Saving a document
What I'd like from a save_document function...

- create the table on the fly
- create the proper indexes
- set the timestamps and the search field (full-text index)
This can be achieved by writing our own save_document and, for fun, I'll use PLV8 — JavaScript inside the database. In fact, I will create two functions — one that creates my table in a specific way, the other that saves the document.
First, create_document_table:
create function create_document_table(name varchar, out boolean)
as $$
  var sql = "create table " + name + "(" +
    "id serial primary key," +
    "body jsonb not null," +
    "search tsvector," +
    "created_at timestamptz default now() not null," +
    "updated_at timestamptz default now() not null);";
  plv8.execute(sql);
  plv8.execute("create index idx_" + name + " on " + name + " using GIN(body jsonb_path_ops)");
  plv8.execute("create index idx_" + name + "_search on " + name + " using GIN(search)");
  return true;
$$ language plv8;
This function creates the table and the appropriate indexes — one on the jsonb field of our document table, the other on the tsvector full-text field. Note that I build the SQL string on the fly and execute it with plv8.execute — that's how things are done with JavaScript in Postgres.
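Calling it on its own (a sketch; the table name customers is just an example) looks like this:

```sql
-- creates the customers table plus its two GIN indexes
select create_document_table('customers');
```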
Next, let's create our save_document function:
create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];

  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    result = plv8.execute("update " + tbl + " set body = $1, updated_at = now() where id = $2 returning *;", doc_string, id);
  }else{
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body = $1 where id = $2 returning *;", JSON.stringify(doc), id);
  }

  return result[0] ? result[0].body : null;
$$ language plv8;
I'm sure this function looks a bit odd, but if you read it line by line it makes sense. Why is JSON.parse() called, though?
Because the Postgres jsonb type is not JSON — it's a string. Outside our PLV8 block it's still the Postgres world, and Postgres passes JSON around as a string (while storing it in jsonb in binary form). So when the document arrives in the function as a string, it has to be parsed if we want to work with it as a JSON object in JavaScript.
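To illustrate, here is the same round trip in plain JavaScript, outside of Postgres (the docString value is just a stand-in):

```javascript
// What plv8 hands the function: the jsonb argument as a JSON string.
var docString = '{"name": "Test"}';

// Parse it so we can work with it as a real JavaScript object...
var doc = JSON.parse(docString);

// ...for example, to sync the id after an insert...
doc.id = 1;

// ...and stringify it again before handing it back to plv8.execute.
var backToString = JSON.stringify(doc);
```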
In the insert case, you'll notice that I have to sync the id in the document with the primary key that was just created. A bit cumbersome, but it works well.
Finally, note that both the initial insert and the update take doc_string as the argument to plv8.execute. Again, this is because JSON values have to be handled as strings in Postgres.
It really can be confusing. If I pass in doc (our JSON.parsed object) instead, plv8 converts it to [object Object]. Strange.
Moreover, if I try to return a JavaScript object from this function (say, our doc variable), I get an error that it is the wrong format for the jsonb type. Baffling.
As a result, I simply return the data from the query result — believe it or not, it's a string, and I can pass it straight through as the result. It's worth noting that all the rows plv8.execute returns are items you can work with as JavaScript objects.
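The [object Object] pitfall is ordinary JavaScript string coercion, which you can see outside of Postgres too (a sketch; the doc value is just a stand-in):

```javascript
var doc = { name: "Test" };

// Default string coercion turns an object into "[object Object]" --
// this is what happens when doc is passed where a string is expected.
var coerced = String(doc);

// An explicit JSON.stringify produces valid JSON text instead.
var asJson = JSON.stringify(doc);
```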
Result
It works really well! And fast. If you want to try it yourself, you'll need to install the PLV8 extension and then run something like this:
create extension plv8;
select * from save_document('test_run', '{"name" : "Test"}');
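Once the document is saved, it can be read back with the regular jsonb operators, and calling save_document again with the id set performs an update (a sketch against the test_run table just created):

```sql
-- read the saved value back
select body ->> 'name' from test_run;

-- saving a document that carries an id updates the existing row
select * from save_document('test_run', '{"id": 1, "name": "Updated"}');
```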
You should see a new table and a new entry in this table:

Future plans
In the following article I will add some additional features, namely:

- automatic updates to the search field
- bulk inserts of multiple documents using arrays
This is a good start!