Notes on Couch DB

  • Couch DB is a database, a NoSQL database (non relational database)
  • Documents are stored, uniquely named in the db
  • It provides a Restful API for creating, reading, updating and deleting documents
  • CouchDB is stored in semi-structured documents

Documents are the primary unit of data

The Couch DB update model is lockless and optimistic

Single document updates either succeed or fail

Documents are indexed in B-trees by their name (Doc id) and a sequence ID. Each update generates a new sequential number

Getting Started

Install couch db using one of the installation guides

Setup couch db, I will be using the single node setup.

Visit Fauxton at: http://127.0.0.1:5984/_utils#setup

Ensure it is running by issuing a GET request to port 5984

$ http :5984
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 208
Content-Type: application/json
Date: Mon, 29 Apr 2019 08:16:53 GMT
Server: CouchDB/2.3.1 (Erlang OTP/21)
X-Couch-Request-ID: 3d97d2fecc
X-CouchDB-Body-Time: 0

{
    "couchdb": "Welcome",
    "features": [
        "pluggable-storage-engines",
        "scheduler"
    ],
    "git_sha": "c298091a4",
    "uuid": "d13db32f8059f98e73f8b88cd88b3cfa",
    "vendor": {
        "name": "The Apache Software Foundation"
    },
    "version": "2.3.1"
}

Get a list of databases

$ http :5984/_all_dbs

[
    "_global_changes",
    "_replicator",
    "_users"
]

Create a database

$ http -a couch:pass PUT :5984/cricket

{
    "ok": true
}

Delete a database

$ http -a couch:pass DELETE :5984/whale

{
    "ok": true
}

Use Fauxton to create a database and a document

When you write your first programs, we recommend assigning your own UUIDs. Generating your own UUIDs makes sure that you’ll never end up with duplicate documents.

Running Queries

Traditional relational databases allow you to run any queries you like as long as your data is structured correctly. In contrast, CouchDB uses predefined map and reduce functions in a style known as MapReduce.

  • Map functions are called once with each document as the argument
  • When writing CouchDB map functions, your primary goal is to build an index that stores related data under nearby keys

Example document:

{
    "_id": "a611132e5c11476f1363ffdb35001b8a",
    "_rev": "1-be5d5870c9ef76734789df431d0ffe7b",
    "item": "apple",
    "prices": {
        "Fresh Mart": 1.59,
        "Price Max": 5.99,
        "Apples Express": 0.79
    }
}

Map function:

function(doc) {
    var shop, price, key;
    if (doc.item && doc.prices) {
        for (shop in doc.prices) {
            price = doc.prices[shop];
            key = [doc.item, price];
            emit(key, shop);
        }
    }
}

It’s important to check for the existence of any fields before you use them

Couch DB Core API

  • CouchDB is a database management system (DMS) - it can hold mutilple databases
  • A database is a bucket that holds related data

Create a DB

$ http -a couch:pass PUT :5984/test
{
    "ok": true
}

If it fails a second time:

http -a couch:pass PUT :5984/test
{
    "error": "file_exists",
    "reason": "The database could not be created, the file already exists."
}

Couchdb stores each database in a single file

Delete a db

$ http -a couch:pass delete :5984/test

Be careful with this, it is hard to bring your data back without a backup

Documents

  • Couch DB’s central data structure
  • Couch DB uses JSON to store documents
  • Each document in CouchDB has an ID, unique per database.
  • UUID’s: UUIDs are random numbers that have such a low collision probability that everybody can make thousands of UUIDs a minute for millions of years without ever creating a duplicate.

Creating a document

http PUT :5984/hello-world/6e1295ed6c29495e54cc05947f18c8af title='There is Nothing Left to Lose' artist='Foo Fighters'

{
    "id": "6e1295ed6c29495e54cc05947f18c8af",
    "ok": true,
    "rev": "1-4b39c2971c9ad54cb37e08fa02fec636"
}

Get a UUID

You can get a uuid with:

http :5984/_uuids
{
    "uuids": [
        "a611132e5c11476f1363ffdb350051c1"
    ]
}

You can get more than 1 uuid with:

http :5984/_uuids?count=10
{
    "uuids": [
        "a611132e5c11476f1363ffdb35005cb4",
        "a611132e5c11476f1363ffdb350068d0",
        "a611132e5c11476f1363ffdb35006b23",
        "a611132e5c11476f1363ffdb3500787f",
        "a611132e5c11476f1363ffdb3500815f",
        "a611132e5c11476f1363ffdb35008d9a",
        "a611132e5c11476f1363ffdb350095fb",
        "a611132e5c11476f1363ffdb3500a5ad",
        "a611132e5c11476f1363ffdb3500b3cc",
        "a611132e5c11476f1363ffdb3500bc0c"
    ]
}

Get a document

$ http GET :5984/hello-world/6e1295ed6c29495e54cc05947f18c8af
{
    "_id": "6e1295ed6c29495e54cc05947f18c8af",
    "_rev": "1-4b39c2971c9ad54cb37e08fa02fec636",
    "artist": "Foo Fighters",
    "title": "There is Nothing Left to Lose"
}

_rev stands for revision

Revisions

Whenever you change a field in couch you load and save an entire new revision (or version) of the document

If you want to update or delete a document, couchdb expects you to include the _rev field of the revision you wish to change. This prevents you from overwriting data you didn’t know existed - or whoever changes the file first…wins.

If you don’t pride a _rev field:

$ http PUT :5984/hello-world/6e1295ed6c29495e54cc05947f18c8af title='There is Nothing Left to Lose' artist='Foo Fighters' year=1997
{
    "error": "conflict",
    "reason": "Document update conflict."
}

If you add the revision version

$ http PUT :5984/hello-world/6e1295ed6c29495e54cc05947f18c8af title='There is Nothing Left to Lose' artist='Foo Fighters' year=1997 _rev=1-4b39c2971c9ad54cb37e08fa02fec636
{
    "id": "6e1295ed6c29495e54cc05947f18c8af",
    "ok": true,
    "rev": "2-a0ecd0b4133f5d5824078835d510c231"
}

CouchDB accepted your write and also generated a new revision number. The revision number is the MD5 hash of the transport representation of a document with an N- prefix denoting the number of times a document got updated

This is called MVCC (Multi-Version Concurrency Control) - chosen because HTTP is stateless.

CouchDB does not guarantee that older versions are kept around. Don’t use the _rev token in CouchDB as a revision control system for your documents.

Documents in Detail

Get a UUID

$ http :5984/_uuids

Create the document

$ http PUT :5984/hello-world/a611132e5c11476f1363ffdb3500bcd0 title="Blackened Sky" artist="Biffy Clyro" year=2002
HTTP/1.1 201 Created
Cache-Control: must-revalidate
Content-Length: 95
Content-Type: application/json
Date: Mon, 29 Apr 2019 10:12:29 GMT
ETag: "1-c593a87983eabbc39bb70f04cb0e57a6"
Location: http://localhost:5984/hello-world/a611132e5c11476f1363ffdb3500bcd0
Server: CouchDB/2.3.1 (Erlang OTP/21)
X-Couch-Request-ID: bd37be00af
X-CouchDB-Body-Time: 0
{
    "id": "a611132e5c11476f1363ffdb3500bcd0",
    "ok": true,
    "rev": "1-c593a87983eabbc39bb70f04cb0e57a6"
}

An ETag is returned and is the same as rev

Attachments

  • Files attached to a document
  • Attachments get their own URL where you can upload data

Adding an attachment:

$ http put :5984/hello-world/a611132e5c11476f1363ffdb3500bcd0/chart.png?rev=1-c593a87983eabbc39bb70f04cb0e57a6 @~/Desktop/chart.png Content-Type:image/png
{
    "id": "a611132e5c11476f1363ffdb3500bcd0",
    "ok": true,
    "rev": "2-3b33267677cceecb6c209ac2fb391abf"
}

The attachment will be added to the document:

$ http :5984/hello-world/a611132e5c11476f1363ffdb3500bcd0
{
    "_attachments": {
        "chart.png": {
            "content_type": "image/png",
            "digest": "md5-y9V09vx/4l7/UWfzwTaDmw==",
            "length": 422288,
            "revpos": 2,
            "stub": true
        }
    },
    "_id": "a611132e5c11476f1363ffdb3500bcd0",
    "_rev": "2-3b33267677cceecb6c209ac2fb391abf",
    "artist": "Biffy Clyro",
    "title": "Blackened Sky",
    "year": "2002"
}

_attachments a list of keys and values of JSON objects containing attachment data

A request with ?attachments=true will return a base64 encoded attachment

Source