2025-02-19 Elasticsearch

Elastic in Action Notes

ElasticSearch In Action#

4. Mapping#

4.1 Mapping Overview#

Mapping is the schema definition of an index: it declares the fields of a document and their data types.

In Elasticsearch one does not need to define the schema/mapping before inserting documents (although it is recommended).

Sending this request (without anything else being done prior):

PUT movies/_doc/1
{
    "title": "Godfather",
    "rating": 4.5,
    "release_year": "1972/08/01"
}

does the following:

A new index movies is created automatically
A new schema (mapping) is created for the index. For example, title is mapped as text with a keyword sub-field, rating as a float, and release_year as a date.
The document is indexed and stored in the Elasticsearch data store
Subsequent documents are stored without prior steps

The dynamically generated schema can be retrieved from the mapping API:

GET movies/_mapping

Response:

{
    "movies": {
        "mappings": {
            "properties": {
                "rating": {
                    "type": "float"
                },
                "release_year": {
                    "type": "date",
                    "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis",
                    "print_format": "yyyy/MM/dd HH:mm:ss"
                },
                "title": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

This process is called dynamic mapping: when a field is indexed for the first time, Elasticsearch infers its data type from the value. The result is a best guess and can often be wrong.

Note that title is a multi-field: it is mapped as text, with a keyword sub-field that is accessed as title.keyword.

keyword fields are used for exact-value searches. Their values are left untouched and won’t go through an analysis phase: they are not tokenised, expanded with synonyms, or stemmed. They use a special no-op (no operation) analyser, which emits the entire field value as a single token. One can customise a keyword field by enabling a normaliser on it (this is expected to be done prior to indexing).
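To make the difference concrete, here is a minimal Python sketch that contrasts the two behaviours. It is only a model: keyword_tokens and text_tokens are hypothetical helpers, and the lowercase-and-split logic is a rough stand-in for text analysis, not Elasticsearch's actual analyser.

```python
import re

def keyword_tokens(value):
    """Model of a keyword field: the whole value is emitted as one token."""
    return [value]

def text_tokens(value):
    """Rough stand-in for text analysis: lowercase, then split on
    anything that is not an ASCII letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

title = "The Godfather: Part II"
print(keyword_tokens(title))  # ['The Godfather: Part II']
print(text_tokens(title))     # ['the', 'godfather', 'part', 'ii']
```

An exact-value search therefore has to supply the full original string to match the keyword token, whereas a full-text search can match any of the individual text tokens.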

4.2 Dynamic Mapping#

Elasticsearch can deduce whether a field is float or matches the format of a date.

If you declare a field as a date type explicitly (using explicit mapping, more on this later) without providing a custom format, the field adheres to the strict_date_optional_time format by default. strict_date_optional_time conforms to an ISO date format, i.e. yyyy-MM-dd or yyyy-MM-ddTHH:mm:ss.

If a value does not match a recognised date format exactly, Elasticsearch treats it as a text field rather than a date field.

Similarly, if the rating were 4 instead of 4.5, Elasticsearch would set the data type to long.

Incorrect types cause problems in an application: the affected fields can become ineligible for sorting, filtering, and aggregations.

This is the main downside of dynamic mapping: Elasticsearch can misinterpret a document's field values and derive an incorrect mapping, which voids the field's eligibility for the appropriate search, sort, and aggregation capabilities.
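The inference rules described in these notes can be modelled with a short Python sketch. This is a deliberate simplification for illustration, not Elasticsearch's implementation: guess_type is a hypothetical helper, and the two regular expressions only approximate the yyyy-MM-dd (optionally with a time part) and yyyy/MM/dd shapes mentioned here.

```python
import re

# Simplified date shapes: yyyy-MM-dd (optionally followed by THH:mm:ss)
# and yyyy/MM/dd. Real date detection is more involved than this.
DATE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?"),
    re.compile(r"\d{4}/\d{2}/\d{2}"),
]

def guess_type(value):
    """Return the field type this simplified model would infer for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        if any(p.fullmatch(value) for p in DATE_PATTERNS):
            return "date"
        return "text"  # dynamic mapping also adds a keyword sub-field
    raise ValueError("value kind not modelled in this sketch")

doc = {"title": "Godfather", "rating": 4.5, "release_year": "1972/08/01"}
print({field: guess_type(value) for field, value in doc.items()})
# {'title': 'text', 'rating': 'float', 'release_year': 'date'}
print(guess_type(4))             # long
print(guess_type("12"))          # text
print(guess_type("01-08-1972"))  # text
```

The last three calls show the failure modes from this section: a whole number becomes long, a quoted number becomes text, and a date in an unrecognised format becomes text.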

Example:

PUT students/_doc/1
{
    "name": "John",
    "age": "12"
}

GET students/_mapping

Response:

{
    "students": {
        "mappings": {
            "properties": {
                "age": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

Because the age value is enclosed in quotes, Elasticsearch treats it as text rather than as a number.

Then attempting to search and sort on age:

GET students/_search
{
    "sort": [
        {
            "age": {
                "order": "desc"
            }
        }
    ]
}

Returns an error (response excerpt):

{
"error": {
    "root_cause": [
    {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [age] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
    }

Elasticsearch cannot sort on fields of type text by default.

One can enable fielddata on a text field in the mapping:

PUT students_with_fielddata_setting
{
    "mappings": {
        "properties": {
            "age": {
                "type": "text",
                "fielddata": true
            }
        }
    }
}

Enabling fielddata leads to expensive computation, because the field data is built and held in the heap-resident field data cache. The recommendation is to use keyword fields instead.

Using Keyword#

Fortunately, when dynamic mapping maps a string field as text, it creates it as a multi-field with keyword as the second type by default.

Changing the search to sort on age.keyword works:

GET students/_search
{
    "sort": [
        {
            "age.keyword": {
                "order": "desc"
            }
        }
    ]
}

Sorting can be applied to keyword type fields, although the values are then compared as strings rather than as numbers.
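One consequence worth seeing: because age.keyword holds strings, the order differs from a numeric sort. The Python sketch below uses plain string comparison as a stand-in for keyword ordering; the sample ages are made up for illustration.

```python
# Hypothetical age values as they would be stored in age.keyword (strings).
ages = ["12", "9", "101"]

# Keyword-style ordering: values are compared as strings, character by character.
as_keyword = sorted(ages, reverse=True)

# Numeric ordering: what a field mapped as a number would give.
as_number = sorted(ages, key=int, reverse=True)

print(as_keyword)  # ['9', '12', '101']
print(as_number)   # ['101', '12', '9']
```

If the numeric order is the one that matters, mapping age explicitly as a numeric type is the more reliable fix than sorting on the keyword sub-field.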

Date formats#

Dates in a format other than yyyy-MM-dd or yyyy/MM/dd are treated as text by dynamic mapping and are therefore ineligible for date-based sorting, filtering, and aggregation.

4.3 Explicit Mapping#

There are two ways to create mappings explicitly:

Indexing APIs - at the time of index creation
Mapping APIs - add and modify mapping with the _mapping endpoint

Indexing API

PUT movies
{
    "mappings": {
        "properties": {
            "title": {
                "type": "text"
            }
        }
    }
}

Issue a PUT on the index name and pass a mappings object containing all the required fields.

Mapping API

PUT movies/_mapping
{
    "properties": {
        "release_date": {
            "type": "date",
            "format": "dd-MM-yyyy"
        }
    }
}

Issue a PUT on the index's _mapping endpoint and pass a properties object containing the new fields.
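The two request shapes differ only in the path and in whether the properties are wrapped in a mappings object. A small Python sketch that builds both as plain data (no request is sent; create_index_request and update_mapping_request are hypothetical helpers for illustration):

```python
import json

def create_index_request(index, properties):
    """Indexing API: PUT <index>, with properties wrapped in a mappings object."""
    return ("PUT", f"/{index}", {"mappings": {"properties": properties}})

def update_mapping_request(index, properties):
    """Mapping API: PUT <index>/_mapping, with a bare properties object."""
    return ("PUT", f"/{index}/_mapping", {"properties": properties})

requests_to_send = [
    create_index_request("movies", {"title": {"type": "text"}}),
    update_mapping_request(
        "movies", {"release_date": {"type": "date", "format": "dd-MM-yyyy"}}
    ),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=4))
```

The printed output corresponds to the two console requests in this section and can be pasted into any HTTP client.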

4.3.3 Modifying Existing Fields is Not Allowed#

Once an index is live, any modification of its existing fields is prohibited. The data already in the index was indexed according to the old field definition, so modifying a field in place would lead to search failures.

To make such changes one needs to use the reindexing technique:

Create a new index with the updated schema
Copy the data from the old index to the new index
When the new index is ready, switch the application over to it
Remove the old index when confirmed new index works

To reindex, we issue:

POST _reindex
{
    "source": {"index": "orders"},
    "dest": {"index": "orders_new"}
}

Use aliases:

If your application is tightly coupled to an existing index name, moving to the new index may require a code or configuration change. The ideal way to avoid such situations is to use aliases. An alias is an alternate name given to an index; because the application queries the alias rather than the index, repointing the alias lets us switch between indices seamlessly with zero downtime.
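Putting the reindexing steps and the alias switch together, the sequence of requests might look like the Python sketch below. It only builds the requests as data and sends nothing. reindex_plan and the alias name orders_alias are hypothetical, and the sketch assumes the alias already points at the old index.

```python
def reindex_plan(old_index, new_index, alias, new_mappings):
    """Return the ordered (method, path, body) requests for a reindex with an alias switch."""
    return [
        # 1. Create the new index with the updated schema.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy the data from the old index to the new index.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3. Repoint the alias; both actions travel in one _aliases request.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
        # 4. Remove the old index once the new one is confirmed to work.
        ("DELETE", f"/{old_index}", None),
    ]

plan = reindex_plan(
    "orders", "orders_new", "orders_alias",
    {"properties": {"order_date": {"type": "date"}}},  # hypothetical updated field
)
for method, path, _body in plan:
    print(method, path)
```

The application only ever refers to orders_alias, so step 3 is the moment it starts reading from the new index, with no code or configuration change.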

Source#

Elasticsearch in Action - Radu Gheorghe, Matthew Lee Hinman, and Roy Russo