Practical considerations

[[parent-child-performance]] === Practical Considerations

Parent-child joins can be a useful technique for managing relationships when index-time performance((("parent-child relationship", "performance and"))) is more important than search-time performance, but it comes at a significant cost. Parent-child queries can be 5 to 10 times slower than the equivalent nested query!

==== Memory Use

At the time of going to press, the parent-child ID map is still held in memory.((("parent-child relationship", "memory usage")))((("memory usage", "parent-child ID map"))) There are plans to change the map to use doc values instead, which will be a big memory saving. Until that happens, you need to be aware of the following: the string _id field of every parent document has to be held in memory, and every child document requires 8 bytes (a long value) of memory. Actually, it's a bit less thanks to compression, but this gives you a rough idea.

You can check how much memory is being used by the parent-child cache by consulting ((("indices-stats API")))the indices-stats API (for a summary at the index level) or the node-stats API (for a summary at the node level):

[source,json]

GET /_nodes/stats/indices/id_cache?human <1>

<1> Returns memory use of the ID cache summarized by node in a human-friendly format.

==== Global Ordinals and Latency

Parent-child uses <> to speed((("global ordinals")))((("parent-child relationship", "global ordinals and latency"))) up joins. Regardless of whether the parent-child map uses an in-memory cache or on-disk doc values, global ordinals still need to be rebuilt after any change to the index.

The more parents in a shard, the longer global ordinals will take to build. Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.

Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals. This can introduce a significant latency spike for your users. You can use <> to((("eager global ordinals"))) shift the cost of building global ordinals from query time to refresh time, by mapping the _parent field as follows:

[source,json]

PUT /company { "mappings": { "branch": {}, "employee": { "_parent": { "type": "branch", "fielddata": { "loading": "eager_global_ordinals" <1> } } } }

}

<1> Global ordinals for the _parent field will be built before a new segment becomes visible to search.

With many parents, global ordinals can take several seconds to build. In this case, it makes sense to increase the refresh_interval so((("refresh_interval setting"))) that refreshes happen less often and global ordinals remain valid for longer. This will greatly reduce the CPU cost of rebuilding global ordinals every second.

==== Multigenerations and Concluding Thoughts

The ability to join multiple generations (see <>) sounds attractive until ((("grandparents and grandchildren")))((("parent-child relationship", "multi-generations")))you think of the costs involved:

  • The more joins you have, the worse performance will be.
  • Each generation of parents needs to have their string _id fields stored in memory, which can consume a lot of RAM.

As you consider your relationship schemes and whether parent-child is right for you, consider this advice ((("parent-child relationship", "guidelines for using")))about parent-child relationships:

  • Use parent-child relationships sparingly, and only when there are many more children than parents.
  • Avoid using multiple parent-child joins in a single query.
  • Avoid scoring by using the has_child filter, or the has_child query with score_mode set to none.
  • Keep the parent IDs short, so that they require less memory.

Above all: think about the other relationship techniques that we have discussed before reaching for parent-child.