Keywords: Kibana | Fielddata | Elasticsearch mapping
Abstract: This paper provides an in-depth analysis of the Fielddata disabling issue encountered when aggregating text fields in Elasticsearch 5.x and Kibana. It begins by explaining the fundamental concepts of Fielddata and its role in memory management, then details three implementation methods for enabling fielddata=true through mapping modifications: using Sense UI, cURL commands, and the Node.js client. Additionally, the paper compares the recommended keyword field alternative in Elasticsearch 5.x, analyzing the advantages, disadvantages, and applicable scenarios of both approaches. Finally, practical code examples demonstrate how to integrate mapping modifications into data indexing workflows, offering developers comprehensive technical solutions.
When using Elasticsearch 5.0.0-alpha3 and Kibana 5.0.0-alpha3 for data visualization, many developers encounter a common issue: when attempting to perform aggregation operations on text fields (such as creating histograms or word clouds), Kibana displays the error message "Fielddata is disabled on text fields by default." This problem stems from significant changes in text field handling in Elasticsearch 5.x versions.
Fundamental Concepts of Fielddata and Memory Management
Fielddata is an in-memory data structure used by Elasticsearch to support sorting, aggregations, and script access on fields. Text fields are tokenized by default: the text is broken into individual terms, which are stored in an inverted index that maps each term to the documents containing it. Aggregations need the opposite view, from each document to its terms, so the system must "uninvert" the inverted index and hold the result on the JVM heap, a process that consumes substantial memory. Consequently, starting with Elasticsearch 5.x, Fielddata for text fields is disabled by default to protect memory usage and system stability.
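The "uninverting" step can be illustrated with a toy model. The sketch below is not how Lucene stores anything internally; the tokenizer, the function names, and the sample publisher values are all assumptions chosen only to show why the per-document view has to be rebuilt from every term's postings.

```javascript
// Toy model only: real Lucene/Elasticsearch structures are far more compact.
// A lowercase whitespace tokenizer stands in for the text analyzer.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Inverted index: term -> list of ids of the documents that contain it.
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    for (const term of new Set(tokenize(text))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(docId);
    }
  });
  return index;
}

// "Uninverting": walk every term's postings to rebuild docId -> terms.
// This per-document view is what fielddata keeps in heap memory.
function uninvert(index, docCount) {
  const fielddata = Array.from({ length: docCount }, () => []);
  for (const [term, docIds] of index) {
    for (const docId of docIds) fielddata[docId].push(term);
  }
  return fielddata;
}

// Hypothetical sample values for a "publisher" field.
const docs = ['Penguin Books', 'Oxford University Press', 'Penguin Classics'];
const inverted = buildInvertedIndex(docs);
const fielddata = uninvert(inverted, docs.length);
```

Every distinct term has to be visited to fill the per-document lists, which is why the cost grows with the number of unique terms in the field.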
Three Implementation Methods for Enabling Fielddata
To resolve visualization issues in Kibana, Fielddata must be explicitly enabled for specific text fields in Elasticsearch mappings. The following are three primary implementation approaches:
Method 1: Using Sense UI or Kibana Dev Tools
Directly modifying mappings via Elasticsearch's REST API is the most straightforward method. Assuming we have an index named "your_index" containing type "your_type," and need to enable Fielddata for the "publisher" field, execute the following PUT request:
PUT your_index/_mapping/your_type
{
  "your_type": {
    "properties": {
      "publisher": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
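This mapping update applies to an existing index and does not require reindexing: Fielddata is built in memory from the existing inverted index the first time the field is used in an aggregation or sort. The request should repeat the field's existing mapping settings (for example, a custom analyzer) alongside "fielddata": true. The index pattern's field list in Kibana may need to be refreshed before "publisher" is offered as an aggregatable field.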
Method 2: Using cURL Command-Line Tool
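For developers accustomed to command-line interfaces, the mapping can also be supplied via cURL. Note that the request below creates a new index named "index" with the mapping already in place, rather than updating an existing index: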
curl -XPUT http://localhost:9200/index -d '{
  "mappings": {
    "type": {
      "properties": {
        "publisher": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'
Method 3: Integrating into Node.js Indexing Scripts
For scenarios using Node.js clients for data indexing, mappings can be defined directly when creating the index. The following code demonstrates how to modify the original data indexing script:
client.indices.create({
  index: 'index',
  body: {
    "mappings": {
      "type": {
        "properties": {
          "publisher": {
            "type": "text",
            "fielddata": true
          }
        }
      }
    }
  }
});
This code should be executed before data indexing to ensure the index is created with the correct mapping settings.
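When the index already exists, creating it again is not an option, and the mapping would have to be updated in place instead. A minimal sketch of that variant follows, assuming `client` is an instance of the legacy `elasticsearch` npm client created elsewhere in the indexing script; the helper names `fielddataMapping` and `enableFielddata` are hypothetical.

```javascript
// Build the mapping body that turns fielddata on for one text field.
function fielddataMapping(field) {
  return { properties: { [field]: { type: 'text', fielddata: true } } };
}

// Apply the mapping to an existing index and type.
// `client` is assumed to be a legacy `elasticsearch` client instance.
function enableFielddata(client, index, type, field) {
  return client.indices.putMapping({
    index: index,
    type: type,
    body: fielddataMapping(field)
  });
}

// Example call (not executed here):
// enableFielddata(client, 'your_index', 'your_type', 'publisher')
//   .then(() => console.log('mapping updated'));
```

Keeping the body builder separate from the client call makes it possible to inspect or log the exact mapping before it is sent.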
Alternative Approach in Elasticsearch 5.x: Keyword Fields
While enabling Fielddata solves the problem, Elasticsearch officially recommends using keyword fields as a superior alternative in 5.x versions. This method is implemented through multi-field mappings:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
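In this configuration, "my_field" is used for full-text search, while "my_field.keyword" is used for aggregation, sorting, and script operations. Applied to the earlier example, the "publisher" field would be mapped the same way, and "publisher.keyword" should be selected instead of "publisher" in Kibana visualizations. The keyword sub-field is only populated for documents indexed after the mapping is in place, so existing documents must be reindexed.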
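The same multi-field mapping, and an aggregation that targets the sub-field, could be built from Node.js along the following lines. This is a sketch under assumptions: the helper names are hypothetical, and the bodies are meant to be passed to the legacy client's `indices.create` and `search` calls.

```javascript
// Mapping for a text field with a keyword sub-field, mirroring the request above.
function textWithKeyword(field) {
  return {
    properties: {
      [field]: {
        type: 'text',
        fields: { keyword: { type: 'keyword' } }
      }
    }
  };
}

// Body for creating an index whose type uses that mapping.
function createIndexBody(type, field) {
  return { mappings: { [type]: textWithKeyword(field) } };
}

// Search body with a terms aggregation on the keyword sub-field.
function termsAggOnKeyword(field, size) {
  return {
    size: 0, // no hits needed, only buckets
    aggs: {
      [`${field}_terms`]: { terms: { field: `${field}.keyword`, size: size } }
    }
  };
}

// Example calls (not executed here; `client` is assumed to exist):
// client.indices.create({ index: 'my_index', body: createIndexBody('my_type', 'publisher') });
// client.search({ index: 'my_index', body: termsAggOnKeyword('publisher', 10) });
```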
Comparison of Both Methods and Selection Recommendations
Both enabling Fielddata and using keyword fields have their advantages and disadvantages:
- Fielddata Method: Direct and simple, but the uninverted data lives in heap memory and can grow large, especially for text fields with high cardinality. Aggregation buckets are built from analyzed tokens rather than from the original values. Suitable for temporary solutions or scenarios with low field value cardinality.
- Keyword Field Method: More aligned with Elasticsearch 5.x design principles and more memory-efficient, since keyword fields are not analyzed and aggregate on the exact original value, but it requires modifying field names in queries and visualizations. Suitable for long-term projects and performance-sensitive applications.
For users migrating from older versions, note the changes in query syntax. For example, in aggregation queries, "terms" : { "field" : "interests" } should be changed to "terms" : { "field" : "interests.keyword" }.
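For scripts that hold many stored aggregation definitions, the rename could be automated. The helper below is a hypothetical sketch, not part of any Elasticsearch client: it assumes the caller knows which fields are text fields with a "keyword" sub-field and rewrites only `terms` aggregations on those fields.

```javascript
// Returns a copy of an aggregation tree in which every `terms` aggregation on
// one of `textFields` is pointed at the matching `.keyword` sub-field.
function useKeywordSubfields(aggs, textFields) {
  const out = {};
  for (const [name, agg] of Object.entries(aggs)) {
    const copy = Object.assign({}, agg);
    if (copy.terms && textFields.includes(copy.terms.field)) {
      copy.terms = Object.assign({}, copy.terms, {
        field: copy.terms.field + '.keyword'
      });
    }
    // Sub-aggregations live under `aggs` or `aggregations`.
    for (const key of ['aggs', 'aggregations']) {
      if (copy[key]) copy[key] = useKeywordSubfields(copy[key], textFields);
    }
    out[name] = copy;
  }
  return out;
}

const before = { by_interest: { terms: { field: 'interests' } } };
const after = useKeywordSubfields(before, ['interests']);
```

The input object is left untouched, so the original definitions remain available for comparison.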
Practical Considerations in Implementation
When implementing the above solutions, several important factors must be considered:
- Memory Monitoring: After enabling Fielddata, closely monitor Elasticsearch cluster memory usage to avoid performance degradation or node crashes due to insufficient memory.
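- Data Reindexing: Adding a keyword sub-field does not change documents that are already indexed; they must be reindexed before the sub-field contains values. Enabling Fielddata on an existing text field needs no reindex, because Fielddata is built from the existing inverted index at query time.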
- Version Compatibility: The solutions discussed primarily target Elasticsearch 5.x versions. Different versions may have different best practices.
- Field Analysis: Aggregations on a text field with Fielddata enabled operate on analyzed tokens, so a multi-word value is split into one bucket per token. Check the analyzer configuration to ensure aggregation results meet expectations.
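The field analysis point can be made concrete with a small simulation. The sketch below does not call Elasticsearch; `analyze` is a simplified stand-in for the default analyzer, and the publisher values are hypothetical. It only contrasts the bucket keys produced by tokenized text with those produced by exact keyword values.

```javascript
// Simplified stand-in for the default analyzer:
// lowercase, then split on anything that is not a letter or digit.
function analyze(value) {
  return value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count documents per bucket key, as a terms aggregation would.
function termsBuckets(values, toKeys) {
  const counts = {};
  for (const value of values) {
    for (const key of new Set(toKeys(value))) {
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

// Hypothetical values of a "publisher" field in three documents.
const publishers = ['Penguin Books', 'Penguin Books', 'Oxford University Press'];

// Text field with fielddata: one bucket per analyzed token.
const textBuckets = termsBuckets(publishers, analyze);

// Keyword sub-field: one bucket per exact original value.
const keywordBuckets = termsBuckets(publishers, (v) => [v]);
```

Here `textBuckets` contains keys such as "penguin" and "books", while `keywordBuckets` keeps "Penguin Books" as a single key, which is usually what a Kibana visualization of publishers is expected to show.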
By understanding how Fielddata works and the text processing mechanisms in Elasticsearch 5.x, developers can make more informed technical choices, balancing functional requirements with system performance to build more stable and efficient data visualization solutions.