79733429

Date: 2025-08-12 16:14:31
Score: 5.5

You can detect and correct duplicate records with a two-step process.

  1. Find duplicates with an aggregation

  2. Review and correct them

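Here is a demonstration with dummy data: three different source texts share one translation, and a fourth document has a unique translation.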


---

POST translations_test/_bulk
{ "index": {} }
{ "raw_body_text": "¿Hola, cómo estás?", "translated_body_text": "Hello, how are you?" }
{ "index": {} }
{ "raw_body_text": "Muy bien, ¡gracias!", "translated_body_text": "Hello, how are you?" }
{ "index": {} }
{ "raw_body_text": "¿Cómo te va?", "translated_body_text": "Hello, how are you?" }
{ "index": {} }
{ "raw_body_text": "Estoy bien.", "translated_body_text": "I am fine." }

The search below groups documents by translated_body_text.keyword and keeps only translations that occur at least twice (min_doc_count: 2). The unique_sources sub-aggregation lists the distinct source texts for each translation, and the bucket_selector drops any translation that has only one distinct source. The .keyword sub-fields exist here because the index uses the default dynamic mapping.

GET translations_test/_search
{
  "size": 0,
  "aggs": {
    "translations": {
      "terms": {
        "field": "translated_body_text.keyword",
        "min_doc_count": 2,
        "size": 10000
      },
      "aggs": {
        "unique_sources": {
          "terms": {
            "field": "raw_body_text.keyword",
            "size": 10000
          }
        },
        "having_multiple_sources": {
          "bucket_selector": {
            "buckets_path": {
              "uniqueSourceCount": "unique_sources._bucket_count"
            },
            "script": "params.uniqueSourceCount > 1"
          }
        }
      }
    }
  }
}
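
For step 2, the aggregation response can be flattened into a review list. The following Python is a minimal sketch, not part of the original answer: the helper name `conflicting_translations` is hypothetical, and `sample_response` is a hand-written illustration of the shape a terms aggregation response takes for the dummy data above, not captured output from a cluster.

```python
from typing import Any


def conflicting_translations(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each duplicated translation to the distinct source texts sharing it."""
    buckets = response["aggregations"]["translations"]["buckets"]
    return {
        bucket["key"]: [src["key"] for src in bucket["unique_sources"]["buckets"]]
        for bucket in buckets
    }


# Hand-written sample shaped like the search response (assumption, not real output).
sample_response = {
    "aggregations": {
        "translations": {
            "buckets": [
                {
                    "key": "Hello, how are you?",
                    "doc_count": 3,
                    "unique_sources": {
                        "buckets": [
                            {"key": "Muy bien, ¡gracias!", "doc_count": 1},
                            {"key": "¿Cómo te va?", "doc_count": 1},
                            {"key": "¿Hola, cómo estás?", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

# Print one review entry per duplicated translation.
for translation, sources in conflicting_translations(sample_response).items():
    print(f"{translation!r} is shared by {len(sources)} source texts:")
    for source in sources:
        print(f"  - {source}")
```

Each entry can then be checked by hand, and the documents whose translation is wrong can be re-translated or updated.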

Reasons:
  • Blacklisted phrase (1): ¿
  • Blacklisted phrase (1): está
  • Blacklisted phrase (1): cómo
  • Blacklisted phrase (1): Cómo
  • Blacklisted phrase (2): gracias
  • Blacklisted phrase (2): Estoy
  • Long answer (-1):
  • Has code block (-0.5):
  • High reputation (-1):
Posted by: Musab Dogan