79774497

Date: 2025-09-25 07:33:39

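Duplicating the partition key value into a clustering column is technically valid in CQL, but it usually gives you no benefit and can introduce unnecessary overhead. The partition key is stored once per partition, while a clustering column's value is stored again in every row.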
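To put a rough number on that overhead, here is a minimal sketch under a deliberately simplified, hypothetical storage model. It assumes the partition key is written once per partition and clustering values are written once per row, and it ignores SSTable headers, timestamps, and compression. The table, the `sensor_id` / `sensor_id_copy` / `ts` columns, and the `key_bytes` helper are all made up for illustration.

```python
# Hypothetical, simplified model of key storage (not Cassandra's real format):
# the partition key is counted once per partition, clustering values once per row.
# Real SSTables add headers, timestamps, and compression on top of this.


def key_bytes(partition_key: str, clustering_rows: list[tuple[str, ...]]) -> int:
    """Bytes spent on key values for one partition (UTF-8, uncompressed)."""
    total = len(partition_key.encode("utf-8"))  # written once per partition
    for clustering in clustering_rows:
        # every row carries its full clustering tuple
        total += sum(len(value.encode("utf-8")) for value in clustering)
    return total


pk = "sensor-0042"
timestamps = [f"2025-09-25T07:{m:02d}" for m in range(60)]

# Layout A (hypothetical): PRIMARY KEY ((sensor_id), ts)
plain = key_bytes(pk, [(ts,) for ts in timestamps])

# Layout B (hypothetical): PRIMARY KEY ((sensor_id), sensor_id_copy, ts)
duplicated = key_bytes(pk, [(pk, ts) for ts in timestamps])

print(plain, duplicated, duplicated - plain)  # 971 1631 660
```

In this model the extra cost is exactly one copy of the key per row, so it grows linearly with partition size; on disk, block compression would shrink it, but the logical duplication remains.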

A few points to consider:

1. Partition key vs clustering key

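If you duplicate the partition key as a clustering column, every row in the partition has the same value for that column. Clustering columns only determine the sort order of rows inside a partition, so a column that is constant within the partition adds no ordering value: rows are still ordered by the remaining clustering columns. And any query that filters on that column is already bound to a single partition by the partition key anyway.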
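A small sketch of the ordering argument, using plain Python tuples as a stand-in for Cassandra's clustering order within one partition. The `Row` type, its columns, and the `clustered` helper are hypothetical, and the timestamps are ISO-8601 strings (which sort chronologically) rather than a real CQL `timestamp` type.

```python
# Hypothetical model: rows of ONE partition, sorted by their clustering tuple.
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    sensor_id: str  # partition key (same value for every row in the partition)
    ts: str         # clustering column, ISO-8601 string so it sorts chronologically
    reading: float  # regular column


def clustered(rows, clustering_key):
    """Return rows in clustering order for a single partition."""
    return sorted(rows, key=clustering_key)


rows = [
    Row("sensor-0042", "2025-09-25T07:05", 21.4),
    Row("sensor-0042", "2025-09-25T07:01", 20.9),
    Row("sensor-0042", "2025-09-25T07:03", 21.1),
]

# PRIMARY KEY ((sensor_id), ts)
order_plain = clustered(rows, lambda r: (r.ts,))

# PRIMARY KEY ((sensor_id), sensor_id_copy, ts): the copy equals sensor_id in every
# row, so the leading tuple element is constant and never affects the comparison
order_dup = clustered(rows, lambda r: (r.sensor_id, r.ts))

print(order_plain == order_dup)  # True
```

Because the first element of every clustering tuple is identical, the comparison always falls through to `ts`, which is why the extra column changes nothing about row order.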

2. Indexing

3. Analytics with Spark

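For Spark workloads, it’s normal to scan multiple partitions. The Spark Cassandra Connector splits a table scan by token range and reads those ranges in parallel, so a job that touches many partitions does not need any help from the table's clustering layout.

So in practice you don’t need to “duplicate” keys for Spark. If your jobs are supposed to span multiple partitions, Spark already handles that efficiently.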
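The token-range idea can be sketched in a few lines. This is a hypothetical illustration, not the connector's code: it uses an MD5-derived signed 64-bit value as a stand-in for Cassandra's Murmur3 token, and splits the ring into equal-width ranges, whereas the real connector sizes its splits by estimated data volume. The `token`, `split_ring`, and `scan` names are made up.

```python
# Hypothetical sketch of a full-table scan split by token range.
# MD5 stands in for Murmur3; equal-width splits stand in for size-based splits.
import hashlib

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def split_ring(num_splits: int) -> list[tuple[int, int]]:
    """Divide the token ring into contiguous, non-overlapping [start, end) ranges."""
    span = 2**64
    bounds = [MIN_TOKEN + (span * i) // num_splits for i in range(num_splits)]
    bounds.append(MAX_TOKEN + 1)
    return list(zip(bounds[:-1], bounds[1:]))


def scan(partition_keys: list[str], num_splits: int) -> list[list[str]]:
    """Group partitions by token range; each group could be read by one Spark task."""
    ranges = split_ring(num_splits)
    groups: list[list[str]] = [[] for _ in ranges]
    for key in partition_keys:
        t = token(key)
        for i, (start, end) in enumerate(ranges):
            if start <= t < end:
                groups[i].append(key)
                break
    return groups


keys = [f"sensor-{n:04d}" for n in range(1000)]
groups = scan(keys, num_splits=8)
print([len(g) for g in groups])
print(sum(len(g) for g in groups) == len(keys))  # every partition read exactly once
```

Every partition lands in exactly one range, so the parallel scan covers the whole table regardless of how the clustering columns are defined.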

4. Trade-offs

Posted by: Ishwar Desai