79187211

Date: 2024-11-14 02:29:32

In Python, a fast way to get the number of entities in a collection is: print(collection.num_entities)
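
A minimal sketch of the fast path as a helper, assuming `collection` is a pymilvus ORM `Collection`. The helper name `fast_row_count` and its `flush_first` switch are hypothetical. Calling `flush()` first should seal and persist growing segments so their rows are included. Flushing is comparatively expensive, though, and it does not remove deleted or overwritten entities from the count.

```python
def fast_row_count(collection, flush_first: bool = False) -> int:
    """Return the row count Milvus keeps in its segment metadata.

    `collection` is assumed to be a pymilvus ORM `Collection`; the helper
    itself is a hypothetical convenience wrapper.
    """
    if flush_first:
        # Seal and persist growing segments so their rows are reflected in the
        # recorded counts. This is relatively expensive; avoid it on hot paths.
        collection.flush()
    # Cheap metadata lookup: the sum of recorded row counts of persisted segments.
    # Deleted or overwritten entities are still included in this number.
    return collection.num_entities
```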

But this number is not always accurate, because it is computed only from persisted segments: Milvus quickly reads it from etcd instead of scanning the data. Every time a segment is persisted, its basic information, including its row count, is recorded in etcd, and collection.num_entities simply sums the row counts of all persisted segments.

This sum does not account for deletions. Say a segment has 1000 rows and you call collection.delete() to delete 50 of them: collection.num_entities still shows 1000 rows. Likewise, collection.num_entities cannot tell which entities have been overwritten. Milvus storage is column-based, and all new data is appended to a new segment. If you use upsert() to overwrite an existing entity, Milvus appends the new entity to a new segment and, at the same time, creates a delete action for the old one; that delete action is executed asynchronously. A delete action does not change the segment's original row count recorded in etcd, because we deliberately avoid updating etcd frequently (a large number of etcd updates would slow down the entire system). Since the recorded count is never updated, collection.num_entities has no way of knowing which entities were deleted.

Furthermore, collection.num_entities does not count entities in segments that have not been persisted yet.
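
To make the bookkeeping concrete, here is a toy model in plain Python. It is a hypothetical illustration of the behaviour described above, not Milvus code: `ToyCollection`, `persist()` and `count_star()` are invented names, etcd is stood in for by a dict of per-segment row counts, and deletes are applied immediately rather than asynchronously.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Row:
    pk: int
    alive: bool = True  # False once a delete (or an upsert) has hit this row


@dataclass
class ToyCollection:
    """Hypothetical model of the bookkeeping described above (not Milvus code)."""

    # Each segment is an append-only list of rows; the last one is "growing".
    segments: List[List[Row]] = field(default_factory=lambda: [[]])
    # Stand-in for etcd: row count recorded once, when a segment is persisted.
    persisted_counts: Dict[int, int] = field(default_factory=dict)

    def insert(self, pks):
        # New data is always appended to the current (growing) segment.
        self.segments[-1].extend(Row(pk) for pk in pks)

    def persist(self):
        # Record the growing segment's row count once, then start a new segment.
        idx = len(self.segments) - 1
        self.persisted_counts[idx] = len(self.segments[idx])
        self.segments.append([])

    def delete(self, pks):
        # Deletes only mark rows as dead; recorded counts are never updated.
        targets = set(pks)
        for seg in self.segments:
            for row in seg:
                if row.pk in targets:
                    row.alive = False

    def upsert(self, pks):
        # Overwrite = delete the old versions, then append the new versions.
        self.delete(pks)
        self.insert(pks)

    def num_entities(self) -> int:
        # Like collection.num_entities: sum of counts recorded at persist time.
        return sum(self.persisted_counts.values())

    def count_star(self) -> int:
        # Like count(*): live rows in every segment, persisted or not.
        return sum(row.alive for seg in self.segments for row in seg)


if __name__ == "__main__":
    c = ToyCollection()
    c.insert(range(1000))
    c.persist()                               # segment 0 recorded with 1000 rows
    c.delete(range(50))                       # 50 rows deleted from segment 0
    print(c.num_entities(), c.count_star())   # 1000 950
    c.upsert([100, 101])                      # old versions deleted, new ones appended
    print(c.num_entities(), c.count_star())   # 1000 950
    c.insert([5000])                          # lands in the non-persisted segment
    print(c.num_entities(), c.count_star())   # 1000 951
```

Once the growing segment is persisted, its rows are added to the recorded total, while the deleted rows remain counted, so the metadata-based number only ever drifts upward relative to the live count.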

collection.query(expr="", output_fields=["count(*)"]) is a query request executed by the query nodes. It takes deleted entities into account and covers all segments, including non-persisted ones. The trade-off is that collection.query() is slower than collection.num_entities.
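
A minimal sketch of the accurate path, again assuming a pymilvus ORM `Collection` (the helper name `exact_row_count` is hypothetical). The ORM `query()` takes a filter expression as its first argument, so an empty `expr` is passed to mean "no filter", and the collection must be loaded before it can be queried. Requesting `consistency_level="Strong"` is an assumption here: it should make the query wait until recent writes, including the asynchronous deletes, are visible, at the cost of some extra latency.

```python
from typing import Any


def exact_row_count(collection: Any) -> int:
    """Count live entities with a count(*) query.

    `collection` is assumed to be a loaded pymilvus ORM `Collection`;
    the helper itself is hypothetical.
    """
    result = collection.query(
        expr="",                      # empty filter: count every entity
        output_fields=["count(*)"],   # ask the query nodes for the row count
        consistency_level="Strong",   # assumption: see all acknowledged writes
    )
    # The result is a list holding a single dict, e.g. [{"count(*)": 950}].
    return result[0]["count(*)"]
```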

If you never delete or overwrite (upsert) existing entities in a collection, collection.num_entities is a fast way to check its row count, keeping in mind that rows in non-persisted segments are not yet included. Otherwise, use collection.query(expr="", output_fields=["count(*)"]) to get the accurate row count.

Posted by: groot