If your paths have a limit to how nested they are, it might be a good idea to unroll them into separate columns, with the higher paths as SYMBOL
s (if there aren’t too many distinct values). Filtering on SYMBOL
columns is much faster than on strings.
Do you have an estimate of the cardinality i.e. how many entries you expect to be in those columns?
If each are less than perhaps 100,000, symbols could be a good option.
Filtering on SYMBOL
columns (or numbers) can be JIT
ed i.e. compiled to native code. Filtering against VARCHAR
will use Java-side string comparisons.
You can also try searching using LIKE
on the single path column, and see how fast it is. Maybe it performs better than a substring.
It is better to use VARCHAR
rather than STRING
if you are not using SYMBOL
. This will half the amount of storage you need if your text is mostly ASCII.