@Ulrich Eckhardt
You want to have polymorphic string types that use the different UTF encodings for representation of data and that share a common base class. - YES
You want to write code based solely on the base class generally. - YES, most of the code should use the base class, without caring about the actual content.
You want a class representing a codepoint within such a string. You need that for both reading and writing, not necessarily in one class though. - YES. For the codepoint, polymorphism was easier to achieve, so in the current implementation I have a base class that I can use as a generic data type, through pointers or references, to parse strings.
The problem is with defining the class representing the codepoint in a way that is convenient to work with and that performs reasonably at runtime.
NO, the codepoint code is optimized and memory efficient. It's basically a pointer with a virtual API; most functions are constexpr, so they are optimized at compile time, and the virtual function "pointers" in the object are rather restricted, so it's fairly compact. The problem is within the API of the base class, which can only return base types, so:
if I return them by value, I'll lose the polymorphism,
if I return them by reference, there will be a lot of conflicts, not even speaking of threading. If someone writes utfcodepoint& a = string[19]; utfcodepoint& b = string[20]; then the second call modifies "a" without them noticing (the returned utfcodepoint reference refers to an object inside the string object), while they think that a accesses the 19th codepoint and b the 20th. The code utfcodepoint a = string[19]; utfcodepoint b = string[20]; will work, though, because the returned reference is consumed by the copy constructor/assignment operator, so a will contain a copy of the 19th and b a copy of the 20th. For codepoints the issue won't matter much, but for strings/substrings it will be a nightmare: to avoid copies in functions, references will likely be used, so funct(str.substr(29,4), str.substr(18,5)) will certainly not do what is expected.
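To make the two failure modes above concrete, here is a minimal sketch. All names in it (CodepointBase, Utf32Codepoint, Utf32String, by_value, by_reference, demonstrate_problems) are hypothetical stand-ins, not the real classes, and the single cached object is only an assumption about how a by-reference API might be implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical codepoint base class with a virtual API.
struct CodepointBase {
    virtual ~CodepointBase() = default;
    virtual char32_t value() const { return U'?'; }  // base-class fallback
};

// Hypothetical UTF-32 codepoint: basically a pointer into the string data.
struct Utf32Codepoint : CodepointBase {
    const char32_t* p = nullptr;
    char32_t value() const override { return *p; }
};

class Utf32String {
public:
    explicit Utf32String(std::u32string s) : data_(std::move(s)) {}

    // Return by value: only the CodepointBase part is copied out (slicing).
    CodepointBase by_value(std::size_t i) const {
        Utf32Codepoint cp;
        cp.p = &data_[i];
        return cp;
    }

    // Return by reference: every call re-targets and returns the same
    // object stored inside the string.
    CodepointBase& by_reference(std::size_t i) {
        cache_.p = &data_[i];
        return cache_;
    }

private:
    std::u32string data_;
    Utf32Codepoint cache_;  // assumed single shared slot
};

void demonstrate_problems() {
    Utf32String s(U"abc");

    // Problem 1: the derived override is gone after the copy.
    CodepointBase v = s.by_value(1);
    assert(v.value() == U'?');

    // Problem 2: a and b alias one object, so a now reads index 1.
    CodepointBase& a = s.by_reference(0);
    CodepointBase& b = s.by_reference(1);
    assert(&a == &b);
    assert(a.value() == U'b');
}
```

Calling demonstrate_problems() passes all three assertions, which is exactly the unwanted behaviour: the by-value result has forgotten its encoding, and the reference "a" silently follows the last index requested.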
To rephrase the problem: I would like to return something that keeps the polymorphic information but without the pointer/reference lifetime problem.
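One shape such a return type could take is a small value-semantic handle that owns a clone of the polymorphic object. The sketch below is only an illustration of that idea under assumed names (CodepointImpl, Utf32Impl, Codepoint, StringBase, Utf32Str, at, demonstrate_handle are all hypothetical), and it swaps in a heap-allocated clone, which may not fit the compactness goals described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical polymorphic implementation interface.
struct CodepointImpl {
    virtual ~CodepointImpl() = default;
    virtual char32_t value() const = 0;
    virtual std::unique_ptr<CodepointImpl> clone() const = 0;
};

// Hypothetical UTF-32 implementation: a pointer into the string data.
struct Utf32Impl final : CodepointImpl {
    explicit Utf32Impl(const char32_t* p) : p_(p) {}
    char32_t value() const override { return *p_; }
    std::unique_ptr<CodepointImpl> clone() const override {
        return std::make_unique<Utf32Impl>(*this);
    }
    const char32_t* p_;
};

// Value type returned by the string API. Copying clones the
// implementation, so the dynamic type survives a return by value.
class Codepoint {
public:
    explicit Codepoint(std::unique_ptr<CodepointImpl> impl)
        : impl_(std::move(impl)) {}
    Codepoint(const Codepoint& o)
        : impl_(o.impl_ ? o.impl_->clone() : nullptr) {}
    Codepoint& operator=(const Codepoint& o) {
        impl_ = o.impl_ ? o.impl_->clone() : nullptr;
        return *this;
    }
    Codepoint(Codepoint&&) noexcept = default;
    Codepoint& operator=(Codepoint&&) noexcept = default;

    char32_t value() const { return impl_->value(); }

private:
    std::unique_ptr<CodepointImpl> impl_;
};

// Hypothetical string base class: it returns the handle by value.
struct StringBase {
    virtual ~StringBase() = default;
    virtual Codepoint at(std::size_t i) const = 0;
};

struct Utf32Str final : StringBase {
    explicit Utf32Str(std::u32string s) : data(std::move(s)) {}
    Codepoint at(std::size_t i) const override {
        return Codepoint(std::make_unique<Utf32Impl>(&data[i]));
    }
    std::u32string data;
};

void demonstrate_handle() {
    Utf32Str s(U"abc");
    const StringBase& base = s;   // code written against the base class
    Codepoint a = base.at(0);
    Codepoint b = base.at(1);
    assert(a.value() == U'a');    // a is not affected by the second call
    assert(b.value() == U'b');
    Codepoint c = a;              // the copy keeps the dynamic type
    assert(c.value() == U'a');
}
```

In this sketch each handle is independent, so two results obtained from the same string no longer alias each other. The trade-offs are one allocation per returned object and the fact that the handle still points into the string's storage, so it must not outlive the string or survive a reallocation of its data.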