79712226

Date: 2025-07-23 16:02:06
Score: 1

👋

You're asking a great question, and it's a very common challenge when bridging string normalization between Python and SQL-based systems like MariaDB.


✅ Answering your question directly:

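Unfortunately, MariaDB collations such as utf8mb4_general_ci are not equivalent to Python's str.lower() or str.casefold(). A collation only defines how strings are compared and sorted; it never changes the stored value. utf8mb4_general_ci does compare case-insensitively, but it works one character at a time (it treats "ß" as equal to "s", whereas casefold() turns "ß" into "ss"), it does not apply Unicode normalization such as NFC or NFKC, and it is also accent-insensitive for many letters ("Ä" = "A"), which casefold() is not. str.casefold() is designed for caseless matching across languages and scripts, and it is more aggressive than str.lower().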


🧠 What's the difference?
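A quick way to see where str.lower() and str.casefold() part ways is to run both on a few strings. This is a small illustrative sketch; the helper name `compare` and the sample words are arbitrary choices, not part of any API.

```python
def compare(s: str) -> tuple[str, str]:
    """Return (s.lower(), s.casefold()) for side-by-side comparison."""
    return s.lower(), s.casefold()


# Each sample contains a character where the two methods disagree:
#   "ß" (German sharp s):    lower() keeps it, casefold() expands it to "ss"
#   "ﬁ" (ligature):          lower() keeps it, casefold() splits it into "fi"
#   "ς" (Greek final sigma): casefold() maps it to the ordinary "σ"
for word in ["Straße", "ﬁle", "λόγος"]:
    low, fold = compare(word)
    print(f"{word}: lower={low} casefold={fold}")

# Caseless matching: these two only compare equal after casefold().
print("STRASSE".lower() == "Straße".lower())        # False
print("STRASSE".casefold() == "Straße".casefold())  # True
```

The "ß" case is the one most likely to matter here: casefold() changes the string's length, which a one-character-at-a-time collation such as utf8mb4_general_ci cannot mirror (it treats "ß" as equal to "s").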


💡 Recommendations

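  1. Use utf8mb4_unicode_ci, or a newer UCA-based collation if your server has one (utf8mb4_0900_ai_ci is the MySQL 8.0 name; recent MariaDB releases (10.10+) provide UCA 14.0.0 collations such as utf8mb4_uca1400_ai_ci):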

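    • These implement the Unicode Collation Algorithm, so they handle more of Unicode correctly than general_ci (for example, utf8mb4_unicode_ci treats "ß" as equal to "ss").
    • Still, they won't match Python's str.casefold() completely; for one thing, they are accent-insensitive, so "é" compares equal to "e".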

    Example:

    CREATE TABLE example (
        name VARCHAR(255)
    ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    
  2. Normalize in Python before inserting: if exact normalization (such as casefold() or unicodedata.normalize()) is critical, preprocess strings before storing them:

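    import unicodedata

    def normalize(s: str) -> str:
        # casefold() applies full Unicode case folding ("Straße" -> "strasse");
        # NFKC then folds compatibility forms (fullwidth "ａ" -> "a") and
        # composes characters into their canonical form.
        return unicodedata.normalize('NFKC', s.casefold())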
    
  3. Store a normalized column: Add a second column that stores the normalized value and index it for fast equality comparison.

    ALTER TABLE users ADD COLUMN name_normalized VARCHAR(255);
    CREATE INDEX idx_normalized_name ON users(name_normalized);
    
  4. Use generated columns (MariaDB 10.2+): a generated column lets the database compute a normalized value itself, but it is limited to built-in SQL functions such as LOWER(), so it won't fully replicate Python's casefold() or Unicode normalization.


🚫 TL;DR

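There is no MariaDB collation that is fully equivalent to str.casefold(). Your best bet is to normalize strings in Python (casefold() plus unicodedata.normalize()) and store the result in its own indexed column for equality lookups.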


Hope that helps! If anyone has found a closer match for casefold() in SQL, I'd love to hear it too.

Posted by: Caio Jordan Siqueira