You're asking a great question, and it's a very common challenge when bridging string normalization between Python and SQL-based systems like MariaDB.
Unfortunately, MariaDB collations such as utf8mb4_general_ci are not exactly equivalent to Python's str.lower() or str.casefold(). While utf8mb4_general_ci provides case-insensitive comparison, it does not perform Unicode normalization or handle special casing in some scripts (for example, multi-character foldings such as ß to "ss"), and it's less aggressive than str.casefold(), which is meant for caseless matching across different languages and scripts.
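To make the difference concrete, here is a small Python sketch comparing str.lower() and str.casefold() on the German ß case. The helper name `caseless_equal` and the sample strings are hypothetical and only serve as an illustration.

```python
# Compare str.lower() and str.casefold() for caseless matching.
# `caseless_equal` is a hypothetical helper name used only in this sketch.

def caseless_equal(a: str, b: str) -> bool:
    """Return True if a and b match after Unicode case folding."""
    return a.casefold() == b.casefold()

a, b = "Straße", "STRASSE"
print(a.lower() == b.lower())  # False: lower() leaves "ß" unchanged
print(caseless_equal(a, b))    # True: casefold() maps "ß" to "ss"
```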
str.lower() only lowercases characters, and it is limited for caseless matching (e.g. it leaves German ß unchanged, so "Straße" and "STRASSE" do not match).
str.casefold() is a more aggressive, Unicode-aware version of lower(), intended for caseless string comparisons.
utf8mb4_general_ci is a case-insensitive collation, but it doesn't support Unicode normalization like NFKC or NFKD.

Use utf8mb4_unicode_ci or utf8mb4_0900_ai_ci (if available):
These are more Unicode-aware than general_ci, but they still do not match str.casefold() completely.

Example:
CREATE TABLE example (
    name VARCHAR(255)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Normalize in Python before insert:
If exact normalization (like casefold() or unicodedata.normalize()) is critical, consider pre-processing strings before storing them:
import unicodedata

def normalize(s):
    # Casefold for caseless matching, then apply NFKC compatibility normalization.
    return unicodedata.normalize('NFKC', s.casefold())
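A slightly more defensive variant of the function above is sketched here. It applies NFKC before folding as well as after, because some compatibility characters (for example U+210C, a black-letter capital H) only turn into a plain letter once NFKC has run, so folding first can miss them. The name `normalize_key` and the sample strings are assumptions for illustration, and this is one common approach rather than the only valid one.

```python
import unicodedata

def normalize_key(s: str) -> str:
    """Build a caseless, NFKC-normalized comparison key (illustrative helper)."""
    # Normalize first so compatibility characters (ligatures, full-width
    # forms, letterlike symbols) become plain letters, then fold case.
    folded = unicodedata.normalize("NFKC", s).casefold()
    # Normalize again because case folding can leave a non-normalized string.
    return unicodedata.normalize("NFKC", folded)

print(normalize_key("Straße") == normalize_key("STRASSE"))  # True
print(normalize_key("\ufb01le") == normalize_key("FILE"))   # True: U+FB01 is the "fi" ligature
print(normalize_key("\u210c") == normalize_key("h"))        # True: U+210C becomes "H", then "h"
```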
Store a normalized column: Add a second column that stores the normalized value and index it for fast equality comparison.
ALTER TABLE users ADD COLUMN name_normalized VARCHAR(255);
CREATE INDEX idx_normalized_name ON users(name_normalized);
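From the application side, the normalized-column approach might look roughly like the sketch below. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in so the example runs anywhere; with MariaDB you would use a connector instead, and the placeholder syntax may differ. The helper names `normalize_key`, `add_user`, and `find_user` are hypothetical.

```python
import sqlite3
import unicodedata

def normalize_key(s: str) -> str:
    """Illustrative key: NFKC, casefold, then NFKC again."""
    folded = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", folded)

# In-memory SQLite stands in for MariaDB here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name VARCHAR(255), name_normalized VARCHAR(255))")
conn.execute("CREATE INDEX idx_normalized_name ON users(name_normalized)")

def add_user(name: str) -> None:
    """Store the original name alongside its normalized key."""
    conn.execute(
        "INSERT INTO users (name, name_normalized) VALUES (?, ?)",
        (name, normalize_key(name)),
    )

def find_user(query: str) -> list[str]:
    """Look up names by comparing normalized keys with plain equality."""
    rows = conn.execute(
        "SELECT name FROM users WHERE name_normalized = ?",
        (normalize_key(query),),
    ).fetchall()
    return [row[0] for row in rows]

add_user("Straße")
print(find_user("STRASSE"))  # ['Straße']
```

In MariaDB you would probably want a binary collation such as utf8mb4_bin on the normalized column, so the database compares the stored keys exactly and does not apply its own folding on top.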
Use generated columns (MariaDB 10.2+): With a bit of trickery (though limited to SQL functions), you might offload normalization to the DB via generated columns, but it won't fully replicate Python's casefold()/Unicode normalization.
There is no MariaDB collation that is fully equivalent to str.casefold(). Your best bet is to use utf8mb4_unicode_ci for better Unicode-aware comparisons than general_ci.

Hope that helps, and if anyone has found a closer match for casefold() in SQL, I'd love to hear it too!