Do the genes have to be strictly consecutive (i.e. adjacent)? If not:
You can collect the duplicated gene IDs, select all the rows that match each one, and loop over those rows to append a numbered suffix:
import pandas as pd

df_genes_data = {"gene_id": ["g0", "g1", "g1", "g2", "g3", "g4", "g4", "g4"]}
df_genes = pd.DataFrame.from_dict(df_genes_data)
print(df_genes.to_string())

# IDs that occur more than once; unique() avoids revisiting an ID that was already renamed
duplicated_genes = df_genes[df_genes["gene_id"].duplicated()]["gene_id"].unique()
for gene in duplicated_genes:
    # all rows carrying this ID
    df_gene = df_genes[df_genes["gene_id"] == gene]
    for i, (idx, row) in enumerate(df_gene.iterrows()):
        # number the occurrences starting from 1
        df_genes.loc[idx, "gene_id"] = row["gene_id"] + f"_TE{i+1}"
print(df_genes)
Output of the last print:
  gene_id
0      g0
1  g1_TE1
2  g1_TE2
3      g2
4      g3
5  g4_TE1
6  g4_TE2
7  g4_TE3
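The same renaming can also be done without explicit loops, which may matter on a large table. This is only a sketch of one alternative, assuming the same `gene_id` column as above; the helper names `is_dup` and `occurrence` are made up for the example.

```python
import pandas as pd

df_genes = pd.DataFrame(
    {"gene_id": ["g0", "g1", "g1", "g2", "g3", "g4", "g4", "g4"]}
)

# True for every row whose gene_id occurs more than once
# (keep=False marks all copies, not only the later ones).
is_dup = df_genes["gene_id"].duplicated(keep=False)

# 1-based position of each row within its gene_id group.
occurrence = df_genes.groupby("gene_id").cumcount() + 1

# Append the suffix only to the duplicated rows.
df_genes.loc[is_dup, "gene_id"] = (
    df_genes.loc[is_dup, "gene_id"] + "_TE" + occurrence[is_dup].astype(str)
)
print(df_genes)
```

This should give the same table as the loop version, since `cumcount` numbers rows in their original order within each group.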
If they have to be strictly adjacent, the answer changes, because the rows would have to be grouped by consecutive run rather than by gene ID.
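For the strictly adjacent case, one possible sketch is below. It assumes that only runs of two or more identical consecutive IDs get a suffix and that numbering restarts in each run; that is my guess at the requirement, not something stated in the question. The input `df_runs` is a made-up example in which g1 shows up in two separate places.

```python
import pandas as pd

# Hypothetical input: g1 appears as an adjacent pair and again on its own later.
df_runs = pd.DataFrame(
    {"gene_id": ["g1", "g1", "g2", "g1", "g4", "g4", "g4"]}
)

ids = df_runs["gene_id"].copy()

# A new run starts wherever the ID differs from the previous row,
# so the cumulative sum gives each consecutive run its own label.
run_id = (ids != ids.shift()).cumsum()

runs = ids.groupby(run_id)
run_size = runs.transform("size")   # length of the run each row belongs to
position = runs.cumcount() + 1      # 1-based position inside that run

# Only rows that sit in a run of length > 1 are renamed.
in_run = run_size > 1
df_runs.loc[in_run, "gene_id"] = (
    ids[in_run] + "_TE" + position[in_run].astype(str)
)
print(df_runs)
```

Note that under this assumption a second adjacent run of the same ID would also start at `_TE1`, so the resulting IDs are not guaranteed to be unique across the whole table.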