Thanks to @mehdi-sahraei's suggestion, I changed dtype to None, which allowed the rows after the header line to be parsed correctly. In the end, it seems there is no bug in how the header line is handled, but rather a lack of clarity in the documentation. As indicated in my original post, the documentation says:
... if the optional argument names=True, the first commented line will be examined for names ...
What the documentation doesn't tell you is that, in that case, the detected header is stored in dtype.names of the returned array and not alongside the rows that follow it in the file. So the header line is there, but it is not directly accessible like the other rows. Here is a working test case for those who want to check how this works in practice:
C:\tmp\data.txt
#firstName|LastName
Anthony|Quinn
Harry|POTTER
George|WASHINGTON
And the program:
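import numpy as np

with open("C:/tmp/data.txt", "r", encoding="UTF-8") as fd:
    result = np.genfromtxt(
        fd,
        delimiter="|",
        comments="#",
        dtype=None,       # let genfromtxt infer the type of each column
        names=True,       # read field names from the first (commented) line
        skip_header=0,
        autostrip=True,
    )

# The data rows only; the header is not among them.
print(f"result = {result}\n\n")
# The header line ends up in the field names of the structured dtype.
print("".join([
    "After parsing the file entirely, the detected ", "header line is: ",
    f"{result.dtype.names}"
]))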
Which gives the expected result:
result = [('Anthony', 'Quinn') ('Harry', 'POTTER') ('George', 'WASHINGTON')]
After parsing the file entirely, the detected header line is: ('firstName', 'LastName')
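For anyone who wants the header next to the data rows, or wants to pick a column by name, here is a minimal sketch. It assumes the same sample data as above, held in an in-memory io.StringIO instead of C:/tmp/data.txt so that it runs anywhere; the variable names header, first_names and rows are only illustrative.

```python
import io

import numpy as np

# Same content as data.txt above, kept in memory (an assumption made
# only so that this snippet is self-contained).
data = io.StringIO(
    "#firstName|LastName\n"
    "Anthony|Quinn\n"
    "Harry|POTTER\n"
    "George|WASHINGTON\n"
)

result = np.genfromtxt(
    data,
    delimiter="|",
    comments="#",
    dtype=None,
    names=True,
    autostrip=True,
    encoding="utf-8",
)

# The header is available as the field names of the structured dtype.
header = result.dtype.names

# Each field name can be used to select a whole column.
first_names = result["firstName"]

# Rebuild a plain Python list with the header as its first entry.
rows = [header] + result.tolist()

print(header)
print(first_names)
print(rows)
```

Since the result is a structured array, indexing it with a field name returns that column, and tolist() turns the rows into ordinary tuples, which makes it easy to put the header back in front if needed.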
Thanks, everyone, for your time and help. I hope this clarifies the issue for anyone who runs into the same problem.