'utf-8' codec can't decode byte 0xed
4/28/2023

(Sep-06-2019, 02:02 PM) karkas Wrote: This has taken me some time and I haven't found a way to figure it out. I tried this and realized the problem is with the Ú in 'MÚSICA' in the third subtitle. What I still don't understand is why, if I'm not rewriting that character and it's read normally by the first function that uses that file, it gets messed up, and then, even if I read it with 'utf-8', I still get this error.
* The italics were just to try something with the converter I'm writing.

Quote:How can I pass the raw data in binary format of a read file without "altering" the encoding

You use 'rb', but then you still need to decode, or the Unicode will look like this: M\xc3\x9aSICA.

    with open('test.srt', encoding='utf-8') as f:

Quote:This excerpt does raise the exact same exception.

It should not if the file is saved as UTF-8.

Quote:Can also read in with utf-8 and errors='ignore' or errors='replace'.

However, doing this alters the file and it won't be correctly converted.

Quote:Use string formatting (f-strings); then it looks much nicer than all the +.

Thank you very much for your examples and explanations. I don't even need to convert ints to strings, right? I've been reading a little about it and, from that and what I can understand from your example, I don't even need the quotes along with the +.

    with open('test.srt', encoding='utf-8f') as f:
        file_list = ...

This excerpt does raise the exact same exception. By the way, the 'f' after the '8' in the encoding raises another error by itself; I guess it's just a typo.

Quote:If I save test from your example, it will work, as I always save files in UTF-8.

I tried this and realized that the excerpt I pasted, and that you used to run this test, doesn't raise this exception.

I tried this. It also installs chardetect, which can be run from the command line (cmd), or from cmder as I use:

    test.srt: utf-8 with confidence 0.505
    myfile.txt: ascii with confidence 1.0
How can I pass the raw data in binary format of a read file without "altering" the encoding, which is what this appears to be doing? I got several errors before I made this "work." But then, with other tests, I realized that detect is actually detecting the encoding I pass to the bytes function. I've been working on this for long hours and I'm kind of stuck.

The lines where I do the replacement are the following:

    inList = hoursBegin + ':' + minutesBegin + ':' + secondsBegin + ',' + millisecondsBegin + ' --> ' + \
             hoursEnd + ':' + minutesEnd + ':' + secondsEnd + ',' + millisecondsEnd + '\n'

PS: Please excuse me if I'm not being very clear about some things; just let me know and I'll clarify. Thanks in advance.
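The timing-line concatenation above can be written with f-strings, as suggested in the thread. An f-string formats ints directly, so no str() calls or '+' signs are needed, and format specs such as :02d and :03d add the zero padding SRT timestamps use. The sketch also shows one way to change a line without "altering" the rest of the file: decode once with the encoding the file actually uses, edit the str, and encode back with that same encoding, with no need to pass raw bytes around. The names srt_timestamp, srt_time_line and replace_line_in_file, and the cp1252 demo file, are hypothetical.

```python
import os
import tempfile


def srt_timestamp(hours, minutes, seconds, milliseconds):
    """Format ints as an SRT timestamp, e.g. (0, 1, 2, 345) -> '00:01:02,345'."""
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"


def srt_time_line(begin, end):
    """Build a timing line from two (hours, minutes, seconds, ms) tuples."""
    return f"{srt_timestamp(*begin)} --> {srt_timestamp(*end)}\n"


def replace_line_in_file(path, old_line, new_line, encoding):
    """Hypothetical helper: swap one line while keeping the file's encoding.

    The bytes are decoded once with `encoding`, edited as str, and encoded
    back with the same `encoding`. newline="" stops Python from translating
    line endings on read or write.
    """
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text.replace(old_line, new_line))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.srt")
    # Demo assumption: the subtitle file was saved as cp1252, not UTF-8.
    with open(path, "w", encoding="cp1252", newline="") as f:
        f.write("3\n00:00:05,000 --> 00:00:07,500\nMÚSICA\n")

    old_line = "00:00:05,000 --> 00:00:07,500\n"
    new_line = srt_time_line((0, 0, 6, 0), (0, 0, 8, 500))
    replace_line_in_file(path, old_line, new_line, encoding="cp1252")

    with open(path, "rb") as f:
        # Ú is still the single cp1252 byte 0xDA:
        print(f.read())  # b'3\n00:00:06,000 --> 00:00:08,500\nM\xdaSICA\n'
```

If the file turns out to be UTF-8, the same helper works with encoding="utf-8"; the key design choice is that decoding and encoding use the same codec.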