Hello,
I'm working on a sentiment analysis project for Arabic text. I downloaded an Excel sheet that contains two columns, text and labels, and I'm getting this error: 'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte. The file itself opens fine, but the error occurs when I try to tokenize the text.
Please help me as soon as possible!
This is my code:
import nltk
nltk.download('punkt')
token_data = open("data try.xlsx").read()
tokens = nltk.sent_tokenize(token_data)
print(tokens)
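
A possible explanation, offered as a sketch and not a confirmed diagnosis: an .xlsx file is a ZIP archive of XML parts, not plain text, so open(...).read() most likely tries to decode compressed bytes as UTF-8 and fails before the tokenizer ever runs. The example below first checks that the file really is a ZIP container using the standard library, then reads the text column with pandas.read_excel. The helper names is_xlsx_container and load_texts are hypothetical, the column name "text" is assumed from the description above, and pandas plus openpyxl are assumed to be installed.

```python
import zipfile


def is_xlsx_container(path):
    """Return True if the file is a ZIP archive, which is what .xlsx files are."""
    return zipfile.is_zipfile(path)


def load_texts(path, text_column="text"):
    """Read one column of an Excel sheet as a list of strings.

    Assumption: the sheet has a column named `text_column`; adjust the
    name if the header in the real file is different.
    """
    # Imported here so the ZIP check above works even without pandas installed.
    import pandas as pd

    # read_excel parses the workbook instead of decoding its raw bytes as UTF-8.
    frame = pd.read_excel(path)
    # Drop empty cells and make sure every remaining value is a string.
    return frame[text_column].dropna().astype(str).tolist()
```

With the rows loaded this way, each one can be tokenized separately, for example [nltk.sent_tokenize(t) for t in load_texts("data try.xlsx")].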