How do I prepare different .txt files to analyse and compare the language used in texts (i.e. sentences and paragraphs) in RStudio?

Hi, I am quite new to using RStudio and I need some help getting language data into a processable format. My general interest relates to Natural Language Processing.

My data consists of different sets of texts, produced by different people. I want to compare these sets using, e.g., a tokenizer and the stylo package. So I would like to see Texts 1, 2, 3 and 4 by Person 1, then Texts 1, 2, 3 and 4 by Person 2, and so on.

I currently have each passage in a separate .txt file. I know how to import them; I know how to specify a working directory.

I would like to know:

1) How to get my data into a data frame in RStudio so that I can identify and specify lines or texts for processing. When using stylo(), my output is not organised in a way that lets me identify, for example, which line belongs to which text and person.

Also,

2) When I simply import the data files and try to use the tm package, for example, I get an error message saying that there are more rows than data points in line 1. Is this a major issue if that is how the original data is structured?

Note that I cannot use CSV files as the data contains commas that are meaningful.
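
To make point 1 concrete, here is a rough sketch in base R of the kind of structure I have in mind. It assumes a hypothetical naming scheme such as person1_text1.txt; the folder, file names, sample sentences and the helper read_one() are all made up for illustration. It reads each file with readLines(), which returns every line as a plain string, so commas are kept as ordinary characters rather than treated as separators.

```r
# Hypothetical demo corpus: two small files in a temporary folder.
# In practice this would be the folder that holds the real .txt files.
corpus_dir <- file.path(tempdir(), "corpus_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines(c("First line, with a comma.", "Second line."),
           file.path(corpus_dir, "person1_text1.txt"))
writeLines("Only one line here.",
           file.path(corpus_dir, "person2_text1.txt"))

# Collect the paths of all .txt files in the folder.
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file into a data frame with one row per line.
# Assumption: the file name has the form <person>_<text>.txt.
read_one <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  id    <- sub("\\.txt$", "", basename(path))
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  data.frame(
    person  = rep(parts[1], length(lines)),
    text    = rep(parts[2], length(lines)),
    line_no = seq_along(lines),
    content = lines,
    stringsAsFactors = FALSE
  )
}

# Stack the per-file data frames into one table.
corpus <- do.call(rbind, lapply(files, read_one))
print(corpus)
```

With a table like this, a single text could be picked out with something like subset(corpus, person == "person1" & text == "text1"), and each row still records which person, text and line it came from.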

I'd appreciate any advice or directions to useful tutorials in this regard.

Thanks in advance.
May 19, 2022 in Data Science by KARIEN

edited Mar 4 113 views
