How do I prepare different .txt files to analyse and compare the language used in texts (i.e. sentences and paragraphs) in RStudio?

Hi, I am quite new to using RStudio and I need some help getting language data into a processable format. My general interest relates to Natural Language Processing.

My data consists of several sets of texts, each set written by a different person. I want to compare these sets using, for example, a tokenizer and the stylo package. So I would like to be able to look at Texts 1, 2, 3 and 4 by Person 1, then Texts 1, 2, 3 and 4 by Person 2, and so on.

I currently have each passage in a separate .txt file. I know how to import them; I know how to specify a working directory.

I would like to know:

1) How do I get my data into a data frame in RStudio so that I can identify and select individual lines or texts for processing? When I use stylo(), the output is not organised in a way that lets me identify, for example, which line belongs to which text and person.

Also,

2) When I simply import the data files and try to use the tm package, for example, I get an error message saying that there are more rows than data points in line 1. Is this a major issue if that is how the original data is structured?
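
To make both points concrete, here is a minimal sketch of the kind of structure being asked for. It uses only base R and rests on assumptions that are not part of the original setup: the file names follow a hypothetical pattern such as person1_text1.txt, and the two demo files are written to a temporary folder purely for illustration. Reading with readLines() treats every line as plain text and does not try to split it into columns, which may sidestep the kind of row/column mismatch error described in 2).

```r
# Hypothetical demo files; in practice, point `dir` at the folder
# that holds the real .txt files.
dir <- file.path(tempdir(), "corpus_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("First line, with a comma.", "Second line."),
           file.path(dir, "person1_text1.txt"))
writeLines("Only one line here.",
           file.path(dir, "person2_text1.txt"))

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file as raw lines and record where each line came from.
# Assumes the (hypothetical) naming scheme <person>_<text>.txt.
read_one <- function(path) {
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  id    <- tools::file_path_sans_ext(basename(path))  # e.g. "person1_text1"
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  data.frame(
    person  = rep(parts[1], length(lines)),
    text    = rep(parts[2], length(lines)),
    line_no = seq_along(lines),
    content = lines,
    stringsAsFactors = FALSE
  )
}

# One row per line, across all files.
corpus <- do.call(rbind, lapply(files, read_one))
```

With a data frame like this, a single text can be selected with something like subset(corpus, person == "person1" & text == "text1"), and the content column can then be passed on to whichever tokenizer or package is being used.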

Note that I cannot use CSV files as the data contains commas that are meaningful.
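
On the comma concern, a small sketch with base R suggests that commas inside a text field need not be lost in a CSV file, because write.csv() wraps character fields in quotes by default and read.csv() respects those quotes. The data frame and the temporary file below are made up for illustration only.

```r
# Made-up example row whose text contains meaningful commas.
df <- data.frame(
  person  = "person1",
  content = "Yes, commas, everywhere.",
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)   # character fields are quoted by default

# Read it back; the quoted field is kept as a single value.
back <- read.csv(path, stringsAsFactors = FALSE)
commas_survive <- identical(back$content, df$content)
```

If quoting is still a worry, a tab-separated file (write.table() with sep = "\t") is another option, provided the texts themselves contain no tabs.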

I'd appreciate any advice or directions to useful tutorials in this regard.

Thanks in advance.
