Web scraping Tripadvisor's reviews

Question

I tried to scrape Tripadvisor's airline reviews:

https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines#REVIEWS

I tried to extract the rating:

rating <- page01 %>% html_node(".bubble_30 span") %>% html_text()

It returns NA.

Then I tried to extract the page numbers:

pageNum <- page01 %>% html_nodes(".pageNum") %>% html_text()

It returns character(0).

Please guide me.

Thanks

Cherukuri · Answer 1 · Sep 15, 2019

Hey, I don't know much about web scraping, but check out this blog post, which covers the same kind of scenario as yours.

It includes an example of extracting content from a web page.

https://www.datacamp.com/community/tutorials/r-web-scraping-rvest

Hope it helps!

answered Sep 15, 2019 by anonymous

Hi, thanks.

I did check this site before I posted this thread.

The HTML part confuses me.

For example:

I would like to get the last page number on this site: https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines

My code is:

url %>%
  html_nodes(".pageNum")

but the nodes contain an href, and after many attempts I still cannot get the page number back.

commented Sep 16, 2019 by MarkYo

Hey MarkYo,

I tried it with the https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines page but couldn't find the problem.

So I tried scraping the edureka community website with syntax similar to yours, and it worked fine.

Here is the code I used.

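library(rvest)

url <- 'https://www.edureka.co/community/'
page01 <- read_html(url)
# Collect the text of every pagination link
pageNum <- page01 %>%
  html_nodes('.qa-page-link') %>%
  html_text()
pageNum
# paste0() avoids the spaces that paste() would insert into the URL
htmlpage <- paste0(url, '?page=', pageNum[1])
data <- read_html(htmlpage)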
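
If the pagination nodes are links, the visible text and the href can be read separately with html_text() and html_attr(). Here is a minimal sketch of that idea. The markup is made up for illustration and is not Tripadvisor's real HTML, so the class name and href pattern are assumptions:

```r
library(rvest)

# Hypothetical pagination markup, NOT copied from Tripadvisor
html <- '<div class="pageNumbers">
  <a class="pageNum" href="/Reviews-or0">1</a>
  <a class="pageNum" href="/Reviews-or5">2</a>
  <a class="pageNum" href="/Reviews-or10">3</a>
</div>'
page <- read_html(html)

# Select the link nodes once, then read text and attribute separately
links <- page %>% html_nodes(".pageNum")
page_numbers <- links %>% html_text() %>% as.numeric()
page_links <- links %>% html_attr("href")

# The last page is the largest number shown in the pagination
last_page <- max(page_numbers)
last_page
```

On a real page the same two calls apply, provided the selector actually matches something in the HTML that read_html() downloaded.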

Hope it helps!

commented Sep 16, 2019 by Cherukuri

Cherukuri · Answer 2 · Sep 16, 2019

Hi, I used the code below to find the review.

Check it out!

> tripadvsor = read_html('https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines#REVIEWS')
> tripadvsor %>% html_nodes('.flights-airline-review-page-overview-module-OverviewModule__review_num--2Ga7T') %>% html_text() %>% as.numeric()
[1] 4
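
A note on the selector: the suffix in that class name (--2Ga7T) looks generated, so I am assuming it may change over time. A minimal sketch of a more tolerant approach is to match only part of the class name and to check whether anything matched at all. The markup below is made up for illustration, not copied from Tripadvisor. It also shows why the original attempts gave NA and character(0): html_node() returns a missing node when nothing matches, and html_nodes() returns an empty set.

```r
library(rvest)

# Hypothetical markup with a generated class suffix, NOT copied from Tripadvisor
html <- '<div><span class="OverviewModule__review_num--2Ga7T">4</span></div>'
page <- read_html(html)

# Match on the part of the class name that is assumed to stay stable
nodes <- page %>% html_nodes('[class*="review_num"]')

# An empty node set means the selector matched nothing in the parsed HTML
if (length(nodes) == 0) {
  value <- NA_real_
} else {
  value <- nodes %>% html_text() %>% as.numeric()
}
value

# Selectors with no match: html_node() gives NA, html_nodes() gives character(0)
no_match_single <- page %>% html_node(".bubble_30 span") %>% html_text()
no_match_many <- page %>% html_nodes(".pageNum") %>% html_text()
```

If length(nodes) is 0 on the live page, the selector does not exist in the downloaded HTML, so it is worth inspecting what read_html() actually returned before changing the extraction code.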