I'm trying to write a website scraper in Python using Beautiful Soup and Requests. I can easily collect all of the text I want, but some of that text contains inline images that carry important information. I want to replace each image with its title and add the result to a string I can parse later, but I'm not sure how to do this.
This is an example of the kind of HTML I'm trying to parse:
```html
<td colspan="3"><b>"Assemble under Siegfried!"</b>
<a href="/wiki/index.php/File:Continuous.png" class="image" title="CONT"><img alt="CONT" src="/wiki/images/thumb/7/78/Continuous.png/14px-Continuous.png" width="14" height="17" srcset="/wiki/images/thumb/7/78/Continuous.png/21px-Continuous.png 1.5x, /wiki/images/7/78/Continuous.png 2x">
</a> This unit gains +10 attack for each
<a href="/wiki/index.php/File:Black.png" class="image" title="Black"><img alt="Black" src="/wiki/images/thumb/7/71/Black.png/15px-Black.png" width="15" height="15" srcset="/wiki/images/thumb/7/71/Black.png/23px-Black.png 1.5x, /wiki/images/thumb/7/71/Black.png/30px-Black.png 2x">
</a> and
<a href="/wiki/index.php/File:White.png" class="image" title="White"><img alt="White" src="/wiki/images/thumb/8/80/White.png/15px-White.png" width="15" height="15" srcset="/wiki/images/thumb/8/80/White.png/23px-White.png 1.5x, /wiki/images/thumb/8/80/White.png/30px-White.png 2x">
</a> ally besides this unit.
</td>
```
From this HTML, I want to extract:
"Assemble under Siegfried! CONT This unit gains +10 attack for each Black and White ally besides this unit."
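For comparison, here is a minimal reproduction of what plain `get_text()` returns. It is a sketch that assumes the built-in `html.parser` backend and a trimmed copy of the snippet above (the `srcset`, `width`, and `height` attributes are dropped because they don't affect the text); the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet above; only the text-relevant markup is kept.
html = """<td colspan="3"><b>"Assemble under Siegfried!"</b>
<a href="/wiki/index.php/File:Continuous.png" class="image" title="CONT"><img alt="CONT" src="/wiki/images/thumb/7/78/Continuous.png/14px-Continuous.png">
</a> This unit gains +10 attack for each
<a href="/wiki/index.php/File:Black.png" class="image" title="Black"><img alt="Black" src="/wiki/images/thumb/7/71/Black.png/15px-Black.png">
</a> and
<a href="/wiki/index.php/File:White.png" class="image" title="White"><img alt="White" src="/wiki/images/thumb/8/80/White.png/15px-White.png">
</a> ally besides this unit.
</td>"""

soup = BeautifulSoup(html, "html.parser")

# get_text() only concatenates text nodes; attributes such as title/alt
# are never included, so CONT, Black and White are missing here.
plain = soup.td.get_text()
print(plain)
```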
Calling `get_text()` in the usual way leaves out the image titles, which is the problem.
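One possible approach, sketched under the assumption that every inline image is wrapped in an `<a class="image">` link carrying a `title` attribute (as in the snippet above): swap each such link for a plain string with Beautiful Soup's `replace_with()`, then call `get_text()` with a space separator and collapse the whitespace. The function name `cell_text_with_image_titles` is hypothetical, and the `html.parser` backend is an assumption. Note that this modifies the parsed tree in place.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet from the question.
html = """<td colspan="3"><b>"Assemble under Siegfried!"</b>
<a href="/wiki/index.php/File:Continuous.png" class="image" title="CONT"><img alt="CONT" src="/wiki/images/thumb/7/78/Continuous.png/14px-Continuous.png">
</a> This unit gains +10 attack for each
<a href="/wiki/index.php/File:Black.png" class="image" title="Black"><img alt="Black" src="/wiki/images/thumb/7/71/Black.png/15px-Black.png">
</a> and
<a href="/wiki/index.php/File:White.png" class="image" title="White"><img alt="White" src="/wiki/images/thumb/8/80/White.png/15px-White.png">
</a> ally besides this unit.
</td>"""


def cell_text_with_image_titles(cell):
    """Hypothetical helper: return the cell's text with each image link
    replaced by its title. Mutates `cell`."""
    # find_all() returns a list, so replacing elements while looping is safe.
    for link in cell.find_all("a", class_="image"):
        img = link.find("img")
        # Prefer the link's title; fall back to the <img> alt text if absent.
        label = link.get("title") or (img.get("alt", "") if img else "")
        link.replace_with(label)
    # Join text nodes with spaces, then collapse all runs of whitespace.
    return " ".join(cell.get_text(" ").split())


soup = BeautifulSoup(html, "html.parser")
result = cell_text_with_image_titles(soup.td)
print(result)
# prints: "Assemble under Siegfried!" CONT This unit gains +10 attack for each Black and White ally besides this unit.
```

The literal quotation marks inside `<b>` are kept in the output. The space separator keeps a title from being glued to the neighboring word, at the cost of also inserting a space wherever a single word is split across inline tags (for example `<b>foo</b>bar` becomes `foo bar`).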