
BeautifulSoup does not find a tag when searching by HTML attribute name

Published:

By nob

Category: Posts

Tags: Python BeautifulSoup

Environment

Software        Version
Python          3.11.5
BeautifulSoup   4.12.2

Symptom

>>> from bs4 import BeautifulSoup
>>> def my_tag(tag):
...     return (tag.attrs.get("data-urlRoot") == "/json")
...
>>> data = "<div data-urlRoot='/json'></div>"
>>> soup = BeautifulSoup(data, "html.parser")
>>> tags = soup.find_all(my_tag)
>>> tags
[]

Cause

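BeautifulSoup converts tag and attribute names to lowercase when it parses HTML, so the attribute is stored as data-urlroot and tag.attrs.get("data-urlRoot") returns None. Attribute values are left unchanged.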

HTML tags and attributes are case-insensitive
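
To see the lowercasing directly, it can help to print what the parser actually stored. The following is a minimal sketch using the same "html.parser" backend as above; the uppercase DIV tag is an assumption added here only to show that tag names are affected too.

```python
from bs4 import BeautifulSoup

# Mixed-case tag and attribute names in the input HTML.
html = "<DIV data-urlRoot='/json'></DIV>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# Names are stored in lowercase; the attribute value keeps its original text.
print(div.name)   # div
print(div.attrs)  # {'data-urlroot': '/json'}

# The mixed-case key is simply absent, so .get() returns None.
print(div.attrs.get("data-urlRoot"))  # None
```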

Fix

Look up the attribute by its lowercase name.

>>> from bs4 import BeautifulSoup
>>> def my_tag(tag):
...     return (tag.attrs.get("data-urlroot") == "/json")
...
>>> tags = soup.find_all(my_tag)
>>> tags
[<div data-urlroot="/json"></div>]
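
If the attribute name comes from somewhere else (for example, copied from the page source in mixed case), one option is to lowercase it before searching. The sketch below is only an illustration: find_by_attr is a hypothetical helper, not part of BeautifulSoup, and it uses the attrs argument of find_all instead of a filter function.

```python
from bs4 import BeautifulSoup


def find_by_attr(soup, name, value):
    """Hypothetical helper: lowercase the attribute name, then search."""
    # find_all(attrs={...}) matches tags whose attribute equals the given value.
    return soup.find_all(attrs={name.lower(): value})


data = "<div data-urlRoot='/json'></div>"
soup = BeautifulSoup(data, "html.parser")

# The mixed-case name from the original HTML now matches.
tags = find_by_attr(soup, "data-urlRoot", "/json")
print(tags)  # [<div data-urlroot="/json"></div>]
```

Only the name is lowercased here; the value is compared as written, since the parser does not change attribute values.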