BeautifulSoup does not find a tag when searching by an HTML attribute name
Published:
By nob
Category: Posts
Tags: Python BeautifulSoup
Environment
| Software | Version |
|---|---|
| Python | 3.11.5 |
| BeautifulSoup | 4.12.2 |
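Symptom
Searching for a tag by a mixed-case attribute name returns an empty list, even though the attribute is present in the HTML.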
>>> from bs4 import BeautifulSoup
>>> def my_tag(tag):
...     return tag.attrs.get("data-urlRoot") == "/json"
...
>>> data = "<div data-urlRoot='/json'></div>"
>>> soup = BeautifulSoup(data, "html.parser")
>>> tags = soup.find_all(my_tag)
>>> tags
[]
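Cause
BeautifulSoup's HTML parsers convert the tag and attribute names of the parsed HTML to lowercase. The attribute is therefore stored as data-urlroot, so tag.attrs.get("data-urlRoot") returns None and the comparison fails. Attribute values are left unchanged. As the BeautifulSoup documentation puts it:
HTML tags and attributes are case-insensitive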
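The lowercasing can be observed without BeautifulSoup. The sketch below uses Python's standard `html.parser.HTMLParser`, the backend selected by `"html.parser"` in this post. The `AttrCollector` class is a made-up name for illustration only.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased;
        # attribute values keep their original case.
        self.seen.append((tag, dict(attrs)))


collector = AttrCollector()
collector.feed("<DIV data-urlRoot='/Json'></DIV>")
collector.close()
print(collector.seen)  # [('div', {'data-urlroot': '/Json'})]
```

Only the names are normalized: the value `/Json` keeps its capital letter, so value comparisons remain case-sensitive.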
Solution
Search for the attribute using its lowercase name.
>>> from bs4 import BeautifulSoup
>>> def my_tag(tag):
...     return tag.attrs.get("data-urlroot") == "/json"
...
>>> tags = soup.find_all(my_tag)
>>> tags
[<div data-urlroot="/json"></div>]
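For a simple equality match like this one, the custom function is optional, because `find_all` also accepts an `attrs` dictionary. The sketch below additionally lowercases the attribute name before searching, so the name can be copied as-is from the original HTML. `find_by_attr` is a hypothetical helper, not part of BeautifulSoup, and it assumes the document was parsed with an HTML parser.

```python
from bs4 import BeautifulSoup


def find_by_attr(soup, name, value):
    """Hypothetical helper: match an attribute regardless of how its name is cased."""
    # The HTML parsers store attribute names in lowercase, so normalize the
    # name being searched for. The value is compared as-is (case-sensitive).
    return soup.find_all(attrs={name.lower(): value})


data = "<div data-urlRoot='/json'></div>"
soup = BeautifulSoup(data, "html.parser")
tags = find_by_attr(soup, "data-urlRoot", "/json")
print(tags)  # [<div data-urlroot="/json"></div>]
```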