BeautifulSoup4

## References
> https://github.com/DeronW/beautifulsoup/blob/v4.4.0/docs/index.rst

## Installation
```
pip install beautifulsoup4
```

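## Creating a BeautifulSoup instance
```python
from bs4 import BeautifulSoup

#  Parse an open file handle
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")
#  Parse a string
soup = BeautifulSoup("<html>xxx</html>", "html.parser")
```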
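A minimal sketch of building a soup from a string and reading two values back. The markup is made up for illustration. The parser is named explicitly because, when it is omitted, Beautiful Soup picks whichever installed parser it ranks best, so results can differ between machines.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = "<html><head><title>Demo</title></head><body><p id='intro'>Hello</p></body></html>"

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string   # string inside the first <title>
intro_id = soup.p["id"]          # value of the id attribute on the first <p>
print(title_text, intro_id)
```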

## Parsers
```python
#  Python standard library
BeautifulSoup(markup, "html.parser")

#  lxml
#  HTML parser
BeautifulSoup(markup, "lxml")
#  XML parser
BeautifulSoup(markup, "lxml-xml")
BeautifulSoup(markup, "xml")

#  html5lib
BeautifulSoup(markup, "html5lib")
```
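Since lxml and html5lib are optional installs, one possible pattern is to try the preferred parsers in order and fall back to the built-in one. The sketch below assumes that requesting a missing parser raises `bs4.FeatureNotFound`. `make_soup` and its `preferred` tuple are hypothetical names, not part of the library.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: return (soup, parser_name) for the first parser that is installed."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser is not installed; try the next one
            continue
    raise RuntimeError("no usable parser found")

soup, used = make_soup("<p>Hello<b>world</b></p>")
print(used, soup.p.b.string)
```

Different parsers can build different trees from invalid markup, so a fallback like this is safest when the input is reasonably well formed.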

## Tag object attributes
```python
#  Access a descendant tag by name; returns only the first match
#  tag_name stands for an HTML or XML tag name such as h2 or p
tag.tag_name

#  The tag's own name
tag.name

#  HTML attributes
#  e.g. id, class
tag['id']
#  All attributes, returned as a dict
tag.attrs

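#  The tag's string: set only when the tag has exactly one child, otherwise None
#  If that single child is itself a tag, that child's .string is returned
tag.string

#  Generator over every string inside the tag
tag.strings

#  Like .strings, but strips surrounding whitespace and skips whitespace-only strings
tag.stripped_strings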

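#  All direct children, as a list
tag.contents

#  Iterator over the direct children, for use in a loop
tag.children

#  Generator over all descendants (children, grandchildren, and so on)
tag.descendants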

#  The parent node
tag.parent

#  Generator over all ancestors, from the parent up to the document root
tag.parents

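#  The sibling immediately before/after this node
tag.previous_sibling
tag.next_sibling

#  Generators over all following/preceding siblings
tag.next_siblings
tag.previous_siblings

#  The object parsed immediately before/after this one
tag.previous_element
tag.next_element

#  Generators over the objects parsed before/after this one
tag.previous_elements
tag.next_elements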
```
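The attributes above can be tried on a small tree. This is a sketch on made-up markup with no whitespace between tags, so no whitespace-only text nodes appear among the children. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = "<div id='box'><h2>Title</h2><p class='a b'>one <b>two</b></p><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div
first_p = div.p                      # first <p> below the div

name = div.name                      # the tag's own name
attrs = div.attrs                    # dict of all attributes
classes = first_p["class"]           # class is multi-valued, so this is a list
child_names = [c.name for c in div.children]        # direct children only
h2_string = div.h2.string            # <h2> has exactly one string child
p_string = first_p.string            # None: this <p> has two children
texts = list(div.stripped_strings)   # all strings, whitespace stripped
next_name = first_p.next_sibling.name
ancestor_names = [a.name for a in first_p.b.parents]  # up to the document root

print(child_names, texts, ancestor_names)
```

Wrapping `.stripped_strings` in `list()` is needed here because it is a generator, as are `.strings`, `.descendants` and `.parents`.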

## Tag object methods
```python
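#  Search the descendants and return the first match (None if nothing matches)
#  By tag name, e.g. p, h2
tag.find('p')
#  By a regular expression matched against tag names (needs `import re`)
tag.find(re.compile("t"))
#  By a list of tag names
tag.find(['a','p'])
#  By attribute
tag.find('a', class_='aa')
#  By string: matches the text itself and returns the string, not the tag
tag.find(string='aaa')
#  limit belongs to find_all(); find() always stops at the first match
tag.find_all('a', limit=5)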

#  Search the descendants and return all matches as a list
tag.find_all()
#  Search the ancestors
tag.find_parent()
tag.find_parents()
#  Search the siblings
tag.find_next_sibling()
tag.find_next_siblings()
tag.find_previous_sibling()
tag.find_previous_siblings()
#  Search backward through the document
tag.find_previous()
tag.find_all_previous()
#  Search forward through the document
tag.find_next()
tag.find_all_next()
#  CSS selector, using CSS syntax; returns a list
tag.select('a[href$="tillie"]')

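#  Append a child node (a plain str is added as text, not parsed as markup)
tag.append("<p>aaa</p>")
#  Insert a child node at the given position
tag.insert(0, '<p>aaa</p>')
#  Insert immediately before the current node
tag.insert_before()
#  Insert immediately after the current node
tag.insert_after()
#  Remove the node's contents
tag.clear()
#  Remove the current node from the tree and return it
tag.extract()
#  Remove the current node from the tree and destroy it
tag.decompose()
#  Replace the node
tag.replace_with()
#  Wrap the node in another tag (new_tag is a method of the BeautifulSoup object)
tag.wrap(soup.new_tag("b"))
#  Remove the node's own tag, keeping its contents
tag.unwrap()
#  Get the text
tag.get_text()
#  Pretty-printed output
print(tag.prettify())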
```
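A short sketch that runs a few of the search and modification calls above on made-up markup. It also shows a common pitfall: a plain string passed to `append()` is added as text and escaped on output, so new elements are better built with `soup.new_tag()`. The URLs and variable names are illustrative assumptions.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = (
    "<div><a class='aa' href='http://example.com/elsie'>Elsie</a>"
    "<a href='http://example.com/tillie'>Tillie</a><p>text</p></div>"
)
soup = BeautifulSoup(html, "html.parser")

# Searching
first_aa = soup.find("a", class_="aa")          # first <a> with class "aa"
links = soup.find_all("a", limit=5)             # at most five <a> tags
by_regex = [t.name for t in soup.find_all(re.compile("^(a|p)$"))]
selected = soup.select('a[href$="tillie"]')     # CSS: href ends with "tillie"
found_string = soup.find(string="Elsie")        # the string itself, not the tag

# Modifying
p = soup.p
p.append("<i>x</i>")          # added as text, so the angle brackets get escaped
new_b = soup.new_tag("b")     # build a real element instead
new_b.string = "bold"
p.append(new_b)

print(p)  # <p>text&lt;i&gt;x&lt;/i&gt;<b>bold</b></p>
```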

酷酷番茄