<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <title>
   Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation
  </title>
  <link href="_static/classic.css" rel="stylesheet" type="text/css"/>
  <link href="_static/pygments.css" rel="stylesheet" type="text/css"/>
  <script type="text/javascript">
   var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '4.4.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
  </script>
  <script src="_static/jquery.js" type="text/javascript">
  </script>
  <script src="_static/underscore.js" type="text/javascript">
  </script>
  <script src="_static/doctools.js" type="text/javascript">
  </script>
  <link href="#" rel="top" title="Beautiful Soup 4.4.0 documentation"/>
 </head>
 <body role="document">
  <div class="document">
   <div class="documentwrapper">
    <div class="bodywrapper">
     <div class="body" role="main">
      <div class="section" id="beautiful-soup-documentation">
       <h1>
        Beautiful Soup Documentation
        <a class="headerlink" href="#beautiful-soup-documentation" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <img alt='"The Fish-Footman began by producing from under his arm a great letter, nearly as large as himself."' class="align-right" src="_images/6.1.jpg"/>
       <p>
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/">
         Beautiful Soup
        </a>
        is a
Python library for pulling data out of HTML and XML files. It works
with your favorite parser to provide idiomatic ways of navigating,
searching, and modifying the parse tree. It commonly saves programmers
hours or days of work.
       </p>
       <p>
        These instructions illustrate all major features of Beautiful Soup 4,
with examples. I show you what the library is good for, how it works,
how to use it, how to make it do what you want, and what to do when it
violates your expectations.
       </p>
       <p>
        The examples in this documentation should work the same way in Python
2.7 and Python 3.2.
       </p>
       <p>
        You might be looking for the documentation for
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html">
         Beautiful Soup 3
        </a>
        .
If so, you should know that Beautiful Soup 3 is no longer being
developed, and that Beautiful Soup 4 is recommended for all new
projects. If you want to learn about the differences between Beautiful
Soup 3 and Beautiful Soup 4, see
        <a class="reference internal" href="#porting-code-to-bs4">
         Porting code to BS4
        </a>
        .
       </p>
       <p>
        This documentation has been translated into other languages by
Beautiful Soup users:
       </p>
       <ul class="simple">
        <li>
         <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/">
          这篇文档当然还有中文版.
         </a>
        </li>
        <li>
         このページは日本語で利用できます(
         <a class="reference external" href="http://kondou.com/BS4/">
          外部リンク
         </a>
         )
        </li>
        <li>
         이 문서는 한국어 번역도 가능합니다. (
         <a class="reference external" href="http://coreapython.hosting.paran.com/etc/beautifulsoup4.html">
          외부 링크
         </a>
         )
        </li>
       </ul>
       <div class="section" id="getting-help">
        <h2>
         Getting help
         <a class="headerlink" href="#getting-help" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you have questions about Beautiful Soup, or run into problems,
         <a class="reference external" href="https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup">
          send mail to the discussion group
         </a>
         . If
your problem involves parsing an HTML document, be sure to mention
         <a class="reference internal" href="#diagnose">
          <span>
           what the diagnose() function says
          </span>
         </a>
         about
that document.
        </p>
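         <p>
          A minimal sketch of collecting that report as text you can paste into a message. It assumes Beautiful Soup 4 is installed, Python 3.4 or later, and that
          <code class="docutils literal"><span class="pre">diagnose()</span></code>
          is importable from
          <code class="docutils literal"><span class="pre">bs4.diagnose</span></code>
          and prints its findings to standard output. The names
          <code class="docutils literal"><span class="pre">markup</span></code>
          and
          <code class="docutils literal"><span class="pre">report</span></code>
          are illustrative only:
         </p>

```python
import contextlib
import io

from bs4.diagnose import diagnose

# Hypothetical problem document; substitute the markup that is giving you trouble.
markup = "<p>Unclosed paragraph <b>with bold text"

# diagnose() prints its report, so capture standard output to keep it as a string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    diagnose(markup)
report = buffer.getvalue()

print(report)
```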
       </div>
      </div>
      <div class="section" id="quick-start">
       <h1>
        Quick Start
        <a class="headerlink" href="#quick-start" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Here’s an HTML document I’ll be using as an example throughout this
document. It’s part of a story from
        <cite>
         Alice in Wonderland
        </cite>
        :
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">html_doc</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>
<span class="s2">&lt;body&gt;</span>
<span class="s2">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span>
<span class="s2">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="s2">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span>
<span class="s2">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span>
<span class="s2">and they lived at the bottom of a well.&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;...&lt;/p&gt;</span>
<span class="s2">"""</span>
</pre>
        </div>
       </div>
       <p>
        Running the “three sisters” document through Beautiful Soup gives us a
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        object, which represents the document as a nested
data structure:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;head&gt;</span>
<span class="c1">#   &lt;title&gt;</span>
<span class="c1">#    The Dormouse's story</span>
<span class="c1">#   &lt;/title&gt;</span>
<span class="c1">#  &lt;/head&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p class="title"&gt;</span>
<span class="c1">#    &lt;b&gt;</span>
<span class="c1">#     The Dormouse's story</span>
<span class="c1">#    &lt;/b&gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#   &lt;p class="story"&gt;</span>
<span class="c1">#    Once upon a time there were three little sisters; and their names were</span>
<span class="c1">#    &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span>
<span class="c1">#     Elsie</span>
<span class="c1">#    &lt;/a&gt;</span>
<span class="c1">#    ,</span>
<span class="c1">#    &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span>
<span class="c1">#     Lacie</span>
<span class="c1">#    &lt;/a&gt;</span>
<span class="c1">#    and</span>
<span class="c1">#    &lt;a class="sister" href="http://example.com/tillie" id="link2"&gt;</span>
<span class="c1">#     Tillie</span>
<span class="c1">#    &lt;/a&gt;</span>
<span class="c1">#    ; and they lived at the bottom of a well.</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#   &lt;p class="story"&gt;</span>
<span class="c1">#    ...</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
        </div>
       </div>
       <p>
        Here are some simple ways to navigate that data structure:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">soup</span><span class="o">.</span><span class="n">title</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>

<span class="n">soup</span><span class="o">.</span><span class="n">title</span><span class="o">.</span><span class="n">name</span>
<span class="c1"># u'title'</span>

<span class="n">soup</span><span class="o">.</span><span class="n">title</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u'The Dormouse's story'</span>

<span class="n">soup</span><span class="o">.</span><span class="n">title</span><span class="o">.</span><span class="n">parent</span><span class="o">.</span><span class="n">name</span>
<span class="c1"># u'head'</span>

<span class="n">soup</span><span class="o">.</span><span class="n">p</span>
<span class="c1"># &lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>

<span class="n">soup</span><span class="o">.</span><span class="n">p</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span>
<span class="c1"># [u'title']</span>

<span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s2">"link3"</span><span class="p">)</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;</span>
</pre>
        </div>
       </div>
       <p>
        One common task is extracting all the URLs found within a page’s &lt;a&gt; tags:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="k">for</span> <span class="n">link</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'a'</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">link</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'href'</span><span class="p">))</span>
<span class="c1"># http://example.com/elsie</span>
<span class="c1"># http://example.com/lacie</span>
<span class="c1"># http://example.com/tillie</span>
</pre>
        </div>
       </div>
       <p>
        Another common task is extracting all the text from a page:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">get_text</span><span class="p">())</span>
<span class="c1"># The Dormouse's story</span>
<span class="c1">#</span>
<span class="c1"># The Dormouse's story</span>
<span class="c1">#</span>
<span class="c1"># Once upon a time there were three little sisters; and their names were</span>
<span class="c1"># Elsie,</span>
<span class="c1"># Lacie and</span>
<span class="c1"># Tillie;</span>
<span class="c1"># and they lived at the bottom of a well.</span>
<span class="c1">#</span>
<span class="c1"># ...</span>
</pre>
        </div>
       </div>
       <p>
        Does this look like what you need? If so, read on.
       </p>
      </div>
      <div class="section" id="installing-beautiful-soup">
       <h1>
        Installing Beautiful Soup
        <a class="headerlink" href="#installing-beautiful-soup" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        If you’re using a recent version of Debian or Ubuntu Linux, you can
install Beautiful Soup with the system package manager:
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          apt-get
         </span>
         <span class="pre">
          install
         </span>
         <span class="pre">
          python-bs4
         </span>
        </code>
        (for Python 2)
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          apt-get
         </span>
         <span class="pre">
          install
         </span>
         <span class="pre">
          python3-bs4
         </span>
        </code>
        (for Python 3)
       </p>
       <p>
        Beautiful Soup 4 is published through PyPI, so if you can’t install it
with the system packager, you can install it with
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          pip
         </span>
        </code>
        . The package name is
        <code class="docutils literal">
         <span class="pre">
          beautifulsoup4
         </span>
        </code>
        , and the same package
works on Python 2 and Python 3. Make sure you use the right version of
        <code class="docutils literal">
         <span class="pre">
          pip
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
        </code>
        for your Python version (these may be named
        <code class="docutils literal">
         <span class="pre">
          pip3
         </span>
        </code>
        and
        <code class="docutils literal">
         <span class="pre">
          easy_install3
         </span>
        </code>
        respectively if you’re using Python 3).
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          easy_install
         </span>
         <span class="pre">
          beautifulsoup4
         </span>
        </code>
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          pip
         </span>
         <span class="pre">
          install
         </span>
         <span class="pre">
          beautifulsoup4
         </span>
        </code>
       </p>
       <p>
        (The
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        package is probably
        <cite>
         not
        </cite>
        what you want. That’s
the previous major release,
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html">
         Beautiful Soup 3
        </a>
        . Lots of software uses
BS3, so it’s still available, but if you’re writing new code you
should install
        <code class="docutils literal">
         <span class="pre">
          beautifulsoup4
         </span>
        </code>
        .)
       </p>
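       <p>
        One way to confirm which release you ended up with is to import the package and look at its version string. This is only a sketch; it assumes the
        <code class="docutils literal"><span class="pre">beautifulsoup4</span></code>
        package is installed (it is imported as
        <code class="docutils literal"><span class="pre">bs4</span></code>
        ), and
        <code class="docutils literal"><span class="pre">is_bs4</span></code>
        is an illustrative name:
       </p>

```python
import bs4

# The package installed as "beautifulsoup4" is imported under the name "bs4".
print(bs4.__version__)

# Beautiful Soup 4 releases carry a version string that starts with "4.".
is_bs4 = bs4.__version__.startswith("4.")
print(is_bs4)
```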
       <p>
        If you don’t have
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          pip
         </span>
        </code>
        installed, you can
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/download/4.x/">
         download the Beautiful Soup 4 source tarball
        </a>
        and
install it with
        <code class="docutils literal">
         <span class="pre">
          setup.py
         </span>
        </code>
        .
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          python
         </span>
         <span class="pre">
          setup.py
         </span>
         <span class="pre">
          install
         </span>
        </code>
       </p>
       <p>
        If all else fails, the license for Beautiful Soup allows you to
package the entire library with your application. You can download the
tarball, copy its
        <code class="docutils literal">
         <span class="pre">
          bs4
         </span>
        </code>
        directory into your application’s codebase,
and use Beautiful Soup without installing it at all.
       </p>
       <p>
        I use Python 2.7 and Python 3.2 to develop Beautiful Soup, but it
should work with other recent versions.
       </p>
       <div class="section" id="problems-after-installation">
        <h2>
         Problems after installation
         <a class="headerlink" href="#problems-after-installation" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Beautiful Soup is packaged as Python 2 code. When you install it for
use with Python 3, it’s automatically converted to Python 3 code. If
you don’t install the package, the code won’t be converted. There have
also been reports on Windows machines of the wrong version being
installed.
        </p>
        <p>
         If you get the
         <code class="docutils literal">
          <span class="pre">
           ImportError
          </span>
         </code>
         “No module named HTMLParser”, your
problem is that you’re running the Python 2 version of the code under
Python 3.
        </p>
        <p>
         If you get the
         <code class="docutils literal">
          <span class="pre">
           ImportError
          </span>
         </code>
         “No module named html.parser”, your
problem is that you’re running the Python 3 version of the code under
Python 2.
        </p>
        <p>
         In both cases, your best bet is to completely remove the Beautiful
Soup installation from your system (including any directory created
when you unzipped the tarball) and try the installation again.
        </p>
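         <p>
          The two messages differ because Python 2 and Python 3 give the built-in parser module different names. The following sketch uses only the standard library to show which name your interpreter expects;
          <code class="docutils literal"><span class="pre">expected_module</span></code>
          is an illustrative variable, not something Beautiful Soup defines:
         </p>

```python
import importlib
import sys

# Python 2 names the standard-library parser module "HTMLParser";
# Python 3 names it "html.parser".
major = sys.version_info[0]
expected_module = "html.parser" if major >= 3 else "HTMLParser"

# This import succeeds when the module name matches the running interpreter.
importlib.import_module(expected_module)
print("Python %d provides %s" % (major, expected_module))
```

         <p>
          If the copy of Beautiful Soup on your path tries to import the other name, it was prepared for the other major version of Python.
         </p>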
        <p>
         If you get the
         <code class="docutils literal">
          <span class="pre">
           SyntaxError
          </span>
         </code>
         “Invalid syntax” on the line
         <code class="docutils literal">
          <span class="pre">
           ROOT_TAG_NAME
          </span>
          <span class="pre">
           =
          </span>
          <span class="pre">
           u'[document]'
          </span>
         </code>
         , you need to convert the Python 2
code to Python 3. You can do this either by installing the package:
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           python3
          </span>
          <span class="pre">
           setup.py
          </span>
          <span class="pre">
           install
          </span>
         </code>
        </p>
        <p>
         or by manually running Python’s
         <code class="docutils literal">
          <span class="pre">
           2to3
          </span>
         </code>
         conversion script on the
         <code class="docutils literal">
          <span class="pre">
           bs4
          </span>
         </code>
         directory:
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           2to3-3.2
          </span>
          <span class="pre">
           -w
          </span>
          <span class="pre">
           bs4
          </span>
         </code>
        </p>
       </div>
       <div class="section" id="installing-a-parser">
        <span id="parser-installation">
        </span>
        <h2>
         Installing a parser
         <a class="headerlink" href="#installing-a-parser" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Beautiful Soup supports the HTML parser included in Python’s standard
library, but it also supports a number of third-party Python parsers.
One is the
         <a class="reference external" href="http://lxml.de/">
          lxml parser
         </a>
         . Depending on your setup,
you might install lxml with one of these commands:
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           apt-get
          </span>
          <span class="pre">
           install
          </span>
          <span class="pre">
           python-lxml
          </span>
         </code>
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           easy_install
          </span>
          <span class="pre">
           lxml
          </span>
         </code>
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           pip
          </span>
          <span class="pre">
           install
          </span>
          <span class="pre">
           lxml
          </span>
         </code>
        </p>
        <p>
         Another alternative is the pure-Python
         <a class="reference external" href="http://code.google.com/p/html5lib/">
          html5lib parser
         </a>
         , which parses HTML the way a
web browser does. Depending on your setup, you might install html5lib
with one of these commands:
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           apt-get
          </span>
          <span class="pre">
           install
          </span>
          <span class="pre">
           python-html5lib
          </span>
         </code>
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           easy_install
          </span>
          <span class="pre">
           html5lib
          </span>
         </code>
        </p>
        <p>
         <code class="kbd docutils literal">
          <span class="pre">
           $
          </span>
          <span class="pre">
           pip
          </span>
          <span class="pre">
           install
          </span>
          <span class="pre">
           html5lib
          </span>
         </code>
        </p>
        <p>
         This table summarizes the advantages and disadvantages of each parser library:
        </p>
        <table border="1" class="docutils">
         <colgroup>
          <col width="18%"/>
          <col width="35%"/>
          <col width="26%"/>
          <col width="21%"/>
         </colgroup>
         <tbody valign="top">
          <tr class="row-odd">
           <td>
            Parser
           </td>
           <td>
            Typical usage
           </td>
           <td>
            Advantages
           </td>
           <td>
            Disadvantages
           </td>
          </tr>
          <tr class="row-even">
           <td>
            Python’s html.parser
           </td>
           <td>
            <code class="docutils literal">
             <span class="pre">
              BeautifulSoup(markup,
             </span>
             <span class="pre">
              "html.parser")
             </span>
            </code>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Batteries included
             </li>
             <li>
              Decent speed
             </li>
             <li>
              Lenient (as of Python 2.7.3
and 3.2.2)
             </li>
            </ul>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Not very lenient
(before Python 2.7.3
or 3.2.2)
             </li>
            </ul>
           </td>
          </tr>
          <tr class="row-odd">
           <td>
            lxml’s HTML parser
           </td>
           <td>
            <code class="docutils literal">
             <span class="pre">
              BeautifulSoup(markup,
             </span>
             <span class="pre">
              "lxml")
             </span>
            </code>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Very fast
             </li>
             <li>
              Lenient
             </li>
            </ul>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              External C dependency
             </li>
            </ul>
           </td>
          </tr>
          <tr class="row-even">
           <td>
            lxml’s XML parser
           </td>
           <td>
            <code class="docutils literal">
             <span class="pre">
              BeautifulSoup(markup,
             </span>
             <span class="pre">
              "lxml-xml")
             </span>
            </code>
            <code class="docutils literal">
             <span class="pre">
              BeautifulSoup(markup,
             </span>
             <span class="pre">
              "xml")
             </span>
            </code>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Very fast
             </li>
             <li>
              The only currently supported
XML parser
             </li>
            </ul>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              External C dependency
             </li>
            </ul>
           </td>
          </tr>
          <tr class="row-odd">
           <td>
            html5lib
           </td>
           <td>
            <code class="docutils literal">
             <span class="pre">
              BeautifulSoup(markup,
             </span>
             <span class="pre">
              "html5lib")
             </span>
            </code>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Extremely lenient
             </li>
             <li>
              Parses pages the same way a
web browser does
             </li>
             <li>
              Creates valid HTML5
             </li>
            </ul>
           </td>
           <td>
            <ul class="first last simple">
             <li>
              Very slow
             </li>
             <li>
              External Python
dependency
             </li>
            </ul>
           </td>
          </tr>
         </tbody>
        </table>
        <p>
         If you can, I recommend you install and use lxml for speed. If you’re
using a version of Python 2 earlier than 2.7.3, or a version of Python
3 earlier than 3.2.2, it’s
         <cite>
          essential
         </cite>
         that you install lxml or
html5lib, because Python’s built-in HTML parser is just not very good in older
versions.
        </p>
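         <p>
          A sketch of applying that recommendation in code, assuming Python 3.4 or later. The helper
          <code class="docutils literal"><span class="pre">pick_parser()</span></code>
          is hypothetical and not part of Beautiful Soup. It only checks which parser modules can be imported and returns one of the parser names from the table above:
         </p>

```python
import importlib.util


def pick_parser():
    """Return a parser name to pass to BeautifulSoup, preferring lxml."""
    # The preference order is an assumption taken from the recommendation above:
    # lxml first for speed, then html5lib, then the built-in parser.
    for module_name, parser_name in (("lxml", "lxml"), ("html5lib", "html5lib")):
        if importlib.util.find_spec(module_name) is not None:
            return parser_name
    # The standard-library parser is always available.
    return "html.parser"


parser_name = pick_parser()
print(parser_name)
```

         <p>
          The returned name can then be passed as the second argument to the constructor, for example
          <code class="docutils literal"><span class="pre">BeautifulSoup(markup, pick_parser())</span></code>
          .
         </p>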
        <p>
         Note that if a document is invalid, different parsers will generate
different Beautiful Soup trees for it. See
         <a class="reference internal" href="#differences-between-parsers">
          Differences
between parsers
         </a>
         for details.
        </p>
       </div>
      </div>
      <div class="section" id="making-the-soup">
       <h1>
        Making the soup
        <a class="headerlink" href="#making-the-soup" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        To parse a document, pass it into the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        constructor. You can pass in a string or an open filehandle:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"index.html"</span><span class="p">)</span> <span class="k">as</span> <span class="n">fp</span><span class="p">:</span>
    <span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span>

<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;html&gt;data&lt;/html&gt;"</span><span class="p">)</span>
</pre>
        </div>
       </div>
       <p>
        First, the document is converted to Unicode, and HTML entities are
converted to Unicode characters:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre>BeautifulSoup("Sacr&amp;eacute; bleu!")
# &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;Sacré bleu!&lt;/body&gt;&lt;/html&gt;
</pre>
        </div>
       </div>
       <p>
        Beautiful Soup then parses the document using the best available
parser. It will use an HTML parser unless you specifically tell it to
use an XML parser. (See
        <a class="reference internal" href="#id16">
         Parsing XML
        </a>
        .)
       </p>
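       <p>
        Because the best available parser depends on what is installed, naming one explicitly keeps results the same from machine to machine. The sketch below assumes Beautiful Soup 4 is installed and uses the built-in
        <code class="docutils literal"><span class="pre">html.parser</span></code>
        . The variable names are illustrative, and
        <code class="docutils literal"><span class="pre">from_encoding</span></code>
        is used here to state the encoding of the byte string rather than rely on detection:
       </p>

```python
from bs4 import BeautifulSoup

# Naming the parser explicitly avoids depending on which parsers are installed.
soup = BeautifulSoup("Sacr&eacute; bleu!", "html.parser")
text = soup.get_text()
print(text)
# Sacré bleu!

# Byte strings are decoded to Unicode; from_encoding states the encoding up front.
byte_soup = BeautifulSoup(b"<p>caf\xc3\xa9</p>", "html.parser", from_encoding="utf-8")
print(byte_soup.p.string)
# café
```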
      </div>
      <div class="section" id="kinds-of-objects">
       <h1>
        Kinds of objects
        <a class="headerlink" href="#kinds-of-objects" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Beautiful Soup transforms a complex HTML document into a complex tree
of Python objects. But you’ll only ever have to deal with about four
        <cite>
         kinds
        </cite>
        of objects:
        <code class="docutils literal">
         <span class="pre">
          Tag
         </span>
        </code>
        ,
        <code class="docutils literal">
         <span class="pre">
          NavigableString
         </span>
        </code>
        ,
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        ,
and
        <code class="docutils literal">
         <span class="pre">
          Comment
         </span>
        </code>
        .
       </p>
       <div class="section" id="tag">
        <span id="id4">
        </span>
        <h2>
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         <a class="headerlink" href="#tag" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         A
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         object corresponds to an XML or HTML tag in the original document:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;b id="boldest"&gt;Extremely bold&lt;/b&gt;'</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">b</span>
<span class="nb">type</span><span class="p">(</span><span class="n">tag</span><span class="p">)</span>
<span class="c1"># &lt;class 'bs4.element.Tag'&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Tags have a lot of attributes and methods, and I’ll cover most of them
in
         <a class="reference internal" href="#navigating-the-tree">
          Navigating the tree
         </a>
         and
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         . For now, the most
important features of a tag are its name and attributes.
        </p>
        <div class="section" id="name">
         <h3>
          Name
          <a class="headerlink" href="#name" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          Every tag has a name, accessible as
          <code class="docutils literal">
           <span class="pre">
            .name
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">tag</span><span class="o">.</span><span class="n">name</span>
<span class="c1"># u'b'</span>
</pre>
          </div>
         </div>
         <p>
          If you change a tag’s name, the change will be reflected in any HTML
markup generated by Beautiful Soup:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">tag</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s2">"blockquote"</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote id="boldest"&gt;Extremely bold&lt;/blockquote&gt;</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="attributes">
         <h3>
          Attributes
          <a class="headerlink" href="#attributes" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          A tag may have any number of attributes. The tag
          <code class="docutils literal">
           <span class="pre">
            &lt;b
           </span>
           <span class="pre">
            id="boldest"&gt;
           </span>
          </code>
          has an attribute “id” whose value is
“boldest”. You can access a tag’s attributes by treating the tag like
a dictionary:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span>
<span class="c1"># u'boldest'</span>
</pre>
          </div>
         </div>
         <p>
          You can access that dictionary directly as
          <code class="docutils literal">
           <span class="pre">
            .attrs
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">tag</span><span class="o">.</span><span class="n">attrs</span>
<span class="c1"># {u'id': u'boldest'}</span>
</pre>
          </div>
         </div>
         <p>
          You can add, remove, and modify a tag’s attributes. Again, this is
done by treating the tag as a dictionary:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'verybold'</span>
<span class="n">tag</span><span class="p">[</span><span class="s1">'another-attribute'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote another-attribute="1" id="verybold"&gt;Extremely bold&lt;/blockquote&gt;</span>

<span class="k">del</span> <span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span>
<span class="k">del</span> <span class="n">tag</span><span class="p">[</span><span class="s1">'another-attribute'</span><span class="p">]</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote&gt;Extremely bold&lt;/blockquote&gt;</span>

<span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span>
<span class="c1"># KeyError: 'id'</span>
<span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
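         <p>
          If you are not sure whether an attribute is present, you can check before reading it. The following is a minimal sketch, assuming Python 3 and the built-in html.parser; the markup and the variable names are made up for illustration.
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only for this illustration.
soup = BeautifulSoup('<b id="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

# has_attr() reports whether the attribute is present at all.
has_id = tag.has_attr('id')
has_title = tag.has_attr('title')

# get() returns None, or a fallback you supply, instead of raising KeyError.
missing = tag.get('title')
fallback = tag.get('title', 'untitled')

print(has_id, has_title, missing, fallback)
```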
         <div class="section" id="multi-valued-attributes">
          <span id="multivalue">
          </span>
          <h4>
           Multi-valued attributes
           <a class="headerlink" href="#multi-valued-attributes" title="Permalink to this headline">
            ¶
           </a>
          </h4>
          <p>
           HTML 4 defines a few attributes that can have multiple values. HTML 5
removes a couple of them, but defines a few more. The most common
multi-valued attribute is
           <code class="docutils literal">
            <span class="pre">
             class
            </span>
           </code>
           (that is, a tag can have more than
one CSS class). Others include
           <code class="docutils literal">
            <span class="pre">
             rel
            </span>
           </code>
           ,
           <code class="docutils literal">
            <span class="pre">
             rev
            </span>
           </code>
           ,
           <code class="docutils literal">
            <span class="pre">
             accept-charset
            </span>
           </code>
           ,
           <code class="docutils literal">
            <span class="pre">
             headers
            </span>
           </code>
           , and
           <code class="docutils literal">
            <span class="pre">
             accesskey
            </span>
           </code>
           . Beautiful Soup presents the value(s)
of a multi-valued attribute as a list:
          </p>
          <div class="highlight-python">
           <div class="highlight">
            <pre><span class="n">css_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p class="body"&gt;&lt;/p&gt;'</span><span class="p">)</span>
<span class="n">css_soup</span><span class="o">.</span><span class="n">p</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span>
<span class="c1"># ["body"]</span>

<span class="n">css_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p class="body strikeout"&gt;&lt;/p&gt;'</span><span class="p">)</span>
<span class="n">css_soup</span><span class="o">.</span><span class="n">p</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span>
<span class="c1"># ["body", "strikeout"]</span>
</pre>
           </div>
          </div>
          <p>
           If an attribute
           <cite>
            looks
           </cite>
           like it has more than one value, but it’s not
a multi-valued attribute as defined by any version of the HTML
standard, Beautiful Soup will leave the attribute alone:
          </p>
          <div class="highlight-python">
           <div class="highlight">
            <pre><span class="n">id_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p id="my id"&gt;&lt;/p&gt;'</span><span class="p">)</span>
<span class="n">id_soup</span><span class="o">.</span><span class="n">p</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span>
<span class="c1"># 'my id'</span>
</pre>
           </div>
          </div>
          <p>
           When you turn a tag back into a string, multiple attribute values are
consolidated:
          </p>
          <div class="highlight-python">
           <div class="highlight">
            <pre><span class="n">rel_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p&gt;Back to the &lt;a rel="index"&gt;homepage&lt;/a&gt;&lt;/p&gt;'</span><span class="p">)</span>
<span class="n">rel_soup</span><span class="o">.</span><span class="n">a</span><span class="p">[</span><span class="s1">'rel'</span><span class="p">]</span>
<span class="c1"># ['index']</span>
<span class="n">rel_soup</span><span class="o">.</span><span class="n">a</span><span class="p">[</span><span class="s1">'rel'</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'index'</span><span class="p">,</span> <span class="s1">'contents'</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">rel_soup</span><span class="o">.</span><span class="n">p</span><span class="p">)</span>
<span class="c1"># &lt;p&gt;Back to the &lt;a rel="index contents"&gt;homepage&lt;/a&gt;&lt;/p&gt;</span>
</pre>
           </div>
          </div>
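          <p>
           Because the value is a list, you can also change it with ordinary list operations. This is a small sketch, assuming Python 3 and the built-in html.parser; the class name "highlight" is invented for the example.
          </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')

# The multi-valued attribute comes back as a list of class names.
classes = soup.p['class']

# Appending to that list changes the attribute stored on the tag.
classes.append('highlight')

# When the tag is turned back into a string, the values are joined with spaces.
rendered = str(soup.p)
print(rendered)
```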
          <p>
           You can use
           <code class="docutils literal">
            <span class="pre">
             get_attribute_list
            </span>
           </code>
           to get a value that’s always a list,
whether or not it’s a multi-valued attribute:
          </p>
          <div class="highlight-python">
           <div class="highlight">
            <pre>id_soup.p.get_attribute_list('id')
# ["my id"]
</pre>
           </div>
          </div>
          <p>
           If you parse a document as XML, there are no multi-valued attributes:
          </p>
          <div class="highlight-python">
           <div class="highlight">
            <pre><span class="n">xml_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p class="body strikeout"&gt;&lt;/p&gt;'</span><span class="p">,</span> <span class="s1">'xml'</span><span class="p">)</span>
<span class="n">xml_soup</span><span class="o">.</span><span class="n">p</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span>
<span class="c1"># u'body strikeout'</span>
</pre>
           </div>
          </div>
         </div>
        </div>
       </div>
       <div class="section" id="navigablestring">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         <a class="headerlink" href="#navigablestring" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         A string corresponds to a bit of text within a tag. Beautiful Soup
uses the
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         class to contain these bits of text:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">tag</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u'Extremely bold'</span>
<span class="nb">type</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">string</span><span class="p">)</span>
<span class="c1"># &lt;class 'bs4.element.NavigableString'&gt;</span>
</pre>
         </div>
        </div>
        <p>
         A
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         is just like a Python Unicode string, except
that it also supports some of the features described in
         <a class="reference internal" href="#navigating-the-tree">
          Navigating
the tree
         </a>
         and
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         . You can convert a
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
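         to a Unicode string with
         <code class="docutils literal">
          <span class="pre">
           unicode()
          </span>
         </code>
         (on Python 3, the built-in
         <code class="docutils literal">
          <span class="pre">
           str()
          </span>
         </code>
         plays the same role):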
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">unicode_string</span> <span class="o">=</span> <span class="nb">unicode</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">string</span><span class="p">)</span>
<span class="n">unicode_string</span>
<span class="c1"># u'Extremely bold'</span>
<span class="nb">type</span><span class="p">(</span><span class="n">unicode_string</span><span class="p">)</span>
<span class="c1"># &lt;type 'unicode'&gt;</span>
</pre>
         </div>
        </div>
        <p>
         You can’t edit a string in place, but you can replace one string with
another, using
         <a class="reference internal" href="#replace-with">
          <span>
           replace_with()
          </span>
         </a>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">tag</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">replace_with</span><span class="p">(</span><span class="s2">"No longer bold"</span><span class="p">)</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote&gt;No longer bold&lt;/blockquote&gt;</span>
</pre>
         </div>
        </div>
        <p>
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         supports most of the features described in
         <a class="reference internal" href="#navigating-the-tree">
          Navigating the tree
         </a>
         and
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         , but not all of
them. In particular, since a string can’t contain anything (the way a
tag may contain a string or another tag), strings don’t support the
         <code class="docutils literal">
          <span class="pre">
           .contents
          </span>
         </code>
         or
         <code class="docutils literal">
          <span class="pre">
           .string
          </span>
         </code>
         attributes, or the
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         method.
        </p>
        <p>
         If you want to use a
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         outside of Beautiful Soup,
you should call
         <code class="docutils literal">
          <span class="pre">
           unicode()
          </span>
         </code>
         on it to turn it into a normal Python
Unicode string. If you don’t, your string will carry around a
reference to the entire Beautiful Soup parse tree, even when you’re
done using Beautiful Soup. This is a big waste of memory.
        </p>
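        <p>
         Here is a rough sketch of that conversion, assuming Python 3, where the built-in str() takes the place of unicode(). The variable names are only illustrative.
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b>Extremely bold</b>', 'html.parser')

# A NavigableString still knows where it sits in the parse tree.
nav_string = soup.b.string
parent_name = nav_string.parent.name

# str() produces an ordinary string with no link back to the tree.
plain = str(nav_string)

print(type(nav_string).__name__, type(plain).__name__)
print(parent_name, hasattr(plain, 'parent'))
```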
       </div>
       <div class="section" id="beautifulsoup">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         <a class="headerlink" href="#beautifulsoup" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object itself represents the document as a
whole. For most purposes, you can treat it as a
         <a class="reference internal" href="#tag">
          <span>
           Tag
          </span>
         </a>
         object. This means it supports most of the methods described in
         <a class="reference internal" href="#navigating-the-tree">
          Navigating the tree
         </a>
         and
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         .
        </p>
        <p>
         Since the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object doesn’t correspond to an actual
HTML or XML tag, it has no name and no attributes. But sometimes it’s
useful to look at its
         <code class="docutils literal">
          <span class="pre">
           .name
          </span>
         </code>
         , so it’s been given the special
         <code class="docutils literal">
          <span class="pre">
           .name
          </span>
         </code>
         “[document]”:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">name</span>
<span class="c1"># u'[document]'</span>
</pre>
         </div>
        </div>
       </div>
       <div class="section" id="comments-and-other-special-strings">
        <h2>
         Comments and other special strings
         <a class="headerlink" href="#comments-and-other-special-strings" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         ,
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         , and
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         cover almost
everything you’ll see in an HTML or XML file, but there are a few
leftover bits. The only one you’ll probably ever need to worry about
is the comment:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s2">"&lt;b&gt;&lt;!--Hey, buddy. Want to buy a used parser?--&gt;&lt;/b&gt;"</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">string</span>
<span class="nb">type</span><span class="p">(</span><span class="n">comment</span><span class="p">)</span>
<span class="c1"># &lt;class 'bs4.element.Comment'&gt;</span>
</pre>
         </div>
        </div>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           Comment
          </span>
         </code>
         object is just a special type of
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">comment</span>
<span class="c1"># u'Hey, buddy. Want to buy a used parser?'</span>
</pre>
         </div>
        </div>
        <p>
         But when it appears as part of an HTML document, a
         <code class="docutils literal">
          <span class="pre">
           Comment
          </span>
         </code>
         is
displayed with special formatting:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;b&gt;</span>
<span class="c1">#  &lt;!--Hey, buddy. Want to buy a used parser?--&gt;</span>
<span class="c1"># &lt;/b&gt;</span>
</pre>
         </div>
        </div>
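        <p>
         Since a comment is a kind of string, you may want to tell comments apart from ordinary text. One way to do that is an isinstance() check, sketched here under the assumption of Python 3 and the built-in html.parser; the markup is invented for the example.
        </p>

```python
from bs4 import BeautifulSoup, Comment

markup = "<p>Visible text<!--hidden note--> and more</p>"
soup = BeautifulSoup(markup, 'html.parser')

# Comment is a subclass of NavigableString, so isinstance() separates the two.
comments = [str(node) for node in soup.p.contents
            if isinstance(node, Comment)]
text_only = [str(node) for node in soup.p.contents
             if not isinstance(node, Comment)]

print(comments)
print(text_only)
```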
        <p>
         Beautiful Soup defines classes for anything else that might show up in
an XML document:
         <code class="docutils literal">
          <span class="pre">
           CData
          </span>
         </code>
         ,
         <code class="docutils literal">
          <span class="pre">
           ProcessingInstruction
          </span>
         </code>
         ,
         <code class="docutils literal">
          <span class="pre">
           Declaration
          </span>
         </code>
         , and
         <code class="docutils literal">
          <span class="pre">
           Doctype
          </span>
         </code>
         . Just like
         <code class="docutils literal">
          <span class="pre">
           Comment
          </span>
         </code>
         , these classes
are subclasses of
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         that add something extra to the
string. Here’s an example that replaces the comment with a CDATA
block:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">CData</span>
<span class="n">cdata</span> <span class="o">=</span> <span class="n">CData</span><span class="p">(</span><span class="s2">"A CDATA block"</span><span class="p">)</span>
<span class="n">comment</span><span class="o">.</span><span class="n">replace_with</span><span class="p">(</span><span class="n">cdata</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;b&gt;</span>
<span class="c1">#  &lt;![CDATA[A CDATA block]]&gt;</span>
<span class="c1"># &lt;/b&gt;</span>
</pre>
         </div>
        </div>
       </div>
      </div>
      <div class="section" id="navigating-the-tree">
       <h1>
        Navigating the tree
        <a class="headerlink" href="#navigating-the-tree" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Here’s the “Three sisters” HTML document again:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">html_doc</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>
<span class="s2">&lt;body&gt;</span>
<span class="s2">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span>
<span class="s2">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="s2">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span>
<span class="s2">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span>
<span class="s2">and they lived at the bottom of a well.&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;...&lt;/p&gt;</span>
<span class="s2">"""</span>

<span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
</pre>
        </div>
       </div>
       <p>
        I’ll use this as an example to show you how to move from one part of
a document to another.
       </p>
       <div class="section" id="going-down">
        <h2>
         Going down
         <a class="headerlink" href="#going-down" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Tags may contain strings and other tags. These elements are the tag’s
         <cite>
          children
         </cite>
         . Beautiful Soup provides a lot of different attributes for
navigating and iterating over a tag’s children.
        </p>
        <p>
         Note that Beautiful Soup strings don’t support any of these
attributes, because a string can’t have children.
        </p>
        <div class="section" id="navigating-using-tag-names">
         <h3>
          Navigating using tag names
          <a class="headerlink" href="#navigating-using-tag-names" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          The simplest way to navigate the parse tree is to say the name of the
tag you want. If you want the &lt;head&gt; tag, just say
          <code class="docutils literal">
           <span class="pre">
            soup.head
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">head</span>
<span class="c1"># &lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>

<span class="n">soup</span><span class="o">.</span><span class="n">title</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
</pre>
          </div>
         </div>
         <p>
          You can use this trick again and again to zoom in on a certain part
of the parse tree. This code gets the first &lt;b&gt; tag beneath the &lt;body&gt; tag:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">b</span>
<span class="c1"># &lt;b&gt;The Dormouse's story&lt;/b&gt;</span>
</pre>
          </div>
         </div>
         <p>
          Using a tag name as an attribute will give you only the
          <cite>
           first
          </cite>
          tag by that
name:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>
</pre>
          </div>
         </div>
         <p>
          If you need to get
          <cite>
           all
          </cite>
          the &lt;a&gt; tags, or anything more complicated
than the first tag with a certain name, you’ll need to use one of the
methods described in
          <a class="reference internal" href="#searching-the-tree">
           Searching the tree
          </a>
          , such as
          <cite>
           find_all()
          </cite>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
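         <p>
          A common follow-up is to pull one attribute out of every matching tag. The sketch below assumes Python 3 and the built-in html.parser, and uses a shortened copy of the “Three sisters” markup so that it runs on its own.
         </p>

```python
from bs4 import BeautifulSoup

# Shortened copy of the "Three sisters" document, for illustration only.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Attribute-style access only ever reaches the first <a> tag.
first_link = soup.a['href']

# find_all() returns every <a> tag, so each href can be collected.
all_links = [a['href'] for a in soup.find_all('a')]

print(first_link)
print(all_links)
```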
        </div>
        <div class="section" id="contents-and-children">
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .contents
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .children
           </span>
          </code>
          <a class="headerlink" href="#contents-and-children" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          A tag’s children are available in a list called
          <code class="docutils literal">
           <span class="pre">
            .contents
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre>head_tag = soup.head
head_tag
# &lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;

head_tag.contents
# [&lt;title&gt;The Dormouse's story&lt;/title&gt;]

title_tag = head_tag.contents[0]
title_tag
# &lt;title&gt;The Dormouse's story&lt;/title&gt;
title_tag.contents
# [u"The Dormouse's story"]
</pre>
          </div>
         </div>
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          object itself has children. In this case, the
&lt;html&gt; tag is the child of the
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          object:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="nb">len</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">contents</span><span class="p">)</span>
<span class="c1"># 1</span>
<span class="n">soup</span><span class="o">.</span><span class="n">contents</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">name</span>
<span class="c1"># u'html'</span>
</pre>
          </div>
         </div>
         <p>
          A string does not have
          <code class="docutils literal">
           <span class="pre">
            .contents
           </span>
          </code>
          , because it can’t contain
anything:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">text</span> <span class="o">=</span> <span class="n">title_tag</span><span class="o">.</span><span class="n">contents</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">text</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># AttributeError: 'NavigableString' object has no attribute 'contents'</span>
</pre>
          </div>
         </div>
         <p>
          Instead of getting them as a list, you can iterate over a tag’s
children using the
          <code class="docutils literal">
           <span class="pre">
            .children
           </span>
          </code>
          generator:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">title_tag</span><span class="o">.</span><span class="n">children</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
<span class="c1"># The Dormouse's story</span>
</pre>
          </div>
         </div>
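         <p>
          As a rough sketch of the difference between the two attributes, the
          snippet below assumes Python 3 and the built-in html.parser. The
          &lt;ul&gt; markup and the variable names are made up for illustration
          and are not part of the "three sisters" document:
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only for this illustration.
markup = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(markup, "html.parser")
ul_tag = soup.ul

# .contents is a plain list, so it supports len() and indexing.
child_list = ul_tag.contents

# .children walks the same objects one at a time.
child_names = [child.name for child in ul_tag.children]
```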
        </div>
        <div class="section" id="descendants">
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .descendants
           </span>
          </code>
          <a class="headerlink" href="#descendants" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            .contents
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .children
           </span>
          </code>
          attributes only consider a tag’s
          <cite>
           direct
          </cite>
          children. For instance, the &lt;head&gt; tag has a single direct
child–the &lt;title&gt; tag:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">head_tag</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          But the &lt;title&gt; tag itself has a child: the string “The Dormouse’s
story”. There’s a sense in which that string is also a child of the
&lt;head&gt; tag. The
          <code class="docutils literal">
           <span class="pre">
            .descendants
           </span>
          </code>
          attribute lets you iterate over
          <cite>
           all
          </cite>
          of a tag’s children, recursively: its direct children, the children of
its direct children, and so on:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">head_tag</span><span class="o">.</span><span class="n">descendants</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
<span class="c1"># The Dormouse's story</span>
</pre>
          </div>
         </div>
         <p>
          The &lt;head&gt; tag has only one child, but it has two descendants: the
&lt;title&gt; tag and the &lt;title&gt; tag’s child. The
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          object
only has one direct child (the &lt;html&gt; tag), but it has a whole lot of
descendants:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">children</span><span class="p">))</span>
<span class="c1"># 1</span>
<span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">descendants</span><span class="p">))</span>
<span class="c1"># 25</span>
</pre>
          </div>
         </div>
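         <p>
          The same contrast can be seen on a much smaller tree. This is a
          minimal sketch assuming Python 3 and the built-in html.parser; the
          &lt;div&gt; markup and the variable names are invented for the example:
         </p>

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: one <p> holding a string and a nested <b> tag.
soup = BeautifulSoup("<div><p>Hello <b>world</b></p></div>", "html.parser")
div_tag = soup.div

# Only the <p> tag is a direct child of the <div> tag.
direct_children = list(div_tag.children)

# .descendants also reaches the strings and the nested <b> tag,
# in document order: <p>, 'Hello ', <b>, 'world'.
all_descendants = list(div_tag.descendants)

# Keep only the tags, dropping the strings.
descendant_tag_names = [d.name for d in all_descendants if isinstance(d, Tag)]
```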
        </div>
        <div class="section" id="string">
         <span id="id5">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          <a class="headerlink" href="#string" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If a tag has only one child, and that child is a
          <code class="docutils literal">
           <span class="pre">
            NavigableString
           </span>
          </code>
          ,
the child is made available as
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">title_tag</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u"The Dormouse's story"</span>
</pre>
          </div>
         </div>
         <p>
          If a tag’s only child is another tag, and
          <cite>
           that
          </cite>
          tag has a
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          , then the parent tag is considered to have the same
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          as its child:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">head_tag</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">head_tag</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u"The Dormouse's story"</span>
</pre>
          </div>
         </div>
         <p>
          If a tag contains more than one thing, then it’s not clear what
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          should refer to, so
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          is defined to be
          <code class="docutils literal">
           <span class="pre">
            None
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">string</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
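         <p>
          The three cases can be put side by side. The following is a sketch
          only, assuming Python 3 and the built-in html.parser, with two small
          made-up documents. It also uses
          <code class="docutils literal">
           <span class="pre">
            get_text()
           </span>
          </code>
          as one possible way to read the text when
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          is
          <code class="docutils literal">
           <span class="pre">
            None
           </span>
          </code>
          :
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical documents, made up for this illustration.
single = BeautifulSoup("<p><b>only text</b></p>", "html.parser")
mixed = BeautifulSoup("<p>some <b>bold</b> text</p>", "html.parser")

# The <p> tag's only child is a <b> tag that has a .string,
# so the <p> tag reports the same .string.
inherited = single.p.string

# This <p> tag has three children (string, tag, string),
# so .string is None.
ambiguous = mixed.p.string

# get_text() concatenates every string under the tag.
full_text = mixed.p.get_text()
```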
        </div>
        <div class="section" id="strings-and-stripped-strings">
         <span id="string-generators">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .strings
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .stripped_strings
           </span>
          </code>
          <a class="headerlink" href="#strings-and-stripped-strings" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If there’s more than one thing inside a tag, you can still look at
just the strings. Use the
          <code class="docutils literal">
           <span class="pre">
            .strings
           </span>
          </code>
          generator:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">string</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">strings</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">string</span><span class="p">))</span>
<span class="c1"># u"The Dormouse's story"</span>
<span class="c1"># u'\n\n'</span>
<span class="c1"># u"The Dormouse's story"</span>
<span class="c1"># u'\n\n'</span>
<span class="c1"># u'Once upon a time there were three little sisters; and their names were\n'</span>
<span class="c1"># u'Elsie'</span>
<span class="c1"># u',\n'</span>
<span class="c1"># u'Lacie'</span>
<span class="c1"># u' and\n'</span>
<span class="c1"># u'Tillie'</span>
<span class="c1"># u';\nand they lived at the bottom of a well.'</span>
<span class="c1"># u'\n\n'</span>
<span class="c1"># u'...'</span>
<span class="c1"># u'\n'</span>
</pre>
          </div>
         </div>
         <p>
          These strings tend to have a lot of extra whitespace, which you can
remove by using the
          <code class="docutils literal">
           <span class="pre">
            .stripped_strings
           </span>
          </code>
          generator instead:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">string</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">stripped_strings</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">string</span><span class="p">))</span>
<span class="c1"># u"The Dormouse's story"</span>
<span class="c1"># u"The Dormouse's story"</span>
<span class="c1"># u'Once upon a time there were three little sisters; and their names were'</span>
<span class="c1"># u'Elsie'</span>
<span class="c1"># u','</span>
<span class="c1"># u'Lacie'</span>
<span class="c1"># u'and'</span>
<span class="c1"># u'Tillie'</span>
<span class="c1"># u';\nand they lived at the bottom of a well.'</span>
<span class="c1"># u'...'</span>
</pre>
          </div>
         </div>
         <p>
          Here, strings consisting entirely of whitespace are ignored, and
whitespace at the beginning and end of strings is removed.
         </p>
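         <p>
          One common use of this generator is to rebuild readable text from
          messy markup. The snippet below is a minimal sketch, assuming Python 3
          and the built-in html.parser; the markup is a made-up fragment, and
          joining with a single space is just one reasonable choice:
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with newlines and indentation between the pieces.
markup = "<p>\n  Elsie,\n  <a>Lacie</a> and\n  <a>Tillie</a>\n</p>"
soup = BeautifulSoup(markup, "html.parser")

# .strings keeps every string exactly as it was parsed.
raw_strings = list(soup.p.strings)

# .stripped_strings trims each string and skips whitespace-only ones.
clean_strings = list(soup.p.stripped_strings)

# Join the cleaned pieces with single spaces.
sentence = " ".join(clean_strings)
```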
        </div>
       </div>
       <div class="section" id="going-up">
        <h2>
         Going up
         <a class="headerlink" href="#going-up" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Continuing the “family tree” analogy, every tag and every string has a
         <cite>
          parent
         </cite>
         : the tag that contains it.
        </p>
        <div class="section" id="parent">
         <span id="id6">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .parent
           </span>
          </code>
          <a class="headerlink" href="#parent" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You can access an element’s parent with the
          <code class="docutils literal">
           <span class="pre">
            .parent
           </span>
          </code>
          attribute. In
the example “three sisters” document, the &lt;head&gt; tag is the parent
of the &lt;title&gt; tag:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">title_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">title</span>
<span class="n">title_tag</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
<span class="n">title_tag</span><span class="o">.</span><span class="n">parent</span>
<span class="c1"># &lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>
</pre>
          </div>
         </div>
         <p>
          The title string itself has a parent: the &lt;title&gt; tag that contains
it:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">title_tag</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">parent</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
</pre>
          </div>
         </div>
         <p>
          The parent of a top-level tag like &lt;html&gt; is the
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          object
itself:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">html_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">html</span>
<span class="nb">type</span><span class="p">(</span><span class="n">html_tag</span><span class="o">.</span><span class="n">parent</span><span class="p">)</span>
<span class="c1"># &lt;class 'bs4.BeautifulSoup'&gt;</span>
</pre>
          </div>
         </div>
         <p>
          And the
          <code class="docutils literal">
           <span class="pre">
            .parent
           </span>
          </code>
          of a
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          object is defined as None:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">parent</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="parents">
         <span id="id7">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .parents
           </span>
          </code>
          <a class="headerlink" href="#parents" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You can iterate over all of an element’s parents with
          <code class="docutils literal">
           <span class="pre">
            .parents
           </span>
          </code>
          . This example uses
          <code class="docutils literal">
           <span class="pre">
            .parents
           </span>
          </code>
          to travel from an &lt;a&gt; tag
buried deep within the document, to the very top of the document:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>
<span class="k">for</span> <span class="n">parent</span> <span class="ow">in</span> <span class="n">link</span><span class="o">.</span><span class="n">parents</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">parent</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># p</span>
<span class="c1"># body</span>
<span class="c1"># html</span>
<span class="c1"># [document]</span>
</pre>
          </div>
         </div>
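         <p>
          Because
          <code class="docutils literal">
           <span class="pre">
            .parents
           </span>
          </code>
          is an iterator, it can feed a list comprehension directly. This is a
          small sketch assuming Python 3 and the built-in html.parser; the
          markup and the variable names are hypothetical:
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical document with one deeply nested <a> tag.
soup = BeautifulSoup(
    "<html><body><p><a>link</a></p></body></html>", "html.parser"
)

# Collect the name of every ancestor, nearest first. The last entry
# is the BeautifulSoup object, whose name is '[document]'.
ancestor_names = [parent.name for parent in soup.a.parents]

# Reverse the list to get a root-to-leaf path.
path = " > ".join(reversed(ancestor_names))
```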
        </div>
       </div>
       <div class="section" id="going-sideways">
        <h2>
         Going sideways
         <a class="headerlink" href="#going-sideways" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Consider a simple document like this:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">sibling_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;b&gt;text1&lt;/b&gt;&lt;c&gt;text2&lt;/c&gt;&lt;/a&gt;"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">sibling_soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;a&gt;</span>
<span class="c1">#    &lt;b&gt;</span>
<span class="c1">#     text1</span>
<span class="c1">#    &lt;/b&gt;</span>
<span class="c1">#    &lt;c&gt;</span>
<span class="c1">#     text2</span>
<span class="c1">#    &lt;/c&gt;</span>
<span class="c1">#   &lt;/a&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         The &lt;b&gt; tag and the &lt;c&gt; tag are at the same level: they’re both direct
children of the same tag. We call them
         <cite>
          siblings
         </cite>
         . When a document is
pretty-printed, siblings show up at the same indentation level. You
can also use this relationship in the code you write.
        </p>
        <div class="section" id="next-sibling-and-previous-sibling">
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .previous_sibling
           </span>
          </code>
          <a class="headerlink" href="#next-sibling-and-previous-sibling" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You can use
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .previous_sibling
           </span>
          </code>
          to navigate
between page elements that are on the same level of the parse tree:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">sibling_soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">next_sibling</span>
<span class="c1"># &lt;c&gt;text2&lt;/c&gt;</span>

<span class="n">sibling_soup</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">previous_sibling</span>
<span class="c1"># &lt;b&gt;text1&lt;/b&gt;</span>
</pre>
          </div>
         </div>
         <p>
          The &lt;b&gt; tag has a
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          , but no
          <code class="docutils literal">
           <span class="pre">
            .previous_sibling
           </span>
          </code>
          ,
because there’s nothing before the &lt;b&gt; tag
          <cite>
           on the same level of the
tree
          </cite>
          . For the same reason, the &lt;c&gt; tag has a
          <code class="docutils literal">
           <span class="pre">
            .previous_sibling
           </span>
          </code>
          but no
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">print</span><span class="p">(</span><span class="n">sibling_soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">previous_sibling</span><span class="p">)</span>
<span class="c1"># None</span>
<span class="k">print</span><span class="p">(</span><span class="n">sibling_soup</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">next_sibling</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
         <p>
          The strings “text1” and “text2” are
          <cite>
           not
          </cite>
          siblings, because they don’t
have the same parent:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">sibling_soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u'text1'</span>

<span class="k">print</span><span class="p">(</span><span class="n">sibling_soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">next_sibling</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
         <p>
          In real documents, the
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          or
          <code class="docutils literal">
           <span class="pre">
            .previous_sibling
           </span>
          </code>
          of a
tag will usually be a string containing whitespace. Going back to the
“three sisters” document:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre>&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;
&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt;
&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;
</pre>
          </div>
         </div>
         <p>
          You might think that the
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          of the first &lt;a&gt; tag would
be the second &lt;a&gt; tag. But actually, it’s a string: the comma and
newline that separate the first &lt;a&gt; tag from the second:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>

<span class="n">link</span><span class="o">.</span><span class="n">next_sibling</span>
<span class="c1"># u',\n'</span>
</pre>
          </div>
         </div>
         <p>
          The second &lt;a&gt; tag is actually the
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          of the comma:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">link</span><span class="o">.</span><span class="n">next_sibling</span><span class="o">.</span><span class="n">next_sibling</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;</span>
</pre>
          </div>
         </div>
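         <p>
          If you only care about tags, one way to step over the strings in
          between is a small loop. The helper below is hypothetical (it is not
          part of Beautiful Soup), and the sketch assumes Python 3 and the
          built-in html.parser:
         </p>

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def next_tag_sibling(element):
    """Hypothetical helper: return the next sibling that is a Tag, or None."""
    sibling = element.next_sibling
    # Skip over strings (such as ',\n') until a Tag or the end is reached.
    while sibling is not None and not isinstance(sibling, Tag):
        sibling = sibling.next_sibling
    return sibling


# Made-up fragment modeled on the first two links of the example.
soup = BeautifulSoup(
    '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a></p>', "html.parser"
)
first_link = soup.a

raw_sibling = first_link.next_sibling        # the string ',\n'
second_link = next_tag_sibling(first_link)   # the second <a> tag
after_last = next_tag_sibling(second_link)   # None: nothing follows it
```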
        </div>
        <div class="section" id="next-siblings-and-previous-siblings">
         <span id="sibling-generators">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .next_siblings
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .previous_siblings
           </span>
          </code>
          <a class="headerlink" href="#next-siblings-and-previous-siblings" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You can iterate over a tag’s siblings with
          <code class="docutils literal">
           <span class="pre">
            .next_siblings
           </span>
          </code>
          or
          <code class="docutils literal">
           <span class="pre">
            .previous_siblings
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">sibling</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">next_siblings</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">sibling</span><span class="p">))</span>
<span class="c1"># u',\n'</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;</span>
<span class="c1"># u' and\n'</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;</span>
<span class="c1"># u';\nand they lived at the bottom of a well.'</span>

<span class="k">for</span> <span class="n">sibling</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s2">"link3"</span><span class="p">)</span><span class="o">.</span><span class="n">previous_siblings</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">sibling</span><span class="p">))</span>
<span class="c1"># u' and\n'</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;</span>
<span class="c1"># u',\n'</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>
<span class="c1"># u'Once upon a time there were three little sisters; and their names were\n'</span>
</pre>
          </div>
         </div>
        </div>
       </div>
       <div class="section" id="going-back-and-forth">
        <h2>
         Going back and forth
         <a class="headerlink" href="#going-back-and-forth" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Take a look at the beginning of the “three sisters” document:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre>&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;
&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;
</pre>
         </div>
        </div>
        <p>
         An HTML parser takes this string of characters and turns it into a
series of events: “open an &lt;html&gt; tag”, “open a &lt;head&gt; tag”, “open a
&lt;title&gt; tag”, “add a string”, “close the &lt;title&gt; tag”, “open a &lt;p&gt;
tag”, and so on. Beautiful Soup offers tools for reconstructing the
initial parse of the document.
        </p>
        <div class="section" id="next-element-and-previous-element">
         <span id="element-generators">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .next_element
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .previous_element
           </span>
          </code>
          <a class="headerlink" href="#next-element-and-previous-element" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            .next_element
           </span>
          </code>
          attribute of a string or tag points to whatever
was parsed immediately afterwards. It might be the same as
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          , but it’s usually drastically different.
         </p>
         <p>
          Here’s the final &lt;a&gt; tag in the “three sisters” document. Its
          <code class="docutils literal">
           <span class="pre">
            .next_sibling
           </span>
          </code>
          is a string: the conclusion of the sentence that was
interrupted by the start of the &lt;a&gt; tag:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">last_a_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="s2">"link3"</span><span class="p">)</span>
<span class="n">last_a_tag</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;</span>

<span class="n">last_a_tag</span><span class="o">.</span><span class="n">next_sibling</span>
<span class="c1"># u';\nand they lived at the bottom of a well.'</span>
</pre>
          </div>
         </div>
         <p>
          But the
          <code class="docutils literal">
           <span class="pre">
            .next_element
           </span>
          </code>
          of that &lt;a&gt; tag, the thing that was parsed
immediately after the &lt;a&gt; tag, is
          <cite>
           not
          </cite>
          the rest of that sentence:
it’s the word “Tillie”:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">last_a_tag</span><span class="o">.</span><span class="n">next_element</span>
<span class="c1"># u'Tillie'</span>
</pre>
          </div>
         </div>
         <p>
          That’s because in the original markup, the word “Tillie” appeared
before that semicolon. The parser encountered an &lt;a&gt; tag, then the
word “Tillie”, then the closing &lt;/a&gt; tag, then the semicolon and rest of
the sentence. The semicolon is on the same level as the &lt;a&gt; tag, but the
word “Tillie” was encountered first.
         </p>
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            .previous_element
           </span>
          </code>
          attribute is the exact opposite of
          <code class="docutils literal">
           <span class="pre">
            .next_element
           </span>
          </code>
          . It points to whatever element was parsed
immediately before this one:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">last_a_tag</span><span class="o">.</span><span class="n">previous_element</span>
<span class="c1"># u' and\n'</span>
<span class="n">last_a_tag</span><span class="o">.</span><span class="n">previous_element</span><span class="o">.</span><span class="n">next_element</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="next-elements-and-previous-elements">
         <h3>
          <code class="docutils literal">
           <span class="pre">
            .next_elements
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .previous_elements
           </span>
          </code>
          <a class="headerlink" href="#next-elements-and-previous-elements" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You should get the idea by now. You can use these iterators to move
forward or backward in the document as it was parsed:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">last_a_tag</span><span class="o">.</span><span class="n">next_elements</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">element</span><span class="p">))</span>
<span class="c1"># u'Tillie'</span>
<span class="c1"># u';\nand they lived at the bottom of a well.'</span>
<span class="c1"># u'\n\n'</span>
<span class="c1"># &lt;p class="story"&gt;...&lt;/p&gt;</span>
<span class="c1"># u'...'</span>
<span class="c1"># u'\n'</span>
<span class="c1"># None</span>
</pre>
          </div>
         </div>
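         <p>
          Going backward works the same way. The following is a minimal sketch, not part of the original example: it uses a small made-up snippet (the names
          <code class="docutils literal">
           <span class="pre">
            markup
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            earlier_strings
           </span>
          </code>
          are assumptions for illustration) and collects only the strings that were parsed before a given tag:
         </p>

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup, much smaller than the "three sisters" document.
markup = '<p>One <b>two</b> three <a id="x">four</a></p>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.find('a')

# .previous_elements walks the document in reverse parse order.
# Keep only the text strings and skip the tags met along the way.
earlier_strings = [
    str(element)
    for element in a_tag.previous_elements
    if isinstance(element, NavigableString)
]
print(earlier_strings)
```

         <p>
          The strings come back nearest-first, because the iterator starts at the tag and moves toward the beginning of the document.
         </p>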
        </div>
       </div>
      </div>
      <div class="section" id="searching-the-tree">
       <h1>
        Searching the tree
        <a class="headerlink" href="#searching-the-tree" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Beautiful Soup defines a lot of methods for searching the parse tree,
but they’re all very similar. I’m going to spend a lot of time explaining
the two most popular methods:
        <code class="docutils literal">
         <span class="pre">
          find()
         </span>
        </code>
        and
        <code class="docutils literal">
         <span class="pre">
          find_all()
         </span>
        </code>
        . The other
methods take almost exactly the same arguments, so I’ll just cover
them briefly.
       </p>
       <p>
        Once again, I’ll be using the “three sisters” document as an example:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">html_doc</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>
<span class="s2">&lt;body&gt;</span>
<span class="s2">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span>
<span class="s2">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="s2">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span>
<span class="s2">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span>
<span class="s2">and they lived at the bottom of a well.&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;...&lt;/p&gt;</span>
<span class="s2">"""</span>

<span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
</pre>
        </div>
       </div>
       <p>
        By passing in a filter to a method like
        <code class="docutils literal">
         <span class="pre">
          find_all()
         </span>
        </code>
        , you can
zoom in on the parts of the document you’re interested in.
       </p>
       <div class="section" id="kinds-of-filters">
        <h2>
         Kinds of filters
         <a class="headerlink" href="#kinds-of-filters" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Before talking in detail about
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         and similar methods, I
want to show examples of different filters you can pass into these
methods. These filters show up again and again, throughout the
search API. You can use them to filter based on a tag’s name,
on its attributes, on the text of a string, or on some combination of
these.
        </p>
        <div class="section" id="a-string">
         <span id="id8">
         </span>
         <h3>
          A string
          <a class="headerlink" href="#a-string" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          The simplest filter is a string. Pass a string to a search method and
Beautiful Soup will perform a match against that exact string. This
code finds all the &lt;b&gt; tags in the document:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'b'</span><span class="p">)</span>
<span class="c1"># [&lt;b&gt;The Dormouse's story&lt;/b&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          If you pass in a byte string, Beautiful Soup will assume the string is
encoded as UTF-8. You can avoid this by passing in a Unicode string instead.
         </p>
        </div>
        <div class="section" id="a-regular-expression">
         <span id="id9">
         </span>
         <h3>
          A regular expression
          <a class="headerlink" href="#a-regular-expression" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If you pass in a regular expression object, Beautiful Soup will filter
against that regular expression using its
          <code class="docutils literal">
           <span class="pre">
            search()
           </span>
          </code>
          method. This code
finds all the tags whose names start with the letter “b”; in this
case, the &lt;body&gt; tag and the &lt;b&gt; tag:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="kn">import</span> <span class="nn">re</span>
<span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"^b"</span><span class="p">)):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># body</span>
<span class="c1"># b</span>
</pre>
          </div>
         </div>
         <p>
          This code finds all the tags whose names contain the letter ‘t’:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"t"</span><span class="p">)):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># html</span>
<span class="c1"># title</span>
</pre>
          </div>
         </div>
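         <p>
          Because the filter uses
          <code class="docutils literal">
           <span class="pre">
            search()
           </span>
          </code>
          rather than
          <code class="docutils literal">
           <span class="pre">
            match()
           </span>
          </code>
          , an unanchored pattern matches anywhere in the tag name. Here is a small sketch on a made-up document (the markup and variable names are assumptions, not taken from the example above) that contrasts the two cases:
         </p>

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup chosen so that several tag names contain a "t".
markup = '<html><head><title>T</title></head><body><table></table></body></html>'
soup = BeautifulSoup(markup, 'html.parser')

# Unanchored: "t" may appear anywhere in the tag name.
contains_t = [tag.name for tag in soup.find_all(re.compile("t"))]

# Anchored with ^: the tag name must begin with "t".
starts_with_t = [tag.name for tag in soup.find_all(re.compile("^t"))]

print(contains_t)
print(starts_with_t)
```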
        </div>
        <div class="section" id="a-list">
         <span id="id10">
         </span>
         <h3>
          A list
          <a class="headerlink" href="#a-list" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If you pass in a list, Beautiful Soup will allow a string match
against
          <cite>
           any
          </cite>
          item in that list. This code finds all the &lt;a&gt; tags
          <cite>
           and
          </cite>
          all the &lt;b&gt; tags:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span> <span class="s2">"b"</span><span class="p">])</span>
<span class="c1"># [&lt;b&gt;The Dormouse's story&lt;/b&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="true">
         <span id="the-value-true">
         </span>
         <h3>
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          <a class="headerlink" href="#true" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          The value
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          matches everything it can. This code finds
          <cite>
           all
          </cite>
          the tags in the document, but none of the text strings:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="bp">True</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># html</span>
<span class="c1"># head</span>
<span class="c1"># title</span>
<span class="c1"># body</span>
<span class="c1"># p</span>
<span class="c1"># b</span>
<span class="c1"># p</span>
<span class="c1"># a</span>
<span class="c1"># a</span>
<span class="c1"># a</span>
<span class="c1"># p</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="a-function">
         <h3>
          A function
          <a class="headerlink" href="#a-function" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If none of the other matches work for you, define a function that
takes an element as its only argument. The function should return
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          if the argument matches, and
          <code class="docutils literal">
           <span class="pre">
            False
           </span>
          </code>
          otherwise.
         </p>
         <p>
          Here’s a function that returns
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          if a tag defines the “class”
attribute but doesn’t define the “id” attribute:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">def</span> <span class="nf">has_class_but_no_id</span><span class="p">(</span><span class="n">tag</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">tag</span><span class="o">.</span><span class="n">has_attr</span><span class="p">(</span><span class="s1">'class'</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">tag</span><span class="o">.</span><span class="n">has_attr</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span>
</pre>
          </div>
         </div>
         <p>
          Pass this function into
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          and you’ll pick up all the &lt;p&gt;
tags:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">has_class_but_no_id</span><span class="p">)</span>
<span class="c1"># [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;,</span>
<span class="c1">#  &lt;p class="story"&gt;Once upon a time there were...&lt;/p&gt;,</span>
<span class="c1">#  &lt;p class="story"&gt;...&lt;/p&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          This function only picks up the &lt;p&gt; tags. It doesn’t pick up the &lt;a&gt;
tags, because those tags define both “class” and “id”. It doesn’t pick
up tags like &lt;html&gt; and &lt;title&gt;, because those tags don’t define
“class”.
         </p>
         <p>
          If you pass in a function to filter on a specific attribute like
          <code class="docutils literal">
           <span class="pre">
            href
           </span>
          </code>
          , the argument passed into the function will be the attribute
value, not the whole tag. Here’s a function that finds all
          <code class="docutils literal">
           <span class="pre">
            a
           </span>
          </code>
          tags
whose
          <code class="docutils literal">
           <span class="pre">
            href
           </span>
          </code>
          attribute
          <em>
           does not
          </em>
          match a regular expression:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">def</span> <span class="nf">not_lacie</span><span class="p">(</span><span class="n">href</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">href</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"lacie"</span><span class="p">)</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="n">not_lacie</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          The function can be as complicated as you need it to be. Here’s a
function that returns
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          if a tag is surrounded by string
objects:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">NavigableString</span>
<span class="k">def</span> <span class="nf">surrounded_by_strings</span><span class="p">(</span><span class="n">tag</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">next_element</span><span class="p">,</span> <span class="n">NavigableString</span><span class="p">)</span>
            <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">previous_element</span><span class="p">,</span> <span class="n">NavigableString</span><span class="p">))</span>

<span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">surrounded_by_strings</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># p</span>
<span class="c1"># a</span>
<span class="c1"># a</span>
<span class="c1"># a</span>
<span class="c1"># p</span>
</pre>
          </div>
         </div>
         <p>
          Now we’re ready to look at the search methods in detail.
         </p>
        </div>
       </div>
       <div class="section" id="find-all">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         <a class="headerlink" href="#find-all" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_all(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#recursive">
          <span>
           recursive
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         method looks through a tag’s descendants and
retrieves
         <cite>
          all
         </cite>
         descendants that match your filters. I gave several
examples in
         <a class="reference internal" href="#kinds-of-filters">
          Kinds of filters
         </a>
         , but here are a few more:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s2">"link2"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>

<span class="kn">import</span> <span class="nn">re</span>
<span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"sisters"</span><span class="p">))</span>
<span class="c1"># u'Once upon a time there were three little sisters; and their names were\n'</span>
</pre>
         </div>
        </div>
        <p>
         Some of these should look familiar, but others are new. What does it
mean to pass in a value for
         <code class="docutils literal">
          <span class="pre">
           string
          </span>
         </code>
         , or
         <code class="docutils literal">
          <span class="pre">
           id
          </span>
         </code>
         ? Why does
         <code class="docutils literal">
          <span class="pre">
           find_all("p",
          </span>
          <span class="pre">
           "title")
          </span>
         </code>
         find a &lt;p&gt; tag with the CSS class “title”?
Let’s look at the arguments to
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         .
        </p>
        <div class="section" id="the-name-argument">
         <span id="id11">
         </span>
         <h3>
          The
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          argument
          <a class="headerlink" href="#the-name-argument" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          Pass in a value for
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          and you’ll tell Beautiful Soup to only
consider tags with certain names. Text strings will be ignored, as
will tags whose names don’t match.
         </p>
         <p>
          This is the simplest usage:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          Recall from
          <a class="reference internal" href="#kinds-of-filters">
           Kinds of filters
          </a>
          that the value to
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          can be
          <a class="reference internal" href="#a-string">
           a
string
          </a>
          ,
          <a class="reference internal" href="#a-regular-expression">
           a regular expression
          </a>
          ,
          <a class="reference internal" href="#a-list">
           a list
          </a>
          ,
          <a class="reference internal" href="#a-function">
           a function
          </a>
          , or
          <a class="reference internal" href="#the-value-true">
           the value
True
          </a>
          .
         </p>
        </div>
        <div class="section" id="the-keyword-arguments">
         <span id="kwargs">
         </span>
         <h3>
          The keyword arguments
          <a class="headerlink" href="#the-keyword-arguments" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          Any argument that’s not recognized will be turned into a filter on one
of a tag’s attributes. If you pass in a value for an argument called
          <code class="docutils literal">
           <span class="pre">
            id
           </span>
          </code>
          ,
Beautiful Soup will filter against each tag’s ‘id’ attribute:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s1">'link2'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          If you pass in a value for
          <code class="docutils literal">
           <span class="pre">
            href
           </span>
          </code>
          , Beautiful Soup will filter
against each tag’s ‘href’ attribute:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"elsie"</span><span class="p">))</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          You can filter an attribute based on
          <a class="reference internal" href="#a-string">
           a string
          </a>
          ,
          <a class="reference internal" href="#a-regular-expression">
           a regular
expression
          </a>
          ,
          <a class="reference internal" href="#a-list">
           a list
          </a>
          ,
          <a class="reference internal" href="#a-function">
           a function
          </a>
          , or
          <a class="reference internal" href="#the-value-true">
           the value True
          </a>
          .
         </p>
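         <p>
          For instance, a list lets you accept any one of several attribute values. This is only an illustrative sketch on a shortened, made-up version of the sisters markup; the variable names are assumptions:
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of the three <a> tags.
markup = ('<a id="link1">Elsie</a>'
          '<a id="link2">Lacie</a>'
          '<a id="link3">Tillie</a>')
soup = BeautifulSoup(markup, 'html.parser')

# A tag matches if its id equals any item in the list.
matches = soup.find_all(id=["link1", "link3"])
names = [tag.get_text() for tag in matches]
print(names)
```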
         <p>
          This code finds all tags whose
          <code class="docutils literal">
           <span class="pre">
            id
           </span>
          </code>
          attribute has a value,
regardless of what the value is:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          You can filter multiple attributes at once by passing in more than one
keyword argument:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"elsie"</span><span class="p">),</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'link1'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;three&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          Some attributes, like the data-* attributes in HTML 5, have names that
can’t be used as the names of keyword arguments:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">data_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;div data-foo="value"&gt;foo!&lt;/div&gt;'</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
<span class="n">data_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">data</span><span class="o">-</span><span class="n">foo</span><span class="o">=</span><span class="s2">"value"</span><span class="p">)</span>
<span class="c1"># SyntaxError: keyword can't be an expression</span>
</pre>
          </div>
         </div>
         <p>
          You can use these attributes in searches by putting them into a
dictionary and passing the dictionary into
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          as the
          <code class="docutils literal">
           <span class="pre">
            attrs
           </span>
          </code>
          argument:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">data_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">attrs</span><span class="o">=</span><span class="p">{</span><span class="s2">"data-foo"</span><span class="p">:</span> <span class="s2">"value"</span><span class="p">})</span>
<span class="c1"># [&lt;div data-foo="value"&gt;foo!&lt;/div&gt;]</span>
</pre>
          </div>
         </div>
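         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            attrs
           </span>
          </code>
          dictionary can be combined with a tag name. The sketch below uses made-up markup (the
          <code class="docutils literal">
           <span class="pre">
            data-foo
           </span>
          </code>
          values and the
          <code class="docutils literal">
           <span class="pre">
            id
           </span>
          </code>
          values are assumptions for illustration) to pick out one of two &lt;div&gt; tags:
         </p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <div> tags that differ in data-foo.
markup = ('<div data-foo="value" id="a">one</div>'
          '<div data-foo="other" id="b">two</div>')
data_soup = BeautifulSoup(markup, 'html.parser')

# "data-foo" is not a valid Python identifier, so it goes in attrs.
found = data_soup.find_all("div", attrs={"data-foo": "value"})
found_ids = [tag["id"] for tag in found]
print(found_ids)
```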
         <p>
          You can’t use a keyword argument to search for HTML’s ‘name’ attribute,
because Beautiful Soup uses the
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          argument to contain the name
of the tag itself. Instead, you can give a value to ‘name’ in the
          <code class="docutils literal">
           <span class="pre">
            attrs
           </span>
          </code>
          argument.
         </p>
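         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">name_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;input name="email"/&gt;'</span><span class="p">)</span>
<span class="n">name_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"email"</span><span class="p">)</span>
<span class="c1"># []</span>
<span class="n">name_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">attrs</span><span class="o">=</span><span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="s2">"email"</span><span class="p">})</span>
<span class="c1"># [&lt;input name="email"/&gt;]</span>
</pre>
          </div>
         </div>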
        </div>
        <div class="section" id="searching-by-css-class">
         <span id="attrs">
         </span>
         <h3>
          Searching by CSS class
          <a class="headerlink" href="#searching-by-css-class" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          It’s very useful to search for a tag that has a certain CSS class, but
the name of the HTML attribute, “class”, is a reserved word in
Python. Using
          <code class="docutils literal">
           <span class="pre">
            class
           </span>
          </code>
          as a keyword argument will give you a syntax
error. As of Beautiful Soup 4.1.2, you can search by CSS class using
the keyword argument
          <code class="docutils literal">
           <span class="pre">
            class_
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s2">"sister"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          As with any keyword argument, you can pass
          <code class="docutils literal">
           <span class="pre">
            class_
           </span>
          </code>
          a string, a regular
expression, a function, or
          <code class="docutils literal">
           <span class="pre">
            True
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">class_</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"itl"</span><span class="p">))</span>
<span class="c1"># [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;]</span>

<span class="k">def</span> <span class="nf">has_six_characters</span><span class="p">(</span><span class="n">css_class</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">css_class</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">css_class</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">class_</span><span class="o">=</span><span class="n">has_six_characters</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          <a class="reference internal" href="#multivalue">
           <span>
            Remember
           </span>
          </a>
          that a single tag can have multiple
values for its “class” attribute. When you search for a tag that
matches a certain CSS class, you’re matching against
          <cite>
           any
          </cite>
          of its CSS
classes:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">css_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;p class="body strikeout"&gt;&lt;/p&gt;'</span><span class="p">)</span>
<span class="n">css_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s2">"strikeout"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="body strikeout"&gt;&lt;/p&gt;]</span>

<span class="n">css_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s2">"body"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="body strikeout"&gt;&lt;/p&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          You can also search for the exact string value of the
          <code class="docutils literal">
           <span class="pre">
            class
           </span>
          </code>
          attribute:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">css_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s2">"body strikeout"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="body strikeout"&gt;&lt;/p&gt;]</span>
</pre>
          </div>
         </div>
         <p>
          But searching for variants of the string value, such as the same classes in a different order, won’t work:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">css_soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s2">"strikeout body"</span><span class="p">)</span>
<span class="c1"># []</span>
</pre>
          </div>
         </div>
         <p>
          If you want to search for tags that match two or more CSS classes, you
should use a CSS selector:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">css_soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"p.strikeout.body"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="body strikeout"&gt;&lt;/p&gt;]</span>
</pre>
          </div>
         </div>
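         <p>
          When a CSS selector is not convenient, an order-independent match can also be written as a filter function. The following is only a sketch: the helper name <code>has_both_classes</code> is hypothetical, and passing <code>"html.parser"</code> explicitly is an assumption made to keep the example self-contained.
         </p>

```python
from bs4 import BeautifulSoup

# Assumption: the built-in "html.parser" is used so no extra parser is needed.
css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")


def has_both_classes(tag):
    """Hypothetical helper: match <p> tags that carry both classes, in any order."""
    classes = tag.get("class") or []
    return tag.name == "p" and {"body", "strikeout"} <= set(classes)


# Passing a function as the first argument filters on whole tags.
by_function = css_soup.find_all(has_both_classes)

# The CSS selector from the text, for comparison.
by_selector = css_soup.select("p.strikeout.body")

print(by_function)
print(by_selector)
```

         <p>
          Both calls should find the same single &lt;p&gt; tag, regardless of the order in which the classes are written in the markup.
         </p>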
         <p>
          In older versions of Beautiful Soup, which don’t have the
          <code class="docutils literal">
           <span class="pre">
            class_
           </span>
          </code>
          shortcut, you can use the
          <code class="docutils literal">
           <span class="pre">
            attrs
           </span>
          </code>
          trick mentioned above. Create a
dictionary whose value for “class” is the string (or regular
expression, or whatever) you want to search for:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">attrs</span><span class="o">=</span><span class="p">{</span><span class="s2">"class"</span><span class="p">:</span> <span class="s2">"sister"</span><span class="p">})</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="the-string-argument">
         <span id="id12">
         </span>
         <h3>
          The
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          argument
          <a class="headerlink" href="#the-string-argument" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          With
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          you can search for strings instead of tags. As with
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          and the keyword arguments, you can pass in
          <a class="reference internal" href="#a-string">
           a string
          </a>
          ,
          <a class="reference internal" href="#a-regular-expression">
           a
regular expression
          </a>
          ,
          <a class="reference internal" href="#a-list">
           a list
          </a>
          ,
          <a class="reference internal" href="#a-function">
           a function
          </a>
          , or
          <a class="reference internal" href="#the-value-true">
           the value True
          </a>
          .
Here are some examples:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="s2">"Elsie"</span><span class="p">)</span>
<span class="c1"># [u'Elsie']</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="p">[</span><span class="s2">"Tillie"</span><span class="p">,</span> <span class="s2">"Elsie"</span><span class="p">,</span> <span class="s2">"Lacie"</span><span class="p">])</span>
<span class="c1"># [u'Elsie', u'Lacie', u'Tillie']</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"Dormouse"</span><span class="p">))</span>
<span class="c1"># [u"The Dormouse's story", u"The Dormouse's story"]</span>

<span class="k">def</span> <span class="nf">is_the_only_string_within_a_tag</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="sd">"""Return True if this string is the only child of its parent tag."""</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">s</span> <span class="o">==</span> <span class="n">s</span><span class="o">.</span><span class="n">parent</span><span class="o">.</span><span class="n">string</span><span class="p">)</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="n">is_the_only_string_within_a_tag</span><span class="p">)</span>
<span class="c1"># [u"The Dormouse's story", u"The Dormouse's story", u'Elsie', u'Lacie', u'Tillie', u'...']</span>
</pre>
          </div>
         </div>
         <p>
          Although
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          is for finding strings, you can combine it with
arguments that find tags: Beautiful Soup will find all tags whose
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          matches your value for
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          . This code finds the &lt;a&gt;
tags whose
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
          is “Elsie”:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">string</span><span class="o">=</span><span class="s2">"Elsie"</span><span class="p">)</span>
<span class="c1"># [&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
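         <p>
          To make the difference concrete, here is a small sketch on a made-up fragment (the markup and variable names are illustrative, not part of the “three sisters” document). It contrasts searching for strings alone with combining <code>string</code> and a tag name, and shows that a tag with several children has no single <code>.string</code> to match against.
         </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; "html.parser" is chosen only to avoid extra dependencies.
markup = '<p><a href="/elsie">Elsie</a> and <a href="/lacie">Lacie</a></p>'
soup = BeautifulSoup(markup, "html.parser")

# string alone: the results are strings, not tags.
strings_only = soup.find_all(string="Elsie")

# string plus a tag name: the results are tags whose .string matches.
links = soup.find_all("a", string="Elsie")

# The <p> tag has several children, so its .string is None and it cannot match.
paragraphs = soup.find_all("p", string="Elsie")

print(strings_only, links, paragraphs)
```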
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          argument is new in Beautiful Soup 4.4.0. In earlier
versions it was called
          <code class="docutils literal">
           <span class="pre">
            text
           </span>
          </code>
          :
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="s2">"Elsie"</span><span class="p">)</span>
<span class="c1"># [&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="the-limit-argument">
         <span id="limit">
         </span>
         <h3>
          The
          <code class="docutils literal">
           <span class="pre">
            limit
           </span>
          </code>
          argument
          <a class="headerlink" href="#the-limit-argument" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          returns all the tags and strings that match your
filters. This can take a while if the document is large. If you don’t
need
          <cite>
           all
          </cite>
          the results, you can pass in a number for
          <code class="docutils literal">
           <span class="pre">
            limit
           </span>
          </code>
          . This
works just like the LIMIT keyword in SQL. It tells Beautiful Soup to
stop gathering results after it’s found a certain number.
         </p>
         <p>
          There are three links in the “three sisters” document, but this code
only finds the first two:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>
</pre>
          </div>
         </div>
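         <p>
          A self-contained sketch of the same idea, using a made-up fragment with three links. It also shows that a <code>limit</code> larger than the number of matches simply returns every match.
         </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup with three links; "html.parser" is an assumption.
markup = (
    '<a id="link1">Elsie</a>'
    '<a id="link2">Lacie</a>'
    '<a id="link3">Tillie</a>'
)
soup = BeautifulSoup(markup, "html.parser")

# Stop after the first two matches.
first_two = soup.find_all("a", limit=2)

# A limit above the number of matches returns all of them.
all_links = soup.find_all("a", limit=10)

print([a["id"] for a in first_two])
print(len(all_links))
```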
        </div>
        <div class="section" id="the-recursive-argument">
         <span id="recursive">
         </span>
         <h3>
          The
          <code class="docutils literal">
           <span class="pre">
            recursive
           </span>
          </code>
          argument
          <a class="headerlink" href="#the-recursive-argument" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          If you call
          <code class="docutils literal">
           <span class="pre">
            mytag.find_all()
           </span>
          </code>
          , Beautiful Soup will examine all the
descendants of
          <code class="docutils literal">
           <span class="pre">
            mytag
           </span>
          </code>
          : its children, its children’s children, and
so on. If you only want Beautiful Soup to consider direct children,
you can pass in
          <code class="docutils literal">
           <span class="pre">
            recursive=False
           </span>
          </code>
          . See the difference here:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">soup</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"title"</span><span class="p">,</span> <span class="n">recursive</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="c1"># []</span>
</pre>
          </div>
         </div>
         <p>
          Here’s that part of the document:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre>&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;
   The Dormouse's story
  &lt;/title&gt;
 &lt;/head&gt;
...
</pre>
          </div>
         </div>
         <p>
          The &lt;title&gt; tag is beneath the &lt;html&gt; tag, but it’s not
          <cite>
           directly
          </cite>
          beneath the &lt;html&gt; tag: the &lt;head&gt; tag is in the way. Beautiful Soup
finds the &lt;title&gt; tag when it’s allowed to look at all descendants of
the &lt;html&gt; tag, but when
          <code class="docutils literal">
           <span class="pre">
            recursive=False
           </span>
          </code>
          restricts it to the
&lt;html&gt; tag’s immediate children, it finds nothing.
         </p>
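         <p>
          The same comparison can be run on a stand-alone fragment. This is a minimal sketch with illustrative markup; passing <code>True</code> as the name matches every tag, which makes it easy to list the direct children that <code>recursive=False</code> does consider.
         </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; "html.parser" is an assumption to keep this self-contained.
markup = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>text</p></body></html>"
)
soup = BeautifulSoup(markup, "html.parser")

# All descendants of <html> are searched.
all_titles = soup.html.find_all("title")

# Only direct children of <html> are searched, so <title> is not found.
direct_titles = soup.html.find_all("title", recursive=False)

# True matches any tag; with recursive=False this lists the direct child tags.
direct_children = soup.html.find_all(True, recursive=False)

print(all_titles, direct_titles, [tag.name for tag in direct_children])
```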
         <p>
          Beautiful Soup offers a lot of tree-searching methods (covered below),
and they mostly take the same arguments as
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          :
          <code class="docutils literal">
           <span class="pre">
            name
           </span>
          </code>
          ,
          <code class="docutils literal">
           <span class="pre">
            attrs
           </span>
          </code>
          ,
          <code class="docutils literal">
           <span class="pre">
            string
           </span>
          </code>
          ,
          <code class="docutils literal">
           <span class="pre">
            limit
           </span>
          </code>
          , and the keyword arguments. But the
          <code class="docutils literal">
           <span class="pre">
            recursive
           </span>
          </code>
          argument is different:
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
          are
the only methods that support it. Passing
          <code class="docutils literal">
           <span class="pre">
            recursive=False
           </span>
          </code>
          into a
method like
          <code class="docutils literal">
           <span class="pre">
            find_parents()
           </span>
          </code>
          wouldn’t be very useful.
         </p>
        </div>
       </div>
       <div class="section" id="calling-a-tag-is-like-calling-find-all">
        <h2>
         Calling a tag is like calling
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         <a class="headerlink" href="#calling-a-tag-is-like-calling-find-all" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Because
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         is the most popular method in the Beautiful
Soup search API, you can use a shortcut for it. If you treat the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object or a
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         object as though it were a
function, then it’s the same as calling
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         on that
object. These two lines of code are equivalent:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="n">soup</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
</pre>
         </div>
        </div>
        <p>
         These two lines are also equivalent:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">title</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</pre>
         </div>
        </div>
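        <p>
         As a quick check of this equivalence, the sketch below compares both spellings on a made-up fragment. Other arguments, such as <code>limit</code>, are passed through the shortcut in the same way.
        </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; "html.parser" is an assumption.
soup = BeautifulSoup(
    '<p><a id="link1">Elsie</a><a id="link2">Lacie</a></p>', "html.parser"
)

via_method = soup.find_all("a")
via_call = soup("a")  # calling the object is the same as find_all()

# The shortcut also works on a Tag and accepts the same keyword arguments.
nested_call = soup.p("a", limit=1)

print(via_method == via_call)
print(nested_call)
```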
       </div>
       <div class="section" id="find">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         <a class="headerlink" href="#find" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#recursive">
          <span>
           recursive
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         method scans the entire document looking for
results, but sometimes you only want to find one result. If you know a
document only has one &lt;body&gt; tag, it’s a waste of time to scan the
entire document looking for more. Rather than passing in
         <code class="docutils literal">
          <span class="pre">
           limit=1
          </span>
         </code>
         every time you call
         <code class="docutils literal">
          <span class="pre">
           find_all
          </span>
         </code>
         , you can use the
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         method. These two lines of code are
         <cite>
          nearly
         </cite>
         equivalent:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'title'</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'title'</span><span class="p">)</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
</pre>
         </div>
        </div>
        <p>
         The only difference is that
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         returns a list containing
the single result, and
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         just returns the result.
        </p>
        <p>
         If
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         can’t find anything, it returns an empty list. If
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         can’t find anything, it returns
         <code class="docutils literal">
          <span class="pre">
           None
          </span>
         </code>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"nosuchtag"</span><span class="p">))</span>
<span class="c1"># None</span>
</pre>
         </div>
        </div>
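        <p>
         Because <code>find()</code> can return <code>None</code>, code that goes on to use the result usually needs a guard. The following is one possible pattern, not a Beautiful Soup API: the helper <code>text_of_first</code> and its <code>default</code> parameter are hypothetical names introduced for this sketch.
        </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; "html.parser" is an assumption.
soup = BeautifulSoup(
    "<html><head><title>The Dormouse's story</title></head></html>",
    "html.parser",
)


def text_of_first(soup, name, default=None):
    """Hypothetical helper: text of the first tag called `name`, or `default`."""
    tag = soup.find(name)
    if tag is None:
        # Nothing matched; avoid calling methods on None.
        return default
    return tag.get_text()


title_text = text_of_first(soup, "title")
missing_text = text_of_first(soup, "nosuchtag", default="")

print(repr(title_text), repr(missing_text))
```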
        <p>
         Remember the
         <code class="docutils literal">
          <span class="pre">
           soup.head.title
          </span>
         </code>
         trick from
         <a class="reference internal" href="#navigating-using-tag-names">
          Navigating using tag
names
         </a>
         ? That trick works by repeatedly calling
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">head</span><span class="o">.</span><span class="n">title</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>

<span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"head"</span><span class="p">)</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
</pre>
         </div>
        </div>
       </div>
       <div class="section" id="find-parents-and-find-parent">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_parents()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_parent()
          </span>
         </code>
         <a class="headerlink" href="#find-parents-and-find-parent" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_parents(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         Signature: find_parent(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         I spent a lot of time above covering
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         . The Beautiful Soup API defines ten other methods for
searching the tree, but don’t be afraid. Five of these methods are
basically the same as
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         , and the other five are basically
the same as
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         . The only differences are in what parts of the
tree they search.
        </p>
        <p>
         First let’s consider
         <code class="docutils literal">
          <span class="pre">
           find_parents()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_parent()
          </span>
         </code>
         . Remember that
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         work
their way down the tree, looking at a tag’s descendants. These methods
do the opposite: they work their way
         <cite>
          up
         </cite>
         the tree, looking at a tag’s
(or a string’s) parents. Let’s try them out, starting from a string
buried deep in the “three sisters” document:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre>a_string = soup.find(string="Lacie")
a_string
# u'Lacie'

a_string.find_parents("a")
# [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]

a_string.find_parent("p")
# &lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were
#  &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt; and
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;;
#  and they lived at the bottom of a well.&lt;/p&gt;

a_string.find_parents("p", class_="title")
# []
</pre>
         </div>
        </div>
        <p>
         One of the three &lt;a&gt; tags is the direct parent of the string in
question, so our search finds it. One of the three &lt;p&gt; tags is an
indirect parent of the string, and our search finds that as
well. There’s a &lt;p&gt; tag with the CSS class “title”
         <cite>
          somewhere
         </cite>
         in the
document, but it’s not one of this string’s parents, so we can’t find
it with
         <code class="docutils literal">
          <span class="pre">
           find_parents()
          </span>
         </code>
         .
        </p>
        <p>
         You may have made the connection between
         <code class="docutils literal">
          <span class="pre">
           find_parent()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_parents()
          </span>
         </code>
         , and the
         <a class="reference internal" href="#parent">
          .parent
         </a>
         and
         <a class="reference internal" href="#parents">
          .parents
         </a>
         attributes
mentioned earlier. The connection is very strong. These search methods
actually use
         <code class="docutils literal">
          <span class="pre">
           .parents
          </span>
         </code>
         to iterate over all the parents, and check
each one against the provided filter to see if it matches.
        </p>
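        <p>
         As a rough sketch of that relationship (not the library's actual implementation), the hypothetical helper
         <code class="docutils literal"><span class="pre">parents_named()</span></code>
         below walks
         <code class="docutils literal"><span class="pre">.parents</span></code>
         by hand and keeps the tags with a given name. The markup and the helper name are made up for illustration, and the example assumes the built-in
         <code class="docutils literal"><span class="pre">html.parser</span></code>
         parser.
        </p>

```python
from bs4 import BeautifulSoup

# Small illustrative document (an assumption, not the full "three sisters" text).
markup = '<div><p class="story">Once upon a time <a id="link2">Lacie</a></p></div>'
soup = BeautifulSoup(markup, "html.parser")
a_string = soup.find(string="Lacie")


def parents_named(element, name):
    """Hypothetical helper: walk .parents and keep the tags with this name."""
    return [parent for parent in element.parents if parent.name == name]


# Both approaches visit the same ancestors, nearest first.
manual = parents_named(a_string, "p")
builtin = a_string.find_parents("p")
```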
       </div>
       <div class="section" id="find-next-siblings-and-find-next-sibling">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_next_siblings()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_next_sibling()
          </span>
         </code>
         <a class="headerlink" href="#find-next-siblings-and-find-next-sibling" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_next_siblings(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         Signature: find_next_sibling(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         These methods use
         <a class="reference internal" href="#sibling-generators">
          <span>
           .next_siblings
          </span>
         </a>
         to
iterate over the rest of an element’s siblings in the tree. The
         <code class="docutils literal">
          <span class="pre">
           find_next_siblings()
          </span>
         </code>
         method returns all the siblings that match,
and
         <code class="docutils literal">
          <span class="pre">
           find_next_sibling()
          </span>
         </code>
         only returns the first one:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">first_link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">first_link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>

<span class="n">first_link</span><span class="o">.</span><span class="n">find_next_siblings</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">first_story_paragraph</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="s2">"story"</span><span class="p">)</span>
<span class="n">first_story_paragraph</span><span class="o">.</span><span class="n">find_next_sibling</span><span class="p">(</span><span class="s2">"p"</span><span class="p">)</span>
<span class="c1"># &lt;p class="story"&gt;...&lt;/p&gt;</span>
</pre>
         </div>
        </div>
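        <p>
         Here's a small sketch of the
         <code class="docutils literal"><span class="pre">limit</span></code>
         argument, and of what comes back when nothing matches. The shortened markup is an assumption made for this example, and it is parsed with the built-in
         <code class="docutils literal"><span class="pre">html.parser</span></code>
         .
        </p>

```python
from bs4 import BeautifulSoup

# Shortened version of the story paragraph, for illustration only.
markup = (
    '<p class="story">'
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a> and '
    '<a class="sister" id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(markup, "html.parser")
first_link = soup.a

# limit stops the search after that many matches.
only_one = first_link.find_next_siblings("a", limit=1)

# find_next_sibling() returns the tag itself, or None when nothing matches.
next_link = first_link.find_next_sibling("a")
no_match = first_link.find_next_sibling("b")
```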
       </div>
       <div class="section" id="find-previous-siblings-and-find-previous-sibling">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_previous_siblings()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_previous_sibling()
          </span>
         </code>
         <a class="headerlink" href="#find-previous-siblings-and-find-previous-sibling" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_previous_siblings(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         Signature: find_previous_sibling(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         These methods use
         <a class="reference internal" href="#sibling-generators">
          <span>
           .previous_siblings
          </span>
         </a>
         to iterate over an element’s
siblings that precede it in the tree. The
         <code class="docutils literal">
          <span class="pre">
           find_previous_siblings()
          </span>
         </code>
         method returns all the siblings that match, and
         <code class="docutils literal">
          <span class="pre">
           find_previous_sibling()
          </span>
         </code>
         only returns the first one:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">last_link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="s2">"link3"</span><span class="p">)</span>
<span class="n">last_link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;</span>

<span class="n">last_link</span><span class="o">.</span><span class="n">find_previous_siblings</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>

<span class="n">first_story_paragraph</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"p"</span><span class="p">,</span> <span class="s2">"story"</span><span class="p">)</span>
<span class="n">first_story_paragraph</span><span class="o">.</span><span class="n">find_previous_sibling</span><span class="p">(</span><span class="s2">"p"</span><span class="p">)</span>
<span class="c1"># &lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>
</pre>
         </div>
        </div>
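        <p>
         The output above lists Lacie before Elsie: the matches come back nearest first, not in document order. A minimal sketch, using shortened markup that is an assumption for this example, shows one way to get document order back:
        </p>

```python
from bs4 import BeautifulSoup

# Shortened version of the story paragraph, for illustration only.
markup = (
    '<p class="story">'
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a> and '
    '<a class="sister" id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(markup, "html.parser")
last_link = soup.find("a", id="link3")

# Matches are collected while walking backwards from the starting tag.
nearest_first = [a["id"] for a in last_link.find_previous_siblings("a")]

# Reverse the list if you want the siblings in document order.
document_order = list(reversed(nearest_first))
```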
       </div>
       <div class="section" id="find-all-next-and-find-next">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_all_next()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_next()
          </span>
         </code>
         <a class="headerlink" href="#find-all-next-and-find-next" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_all_next(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         Signature: find_next(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         These methods use
         <a class="reference internal" href="#element-generators">
          <span>
           .next_elements
          </span>
         </a>
         to
iterate over the tags and strings that come after an element in the
document. The
         <code class="docutils literal">
          <span class="pre">
           find_all_next()
          </span>
         </code>
         method returns all matches, and
         <code class="docutils literal">
          <span class="pre">
           find_next()
          </span>
         </code>
         only returns the first match:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">first_link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">first_link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>

<span class="n">first_link</span><span class="o">.</span><span class="n">find_all_next</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># [u'Elsie', u',\n', u'Lacie', u' and\n', u'Tillie',</span>
<span class="c1">#  u';\nand they lived at the bottom of a well.', u'\n\n', u'...', u'\n']</span>

<span class="n">first_link</span><span class="o">.</span><span class="n">find_next</span><span class="p">(</span><span class="s2">"p"</span><span class="p">)</span>
<span class="c1"># &lt;p class="story"&gt;...&lt;/p&gt;</span>
</pre>
         </div>
        </div>
        <p>
         In the first example, the string “Elsie” showed up, even though it was
contained within the &lt;a&gt; tag we started from. In the second example,
the last &lt;p&gt; tag in the document showed up, even though it’s not in
the same part of the tree as the &lt;a&gt; tag we started from. For these
methods, all that matters is that an element matches the filter and
shows up later in the document than the starting element.
        </p>
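        <p>
         To make the contrast with the sibling methods concrete, here's a small sketch. The shortened markup is an assumption made for this example. Starting from the same &lt;a&gt; tag, the sibling search stays inside the enclosing paragraph, while
         <code class="docutils literal"><span class="pre">find_all_next()</span></code>
         keeps going through the rest of the document.
        </p>

```python
from bs4 import BeautifulSoup

# Shortened document, for illustration only.
markup = (
    '<p class="title"><b>Title</b></p>'
    '<p class="story">Once upon a time: '
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a></p>'
    '<p class="story">...</p>'
)
soup = BeautifulSoup(markup, "html.parser")
first_link = soup.a

# The link has no paragraph among its siblings.
sibling_paragraphs = first_link.find_next_siblings("p")

# But a paragraph does appear later in the document.
later_paragraphs = first_link.find_all_next("p")

# The first string found is the link's own text.
later_strings = first_link.find_all_next(string=True)
```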
       </div>
       <div class="section" id="find-all-previous-and-find-previous">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           find_all_previous()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           find_previous()
          </span>
         </code>
         <a class="headerlink" href="#find-all-previous-and-find-previous" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Signature: find_all_previous(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#limit">
          <span>
           limit
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         Signature: find_previous(
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         ,
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         )
        </p>
        <p>
         These methods use
         <a class="reference internal" href="#element-generators">
          <span>
           .previous_elements
          </span>
         </a>
         to
iterate over the tags and strings that come before an element in the
document. The
         <code class="docutils literal">
          <span class="pre">
           find_all_previous()
          </span>
         </code>
         method returns all matches, and
         <code class="docutils literal">
          <span class="pre">
           find_previous()
          </span>
         </code>
         only returns the first match:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">first_link</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">first_link</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>

<span class="n">first_link</span><span class="o">.</span><span class="n">find_all_previous</span><span class="p">(</span><span class="s2">"p"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="story"&gt;Once upon a time there were three little sisters; ...&lt;/p&gt;,</span>
<span class="c1">#  &lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;]</span>

<span class="n">first_link</span><span class="o">.</span><span class="n">find_previous</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># &lt;title&gt;The Dormouse's story&lt;/title&gt;</span>
</pre>
         </div>
        </div>
        <p>
         The call to
         <code class="docutils literal">
          <span class="pre">
           find_all_previous("p")
          </span>
         </code>
         found the first paragraph in
the document (the one with class="title"), but it also found the
second paragraph, the &lt;p&gt; tag that contains the &lt;a&gt; tag we started
with. This shouldn’t be too surprising: we’re looking at all the tags
that show up earlier in the document than the one we started with. A
&lt;p&gt; tag that contains an &lt;a&gt; tag must have shown up before the &lt;a&gt;
tag it contains.
        </p>
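        <p>
         A quick way to check this for yourself is to compare the first result with the tag's parent. This is only a sketch, with shortened markup that is an assumption for the example:
        </p>

```python
from bs4 import BeautifulSoup

# Shortened document, for illustration only.
markup = (
    '<p class="title"><b>Title</b></p>'
    '<p class="story">Once upon a time: '
    '<a class="sister" id="link1">Elsie</a></p>'
)
soup = BeautifulSoup(markup, "html.parser")
first_link = soup.a

previous_paragraphs = first_link.find_all_previous("p")
containing_paragraph = first_link.find_parent("p")

# The enclosing paragraph opened before the link, so it is the nearest match.
classes = [p["class"] for p in previous_paragraphs]
```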
       </div>
       <div class="section" id="css-selectors">
        <h2>
         CSS selectors
         <a class="headerlink" href="#css-selectors" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Beautiful Soup supports the most commonly used CSS selectors. Just
pass a string into the
         <code class="docutils literal">
          <span class="pre">
           .select()
          </span>
         </code>
         method of a
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         object or the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object itself.
        </p>
        <p>
         You can find tags:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"p:nth-of-type(3)"</span><span class="p">)</span>
<span class="c1"># [&lt;p class="story"&gt;...&lt;/p&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags beneath other tags:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"body a"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie"  id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"html head title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags
         <cite>
          directly
         </cite>
         beneath other tags:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"head &gt; title"</span><span class="p">)</span>
<span class="c1"># [&lt;title&gt;The Dormouse's story&lt;/title&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"p &gt; a"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie"  id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"p &gt; a:nth-of-type(2)"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"p &gt; #link1"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"body &gt; a"</span><span class="p">)</span>
<span class="c1"># []</span>
</pre>
         </div>
        </div>
        <p>
         Find the siblings of tags:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"#link1 ~ .sister"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie"  id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"#link1 + .sister"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags by CSS class:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">".sister"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"[class~=sister]"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags by ID:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"#link1"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"a#link2"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags that match any selector from a list of selectors:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">"#link1,#link2"</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Test for the existence of an attribute:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'a[href]'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Find tags by attribute value:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'a[href="http://example.com/elsie"]'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'a[href^="http://example.com/"]'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,</span>
<span class="c1">#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'a[href$="tillie"]'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]</span>

<span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'a[href*=".com/el"]'</span><span class="p">)</span>
<span class="c1"># [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]</span>
</pre>
         </div>
        </div>
        <p>
         Match language codes:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">multilingual_markup</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2"> &lt;p lang="en"&gt;Hello&lt;/p&gt;</span>
<span class="s2"> &lt;p lang="en-us"&gt;Howdy, y'all&lt;/p&gt;</span>
<span class="s2"> &lt;p lang="en-gb"&gt;Pip-pip, old fruit&lt;/p&gt;</span>
<span class="s2"> &lt;p lang="fr"&gt;Bonjour mes amis&lt;/p&gt;</span>
<span class="s2">"""</span>
<span class="n">multilingual_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">multilingual_markup</span><span class="p">)</span>
<span class="n">multilingual_soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'p[lang|=en]'</span><span class="p">)</span>
<span class="c1"># [&lt;p lang="en"&gt;Hello&lt;/p&gt;,</span>
<span class="c1">#  &lt;p lang="en-us"&gt;Howdy, y'all&lt;/p&gt;,</span>
<span class="c1">#  &lt;p lang="en-gb"&gt;Pip-pip, old fruit&lt;/p&gt;]</span>
</pre>
         </div>
        </div>
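        <p>
         The
         <code class="docutils literal"><span class="pre">|=</span></code>
         operator is stricter than a plain prefix match: it accepts the exact value, or the value followed by a hyphen. The sketch below contrasts it with
         <code class="docutils literal"><span class="pre">^=</span></code>
         . The extra paragraph with lang="english" is made up for this example.
        </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; lang="english" is deliberately not a real language code.
markup = (
    '<p lang="en">Hello</p>'
    '<p lang="en-us">Howdy</p>'
    '<p lang="english">Not a language code</p>'
    '<p lang="fr">Bonjour</p>'
)
soup = BeautifulSoup(markup, "html.parser")

# Matches "en" exactly, or anything starting with "en-".
dash_match = soup.select('p[lang|=en]')

# Matches any value that merely starts with "en".
prefix_match = soup.select('p[lang^="en"]')
```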
        <p>
         Find only the first tag that matches a selector:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">select_one</span><span class="p">(</span><span class="s2">".sister"</span><span class="p">)</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;</span>
</pre>
         </div>
        </div>
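        <p>
         Like
         <code class="docutils literal"><span class="pre">find()</span></code>
         ,
         <code class="docutils literal"><span class="pre">select_one()</span></code>
         gives you None when nothing matches, so check the result before using it. A minimal sketch, in which the markup and the ".brother" class are assumptions for illustration:
        </p>

```python
from bs4 import BeautifulSoup

# Shortened markup, for illustration only.
markup = (
    '<p class="story">'
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a>'
    '</p>'
)
soup = BeautifulSoup(markup, "html.parser")

first_sister = soup.select_one(".sister")

# No tag has this class, so there is nothing to return.
missing = soup.select_one(".brother")

# Guard against None before reading attributes.
missing_id = missing["id"] if missing is not None else None
```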
        <p>
         This is all a convenience for users who know the CSS selector syntax. You
can do all this stuff with the Beautiful Soup API. And if CSS
selectors are all you need, you might as well use lxml directly: it’s
a lot faster, and it supports more CSS selectors. But this lets you
         <cite>
          combine
         </cite>
         simple CSS selectors with the Beautiful Soup API.
        </p>
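        <p>
         Here's a small sketch of what combining the two might look like: a CSS selector picks the starting tag, and the search methods described earlier take over from there. The shortened markup is an assumption made for this example.
        </p>

```python
from bs4 import BeautifulSoup

# Shortened version of the story paragraph, for illustration only.
markup = (
    '<p class="story">'
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a> and '
    '<a class="sister" id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(markup, "html.parser")

# Start from a tag chosen with a CSS selector...
middle_link = soup.select_one("#link2")

# ...then move around with the regular search methods.
before = middle_link.find_previous_sibling("a")
after = middle_link.find_next_sibling("a")
paragraph = middle_link.find_parent("p")
```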
       </div>
      </div>
      <div class="section" id="modifying-the-tree">
       <h1>
        Modifying the tree
        <a class="headerlink" href="#modifying-the-tree" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Beautiful Soup’s main strength is in searching the parse tree, but you
can also modify the tree and write your changes as a new HTML or XML
document.
       </p>
       <div class="section" id="changing-tag-names-and-attributes">
        <h2>
         Changing tag names and attributes
         <a class="headerlink" href="#changing-tag-names-and-attributes" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         I covered this earlier, in
         <a class="reference internal" href="#attributes">
          Attributes
         </a>
         , but it bears repeating. You
can rename a tag, change the values of its attributes, add new
attributes, and delete attributes:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;b class="boldest"&gt;Extremely bold&lt;/b&gt;'</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">b</span>

<span class="n">tag</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s2">"blockquote"</span>
<span class="n">tag</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'verybold'</span>
<span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote class="verybold" id="1"&gt;Extremely bold&lt;/blockquote&gt;</span>

<span class="k">del</span> <span class="n">tag</span><span class="p">[</span><span class="s1">'class'</span><span class="p">]</span>
<span class="k">del</span> <span class="n">tag</span><span class="p">[</span><span class="s1">'id'</span><span class="p">]</span>
<span class="n">tag</span>
<span class="c1"># &lt;blockquote&gt;Extremely bold&lt;/blockquote&gt;</span>
</pre>
         </div>
        </div>
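        <p>
         The same operations work in a loop. This sketch renames every &lt;b&gt; tag in a made-up snippet, adds a hypothetical
         <code class="docutils literal"><span class="pre">data-converted</span></code>
         attribute, and drops the class attribute where there is one:
        </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup with two bold tags, only one of which has a class.
markup = '<p><b class="boldest">Extremely bold</b> and <b>bold</b></p>'
soup = BeautifulSoup(markup, "html.parser")

for tag in soup.find_all("b"):
    tag.name = "strong"              # rename the tag
    tag["data-converted"] = "yes"    # add a new attribute
    if tag.has_attr("class"):
        del tag["class"]             # delete an existing attribute

result = str(soup)
```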
       </div>
       <div class="section" id="modifying-string">
        <h2>
         Modifying
         <code class="docutils literal">
          <span class="pre">
           .string
          </span>
         </code>
         <a class="headerlink" href="#modifying-string" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you set a tag’s
         <code class="docutils literal">
          <span class="pre">
           .string
          </span>
         </code>
         attribute, the tag’s contents are
replaced with the string you give:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>

<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="n">tag</span><span class="o">.</span><span class="n">string</span> <span class="o">=</span> <span class="s2">"New link text."</span>
<span class="n">tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;New link text.&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Be careful: if the tag contained other tags, they and all their
contents will be destroyed.
        </p>
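        <p>
         If you want to change the text but keep the child tags, one option is to replace only the string you care about. The sketch below shows both outcomes side by side. The replacement text is made up, and the example assumes the built-in
         <code class="docutils literal"><span class="pre">html.parser</span></code>
         parser.
        </p>

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

# Setting .string replaces everything inside the tag, child tags included.
soup = BeautifulSoup(markup, "html.parser")
soup.a.string = "New link text."
replaced = str(soup.a)
italic_after_replace = soup.find("i")

# Replacing only the first string leaves the italic child in place.
soup = BeautifulSoup(markup, "html.parser")
soup.a.contents[0].replace_with("I pointed at ")
kept = str(soup.a)
```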
       </div>
       <div class="section" id="append">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           append()
          </span>
         </code>
         <a class="headerlink" href="#append" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         You can add to a tag’s contents with
         <code class="docutils literal">
          <span class="pre">
           Tag.append()
          </span>
         </code>
         . It works just
like calling
         <code class="docutils literal">
          <span class="pre">
           .append()
          </span>
         </code>
         on a Python list:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;Foo&lt;/a&gt;"</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"Bar"</span><span class="p">)</span>

<span class="n">soup</span>
<span class="c1"># &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;a&gt;FooBar&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;</span>
<span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [u'Foo', u'Bar']</span>
</pre>
         </div>
        </div>
       </div>
       <div class="section" id="navigablestring-and-new-tag">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           NavigableString()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           .new_tag()
          </span>
         </code>
         <a class="headerlink" href="#navigablestring-and-new-tag" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you need to add a string to a document, you can pass a
Python string to
         <code class="docutils literal">
          <span class="pre">
           append()
          </span>
         </code>
         , or you can call the
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         constructor:
        </p>
        <div class="highlight-python">
         <div class="highlight">
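          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">NavigableString</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;b&gt;&lt;/b&gt;"</span><span class="p">)</span>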
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">b</span>
<span class="n">tag</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"Hello"</span><span class="p">)</span>
<span class="n">new_string</span> <span class="o">=</span> <span class="n">NavigableString</span><span class="p">(</span><span class="s2">" there"</span><span class="p">)</span>
<span class="n">tag</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">new_string</span><span class="p">)</span>
<span class="n">tag</span>
<span class="c1"># &lt;b&gt;Hello there&lt;/b&gt;</span>
<span class="n">tag</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [u'Hello', u' there']</span>
</pre>
         </div>
        </div>
        <p>
         If you want to create a comment or some other subclass of
         <code class="docutils literal">
          <span class="pre">
           NavigableString
          </span>
         </code>
         , just call the constructor:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">Comment</span>
<span class="n">new_comment</span> <span class="o">=</span> <span class="n">Comment</span><span class="p">(</span><span class="s2">"Nice to see you."</span><span class="p">)</span>
<span class="n">tag</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">new_comment</span><span class="p">)</span>
<span class="n">tag</span>
<span class="c1"># &lt;b&gt;Hello there&lt;!--Nice to see you.--&gt;&lt;/b&gt;</span>
<span class="n">tag</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [u'Hello', u' there', u'Nice to see you.']</span>
</pre>
         </div>
        </div>
        <p>
         (This is a new feature in Beautiful Soup 4.4.0.)
        </p>
        <p>
         What if you need to create a whole new tag?  The best solution is to
call the factory method
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup.new_tag()
          </span>
         </code>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;b&gt;&lt;/b&gt;"</span><span class="p">)</span>
<span class="n">original_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">b</span>

<span class="n">new_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s2">"a"</span><span class="p">,</span> <span class="n">href</span><span class="o">=</span><span class="s2">"http://www.example.com"</span><span class="p">)</span>
<span class="n">original_tag</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">new_tag</span><span class="p">)</span>
<span class="n">original_tag</span>
<span class="c1"># &lt;b&gt;&lt;a href="http://www.example.com"&gt;&lt;/a&gt;&lt;/b&gt;</span>

<span class="n">new_tag</span><span class="o">.</span><span class="n">string</span> <span class="o">=</span> <span class="s2">"Link text."</span>
<span class="n">original_tag</span>
<span class="c1"># &lt;b&gt;&lt;a href="http://www.example.com"&gt;Link text.&lt;/a&gt;&lt;/b&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Only the first argument, the tag name, is required.
        </p>
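        <p>
         As a rough illustration of the point above, the sketch below creates a tag from its name alone and fills in an attribute and text afterwards. It assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         ; the markup and the attribute value are made up for the example:
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p></p>", "html.parser")  # parser choice is an assumption

# Only the tag name is passed, so the new tag starts with no attributes.
span = soup.new_tag("span")
attrs_at_creation = dict(span.attrs)

# Attributes and text can be filled in later.
span["class"] = "note"
span.string = "Remember"
soup.p.append(span)

print(soup)
```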
       </div>
       <div class="section" id="insert">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           insert()
          </span>
         </code>
         <a class="headerlink" href="#insert" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           Tag.insert()
          </span>
         </code>
         is just like
         <code class="docutils literal">
          <span class="pre">
           Tag.append()
          </span>
         </code>
         , except the new element
doesn’t necessarily go at the end of its parent’s
         <code class="docutils literal">
          <span class="pre">
           .contents
          </span>
         </code>
         . It’ll be inserted at whatever numeric position you
say. It works just like
         <code class="docutils literal">
          <span class="pre">
           .insert()
          </span>
         </code>
         on a Python list:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">tag</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">"but did not endorse "</span><span class="p">)</span>
<span class="n">tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;I linked to but did not endorse &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;</span>
<span class="n">tag</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [u'I linked to ', u'but did not endorse ', &lt;i&gt;example.com&lt;/i&gt;]</span>
</pre>
         </div>
        </div>
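        <p>
         To make the comparison with Python lists concrete, here is a small sketch that performs the same insertion on a tag and on a plain list. It assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         ; the markup is invented for the example:
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>world</p>", "html.parser")  # parser choice is an assumption

# Position 0 puts the new string in front of the existing contents.
soup.p.insert(0, "Hello, ")
tag_contents = [str(child) for child in soup.p.contents]

# The same call on an ordinary Python list.
words = ["world"]
words.insert(0, "Hello, ")

print(tag_contents, words)
```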
       </div>
       <div class="section" id="insert-before-and-insert-after">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           insert_before()
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           insert_after()
          </span>
         </code>
         <a class="headerlink" href="#insert-before-and-insert-after" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           insert_before()
          </span>
         </code>
         method inserts a tag or string immediately
before something else in the parse tree:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;b&gt;stop&lt;/b&gt;"</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s2">"i"</span><span class="p">)</span>
<span class="n">tag</span><span class="o">.</span><span class="n">string</span> <span class="o">=</span> <span class="s2">"Don't"</span>
<span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">insert_before</span><span class="p">(</span><span class="n">tag</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">b</span>
<span class="c1"># &lt;b&gt;&lt;i&gt;Don't&lt;/i&gt;stop&lt;/b&gt;</span>
</pre>
         </div>
        </div>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           insert_after()
          </span>
         </code>
         method inserts a tag or string so that it
immediately follows something else in the parse tree:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">insert_after</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">new_string</span><span class="p">(</span><span class="s2">" ever "</span><span class="p">))</span>
<span class="n">soup</span><span class="o">.</span><span class="n">b</span>
<span class="c1"># &lt;b&gt;&lt;i&gt;Don't&lt;/i&gt; ever stop&lt;/b&gt;</span>
<span class="n">soup</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">contents</span>
<span class="c1"># [&lt;i&gt;Don't&lt;/i&gt;, u' ever ', u'stop']</span>
</pre>
         </div>
        </div>
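        <p>
         The two methods can also be used together around a single element. The following sketch assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         , and passes plain Python strings rather than tags:
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>middle</b></p>", "html.parser")  # parser choice is an assumption

# Both calls are made on the <b> tag; its parent <p> receives the new strings.
soup.b.insert_before("first ")
soup.b.insert_after(" last")

contents = [str(child) for child in soup.p.contents]
print(soup.p)
```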
       </div>
       <div class="section" id="clear">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           clear()
          </span>
         </code>
         <a class="headerlink" href="#clear" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           Tag.clear()
          </span>
         </code>
         removes the contents of a tag:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">tag</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span>
<span class="n">tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;&lt;/a&gt;</span>
</pre>
         </div>
        </div>
       </div>
       <div class="section" id="extract">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           extract()
          </span>
         </code>
         <a class="headerlink" href="#extract" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           PageElement.extract()
          </span>
         </code>
         removes a tag or string from the tree. It
returns the tag or string that was extracted:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">a_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">i_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">extract</span><span class="p">()</span>

<span class="n">a_tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;I linked to &lt;/a&gt;</span>

<span class="n">i_tag</span>
<span class="c1"># &lt;i&gt;example.com&lt;/i&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">i_tag</span><span class="o">.</span><span class="n">parent</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
         </div>
        </div>
        <p>
         At this point you effectively have two parse trees: one rooted at the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object you used to parse the document, and one rooted
at the tag that was extracted. You can go on to call
         <code class="docutils literal">
          <span class="pre">
           extract
          </span>
         </code>
         on
a child of the element you extracted:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">my_string</span> <span class="o">=</span> <span class="n">i_tag</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">extract</span><span class="p">()</span>
<span class="n">my_string</span>
<span class="c1"># u'example.com'</span>

<span class="k">print</span><span class="p">(</span><span class="n">my_string</span><span class="o">.</span><span class="n">parent</span><span class="p">)</span>
<span class="c1"># None</span>
<span class="n">i_tag</span>
<span class="c1"># &lt;i&gt;&lt;/i&gt;</span>
</pre>
         </div>
        </div>
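        <p>
         Because an extracted element stays usable, one common pattern is to move it elsewhere in the document. A minimal sketch, assuming Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         , with markup invented for the example:
        </p>

```python
from bs4 import BeautifulSoup

markup = "<div><p>one</p><p>two</p></div><section></section>"
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption

# Detach the first <p>; it now has no parent.
moved = soup.div.p.extract()
parent_after_extract = moved.parent

# Attach it to a different part of the original tree.
soup.section.append(moved)

print(soup)
```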
       </div>
       <div class="section" id="decompose">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           decompose()
          </span>
         </code>
         <a class="headerlink" href="#decompose" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           Tag.decompose()
          </span>
         </code>
         removes a tag from the tree, then
         <cite>
          completely
destroys it and its contents
         </cite>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">a_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">soup</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">decompose</span><span class="p">()</span>

<span class="n">a_tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;I linked to &lt;/a&gt;</span>
</pre>
         </div>
        </div>
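        <p>
         Unlike
         <code class="docutils literal"><span class="pre">extract()</span></code>
         ,
         <code class="docutils literal"><span class="pre">decompose()</span></code>
         gives nothing back to work with. The sketch below, which assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         , checks the return value and confirms the tag can no longer be found:
        </p>

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
a_tag = soup.a

returned = soup.i.decompose()  # nothing is returned
remaining = soup.find("i")     # the <i> tag is gone from the tree

print(returned, remaining)
print(a_tag)
```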
       </div>
       <div class="section" id="replace-with">
        <span id="id13">
        </span>
        <h2>
         <code class="docutils literal">
          <span class="pre">
           replace_with()
          </span>
         </code>
         <a class="headerlink" href="#replace-with" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           PageElement.replace_with()
          </span>
         </code>
         removes a tag or string from the tree,
and replaces it with the tag or string of your choice:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">a_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">new_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s2">"b"</span><span class="p">)</span>
<span class="n">new_tag</span><span class="o">.</span><span class="n">string</span> <span class="o">=</span> <span class="s2">"example.net"</span>
<span class="n">a_tag</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">replace_with</span><span class="p">(</span><span class="n">new_tag</span><span class="p">)</span>

<span class="n">a_tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;I linked to &lt;b&gt;example.net&lt;/b&gt;&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         <code class="docutils literal">
          <span class="pre">
           replace_with()
          </span>
         </code>
         returns the tag or string that was replaced, so
that you can examine it or add it back to another part of the tree.
        </p>
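        <p>
         Here is one way the return value might be used: the replaced tag is kept and appended back into the tree at a different position. The sketch assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         :
        </p>

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
a_tag = soup.a

new_tag = soup.new_tag("b")
new_tag.string = "example.net"

# replace_with() hands back the element that was swapped out.
old_tag = a_tag.i.replace_with(new_tag)
old_parent = old_tag.parent  # None: it is detached at this point

# Put the old tag back at the end of the link.
a_tag.append(old_tag)

print(a_tag)
```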
       </div>
       <div class="section" id="wrap">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           wrap()
          </span>
         </code>
         <a class="headerlink" href="#wrap" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           PageElement.wrap()
          </span>
         </code>
         wraps an element in the tag you specify. It
returns the new wrapper:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre>soup = BeautifulSoup("&lt;p&gt;I wish I was bold.&lt;/p&gt;")
soup.p.string.wrap(soup.new_tag("b"))
# &lt;b&gt;I wish I was bold.&lt;/b&gt;

soup.p.wrap(soup.new_tag("div"))
# &lt;div&gt;&lt;p&gt;&lt;b&gt;I wish I was bold.&lt;/b&gt;&lt;/p&gt;&lt;/div&gt;
</pre>
         </div>
        </div>
        <p>
         This method is new in Beautiful Soup 4.0.5.
        </p>
       </div>
       <div class="section" id="unwrap">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           unwrap()
          </span>
         </code>
         <a class="headerlink" href="#unwrap" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         <code class="docutils literal">
          <span class="pre">
           Tag.unwrap()
          </span>
         </code>
         is the opposite of
         <code class="docutils literal">
          <span class="pre">
           wrap()
          </span>
         </code>
         . It replaces a tag with
whatever’s inside that tag. It’s good for stripping out markup:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">a_tag</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">a</span>

<span class="n">a_tag</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">unwrap</span><span class="p">()</span>
<span class="n">a_tag</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;I linked to example.com&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Like
         <code class="docutils literal">
          <span class="pre">
           replace_with()
          </span>
         </code>
         ,
         <code class="docutils literal">
          <span class="pre">
           unwrap()
          </span>
         </code>
         returns the tag
that was replaced.
        </p>
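        <p>
         A short sketch of that return value, assuming Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         . The returned tag is empty because its children were moved into its former parent:
        </p>

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
a_tag = soup.a

removed = a_tag.i.unwrap()  # the <i> tag itself, now detached and empty

print(removed)
print(a_tag)
```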
       </div>
      </div>
      <div class="section" id="output">
       <h1>
        Output
        <a class="headerlink" href="#output" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <div class="section" id="pretty-printing">
        <span id="prettyprinting">
        </span>
        <h2>
         Pretty-printing
         <a class="headerlink" href="#pretty-printing" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           prettify()
          </span>
         </code>
         method will turn a Beautiful Soup parse tree into a
nicely formatted Unicode string, with each HTML/XML tag on its own line:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">()</span>
<span class="c1"># '&lt;html&gt;\n &lt;head&gt;\n &lt;/head&gt;\n &lt;body&gt;\n  &lt;a href="http://example.com/"&gt;\n...'</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;head&gt;</span>
<span class="c1">#  &lt;/head&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;a href="http://example.com/"&gt;</span>
<span class="c1">#    I linked to</span>
<span class="c1">#    &lt;i&gt;</span>
<span class="c1">#     example.com</span>
<span class="c1">#    &lt;/i&gt;</span>
<span class="c1">#   &lt;/a&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         You can call
         <code class="docutils literal">
          <span class="pre">
           prettify()
          </span>
         </code>
         on the top-level
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object,
or on any of its
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         objects:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;a href="http://example.com/"&gt;</span>
<span class="c1">#  I linked to</span>
<span class="c1">#  &lt;i&gt;</span>
<span class="c1">#   example.com</span>
<span class="c1">#  &lt;/i&gt;</span>
<span class="c1"># &lt;/a&gt;</span>
</pre>
         </div>
        </div>
       </div>
       <div class="section" id="non-pretty-printing">
        <h2>
         Non-pretty printing
         <a class="headerlink" href="#non-pretty-printing" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you just want a string, with no fancy formatting, you can call
         <code class="docutils literal">
          <span class="pre">
           unicode()
          </span>
         </code>
         or
         <code class="docutils literal">
          <span class="pre">
           str()
          </span>
         </code>
         on a
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object, or a
         <code class="docutils literal">
          <span class="pre">
           Tag
          </span>
         </code>
         within it:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="nb">str</span><span class="p">(</span><span class="n">soup</span><span class="p">)</span>
<span class="c1"># '&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;'</span>

<span class="nb">unicode</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="p">)</span>
<span class="c1"># u'&lt;a href="http://example.com/"&gt;I linked to &lt;i&gt;example.com&lt;/i&gt;&lt;/a&gt;'</span>
</pre>
         </div>
        </div>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           str()
          </span>
         </code>
         function returns a string encoded in UTF-8. See
         <a class="reference internal" href="#encodings">
          Encodings
         </a>
         for other options.
        </p>
        <p>
         You can also call
         <code class="docutils literal">
          <span class="pre">
           encode()
          </span>
         </code>
         to get a bytestring, and
         <code class="docutils literal">
          <span class="pre">
           decode()
          </span>
         </code>
         to get Unicode.
        </p>
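        <p>
         The difference between the two calls can be sketched as follows. This assumes Python 3, where
         <code class="docutils literal"><span class="pre">encode()</span></code>
         returns a
         <code class="docutils literal"><span class="pre">bytes</span></code>
         object and
         <code class="docutils literal"><span class="pre">decode()</span></code>
         returns a
         <code class="docutils literal"><span class="pre">str</span></code>
         ; the markup and the explicit
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         argument are choices made for the example:
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")  # parser choice is an assumption

as_bytes = soup.p.encode("utf-8")  # bytestring in the requested encoding
as_text = soup.p.decode()          # Unicode string

print(as_bytes)
print(as_text)
```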
       </div>
       <div class="section" id="output-formatters">
        <span id="id14">
        </span>
        <h2>
         Output formatters
         <a class="headerlink" href="#output-formatters" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you give Beautiful Soup a document that contains HTML entities like
“&amp;ldquo;”, they’ll be converted to Unicode characters:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&amp;ldquo;Dammit!&amp;rdquo; he said."</span><span class="p">)</span>
<span class="nb">unicode</span><span class="p">(</span><span class="n">soup</span><span class="p">)</span>
<span class="c1"># u'&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;\u201cDammit!\u201d he said.&lt;/body&gt;&lt;/html&gt;'</span>
</pre>
         </div>
        </div>
        <p>
         If you then convert the document to a string, the Unicode characters
will be encoded as UTF-8. You won’t get the HTML entities back:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="nb">str</span><span class="p">(</span><span class="n">soup</span><span class="p">)</span>
<span class="c1"># '&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;\xe2\x80\x9cDammit!\xe2\x80\x9d he said.&lt;/body&gt;&lt;/html&gt;'</span>
</pre>
         </div>
        </div>
        <p>
         By default, the only characters that are escaped upon output are bare
ampersands and angle brackets. These get turned into “&amp;amp;”, “&amp;lt;”,
and “&amp;gt;”, so that Beautiful Soup doesn’t inadvertently generate
invalid HTML or XML:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;p&gt;The law firm of Dewey, Cheatem, &amp; Howe&lt;/p&gt;"</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">p</span>
<span class="c1"># &lt;p&gt;The law firm of Dewey, Cheatem, &amp;amp; Howe&lt;/p&gt;</span>

<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;'</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">a</span>
<span class="c1"># &lt;a href="http://example.com/?foo=val1&amp;amp;bar=val2"&gt;A link&lt;/a&gt;</span>
</pre>
         </div>
        </div>
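        <p>
         The same escaping applies to strings you add yourself, and it only affects output; the string stored in the tree is unchanged. A minimal sketch, assuming Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         :
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p></p>", "html.parser")  # parser choice is an assumption
soup.p.string = "1 < 2 & 3 > 2"

stored = str(soup.p.string)  # the raw text kept in the tree
rendered = str(soup.p)       # the escaped markup produced on output

print(stored)
print(rendered)
```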
        <p>
         You can change this behavior by providing a value for the
         <code class="docutils literal">
          <span class="pre">
           formatter
          </span>
         </code>
         argument to
         <code class="docutils literal">
          <span class="pre">
           prettify()
          </span>
         </code>
         ,
         <code class="docutils literal">
          <span class="pre">
           encode()
          </span>
         </code>
         , or
         <code class="docutils literal">
          <span class="pre">
           decode()
          </span>
         </code>
         . Beautiful Soup recognizes four possible values for
         <code class="docutils literal">
          <span class="pre">
           formatter
          </span>
         </code>
         .
        </p>
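        <p>
         As a quick sketch of where the argument goes, the snippet below passes
         <code class="docutils literal"><span class="pre">formatter="html"</span></code>
         , one of the values described below, to each of the three methods and compares the result with the default output. It assumes Python 3 and the built-in
         <code class="docutils literal"><span class="pre">"html.parser"</span></code>
         ; the markup is invented for the example:
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>caf\u00e9 &amp; bar</p>", "html.parser")  # parser choice is an assumption

default_text = soup.p.decode()                # default formatter
html_text = soup.p.decode(formatter="html")   # Unicode string with entities
html_bytes = soup.p.encode(formatter="html")  # bytestring with entities
html_pretty = soup.p.prettify(formatter="html")

print(default_text)
print(html_text)
```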
        <p>
         The default is
         <code class="docutils literal">
          <span class="pre">
           formatter="minimal"
          </span>
         </code>
         . Strings will only be processed
enough to ensure that Beautiful Soup generates valid HTML/XML:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">french</span> <span class="o">=</span> <span class="s2">"&lt;p&gt;Il a dit &amp;lt;&amp;lt;Sacr&amp;eacute; bleu!&amp;gt;&amp;gt;&lt;/p&gt;"</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">french</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="s2">"minimal"</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    Il a dit &amp;lt;&amp;lt;Sacré bleu!&amp;gt;&amp;gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         If you pass in
         <code class="docutils literal">
          <span class="pre">
           formatter="html"
          </span>
         </code>
         , Beautiful Soup will convert
Unicode characters to HTML entities whenever possible:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="s2">"html"</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    Il a dit &amp;lt;&amp;lt;Sacr&amp;eacute; bleu!&amp;gt;&amp;gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         If you pass in
         <code class="docutils literal">
          <span class="pre">
           formatter=None
          </span>
         </code>
         , Beautiful Soup will not modify
strings at all on output. This is the fastest option, but it may lead
to Beautiful Soup generating invalid HTML/XML, as in these examples:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="bp">None</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    Il a dit &lt;&lt;Sacré bleu!&gt;&gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>

<span class="n">link_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s1">'&lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">link_soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="bp">None</span><span class="p">))</span>
<span class="c1"># &lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Finally, if you pass in a function for
         <code class="docutils literal">
          <span class="pre">
           formatter
          </span>
         </code>
         , Beautiful Soup
will call that function once for every string and attribute value in
the document. You can do whatever you want in this function. Here’s a
formatter that converts strings to uppercase and does absolutely
nothing else:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">def</span> <span class="nf">uppercase</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">text</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="n">uppercase</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    IL A DIT &lt;&lt;SACRÉ BLEU!&gt;&gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">link_soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="n">uppercase</span><span class="p">))</span>
<span class="c1"># &lt;a href="HTTP://EXAMPLE.COM/?FOO=VAL1&amp;BAR=VAL2"&gt;</span>
<span class="c1">#  A LINK</span>
<span class="c1"># &lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         If you’re writing your own function, you should know about the
         <code class="docutils literal">
          <span class="pre">
           EntitySubstitution
          </span>
         </code>
         class in the
         <code class="docutils literal">
          <span class="pre">
           bs4.dammit
          </span>
         </code>
         module. This class
implements Beautiful Soup’s standard formatters as class methods: the
“html” formatter is
         <code class="docutils literal">
          <span class="pre">
           EntitySubstitution.substitute_html
          </span>
         </code>
         , and the
“minimal” formatter is
         <code class="docutils literal">
          <span class="pre">
           EntitySubstitution.substitute_xml
          </span>
         </code>
         . You can
use these functions to simulate
         <code class="docutils literal">
          <span class="pre">
           formatter="html"
          </span>
         </code>
         or
         <code class="docutils literal">
          <span class="pre">
           formatter="minimal"
          </span>
         </code>
         , but then do something extra.
        </p>
        <p>
         Here’s an example that replaces Unicode characters with HTML entities
whenever possible, but
         <cite>
          also
         </cite>
         converts all strings to uppercase:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4.dammit</span> <span class="kn">import</span> <span class="n">EntitySubstitution</span>
<span class="k">def</span> <span class="nf">uppercase_and_substitute_html_entities</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">EntitySubstitution</span><span class="o">.</span><span class="n">substitute_html</span><span class="p">(</span><span class="n">text</span><span class="o">.</span><span class="n">upper</span><span class="p">())</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="n">uppercase_and_substitute_html_entities</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    IL A DIT &amp;lt;&amp;lt;SACR&amp;Eacute; BLEU!&amp;gt;&amp;gt;</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         One last caveat: if you create a
         <code class="docutils literal">
          <span class="pre">
           CData
          </span>
         </code>
         object, the text inside
that object is always presented
         <cite>
          exactly as it appears, with no
formatting
         </cite>
         . Beautiful Soup will call the formatter method, just in
case you’ve written a custom method that counts all the strings in the
document or something, but it will ignore the return value:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4.element</span> <span class="kn">import</span> <span class="n">CData</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;/a&gt;"</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">string</span> <span class="o">=</span> <span class="n">CData</span><span class="p">(</span><span class="s2">"one &lt; three"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="n">formatter</span><span class="o">=</span><span class="s2">"minimal"</span><span class="p">))</span>
<span class="c1"># &lt;a&gt;</span>
<span class="c1">#  &lt;![CDATA[one &lt; three]]&gt;</span>
<span class="c1"># &lt;/a&gt;</span>
</pre>
         </div>
        </div>
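        <p>
         The string-counting formatter mentioned above can be sketched as a callable object. The class name <code class="docutils literal"><span class="pre">CountingFormatter</span></code> is hypothetical, and the sketch assumes that Beautiful Soup accepts any callable as <code class="docutils literal"><span class="pre">formatter</span></code>, the way it accepts a plain function. The loop calls the object directly, so the block runs without Beautiful Soup:
        </p>

```python
class CountingFormatter:
    """Hypothetical formatter: counts the strings it is given."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        # Record the call, then hand the string back untouched.
        self.calls += 1
        return text


counter = CountingFormatter()

# Stand-ins for the strings and attribute values of a document.
for sample in ["Il a dit", "Sacré bleu!", "http://example.com/"]:
    counter(sample)

print(counter.calls)  # 3
```

        <p>
         The intended use would be to pass <code class="docutils literal"><span class="pre">formatter=counter</span></code> to <code class="docutils literal"><span class="pre">prettify()</span></code>. Because the sketch returns every string unchanged, the output would be as unescaped as with <code class="docutils literal"><span class="pre">formatter=None</span></code>.
        </p>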
       </div>
       <div class="section" id="get-text">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           get_text()
          </span>
         </code>
         <a class="headerlink" href="#get-text" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you only want the text part of a document or tag, you can use the
         <code class="docutils literal">
          <span class="pre">
           get_text()
          </span>
         </code>
         method. It returns all the text in a document or
beneath a tag, as a single Unicode string:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s1">'&lt;a href="http://example.com/"&gt;</span><span class="se">\n</span><span class="s1">I linked to &lt;i&gt;example.com&lt;/i&gt;</span><span class="se">\n</span><span class="s1">&lt;/a&gt;'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>

<span class="n">soup</span><span class="o">.</span><span class="n">get_text</span><span class="p">()</span>
<span class="c1"># u'\nI linked to example.com\n'</span>
<span class="n">soup</span><span class="o">.</span><span class="n">i</span><span class="o">.</span><span class="n">get_text</span><span class="p">()</span>
<span class="c1"># u'example.com'</span>
</pre>
         </div>
        </div>
        <p>
         You can specify a string to be used to join the bits of text
together:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">get_text</span><span class="p">(</span><span class="s2">"|"</span><span class="p">)</span>
<span class="c1"># u'\nI linked to |example.com|\n'</span>
</pre>
         </div>
        </div>
        <p>
         You can tell Beautiful Soup to strip whitespace from the beginning and
end of each bit of text:
        </p>
        <div class="highlight-python">
         <div class="highlight">
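          <pre><span class="n">soup</span><span class="o">.</span><span class="n">get_text</span><span class="p">(</span><span class="s2">"|"</span><span class="p">,</span> <span class="n">strip</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># u'I linked to|example.com'</span>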
</pre>
         </div>
        </div>
        <p>
         But at that point you might want to use the
         <a class="reference internal" href="#string-generators">
          <span>
           .stripped_strings
          </span>
         </a>
         generator instead, and process the text yourself:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="p">[</span><span class="n">text</span> <span class="k">for</span> <span class="n">text</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">stripped_strings</span><span class="p">]</span>
<span class="c1"># [u'I linked to', u'example.com']</span>
</pre>
         </div>
        </div>
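        <p>
         The effect of the separator and <code class="docutils literal"><span class="pre">strip</span></code> arguments can be imitated with plain string operations. This is only a sketch of the observable behavior, not Beautiful Soup’s implementation. The list <code class="docutils literal"><span class="pre">pieces</span></code> is an assumption that stands in for the three strings found in the markup above:
        </p>

```python
# The three strings inside the markup, in document order (assumed).
pieces = ["\nI linked to ", "example.com", "\n"]

# Like get_text("|"): join every piece with the separator.
joined = "|".join(pieces)

# Like get_text("|", strip=True): strip each piece and drop empty ones.
stripped = "|".join(p.strip() for p in pieces if p.strip())

print(repr(joined))
print(repr(stripped))
```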
       </div>
      </div>
      <div class="section" id="specifying-the-parser-to-use">
       <h1>
        Specifying the parser to use
        <a class="headerlink" href="#specifying-the-parser-to-use" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        If you just need to parse some HTML, you can dump the markup into the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        constructor, and it’ll probably be fine. Beautiful
Soup will pick a parser for you and parse the data. But there are a
few additional arguments you can pass in to the constructor to change
which parser is used.
       </p>
       <p>
        The first argument to the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        constructor is a string or
an open filehandle: the markup you want parsed. The second argument is
        <cite>
         how
        </cite>
        you’d like the markup parsed.
       </p>
       <p>
        If you don’t specify anything, you’ll get the best HTML parser that’s
installed. Beautiful Soup ranks lxml’s parser as being the best, then
html5lib’s, then Python’s built-in parser. You can override this by
specifying one of the following:
       </p>
       <ul class="simple">
        <li>
         What type of markup you want to parse. Currently supported are
“html”, “xml”, and “html5”.
        </li>
        <li>
         The name of the parser library you want to use. Currently supported
options are “lxml”, “html5lib”, and “html.parser” (Python’s
built-in HTML parser).
        </li>
       </ul>
       <p>
        The section
        <a class="reference internal" href="#installing-a-parser">
         Installing a parser
        </a>
        contrasts the supported parsers.
       </p>
       <p>
        If you don’t have an appropriate parser installed, Beautiful Soup will
ignore your request and pick a different parser. Right now, the only
supported XML parser is lxml. If you don’t have lxml installed, asking
for an XML parser won’t give you one, and asking for “lxml” won’t work
either.
       </p>
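       <p>
        Because a missing library changes which parser you get, it can help to check what is installed before choosing one. The following is a minimal sketch that uses only the standard library. The function name <code class="docutils literal"><span class="pre">installed_parsers</span></code> is hypothetical, and the list simply repeats the ranking described above:
       </p>

```python
from importlib.util import find_spec

# Parser names in the order the text above ranks them.
RANKED_PARSERS = ["lxml", "html5lib", "html.parser"]


def installed_parsers():
    """Return the ranked parser names whose modules can be found."""
    return [name for name in RANKED_PARSERS if find_spec(name) is not None]


available = installed_parsers()
print(available)
```

       <p>
        Python’s built-in parser should always appear in the result. An XML parser is only available when lxml is listed.
       </p>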
       <div class="section" id="differences-between-parsers">
        <h2>
         Differences between parsers
         <a class="headerlink" href="#differences-between-parsers" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Beautiful Soup presents the same interface to a number of different
parsers, but each parser is different. Different parsers will create
different parse trees from the same document. The biggest differences
are between the HTML parsers and the XML parsers. Here’s a short
document, parsed as HTML:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;b /&gt;&lt;/a&gt;"</span><span class="p">)</span>
<span class="c1"># &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;a&gt;&lt;b&gt;&lt;/b&gt;&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Since an empty &lt;b /&gt; tag is not valid HTML, the parser turns it into a
&lt;b&gt;&lt;/b&gt; tag pair.
        </p>
        <p>
         Here’s the same document parsed as XML (running this requires that you
have lxml installed). Note that the empty &lt;b /&gt; tag is left alone, and
that the document is given an XML declaration instead of being put
into an &lt;html&gt; tag:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;b /&gt;&lt;/a&gt;"</span><span class="p">,</span> <span class="s2">"xml"</span><span class="p">)</span>
<span class="c1"># &lt;?xml version="1.0" encoding="utf-8"?&gt;</span>
<span class="c1"># &lt;a&gt;&lt;b/&gt;&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         There are also differences between HTML parsers. If you give Beautiful
Soup a perfectly-formed HTML document, these differences won’t
matter. One parser will be faster than another, but they’ll all give
you a data structure that looks exactly like the original HTML
document.
        </p>
        <p>
         But if the document is not perfectly-formed, different parsers will
give different results. Here’s a short, invalid document parsed using
lxml’s HTML parser. Note that the dangling &lt;/p&gt; tag is simply
ignored:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;/p&gt;"</span><span class="p">,</span> <span class="s2">"lxml"</span><span class="p">)</span>
<span class="c1"># &lt;html&gt;&lt;body&gt;&lt;a&gt;&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Here’s the same document parsed using html5lib:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;/p&gt;"</span><span class="p">,</span> <span class="s2">"html5lib"</span><span class="p">)</span>
<span class="c1"># &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;a&gt;&lt;p&gt;&lt;/p&gt;&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Instead of ignoring the dangling &lt;/p&gt; tag, html5lib pairs it with an
opening &lt;p&gt; tag. This parser also adds an empty &lt;head&gt; tag to the
document.
        </p>
        <p>
         Here’s the same document parsed with Python’s built-in HTML
parser:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">BeautifulSoup</span><span class="p">(</span><span class="s2">"&lt;a&gt;&lt;/p&gt;"</span><span class="p">,</span> <span class="s2">"html.parser"</span><span class="p">)</span>
<span class="c1"># &lt;a&gt;&lt;/a&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Like lxml, this parser ignores the closing &lt;/p&gt; tag. Unlike
html5lib and lxml, this parser makes no attempt to create a well-formed
HTML document: it adds neither a &lt;body&gt; tag nor an &lt;html&gt; tag.
        </p>
        <p>
         Since the document “&lt;a&gt;&lt;/p&gt;” is invalid, none of these techniques is
the “correct” way to handle it. The html5lib parser uses techniques
that are part of the HTML5 standard, so it has the best claim on being
the “correct” way, but all three techniques are legitimate.
        </p>
        <p>
         Differences between parsers can affect your script. If you’re planning
on distributing your script to other people, or running it on multiple
machines, you should specify a parser in the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         constructor. That will reduce the chances that your users parse a
document differently from the way you parse it.
        </p>
       </div>
      </div>
      <div class="section" id="encodings">
       <h1>
        Encodings
        <a class="headerlink" href="#encodings" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Any HTML or XML document is written in a specific encoding like ASCII
or UTF-8.  But when you load that document into Beautiful Soup, you’ll
discover it’s been converted to Unicode:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">markup</span> <span class="o">=</span> <span class="n">b</span><span class="s2">"&lt;h1&gt;Sacr</span><span class="se">\xc3\xa9</span><span class="s2"> bleu!&lt;/h1&gt;"</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">h1</span>
<span class="c1"># &lt;h1&gt;Sacré bleu!&lt;/h1&gt;</span>
<span class="n">soup</span><span class="o">.</span><span class="n">h1</span><span class="o">.</span><span class="n">string</span>
<span class="c1"># u'Sacr\xe9 bleu!'</span>
</pre>
        </div>
       </div>
       <p>
        It’s not magic. (That sure would be nice.) Beautiful Soup uses a
sub-library called
        <a class="reference internal" href="#unicode-dammit">
         Unicode, Dammit
        </a>
        to detect a document’s encoding
and convert it to Unicode. The autodetected encoding is available as
the
        <code class="docutils literal">
         <span class="pre">
          .original_encoding
         </span>
        </code>
        attribute of the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        object:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">soup</span><span class="o">.</span><span class="n">original_encoding</span>
<span class="c1"># 'utf-8'</span>
</pre>
        </div>
       </div>
       <p>
        Unicode, Dammit guesses correctly most of the time, but sometimes it
makes mistakes. Sometimes it guesses correctly, but only after a
byte-by-byte search of the document that takes a very long time. If
you happen to know a document’s encoding ahead of time, you can avoid
mistakes and delays by passing it to the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        constructor
as
        <code class="docutils literal">
         <span class="pre">
          from_encoding
         </span>
        </code>
        .
       </p>
       <p>
        Here’s a document written in ISO-8859-8. The document is so short that
Unicode, Dammit can’t get a good lock on it, and misidentifies it as
ISO-8859-7:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre>markup = b"&lt;h1&gt;\xed\xe5\xec\xf9&lt;/h1&gt;"
soup = BeautifulSoup(markup)
soup.h1
# &lt;h1&gt;νεμω&lt;/h1&gt;
soup.original_encoding
# 'ISO-8859-7'
</pre>
        </div>
       </div>
       <p>
        We can fix this by passing in the correct
        <code class="docutils literal">
         <span class="pre">
          from_encoding
         </span>
        </code>
        :
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre>soup = BeautifulSoup(markup, from_encoding="iso-8859-8")
soup.h1
# &lt;h1&gt;םולש&lt;/h1&gt;
soup.original_encoding
# 'iso8859-8'
</pre>
        </div>
       </div>
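       <p>
        The ambiguity is easy to reproduce with Python’s own codecs, because the same four bytes are valid text in both encodings. This small sketch uses only <code class="docutils literal"><span class="pre">bytes.decode()</span></code>, and the variable names are illustrative:
       </p>

```python
# The four bytes between the h1 tags in the markup above.
raw = b"\xed\xe5\xec\xf9"

as_greek = raw.decode("iso-8859-7")   # the wrong guess
as_hebrew = raw.decode("iso-8859-8")  # the intended encoding

print(as_greek)
print(as_hebrew)
```

       <p>
        Neither call raises an error, so nothing in so short a sample rules out the wrong reading.
       </p>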
       <p>
        If you don’t know what the correct encoding is, but you know that
Unicode, Dammit is guessing wrong, you can pass the wrong guesses in
as
        <code class="docutils literal">
         <span class="pre">
          exclude_encodings
         </span>
        </code>
        :
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre>soup = BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])
soup.h1
# &lt;h1&gt;םולש&lt;/h1&gt;
soup.original_encoding
# 'WINDOWS-1255'
</pre>
        </div>
       </div>
       <p>
        Windows-1255 isn’t 100% correct, but that encoding is a compatible
superset of ISO-8859-8, so it’s close enough. (
        <code class="docutils literal">
         <span class="pre">
          exclude_encodings
         </span>
        </code>
        is a new feature in Beautiful Soup 4.4.0.)
       </p>
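       <p>
        The “compatible superset” claim can be spot-checked with Python’s built-in codecs. The sketch below compares only the byte range 0xE0 to 0xFA, where ISO-8859-8 places the Hebrew letters. It does not test every byte:
       </p>

```python
# Bytes 0xE0..0xFA hold the Hebrew letters in ISO-8859-8.
hebrew_range = bytes(range(0xE0, 0xFB))

via_iso = hebrew_range.decode("iso-8859-8")
via_windows = hebrew_range.decode("cp1255")  # Python's name for Windows-1255

print(via_iso == via_windows)
```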
       <p>
        In rare cases (usually when a UTF-8 document contains text written in
a completely different encoding), the only way to get Unicode may be
to replace some characters with the special Unicode character
“REPLACEMENT CHARACTER” (U+FFFD, �). If Unicode, Dammit needs to do
this, it will set the
        <code class="docutils literal">
         <span class="pre">
          .contains_replacement_characters
         </span>
        </code>
        attribute
to
        <code class="docutils literal">
         <span class="pre">
          True
         </span>
        </code>
        on the
        <code class="docutils literal">
         <span class="pre">
          UnicodeDammit
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        object. This
lets you know that the Unicode representation is not an exact
representation of the original: some data was lost. If a document
contains �, but
        <code class="docutils literal">
         <span class="pre">
          .contains_replacement_characters
         </span>
        </code>
        is
        <code class="docutils literal">
         <span class="pre">
          False
         </span>
        </code>
        ,
you’ll know that the � was there originally (as it is in this
paragraph) and doesn’t stand in for missing data.
       </p>
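       <p>
        Python’s own decoder shows the same kind of loss. This sketch uses the standard <code class="docutils literal"><span class="pre">errors="replace"</span></code> handler rather than Unicode, Dammit. The byte string is a Latin-1 fragment that is deliberately decoded as UTF-8:
       </p>

```python
# Latin-1 bytes: 0xE9 is e-acute there, but it is not valid UTF-8 here.
latin1_bytes = b"Sacr\xe9 bleu!"

decoded = latin1_bytes.decode("utf-8", errors="replace")

# The undecodable byte has become U+FFFD, so the original letter is lost.
print("\ufffd" in decoded)
```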
       <div class="section" id="output-encoding">
        <h2>
         Output encoding
         <a class="headerlink" href="#output-encoding" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         When you write out a document from Beautiful Soup, you get a UTF-8
document, even if the document wasn’t in UTF-8 to begin with. Here’s a
document written in the Latin-1 encoding:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="n">b</span><span class="s1">'''</span>
<span class="s1"> &lt;html&gt;</span>
<span class="s1">  &lt;head&gt;</span>
<span class="s1">   &lt;meta content="text/html; charset=ISO-Latin-1" http-equiv="Content-type" /&gt;</span>
<span class="s1">  &lt;/head&gt;</span>
<span class="s1">  &lt;body&gt;</span>
<span class="s1">   &lt;p&gt;Sacr</span><span class="se">\xe9</span><span class="s1"> bleu!&lt;/p&gt;</span>
<span class="s1">  &lt;/body&gt;</span>
<span class="s1"> &lt;/html&gt;</span>
<span class="s1">'''</span>

<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;head&gt;</span>
<span class="c1">#   &lt;meta content="text/html; charset=utf-8" http-equiv="Content-type" /&gt;</span>
<span class="c1">#  &lt;/head&gt;</span>
<span class="c1">#  &lt;body&gt;</span>
<span class="c1">#   &lt;p&gt;</span>
<span class="c1">#    Sacré bleu!</span>
<span class="c1">#   &lt;/p&gt;</span>
<span class="c1">#  &lt;/body&gt;</span>
<span class="c1"># &lt;/html&gt;</span>
</pre>
         </div>
        </div>
        <p>
         Note that the &lt;meta&gt; tag has been rewritten to reflect the fact that
the document is now in UTF-8.
        </p>
        <p>
         If you don’t want UTF-8, you can pass an encoding into
         <code class="docutils literal">
          <span class="pre">
           prettify()
          </span>
         </code>
         :
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">prettify</span><span class="p">(</span><span class="s2">"latin-1"</span><span class="p">))</span>
<span class="c1"># &lt;html&gt;</span>
<span class="c1">#  &lt;head&gt;</span>
<span class="c1">#   &lt;meta content="text/html; charset=latin-1" http-equiv="Content-type" /&gt;</span>
<span class="c1"># ...</span>
</pre>
         </div>
        </div>
        <p>
         You can also call encode() on the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         object, or any
element in the soup, just as if it were a Python string:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span><span class="o">.</span><span class="n">p</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"latin-1"</span><span class="p">)</span>
<span class="c1"># '&lt;p&gt;Sacr\xe9 bleu!&lt;/p&gt;'</span>

<span class="n">soup</span><span class="o">.</span><span class="n">p</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">)</span>
<span class="c1"># '&lt;p&gt;Sacr\xc3\xa9 bleu!&lt;/p&gt;'</span>
</pre>
         </div>
        </div>
        <p>
         Any characters that can’t be represented in your chosen encoding will
be converted into numeric XML entity references. Here’s a document
that includes the Unicode character SNOWMAN:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">markup</span> <span class="o">=</span> <span class="s2">u"&lt;b&gt;</span><span class="se">\N{SNOWMAN}</span><span class="s2">&lt;/b&gt;"</span>
<span class="n">snowman_soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">)</span>
<span class="n">tag</span> <span class="o">=</span> <span class="n">snowman_soup</span><span class="o">.</span><span class="n">b</span>
</pre>
         </div>
        </div>
        <p>
         The SNOWMAN character can be part of a UTF-8 document (it looks like
☃), but there’s no representation for that character in ISO-Latin-1 or
ASCII, so it’s converted into “&amp;#9731;” for those encodings:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">))</span>
<span class="c1"># &lt;b&gt;☃&lt;/b&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"latin-1"</span><span class="p">))</span>
<span class="c1"># &lt;b&gt;&amp;#9731;&lt;/b&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">tag</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"ascii"</span><span class="p">))</span>
<span class="c1"># &lt;b&gt;&amp;#9731;&lt;/b&gt;</span>
</pre>
         </div>
        </div>
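        <p>
         For comparison, Python’s own codecs offer the same kind of fallback
through the
         <code class="docutils literal">
          <span class="pre">
           xmlcharrefreplace
          </span>
         </code>
         error handler. The sketch below assumes Python 3 and uses only the
standard library; it illustrates the substitution itself, not how
Beautiful Soup implements it:
        </p>

```python
# Standard-library illustration only; Beautiful Soup is not involved here.
snowman = "\N{SNOWMAN}"

# UTF-8 can represent the character directly.
utf8_bytes = snowman.encode("utf-8")

# ASCII and Latin-1 cannot, so ask the codec to substitute a numeric
# XML character reference instead of raising UnicodeEncodeError.
ascii_bytes = snowman.encode("ascii", errors="xmlcharrefreplace")
latin1_bytes = snowman.encode("latin-1", errors="xmlcharrefreplace")

print(utf8_bytes)
print(ascii_bytes)
print(latin1_bytes)
```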
       </div>
       <div class="section" id="unicode-dammit">
        <h2>
         Unicode, Dammit
         <a class="headerlink" href="#unicode-dammit" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         You can use Unicode, Dammit without using Beautiful Soup. It’s useful
whenever you have data in an unknown encoding and you just want it to
become Unicode:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">UnicodeDammit</span>
<span class="n">dammit</span> <span class="o">=</span> <span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">b</span><span class="s2">"Sacr</span><span class="se">\xc3\xa9</span><span class="s2"> bleu!"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">dammit</span><span class="o">.</span><span class="n">unicode_markup</span><span class="p">)</span>
<span class="c1"># Sacré bleu!</span>
<span class="n">dammit</span><span class="o">.</span><span class="n">original_encoding</span>
<span class="c1"># 'utf-8'</span>
</pre>
         </div>
        </div>
        <p>
         Unicode, Dammit’s guesses will get a lot more accurate if you install
the
         <code class="docutils literal">
          <span class="pre">
           chardet
          </span>
         </code>
         or
         <code class="docutils literal">
          <span class="pre">
           cchardet
          </span>
         </code>
         Python libraries. The more data you
give Unicode, Dammit, the more accurately it will guess. If you have
your own suspicions as to what the encoding might be, you can pass
them in as a list:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">dammit</span> <span class="o">=</span> <span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">b</span><span class="s2">"Sacr</span><span class="se">\xe9</span><span class="s2"> bleu!"</span><span class="p">,</span> <span class="p">[</span><span class="s2">"latin-1"</span><span class="p">,</span> <span class="s2">"iso-8859-1"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="n">dammit</span><span class="o">.</span><span class="n">unicode_markup</span><span class="p">)</span>
<span class="c1"># Sacré bleu!</span>
<span class="n">dammit</span><span class="o">.</span><span class="n">original_encoding</span>
<span class="c1"># 'latin-1'</span>
</pre>
         </div>
        </div>
        <p>
         Unicode, Dammit has two special features that Beautiful Soup doesn’t
use.
        </p>
        <div class="section" id="smart-quotes">
         <h3>
          Smart quotes
          <a class="headerlink" href="#smart-quotes" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          You can use Unicode, Dammit to convert Microsoft smart quotes to HTML or XML
entities:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">markup</span> <span class="o">=</span> <span class="n">b</span><span class="s2">"&lt;p&gt;I just </span><span class="se">\x93</span><span class="s2">love</span><span class="se">\x94</span><span class="s2"> Microsoft Word</span><span class="se">\x92</span><span class="s2">s smart quotes&lt;/p&gt;"</span>

<span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="p">[</span><span class="s2">"windows-1252"</span><span class="p">],</span> <span class="n">smart_quotes_to</span><span class="o">=</span><span class="s2">"html"</span><span class="p">)</span><span class="o">.</span><span class="n">unicode_markup</span>
<span class="c1"># u'&lt;p&gt;I just &amp;ldquo;love&amp;rdquo; Microsoft Word&amp;rsquo;s smart quotes&lt;/p&gt;'</span>

<span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="p">[</span><span class="s2">"windows-1252"</span><span class="p">],</span> <span class="n">smart_quotes_to</span><span class="o">=</span><span class="s2">"xml"</span><span class="p">)</span><span class="o">.</span><span class="n">unicode_markup</span>
<span class="c1"># u'&lt;p&gt;I just &amp;#x201C;love&amp;#x201D; Microsoft Word&amp;#x2019;s smart quotes&lt;/p&gt;'</span>
</pre>
          </div>
         </div>
         <p>
          You can also convert Microsoft smart quotes to ASCII quotes:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="p">[</span><span class="s2">"windows-1252"</span><span class="p">],</span> <span class="n">smart_quotes_to</span><span class="o">=</span><span class="s2">"ascii"</span><span class="p">)</span><span class="o">.</span><span class="n">unicode_markup</span>
<span class="c1"># u'&lt;p&gt;I just "love" Microsoft Word\'s smart quotes&lt;/p&gt;'</span>
</pre>
          </div>
         </div>
         <p>
          Hopefully you’ll find this feature useful, but Beautiful Soup doesn’t
use it. Beautiful Soup prefers the default behavior, which is to
convert Microsoft smart quotes to Unicode characters along with
everything else:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="p">[</span><span class="s2">"windows-1252"</span><span class="p">])</span><span class="o">.</span><span class="n">unicode_markup</span>
<span class="c1"># u'&lt;p&gt;I just \u201clove\u201d Microsoft Word\u2019s smart quotes&lt;/p&gt;'</span>
</pre>
          </div>
         </div>
        </div>
        <div class="section" id="inconsistent-encodings">
         <h3>
          Inconsistent encodings
          <a class="headerlink" href="#inconsistent-encodings" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          Sometimes a document is mostly in UTF-8, but contains Windows-1252
characters such as (again) Microsoft smart quotes. This can happen
when a website includes data from multiple sources. You can use
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          to turn such a document into pure
UTF-8. Here’s a simple example:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">snowmen</span> <span class="o">=</span> <span class="p">(</span><span class="s2">u"</span><span class="se">\N{SNOWMAN}</span><span class="s2">"</span> <span class="o">*</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">quote</span> <span class="o">=</span> <span class="p">(</span><span class="s2">u"</span><span class="se">\N{LEFT DOUBLE QUOTATION MARK}</span><span class="s2">I like snowmen!</span><span class="se">\N{RIGHT DOUBLE QUOTATION MARK}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">snowmen</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf8"</span><span class="p">)</span> <span class="o">+</span> <span class="n">quote</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"windows_1252"</span><span class="p">)</span>
</pre>
          </div>
         </div>
         <p>
          This document is a mess. The snowmen are in UTF-8 and the quotes are
in Windows-1252. You can display the snowmen or the quotes, but not
both:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">print</span><span class="p">(</span><span class="n">doc</span><span class="p">)</span>
<span class="c1"># ☃☃☃�I like snowmen!�</span>

<span class="k">print</span><span class="p">(</span><span class="n">doc</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s2">"windows-1252"</span><span class="p">))</span>
<span class="c1"># â˜ƒâ˜ƒâ˜ƒ“I like snowmen!”</span>
</pre>
          </div>
         </div>
         <p>
          Decoding the document as UTF-8 raises a
          <code class="docutils literal">
           <span class="pre">
            UnicodeDecodeError
           </span>
          </code>
          , and
decoding it as Windows-1252 gives you gibberish. Fortunately,
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          will convert the string to pure UTF-8,
allowing you to decode it to Unicode and display the snowmen and quote
marks simultaneously:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="n">new_doc</span> <span class="o">=</span> <span class="n">UnicodeDammit</span><span class="o">.</span><span class="n">detwingle</span><span class="p">(</span><span class="n">doc</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">new_doc</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s2">"utf8"</span><span class="p">))</span>
<span class="c1"># ☃☃☃“I like snowmen!”</span>
</pre>
          </div>
         </div>
         <p>
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          only knows how to handle Windows-1252
embedded in UTF-8 (or vice versa, I suppose), but this is the most
common case.
         </p>
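         <p>
          To make the idea concrete, here’s a rough standard-library sketch
(assuming Python 3) that decodes a document as UTF-8 and falls back to
Windows-1252 for any byte that isn’t valid UTF-8. This is not how
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          is implemented, and it returns decoded text rather than UTF-8
bytes; the handler name is made up for this example:
         </p>

```python
import codecs


def windows_1252_fallback(error):
    """Decode bytes that are not valid UTF-8 as Windows-1252 instead."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad_bytes = error.object[error.start:error.end]
    # Return the replacement text and the position where decoding resumes.
    return bad_bytes.decode("windows-1252"), error.end


# "windows_1252_fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("windows_1252_fallback", windows_1252_fallback)

snowmen = "\N{SNOWMAN}" * 3
quote = ("\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!"
         "\N{RIGHT DOUBLE QUOTATION MARK}")
doc = snowmen.encode("utf8") + quote.encode("windows-1252")

text = doc.decode("utf-8", errors="windows_1252_fallback")
print(text)
```

         <p>
          A byte that is undefined in Windows-1252 would still raise an
error here, so treat this as an illustration rather than a replacement.
         </p>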
         <p>
          Note that you must know to call
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          on your
data before passing it into
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          or the
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit
           </span>
          </code>
          constructor. Beautiful Soup assumes that a document has a single
encoding, whatever it might be. If you pass it a document that
contains both UTF-8 and Windows-1252, it’s likely to think the whole
document is Windows-1252, and the document will come out looking like
          <code class="docutils literal">
           <span class="pre">
            â˜ƒâ˜ƒâ˜ƒ“I
           </span>
           <span class="pre">
            like
           </span>
           <span class="pre">
            snowmen!”
           </span>
          </code>
          .
         </p>
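         <p>
          In practice that means cleaning the bytes first and parsing
second. Here’s a minimal sketch of that order of operations, assuming
Python 3 and the built-in parser; the &lt;p&gt; wrapper is only there to give
the parser something to find:
         </p>

```python
from bs4 import BeautifulSoup, UnicodeDammit

snowmen = "\N{SNOWMAN}" * 3
quote = ("\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!"
         "\N{RIGHT DOUBLE QUOTATION MARK}")

# A mixed-encoding document: UTF-8 snowmen, Windows-1252 quotes.
doc = b"<p>" + snowmen.encode("utf8") + quote.encode("windows-1252") + b"</p>"

# Step 1: turn the mixed bytes into pure UTF-8.
clean = UnicodeDammit.detwingle(doc)

# Step 2: only now hand the data to Beautiful Soup.
soup = BeautifulSoup(clean, "html.parser", from_encoding="utf-8")
text = soup.p.get_text()
print(text)
```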
         <p>
          <code class="docutils literal">
           <span class="pre">
            UnicodeDammit.detwingle()
           </span>
          </code>
          is new in Beautiful Soup 4.1.0.
         </p>
        </div>
       </div>
      </div>
      <div class="section" id="comparing-objects-for-equality">
       <h1>
        Comparing objects for equality
        <a class="headerlink" href="#comparing-objects-for-equality" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Beautiful Soup says that two
        <code class="docutils literal">
         <span class="pre">
          NavigableString
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          Tag
         </span>
        </code>
        objects
are equal when they represent the same HTML or XML markup. In this
example, the two &lt;b&gt; tags are treated as equal, even though they live
in different parts of the object tree, because they both look like
“&lt;b&gt;pizza&lt;/b&gt;”:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="n">markup</span> <span class="o">=</span> <span class="s2">"&lt;p&gt;I want &lt;b&gt;pizza&lt;/b&gt; and more &lt;b&gt;pizza&lt;/b&gt;!&lt;/p&gt;"</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
<span class="n">first_b</span><span class="p">,</span> <span class="n">second_b</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s1">'b'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">first_b</span> <span class="o">==</span> <span class="n">second_b</span><span class="p">)</span>
<span class="c1"># True</span>

<span class="k">print</span><span class="p">(</span><span class="n">first_b</span><span class="o">.</span><span class="n">previous_element</span> <span class="o">==</span> <span class="n">second_b</span><span class="o">.</span><span class="n">previous_element</span><span class="p">)</span>
<span class="c1"># False</span>
</pre>
        </div>
       </div>
       <p>
        If you want to see whether two variables refer to exactly the same
object, use
        <code class="docutils literal">
         <span class="pre">
          is
         </span>
        </code>
        :
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="k">print</span><span class="p">(</span><span class="n">first_b</span> <span class="ow">is</span> <span class="n">second_b</span><span class="p">)</span>
<span class="c1"># False</span>
</pre>
        </div>
       </div>
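       <p>
        Here’s a slightly larger sketch of the same rule, assuming Python 3
and the built-in parser. The markup and variable names are invented for
the example. Two tags with identical markup compare equal, a tag with an
extra attribute does not, and none of them are the same object:
       </p>

```python
from bs4 import BeautifulSoup

markup = ('<p><b>pizza</b> <b>pizza</b> '
          '<b class="hot">pizza</b> <i>pizza</i></p>')
soup = BeautifulSoup(markup, "html.parser")
plain_1, plain_2, styled = soup.find_all("b")

# Same tag name, same attributes, same contents: equal.
same_markup = (plain_1 == plain_2)

# The class attribute makes the markup differ: not equal.
different_attrs = (plain_1 == styled)

# NavigableString objects compare by their text.
same_text = (plain_1.string == soup.i.string)

# Equality is not identity.
same_object = (plain_1 is plain_2)

print(same_markup, different_attrs, same_text, same_object)
```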
      </div>
      <div class="section" id="copying-beautiful-soup-objects">
       <h1>
        Copying Beautiful Soup objects
        <a class="headerlink" href="#copying-beautiful-soup-objects" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        You can use
        <code class="docutils literal">
         <span class="pre">
          copy.copy()
         </span>
        </code>
        to create a copy of any
        <code class="docutils literal">
         <span class="pre">
          Tag
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          NavigableString
         </span>
        </code>
        :
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="kn">import</span> <span class="nn">copy</span>
<span class="n">p_copy</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">p</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">p_copy</span><span class="p">)</span>
<span class="c1"># &lt;p&gt;I want &lt;b&gt;pizza&lt;/b&gt; and more &lt;b&gt;pizza&lt;/b&gt;!&lt;/p&gt;</span>
</pre>
        </div>
       </div>
       <p>
        The copy is considered equal to the original, since it represents the
same markup as the original, but it’s not the same object:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">p</span> <span class="o">==</span> <span class="n">p_copy</span><span class="p">)</span>
<span class="c1"># True</span>

<span class="k">print</span><span class="p">(</span><span class="n">soup</span><span class="o">.</span><span class="n">p</span> <span class="ow">is</span> <span class="n">p_copy</span><span class="p">)</span>
<span class="c1"># False</span>
</pre>
        </div>
       </div>
       <p>
        The only real difference is that the copy is completely detached from
the original Beautiful Soup object tree, just as if
        <code class="docutils literal">
         <span class="pre">
          extract()
         </span>
        </code>
        had
been called on it:
       </p>
       <div class="highlight-python">
        <div class="highlight">
         <pre><span class="k">print</span><span class="p">(</span><span class="n">p_copy</span><span class="o">.</span><span class="n">parent</span><span class="p">)</span>
<span class="c1"># None</span>
</pre>
        </div>
       </div>
       <p>
        This is because two different
        <code class="docutils literal">
         <span class="pre">
          Tag
         </span>
        </code>
        objects can’t occupy the same
space at the same time.
       </p>
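       <p>
        One practical consequence, sketched below under the assumption of
Python 3 and the built-in parser: because the copy lives outside the
original tree, you can edit it without touching the original. The markup
here is a shortened version of the pizza example:
       </p>

```python
import copy

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I want <b>pizza</b>!</p>", "html.parser")
p_copy = copy.copy(soup.p)

# Change the copy only.
p_copy.b.string = "pasta"

original_markup = str(soup.p)
copied_markup = str(p_copy)

print(original_markup)
print(copied_markup)
print(p_copy.parent)
```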
      </div>
      <div class="section" id="parsing-only-part-of-a-document">
       <h1>
        Parsing only part of a document
        <a class="headerlink" href="#parsing-only-part-of-a-document" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Let’s say you want to use Beautiful Soup to look at a document’s &lt;a&gt;
tags. It’s a waste of time and memory to parse the entire document and
then go over it again looking for &lt;a&gt; tags. It would be much faster to
ignore everything that wasn’t an &lt;a&gt; tag in the first place. The
        <code class="docutils literal">
         <span class="pre">
          SoupStrainer
         </span>
        </code>
        class allows you to choose which parts of an incoming
document are parsed. You just create a
        <code class="docutils literal">
         <span class="pre">
          SoupStrainer
         </span>
        </code>
        and pass it in
to the
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        constructor as the
        <code class="docutils literal">
         <span class="pre">
          parse_only
         </span>
        </code>
        argument.
       </p>
       <p>
        (Note that
        <em>
         this feature won’t work if you’re using the html5lib parser
        </em>
        .
If you use html5lib, the whole document will be parsed, no
matter what. This is because html5lib constantly rearranges the parse
tree as it works, and if some part of the document didn’t actually
make it into the parse tree, it’ll crash. To avoid confusion, in the
examples below I’ll be forcing Beautiful Soup to use Python’s
built-in parser.)
       </p>
       <div class="section" id="soupstrainer">
        <h2>
         <code class="docutils literal">
          <span class="pre">
           SoupStrainer
          </span>
         </code>
         <a class="headerlink" href="#soupstrainer" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         The
         <code class="docutils literal">
          <span class="pre">
           SoupStrainer
          </span>
         </code>
         class takes the same arguments as a typical
method from
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         :
         <a class="reference internal" href="#id11">
          <span>
           name
          </span>
         </a>
         ,
         <a class="reference internal" href="#attrs">
          <span>
           attrs
          </span>
         </a>
         ,
         <a class="reference internal" href="#id12">
          <span>
           string
          </span>
         </a>
         , and
         <a class="reference internal" href="#kwargs">
          <span>
           **kwargs
          </span>
         </a>
         . Here are
three
         <code class="docutils literal">
          <span class="pre">
           SoupStrainer
          </span>
         </code>
         objects:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">SoupStrainer</span>

<span class="n">only_a_tags</span> <span class="o">=</span> <span class="n">SoupStrainer</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>

<span class="n">only_tags_with_id_link2</span> <span class="o">=</span> <span class="n">SoupStrainer</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s2">"link2"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">is_short_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">10</span>

<span class="n">only_short_strings</span> <span class="o">=</span> <span class="n">SoupStrainer</span><span class="p">(</span><span class="n">string</span><span class="o">=</span><span class="n">is_short_string</span><span class="p">)</span>
</pre>
         </div>
        </div>
        <p>
         I’m going to bring back the “three sisters” document one more time,
and we’ll see what the document looks like when it’s parsed with these
three
         <code class="docutils literal">
          <span class="pre">
           SoupStrainer
          </span>
         </code>
         objects:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">html_doc</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span>
<span class="s2">&lt;body&gt;</span>
<span class="s2">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span>
<span class="s2">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;,</span>
<span class="s2">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span>
<span class="s2">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span>
<span class="s2">and they lived at the bottom of a well.&lt;/p&gt;</span>

<span class="s2">&lt;p class="story"&gt;...&lt;/p&gt;</span>
<span class="s2">"""</span>

<span class="k">print</span><span class="p">(</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s2">"html.parser"</span><span class="p">,</span> <span class="n">parse_only</span><span class="o">=</span><span class="n">only_a_tags</span><span class="p">)</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span>
<span class="c1">#  Elsie</span>
<span class="c1"># &lt;/a&gt;</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span>
<span class="c1">#  Lacie</span>
<span class="c1"># &lt;/a&gt;</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span>
<span class="c1">#  Tillie</span>
<span class="c1"># &lt;/a&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s2">"html.parser"</span><span class="p">,</span> <span class="n">parse_only</span><span class="o">=</span><span class="n">only_tags_with_id_link2</span><span class="p">)</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span>
<span class="c1">#  Lacie</span>
<span class="c1"># &lt;/a&gt;</span>

<span class="k">print</span><span class="p">(</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">,</span> <span class="s2">"html.parser"</span><span class="p">,</span> <span class="n">parse_only</span><span class="o">=</span><span class="n">only_short_strings</span><span class="p">)</span><span class="o">.</span><span class="n">prettify</span><span class="p">())</span>
<span class="c1"># Elsie</span>
<span class="c1"># ,</span>
<span class="c1"># Lacie</span>
<span class="c1"># and</span>
<span class="c1"># Tillie</span>
<span class="c1"># ...</span>
<span class="c1">#</span>
</pre>
         </div>
        </div>
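        <p>
         A typical use, sketched here with Python 3 and a shortened copy of
the “three sisters” markup, is pulling every link out of a document
without building the rest of the tree. Anything the strainer rejects
never makes it into the soup, so searching for it afterwards finds
nothing:
        </p>

```python
from bs4 import BeautifulSoup, SoupStrainer

html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

only_a_tags = SoupStrainer("a")
soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags)

# Only the three links were parsed.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)

# The enclosing paragraph was never added to the tree.
paragraph = soup.find("p")
print(paragraph)
```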
        <p>
         You can also pass a
         <code class="docutils literal">
          <span class="pre">
           SoupStrainer
          </span>
         </code>
         into any of the methods covered
in
         <a class="reference internal" href="#searching-the-tree">
          Searching the tree
         </a>
         . This probably isn’t terribly useful, but I
thought I’d mention it:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html_doc</span><span class="p">)</span>
<span class="n">soup</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="n">only_short_strings</span><span class="p">)</span>
<span class="c1"># [u'\n\n', u'\n\n', u'Elsie', u',\n', u'Lacie', u' and\n', u'Tillie',</span>
<span class="c1">#  u'\n\n', u'...', u'\n']</span>
</pre>
         </div>
        </div>
       </div>
      </div>
      <div class="section" id="troubleshooting">
       <h1>
        Troubleshooting
        <a class="headerlink" href="#troubleshooting" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <div class="section" id="diagnose">
        <span id="id15">
        </span>
        <h2>
         <code class="docutils literal">
          <span class="pre">
           diagnose()
          </span>
         </code>
         <a class="headerlink" href="#diagnose" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         If you’re having trouble understanding what Beautiful Soup does to a
document, pass the document into the
         <code class="docutils literal">
          <span class="pre">
           diagnose()
          </span>
         </code>
         function. (New in
Beautiful Soup 4.2.0.)  Beautiful Soup will print out a report showing
you how different parsers handle the document, and tell you if you’re
missing a parser that Beautiful Soup could be using:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4.diagnose</span> <span class="kn">import</span> <span class="n">diagnose</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"bad.html"</span><span class="p">)</span> <span class="k">as</span> <span class="n">fp</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">fp</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">diagnose</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

<span class="c1"># Diagnostic running on Beautiful Soup 4.2.0</span>
<span class="c1"># Python version 2.7.3 (default, Aug  1 2012, 05:16:07)</span>
<span class="c1"># I noticed that html5lib is not installed. Installing it may help.</span>
<span class="c1"># Found lxml version 2.3.2.0</span>
<span class="c1">#</span>
<span class="c1"># Trying to parse your data with html.parser</span>
<span class="c1"># Here's what html.parser did with the document:</span>
<span class="c1"># ...</span>
</pre>
         </div>
        </div>
        <p>
         Just looking at the output of
         <code class="docutils literal">
          <span class="pre">
           diagnose()
          </span>
         </code>
         may show you how to solve the
problem. Even if not, you can paste the output of
         <code class="docutils literal">
          <span class="pre">
           diagnose()
          </span>
         </code>
         when
asking for help.
        </p>
       </div>
       <div class="section" id="errors-when-parsing-a-document">
        <h2>
         Errors when parsing a document
         <a class="headerlink" href="#errors-when-parsing-a-document" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         There are two different kinds of parse errors. There are crashes,
where you feed a document to Beautiful Soup and it raises an
exception, usually an
         <code class="docutils literal">
          <span class="pre">
           HTMLParser.HTMLParseError
          </span>
         </code>
         . And there is
unexpected behavior, where a Beautiful Soup parse tree looks a lot
different from the document used to create it.
        </p>
        <p>
         Almost none of these problems turn out to be problems with Beautiful
Soup. This is not because Beautiful Soup is an amazingly well-written
piece of software. It’s because Beautiful Soup doesn’t include any
parsing code. Instead, it relies on external parsers. If one parser
isn’t working on a certain document, the best solution is to try a
different parser. See
         <a class="reference internal" href="#installing-a-parser">
          Installing a parser
         </a>
         for details and a parser
comparison.
        </p>
        <p>
         The most common parse errors are
         <code class="docutils literal">
          <span class="pre">
           HTMLParser.HTMLParseError:
          </span>
          <span class="pre">
           malformed
          </span>
          <span class="pre">
           start
          </span>
          <span class="pre">
           tag
          </span>
         </code>
         and
         <code class="docutils literal">
          <span class="pre">
           HTMLParser.HTMLParseError:
          </span>
          <span class="pre">
           bad
          </span>
          <span class="pre">
           end
          </span>
          <span class="pre">
           tag
          </span>
         </code>
         . These are both generated by Python’s built-in HTML parser
library, and the solution is to
         <a class="reference internal" href="#parser-installation">
          <span>
           install lxml or
html5lib.
          </span>
         </a>
        </p>
        <p>
         The most common type of unexpected behavior is that you can’t find a
tag that you know is in the document. You saw it going in, but
         <code class="docutils literal">
          <span class="pre">
           find_all()
          </span>
         </code>
         returns
         <code class="docutils literal">
          <span class="pre">
           []
          </span>
         </code>
         or
         <code class="docutils literal">
          <span class="pre">
           find()
          </span>
         </code>
         returns
         <code class="docutils literal">
          <span class="pre">
           None
          </span>
         </code>
         . This is
another common problem with Python’s built-in HTML parser, which
sometimes skips tags it doesn’t understand.  Again, the solution is to
         <a class="reference internal" href="#parser-installation">
          <span>
           install lxml or html5lib.
          </span>
         </a>
        </p>
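        <p>
         If you’re not sure which parsers are installed, one way to compare
them is to try each in turn and skip the ones Beautiful Soup can’t
find. This is only a sketch, assuming Python 3; the helper name
         <code class="docutils literal">
          <span class="pre">
           parse_with_each()
          </span>
         </code>
         and the sample markup are made up for the example:
        </p>

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse markup with every listed parser that is installed.

    Hypothetical helper for this example, not part of Beautiful Soup.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not installed; skip it.
            continue
    return results


markup = "<div><p>Hello</p><a href='/x'>link</a></div>"
results = parse_with_each(markup)

# Compare how many <a> tags each available parser found.
for name, soup in results.items():
    print(name, len(soup.find_all("a")))
```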
       </div>
       <div class="section" id="version-mismatch-problems">
        <h2>
         Version mismatch problems
         <a class="headerlink" href="#version-mismatch-problems" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <ul class="simple">
         <li>
          <code class="docutils literal">
           <span class="pre">
            SyntaxError:
           </span>
           <span class="pre">
            Invalid
           </span>
           <span class="pre">
            syntax
           </span>
          </code>
          (on the line
          <code class="docutils literal">
           <span class="pre">
            ROOT_TAG_NAME
           </span>
           <span class="pre">
            =
           </span>
           <span class="pre">
            u'[document]'
           </span>
          </code>
          ): Caused by running the Python 2 version of
Beautiful Soup under Python 3, without converting the code.
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            ImportError:
           </span>
           <span class="pre">
            No
           </span>
           <span class="pre">
            module
           </span>
           <span class="pre">
            named
           </span>
           <span class="pre">
            HTMLParser
           </span>
          </code>
          - Caused by running the
Python 2 version of Beautiful Soup under Python 3.
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            ImportError:
           </span>
           <span class="pre">
            No
           </span>
           <span class="pre">
            module
           </span>
           <span class="pre">
            named
           </span>
           <span class="pre">
            html.parser
           </span>
          </code>
          - Caused by running the
Python 3 version of Beautiful Soup under Python 2.
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            ImportError:
           </span>
           <span class="pre">
            No
           </span>
           <span class="pre">
            module
           </span>
           <span class="pre">
            named
           </span>
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          - Caused by running
Beautiful Soup 3 code on a system that doesn’t have BS3
installed, or by writing Beautiful Soup 4 code without knowing that
the package name has changed to
          <code class="docutils literal">
           <span class="pre">
            bs4
           </span>
          </code>
          .
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            ImportError:
           </span>
           <span class="pre">
            No
           </span>
           <span class="pre">
            module
           </span>
           <span class="pre">
            named
           </span>
           <span class="pre">
            bs4
           </span>
          </code>
          - Caused by running Beautiful
Soup 4 code on a system that doesn’t have BS4 installed.
         </li>
        </ul>
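        <p>
         To see which of these situations applies, it can help to check which
packages the running interpreter is able to find. The following is a minimal
sketch, assuming Python 3.4 or later; the helper name
         <code class="docutils literal">
          <span class="pre">
           installed_soup_packages
          </span>
         </code>
         is made up for this illustration and is not part of Beautiful Soup.
        </p>

```python
import importlib.util


def installed_soup_packages():
    """Report whether the BS4 (bs4) and BS3 (BeautifulSoup) packages can be found."""
    names = ("bs4", "BeautifulSoup")
    # find_spec() returns None when a top-level module cannot be located,
    # so nothing is imported and no ImportError is raised here.
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_soup_packages()
print(status)
```

        <p>
         If
         <code class="docutils literal">
          <span class="pre">
           bs4
          </span>
         </code>
         is reported as missing, the import errors listed above are the expected result
of running Beautiful Soup 4 code in that environment.
        </p>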
       </div>
       <div class="section" id="parsing-xml">
        <span id="id16">
        </span>
        <h2>
         Parsing XML
         <a class="headerlink" href="#parsing-xml" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         By default, Beautiful Soup parses documents as HTML. To parse a
document as XML, pass in “xml” as the second argument to the
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         constructor:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">markup</span><span class="p">,</span> <span class="s2">"xml"</span><span class="p">)</span>
</pre>
         </div>
        </div>
        <p>
         You’ll need to
         <a class="reference internal" href="#parser-installation">
          <span>
           have lxml installed
          </span>
         </a>
         .
        </p>
       </div>
       <div class="section" id="other-parser-problems">
        <h2>
         Other parser problems
         <a class="headerlink" href="#other-parser-problems" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <ul class="simple">
         <li>
          If your script works on one computer but not another, or in one
virtual environment but not another, or outside the virtual
environment but not inside, it’s probably because the two
environments have different parser libraries available. For example,
you may have developed the script on a computer that has lxml
installed, and then tried to run it on a computer that only has
html5lib installed. See
          <a class="reference internal" href="#differences-between-parsers">
           Differences between parsers
          </a>
          for why this
matters, and fix the problem by mentioning a specific parser library
in the
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          constructor.
         </li>
         <li>
          Because
          <a class="reference external" href="http://www.w3.org/TR/html5/syntax.html#syntax">
           HTML tags and attributes are case-insensitive
          </a>
          , all three HTML
parsers convert tag and attribute names to lowercase. That is, the
markup &lt;TAG&gt;&lt;/TAG&gt; is converted to &lt;tag&gt;&lt;/tag&gt;. If you want to
preserve mixed-case or uppercase tags and attributes, you’ll need to
          <a class="reference internal" href="#parsing-xml">
           <span>
            parse the document as XML.
           </span>
          </a>
         </li>
        </ul>
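        <p>
         Both points can be illustrated with a short sketch. It assumes the built-in
         <code class="docutils literal">
          <span class="pre">
           html.parser
          </span>
         </code>
         is an acceptable choice for your markup; the sample markup is invented for
the example.
        </p>

```python
from bs4 import BeautifulSoup

markup = "<TAG Attr='1'></TAG>"

# Naming the parser explicitly means every environment uses the same one,
# instead of whichever parser library happens to be installed.
soup = BeautifulSoup(markup, "html.parser")

# The HTML parser lowercases the tag and attribute names.
lowered = str(soup)
print(lowered)
```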
       </div>
       <div class="section" id="miscellaneous">
        <span id="misc">
        </span>
        <h2>
         Miscellaneous
         <a class="headerlink" href="#miscellaneous" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <ul class="simple">
         <li>
          <code class="docutils literal">
           <span class="pre">
            UnicodeEncodeError:
           </span>
           <span class="pre">
            'charmap'
           </span>
           <span class="pre">
            codec
           </span>
           <span class="pre">
            can't
           </span>
           <span class="pre">
            encode
           </span>
           <span class="pre">
            character
           </span>
           <span class="pre">
            u'\xfoo'
           </span>
           <span class="pre">
            in
           </span>
           <span class="pre">
            position
           </span>
           <span class="pre">
            bar
           </span>
          </code>
          (or just about any other
          <code class="docutils literal">
           <span class="pre">
            UnicodeEncodeError
           </span>
          </code>
          ) - This is not a problem with Beautiful Soup.
This problem shows up in two main situations. First, when you try to
print a Unicode character that your console doesn’t know how to
display. (See
          <a class="reference external" href="http://wiki.python.org/moin/PrintFails">
           this page on the Python wiki
          </a>
          for help.) Second, when
you’re writing to a file and you pass in a Unicode character that’s
not supported by your default encoding.  In this case, the simplest
solution is to explicitly encode the Unicode string into UTF-8 with
          <code class="docutils literal">
           <span class="pre">
            u.encode("utf8")
           </span>
          </code>
          .
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            KeyError:
           </span>
           <span class="pre">
            [attr]
           </span>
          </code>
          - Caused by accessing
          <code class="docutils literal">
           <span class="pre">
            tag['attr']
           </span>
          </code>
          when the
tag in question doesn’t define the
          <code class="docutils literal">
           <span class="pre">
            attr
           </span>
          </code>
          attribute. The most
common errors are
          <code class="docutils literal">
           <span class="pre">
            KeyError:
           </span>
           <span class="pre">
            'href'
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            KeyError:
           </span>
           <span class="pre">
            'class'
           </span>
          </code>
          . Use
          <code class="docutils literal">
           <span class="pre">
            tag.get('attr')
           </span>
          </code>
          if you’re not sure
          <code class="docutils literal">
           <span class="pre">
            attr
           </span>
          </code>
          is
defined, just as you would with a Python dictionary.
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            AttributeError:
           </span>
           <span class="pre">
            'ResultSet'
           </span>
           <span class="pre">
            object
           </span>
           <span class="pre">
            has
           </span>
           <span class="pre">
            no
           </span>
           <span class="pre">
            attribute
           </span>
           <span class="pre">
            'foo'
           </span>
          </code>
          - This
usually happens because you expected
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          to return a
single tag or string. But
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          returns a
          <em>
           list
          </em>
          of tags and strings, namely a
          <code class="docutils literal">
           <span class="pre">
            ResultSet
           </span>
          </code>
          object. You need to iterate over the
list and look at the
          <code class="docutils literal">
           <span class="pre">
            .foo
           </span>
          </code>
          of each one. Or, if you really only
want one result, you need to use
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
          instead of
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
          .
         </li>
         <li>
          <code class="docutils literal">
           <span class="pre">
            AttributeError:
           </span>
           <span class="pre">
            'NoneType'
           </span>
           <span class="pre">
            object
           </span>
           <span class="pre">
            has
           </span>
           <span class="pre">
            no
           </span>
           <span class="pre">
            attribute
           </span>
           <span class="pre">
            'foo'
           </span>
          </code>
          - This
usually happens because you called
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
          and then tried to
access the
          <code class="docutils literal">
           <span class="pre">
            .foo
           </span>
          </code>
          attribute of the result. But in your case,
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
          didn’t find anything, so it returned
          <code class="docutils literal">
           <span class="pre">
            None
           </span>
          </code>
          , instead of
returning a tag or a string. You need to figure out why your
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
          call isn’t returning anything.
         </li>
        </ul>
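        <p>
         The defensive patterns described in this list might look like the sketch
below. The markup and variable names are invented for illustration, and
         <code class="docutils literal">
          <span class="pre">
           html.parser
          </span>
         </code>
         is used only because it needs no extra installation.
        </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="/one">Sacr\u00e9 bleu</a><a>Two</a>', "html.parser")

# find_all() returns a ResultSet (a list), so look at each tag in turn.
links = soup.find_all("a")

# tag.get() returns None for a missing attribute instead of raising KeyError.
hrefs = [a.get("href") for a in links]

# find() returns None when nothing matches, so check before using the result.
table = soup.find("table")
caption = table.get_text() if table is not None else ""

# Encode explicitly to UTF-8 before writing text somewhere whose default
# encoding may not support every character.
encoded = soup.get_text().encode("utf8")

print(hrefs, repr(caption), type(encoded).__name__)
```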
       </div>
       <div class="section" id="improving-performance">
        <h2>
         Improving Performance
         <a class="headerlink" href="#improving-performance" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Beautiful Soup will never be as fast as the parsers it sits on top
of. If response time is critical, if you’re paying for computer time
by the hour, or if there’s any other reason why computer time is more
valuable than programmer time, you should forget about Beautiful Soup
and work directly atop
         <a class="reference external" href="http://lxml.de/">
          lxml
         </a>
         .
        </p>
        <p>
         That said, there are things you can do to speed up Beautiful Soup. If
you’re not using lxml as the underlying parser, my advice is to
         <a class="reference internal" href="#parser-installation">
          <span>
           start using it
          </span>
         </a>
         . Beautiful Soup parses documents
significantly faster using lxml than using html.parser or html5lib.
        </p>
        <p>
         You can speed up encoding detection significantly by installing the
         <a class="reference external" href="http://pypi.python.org/pypi/cchardet/">
          cchardet
         </a>
         library.
        </p>
        <p>
         <a class="reference internal" href="#parsing-only-part-of-a-document">
          Parsing only part of a document
         </a>
         won’t save you much time parsing
the document, but it can save a lot of memory, and it’ll make
         <em>
          searching
         </em>
         the document much faster.
        </p>
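        <p>
         As a rough sketch of partial parsing, the example below keeps only the
         <code class="docutils literal">
          <span class="pre">
           &lt;a&gt;
          </span>
         </code>
         tags of a small, made-up document. It assumes
         <code class="docutils literal">
          <span class="pre">
           html.parser
          </span>
         </code>
         as the underlying parser.
        </p>

```python
from bs4 import BeautifulSoup, SoupStrainer

html = (
    "<html><body><p>Intro</p>"
    '<a href="/x">X</a>'
    '<p>More <a href="/y">Y</a></p>'
    "</body></html>"
)

# Only <a> tags (and their contents) make it into the tree.
only_a_tags = SoupStrainer("a")
soup = BeautifulSoup(html, "html.parser", parse_only=only_a_tags)

hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)
```

        <p>
         Because the resulting tree holds only the links, later searches have far
fewer nodes to visit.
        </p>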
       </div>
      </div>
      <div class="section" id="id17">
       <h1>
        Beautiful Soup 3
        <a class="headerlink" href="#id17" title="Permalink to this headline">
         ¶
        </a>
       </h1>
       <p>
        Beautiful Soup 3 is the previous release series, and is no longer
being actively developed. It’s currently packaged with all major Linux
distributions:
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          apt-get
         </span>
         <span class="pre">
          install
         </span>
         <span class="pre">
          python-beautifulsoup
         </span>
        </code>
       </p>
       <p>
        It’s also published through PyPI as
        <code class="docutils literal">
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        :
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          easy_install
         </span>
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
       </p>
       <p>
        <code class="kbd docutils literal">
         <span class="pre">
          $
         </span>
         <span class="pre">
          pip
         </span>
         <span class="pre">
          install
         </span>
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
       </p>
       <p>
        You can also
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/bs3/download/3.x/BeautifulSoup-3.2.0.tar.gz">
         download a tarball of Beautiful Soup 3.2.0
        </a>
        .
       </p>
       <p>
        If you ran
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
         <span class="pre">
          beautifulsoup
         </span>
        </code>
        or
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
         <span class="pre">
          BeautifulSoup
         </span>
        </code>
        , but your code doesn’t work, you installed Beautiful
Soup 3 by mistake. You need to run
        <code class="docutils literal">
         <span class="pre">
          easy_install
         </span>
         <span class="pre">
          beautifulsoup4
         </span>
        </code>
        .
       </p>
       <p>
        <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html">
         The documentation for Beautiful Soup 3 is archived online
        </a>
        .
       </p>
       <div class="section" id="porting-code-to-bs4">
        <h2>
         Porting code to BS4
         <a class="headerlink" href="#porting-code-to-bs4" title="Permalink to this headline">
          ¶
         </a>
        </h2>
        <p>
         Most code written against Beautiful Soup 3 will work against Beautiful
Soup 4 with one simple change. All you should have to do is change the
package name from
         <code class="docutils literal">
          <span class="pre">
           BeautifulSoup
          </span>
         </code>
         to
         <code class="docutils literal">
          <span class="pre">
           bs4
          </span>
         </code>
         . So this:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">BeautifulSoup</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
</pre>
         </div>
        </div>
        <p>
         becomes this:
        </p>
        <div class="highlight-python">
         <div class="highlight">
          <pre><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
</pre>
         </div>
        </div>
        <ul class="simple">
         <li>
          If you get the
          <code class="docutils literal">
           <span class="pre">
            ImportError
           </span>
          </code>
          “No module named BeautifulSoup”, your
problem is that you’re trying to run Beautiful Soup 3 code, but you
only have Beautiful Soup 4 installed.
         </li>
         <li>
          If you get the
          <code class="docutils literal">
           <span class="pre">
            ImportError
           </span>
          </code>
          “No module named bs4”, your problem
is that you’re trying to run Beautiful Soup 4 code, but you only
have Beautiful Soup 3 installed.
         </li>
        </ul>
        <p>
         Although BS4 is mostly backwards-compatible with BS3, most of BS3’s
methods have been deprecated and given new names for
         <a class="reference external" href="http://www.python.org/dev/peps/pep-0008/">
          PEP 8 compliance
         </a>
         . There are numerous other
renames and changes, and a few of them break backwards compatibility.
        </p>
        <p>
         Here’s what you’ll need to know to convert your BS3 code and habits to BS4:
        </p>
        <div class="section" id="you-need-a-parser">
         <h3>
          You need a parser
          <a class="headerlink" href="#you-need-a-parser" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          Beautiful Soup 3 used Python’s
          <code class="docutils literal">
           <span class="pre">
            SGMLParser
           </span>
          </code>
          , a module that was
deprecated and removed in Python 3.0. Beautiful Soup 4 uses
          <code class="docutils literal">
           <span class="pre">
            html.parser
           </span>
          </code>
          by default, but you can plug in lxml or html5lib and
use that instead. See
          <a class="reference internal" href="#installing-a-parser">
           Installing a parser
          </a>
          for a comparison.
         </p>
         <p>
          Since
          <code class="docutils literal">
           <span class="pre">
            html.parser
           </span>
          </code>
          is not the same parser as
          <code class="docutils literal">
           <span class="pre">
            SGMLParser
           </span>
          </code>
          , you
may find that Beautiful Soup 4 gives you a different parse tree than
Beautiful Soup 3 for the same markup. If you swap out
          <code class="docutils literal">
           <span class="pre">
            html.parser
           </span>
          </code>
          for lxml or html5lib, you may find that the parse tree changes yet
again. If this happens, you’ll need to update your scraping code to
deal with the new tree.
         </p>
        </div>
        <div class="section" id="method-names">
         <h3>
          Method names
          <a class="headerlink" href="#method-names" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             renderContents
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             encode_contents
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             replaceWith
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             replace_with
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             replaceWithChildren
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             unwrap
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findAll
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_all
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findAllNext
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_all_next
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findAllPrevious
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_all_previous
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findNext
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_next
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findNextSibling
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_next_sibling
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findNextSiblings
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_next_siblings
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findParent
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_parent
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findParents
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_parents
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findPrevious
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_previous
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findPreviousSibling
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_previous_sibling
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             findPreviousSiblings
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             find_previous_siblings
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             nextSibling
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             next_sibling
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             previousSibling
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             previous_sibling
            </span>
           </code>
          </li>
         </ul>
         <p>
          Some arguments to the Beautiful Soup constructor were renamed for the
same reasons:
         </p>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             BeautifulSoup(parseOnlyThese=...)
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             BeautifulSoup(parse_only=...)
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             BeautifulSoup(fromEncoding=...)
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             BeautifulSoup(from_encoding=...)
            </span>
           </code>
          </li>
         </ul>
         <p>
          I renamed one method for compatibility with Python 3:
         </p>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             Tag.has_key()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             Tag.has_attr()
            </span>
           </code>
          </li>
         </ul>
         <p>
          I renamed one attribute to use more accurate terminology:
         </p>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             Tag.isSelfClosing
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             Tag.is_empty_element
            </span>
           </code>
          </li>
         </ul>
         <p>
          I renamed three attributes to avoid using words that have special
meaning to Python. Unlike the others, these changes are
          <em>
           not backwards
compatible.
          </em>
          If you used these attributes in BS3, your code will break
on BS4 until you change them.
         </p>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             UnicodeDammit.unicode
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             UnicodeDammit.unicode_markup
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             Tag.next
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             Tag.next_element
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             Tag.previous
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             Tag.previous_element
            </span>
           </code>
          </li>
         </ul>
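         <p>
          A few of the new names in use, as a small sketch with invented markup.
The comments give the BS3 spelling from the lists above;
          <code class="docutils literal">
           <span class="pre">
            html.parser
           </span>
          </code>
          is an assumption made to keep the example dependency-free.
         </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="intro">Hello <b>world</b><br/></p>', "html.parser")
p = soup.find("p")

bold_tags = soup.find_all("b")           # BS3: soup.findAll("b")
has_id = p.has_attr("id")                # BS3: p.has_key("id")
first_inside = p.next_element            # BS3: p.next
br_is_empty = soup.br.is_empty_element   # BS3: soup.br.isSelfClosing

print(len(bold_tags), has_id, repr(first_inside), br_is_empty)
```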
        </div>
        <div class="section" id="generators">
         <h3>
          Generators
          <a class="headerlink" href="#generators" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          I gave the generators PEP 8-compliant names, and transformed them into
properties:
         </p>
         <ul class="simple">
          <li>
           <code class="docutils literal">
            <span class="pre">
             childGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             children
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             nextGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             next_elements
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             nextSiblingGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             next_siblings
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             previousGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             previous_elements
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             previousSiblingGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             previous_siblings
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             recursiveChildGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             descendants
            </span>
           </code>
          </li>
          <li>
           <code class="docutils literal">
            <span class="pre">
             parentGenerator()
            </span>
           </code>
           -&gt;
           <code class="docutils literal">
            <span class="pre">
             parents
            </span>
           </code>
          </li>
         </ul>
         <p>
          So instead of this:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">parent</span> <span class="ow">in</span> <span class="n">tag</span><span class="o">.</span><span class="n">parentGenerator</span><span class="p">():</span>
    <span class="o">...</span>
</pre>
          </div>
         </div>
         <p>
          You can write this:
         </p>
         <div class="highlight-python">
          <div class="highlight">
           <pre><span class="k">for</span> <span class="n">parent</span> <span class="ow">in</span> <span class="n">tag</span><span class="o">.</span><span class="n">parents</span><span class="p">:</span>
    <span class="o">...</span>
</pre>
          </div>
         </div>
         <p>
          (But the old code will still work.)
         </p>
         <p>
          Some of the generators used to yield
          <code class="docutils literal">
           <span class="pre">
            None
           </span>
          </code>
          after they were done, and
then stop. That was a bug. Now the generators just stop.
         </p>
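         <p>
          As a rough sketch of the renamed properties in use, the snippet below walks a small tree with
          .parents, .descendants and .next_siblings. The sample markup and variable names are
          illustrative assumptions, and the standard-library "html.parser" is chosen only to keep the
          example self-contained. Each generator simply stops when it runs out of nodes.
         </p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; any small document would do.
markup = "<html><body><p><b>bold</b> text</p></body></html>"
soup = BeautifulSoup(markup, "html.parser")
bold = soup.b

# .parents replaces parentGenerator(): walk from the tag up to the document root.
parent_names = [parent.name for parent in bold.parents]

# .descendants replaces recursiveChildGenerator(): every node below <p>, depth-first.
descendants = [str(node) for node in soup.p.descendants]

# .next_siblings replaces nextSiblingGenerator(): nodes that follow <b> inside <p>.
siblings = [str(node) for node in bold.next_siblings]

print(parent_names)
print(descendants)
print(siblings)
```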
         <p>
          There are two new generators,
          <a class="reference internal" href="#string-generators">
           <span>
            .strings and
.stripped_strings
           </span>
          </a>
          .
          <code class="docutils literal">
           <span class="pre">
            .strings
           </span>
          </code>
          yields
NavigableString objects, and
          <code class="docutils literal">
           <span class="pre">
            .stripped_strings
           </span>
          </code>
          yields Python
strings that have had whitespace stripped.
         </p>
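         <p>
          The difference between the two string generators can be sketched as follows. The markup is a
          made-up example; note that .stripped_strings also skips strings that consist only of
          whitespace.
         </p>

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Illustrative markup with leading, trailing and whitespace-only text.
soup = BeautifulSoup("<p>  Hello <b> world </b>\n</p>", "html.parser")

# .strings yields every string under the tag, whitespace included.
raw = list(soup.p.strings)

# .stripped_strings strips each string and drops the ones that end up empty.
stripped = list(soup.p.stripped_strings)

print(raw)
print(stripped)
```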
        </div>
        <div class="section" id="xml">
         <h3>
          XML
          <a class="headerlink" href="#xml" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          There is no longer a
          <code class="docutils literal">
           <span class="pre">
            BeautifulStoneSoup
           </span>
          </code>
          class for parsing XML. To
parse XML you pass in “xml” as the second argument to the
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          constructor. Because the parser is now chosen this way, the
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          constructor no longer recognizes the
          <code class="docutils literal">
           <span class="pre">
            isHTML
           </span>
          </code>
          argument.
         </p>
         <p>
          Beautiful Soup’s handling of empty-element XML tags has been
improved. Previously, when you parsed XML, you had to explicitly say
which tags were considered empty-element tags. The
          <code class="docutils literal">
           <span class="pre">
            selfClosingTags
           </span>
          </code>
          argument to the constructor is no longer recognized. Instead,
Beautiful Soup considers any empty tag to be an empty-element tag. If
you add a child to an empty-element tag, it stops being an
empty-element tag.
         </p>
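         <p>
          A minimal sketch of the empty-element behaviour described above. It assumes an XML parser
          (lxml) is installed; the helper name empty_element_demo is hypothetical, and it returns None
          when Beautiful Soup cannot find a parser for the "xml" feature.
         </p>

```python
from bs4 import BeautifulSoup, FeatureNotFound


def empty_element_demo():
    """Return (before, after) renderings of <item/>, or None without an XML parser."""
    try:
        # "xml" as the second argument selects the XML parser (requires lxml).
        soup = BeautifulSoup("<root><item/></root>", "xml")
    except FeatureNotFound:
        return None
    item = soup.item
    before = str(item)                  # empty tag, rendered as an empty-element tag
    item.append(soup.new_tag("child"))  # once it has a child it is no longer empty
    after = str(item)
    return before, after


result = empty_element_demo()
print(result)
```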
        </div>
        <div class="section" id="entities">
         <h3>
          Entities
          <a class="headerlink" href="#entities" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          An incoming HTML or XML entity is always converted into the
corresponding Unicode character. Beautiful Soup 3 had a number of
overlapping ways of dealing with entities, which have been
removed. The
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          constructor no longer recognizes the
          <code class="docutils literal">
           <span class="pre">
            smartQuotesTo
           </span>
          </code>
          or
          <code class="docutils literal">
           <span class="pre">
            convertEntities
           </span>
          </code>
          arguments. (
          <a class="reference internal" href="#unicode-dammit">
           Unicode,
Dammit
          </a>
          still has
          <code class="docutils literal">
           <span class="pre">
            smart_quotes_to
           </span>
          </code>
          , but its default is now to turn
smart quotes into Unicode.) The constants
          <code class="docutils literal">
           <span class="pre">
            HTML_ENTITIES
           </span>
          </code>
          ,
          <code class="docutils literal">
           <span class="pre">
            XML_ENTITIES
           </span>
          </code>
          , and
          <code class="docutils literal">
           <span class="pre">
            XHTML_ENTITIES
           </span>
          </code>
          have been removed, since they
configure a feature (transforming some but not all entities into
Unicode characters) that no longer exists.
         </p>
         <p>
          If you want to turn Unicode characters back into HTML entities on
output, rather than turning them into UTF-8 characters, you need to
use an
          <a class="reference internal" href="#output-formatters">
           <span>
            output formatter
           </span>
          </a>
          .
         </p>
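         <p>
          As an illustration, the sketch below parses markup containing named entities and then
          renders it twice: once with the default formatter and once with the "html" output formatter.
          The sample text is an assumption made for the example.
         </p>

```python
from bs4 import BeautifulSoup

# Named entities in the input are converted to Unicode characters while parsing.
soup = BeautifulSoup("<p>caf&eacute; cr&egrave;me</p>", "html.parser")
text = str(soup.p.string)

# The default output keeps the Unicode characters as they are.
default_output = soup.p.decode()

# The "html" formatter turns them back into named entities where one exists.
html_output = soup.p.decode(formatter="html")

print(text)
print(default_output)
print(html_output)
```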
        </div>
        <div class="section" id="id18">
         <h3>
          Miscellaneous
          <a class="headerlink" href="#id18" title="Permalink to this headline">
           ¶
          </a>
         </h3>
         <p>
          <a class="reference internal" href="#string">
           <span>
            Tag.string
           </span>
          </a>
          now operates recursively. If tag A
contains a single tag B and nothing else, then A.string is the same as
B.string. (Previously, it was None.)
         </p>
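         <p>
          A short sketch of the recursive lookup, using made-up markup. When the outer tag has more
          than one child, .string still has nothing unambiguous to return and is None.
         </p>

```python
from bs4 import BeautifulSoup

# <a> contains a single <b> and nothing else, so a.string resolves through <b>.
nested_soup = BeautifulSoup("<a><b>text</b></a>", "html.parser")
outer_string = nested_soup.a.string
inner_string = nested_soup.b.string

# Here <a> has two children (a tag and a string), so a.string is None.
mixed_soup = BeautifulSoup("<a><b>text</b> more</a>", "html.parser")
mixed_string = mixed_soup.a.string

print(outer_string, inner_string, mixed_string)
```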
         <p>
          <a class="reference internal" href="#multi-valued-attributes">
           Multi-valued attributes
          </a>
          like
          <code class="docutils literal">
           <span class="pre">
            class
           </span>
          </code>
          have lists of strings as
their values, not strings. This may affect the way you search by CSS
class.
         </p>
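         <p>
          For example, with the illustrative markup below, the class attribute comes back as a list,
          and a search by CSS class matches any one of the listed values:
         </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout">text</p>', "html.parser")

# The multi-valued attribute is a list of strings, not one space-separated string.
classes = soup.p["class"]

# Searching by a single class matches tags that carry it among other classes.
matches = soup.find_all("p", class_="strikeout")

print(classes)
print(matches)
```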
         <p>
          If you pass one of the
          <code class="docutils literal">
           <span class="pre">
            find*
           </span>
          </code>
          methods both
          <a class="reference internal" href="#id12">
           <span>
            string
           </span>
          </a>
          <em>
           and
          </em>
          a tag-specific argument like
          <a class="reference internal" href="#id11">
           <span>
            name
           </span>
          </a>
          , Beautiful Soup will
search for tags that match your tag-specific criteria and whose
          <a class="reference internal" href="#string">
           <span>
            Tag.string
           </span>
          </a>
          matches your value for
          <a class="reference internal" href="#id12">
           <span>
            string
           </span>
          </a>
          . It will
          <em>
           not
          </em>
          find the strings themselves. Previously,
Beautiful Soup ignored the tag-specific arguments and looked for
strings.
         </p>
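         <p>
          The difference can be sketched with a small, made-up document. Passing a tag name together
          with string returns matching tags, while passing string alone returns the strings
          themselves.
         </p>

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup("<p>Elsie</p><b>Elsie</b>", "html.parser")

# name + string: tags named "b" whose .string matches "Elsie".
tags = soup.find_all("b", string="Elsie")

# string alone: the matching strings, not the tags that contain them.
strings = soup.find_all(string="Elsie")

print(tags)
print(strings)
```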
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
          constructor no longer recognizes the
          <code class="docutils literal">
           <span class="pre">
            markupMassage
           </span>
          </code>
          argument. It’s now the parser’s responsibility to
handle markup correctly.
         </p>
         <p>
          The rarely used alternate parser classes like
          <code class="docutils literal">
           <span class="pre">
            ICantBelieveItsBeautifulSoup
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            BeautifulSOAP
           </span>
          </code>
          have been
removed. It’s now the parser’s decision how to handle ambiguous
markup.
         </p>
         <p>
          The
          <code class="docutils literal">
           <span class="pre">
            prettify()
           </span>
          </code>
          method now returns a Unicode string, not a bytestring.
         </p>
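         <p>
          A quick check of the return types, assuming Python 3 (where a Unicode string is str).
          Passing an encoding to prettify() is one way to get a bytestring back when one is needed.
         </p>

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>text</p>", "html.parser")

# With no arguments, prettify() returns a Unicode string.
pretty = soup.prettify()

# With an encoding, it returns the same document encoded as bytes.
pretty_bytes = soup.prettify("utf-8")

print(type(pretty).__name__, type(pretty_bytes).__name__)
```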
        </div>
       </div>
      </div>
     </div>
    </div>
   </div>
   <div aria-label="main navigation" class="sphinxsidebar" role="navigation">
    <div class="sphinxsidebarwrapper">
     <h3>
      <a href="#">
       Table Of Contents
      </a>
     </h3>
     <ul>
      <li>
       <a class="reference internal" href="#">
        Beautiful Soup Documentation
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#getting-help">
          Getting help
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#quick-start">
        Quick Start
       </a>
      </li>
      <li>
       <a class="reference internal" href="#installing-beautiful-soup">
        Installing Beautiful Soup
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#problems-after-installation">
          Problems after installation
         </a>
        </li>
        <li>
         <a class="reference internal" href="#installing-a-parser">
          Installing a parser
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#making-the-soup">
        Making the soup
       </a>
      </li>
      <li>
       <a class="reference internal" href="#kinds-of-objects">
        Kinds of objects
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#tag">
          <code class="docutils literal">
           <span class="pre">
            Tag
           </span>
          </code>
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#name">
            Name
           </a>
          </li>
          <li>
           <a class="reference internal" href="#attributes">
            Attributes
           </a>
           <ul>
            <li>
             <a class="reference internal" href="#multi-valued-attributes">
              Multi-valued attributes
             </a>
            </li>
           </ul>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#navigablestring">
          <code class="docutils literal">
           <span class="pre">
            NavigableString
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#beautifulsoup">
          <code class="docutils literal">
           <span class="pre">
            BeautifulSoup
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#comments-and-other-special-strings">
          Comments and other special strings
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#navigating-the-tree">
        Navigating the tree
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#going-down">
          Going down
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#navigating-using-tag-names">
            Navigating using tag names
           </a>
          </li>
          <li>
           <a class="reference internal" href="#contents-and-children">
            <code class="docutils literal">
             <span class="pre">
              .contents
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .children
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#descendants">
            <code class="docutils literal">
             <span class="pre">
              .descendants
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#string">
            <code class="docutils literal">
             <span class="pre">
              .string
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#strings-and-stripped-strings">
            <code class="docutils literal">
             <span class="pre">
              .strings
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .stripped_strings
             </span>
            </code>
           </a>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#going-up">
          Going up
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#parent">
            <code class="docutils literal">
             <span class="pre">
              .parent
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#parents">
            <code class="docutils literal">
             <span class="pre">
              .parents
             </span>
            </code>
           </a>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#going-sideways">
          Going sideways
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#next-sibling-and-previous-sibling">
            <code class="docutils literal">
             <span class="pre">
              .next_sibling
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .previous_sibling
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#next-siblings-and-previous-siblings">
            <code class="docutils literal">
             <span class="pre">
              .next_siblings
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .previous_siblings
             </span>
            </code>
           </a>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#going-back-and-forth">
          Going back and forth
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#next-element-and-previous-element">
            <code class="docutils literal">
             <span class="pre">
              .next_element
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .previous_element
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#next-elements-and-previous-elements">
            <code class="docutils literal">
             <span class="pre">
              .next_elements
             </span>
            </code>
            and
            <code class="docutils literal">
             <span class="pre">
              .previous_elements
             </span>
            </code>
           </a>
          </li>
         </ul>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#searching-the-tree">
        Searching the tree
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#kinds-of-filters">
          Kinds of filters
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#a-string">
            A string
           </a>
          </li>
          <li>
           <a class="reference internal" href="#a-regular-expression">
            A regular expression
           </a>
          </li>
          <li>
           <a class="reference internal" href="#a-list">
            A list
           </a>
          </li>
          <li>
           <a class="reference internal" href="#true">
            <code class="docutils literal">
             <span class="pre">
              True
             </span>
            </code>
           </a>
          </li>
          <li>
           <a class="reference internal" href="#a-function">
            A function
           </a>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#find-all">
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#the-name-argument">
            The
            <code class="docutils literal">
             <span class="pre">
              name
             </span>
            </code>
            argument
           </a>
          </li>
          <li>
           <a class="reference internal" href="#the-keyword-arguments">
            The keyword arguments
           </a>
          </li>
          <li>
           <a class="reference internal" href="#searching-by-css-class">
            Searching by CSS class
           </a>
          </li>
          <li>
           <a class="reference internal" href="#the-string-argument">
            The
            <code class="docutils literal">
             <span class="pre">
              string
             </span>
            </code>
            argument
           </a>
          </li>
          <li>
           <a class="reference internal" href="#the-limit-argument">
            The
            <code class="docutils literal">
             <span class="pre">
              limit
             </span>
            </code>
            argument
           </a>
          </li>
          <li>
           <a class="reference internal" href="#the-recursive-argument">
            The
            <code class="docutils literal">
             <span class="pre">
              recursive
             </span>
            </code>
            argument
           </a>
          </li>
         </ul>
        </li>
        <li>
         <a class="reference internal" href="#calling-a-tag-is-like-calling-find-all">
          Calling a tag is like calling
          <code class="docutils literal">
           <span class="pre">
            find_all()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find">
          <code class="docutils literal">
           <span class="pre">
            find()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find-parents-and-find-parent">
          <code class="docutils literal">
           <span class="pre">
            find_parents()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find_parent()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find-next-siblings-and-find-next-sibling">
          <code class="docutils literal">
           <span class="pre">
            find_next_siblings()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find_next_sibling()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find-previous-siblings-and-find-previous-sibling">
          <code class="docutils literal">
           <span class="pre">
            find_previous_siblings()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find_previous_sibling()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find-all-next-and-find-next">
          <code class="docutils literal">
           <span class="pre">
            find_all_next()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find_next()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#find-all-previous-and-find-previous">
          <code class="docutils literal">
           <span class="pre">
            find_all_previous()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            find_previous()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#css-selectors">
          CSS selectors
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#modifying-the-tree">
        Modifying the tree
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#changing-tag-names-and-attributes">
          Changing tag names and attributes
         </a>
        </li>
        <li>
         <a class="reference internal" href="#modifying-string">
          Modifying
          <code class="docutils literal">
           <span class="pre">
            .string
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#append">
          <code class="docutils literal">
           <span class="pre">
            append()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#navigablestring-and-new-tag">
          <code class="docutils literal">
           <span class="pre">
            NavigableString()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            .new_tag()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#insert">
          <code class="docutils literal">
           <span class="pre">
            insert()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#insert-before-and-insert-after">
          <code class="docutils literal">
           <span class="pre">
            insert_before()
           </span>
          </code>
          and
          <code class="docutils literal">
           <span class="pre">
            insert_after()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#clear">
          <code class="docutils literal">
           <span class="pre">
            clear()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#extract">
          <code class="docutils literal">
           <span class="pre">
            extract()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#decompose">
          <code class="docutils literal">
           <span class="pre">
            decompose()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#replace-with">
          <code class="docutils literal">
           <span class="pre">
            replace_with()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#wrap">
          <code class="docutils literal">
           <span class="pre">
            wrap()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#unwrap">
          <code class="docutils literal">
           <span class="pre">
            unwrap()
           </span>
          </code>
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#output">
        Output
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#pretty-printing">
          Pretty-printing
         </a>
        </li>
        <li>
         <a class="reference internal" href="#non-pretty-printing">
          Non-pretty printing
         </a>
        </li>
        <li>
         <a class="reference internal" href="#output-formatters">
          Output formatters
         </a>
        </li>
        <li>
         <a class="reference internal" href="#get-text">
          <code class="docutils literal">
           <span class="pre">
            get_text()
           </span>
          </code>
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#specifying-the-parser-to-use">
        Specifying the parser to use
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#differences-between-parsers">
          Differences between parsers
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#encodings">
        Encodings
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#output-encoding">
          Output encoding
         </a>
        </li>
        <li>
         <a class="reference internal" href="#unicode-dammit">
          Unicode, Dammit
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#smart-quotes">
            Smart quotes
           </a>
          </li>
          <li>
           <a class="reference internal" href="#inconsistent-encodings">
            Inconsistent encodings
           </a>
          </li>
         </ul>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#comparing-objects-for-equality">
        Comparing objects for equality
       </a>
      </li>
      <li>
       <a class="reference internal" href="#copying-beautiful-soup-objects">
        Copying Beautiful Soup objects
       </a>
      </li>
      <li>
       <a class="reference internal" href="#parsing-only-part-of-a-document">
        Parsing only part of a document
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#soupstrainer">
          <code class="docutils literal">
           <span class="pre">
            SoupStrainer
           </span>
          </code>
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#troubleshooting">
        Troubleshooting
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#diagnose">
          <code class="docutils literal">
           <span class="pre">
            diagnose()
           </span>
          </code>
         </a>
        </li>
        <li>
         <a class="reference internal" href="#errors-when-parsing-a-document">
          Errors when parsing a document
         </a>
        </li>
        <li>
         <a class="reference internal" href="#version-mismatch-problems">
          Version mismatch problems
         </a>
        </li>
        <li>
         <a class="reference internal" href="#parsing-xml">
          Parsing XML
         </a>
        </li>
        <li>
         <a class="reference internal" href="#other-parser-problems">
          Other parser problems
         </a>
        </li>
        <li>
         <a class="reference internal" href="#miscellaneous">
          Miscellaneous
         </a>
        </li>
        <li>
         <a class="reference internal" href="#improving-performance">
          Improving Performance
         </a>
        </li>
       </ul>
      </li>
      <li>
       <a class="reference internal" href="#id17">
        Beautiful Soup 3
       </a>
       <ul>
        <li>
         <a class="reference internal" href="#porting-code-to-bs4">
          Porting code to BS4
         </a>
         <ul>
          <li>
           <a class="reference internal" href="#you-need-a-parser">
            You need a parser
           </a>
          </li>
          <li>
           <a class="reference internal" href="#method-names">
            Method names
           </a>
          </li>
          <li>
           <a class="reference internal" href="#generators">
            Generators
           </a>
          </li>
          <li>
           <a class="reference internal" href="#xml">
            XML
           </a>
          </li>
          <li>
           <a class="reference internal" href="#entities">
            Entities
           </a>
          </li>
          <li>
           <a class="reference internal" href="#id18">
            Miscellaneous
           </a>
          </li>
         </ul>
        </li>
       </ul>
      </li>
     </ul>
    </div>
   </div>
   <div class="clearer">
   </div>
  </div>
  <div class="footer" role="contentinfo">
   © Copyright 2004-2015, Leonard Richardson.
      Created using
   <a href="http://sphinx-doc.org/">
    Sphinx
   </a>
   1.3.6.
  </div>
 </body>
</html>
