Commit 82c0ca0d authored by Daniel Francis, committed by GitHub

Fixing implicit parser name for Beautiful Soup (lms, openedx) (#24100)

Fixing 56 GuessedAtParserWarnings, in commit edx#24098

Background: When no parser is named, BeautifulSoup picks the best HTML parser installed and emits a GuessedAtParserWarning. It ranks "lxml" first, so that is the parser it picks when lxml is installed.

Per the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) documentation:

> Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser. Depending on your setup, you might install lxml with one of these commands.
> Another alternative is the pure-Python html5lib parser, which parses HTML the way a web browser does. 

Context: We changed two statements, one in lms and another in openedx. Both construct a BeautifulSoup object. They now request "lxml" explicitly, following the recommendation in BeautifulSoup's documentation:

> If you can, I recommend you install and use lxml for speed. If you’re using a very old version of Python – earlier than 2.7.3 or 3.2.2 – it’s essential that you install lxml or html5lib. Python’s built-in HTML parser is just not very good in those old versions.

Before:
`soup = BeautifulSoup(content)`

After:
`soup = BeautifulSoup(markup=content, features="lxml")`

The warnings are gone, and tests pass locally.
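
A minimal sketch of one way to check that the explicit form emits no parser-guessing warning. The `CONTENT` markup, the `parse_and_collect_warnings` helper, and the fallback to `"html.parser"` when lxml is not installed are all assumptions made for illustration, not part of this commit. Matching on the warning text rather than the warning class is also an assumption, chosen so the sketch does not depend on a particular bs4 release.

```python
import warnings

from bs4 import BeautifulSoup

# Assumption: fall back to the standard-library parser when lxml is absent,
# so the sketch runs wherever bs4 is installed. The commit itself uses "lxml".
try:
    import lxml  # noqa: F401
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"

# Hypothetical markup standing in for response.content.
CONTENT = '<div id="pay-and-verify-container" data-course-key="demo"></div>'


def parse_and_collect_warnings(content, parser):
    """Parse content with an explicit parser; return the soup and any
    parser-guessing warnings raised while constructing it."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        soup = BeautifulSoup(markup=content, features=parser)
    guessed = [
        w for w in caught
        if "No parser was explicitly specified" in str(w.message)
    ]
    return soup, guessed


soup, guessed = parse_and_collect_warnings(CONTENT, PARSER)
div = soup.find(id="pay-and-verify-container")
print(div["data-course-key"], len(guessed))
```

Naming the parser also keeps the parse result the same across machines, since different parsers can build different trees from the same invalid markup.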
parent 5cedc64f
Tags release-2020-06-18-17.25
@@ -999,7 +999,7 @@ class TestPayAndVerifyView(UrlResetMixin, ModuleStoreTestCase, XssTestMixin, Tes
     def _get_page_data(self, response):
         """Retrieve the data attributes rendered on the page. """
-        soup = BeautifulSoup(response.content)
+        soup = BeautifulSoup(markup=response.content, features="lxml")
         pay_and_verify_div = soup.find(id="pay-and-verify-container")
         self.assertIsNot(
......
@@ -449,7 +449,7 @@ class XBlockTestCase(XBlockStudentTestCaseMixin,
         '''
         usage_id = self.xblocks[urlname].scope_ids.usage_id
         # First, we get out our <div>
-        soup_html = BeautifulSoup(content)
+        soup_html = BeautifulSoup(markup=content, features="lxml")
         xblock_html = six.text_type(soup_html.find(id="seq_contents_0"))
         # Now, we get out the text of the <div>
         try:
......