Fixing implicit parser name for Beautiful Soup (lms, openedx) (#24100)
Fixing 56 GuessedAtParserWarnings, in commit edx#24098

Background: When no parser is named, BeautifulSoup picks the best parser that is installed and warns that it had to guess. It prefers the "lxml" parser when lxml is available. Per the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) documentation:

> Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser. Depending on your setup, you might install lxml with one of these commands.

> Another alternative is the pure-Python html5lib parser, which parses HTML the way a web browser does.

Context: We changed two statements, one in lms and another in openedx. Both statements construct a BeautifulSoup object. They now ask for "lxml" explicitly, following the recommendation in BeautifulSoup's documentation:

> If you can, I recommend you install and use lxml for speed. If you’re using a very old version of Python – earlier than 2.7.3 or 3.2.2 – it’s essential that you install lxml or html5lib. Python’s built-in HTML parser is just not very good in those old versions.

Before: `soup = BeautifulSoup(content)`

After: `soup = BeautifulSoup(markup=content, features="lxml")`

The warnings are gone and the tests pass locally.
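For reviewers who want to reproduce the check, here is a minimal sketch of the before/after behavior. It assumes `bs4` is installed; the helper name `parse_html` and the sample markup are hypothetical and are not part of this change. The helper turns any `UserWarning` into an error (GuessedAtParserWarning is a `UserWarning` subclass), so a call that leaves the parser implicit would raise instead of passing silently.

```python
import warnings

from bs4 import BeautifulSoup


def parse_html(content, features="lxml"):
    """Parse `content` with an explicitly named parser.

    Hypothetical helper for illustration only. Any UserWarning raised
    during construction (including the guessed-parser warning) is
    escalated to an exception, so an implicit parser cannot slip through.
    """
    with warnings.catch_warnings():
        # Escalate UserWarning and its subclasses to errors in this block.
        warnings.simplefilter("error", UserWarning)
        return BeautifulSoup(markup=content, features=features)
```

Calling `parse_html("<p>hello</p>")` uses "lxml" as in this commit and requires the lxml package. Passing `features="html.parser"` exercises the same code path with the standard-library parser when lxml is not installed.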