Monday, May 29, 2023

Beautiful Soup

 Beautiful Soup Library in Python

    The Beautiful Soup library is a popular Python library used for web scraping and parsing HTML or XML documents. It provides a convenient way to extract data from web pages by navigating the parsed document tree and searching for specific elements or patterns.

    To use Beautiful Soup, you'll first need to install it. You can do this using pip, the Python package manager, by running the following command:

pip install beautifulsoup4

    Once installed, you can import Beautiful Soup into your Python script or interactive session using the following import statement:

from bs4 import BeautifulSoup




Example :-
from bs4 import BeautifulSoup

# HTML document
html_doc = '''
<html>
<head>
    <title>Example</title>
</head>
<body>
    <h1>Heading</h1>
    <p class="content">Paragraph 1</p>
    <p>Paragraph 2</p>
</body>
</html>
'''

# Create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Extract specific elements
title = soup.title
heading = soup.h1
paragraphs = soup.find_all('p')

# Print the extracted data
print("Title:", title.string)
print("Heading:", heading.string)
print("Paragraphs:")
for p in paragraphs:
    print(p.string)

Output :-

Title: Example
Heading: Heading
Paragraphs:
Paragraph 1
Paragraph 2

    In the above example, we create a BeautifulSoup object by passing the HTML document and the parser type ('html.parser') to the constructor. We can then use various methods and attributes provided by Beautiful Soup to navigate and extract data from the parsed document.
    In this case, we extract the title element using soup.title and the heading element using soup.h1. We also find all the <p> elements using soup.find_all('p'), which returns them as a list. The text of each element is then read through its string attribute, which works here because every element contains exactly one piece of text and no nested tags.
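
    The string attribute has a caveat worth seeing in code: it is None when a tag holds more than one child. The sketch below is a minimal illustration, and its small HTML snippet is made up for this purpose rather than taken from the example above. It compares string with get_text() and also reads a tag's attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the first <p> contains a nested <b> tag
html_doc = '<p class="content">Paragraph <b>1</b></p><p>Paragraph 2</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

first, second = soup.find_all('p')

# .string is None when a tag has more than one child node
mixed_string = first.string       # None (text node plus a <b> tag)
plain_string = second.string      # 'Paragraph 2'

# get_text() joins all nested text, so it handles both cases
mixed_text = first.get_text()     # 'Paragraph 1'

# Attributes are read like dictionary keys; class is multi-valued,
# so Beautiful Soup returns it as a list
css_classes = first['class']      # ['content']

# .get() returns None instead of raising KeyError for a missing attribute
missing = second.get('class')     # None

print(mixed_string, plain_string, mixed_text, css_classes, missing)
```

    When an element may contain nested tags, get_text() is usually the safer choice.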

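Beyond these basics, Beautiful Soup provides methods such as find(), find_all(), and select() for searching and filtering the parsed document tree by tag name, attribute, or CSS selector. It also lets you manipulate the tree, for example by changing a tag's text or attributes.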
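
As a rough sketch of searching, filtering, and manipulating, the snippet below reuses a shortened version of the earlier HTML document. The id value 'intro' and the replacement text are illustrative assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup

html_doc = '''
<html><body>
    <h1>Heading</h1>
    <p class="content">Paragraph 1</p>
    <p>Paragraph 2</p>
</body></html>
'''
soup = BeautifulSoup(html_doc, 'html.parser')

# Searching: find() returns the first match (or None if nothing matches).
# The keyword is class_ because class is a reserved word in Python.
content_p = soup.find('p', class_='content')

# Searching with a CSS selector: select() always returns a list
selected = soup.select('p.content')

# Filtering: keep only the <p> tags that have no class attribute
plain = [p.get_text() for p in soup.find_all('p') if not p.has_attr('class')]

# Manipulating: replace the tag's text and add an attribute
# ('intro' is a made-up id used only for illustration)
content_p.string = 'Updated paragraph'
content_p['id'] = 'intro'

print(plain)
print(content_p)
```

Note that find() returns a single tag while find_all() and select() return lists, so check for None or an empty list before using the result on real pages.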
