Web-Crawler for "Food4rhino"


(Valmir) #1

Hey guys.

I was trying to create a list of all the add-on apps that exist for Grasshopper on food4rhino, and since I’ve been doing some Python coding lately (creating simple things directly in Grasshopper), I thought I might as well write a short script, a web crawler, in PyCharm to get all the information at once instead of manually, one by one.

I was looking to get the title, description, and link for each add-on.

I was following a tutorial on how to write a spider/web crawler, but something doesn’t work, and I literally have no idea what.

Here is the code; if someone can help, thanks a lot.

import requests
from bs4 import BeautifulSoup

def addons_spider(max_pages):
    page = 0
    while page <= max_pages:
        # note: "page=0%2C" + str(page) yields page=0,0 / 0,1 / ... --
        # check whether the site actually expects a plain page=N instead
        url = "http://www.food4rhino.com/browse?searchText=&page=0%2C" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        # name the parser explicitly to avoid BeautifulSoup's warning
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.find_all("a", {"class": "f4r_list_content_title"}):
            href = link.get("href")
            print(href)
        page += 1

addons_spider(5)
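For what it’s worth, once the pages load correctly, the title/description/link extraction can be sketched offline against a hard-coded snippet. This is a minimal sketch only: the `f4r_list_content_description` class and the sibling layout are assumptions for illustration, not confirmed against food4rhino’s live markup, so check the real structure in the browser’s inspector first.

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for one search-result entry; the class
# names mirror the crawler above, but the real page layout is an
# assumption and should be verified against the live site.
html = """
<div class="f4r_list_item">
  <a class="f4r_list_content_title" href="/app/example-addon">Example Addon</a>
  <div class="f4r_list_content_description">A short description.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for link in soup.find_all("a", {"class": "f4r_list_content_title"}):
    title = link.get_text(strip=True)
    href = link.get("href")
    # in this assumed markup, the description is the next matching element
    desc_tag = link.find_next("div", {"class": "f4r_list_content_description"})
    desc = desc_tag.get_text(strip=True) if desc_tag else ""
    results.append((title, desc, href))

print(results)
```

The same loop body would slot into the spider above in place of the bare `print(href)`, once the selectors are confirmed.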


(Will Pearson) #5

Hey @valmirkastrati, can I ask what you want to do once you have this list of Grasshopper plug-ins?

P.S. Quick tip, formatting your code will help people to help you!