Editorial Workflows

Scrape Glassdoor

public workflow

Install Workflow...

This workflow contains at least one Python script. Only use it if you trust the person who shared this with you, and if you know exactly what it does.

I understand, install the workflow!

This is a workflow for Editorial, a Markdown and plain text editor for iOS. To download it, you need to view this page on a device that has the app installed.

Description: scrapes 2 pages of Glassdoor website

Shared by: Jerry He

Comments: Comment Feed (RSS)

There are no comments yet.

+ Add Comment

Workflow Preview
Run Python Script ?
Source Code
#coding: utf-8
import workflow

workflow.set_output('https://www.glassdoor.com/Reviews/Walt-Disney-Company-Reviews-E717.htm')
Open URL ?
Open in
  • In-App Browser
  • Default App / Safari
URL
Input
Tab
  • Last-used Tab
  • New Tab
  • Tab with ID:
glassdoorloggedin
Wait until Loaded
ON
Reveal Browser Automatically
ON
Run Python Script ?
Source Code
from time import sleep

sleep(4)
Evaluate JavaScript ?
Source
//This sets the output of the workflow action: function first_tab(url_base) { var winFront = safari.windows[0] var tab = winFront.tabs[0] safari.doJavaScript("location.href='"+url_base+"'", { in: tab }) return tab } function str(fun) { return "(" + fun + ")()" } function scrapeThisPageReviews(i) { sessionStorage.clear() $('div.hreview').each(function() { var _res = {} var subratings = {} $(this).find('div.subRatings').find('ul.undecorated').find('li').each(function() { var cat_name = $(this).find('.minor').text() var rating = $(this).find('span.gdBars').attr('title'); subratings[cat_name]=rating }) _res['subratings']=subratings _res['helpful'] = $(this).find('.helpfulCount').text() _res['pros'] = $(this).find('div p.strong:contains(Pro)').next('p').text() _res['cons'] = $(this).find('div p.strong:contains(Con)').next('p').text() _res['advice_to_mgmt'] = $(this).find('div p.strong:contains(Advice)').next('p').text() _res['positive_boxes'] = $(this).find('i.green').parent('div').next('div').find('.middle').map(function() { return this.innerText }).toArray() _res['neutral_boxes'] = $(this).find('i.yellow').parent('div').next('div').find('.middle').map(function() { return this.innerText }).toArray() _res['negative_boxes'] = $(this).find('i.red').parent('div').next('div').find('.middle').map(function() { return this.innerText }).toArray() _res['author_jobtitle'] = $(this).find('span.authorJobTitle').text() _res['author_location'] = $(this).find('span.authorLocation').text() _res['review_title'] = $(this).find('a.reviewLink span.summary').text() _res['rating'] = $(this).find('span.rating span.value-title').attr('title') _res['date'] = $(this).find('time').text() i++; sessionStorage.setItem(i.toString(), JSON.stringify(_res)) }) return i } var num_scraped = scrapeThisPageReviews(0) var next_link; next_link = $($('li.next').find('a')[0]).attr('href') if(true) { //location.href = next_link setTimeout(function() { num_scraped= scrapeThisPageReviews(num_scraped)}, 8000) $('li.next').find('a').click() } window.output = next_link //num_scraped
Console Output ?
Text
Input
Run Python Script ?
Source Code
from time import sleep

sleep(4)
Evaluate JavaScript ?
Source
function fetchScrapedDataFromThisPage() { var res=[],i=0; while(sessionStorage.key(i) != null) { res.push(JSON.parse(sessionStorage.getItem(sessionStorage.key(i)))); i++; } return JSON.stringify(res) } window.output = fetchScrapedDataFromThisPage()
Console Output ?
Text
Input