Creating new Model Objects with Nokogiri

This blog post provides a simple tutorial about how to scrape a website with Nokogiri and create new model objects with that scraped data in Rails. I assume basic knowledge of Ruby on Rails and ActiveRecord.

Nokogiri is the Japanese word for a saw; in English it usually refers to the fine-toothed Japanese pull saw used in woodworking. It’s also a Ruby gem that allows us to parse HTML, ripping through a massive string and giving us access to the finer nested nodes within it.

My app has two tables, one for dance classes and one for instructors.

The dance class table includes a foreign key for instructors. Next, I establish this basic has_many/belongs_to relationship in the Instructor and DanceClass models.

If you haven’t already added the Nokogiri gem to your Gemfile, bundled, and run your ActiveRecord migrations, do that now. Time for the fun part.

Add a DanceClassScraper model and require ‘nokogiri’ and ‘open-uri’ at the top of the file.

Let’s define a new custom method inside our DanceClassScraper that will grab the HTML string at my desired URL:

Whenever you use Nokogiri, Chrome DevTools is a godsend. If the HTML elements on the webpage you’re trying to scrape have defining attributes, your life will be so much easier. Unfortunately, mine don’t, but where there’s creativity, there’s a way.

Each day of the week is represented by an <h2> element. Under each day of the week there is a div containing four side-by-side <ul> elements with a class name of “pricing-table” and several <li> elements representing the start and end time, name, level, and instructor of each dance class for that day. I’m going to ignore the level property for the purpose of my app, but I definitely want the rest of that data, including the day of the week.

Because the data structure is identical for each day of the week, I’m going to collect an array of the <h2> elements (i.e. days) and iterate through each day to scrape its dance class data. Let’s create a new method in our scraper model:

I slice my array of <h2> elements because the webpage contains an additional <h2> element at the bottom of the page about something unrelated. Now I want to define a separate method that takes a day as an argument and whose sole concern is scraping the dance class data for that day:

With that data collected in an information hash, I’ll define a counter variable and an until loop that pull the corresponding values out of the hash and use them to create new dance classes.
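The loop could be sketched as follows. The shape of the information hash (parallel arrays keyed by :times, :names, and :instructors), the time format, and the method name build_class_attributes are all assumptions. To stay runnable without a database, the sketch builds plain attribute hashes instead of saving records.

```ruby
# Invented example of the information hash: parallel arrays, one entry per class.
INFO = {
  day: "Monday",
  times: ["6:00 PM - 7:00 PM", "7:00 PM - 8:00 PM"],
  names: ["Hip Hop", "Jazz Funk"],
  instructors: ["Sample Instructor A", "Sample Instructor B"]
}.freeze

# Walk the parallel arrays with a counter, building one attribute hash per class.
def build_class_attributes(info)
  rows = []
  counter = 0
  until counter == info[:names].length
    start_time, end_time = info[:times][counter].split(" - ")
    rows << {
      day: info[:day],
      name: info[:names][counter],
      start_time: start_time,
      end_time: end_time,
      instructor_name: info[:instructors][counter]
    }
    counter += 1
  end
  rows
end

build_class_attributes(INFO).each { |attrs| p attrs }
```

In the Rails app, each of these hashes could instead feed a call such as DanceClass.create, with the instructor looked up or created first, for example with Instructor.find_or_create_by(name: ...).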

My plan was to use Nokogiri to scrape the dance class schedule data from three websites, not one. Unfortunately, the Movement and Playground dance schedules are rendered with JavaScript, so I was unable to use Nokogiri to scrape them. Instead, I manually seeded the regular weekly dance class schedules for Movement and Playground. Figuring out how to scrape those websites will be a future project to improve my app. If you have any advice on how to get started, please let me know!

A tip when using Nokogiri: it only parses the HTML the server returns and does not execute JavaScript. To view what’s scrapable on any webpage, disable JavaScript in Chrome Developer Tools and reload the page. JS-rendered HTML will no longer be visible on the page, and anything remaining can be scraped with Nokogiri.
