/tech/ - Technology

Where proprietary software comes to die

Anonymous 04/16/2017 (Sun) 20:27:47 No. 8336
So I have this blog site which hasn't had any new entries in some time and I want to read it all, one post a day.
How can I get an RSS feed that pushes all the posts chronologically, starting from the first one?

I'm thinking of creating a subreddit, using AutoModerator, and getting the feed for that subreddit, but it's somewhat inconvenient since I have to get some karma on that account and wait 30 days.

Is there any other way I can go about this?

Anonymous 04/18/2017 (Tue) 07:52:36 No. 8345
Just shooting ideas here, but wouldn't it be easier to pull all the entries with wget, then have a cron job pull the text chronologically and delete the page after?
Why do you want an RSS feed, to read it from your phone? I'm not sure how you would upload the text and then get an RSS feed from it that you could access anywhere.
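
On the "get an RSS feed from that" part: a minimal sketch in Ruby of building an RSS 2.0 file by hand from posts that have already been released. Everything named here (xml_escape, build_feed, the :title/:body/:released_at keys) is made up for illustration, and where the resulting file gets hosted is left open.

```ruby
require "time"

# Escape the characters that are special inside XML text nodes.
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Build an RSS 2.0 document as a string.
# `posts` is an array of hashes with :title, :body and :released_at (a Time).
# These keys are assumptions for this sketch, not something from the thread.
def build_feed(feed_title, feed_link, posts)
  items = posts.map do |post|
    <<~ITEM
      <item>
      <title>#{xml_escape(post[:title])}</title>
      <description>#{xml_escape(post[:body])}</description>
      <pubDate>#{post[:released_at].rfc2822}</pubDate>
      </item>
    ITEM
  end

  <<~FEED
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
    <title>#{xml_escape(feed_title)}</title>
    <link>#{xml_escape(feed_link)}</link>
    <description>#{xml_escape(feed_title)}</description>
    #{items.join}</channel>
    </rss>
  FEED
end
```

Writing the returned string to a file on any web-reachable host should be enough for a feed reader to pick it up, assuming the reader accepts plain RSS 2.0.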

Anonymous 04/18/2017 (Tue) 07:53:55 No. 8346
dude just get a gf

Anonymous 04/18/2017 (Tue) 12:06:13 No. 8348
>Just shooting ideas here, wouldn't it be easier to pull all the entries with wget, then have a cron job pull the text chronologically and delete the page after.
Interesting, can you please expand on the second, cron-related part?
>dude just get a gf
never ever

Anonymous 04/18/2017 (Tue) 15:56:28 No. 8349
I think what anon was getting at is to generate a page every day from the archive you're downloading. It shouldn't be too hard, depending on the blog.

Anonymous 04/18/2017 (Tue) 16:13:15 No. 8350
This might be better suited to someone more skilled than I, but wouldn't that be a task for sed?
Use a while loop with an if statement and variables that work as date entries. The script searches for a date, fails, increments the date until it finds the next dated entry, pulls the text, deletes the file, writes the current date variable to a file, and exits. A cron job runs that script every day.
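
A rough sketch of that daily step in Ruby instead of sed, under the assumption that the posts are already saved as files whose names sort chronologically (e.g. "0001-title.txt"). That sidesteps the date search entirely: the "next" post is just the first file left. The function name and the "archive" and "unread" directory names are made up for illustration.

```ruby
require "fileutils"

# Move the oldest remaining file from `archive_dir` into `unread_dir`
# and return its new path. Returns nil when the archive is missing or empty.
# Assumes file names sort chronologically (an assumption of this sketch).
def release_next_post(archive_dir, unread_dir)
  return nil unless Dir.exist?(archive_dir)

  next_post = Dir.children(archive_dir).sort.first
  return nil if next_post.nil?

  FileUtils.mkdir_p(unread_dir)
  destination = File.join(unread_dir, next_post)
  FileUtils.mv(File.join(archive_dir, next_post), destination)
  destination
end

if __FILE__ == $PROGRAM_NAME
  # Hypothetical directory names, relative to where the script is run.
  released = release_next_post("archive", "unread")
  puts(released ? "Released #{released}" : "Nothing left to release")
end
```

A crontab entry along the lines of `0 8 * * * cd /path/to/blog && ruby release.rb` (path and file name are placeholders) would run it once a day at 08:00.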

Anonymous 04/18/2017 (Tue) 17:56:26 No. 8351
Ok, I got the idea. I need to read up on some shit first because I have no idea how to use sed.

Anonymous 04/19/2017 (Wed) 00:32:20 No. 8352
sed isn't going to help you in this scenario. If the HTML is simple enough, grep would work, but I would suggest using a web scraping library in your favorite scripting language to make it easy.

What is the actual blog in question?

Anonymous 04/21/2017 (Fri) 13:22:40 No. 8372
Uhm, that one-liner doesn't seem to work.

Anonymous 04/21/2017 (Fri) 14:16:12 No. 8373
Oh yeah, clever, I didn't realise the posts were titled like that, much easier than by date.

Anonymous 04/21/2017 (Fri) 17:00:41 No. 8376
What about it? What did you do, what happened, and what did you expect to happen?

Anonymous 04/21/2017 (Fri) 18:37:41 No. 8377
Nevermind I'm retarded.

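Anonymous 04/25/2017 (Tue) 02:32:53 No. 8386
Here fam. This will download all the posts to text files. I'm sure you can figure it out from there.

#!/usr/bin/env ruby
require "open-uri"
require "nokogiri"

# Fetch the table of contents and follow every chapter link on it.
# URI.open replaces the bare open(url), which newer Rubies no longer support.
body = Nokogiri::HTML(URI.open("https://parahumans.wordpress.com/table-of-contents/"))
body.css("strong a").each do |link|
  text = []
  page = Nokogiri::HTML(URI.open(link["href"]))
  # Drop the first two and the last paragraph (presumably navigation links).
  page.css("div[class=entry-content] p")[2..-2].each do |p|
    # Node#text already returns plain text, so no <em> tags need stripping.
    text << p.text
  end
  title = page.at_css("title").text
  puts title
  # Write the collected paragraphs to a file named after the page title.
  File.open("#{title}.txt", "w") do |f|
    f.puts text.join("\n\n")
  end
  sleep 3 # pause between requests to go easy on the server
end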

Anonymous 04/25/2017 (Tue) 02:36:40 No. 8387
Oops, formatting is wrong. Here: https://ghostbin.com/paste/3uj5w
