Mass Image Grabber (Mac)


22 replies to this topic

#1 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 02:52 AM

So I have a program (iScooper) that can pull down images en masse from message boards like /b/.
I need one that can pull all the images off a page like macrochan, at full size rather than as thumbnails.
If anyone knows a program for Mac that can do this, preferably free, I'd be much obliged.

#2 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 03:34 AM

I haven't seen any of the stuff you are talking about, but have you tried DownThemAll for Firefox? You can give it a regular expression to match the files to download, whether those are your images or anything else. I don't know if this will work out for you though.

https://addons.mozil...refox/addon/201

Again, not sure if this is what you are looking for.

#3 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 03:48 AM

That's kind of what I'm looking for. I just tested it, and unless I can find an option for this, it's not exactly what I need.

Essentially, I see rows of "thumbnails" (not tiny thumbnails, but they aren't the pictures' true size), and I need to download the full size pic that I could get manually by clicking the picture to enlarge it and then saving the image.


Like my avatar, for example: it uses the full URL of the picture off of macrochan's servers. On macrochan it displays slightly larger than the avatar size allowed here. I need something that sees that thumbnail and its respective location and downloads the original picture, not the scaled down version.

Edited by Eyesore, 20 December 2007 - 04:04 AM.


#4 Dan

Dan
  • Resident Know-It-All

  • 6382 posts



Posted 20 December 2007 - 04:01 AM

You'll get too much CP

#5 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 04:07 AM

QUOTE(SuperDan @ Dec 20 2007, 06:01 AM)
You'll get too much CP


-_-, I already have stuff to yank full scale pics off of imageboards.....not a whole lotta CP goin on in macrochan's motivational poster category

#6 Dan

Dan
  • Resident Know-It-All

  • 6382 posts



Posted 20 December 2007 - 04:07 AM

QUOTE(Eyesore @ Dec 20 2007, 12:07 PM)
-_-, I already have stuff to yank full scale pics off of imageboards.....not a whole lotta CP goin on in macrochan's motivational poster category


It can be arranged.

#7 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 04:09 AM

QUOTE(SuperDan @ Dec 20 2007, 06:07 AM)
It can be arranged.


-_-, er....well thats still not my cup of tea so.....I just want me some funny posters....it's hard weeding through them all on /b/....

#8 shabba

shabba

Posted 20 December 2007 - 01:29 PM

It's a bit more difficult than just grabbing the image, since you want the full image and not the thumbnail. You'd need something like a Greasemonkey script and then DownThemAll, methinks.

#9 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 02:38 PM

QUOTE(Eyesore @ Dec 20 2007, 04:48 AM)
Thats kind of what I'm looking for, just tested and unless I can find an option for it, its not exactly what I'm looking for.

Essentially, I see rows of "thumbnails"(not tiny thumbnails but they aren't the pictures true size) and I need to download the full size pic that I could manually download by clicking the picture to enlarge it and then saving the image etc.
Like my avatar for example, it uses the full url of the picture off of macrochan's servers. On macrochan it displays slightly larger than the avatar size allowed here. I need something that sees that thumbnail and it's respective location and downloads the original picture not the scaled down version.

Well if they have put the full size image on the page and just scaled it down manually with the HTML rather than creating a special thumbnail version of the image, then downthemall will work great for you.

#10 DudeOnline

DudeOnline
  • 1897 posts

Posted 20 December 2007 - 02:53 PM

DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU DESU

#11 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 09:35 PM

QUOTE(Hydrogen @ Dec 20 2007, 04:38 PM)
Well if they have put the full size image on the page and just scaled it down manually with the HTML rather than creating a special thumbnail version of the image, then downthemall will work great for you.


Well, I see what is actually happening now, and therein lies the problem.
Here is an example of the source for a picture (it just so happens to be the one for my avatar):
<a href="get.py?sha1=T6WA26DSX43TTW3KYUBT2BF35HFM6MGB">
<img src="http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg"
alt="urn:sha1:T6WA26DSX43TTW3KYUBT2BF35HFM6MGB" /></a>

The a tag links to a page containing the full scale picture. The img tag, however, is the thumb version. So the browse page displays the thumbs, which seem to be stored in their own little directory. Essentially, I need either a script or a program that can grab only the pictures linked to by the a tag.

So I can use downthemall to either get the thumbnails
<img src="http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg"
alt="urn:sha1:T6WA26DSX43TTW3KYUBT2BF35HFM6MGB" />

OR

I can use it to get the links to the pictures which are actually html pages.
<a href="get.py?sha1=T6WA26DSX43TTW3KYUBT2BF35HFM6MGB">
The latter works in the sense that I get the full picture, but I also get the
rest of the page associated with it. If I just DTA all the pages, I would
end up with the pages that have the pictures embedded in them (still linking
to macrochan) and not with local copies of the pics.

So essentially I need something that follows the link specified in the a tag
and grabs the img tag from that page. Alternatively, since the only thing
that makes it a different directory is the "thumbs/" in
http://img.macrochan.org/thumbs/T/6/T6...
it would be enough if there was some way to grab all the image links and omit
the "thumbs/" from them.
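
For anyone following along, here is a rough sketch of that second idea in Python (my choice of language, nothing in the thread depends on it). It collects the src of every img that sits inside an a tag and drops the first "/thumbs/" segment. The names ThumbCollector, full_size_urls, and SAMPLE are made up for illustration, and it assumes the full size file really does live at the same path minus "thumbs/", which may not hold for every image.

```python
from html.parser import HTMLParser

# Sample markup copied from the browse page source quoted above.
SAMPLE = '''<a href="get.py?sha1=T6WA26DSX43TTW3KYUBT2BF35HFM6MGB">
<img src="http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg"
alt="urn:sha1:T6WA26DSX43TTW3KYUBT2BF35HFM6MGB" /></a>'''


class ThumbCollector(HTMLParser):
    """Collect the src of every <img> nested inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # > 0 while we are inside an <a> ... </a>
        self.thumbs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag == "img" and self.anchor_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.thumbs.append(src)

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def full_size_urls(page_html):
    """Guess full size URLs by removing the first "/thumbs/" path segment.

    Assumption: thumbnails and originals differ only by that segment.
    """
    parser = ThumbCollector()
    parser.feed(page_html)
    return [u.replace("/thumbs/", "/", 1) for u in parser.thumbs if "/thumbs/" in u]


if __name__ == "__main__":
    for url in full_size_urls(SAMPLE):
        print(url)
```

The resulting list could then be pasted into a download manager instead of fetching anything from the script itself.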

Additionally, yes DESU!


#12 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 09:45 PM

I'm confused, did DownThemAll work for you?

What if you used one of those site rippers which rip entire sites?

#13 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 09:48 PM

QUOTE(Hydrogen @ Dec 20 2007, 11:45 PM)
I'm confused, did DownThemAll work for you?

What if you used one of those site rippers which rip entire sites?



Well, it kind of works. I can either get the web page in full HTML form that has the embedded image (in which case I don't get a local copy of the image), or I can get thumbnails (in which case I have a very small local copy of the image). The "simplest" way would be for me to "somehow" figure out how to tell DownThemAll to get all image tags and omit any "thumbs/" strings from them.

As in this is the source for the thumb
http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg

And this is the source for the full scale picture
http://img.macrochan...F35HFM6MGB.jpeg

Edited by Eyesore, 20 December 2007 - 09:50 PM.


#14 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 11:21 PM

I'll see if I can work on a regular expression that will do that. It doesn't seem too hard. Just need to test it to make sure I get it right. I'll edit this post once I get one :P

Sorry I got caught up with something, but I'm gonna attempt to get that regular expression for you now. One sec.

When I tried to simulate downloading the images...I could only see the thumbs links in the downthemall screen. Perhaps you could explain the process you were using to get both.

In any case, I've done a little towards getting the proper regular expression. This is what I have so far:
/((?!thumbs)\b\w+\W+)/

It uses a negative lookahead, since apparently excluding a word from a match is very hard to do... who would have known? It still matches the other parts of the strings even though it doesn't match the word thumbs. Hopefully you can take it from here since I need to do something else :( Sorry :(
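
To make the lookahead idea concrete, here is a small Python sketch (Python's re module is my stand-in, it is not what DownThemAll uses internally). A lookahead can only decide whether a URL matches, so NOT_A_THUMB below just filters out thumbnail links; actually turning a thumbnail URL into the full size one is a substitution, shown in strip_thumbs. Both names are invented for this example.

```python
import re

THUMB_URL = "http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg"

# Negative lookahead anchored at the start: the URL matches only if
# "/thumbs/" appears nowhere in it.
NOT_A_THUMB = re.compile(r"^(?!.*/thumbs/)\S+$")


def strip_thumbs(url):
    """Remove the first "/thumbs/" path segment and leave the rest untouched."""
    return re.sub(r"/thumbs/", "/", url, count=1)


if __name__ == "__main__":
    print(bool(NOT_A_THUMB.match(THUMB_URL)))                # thumbnail is rejected
    print(bool(NOT_A_THUMB.match(strip_thumbs(THUMB_URL))))  # rewritten URL passes
    print(strip_thumbs(THUMB_URL))
```

The filter alone would leave nothing to download on a page that only lists thumbnails, which is why the rewrite step is the part that matters here.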

#15 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 11:41 PM

Well since the browse pages are structured using <a href="page"><img src="thumb" /></a>
Where "page" is a web page with a navigation structure and has essentially <img src="nonthumb" />
Where "thumb" is the thumbnail and "nonthumb" is the full scale.

Most of that is irrelevant since upon inspection of the actual image locations I found that
the only key difference between the location of the two pictures is the insertion of "thumbs/" in the image's directory.

i.e. <img src="img.etc.com/thumbs/Pic/1" /> = thumb
<img src="img.etc.com/Pic/1" /> = fullscale

I'll see if I can figure something off of what you have started. Don't have a whole lot of programming background yet, just a little bit of basic javascript. So I'll just google around. Thanks for your effort though, its much appreciated.

Edited by Eyesore, 20 December 2007 - 11:42 PM.


#16 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 11:45 PM

QUOTE(Eyesore @ Dec 21 2007, 12:41 AM)
Well since the browse pages are structured using <a href="page"><img src="thumb" /></a>
Where "page" is a web page with a navigation structure and has essentially <img src="nonthumb" />
Where "thumb" is the thumbnail and "nonthumb" is the full scale.

Most of that is irrelevant since upon inspection of the actual image locations I found that
the only key difference between the location of the two pictures is the insertion of "thumbs/" in the image's directory.

i.e <img src="img.etc.com/thumbs/Pic/1" /> = thumb
<img src="img.etc.com/Pic/1" /> = fullscale

I'll see if I can figure something off of what you have started. Don't have a whole lot of programming background yet, just a little bit of basic javascript. So I'll just google around. Thanks for your effort though, its much appreciated.

If the only difference is that there is a thumbs string in the address of the image, it would be quite easy to write a perl script to just rip every image from the site.

I thought what you thought too and tried to remove the thumbs link and it didn't work out for me. Perhaps I did it wrong though.

#17 Eyesore

Eyesore
  • 259 posts

Posted 20 December 2007 - 11:54 PM

QUOTE(Eyesore @ Dec 20 2007, 11:48 PM)
As in this is the source for the thumb
http://img.macrochan.org/thumbs/T/6/T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg

And this is the source for the fulls scale picture
http://img.macrochan...F35HFM6MGB.jpeg



QUOTE(Hydrogen @ Dec 21 2007, 01:45 AM)
If the only difference is that there is a thumbs string in the address of the image, it would be quite easy to write a perl script to just rip every image from the site.

I thought what you thought too and tried to remove the thumbs link and it didn't work out for me. Perhaps I did it wrong though.


In that post I made earlier those are two actual links. The first is the full url for the thumbnail. If you remove "thumbs/"(the word thumbs and the following / ) it will link to the full image. I hope I am being understandable, because this is tricky to explain in words I guess.

Edited by Eyesore, 20 December 2007 - 11:56 PM.


#18 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 20 December 2007 - 11:59 PM

QUOTE(Eyesore @ Dec 21 2007, 12:54 AM)
In that post I made earlier those are two actual links. The first is the full url for the thumbnail. If you remove "thumbs/"(the word thumbs and the following / ) it will link to the full image. I hope I am being understandable, because this is tricky to explain in words I guess.

Nah, you are explaining it fine :) Here's an example though:
CODE
http://img.macrochan.org/thumbs/I/B/IBRGZHSVYEEU2POQKZNYWSELTG35XYTE.jpeg
and
CODE
http://img.macrochan.org/I/B/IBRGZHSVYEEU2POQKZNYWSELTG35XYTE.jpeg



#19 Eyesore

Eyesore
  • 259 posts

Posted 21 December 2007 - 12:20 AM

The one that you referenced does have an error.
It seems the second link, full scale one, should be .jpg not .jpeg.
So my theory was almost correct, but I guess not true for all of them.
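
One way to cope with the .jpg/.jpeg mismatch, sketched in Python: build a short list of candidate URLs per thumbnail and try them in order until one downloads. full_size_candidates and ALTERNATE_EXTENSIONS are names I made up, and the assumption that only these two spellings occur is a guess based on the single example above.

```python
import posixpath

# Assumption: the full size file uses one of these two spellings.
ALTERNATE_EXTENSIONS = (".jpeg", ".jpg")


def full_size_candidates(thumb_url):
    """Return possible full size URLs for a thumbnail URL, most likely first."""
    full = thumb_url.replace("/thumbs/", "/", 1)
    stem, ext = posixpath.splitext(full)
    candidates = [full]  # same extension as the thumbnail first
    for alt in ALTERNATE_EXTENSIONS:
        if alt != ext.lower():
            candidates.append(stem + alt)
    return candidates


if __name__ == "__main__":
    thumb = "http://img.macrochan.org/thumbs/I/B/IBRGZHSVYEEU2POQKZNYWSELTG35XYTE.jpeg"
    for url in full_size_candidates(thumb):
        print(url)
```

A downloader would request each candidate in turn and keep the first one that does not come back as an error.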

Edit: I feel dumb now. img.macrochan.org is an open directory that I can browse....
Now I just need something to go to each of the individual directories and download them all.

Edited by Eyesore, 21 December 2007 - 12:31 AM.


#20 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 21 December 2007 - 12:30 AM

QUOTE(Eyesore @ Dec 21 2007, 01:20 AM)
The one that you referenced does have an error.
It seems the second link, full scale one, should be .jpg not .jpeg.
So my theory was almost correct, but I guess not true for all of them.

If you have some missing, is that a problem? :P

#21 Eyesore

Eyesore
  • 259 posts

Posted 21 December 2007 - 12:34 AM

It wouldn't be a problem no, I'm just trying to get as many as I can. Now that I found out that I can actually just access the image server alone though, I think I'm looking for more of a spidering application. Something that will go into each directory and download everything in it.

It seems the structure on the image server is
Directories
2
3
etc to 7
A
B
C etc to Z

and each directory has sub-directories named the same way.
So there are pictures in 2/2/ through Z/Z/... that's a lot of pictures.

*If that is hard to understand I'm sorry.*
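
In case it helps, the layout described above can be enumerated in a few lines of Python. The names 2 to 7 and A to Z happen to be the Base32 alphabet, and judging by the example URLs in this thread the two directory levels look like the first two characters of the file's hash, though that is an inference from a couple of links. directory_urls, NAMES, and BASE are names chosen for this sketch.

```python
from itertools import product

# Directory names as listed above: digits 2-7, then letters A-Z.
NAMES = "234567" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = "http://img.macrochan.org/"


def directory_urls(base=BASE):
    """Build every two-level directory URL, from 2/2/ through Z/Z/."""
    return [f"{base}{first}/{second}/" for first, second in product(NAMES, repeat=2)]


if __name__ == "__main__":
    urls = directory_urls()
    print(len(urls))  # 32 names at each of two levels
    print(urls[0])
    print(urls[-1])
```

That works out to 32 x 32 = 1024 directories to visit, so any spider should pause between requests.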

#22 Hydrogen

Hydrogen
  • Neocodex Co-Founder

  • 22213 posts



Posted 21 December 2007 - 12:38 AM

QUOTE(Eyesore @ Dec 21 2007, 01:34 AM)
It wouldn't be a problem no, I'm just trying to get as many as I can. Now that I found out that I can actually just access the image server alone though, I think I'm looking for more of a spidering application. Something that will go into each directory and download everything in it.

It seems the structure on the image server is
Directories
2
3
etc to 7
A
B
C etc to Z

and each directory has a sub-directory with the same specifications.
So there are pictures in 2/2/ through Z/Z.....thats a lot of pictures....

*If that is hard to understand I'm sorry.*

This actually makes it easier, since the image server has open directory listings. You can write a Perl script to go and scrape the entire site then. I can work on a Perl script that does this for you if you like...
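
Until a Perl version turns up, here is the core of such a scraper sketched in Python instead: given the HTML of one directory listing, pull out the links that look like images. The listing format in SAMPLE_INDEX is an assumption (a typical auto-generated index with sort links and a parent directory link), not a copy of what img.macrochan.org actually serves, and IndexLinks, image_urls, and IMAGE_EXTENSIONS are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

# Hypothetical listing, in the style of a typical auto-generated index page.
SAMPLE_INDEX = '''<html><body><h1>Index of /T/6</h1>
<a href="?C=N;O=D">Name</a>
<a href="/T/">Parent Directory</a>
<a href="T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg">T6WA26DSX43TTW3KYUBT2BF35HFM6MGB.jpeg</a>
</body></html>'''


class IndexLinks(HTMLParser):
    """Collect the href of every <a> tag in a directory listing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def image_urls(index_html, index_url):
    """Return absolute URLs for the image files linked from one listing."""
    parser = IndexLinks()
    parser.feed(index_html)
    return [
        urljoin(index_url, href)
        for href in parser.hrefs
        if href.lower().endswith(IMAGE_EXTENSIONS)  # skips sort and parent links
    ]


if __name__ == "__main__":
    for url in image_urls(SAMPLE_INDEX, "http://img.macrochan.org/T/6/"):
        print(url)
```

Fetching each listing and saving the files is left out on purpose; the standard library's urllib.request would cover that part.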

#23 Eyesore

Eyesore
  • 259 posts

Posted 21 December 2007 - 12:42 AM

QUOTE(Hydrogen @ Dec 21 2007, 02:38 AM)
This actually makes it easier since they have open directory support. You can write a perl script to go and scrape the entire site then. I can work on a perl script that does this for you if you like...



If you could that would be super super awesome. It's not a major thing that I get it immediately either so feel free to do it just whenever you get time to do so, no rush. If I hit their entire site I'm going to have to do it off of my Linux box anyway (which I still have to unpack since college is out for the semester) because my laptop would cry to take down that many images lol. Thank you Thank you Thanks

