Web Crawler or Spider Software



prayami
26-06-2011, 02:51 PM
Hi,

Please suggest the best software (web crawler, spider, or whatever it is called) to copy the data from a website and save it on our computer as a .csv file.

Thanks.

Erayd
26-06-2011, 03:56 PM
Could you explain a bit more about what you're trying to do?

Note also that CSV is not generally a very good format for storing this kind of data.

Wget is a great tool for this kind of thing, provided you're happy using a CLI utility.
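
For example, something along these lines will mirror part of a site (the URL is a placeholder; you'd tune the options to the site in question):

wget -r -l 2 -np --wait=1 -P ./mirror http://example.com/products/

Here -r recurses through links, -l 2 caps the depth, -np stops it climbing above the starting directory, --wait=1 pauses between requests to be polite to the server, and -P sets the output directory.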

prayami
26-06-2011, 04:16 PM
Thanks for the reply.
I want some software which can read a large database from a dynamic website and store that data in a file or database.
e.g.
If there is a large set of name, address, and phone number data on a particular website, then rather than copying and pasting the name, address, and phone number for each person into an Excel file, I want software which can read each page, copy the indicated data, and let me save it on my PC.
Thanks.

Erayd
26-06-2011, 04:20 PM
Do you have permission from the website owner to do this?

What do you want the data for?

If I'm understanding you correctly, you effectively want to dump a site's entire user database, and use those details for your own purposes. Note that there are legal ramifications here - you need to be a bit clearer about what you actually want to do.

prayami
27-06-2011, 01:48 PM
Thanks for the reply.
We are not doing anything illegal. Let me explain in a little more detail.
Our supplier adds lots of products and removes many products every week. They provide us with a list of products and prices in a CSV file, but they don't provide the product specifications and other details in the CSV file because the file would be too big. However, they are fine with us copying those ourselves from their website.
Similarly, they don't provide the product images, but they are fine with us copying them from their website and putting them on our website.
Thanks.

Erayd
27-06-2011, 08:10 PM
Products from a supplier's website don't have addresses or phone numbers.

What you're asking for is effectively a scraper that will fetch organised data from a single specific website, and merge it with an existing dataset. There is no pre-existing product that can do this for you; you'll need to commission someone to write one.

Noting the type of data involved, and the fairly simple requirements, this is something that can be done reasonably quickly (and therefore for a correspondingly low budget).
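
For the sake of illustration, a minimal version of such a scraper might look like the Python sketch below. The URL and the CSS selectors are placeholders - a real one has to be written against the actual markup of the supplier's site - and it needs the third-party requests and beautifulsoup4 packages.

import csv
import requests
from bs4 import BeautifulSoup

BASE = "http://supplier.example.com"  # placeholder supplier site

def scrape_product(url):
    """Fetch one product page and pull out the fields we care about."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # These selectors are assumptions about the page markup.
    name = soup.select_one(".product-name").get_text(strip=True)
    spec = soup.select_one(".product-specs").get_text(strip=True)
    return {"name": name, "spec": spec, "url": url}

def main():
    # In practice the product URLs would come from the supplier's CSV list.
    urls = [BASE + "/products/1", BASE + "/products/2"]
    with open("products.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "spec", "url"])
        writer.writeheader()
        for url in urls:
            writer.writerow(scrape_product(url))

if __name__ == "__main__":
    main()

The merge with the existing dataset is then just a matter of joining on whatever product identifier the CSV price list already carries.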

kahawai chaser
27-06-2011, 09:13 PM
I wonder if he's referring to affiliate data feeds (http://www.dazzlindonna.com/blog/making-money-online/affiliate-sales-making-money-online/affiliate-datafeeds-an-introduction/), where shopping merchants, comparison shopping sites, and manufacturers legitimately supply the data, in CSV or XML format, for their affiliates to download. I have some for my hotel site, though I have only used small batches manually. Large feeds are transferred to a server database and converted to HTML to display on your site. They can generate thousands of category pages with products, image URLs, destinations, geo data, ratings, prices, etc.

This often involves server-side scripting (PHP/SQL) or a third-party utility, such as the popular WebMerge (http://www.fourthworld.com/products/webmerge/index.html) converter/processor. Unless he can script, then depending on the number of data records, you can buy tools, convert online, or pay monthly fees (e.g. datafeedr (http://www.datafeedr.com/)) for a company to transfer/convert the feeds to web pages. This approach is mostly used by internet affiliate marketers and niche product publishers. Lately, I think duplication penalties (from search engines, i.e. Google) have occurred, since the same feeds are duplicated across dozens of websites.

Though the website owner should supply them, there are ways to import the data yourself.
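
As a rough sketch of the import side (the file name and column names here are made up, but the libraries are standard Python):

import csv
import sqlite3

conn = sqlite3.connect("products.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, name TEXT, price REAL)"
)

with open("feed.csv", newline="") as f:
    # Each row of the feed becomes (or replaces) one product record.
    for row in csv.DictReader(f):
        conn.execute(
            "INSERT OR REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)",
            (row["sku"], row["name"], float(row["price"])),
        )

conn.commit()
conn.close()

From there a server-side script can render category pages straight out of the database.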