Grabbing content (aka. page scraping) from a website

04-18-2011, 12:44 AM
360

As an alternative to using an XML feed (for example, when a website doesn't offer any feeds), you can use the following method to load the website into PHP and then grab certain content:

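// file_get_html() comes from the PHP Simple HTML DOM Parser library;
// this assumes its simple_html_dom.php file is on the include path
include 'simple_html_dom.php';

// Create DOM from URL or file (the URL was left blank in the post)
$html = file_get_html('');

// Find all images and print each src
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print each href
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

Note that this is not my original code; I sourced it through Google searches. It's definitely handy, though, so I wanted to share it with you all.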


06-19-2011, 05:33 AM
scriptbazaar

Thanks for sharing, this is really very handy.

06-19-2011, 10:12 PM
Wildhoney

Thank you. To extend this: I see many people use regular expressions to scrape content from websites, when really XPath is the way to go, especially since PHP's DOM extension provides a DOMXPath class that works with DOMDocument.

As always, W3Schools covers XPath nicely.
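
For anyone who wants to see what the XPath route might look like, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. It pulls out the same image and link attributes as the first post. The sample markup and the variable names ($markup, $images, $links) are made up for illustration; a real page would be loaded from a URL or file instead.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$markup = '<html><body>'
    . '<img src="logo.png"><img src="banner.jpg">'
    . '<a href="/about">About</a><a href="/contact">Contact</a>'
    . '</body></html>';

// Real-world pages are rarely valid HTML, so collect libxml
// warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the src attribute of every image.
$images = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $images[] = $attr->nodeValue;
}

// Collect the href attribute of every link.
$links = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $links[] = $attr->nodeValue;
}

echo implode("\n", array_merge($images, $links)), "\n";
```

To try this against a live page, swapping loadHTML($markup) for DOMDocument's loadHTMLFile() with a URL should work, provided allow_url_fopen is enabled on the server. More selective expressions (for example, only links inside a particular element) are where XPath tends to pay off over regular expressions.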
