This is an exported version of the JIRA issue tracker. Please use the Google Code site to open new tickets or report updates to these existing tickets. Feel free to contact the mailing list with any questions.

[SCB-25] Support for fetching non-HTML/text files (content-type: application/vnd.ms-word)
Created: Fri, 28 Jul 2006 21:39:32 -0700 (PDT)  Updated: Fri, 28 Jul 2006 21:39:32 -0700 (PDT)

Status:Open
Project:Scraping
Component/s:scraping-engine
Affects Version/s:
Fix Version/s:

Type:Task  Priority:Major
Reporter:ravi kant  Assignee:Henri Yandell
Resolution:Unresolved 
Environment:


 Description   
Support for fetching non-HTML/text files.
I was scraping a web page and found a link on it. When I click that link in a browser, I get a dialog box asking whether I want to download or open the file.
When I try to fetch the same link with the POST method, I get the following exception:
java.lang.RuntimeException: org.osjava.scraping.FetchingException:
---------------------TRACE-----------------------
 Not going to fetch a non-text file from http://www.abc.com/abc.WebApp/servlet/common.DocServeServlet. Type is: content-type: application/msword

at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:158)
Caused by: org.osjava.scraping.FetchingException: Not going to fetch a non-text file from http://www.abc.com/abc.WebApp/servlet/common.DocServeServlet. Type is: content-type: application/msword

at org.osjava.scraping.AbstractHttpFetcher.fetch(AbstractHttpFetcher.java:157)
at scraper.fetch.ConnectionUtilities.scrapPage(ConnectionUtilities.java:80)
at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:155)
------------------------------------------------------------------------------------------------------------------------------------------------------------------
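The message in the trace above suggests the fetcher rejects any response whose content type is not text. As a rough sketch of what fetching such a file as raw bytes could look like, the following uses only the standard `java.net.HttpURLConnection` and bypasses the scraping engine entirely. The class `BinaryFetchSketch` and its methods are hypothetical names for illustration and are not part of org.osjava.scraping.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BinaryFetchSketch {

    // Hypothetical helper: treats only "text/..." content types as text.
    static boolean isText(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase().startsWith("text/");
    }

    // Reads the whole stream into memory as bytes, without decoding it.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Describes a fetched body: decoded text for text types, a byte count otherwise.
    static String summarize(String contentType, byte[] body) {
        if (isText(contentType)) {
            return new String(body, StandardCharsets.UTF_8);
        }
        return body.length + " bytes of " + contentType;
    }

    // Fetches the address with GET and returns the body as raw bytes.
    static byte[] fetchBytes(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try {
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + code + " from " + address);
            }
            InputStream in = conn.getInputStream();
            try {
                return readAll(in);
            } finally {
                in.close();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With this sketch, a Word document served as `application/msword` would be returned by `fetchBytes` as a byte array that can be written to disk unchanged, rather than being decoded as text.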
When I fetch the same link with the GET method, I get the following trace:

java.lang.RuntimeException: org.osjava.scraping.FetchingException: Unable to fetch from http://www.timesjobs.com/timesJobWebApp/servlet/common.DocServeServlet?adId=43745975&name=vivek tiwari&resumePath=" due to error code 400
at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:158)
Caused by: org.osjava.scraping.FetchingException: Unable to fetch from http://www.timesjobs.com/timesJobWebApp/servlet/common.DocServeServlet?adId=43745975&name=vivek tiwari&resumePath=" due to error code 400
at org.osjava.scraping.AbstractHttpFetcher.fetch(AbstractHttpFetcher.java:144)
at scraper.fetch.ConnectionUtilities.scrapPage(ConnectionUtilities.java:80)
at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:155)
------------------------------------------------------------------------------------------------------------------------------------------------------------------
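The URL in the GET trace above contains an unencoded space in the `name` value and ends with a stray `"`. That is one plausible cause of the 400 response, although the trace does not show the server's reason, so this is an assumption. A minimal sketch of percent-encoding the query values with the standard `java.net.URLEncoder` before fetching follows. The class `QueryEncodingSketch` and the method `buildQuery` are hypothetical names, and the `resumePath` value is left out because it is truncated in the trace.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncodingSketch {

    // Hypothetical helper: builds a query string from alternating name/value
    // arguments, encoding each piece so spaces and quotes are safe in a URL.
    static String buildQuery(String... pairs) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 0; i + 1 < pairs.length; i += 2) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(pairs[i], "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(pairs[i + 1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "http://www.timesjobs.com/timesJobWebApp/servlet/common.DocServeServlet";
        String query = buildQuery("adId", "43745975", "name", "vivek tiwari");
        System.out.println(base + "?" + query);
    }
}
```

Here the space in `vivek tiwari` is encoded as `+`, so the printed URL ends in `?adId=43745975&name=vivek+tiwari`.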
Please tell me how I can fetch this page.
Please reply to me at ravis22@gmail.com