[SCB-25] Support for fetching non-HTML/text files (content-type: application/vnd.ms-word)
| Status: | Open |
| Project: | Scraping |
| Component/s: | scraping-engine |
| Affects Version/s: | |
| Fix Version/s: | |
| Type: | Task | Priority: | Major |
| Reporter: | ravi kant | Assignee: | Henri Yandell |
| Resolution: | Unresolved | ||
| Environment: | |||
| Description |
| Support for fetching non-HTML/text files.

While scraping a web page, I found a link that, when clicked in a browser, opens a dialog box asking whether I want to download or open the file. When I try to fetch the same link with the POST method, I get the following exception:

java.lang.RuntimeException: org.osjava.scraping.FetchingException: Not going to fetch a non-text file from http://www.abc.com/abc.WebApp/servlet/common.DocServeServlet. Type is: content-type: application/msword
    at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:158)
Caused by: org.osjava.scraping.FetchingException: Not going to fetch a non-text file from http://www.abc.com/abc.WebApp/servlet/common.DocServeServlet. Type is: content-type: application/msword
    at org.osjava.scraping.AbstractHttpFetcher.fetch(AbstractHttpFetcher.java:157)
    at scraper.fetch.ConnectionUtilities.scrapPage(ConnectionUtilities.java:80)
    at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:155)

When I fetch the same link with the GET method, I get the following trace:

java.lang.RuntimeException: org.osjava.scraping.FetchingException: Unable to fetch from http://www.timesjobs.com/timesJobWebApp/servlet/common.DocServeServlet?adId=43745975&name=vivek tiwari&resumePath=" due to error code 400
    at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:158)
Caused by: org.osjava.scraping.FetchingException: Unable to fetch from http://www.timesjobs.com/timesJobWebApp/servlet/common.DocServeServlet?adId=43745975&name=vivek tiwari&resumePath=" due to error code 400
    at org.osjava.scraping.AbstractHttpFetcher.fetch(AbstractHttpFetcher.java:144)
    at scraper.fetch.ConnectionUtilities.scrapPage(ConnectionUtilities.java:80)
    at scraper.fetch.ConnectionUtilities.main(ConnectionUtilities.java:155)

Please tell me how I can fetch this page. Reply to me at ravis22@gmail.com. |
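
Until the scraping engine accepts non-text content types, one possible workaround is to download the document with the plain JDK networking classes instead. The sketch below is not part of the org.osjava.scraping API; the class name `BinaryFetchSketch` and its methods are hypothetical. It also assumes, without confirmation from the traces, that the 400 response on GET may be related to the unencoded space in the `name` query parameter, so it form-encodes each parameter before building the URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, independent of org.osjava.scraping.
class BinaryFetchSketch {

    // Builds a GET URL with every query key and value form-encoded
    // (URLEncoder turns a space into '+').
    static String buildUrl(String base, Map<String, String> params) throws IOException {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            sep = '&';
        }
        return sb.toString();
    }

    // Reads a stream to the end as raw bytes, so binary content
    // such as application/msword is not decoded as text.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Issues a GET request and returns the response body regardless of its content type.
    static byte[] fetchBytes(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try {
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + code + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass the servlet address and an ordered map of parameters to `buildUrl`, then hand the result to `fetchBytes` and write the returned bytes to a `.doc` file. If the server requires a session or cookies, this sketch would need to send them as well.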