05-1-2012, 07:36 PM | #21
FFR Veteran
Re: [JAVA] Site Search Help
It should work. It's just really really slow. In particular the URL.openStream() method seems RIDICULOUSLY slow. Let it run a while and see if it eventually finds anything.
Oh, and the page it pulls up may have broken GIFs, JPEGs, etc. because it's not pulling down additional resources, but you should at least be seeing something after a while. If after letting it run for 5 or 10 minutes it's still not finding anything, see if this works from a command line:
Code:
```
firefox -new-tab http://www.google.com
```
As I've coded it, it works exactly as you wanted on my machine, but RIDICULOUSLY SLOW! Here's a somewhat cleaned-up version:
Code:
```java
import java.io.*;
import java.net.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        new URLReader();
    }

    public URLReader() throws Exception {
        URL site = new URL("http://kaction.com/badfanfiction/");
        while (true) {
            // 1: open a stream to the page (this is where the holdup is)
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(site.openStream()));
            // 2: open (and overwrite) the output file
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream("savedsite.html")));
            String inputLine;
            boolean found = false;
            // 3: copy the HTML to savedsite.html line by line
            while ((inputLine = in.readLine()) != null) {
                out.write(inputLine);
                out.newLine(); // readLine() strips line endings, so write them back
                // 4: print only the first line that contains the word
                if (!found && inputLine.contains("the")) {
                    System.out.println(inputLine);
                    found = true;
                }
            }
            out.close();
            if (found) {
                // 4b: show the saved page in a new Firefox tab; the program then ends
                Runtime.getRuntime().exec("firefox -new-tab savedsite.html");
                return;
            }
            // 5: nothing found, so close the input and wait 4 s as a courtesy to the site
            in.close();
            Thread.sleep(4000);
            // 6: loop back and try again (a loop, so retries don't grow the call stack)
        }
    }
}
```
=================================================================
As modified, here is exactly what the program does:

1: Open a stream to http://kaction.com/badfanfiction/
1b: This takes a really long time. This is where the holdup is.
2: Open an output file named savedsite.html.
3: Write the contents of the HTML stream to savedsite.html.
4: If it finds the desired word, print the first line containing it to standard out.
4b: Then load savedsite.html in a new Firefox tab, and the program ends.
5: If it didn't find anything:
5b: Wait around a little, both to avoid hammering the website and to give it a bit more time to "randomize".
5c: This is NOT where the holdup is. This is merely a courtesy.
6: Then try again from the start.

Notes:
It is not a bug that the input stream isn't explicitly closed on a successful find: the program stops at that point, and any still-open handles are released when the JVM exits.
The Thread.sleep call is not (I repeat, NOT) the cause of the massive holdup. In fact, you could remove it and still have the massive holdup. It's there as a courtesy to the site itself.
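Steps 3 and 4 don't actually need the network, so one way to test them without waiting on the slow fetch is to pull them into a method that takes any Reader and Writer. This is a minimal sketch; the class `PageScanner` and method `copyAndFind` are hypothetical names, not part of the program above. In URLReader you would pass `new InputStreamReader(site.openStream())` and a `FileWriter` for savedsite.html; in a test you can pass a `StringReader` and `StringWriter` instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class PageScanner {
    /**
     * Copies every line from src to dest (writing back the line breaks that
     * readLine() strips) and returns the first line containing word,
     * or null if no line contains it. Neither stream is closed here.
     */
    public static String copyAndFind(Reader src, Writer dest, String word) throws IOException {
        BufferedReader in = new BufferedReader(src);
        BufferedWriter out = new BufferedWriter(dest);
        String firstMatch = null;
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);
            out.newLine(); // platform line separator
            if (firstMatch == null && line.contains(word)) {
                firstMatch = line; // remember only the first hit, keep copying
            }
        }
        out.flush(); // push buffered text through to dest without closing it
        return firstMatch;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample page standing in for the real download
        StringWriter saved = new StringWriter();
        String hit = copyAndFind(
                new StringReader("<html>\n<p>in the woods</p>\n</html>"), saved, "the");
        System.out.println("first match: " + hit);
    }
}
```

Running `main` prints `first match: <p>in the woods</p>`, and `saved` ends up holding all three lines of the sample page.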
The contents of savedsite.html are overwritten on each round, so once it does find something you'll only see the HTML for the instance where it found it.

Since only the HTML itself is raw-pulled, and not any other files linked within the HTML, any GIFs, JPEGs, etc. will be broken, some JavaScript may be broken (depending on how it's called/stored), and so forth. If you wanted to save the entire page, INCLUDING all externally linked resources, that would be a lot more code. Additionally, if you wanted to account for JavaScript-generated text, that would be a lot more code.

You will need to manually delete savedsite.html when you are finished (unless you want to keep it).

URL.openStream() is just SLOW, so if you want to speed up the program, you'll need to replace it with some other means of opening/reading the website. This is nothing fixable unless you work for Oracle or something. Hopefully this gives you some idea of where to start on improvements. And yes, it does work as-is. You just need to be patient because it's slow.
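If you want to see where the time actually goes before replacing anything, one option is to open the connection yourself instead of calling URL.openStream(), which is documented as shorthand for `openConnection().getInputStream()`. A URLConnection's connect and read timeouts default to 0, meaning wait forever, so setting them at least stops a stalled request from hanging the program (you get a `java.net.SocketTimeoutException` instead); it won't make the site itself answer any faster. This is a minimal sketch assuming Java 8+ and a UTF-8 page; the class name `TimedFetch`, the `fetch` method and the 10-second timeout are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class TimedFetch {
    /**
     * Reads the whole resource at urlString, giving up if connecting or any
     * single read blocks longer than timeoutMs, and prints how long the
     * response took to start versus how long the body took to download.
     */
    public static String fetch(String urlString, int timeoutMs) throws IOException {
        long start = System.nanoTime();
        URLConnection conn = URI.create(urlString).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs); // 0 (the default) would mean no limit
        conn.setReadTimeout(timeoutMs);
        StringBuilder page = new StringBuilder();
        // getInputStream() connects and, for HTTP, waits for the response headers
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            long responded = System.nanoTime();
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
            long done = System.nanoTime();
            System.out.printf("until response: %d ms, body: %d ms%n",
                    (responded - start) / 1_000_000, (done - responded) / 1_000_000);
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://kaction.com/badfanfiction/", 10_000);
        System.out.println(html.length() + " characters read");
    }
}
```

If most of the time shows up in "until response", the wait is on the server side rather than in reading the stream, and swapping in a different Java API is unlikely to help much.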
Last edited by UserNameGoesHere; 05-2-2012 at 06:27 PM.