Old 05-1-2012, 07:36 PM   #21
UserNameGoesHere
Re: [JAVA] Site Search Help

It should work. It's just really really slow. In particular the URL.openStream() method seems RIDICULOUSLY slow. Let it run a while and see if it eventually finds anything.

Oh and the site it pulls up may have broken gifs, jpegs, etc because it's not pulling down additional resources, but you should be seeing something after a while at least.

If after letting it run for like 5 or 10 minutes it's still not finding anything, see if
Code:
firefox -new-tab http://www.google.com
pulls up Google in a new tab within your Firefox. If it doesn't, then maybe your version of Firefox is causing the problem. In that case, try removing the -new-tab and just let it pop up its own window in Firefox.

As I've coded it, it works exactly as you wanted, on my machine, but RIDICULOUSLY SLOW!

Here's a somewhat cleaned-up version:
Code:
import java.io.*;
import java.net.*;

public class URLReader{
	public static void main(String[] args) throws Exception{
		new URLReader();
	}
	
	public URLReader() throws Exception{
		URL site = new URL("http://kaction.com/badfanfiction/");
		BufferedReader in = new BufferedReader(
				new InputStreamReader(site.openStream()));
		BufferedWriter out = new BufferedWriter(
				new OutputStreamWriter(
				new FileOutputStream("savedsite.html")));
		
		String inputLine;
		boolean found = false;
		while((inputLine = in.readLine()) != null){
			// readLine() strips the line terminator, so write it back out
			out.write(inputLine, 0, inputLine.length());
			out.newLine();
			if(inputLine.contains("the")){
				if(!found){
					System.out.println(inputLine);
					found = true;
				}
			}
		}
		out.close();
		if(found){
			Runtime.getRuntime().exec("firefox -new-tab savedsite.html");
		}
		else{
			Thread.sleep(2000);
			Thread.sleep(2000);
			in.close();
			new URLReader();
		}
	}
}
In particular that should clear up a few compiler warnings.

=================================================================

As modified, here is exactly what the program does.

1: Opens a stream to http://kaction.com/badfanfiction/
1b: This takes a really long time. This is where the holdup is.
2: Opens an output file named savedsite.html
3: Writes the contents of the HTML stream to savedsite.html
4: If it finds the desired word, it prints the first line containing it to standard out
4b: Then it loads savedsite.html in a new tab in Firefox and the program ends.
5: If it didn't find anything
5b: It waits a little, both to avoid hammering the website and to give it a bit more time to "randomize".
5c: This is NOT where the holdup is. This is merely a courtesy.
6: Then it tries again from the start.
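
For what it's worth, steps 3 and 4 can be pulled out into a small helper so they can be tried without touching the network. This is only a sketch: PageScanner and copyAndFind are names I made up for illustration, and the page here is an in-memory string standing in for the real download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the program above.
final class PageScanner {
    /**
     * Copies every line from in to out and returns the first line that
     * contains word, or null if no line contains it.
     */
    static String copyAndFind(Reader in, Writer out, String word) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String firstMatch = null;
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write('\n'); // readLine() strips the line terminator
            if (firstMatch == null && line.contains(word)) {
                firstMatch = line;
            }
        }
        return firstMatch;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the downloaded page, so this runs offline.
        String page = "<html>\n<p>a cat sat</p>\n<p>on the mat</p>\n</html>";
        StringWriter saved = new StringWriter();
        String hit = copyAndFind(new StringReader(page), saved, "the");
        System.out.println(hit); // prints <p>on the mat</p>
    }
}
```

In the real program, steps 5 and 6 could then be a while loop around this call (sleep, reopen the stream, try again) instead of the constructor calling itself, which also keeps the call stack from growing on every retry.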

Notes: It is not a bug that the input stream isn't explicitly closed on a successful find, since the program stops at that point and any still-open handles are released when the JVM exits.
The Thread.sleep calls are not ... I repeat are not the cause of the massive holdup. In fact you could remove them if you want and still have the massive holdup. They are there for courtesy to the site itself.
The contents of savedsite.html are overwritten on each round, so once it does find something you'll only see the HTML for the instance where it found it.
Since only the HTML itself is raw-pulled, but not any other files linked within the HTML, any gifs, jpegs, etc will be broken, some JavaScript may be broken (depends how it's called/stored), and so forth.
If you wanted to save the entire page, INCLUDING all externally-linked resources, that would be a lot more code.
Additionally if you wanted to account for JavaScript-generated text, that would be a lot more code.
You will need to manually delete savedsite.html when you are finished (unless you wanted to keep it).
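
To give a rough idea of the first step toward saving linked resources: you'd have to find them in the HTML before you could download anything. Here is a minimal sketch that only lists src="..." values with a deliberately naive regex. ResourceLister and findSources are made-up names, it ignores CSS, href links, and anything JavaScript generates, and a real version would want an actual HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the program above.
final class ResourceLister {
    // Naive pattern: matches src="..." or src='...' anywhere in the text.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the src attribute values in the order they appear. */
    static List<String> findSources(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<img src=\"pics/a.gif\"><script src='main.js'></script>";
        for (String src : findSources(html)) {
            System.out.println(src);
        }
    }
}
```

Each value found this way would still have to be resolved against the page's URL, downloaded, and rewritten in the saved HTML, which is where the "lot more code" comes in.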

URL.openStream() is just SLOW -- so if you want to speed up the program, you'll need to replace that with some other means of opening/reading the website. This isn't something you can fix yourself unless you work for Oracle or something.
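
If you want to see how long the fetch really takes, and cap how long it can hang, here is one possible sketch. It uses URL.openConnection() with timeouts, then times the read. Keep in mind openStream() is documented as shorthand for openConnection().getInputStream(), so I wouldn't expect this alone to make it faster; it just bounds the wait and gives you a number. TimedFetch, fetch, and the 10-second timeout are my own choices for illustration, and it reads with the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of the program above.
final class TimedFetch {
    /** Reads the whole resource as text, failing if a connect or read stalls too long. */
    static String fetch(URL site, int timeoutMillis) throws IOException {
        URLConnection conn = site.openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit on establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit on each blocking read
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java TimedFetch <url>");
            return;
        }
        long start = System.nanoTime();
        String page = fetch(new URL(args[0]), 10000); // 10 s is an arbitrary choice
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(page.length() + " chars in " + elapsedMs + " ms");
    }
}
```

Run it as java TimedFetch http://kaction.com/badfanfiction/ and compare the time against how long the full program sits there.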

Hopefully this gives you some idea of where you want to start for improvements.

And yes it does work as-is. You just need to be patient because it's slow.

Last edited by UserNameGoesHere; 05-2-2012 at 06:27 PM..