[JAVA] Site Search Help - Page 2

**UserNameGoesHere** · 05-1-2012, 07:36 PM

It should work. It's just really really slow. In particular the URL.openStream() method seems RIDICULOUSLY slow. Let it run a while and see if it eventually finds anything.

Oh and the site it pulls up may have broken gifs, jpegs, etc because it's not pulling down additional resources, but you should be seeing something after a while at least.

If after letting it run for like 5 or 10 minutes it's still not finding anything, see if

Code:

firefox -new-tab http://www.google.com

pulls up Google in a new tab within your Firefox. Because if not, then maybe it's your version of Firefox causing the problem. If that doesn't work, try removing the -new-tab and just let it pop up its own window in Firefox.

As I've coded it, it works exactly as you wanted, on my machine, but RIDICULOUSLY SLOW!

Here's a somewhat nicened-up version

Code:

import java.io.*;
import java.net.*;

public class URLReader{
	public static void main(String[] args) throws Exception{
		new URLReader();
	}
	
	public URLReader() throws Exception{
		URL site = new URL("http://kaction.com/badfanfiction/");
		BufferedReader in = new BufferedReader(
				new InputStreamReader(site.openStream()));
		BufferedWriter out = new BufferedWriter(
				new OutputStreamWriter(
				new FileOutputStream("savedsite.html")));
		
		String inputLine;
		boolean found = false;
		while((inputLine = in.readLine()) != null){
			out.write(inputLine, 0, inputLine.length());
			out.newLine(); // readLine() strips the line terminator, so put one back
			if(inputLine.contains("the")){
				if(!found){
					System.out.println(inputLine);
					found = true;
				}
			}
		}
		out.close();
		if(found){
			Runtime.getRuntime().exec("firefox -new-tab savedsite.html");
		}
		else{
			Thread.sleep(2000); // courtesy pause so the site isn't hammered
			in.close();
			new URLReader();
		}
	}
}

In particular that should clear up a few compiler warnings.

=================================================================

As modified, here is exactly what the program does.

1: Open a stream to http://kaction.com/badfanfiction/
1b: This takes a really long time. This is where the holdup is.
2: Open an output file named savedsite.html
3: Writes the contents of the HTML stream to savedsite.html
4: If it finds the desired word, it prints the line to standard out
4b: Then it loads savedsite.html in a new tab in Firefox and program ends.
5: If it didn't find anything
5b: Waits around a little both to avoid hammering the website and give it a bit more time to "randomize".
5c: This is NOT where the holdup is. This is merely a courtesy.
6: Then it tries again back from the start.
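
Steps 3 and 4 above can be sketched on their own, without touching the network. This is only an illustration: the class name `CopyAndFind` and its method are made up for the example, and it reads from an in-memory string rather than the real site. It also puts back the line breaks that readLine() strips.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Optional;

// Hypothetical helper class, not part of the program posted above.
final class CopyAndFind {
    /**
     * Copies every line from in to out, re-adding a line separator after each,
     * and returns the first line that contains word (if any line does).
     */
    static Optional<String> copyAndFind(BufferedReader in, Writer out, String word) {
        try {
            String firstMatch = null;
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write(System.lineSeparator());
                if (firstMatch == null && line.contains(word)) {
                    firstMatch = line; // remember only the first hit, keep copying
                }
            }
            return Optional.ofNullable(firstMatch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page; the real program reads from the URL.
        String page = "<html>\n<p>over the hill</p>\n<p>the end</p>\n</html>";
        StringWriter saved = new StringWriter();
        Optional<String> hit =
                copyAndFind(new BufferedReader(new StringReader(page)), saved, "the");
        System.out.println(hit.orElse("(not found)")); // prints <p>over the hill</p>
    }
}
```

The whole page is copied before the decision is made, so the saved file is always complete whether or not the word turned up.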

Notes: It is not a bug that the input stream isn't explicitly closed on a successful find since the program stops at that point and the JVM will automatically close any still-open handles.
The Thread.sleep calls are not ... I repeat are not the cause of the massive holdup. In fact you could remove them if you want and still have the massive holdup. They are there for courtesy to the site itself.
The contents of savedsite.html are overwritten on each round, so once it does find something you'll only see the HTML for the instance where it found it.
Since only the HTML itself is raw-pulled, but not any other files linked within the HTML, any gifs, jpegs, etc will be broken, some JavaScript may be broken (depends how it's called/stored), and so forth.
If you wanted to save the entire page, INCLUDING all externally-linked resources, that would be a lot more code.
Additionally if you wanted to account for JavaScript-generated text, that would be a lot more code.
You will need to manually delete savedsite.html when you are finished (unless you wanted to keep it).

URL.openStream() is just SLOW -- so if you want to speed up the program, you'll need to replace that with some other means of opening/reading the website. This is nothing fixable unless you work for Oracle or something.
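
One possible "other means", as a rough sketch: open the connection explicitly through HttpURLConnection and set timeouts, so a stalled request fails instead of hanging, and time the read to see where the seconds actually go. Nothing in this thread shows that this is faster against that site. The class name `TimedFetch`, the 10 second timeout and the UTF-8 charset are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class TimedFetch {
    /** Builds an HTTP connection with explicit timeouts. Nothing is sent yet. */
    static HttpURLConnection open(String address, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // give up if the server never answers
            conn.setReadTimeout(timeoutMillis);    // give up if the response stalls midway
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        HttpURLConnection conn = open("http://kaction.com/badfanfiction/", 10_000);
        // Assumes the page is UTF-8; adjust if the site says otherwise.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            long lines = in.lines().count();
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(lines + " lines in " + millis + " ms");
        } finally {
            conn.disconnect();
        }
    }
}
```

If the printed time is still large, the wait is more likely on the server or network side than in the Java call itself.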

Hopefully this gives you some idea of where you want to start for improvements.

And yes it does work as-is. You just need to be patient because it's slow.

**SKG_Scintill** · 05-3-2012, 06:00 AM

Upon trying, it did find the text, but also gave an error:

Code:

Exception in thread "main" java.io.IOException: Cannot run program "firefox": CreateProcess error=2, The system cannot find the file specified
	at java.lang.ProcessBuilder.start(Unknown Source)
	at java.lang.Runtime.exec(Unknown Source)
	at java.lang.Runtime.exec(Unknown Source)
	at java.lang.Runtime.exec(Unknown Source)
	at URLReader.<init>(URLReader.java:30)
	at URLReader.main(URLReader.java:6)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
	at java.lang.ProcessImpl.create(Native Method)
	at java.lang.ProcessImpl.<init>(Unknown Source)
	at java.lang.ProcessImpl.start(Unknown Source)
	... 6 more

**UserNameGoesHere** · 05-3-2012, 05:34 PM

Open up a command prompt. In the command prompt type

Code:

firefox -new-tab http://www.google.com

Does that work for you? The code assumes that works for you (It works for me).

Because it looks like your system can't find Firefox (which I assumed you were using as your browser.)

If you are using Firefox and that command doesn't work, you'll need to do one of two things. Either find the full path to your Firefox installation and use that full path in the exec call, or add Firefox's directory to your PATH environment variable before running the program.

Because it works perfectly fine as-is (although slow) on Ubuntu using GCJ and a default Firefox install. It should work perfectly fine as-is on Windows using Oracle's Java and a recentish Firefox install as well. It should also work perfectly fine as-is on a Mac OSX which has some version of Java and a Firefox install.

Assuming Windows, pretend you installed firefox to C:\Browsers\Firefox (probably not, but for example). Then you'd do something like (in a command prompt)

Code:

PATH=%PATH%;C:\Browsers\Firefox

or wherever the firefox.exe binary is actually installed to. Then run the program from that same command prompt.

So the full sequence would look something like this

Code:

PATH=%PATH%;C:\wherevertheheckyouinstalledfirefox
javac URLReader.java
java URLReader

Wait a minute or two (since it's slow)
Results. :p
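
For the other option (calling Firefox by its full path instead of changing PATH), a minimal sketch could look like the following. The class name `BrowserLauncher` is made up, and `C:\Browsers\Firefox\firefox.exe` is just the pretend install location from the example above, so pass your real path as the first argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
final class BrowserLauncher {
    /** Builds the argument list: executable, the -new-tab flag, then the page to open. */
    static List<String> command(String firefoxExecutable, File page) {
        return Arrays.asList(firefoxExecutable, "-new-tab", page.getAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        // Pretend install location; replace with wherever firefox.exe really lives.
        String firefox = args.length > 0 ? args[0] : "C:\\Browsers\\Firefox\\firefox.exe";
        // ProcessBuilder takes the arguments as a list, so spaces in the path
        // (e.g. "Program Files") don't get split into separate arguments.
        new ProcessBuilder(command(firefox, new File("savedsite.html"))).start();
    }
}
```

Run it as `java BrowserLauncher "C:\full\path\to\firefox.exe"` after the URLReader has produced savedsite.html in the same directory.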

**SKG_Scintill** · 05-3-2012, 07:31 PM

It's working now, somehow...

I have to say, I honestly appreciate all the effort you've put into solving this.
The thing is, I mostly wanted to see if I overlooked a small mistake. It has come to a point where I can no longer understand the code xD
This code has now sidetracked to a point where I'm not really learning from my mistakes, but rather following a crash course in programming this one thing.
I think I'll leave it for what it is and continue my programming class regularly until I understand it myself :)

**UserNameGoesHere** · 05-3-2012, 08:26 PM

Ah, well at least you gave it a shot.

Once you have learned more and feel you can understand the code, there are some simple improvements that can be made to it (and some difficult ones).

Simple improvements left as an exercise for when you're more ready for it:
1: Have it get the desired word from the user instead of hardcoding it.
2: Use a regular expression on the word to allow it to accept variants (for example, both lower and upper case, first letter capital, etc...)
3: Have it get the desired URL from the user instead of hardcoding it.
4: Error handling -- as-is of course there is no error handling but it's really better to include it.
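
As a hint for items 1 and 2, here is one way it might go. This is a sketch, not the only approach: the class name `WordMatcher` is invented, the word comes from the first command-line argument, and the whole-word matching (`\b`) is an extra assumption on my part. The posted code uses contains(), which would also match "the" inside "other".

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class WordMatcher {
    /** Builds a case-insensitive, whole-word pattern for the given word. */
    static Pattern forWord(String word) {
        // Pattern.quote keeps characters like '.' or '*' in the user's word literal.
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** True if the line contains the word in any letter case. */
    static boolean lineHasWord(String line, String word) {
        return forWord(word).matcher(line).find();
    }

    public static void main(String[] args) {
        // Item 1: take the word from the user, falling back to the old hardcoded one.
        String word = args.length > 0 ? args[0] : "the";
        String[] samples = { "The cat sat.", "ON THE MAT", "another line" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + lineHasWord(sample, word));
        }
    }
}
```

In the reader loop, `inputLine.contains("the")` would then become something like `pattern.matcher(inputLine).find()`, with the pattern built once before the loop.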

Difficult improvements:
1: Have it pull down the entire website AND all linked resources. (gifs, jpegs, CSS files, etc...)
2: Have it properly parse JavaScript to find and pull additional resources that the JavaScript links, including recursively doing so for JavaScript linking JavaScript.
3: Have it properly parse JavaScript to see if the desired word is constructed on-the-fly via JavaScript.
4: Speed up the execution time.

And keep in mind that Java is just slow too. It's quite known for this. Java is not built for speed.

**SKG_Scintill** · 05-7-2012, 07:47 AM

"You gave it a shot" wouldn't do, so I gave it another couple of shots.

I started with the code in the OP and expanded from that.
First thing I noticed is that the InputStreamReader read the entire code of the site. Every time the in.readLine() put something inside my inputLine it was one line further down the entire code.
So unless the word was in the first line of the site code (which is <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"), it would refresh the site, instead of looking through the rest of the code.

There are two classes again.
This is the code that works for me and does what I want it to do:

Code:

import java.awt.*;
import java.awt.event.*;

public class Tabswitch {
	public static void main(String[] args) throws Exception{
		Robot r = new Robot();
		r.keyPress(KeyEvent.VK_ALT);
		r.keyPress(KeyEvent.VK_TAB);
		r.delay(100);
		r.keyRelease(KeyEvent.VK_ALT);
		r.keyRelease(KeyEvent.VK_TAB);
		URLReader a = new URLReader();
	}
}

--------------------------------------------------------------------------------------

import java.io.*;
import java.net.*;
import java.awt.*;
import java.awt.event.*;

public class URLReader{
	public URLReader() throws Exception{
		URL site = new URL("http://kaction.com/badfanfiction/");
		BufferedReader in = new BufferedReader(
				new InputStreamReader(site.openStream()));
		
		String inputLine;
		while((inputLine = in.readLine()) != null){
			if(inputLine.contains("</b> and")){
				if(inputLine.contains("Final")){
					System.exit(0);
				}
			}
			if(inputLine.contains("</b>.")){
				if(inputLine.contains("Final")){
					System.exit(0);
				}
				else{
					Robot s = new Robot();
					Thread.sleep(2000);
					s.keyPress(KeyEvent.VK_F5);
					s.keyRelease(KeyEvent.VK_F5);
					Thread.sleep(2000);
					URLReader a = new URLReader();
				}
			}
		}
	}
}

It's probably a very unorthodox way of programming, but it's a way I could understand it myself.

**UserNameGoesHere** · 05-7-2012, 08:32 AM

Here is what your code does. Note, your code assumes you already have Firefox open AND that you already have the desired URL page loaded once. It also assumes Alt Tabbing will get focus to Firefox.

Switch focus over to Firefox.
Open and read the site, looking for specific things, exiting if they are found.
If they're not found, refresh the page in Firefox and try again.

Problem is, since the page is dynamically-generated, the page your program parses and the page Firefox loads may not be the same. There is no connection between your Robot code (which scripts the refreshing of Firefox) and your parsing code.

What you have is known as a race condition. Basically, if it works, it's only lucky that it worked. You are banking on the two accesses (once by your program, once by Firefox) to be close enough together such that the randomizer on the site will "randomize" identical results. If the site based randomization on access rather than time, you would almost never get correct results.

The code I presented has no such race conditions and accesses the site only once per round, feeding the results directly into a new tab in firefox.
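
To make the difference concrete, here is a small simulation. It is only a model: `fakeSite` is a made-up stand-in that hands out a different "random" page on every access, and the class and method names are invented for the example. One method checks one access and then shows a second one, the other checks and shows the same access.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical simulation, not the real site or the real programs.
final class RaceDemo {
    /** Fake site: every call to get() returns the next page in the list. */
    static Supplier<String> fakeSite(List<String> pages) {
        Iterator<String> it = pages.iterator();
        return it::next;
    }

    /** Two accesses per round: the program checks one page, the browser loads another. */
    static String checkThenRefetch(Supplier<String> site, String word) {
        while (true) {
            String checked = site.get();   // the program's own access
            if (checked.contains(word)) {
                return site.get();         // the browser's separate access
            }
        }
    }

    /** One access per round: the page that was checked is the page that is shown. */
    static String fetchOnce(Supplier<String> site, String word) {
        while (true) {
            String page = site.get();
            if (page.contains(word)) {
                return page;
            }
        }
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("alpha", "Final beta", "gamma");
        // prints "two accesses: gamma"
        System.out.println("two accesses: " + checkThenRefetch(fakeSite(pages), "Final"));
        // prints "one access:   Final beta"
        System.out.println("one access:   " + fetchOnce(fakeSite(pages), "Final"));
    }
}
```

With two accesses the page you end up looking at is not the page that was checked, which is the mismatch described above.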

**dAnceguy117** · 05-7-2012, 10:17 AM

I just gotta say, very nicely done, UserName. that's some comprehensive stuff on solving this problem. the code shows off some nice methods I've never taken a look at, too. *standing ovation*
