Welcome to Techtadka
Web Crawler in Java
A very basic web crawler in Java:
This source code is a starting point for web parsing projects.
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Crawler
{
    public static void main(String[] argv) {
        try {
            URL url = new URL("http://www.google.com");
            URLConnection urlConnection = url.openConnection();
            urlConnection.setAllowUserInteraction(false);
            // Read from the connection configured above;
            // url.openStream() would open a second, separate connection.
            try (InputStream urlStream = urlConnection.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] b = new byte[4096];
                int numRead;
                // read() returns -1 once the end of the stream is reached.
                while ((numRead = urlStream.read(b)) != -1) {
                    buffer.write(b, 0, numRead);
                }
                // Decode once, after all bytes are collected, so that multi-byte
                // characters are never split across two chunks.
                // UTF-8 is an assumption; the page may declare another charset.
                String content = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                System.out.println(content);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
In this code, a URL is fetched and the source of the web page (http://www.google.com) is read into a string, which is then printed with System.out.println().
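Once the page source is in a string, the usual next step for a crawler is to pull out the links it contains so they can be visited in turn. The following is only a rough sketch of that idea, not part of the original code: the class name LinkExtractor and the method extractLinks are made up for illustration, and a regular expression is used purely to keep the example short. A real project would more likely use a proper HTML parser, since a regex misses unquoted attributes and relative links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post): collects absolute
// http/https links from the href attributes of <a> tags in an HTML string.
public class LinkExtractor {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // This is a simplification and will not handle every kind of markup.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(2).trim();
            // Keep only absolute links; relative ones are skipped in this sketch.
            if (link.startsWith("http://") || link.startsWith("https://")) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample HTML standing in for the "content" string fetched by the crawler.
        String sample = "<html><body>"
                + "<a href=\"http://example.com/a\">A</a>"
                + "<a class='x' href='https://example.com/b'>B</a>"
                + "<a href=\"/relative\">C</a>"
                + "</body></html>";
        for (String link : extractLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

Run on the sample HTML, this prints http://example.com/a and https://example.com/b, and skips the relative link. In the crawler above, the content string could be passed to extractLinks in the same way.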








