CS-151 Labs > Lab 10. Graph traversals
Part 1. Program Menu & Reading Files
You will be writing a class called BaconNumber that will read a data file and allow you to interactively query the system for the Bacon Number and path for any actor in the database. The program should require a single argument which is the filename containing the information on people and the roles they played in a movie. An optional second argument can be used to specify the initial center.
If the filename argument begins with “http:” you should treat it as a URL and
read the file from the network. This will enable you to play the game without
having to download the entire file.
To open a Scanner from a URL, put the statement import java.net.*; at the top
of your file, then do something similar to the following:
Scanner s = new Scanner(new URL("http://www.cs.oberlin.edu/").openStream());
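The file-or-URL decision described above could be wrapped in a single helper. What follows is only a minimal sketch, not the required design: the class name SourceOpener and the method names isUrl and open are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

// Hypothetical helper; these names are illustrative and not part of the lab spec.
class SourceOpener {
    // The lab says to treat an argument that begins with "http:" as a URL.
    static boolean isUrl(String source) {
        return source.startsWith("http:");
    }

    // Returns a Scanner over the network stream for URLs, or over a local file otherwise.
    static Scanner open(String source) throws IOException {
        if (isUrl(source)) {
            return new Scanner(new URL(source).openStream());
        }
        return new Scanner(new File(source));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Expected a filename or URL as the first argument");
            return;
        }
        // Count the lines just to show that the Scanner works for either kind of source.
        try (Scanner s = open(args[0])) {
            int lines = 0;
            while (s.hasNextLine()) {
                s.nextLine();
                lines++;
            }
            System.out.println("Read " + lines + " lines from " + args[0]);
        }
    }
}
```

Because both branches return a Scanner, the rest of the program does not need to know where the data came from.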
After reading in the data file, the program should then prompt the user for
commands until the user enters CTRL-D. You will use a Scanner to read in
user input from standard input. CTRL-D is the end-of-file character, and will
cause hasNextLine() to return false.
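The prompt-until-CTRL-D behavior might be sketched as follows. The class name CommandLoop, the "> " prompt, and the handler parameter are assumptions made for this example; this part of the lab does not say what the commands are, so the sketch simply passes each non-blank line to a callback.

```java
import java.util.Scanner;
import java.util.function.Consumer;

// Hypothetical sketch of the input loop; names and the prompt text are illustrative.
class CommandLoop {
    // Prompts and reads lines until end of input, passing each non-blank line
    // to the handler. Returns how many lines were handled.
    static int run(Scanner in, Consumer<String> handler) {
        int handled = 0;
        while (true) {
            System.out.print("> ");
            if (!in.hasNextLine()) {
                break; // CTRL-D (end of file) makes hasNextLine() return false
            }
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            handler.accept(line);
            handled++;
        }
        System.out.println();
        return handled;
    }

    public static void main(String[] args) {
        Scanner stdin = new Scanner(System.in);
        int n = run(stdin, cmd -> System.out.println("You entered: " + cmd));
        System.out.println("Handled " + n + " command(s).");
    }
}
```

Taking the Scanner as a parameter, rather than creating it inside the loop, makes it possible to exercise the loop with a Scanner built from a String.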
Sample arguments
imdb.full.txt
- plays the game with the full data set centered at “Kevin Bacon (I)”
imdb.pre1950.txt "Bela Lugosi"
- plays the game with the center set to “Bela Lugosi”
http://www.cs.oberlin.edu/~gr151/imdb/imdb.no-tv-v.txt
- plays the game with the no TV/V data set centered at “Kevin Bacon (I)”
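One way to handle the required filename and the optional center is sketched below. The class name BaconArgs, the method names, and the usage message are assumptions for illustration; the default center “Kevin Bacon (I)” comes from the sample arguments above.

```java
// Hypothetical argument handling; names and the usage text are illustrative.
class BaconArgs {
    static final String DEFAULT_CENTER = "Kevin Bacon (I)";

    // The first argument is the data file name or URL.
    static String source(String[] args) {
        return args[0];
    }

    // The optional second argument overrides the default center.
    static String center(String[] args) {
        return args.length >= 2 ? args[1] : DEFAULT_CENTER;
    }

    public static void main(String[] args) {
        if (args.length < 1 || args.length > 2) {
            System.err.println("Usage: java BaconNumber <file-or-URL> [center]");
            System.exit(1);
        }
        System.out.println("Data source: " + source(args));
        System.out.println("Center: " + center(args));
    }
}
```

Note that a center containing a space, such as "Bela Lugosi", has to be quoted on the command line so that it arrives as a single argument.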
File Format
The movie data file contains information on what movies a performer appears in. Every line contains information on one person appearing in one movie. The lines are formatted as follows:
<performer name>|<movie title>
The vertical pipe character | can be used to determine where the name ends and the title begins. There will be only one | on a line, and there are no empty names or titles. java.lang.String has a number of methods that can be used to divide up the line (e.g., split("\\|")).
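As a minimal sketch of splitting one line of the data file, the example below uses split("\\|") with a limit of 2. The class name CreditLine, the method name parse, and the sample line are made up for illustration; the sample is not taken from any of the data files.

```java
// Hypothetical parser for one "<performer name>|<movie title>" line.
class CreditLine {
    // Returns a two-element array: {performer name, movie title}.
    static String[] parse(String line) {
        // The pipe is a regex metacharacter, so it must be escaped.
        String[] parts = line.split("\\|", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample line, not real data.
        String[] parts = parse("Some Performer|Some Movie (1999)");
        System.out.println("Performer: " + parts[0]);
        System.out.println("Movie: " + parts[1]);
    }
}
```

The check for malformed lines should never fire on the supplied files, given the format guarantees above, but it makes a mistake in a hand-written test file easier to spot.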
I have supplied several data files of varying sizes for you to work with. (Don’t download them to your CS account, see below.)
- imdb.cslam.txt: an 11-line file to test your program on
- imdb.small.txt: a 1817-line file with just a handful of performers (161), fully connected
- imdb.top250.txt: a 14339-line file listing just the top 250 movies on IMDB (disconnected groups of foreign films)
- imdb.pre1950.txt: a 1014465-line file with movies made before 1950
- imdb.post1950.txt: an 8159857-line file with the movies made after 1950
- imdb.only-tv-v.txt: a 2302907-line file with only made-for-TV and direct-to-video movies
- imdb.no-tv-v.txt: a 6871415-line file without the made-for-TV and direct-to-video movies (best for the canonical Kevin Bacon game)
- imdb.afi100.txt: American Film Institute’s “100 Years…100 Movies”
- imdb.noir.txt: noir films
- imdb.post2000.txt: movies after 2000
- imdb.pre2000.txt: movies before 2000
- imdb.full.txt: all 9174322 lines of IMDB for you to search through
Rather than cluttering up your hard drive with these files, you can use the
links above as URLs. For anything larger than the small database, you may
need to increase the amount of memory that Eclipse gives your program.
To do this, go to the “Run Configurations” menu. In the “Arguments” tab, there is a box for “VM Arguments.”
Enter -Xmx4g to increase the memory allocation to 4 GB. You won’t need more than 4 GB.
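To check whether the VM argument took effect, a program can ask the JVM for its maximum heap size. This is a small sketch with a made-up class name, HeapCheck; the number printed under -Xmx4g may come out slightly below 4096 MiB depending on the JVM.

```java
// Hypothetical check of the JVM's maximum heap size.
class HeapCheck {
    // Maximum memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```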