TheFreeSite.com!

Monday, July 28, 2008

Collections and iterators

If you don’t know how many objects you’re going to need to solve a particular problem, or
how long they will last, you also don’t know how to store those objects. How can you know
how much space to create for those objects? You can’t, since that information isn’t known
until run time.
The solution to most problems in object-oriented design seems flippant: you create another
type of object. The new type of object that solves this particular problem holds handles to
other objects. Of course, you can do the same thing with an array, which is available in most
languages. But there’s more. This new object, generally called a collection (also called a
container, but the AWT uses that term in a different sense so this book will use “collection”),
will expand itself whenever necessary to accommodate everything you place inside it. So you
don’t need to know how many objects you’re going to hold in a collection. Just create a
collection object and let it take care of the details.
Fortunately, a good OOP language comes with a set of collections as part of the package. In
C++, it’s the Standard Template Library (STL). Object Pascal has collections in its Visual
Component Library (VCL). Smalltalk has a very complete set of collections. Java also has
collections in its standard library. In some libraries, a generic collection is considered good
enough for all needs, and in others (C++ in particular) the library has different types of
collections for different needs: a vector for consistent access to all elements, and a linked list
for consistent insertion at all elements, for example, so you can choose the particular type
that fits your needs. These may include sets, queues, hash tables, trees, stacks, etc.
All collections have some way to put things in and get things out. The way that you place
something into a collection is fairly obvious. There’s a function called “push” or “add” or a
similar name. Fetching things out of a collection is not always as apparent; if it’s an arraylike
entity such as a vector, you might be able to use an indexing operator or function. But in
many situations this doesn’t make sense. Also, a single-selection function is restrictive. What
if you want to manipulate or compare a set of elements in the collection instead of just one?
The solution is an iterator, which is an object whose job is to select the elements within a
collection and present them to the user of the iterator. As a class, it also provides a level of
52 Thinking in Java www.BruceEckel.com
abstraction. This abstraction can be used to separate the details of the collection from the
code that’s accessing that collection. The collection, via the iterator, is abstracted to be simply
a sequence. The iterator allows you to traverse that sequence without worrying about the
underlying structure – that is, whether it’s a vector, a linked list, a stack or something else.
This gives you the flexibility to easily change the underlying data structure without
disturbing the code in your program. Java began (in version 1.0 and 1.1) with a standard
iterator, called Enumeration, for all of its collection classes. Java 1.2 has added a much
more complete collection library which contains an iterator called Iterator that does more
than the older Enumeration.
From the design standpoint, all you really want is a sequence that can be manipulated to
solve your problem. If a single type of sequence satisfied all of your needs, there’d be no
reason to have different kinds. There are two reasons that you need a choice of collections.
First, collections provide different types of interfaces and external behavior. A stack has a
different interface and behavior than that of a queue, which is different than that of a set or
a list. One of these might provide a more flexible solution to your problem than the other.
Second, different collections have different efficiencies for certain operations. The best
example is a vector and a list. Both are simple sequences that can have identical interfaces
and external behaviors. But certain operations can have radically different costs. Randomly
accessing elements in a vector is a constant-time operation; it takes the same amount of time
regardless of the element you select. However, in a linked list it is expensive to move through
the list to randomly select an element, and it takes longer to find an element if it is further
down the list. On the other hand, if you want to insert an element in the middle of a
sequence, it’s much cheaper in a list than in a vector. These and other operations have
different efficiencies depending upon the underlying structure of the sequence. In the design
phase, you might start with a list and, when tuning for performance, change to a vector.
Because of the abstraction via iterators, you can change from one to the other with minimal
impact on your code.
In the end, remember that a collection is only a storage cabinet to put objects in. If that
cabinet solves all of your needs, it doesn’t really matter how it is implemented (a basic
concept with most types of objects). If you’re working in a programming environment that
has built-in overhead due to other factors (running under Windows, for example, or the cost
of a garbage collector), then the cost difference between a vector and a linked list might not
matter. You might need only one type of sequence. You can even imagine the “perfect”
collection abstraction, which can automatically change its underlying implementation
according to the way it is used.

0 comments:

Tell Me Doubts In Java