tech-userlevel archive
[GSoC 2011] [Status Report] Apropos Replacement
Hello NetBSD!!
The official coding period of GSoC 2011 has ended, so I am writing a final
status report on the progress of the project. I will try to summarise the
initial goals of the project and which of them have been achieved by this
deadline.
1. OBJECTIVE: The main objective of the project was to develop a replacement
tool for apropos(1) that would provide a better search experience. We often
encounter situations where the answer to a problem is readily available in
some man page, but for lack of a good search tool we either turn to Google or
seek the advice of an expert. The aim of this project was to develop such a
search tool, one that points the user towards the solution.
2. DELIVERABLES PROPOSED & DELIVERED:
1. A utility for parsing and indexing the man pages (makemandb.c).
2. A utility for searching the index thus created (apropos.c).
3. A ranking algorithm to find more relevant results.
4. A mechanism to update the index when new man pages are installed or old
   ones are removed.
5. Use of the database to manage the man page aliases.
6. A library-like interface to build applications on top of it.
7. Documentation in the form of man pages.
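To make the idea of a ranking algorithm a little more concrete, here is a
minimal sketch of column-weighted scoring in C. It assumes, hypothetically,
that the index keeps separate columns per man page section (the decomposition
credited to David Young later in this mail) and that a match in NAME should
count more than one in the body. The column count, the weights and struct page
are illustrative only; this is not the scoring actually implemented in
apropos.c.

```c
#include <stddef.h>
#include <stdlib.h>

#define NCOLS 3 /* hypothetical columns: NAME, DESCRIPTION, rest of the body */

struct page {
    const char *name;
    unsigned tf[NCOLS]; /* query-term matches found in each column */
    double score;
};

/* Hypothetical weights: a match in NAME counts more than one in the body. */
static const double weights[NCOLS] = { 2.0, 1.0, 0.5 };

static double
score_page(const struct page *p)
{
    double s = 0.0;

    for (size_t c = 0; c < NCOLS; c++)
        s += weights[c] * p->tf[c];
    return s;
}

/* qsort(3) comparator: highest score first. */
static int
by_score_desc(const void *a, const void *b)
{
    const struct page *pa = a, *pb = b;

    if (pa->score < pb->score)
        return 1;
    if (pa->score > pb->score)
        return -1;
    return 0;
}

/* Score every page, then sort the array so the best match comes first. */
void
rank_pages(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].score = score_page(&pages[i]);
    qsort(pages, n, sizeof pages[0], by_score_desc);
}
```

A real ranking function would also account for how rare a term is across all
pages (as tf-idf does); the weighted sum above only shows how per-section
columns let NAME matches outrank body matches.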
3. DELIVERABLES PROPOSED & NOT DELIVERED:
1. I proposed to provide line numbers or references to specific sections of
   the man pages in the search results, but at the time of implementation
   this did not turn out to be trivial.
2. A CGI based interface: I did not have enough time left at the end to try
   this out. However, the groundwork has been done in the form of a
   library-like interface and a function run_query_html(), which provides the
   search results as an HTML fragment. So it should be fairly straightforward
   to write a CGI application to perform searches from a web browser.
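As an illustration of what an HTML formatter such as run_query_html() has to
take care of, here is a hypothetical sketch of turning one search result into
an HTML fragment. The names html_escape() and result_to_html() and the markup
they produce are assumptions for this example, not the project's API; the
point is only that text taken from man pages must be entity-escaped before it
is embedded in a web page.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy `in` into `out`, replacing characters that are special in HTML with
 * entities. Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if `out` is too small.
 */
size_t
html_escape(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return (size_t)-1;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep = one; /* default: copy the byte unchanged */

        switch (*in) {
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        case '&': rep = "&amp;"; break;
        case '"': rep = "&quot;"; break;
        }
        size_t n = strlen(rep);
        if (len + n + 1 > outsz) /* keep room for the final NUL */
            return (size_t)-1;
        memcpy(out + len, rep, n);
        len += n;
    }
    out[len] = '\0';
    return len;
}

/* Format one result as a hypothetical "<p><b>name(sec)</b> - desc</p>". */
int
result_to_html(const char *name, const char *section, const char *desc,
    char *out, size_t outsz)
{
    char ename[256], esec[32], edesc[1024];
    int n;

    if (html_escape(name, ename, sizeof ename) == (size_t)-1 ||
        html_escape(section, esec, sizeof esec) == (size_t)-1 ||
        html_escape(desc, edesc, sizeof edesc) == (size_t)-1)
        return -1;
    n = snprintf(out, outsz, "<p><b>%s(%s)</b> - %s</p>", ename, esec, edesc);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

A CGI program would call something like result_to_html() once per row and
write the fragments between its own page header and footer.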
4. DETAILS ABOUT THE DELIVERABLES PRODUCED
There are two command line utilities, 'makemandb' and 'apropos'. You first
need to build the Full Text Search (FTS) index using makemandb(1), and then
you can use apropos(1) (the one provided by this project) to perform
searches.
4.1 makemandb: Simply running makemandb will build the FTS index and tell you
the number of pages indexed. Some pages might fail to get indexed along the
way, which will be indicated by error messages on the screen, but there is
nothing to worry about.
NOTE: The default behavior of makemandb is to update the index incrementally.
That is to say, it will add to the index only those pages it does not already
have, and it will remove from the index those pages that are no longer on the
file system. Of course, if there is no existing index, it will build one from
scratch.
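The incremental update can be pictured as a merge over two sorted lists of man
page paths: what the index already has and what is on disk. The following is
a minimal sketch under that assumption; diff_pages() is a hypothetical name,
and the real makemandb keeps this bookkeeping in its database rather than in
plain arrays.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk two lists of man page paths, both sorted with strcmp(3).
 * Paths found only on disk are new and must be indexed (to_add);
 * paths found only in the index are gone and must be removed (to_rm).
 * The caller sizes to_add for at least nd entries and to_rm for ni.
 */
void
diff_pages(const char **indexed, size_t ni, const char **ondisk, size_t nd,
    const char **to_add, size_t *nadd, const char **to_rm, size_t *nrm)
{
    size_t i = 0, d = 0;

    *nadd = *nrm = 0;
    while (i < ni || d < nd) {
        int cmp;

        if (i == ni)
            cmp = 1;            /* only on-disk entries are left */
        else if (d == nd)
            cmp = -1;           /* only indexed entries are left */
        else
            cmp = strcmp(indexed[i], ondisk[d]);

        if (cmp < 0)
            to_rm[(*nrm)++] = indexed[i++];  /* no longer on the file system */
        else if (cmp > 0)
            to_add[(*nadd)++] = ondisk[d++]; /* new page, not yet indexed */
        else {
            i++;                             /* present in both: keep it */
            d++;
        }
    }
}
```

Because both lists are sorted, each path is compared at most once, so the
cost is linear in the number of pages rather than quadratic.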
makemandb supports the following options:
[-f]: Tells makemandb(1) to prune the existing index (if there is one) and
rebuild the database from scratch.
[-l]: Tells makemandb(1) to limit the indexing to the NAME section of the man
pages. This can be used to mimic the behavior of the "classical apropos",
although with improved search capabilities. It might be useful if you want to
save a few MB of disk space.
[-o]: Optimizes the index. makemandb(1) will try to optimize the FTS index for
faster search performance, and it will also optimize the storage of the data
to reduce disk space usage.
makemandb also builds and maintains an aliases table for managing the man page
aliases, which are scattered throughout the file system in the form of
symlinks or hard links. I have provided a patch to man.c so that man(1) looks
up this table to identify the target page it needs to render. Thus, it should
be possible to get rid of these symlinks and hard links.
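Conceptually, the aliases lookup done by the patched man(1) resolves a
requested name to the page that actually documents it. Here is a minimal
in-memory sketch; struct alias and resolve_alias() are hypothetical, and the
real table lives in the database built by makemandb rather than in a C array.

```c
#include <stddef.h>
#include <string.h>

struct alias {
    const char *alias;  /* name the user asked for */
    const char *target; /* page that actually documents it */
};

/*
 * Return the target page for `name`, or `name` itself when it is not an
 * alias, so the caller can always render whatever is returned.
 */
const char *
resolve_alias(const struct alias *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].alias, name) == 0)
            return tab[i].target;
    return name;
}
```

With the table in a database, the linear scan would become an indexed query,
but the contract stays the same: one name in, one page to render out.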
4.2 apropos: Once you have built the database, you can run apropos(1) and pass
it a query to do a search. For example:
$ apropos "add a new user"
apropos supports the following options:
[-1234569]: You can pass section numbers as options to apropos, which will
make apropos search only within the specified set of sections.
[-p]: By default apropos(1) displays the top 10 ranked results on stdout. If
you would like to see more results, use 'p'. It makes apropos(1) display all
the results and also pipe them to a pager (more(1)).
5. OTHER DELIVERABLES:
Besides the two command line tools, I have also developed a very small
library that lets you build a search application on top of the FTS index
built by makemandb. It has the following public functions:
5.1 init_db(): Initializes a connection to the database. It takes care of
registering some custom functions with the connection, and it will also
recreate the database schema if the database file does not exist and you
passed the right flags.
5.2 run_query(): Runs a query as entered by the user and processes the rows
obtained in a callback function (apropos.c uses it).
5.3 run_query_html(): Similar to run_query(), but it formats the results as
an HTML fragment. This can be used to build a CGI application to do searches
from a browser.
5.4 run_query_pager(): Similar to run_query_html(), but it formats the
results so that the matching text appears highlighted when piped to a pager.
apropos.c uses it when the -p option is specified.
5.5 close_db(): Closes the database connection and releases any resources.
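The callback style of run_query() can be illustrated with a small stand-in.
The signature below is an assumption (the real prototype is documented in the
library's man pages); run_query_sketch(), struct result and print_result() are
hypothetical. It only shows how a caller such as apropos could stop after the
default 10 rows, or print everything when -p is given, by returning non-zero
from its callback.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result row; the real library's row layout may differ. */
struct result {
    const char *name;
    const char *section;
    const char *snippet;
};

/* Callback type: return non-zero to stop receiving further rows. */
typedef int (*result_cb)(void *ctx, const struct result *r);

/*
 * Hypothetical stand-in for run_query(): hands each ranked row to `cb`
 * until the rows run out or the callback asks to stop.
 * Returns the number of rows delivered.
 */
size_t
run_query_sketch(const struct result *rows, size_t nrows, result_cb cb,
    void *ctx)
{
    size_t delivered = 0;

    for (size_t i = 0; i < nrows; i++) {
        delivered++;
        if (cb(ctx, &rows[i]) != 0)
            break;
    }
    return delivered;
}

/* Caller-side state: where to print and how many rows to allow. */
struct print_ctx {
    FILE *out;
    size_t printed;
    size_t limit; /* e.g. 10 by default; 0 means no limit (as with -p) */
};

int
print_result(void *ctx, const struct result *r)
{
    struct print_ctx *pc = ctx;

    fprintf(pc->out, "%s(%s)\t%s\n", r->name, r->section, r->snippet);
    pc->printed++;
    /* Ask the producer to stop once the limit has been reached. */
    return pc->limit != 0 && pc->printed >= pc->limit;
}
```

The void *ctx parameter lets each caller carry its own state (output stream,
counters, HTML buffer) without the library knowing about it.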
For more detailed documentation, see the man pages of the individual
components.
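For run_query_pager(), one common way to make text appear highlighted in a
pager is nroff-style overstriking, where each character c is emitted as c,
backspace, c, which less(1) displays as bold. Whether run_query_pager() uses
exactly this convention is an assumption; highlight_overstrike() below is a
hypothetical helper that does a simple case-sensitive match.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `text` into `out`, rendering every occurrence of `term` in bold
 * using overstrike: each matched character c becomes "c\bc".
 * Returns the output length, or (size_t)-1 if `out` is too small.
 */
size_t
highlight_overstrike(const char *text, const char *term, char *out,
    size_t outsz)
{
    size_t len = 0, tlen = strlen(term);

    if (outsz == 0)
        return (size_t)-1;
    while (*text != '\0') {
        const char *hit = tlen ? strstr(text, term) : NULL;
        size_t plain = hit ? (size_t)(hit - text) : strlen(text);

        /* Copy the unmatched part unchanged. */
        if (len + plain + 1 > outsz)
            return (size_t)-1;
        memcpy(out + len, text, plain);
        len += plain;
        text += plain;
        if (hit == NULL)
            break;

        /* Emit the match as character, backspace, character. */
        if (len + 3 * tlen + 1 > outsz)
            return (size_t)-1;
        for (size_t i = 0; i < tlen; i++) {
            out[len++] = text[i];
            out[len++] = '\b';
            out[len++] = text[i];
        }
        text += tlen;
    }
    out[len] = '\0';
    return len;
}
```

Overstriking needs no terminal escape sequences, which is why it survives
being piped through a pager that interprets it.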
6. REQUIREMENTS FOR BUILDING & RUNNING:
The following are required for building and running it on NetBSD:
6.1 A -CURRENT version of NetBSD (or at least the -CURRENT man pages and the
-CURRENT version of man(1)).
6.2 libmandoc from mdocml.
7. SCREENSHOTS:
I uploaded some screenshots of the output to my blog. Here are the links:
http://4.bp.blogspot.com/-q5uy81DqUmE/TlPFTdweyXI/AAAAAAAACDw/Du06YrCBnEQ/s1600/add-user.png
http://3.bp.blogspot.com/-nj0SRZVZ0HU/TlPFc46KbrI/AAAAAAAACD0/D7vaaR4wuy0/s1600/password-hash.png
http://3.bp.blogspot.com/-lt0chLf9TjU/TlPFmwLo1vI/AAAAAAAACD4/F_Xhen1L5Rw/s1600/psignal.png
http://2.bp.blogspot.com/-VLnGy27-ecw/TlPF3zj40wI/AAAAAAAACD8/pWQqYHm1dZ8/s1600/log.png
http://2.bp.blogspot.com/-HS7eDup9B-w/TlPGF4IH2aI/AAAAAAAACEA/oieShZiX_co/s1600/realtek.png
8. ACKNOWLEDGEMENTS:
I owe a big chunk of the success to my mentor Jörg Sonnenberger, who was
always there to answer my questions, offer advice and review the code. I have
learnt a great deal from him and I am sure I have improved as a programmer.
The best thing about working with him was that he never really disclosed the
solution; instead he gently guided me towards it, so I never lost a learning
opportunity :-)
David Young also offered valuable guidance during the project. He provided
some clever insights and tips to improve the search and the ranking of the
results. It was his idea to decompose the database into more columns based on
the different sections of a man page.
Thanks as well to Kristaps Dzonsons, who is responsible for the mdocml
project. He also reviewed the code related to parsing the pages and pointed
out bugs in it. I implemented makemandb based on his utility "mandocdb", so
that was also a huge help.
Special thanks go to Thomas Klausner for reviewing the man pages I wrote and
also providing patches for the mistakes I had made in them.
I must also thank Julio Merino, Jan Schaumann, Jukka Ruohonen and S.P.Zeidler
for the interest they showed in the project and the help they offered
throughout :-)
And thanks to lots of other people in the community as well, whose names I
forgot to mention. It was encouraging to see responses to each status report
I sent, and that kept me excited.
9. WHAT NEXT?
I thoroughly enjoyed working on this project. I would definitely like to
continue working in the NetBSD community; in fact, I was discussing with Joerg
some of the projects I could work on. I am interested in systems programming;
I don't have enough knowledge yet, but I don't mind learning ;-)
Thanks for reading this far :-)
--
Abhinav
http://abhinav-upadhyay.blogspot.com/2011/08/final-report-netbsd-gsoc-2011-apropos.html