Friday, September 24, 2004

Searching for Math Equations and Symbols on the Web: Part 1

NB: I derive an almost perverse pleasure from torturing search engines. Please also note that I'd like to turn this into a paper for publication, so I'm calling dibs on my ideas (link, disagree, comment, expand upon ... but please don't copy and republish).

It occurred to me that there must be a ton of web pages and papers out there talking about div, curl, grad, summation, integration... all things best described by their equations and symbols. Mathematicians frequently solve equations and prove their solutions divorced from the applications of those equations to describe nature. Physicists develop new equations to describe nature and then try to solve them. Engineers look up solutions to equations to solve real, immediate, applied problems. So, how can physicists, engineers, and librarians search for math on the internet if the write-ups and the equations themselves do not use the language of the physical phenomena the equations describe? Or if the equations are not yet named and famous? Or are not recognizable as belonging to those phenomena?

You can search in Safari by code snippet and in chemistry databases (like ChemIDplus from NLM) by substructure. (You can also try the ACM Portal with code snippets, but there's no real way to search the open internet for chemical structures.) That's pretty cool, but what about searching for a particular mathematical formula? In Inspec and MathSciNet, the subject headings are probably the best access points, and there are rules for representing symbols (like /sub/, sometimes depending on the vendor), but this doesn't help if the mathematical formula is not linked to the particular phenomenon you're studying. I discussed in an earlier post how there are various ways to represent math on the web. So, what happens when you search for ∑ or &sum; or &#8721;? Maybe sum isn't the best choice, because normally you'd have an n=a attached somewhere, so I'll also search for the partial derivative symbol ∂ and the gradient ∇.
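
To make the three forms concrete (the character itself, the named entity, and the numeric code), here is a minimal Python sketch. The function name `representations` is made up for this illustration; `html.unescape` comes from Python's standard library.

```python
import html

# Entity names for the three symbols searched for in this post.
ENTITY_NAMES = ["sum", "part", "nabla"]

def representations(entity_name):
    """Return (character, named entity, numeric code) for an HTML entity name."""
    named = f"&{entity_name};"
    char = html.unescape(named)    # resolve the named entity to its character
    numeric = f"&#{ord(char)};"    # decimal code point wrapped as a numeric code
    return char, named, numeric

for name in ENTITY_NAMES:
    print(representations(name))
```

All three strings in each tuple stand for the same symbol, but a search engine that indexes raw text may treat them as three unrelated queries.
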
Google
Graphic symbol pasted in – does nothing, not even an error.
Named entity or numeric code (starting with an & and ending with a ;) – returns mostly tables and posts on how to use MathML or the codes, or various things from computer programming. For the partial derivative symbol (∂), Google returns nothing for the code (&#8706;) and returns non-math results for the entity, with one exception: an improperly created abstract that actually shows the code, not the symbol.

Yahoo
Graphic symbol pasted in – zero results.
Named entity or numeric code – Yahoo ignores the & and the ; and searches for sum (or part) for the entity and for the number for the code. You end up with phone numbers, part numbers, etc.

Teoma
Graphic symbol pasted in – lots of results; it's seeing the symbols as empty boxes, so it's finding pages in non-romanized languages.
Named entity or numeric code – Teoma apparently ignores the & and the ; and searches for sum (or part) for the entity and for the number for the code. Quotes don't help.
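
The Yahoo and Teoma results are what you would expect if the query is split into words and the punctuation is thrown away. A minimal Python sketch of that idea follows; `naive_tokens` is a hypothetical function and only a guess at the behavior, not any engine's documented tokenizer.

```python
import re

def naive_tokens(query):
    """Lowercase the query and keep only runs of ASCII letters and digits.

    Hypothetical: a guess at what an engine that ignores punctuation
    might do, not the actual tokenizer of any search engine.
    """
    return re.findall(r"[a-z0-9]+", query.lower())

for query in ["&sum;", "&#8721;", "&part;", "&#8706;", "∑"]:
    print(repr(query), "->", naive_tokens(query))
```

Under this assumption the entity collapses to an ordinary word, the code collapses to a bare number (hence the phone numbers and part numbers), and the pasted symbol leaves nothing to search for.
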

1 comment:

Stefan said...

Hi Christina

I 100% agree with you. I am extremely tempted to start a search engine project that attempts to find matches for equations. The trick is that any constants and symbols must be replaced by a placeholder. For example, if I am looking for "\int e^{-x^2} dx", it must also find things like "int e^(-a^2)", or however OpenOffice, LaTeX, MathML, ... write this equation. Are you by any chance aware of such a project? (I would not want to reinvent the wheel.)
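
The placeholder idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `normalize` function, the short `KEYWORDS` list, the choice to drop differentials such as "dx", and the choice to leave numbers as they are. It is a toy, not a real equation search engine.

```python
import re

# Names kept as-is rather than turned into placeholders.
# An assumed, incomplete list chosen for this example.
KEYWORDS = {"int", "exp", "e", "pi", "sin", "cos", "log"}

def normalize(equation):
    """Reduce an equation string to a notation-independent token string."""
    text = equation.replace("\\", "")                # drop LaTeX backslashes
    text = text.replace("{", "(").replace("}", ")")  # treat braces as parentheses
    tokens = re.findall(r"[A-Za-z]+|\d+|\S", text)   # words, numbers, single symbols
    names = {}
    out = []
    for tok in tokens:
        if re.fullmatch(r"d[A-Za-z]", tok):
            continue                                 # assumption: ignore differentials like "dx"
        if tok.isalpha() and tok not in KEYWORDS:
            # Number variables v1, v2, ... in order of first appearance.
            tok = names.setdefault(tok, f"v{len(names) + 1}")
        out.append(tok)
    return " ".join(out)

print(normalize(r"\int e^{-x^2} dx"))
print(normalize("int e^(-a^2)"))
```

Both calls produce the same string, so an index built on the normalized form would match either spelling. The sketch does not know that e^(...) and exp(...) mean the same thing, and it says nothing about MathML or OpenOffice markup.
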

I am currently looking for "int_0^t exp(-(x-a)^2/b^2) dx"...

No idea how to find this... although it must be related to the error function.
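
The link to the error function can be checked by substitution. This is a sketch, assuming b > 0 and the standard definition erf(z) = (2/\sqrt{\pi}) \int_0^z e^{-u^2} du, with u = (x-a)/b:

```latex
\int_0^t e^{-(x-a)^2/b^2}\,dx
  = b \int_{-a/b}^{(t-a)/b} e^{-u^2}\,du
  = \frac{b\sqrt{\pi}}{2}
    \left[ \operatorname{erf}\!\left(\frac{t-a}{b}\right)
         + \operatorname{erf}\!\left(\frac{a}{b}\right) \right]
```

The plus sign on the second term comes from erf being an odd function, so -erf(-a/b) = erf(a/b).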